MongoDBGraphStore

class langchain_mongodb.graphrag.graph.MongoDBGraphStore(*, connection_string: str | None = None, database_name: str | None = None, collection_name: str | None = None, collection: Collection | None = None, entity_extraction_model: BaseChatModel, entity_prompt: ChatPromptTemplate | None = None, query_prompt: ChatPromptTemplate | None = None, max_depth: int = 3, allowed_entity_types: List[str] | None = None, allowed_relationship_types: List[str] | None = None, entity_examples: str | None = None, entity_name_examples: str = '', validate: bool = False, validation_action: str = 'warn', rerank_path: str | List[str] | None = None, rerank_model: str | None = None, num_docs_to_rerank: int = 1000)

GraphRAG DataStore

GraphRAG is a ChatModel that responds to semantic queries using a Knowledge Graph that an LLM creates. As in Vector RAG, we augment the Chat Model’s training data with relevant information that we collect from documents.

In Vector RAG, one uses an “Embedding” model that converts both the query and the potentially relevant documents into vectors. These vectors can then be compared, and the most similar documents are supplied to the Chat Model as context for the query.

In Graph RAG, one uses an “Entity-Extraction” model that converts text into Entities and their relationships, a Knowledge Graph. Comparison is done by Graph traversal, finding entities connected to the query prompts. These are then supplied to the Chat Model as context. The main difference is that GraphRAG’s output is typically in a structured format.

GraphRAG excels at finding links and common entities, even if these come from different articles. It can combine information from distinct sources, providing richer context than Vector RAG in certain cases.

Here are a few examples of so-called multi-hop questions where GraphRAG excels:

- What is the connection between ACME Corporation and GreenTech Ltd.?
- Who is leading the SolarGrid Initiative, and what is their role?
- Which organizations are participating in the SolarGrid Initiative?
- What is John Doe’s role in ACME’s renewable energy projects?
- Which company is headquartered in San Francisco and involved in the SolarGrid Initiative?

In Graph RAG, one uses an Entity-Extraction model that interprets the text documents it is given and converts both the query and the potentially relevant documents into graphs. These are composed of nodes that are entities (nouns) and edges that are relationships. The idea is that the graph can find connections between entities and hence answer questions that require more than one connection.

In MongoDB, Knowledge Graphs are stored in a single Collection. Each MongoDB Document represents a single entity (node), and its relationships (edges) are defined in a nested field named “relationships”. The schema, and an example, are described in the entity_context prompts module.
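As a rough illustration of this layout, the sketch below shows what one entity document might look like as a Python dict. The `_id`, `type`, and the three parallel arrays under `relationships` follow the default entity-extraction prompt shown in the `from_connection_string` signature; the top-level `attributes` field, the sample values, and the `relationships_aligned` helper are assumptions made for illustration, not part of the library.

```python
from typing import Any, Dict

# Hypothetical entity document (one node of the Knowledge Graph).
# The top-level "attributes" field is an assumption for illustration.
john_doe: Dict[str, Any] = {
    "_id": "John Doe",          # entity name, used as the primary key
    "type": "Person",
    "attributes": {"title": ["Engineer"]},
    "relationships": {
        # Index i of each array describes the i-th relationship (edge).
        "target_ids": ["ACME Corporation", "Jane Smith"],
        "types": ["employee", "colleague"],
        "attributes": [{"since": ["2020"]}, {}],
    },
}


def relationships_aligned(entity: Dict[str, Any]) -> bool:
    """Return True if the three relationship arrays have equal length."""
    rel = entity.get("relationships")
    if not rel:
        return True  # an entity may have no relationships at all
    keys = ("target_ids", "types", "attributes")
    lengths = {len(rel.get(key, [])) for key in keys}
    return len(lengths) == 1
```

A check like this mirrors the "Array Length Alignment" rule in the default prompt; whether the library performs the same check internally is not stated here.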

When a query is made, the model extracts the entities in it, then traverses the graph to find connections. The closest entities and their relationships form the context that is included with the query to the Chat Model.

Consider this example Query: “Does John Doe work at MongoDB?” GraphRAG can answer this question even if the following two statements come from completely different sources.

- “Jane Smith works with John Doe.”
- “Jane Smith works at MongoDB.”
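The example above can be sketched with a small in-memory graph and a breadth-first traversal. This is only a conceptual illustration: the `graph` dict, the `connected_entities` helper, and its hop-counting convention are hypothetical, and the sketch assumes the "works with" edge was recorded on John Doe's node so that it can be followed as an outgoing edge.

```python
from collections import deque
from typing import Any, Dict, List

# Entities as they might be extracted from two unrelated source texts.
graph: Dict[str, Dict[str, Any]] = {
    "John Doe": {
        "_id": "John Doe",
        "type": "Person",
        "relationships": {
            "target_ids": ["Jane Smith"],
            "types": ["colleague"],
            "attributes": [{}],
        },
    },
    "Jane Smith": {
        "_id": "Jane Smith",
        "type": "Person",
        "relationships": {
            "target_ids": ["MongoDB"],
            "types": ["employee"],
            "attributes": [{}],
        },
    },
    "MongoDB": {"_id": "MongoDB", "type": "Organization"},
}


def connected_entities(
    graph: Dict[str, Dict[str, Any]],
    starting_entities: List[str],
    max_depth: int = 3,
) -> List[str]:
    """Breadth-first walk along outgoing edges, at most max_depth hops."""
    seen = set(starting_entities)
    found: List[str] = []
    queue = deque((name, 0) for name in starting_entities)
    while queue:
        name, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the hop limit
        relationships = graph.get(name, {}).get("relationships", {})
        for target in relationships.get("target_ids", []):
            if target not in seen:
                seen.add(target)
                found.append(target)
                queue.append((target, depth + 1))
    return found
```

Starting from "John Doe", two hops reach "MongoDB" by way of "Jane Smith", which is the context a Chat Model would need to answer the question.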

Methods

`__init__`(*[, connection_string, ...])
`add_documents`(documents)	Extract entities and upsert into the collection.
`chat_response`(query[, chat_model, prompt])	Responds to a query given information found in Knowledge Graph.
`close`()	Close the resources used by the MongoDBGraphStore.
`extract_entities`(raw_document, **kwargs)	Extract entities and their relations using chosen prompt and LLM.
`extract_entity_names`(raw_document, **kwargs)	Extract entity names from a document for similarity_search.
`find_entity_by_name`(name)	Utility to get Entity dict from Knowledge Graph / Collection.
`from_connection_string`(connection_string, ...)	Construct a MongoDB Knowledge Graph for RAG from a MongoDB connection URI.
`related_entities`(starting_entities[, ...])	Traverse Graph along relationship edges to find connected entities.
`similarity_search`(input_document)	Retrieve list of connected Entities found via traversal of KnowledgeGraph.
`to_networkx`([nx_opts, json_opts])	Utility converts Entity Collection to NetworkX DiGraph
`view`([layout, nx_opts, json_opts, ...])	Draws a Knowledge Graph as Holoviews/Bokeh interactive plot.

Parameters:

connection_string (Optional[str])
database_name (Optional[str])
collection_name (Optional[str])
collection (Optional[Collection])
entity_extraction_model (BaseChatModel)
entity_prompt (Optional[ChatPromptTemplate])
query_prompt (Optional[ChatPromptTemplate])
max_depth (int)
allowed_entity_types (Optional[List[str]])
allowed_relationship_types (Optional[List[str]])
entity_examples (Optional[str])
entity_name_examples (str)
validate (bool)
validation_action (str)
rerank_path (Optional[Union[str, List[str]]])
rerank_model (Optional[str])
num_docs_to_rerank (int)

__init__(*, connection_string: str | None = None, database_name: str | None = None, collection_name: str | None = None, collection: Collection | None = None, entity_extraction_model: BaseChatModel, entity_prompt: ChatPromptTemplate | None = None, query_prompt: ChatPromptTemplate | None = None, max_depth: int = 3, allowed_entity_types: List[str] | None = None, allowed_relationship_types: List[str] | None = None, entity_examples: str | None = None, entity_name_examples: str = '', validate: bool = False, validation_action: str = 'warn', rerank_path: str | List[str] | None = None, rerank_model: str | None = None, num_docs_to_rerank: int = 1000)

Parameters:

connection_string (str | None) – A valid MongoDB connection URI.
database_name (str | None) – The name of the database to connect to.
collection_name (str | None) – The name of the collection to connect to.
collection (Collection | None) – A Collection that will represent a Knowledge Graph. One may pass a Collection in lieu of connection_string, database_name, and collection_name.
entity_extraction_model (BaseChatModel) – LLM for converting documents into Graph of Entities and Relationships.
entity_prompt (ChatPromptTemplate | None) – Prompt to fill graph store with entities following schema. Defaults to .prompts.ENTITY_EXTRACTION_INSTRUCTIONS
query_prompt (ChatPromptTemplate | None) – Prompt extracts entities and relationships as search starting points. Defaults to .prompts.NAME_EXTRACTION_INSTRUCTIONS
max_depth (int) – Maximum recursion depth in graph traversal.
allowed_entity_types (List[str] | None) – If provided, constrains search to these types.
allowed_relationship_types (List[str] | None) – If provided, constrains search to these types.
entity_examples (str | None) – A string containing any number of additional examples to provide as context for entity extraction.
entity_name_examples (str) – A string appended to prompts.NAME_EXTRACTION_INSTRUCTIONS containing examples.
validate (bool) – If True, entity schema will be validated on every insert or update.
validation_action (str) – One of {“warn”, “error”}. If “warn”, the default, documents will be inserted but errors logged. If “error”, an exception will be raised if any document does not match the schema.
rerank_path (str | List[str] | None) – Field or list of fields on entity documents to rerank on. Enables $rerank when set. The entity _id (name) is a natural choice. Requires MongoDB 8.3+ and Native Reranking enabled in Atlas.
rerank_model (str | None) – Voyage AI reranking model (e.g. "rerank-2.5-lite"). Uses latest model if omitted.
num_docs_to_rerank (int) – Number of graph traversal results passed to the reranker. Lower values reduce reranking cost; higher values give the reranker a larger candidate pool. Defaults to 1000, the MongoDB maximum.
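A minimal construction sketch using the keyword arguments listed above. The `build_graph_store` wrapper, the database and collection names, and the chosen type lists are placeholders for illustration; the chat model is whatever `BaseChatModel` you already have.

```python
from typing import Any


def build_graph_store(chat_model: Any, connection_string: str) -> Any:
    """Create a MongoDBGraphStore; all argument values are placeholders."""
    # Imported lazily so the sketch can be defined without
    # langchain-mongodb installed.
    from langchain_mongodb.graphrag.graph import MongoDBGraphStore

    return MongoDBGraphStore(
        connection_string=connection_string,
        database_name="graphrag_demo",   # hypothetical database name
        collection_name="entities",      # hypothetical collection name
        entity_extraction_model=chat_model,
        max_depth=2,
        allowed_entity_types=["Person", "Organization"],
        allowed_relationship_types=["employee", "colleague"],
        validate=True,                   # validate schema on writes
        validation_action="error",       # raise instead of only logging
    )
```

If you already hold a `Collection`, the parameter list above indicates it can be passed as `collection=` in place of the three connection arguments.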

add_documents(documents: Document | List[Document]) → List[BulkWriteResult]

Extract entities and upsert into the collection.

Each entity is represented by a single MongoDB Document. Existing entities identified in documents will be updated.

Parameters:: documents (Document | List[Document]) – list of textual documents and associated metadata.
Returns:: List containing metadata on entities inserted and updated, one value for each input document.
Return type:: List[BulkWriteResult]
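Since one `BulkWriteResult` is returned per input document, a small helper can total the counts. The sketch below assumes only the standard PyMongo `upserted_count` and `modified_count` attributes; `summarize_writes` is a hypothetical name.

```python
from typing import Any, Dict, Iterable


def summarize_writes(results: Iterable[Any]) -> Dict[str, int]:
    """Total the upserted and modified entity counts across results."""
    totals = {"upserted": 0, "modified": 0}
    for result in results:
        totals["upserted"] += result.upserted_count
        totals["modified"] += result.modified_count
    return totals


# Intended usage, given an existing store and a list of Documents:
#   totals = summarize_writes(store.add_documents(documents))
```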

chat_response(query: str, chat_model: BaseChatModel | None = None, prompt: ChatPromptTemplate | None = None) → BaseMessage

Responds to a query given information found in Knowledge Graph.

Parameters:

query (str) – Prompt before it is augmented by Knowledge Graph.
chat_model (BaseChatModel | None) – ChatBot. Defaults to entity_extraction_model.
prompt (ChatPromptTemplate | None) – Alternative Prompt Template. Defaults to prompts.rag_prompt.

Returns:

Response Message. response.content contains text.

Return type:

BaseMessage
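A short usage sketch: the `ask` helper below is hypothetical and simply forwards to `chat_response`, then reads the text from `response.content` as described above. Passing `chat_model=None` keeps the documented default.

```python
from typing import Any, Optional


def ask(store: Any, query: str, chat_model: Optional[Any] = None) -> Any:
    """Send a query to the store and return the text of the reply."""
    response = store.chat_response(query, chat_model=chat_model)
    return response.content


# Intended usage, given an existing store:
#   print(ask(store, "Does John Doe work at MongoDB?"))
```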

close() → None

Close the resources used by the MongoDBGraphStore.

Return type:: None

extract_entities(raw_document: str, **kwargs: Any) → List[Entity]

Extract entities and their relations using chosen prompt and LLM.

Parameters:

raw_document (str) – A single text document as a string. Typically prose.
kwargs (Any)

Returns:

List of Entity dictionaries.

Return type:

List[Entity]

extract_entity_names(raw_document: str, **kwargs: Any) → List[str]

Extract entity names from a document for similarity_search.

This second entity extraction differs in form and purpose from the first: here we are looking for starting points of our search and paths to follow. We aim to find source nodes, but no target nodes or edges.

Parameters:

raw_document (str) – A single text document as a string. Typically prose.
kwargs (Any)

Returns:

List of entity names / _ids.

Return type:

List[str]

find_entity_by_name(name: str) → Entity | None

Utility to get Entity dict from Knowledge Graph / Collection.

Parameters:: name (str) – _id string to look for.
Returns:: The Entity dict whose _id matches name, or None if there is no match.
Return type:: Optional[Entity]

classmethod from_connection_string(connection_string: str, database_name: str, collection_name: str, entity_extraction_model: BaseChatModel, entity_prompt: ChatPromptTemplate = ChatPromptTemplate(input_variables=['allowed_entity_types', 'allowed_relationship_types', 'entity_examples', 'entity_schema', 'input_document'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['allowed_entity_types', 'allowed_relationship_types', 'entity_examples', 'entity_schema'], input_types={}, partial_variables={}, template='\n## Overview\nYou are a meticulous analyst tasked with identifying potential entities from unstructured text\nto build a knowledge graph in a structured json format of entities (nodes) and their relationships (edges).\n**Include as many entities and relationships as you can.**\n\nINPUT: You will be provided a text document.\nOUTPUT:\n- You will produce valid json according the "Output Schema" section below.\n- Your response **must be** a **valid JSON document** with NO extra text, explanations, or markdown formatting.\n- The extracted entities and relationships **MUST STRICTLY CONFORM** to the constraints outlined below.\n- Any entities or relationships not matching the allowed types must be **EXCLUDED**.\n\n\n## Entities\nAn entity in a knowledge graph is a uniquely identifiable object or concept\n(such as a person, organization, location, object, or event),\nrepresented as a node with attributes (properties) and relationships to other entities.\n\nUse the reserved field name `_id` for the name. It will be a unique primary key,\nand MongoDB automatically creates an index for the `_id` field.\n\nMaintain Entity Consistency when extracting entities. 
If an entity, such as "John Doe",\nis mentioned multiple times in the text but is referred to by different names or pronouns (e.g., "John", "Mr Doe", "he"),\nalways use the most complete identifier for that entity throughout the knowledge graph.\nIn this example, use "John Doe" as the entity `_id.`\n\n**Allowed Entity Types**:\n- Extract ONLY entities whose `type` matches one of the following: {allowed_entity_types}.\n- NOTE: If this list is empty, ANY `type` is permitted.\n\n### Examples of Exclusions:\n- If `allowed_entity_types` is `["Person", "Organization"]`, and the text mentions "Event" or "Location",\n these entities must **NOT** be included in the output.\n\n## Relationships\nRelationships represent edges in the knowledge graph. Relationships describe a specific edge type.\nRelationships MUST include a target entity, but Entities can be extracted that DO NOT have relationships!\nEnsure consistency and generality in relationship names when constructing knowledge schemas.\nInstead of using specific and momentary types such as \'worked_at\', use more general and timeless relationship types\nlike \'employee\'. Add details as attributes. 
Make sure to use general and timeless relationship types!\n\n### CRITICAL: Array Length Alignment\nThe relationships object contains three arrays: `target_ids`, `types`, and `attributes`.\n**These three arrays MUST have EXACTLY the same length.**\n- Each position (index) in these arrays describes ONE complete relationship.\n- Position 0 in `target_ids`, `types`, and `attributes` together describe the first relationship.\n- Position 1 in `target_ids`, `types`, and `attributes` together describe the second relationship.\n- And so on...\n\nIf a relationship has no attributes, you MUST still include an empty object `{{}}` in the `attributes` array at that position.\n\nExample of CORRECT alignment:\n```json\n"relationships": {{\n "target_ids": ["Entity A", "Entity B"],\n "types": ["partners", "supplier"],\n "attributes": [\n {{"since": ["2020"]}},\n {{}}\n ]\n}}\n```\n\nExample of INCORRECT (DO NOT DO THIS):\n```json\n"relationships": {{\n "target_ids": ["Entity A", "Entity B"],\n "types": ["partners"],\n "attributes": [{{"since": ["2020"]}}]\n}}\n```\n\n**Allowed Relationship Types**:\n- Extract ONLY relationships whose `type` matches one of the following: {allowed_relationship_types}.\n- If this list is empty, ANY relationship type is permitted.\n- Map synonymous or related terms to the closest matching allowed type. For example:\n\t-\t“works for” or “employed by” → employee\n\t-\t“manages” or “supervises” → manager\n- If a relationship cannot be named with one of the allowed keys, **DO NOT include it**.\n- An entity need not have a relationships object if no relationship is found that matches the allowed relation types.\n\n### Examples of Exclusions:\n- If `allowed_relationship_types` is `["employs", "friend"]` and the text implies a "partner" relationship,\n the entities can be added, but the "partner" relationship must **NOT** be included.\n\n## Validation\nBefore producing the final output:\n1. Validate that all extracted entities have an `_id` and `type`.\n2. 
Validate that all `type` values are in {allowed_entity_types}.\n3. Validate that all relationships use keys in {allowed_relationship_types}.\n4. **CRITICAL**: For each entity with relationships, verify that `target_ids`, `types`, and `attributes` arrays have EXACTLY the same length.\n5. Exclude any entities or relationships failing validation.\n\n## Output Schema\nOutput a valid JSON document with a single top-level key, `entities`, as an array of objects.\nEach object must conform to the following schema:\n{entity_schema}\n\n{entity_examples}\n'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input_document'], input_types={}, partial_variables={}, template='{input_document}'), additional_kwargs={})]), query_prompt: ChatPromptTemplate = ChatPromptTemplate(input_variables=['allowed_entity_types', 'entity_name_examples', 'input_document'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['allowed_entity_types', 'entity_name_examples'], input_types={}, partial_variables={}, template='\nYou are an analyst tasked with identifying potential entities in text documents.\nYou will be provided a short document from which you infer entity names.\nIdentify as many as possible.\n\nProvide your response as a valid JSON Array of entity names\nor human-readable identifiers, found in the text.\n\n**Allowed Entity Types**:\n- By default, all types are permitted.\n- If a non-empty list is provided, extract ONLY entities whose `type` matches one of the following: [{allowed_entity_types}].\n\n### Examples of Exclusions:\n- If `allowed_entity_types` is `["Person", "Organization"]`, and the text mentions an "Event" or "Location",\n these entities must **NOT** be included in the output.\n\n ## Examples:\n Example 1: `allowed_entity_types` is `[]`\n input: "John Doe works at ACME in New York"\n output: ["John Doe", "ACME", "New York"]\n\n In this example, you would identify 3 
entities:\n John Doe of type person; ACME of type organization; New York of type place.\n\nExample 2: `allowed_entity_types` is `[organization, place]`\n input: "John Doe works at ACME in New York"\n output: ["ACME", "New York"]\n\n In this example, you would identify only 2 entities:\n ACME of type organization; New York of type place.\n John Doe, of type person, would be excluded.\n\n 2. input: "In what continent is Brazil?\n output: ["Brazil"]\n\nThis example is in the form of a question. There is one entity,\n\n3. input: "For legal and operational purposes, many governments and organizations adopt specific definitions."\n output: []\n\nIn the third example, there are no entities.\nThough there are concepts and nouns that might be types or attributes of entities,\nthere is nothing here that could be seen as being a unique identifier or name.\n\n4. input: ""\n output: []\n\nIn final third example, there are no entities.\n\n### (Optional) Additional Examples\n\n{entity_name_examples}\n'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input_document'], input_types={}, partial_variables={}, template='{input_document}'), additional_kwargs={})]), max_depth: int = 3, allowed_entity_types: List[str] | None = None, allowed_relationship_types: List[str] | None = None, entity_examples: str | None = None, entity_name_examples: str = '', validate: bool = False, validation_action: str = 'warn') → MongoDBGraphStore[source]#

Construct a MongoDB Knowledge Graph for RAG from a MongoDB connection URI.

Parameters:

connection_string (str) – A valid MongoDB connection URI.
database_name (str) – The name of the database to connect to.
collection_name (str) – The name of the collection to connect to.
entity_extraction_model (BaseChatModel) – LLM for converting documents into Graph of Entities and Relationships.
entity_prompt (ChatPromptTemplate) – Prompt to fill graph store with entities following schema.
query_prompt (ChatPromptTemplate) – Prompt extracts entities and relationships as search starting points.
max_depth (int) – Maximum recursion depth in graph traversal.
allowed_entity_types (List[str] | None) – If provided, constrains search to these types.
allowed_relationship_types (List[str] | None) – If provided, constrains search to these types.
entity_examples (str | None) – A string containing any number of additional examples to provide as context for entity extraction.
entity_name_examples (str) – A string appended to prompts.NAME_EXTRACTION_INSTRUCTIONS containing examples.
validate (bool) – If True, entity schema will be validated on every insert or update.
validation_action (str) – One of {“warn”, “error”}. If “warn”, the default, documents will be inserted but errors logged. If “error”, an exception will be raised if any document does not match the schema.

Returns:

A new MongoDBGraphStore instance.

Return type:

MongoDBGraphStore

related_entities(starting_entities: List[str], max_depth: int | None = None, rerank_query: str | None = None) → List[Entity]

Traverse Graph along relationship edges to find connected entities.

Parameters:

starting_entities (List[str]) – Traversal begins with documents whose _id fields match these strings.
max_depth (Optional[int]) – Recursion continues until no more matching documents are found, or until the operation reaches a recursion depth specified by this parameter.
rerank_query (Optional[str]) – Original text query used for $rerank scoring. Required when rerank_path is set on the store.

Returns:

List of connected entities, reranked by relevance if rerank_path is set on the store.

Return type:

List[Entity]
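This page does not say how the traversal is implemented. One plausible way to express a depth-limited traversal of this kind in MongoDB is the `$graphLookup` aggregation stage, sketched below as an assumption. The `traversal_pipeline` helper and the `connections` and `depth` output field names are hypothetical; the field paths assume relationship targets are stored under `relationships.target_ids`.

```python
from typing import Any, Dict, List


def traversal_pipeline(
    collection_name: str,
    starting_entities: List[str],
    max_depth: int,
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline that follows relationship edges."""
    return [
        # Begin with documents whose _id matches a starting entity.
        {"$match": {"_id": {"$in": starting_entities}}},
        # Recursively follow target_ids to other entity documents.
        {
            "$graphLookup": {
                "from": collection_name,
                "startWith": "$relationships.target_ids",
                "connectFromField": "relationships.target_ids",
                "connectToField": "_id",
                "as": "connections",
                "maxDepth": max_depth,
                "depthField": "depth",
            }
        },
    ]


# Intended usage with a PyMongo collection:
#   docs = list(collection.aggregate(
#       traversal_pipeline(collection.name, ["John Doe"], 2)))
```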

similarity_search(input_document: str) → List[Entity]

Retrieve list of connected Entities found via traversal of KnowledgeGraph.

Use LLM & Prompt to find entities within the input_document itself.
Find Entity Nodes that match those found in the input_document.
Traverse the graph using these as starting points.

Parameters:: input_document (str) – String to find relevant documents for.
Returns:: List of connected Entity dictionaries, reranked by relevance if rerank_path is set on the store.
Return type:: List[Entity]
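The three steps can be sketched in terms of the store's other public methods. This is an illustrative composition, not the library's actual implementation: in particular, filtering names through `find_entity_by_name` is an assumption about how "matching" nodes are found, and `similarity_search_sketch` is a hypothetical name.

```python
from typing import Any, List


def similarity_search_sketch(store: Any, input_document: str) -> List[Any]:
    """Compose the three documented steps from public store methods."""
    # 1. Use the LLM and prompt to find entity names in the input text.
    names = store.extract_entity_names(input_document)
    # 2. Keep only names that exist as nodes in the Knowledge Graph.
    known = [n for n in names if store.find_entity_by_name(n) is not None]
    if not known:
        return []
    # 3. Traverse the graph using these nodes as starting points.
    #    (If rerank_path is set on the store, related_entities also
    #    needs rerank_query; that case is omitted here.)
    return store.related_entities(known)
```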

to_networkx(nx_opts: dict | None = None, json_opts: dict | None = None, **kwargs: Any) → networkx.DiGraph

Utility converts Entity Collection to NetworkX DiGraph

NOTE: Requires optional-dependency "viz", i.e. pip install "langchain-mongodb[viz]".

Parameters:

nx_opts (Optional[dict]) – Keyword arguments for networkx calls.
json_opts (Optional[dict]) – Keyword arguments for printing of node attributes and types.
**kwargs (Any) – Keyword arguments available for compatibility.

Return type:

networkx.DiGraph

Returns: networkx.DiGraph

view([layout, nx_opts, json_opts, ...])

Draws a Knowledge Graph as Holoviews/Bokeh interactive plot.

We first convert the entity collection to a NetworkX Graph, and then convert it to a Holoviews Graph via their API.

The default layout is spring_layout, which tends to spread nodes apart. Because our entities have a type field, however, another good choice might be layout=nx.multipartite_layout with nx_opts["subset_key"] = "type", since the multipartite layout positions nodes in straight lines by subset key.

NOTE: Requires optional-dependency "viz", i.e. pip install "langchain-mongodb[viz]".

You can save the view, like any HoloViews object, with hv.save. The file type is inferred from the filename’s suffix (e.g., hv.save(graph, "graph.html")). From a Jupyter notebook, you can also click the download widget on the Bokeh plot.

Parameters:

layout (Optional[Callable]) – networkx layout. Defaults to networkx.spring_layout.
nx_opts (Optional[dict]) – Keyword arguments for to_networkx function.
json_opts (Optional[dict]) – Keyword arguments for printing of node attributes and types.
edge_opts (Optional[dict]) – Keyword arguments to draw edges.
node_opts (Optional[dict]) – Keyword arguments to draw nodes.
**kwargs (Any) – Keyword arguments available for compatibility.

Return type:

holoviews.Graph

Returns: holoviews.Graph