MongoDBGraphStore
- class langchain_mongodb.graphrag.graph.MongoDBGraphStore(*, connection_string: str | None = None, database_name: str | None = None, collection_name: str | None = None, collection: Collection | None = None, entity_extraction_model: BaseChatModel, entity_prompt: ChatPromptTemplate | None = None, query_prompt: ChatPromptTemplate | None = None, max_depth: int = 2, allowed_entity_types: List[str] | None = None, allowed_relationship_types: List[str] | None = None, entity_examples: str | None = None, entity_name_examples: str = '', validate: bool = False, validation_action: str = 'warn')
GraphRAG DataStore
GraphRAG answers semantic queries with a Chat Model whose context is drawn from a Knowledge Graph that an LLM creates. As in Vector RAG, we augment the Chat Model’s training data with relevant information that we collect from documents.
In Vector RAG, one uses an “Embedding” model that converts both the query and the potentially relevant documents into vectors. These vectors are compared, and the most similar documents are supplied to the Chat Model as context for the query.
In Graph RAG, one uses an “Entity-Extraction” model that converts text into Entities and their relationships, a Knowledge Graph. Comparison is done by Graph traversal, finding entities connected to the query prompts. These are then supplied to the Chat Model as context. The main difference is that GraphRAG’s output is typically in a structured format.
GraphRAG excels at finding links and common entities, even if these come from different articles. It can combine information from distinct sources, providing richer context than Vector RAG in certain cases.
Here are a few examples of so-called multi-hop questions where GraphRAG excels:
- What is the connection between ACME Corporation and GreenTech Ltd.?
- Who is leading the SolarGrid Initiative, and what is their role?
- Which organizations are participating in the SolarGrid Initiative?
- What is John Doe’s role in ACME’s renewable energy projects?
- Which company is headquartered in San Francisco and involved in the SolarGrid Initiative?
In Graph RAG, an Entity-Extraction model interprets the text documents it is given and converts both the query and the potentially relevant documents into graphs. These are composed of nodes that are entities (nouns) and edges that are relationships. Because the graph records connections between entities, it can answer questions that require following more than one connection.
In MongoDB, Knowledge Graphs are stored in a single Collection. Each MongoDB Document represents a single entity (node), and its relationships (edges) are defined in a nested field named “relationships”. The schema, and an example, are described in the entity_context prompts module.
When a query is made, the model extracts the entities in it, then traverses the graph to find connections. The closest entities and their relationships form the context that is included with the query to the Chat Model.
Consider this example Query: “Does John Doe work at MongoDB?” GraphRAG can answer this question even if the following two statements come from completely different sources.
- “Jane Smith works with John Doe.”
- “Jane Smith works at MongoDB.”
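The John Doe example can be mimicked with a small in-memory graph. This is only a sketch of the idea, not the library's implementation: the `GRAPH` dict, the inner layout of the “relationships” field, and the `reachable` helper are all hypothetical, and the symmetric “works with” edge is assumed to have been recorded on John Doe's document.

```python
from collections import deque

# Hypothetical stand-in for the collection: one dict per entity, keyed by _id,
# with edges kept in a nested "relationships" field. The inner layout
# (a list of {"target", "type"} dicts) is an assumption made for this sketch.
GRAPH = {
    "John Doe": {
        "_id": "John Doe",
        "type": "Person",
        "relationships": [{"target": "Jane Smith", "type": "colleague"}],
    },
    "Jane Smith": {
        "_id": "Jane Smith",
        "type": "Person",
        "relationships": [{"target": "MongoDB", "type": "employee"}],
    },
    "MongoDB": {"_id": "MongoDB", "type": "Organization", "relationships": []},
}


def reachable(graph, starting_ids, max_depth=2):
    """Breadth-first traversal that follows edges for at most max_depth hops."""
    # Keep only known starting points, de-duplicated, in their original order.
    seen = list(dict.fromkeys(i for i in starting_ids if i in graph))
    queue = deque((entity_id, 0) for entity_id in seen)
    while queue:
        entity_id, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for edge in graph[entity_id]["relationships"]:
            target = edge["target"]
            if target in graph and target not in seen:
                seen.append(target)
                queue.append((target, depth + 1))
    return seen
```

Starting from “John Doe”, one hop reaches “Jane Smith” and a second hop reaches “MongoDB”, which is the two-statement chain described above.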
Methods
__init__(*[, connection_string, ...])
add_documents(documents) – Extract entities and upsert into the collection.
chat_response(query[, chat_model, prompt]) – Responds to a query given information found in Knowledge Graph.
close() – Close the resources used by the MongoDBGraphStore.
extract_entities(raw_document, **kwargs) – Extract entities and their relations using chosen prompt and LLM.
extract_entity_names(raw_document, **kwargs) – Extract entity names from a document for similarity_search.
find_entity_by_name(name) – Utility to get Entity dict from Knowledge Graph / Collection.
from_connection_string(connection_string, ...) – Construct a MongoDB Knowledge Graph for RAG from a MongoDB connection URI.
related_entities(starting_entities[, max_depth]) – Traverse Graph along relationship edges to find connected entities.
similarity_search(input_document) – Retrieve list of connected Entities found via traversal of KnowledgeGraph.
- Parameters:
connection_string (Optional[str])
database_name (Optional[str])
collection_name (Optional[str])
collection (Optional[Collection])
entity_extraction_model (BaseChatModel)
entity_prompt (Optional[ChatPromptTemplate])
query_prompt (Optional[ChatPromptTemplate])
max_depth (int)
allowed_entity_types (Optional[List[str]])
allowed_relationship_types (Optional[List[str]])
entity_examples (Optional[str])
entity_name_examples (str)
validate (bool)
validation_action (str)
- __init__(*, connection_string: str | None = None, database_name: str | None = None, collection_name: str | None = None, collection: Collection | None = None, entity_extraction_model: BaseChatModel, entity_prompt: ChatPromptTemplate | None = None, query_prompt: ChatPromptTemplate | None = None, max_depth: int = 2, allowed_entity_types: List[str] | None = None, allowed_relationship_types: List[str] | None = None, entity_examples: str | None = None, entity_name_examples: str = '', validate: bool = False, validation_action: str = 'warn')
- Parameters:
connection_string (str | None) – A valid MongoDB connection URI.
database_name (str | None) – The name of the database to connect to.
collection_name (str | None) – The name of the collection to connect to.
collection (Collection | None) – A Collection that will represent a Knowledge Graph. One may pass a Collection in lieu of connection_string, database_name, and collection_name.
entity_extraction_model (BaseChatModel) – LLM for converting documents into Graph of Entities and Relationships.
entity_prompt (ChatPromptTemplate | None) – Prompt used to fill the graph store with entities following the schema. Defaults to .prompts.ENTITY_EXTRACTION_INSTRUCTIONS
query_prompt (ChatPromptTemplate | None) – Prompt that extracts entities and relationships as search starting points. Defaults to .prompts.NAME_EXTRACTION_INSTRUCTIONS
max_depth (int) – Maximum recursion depth in graph traversal.
allowed_entity_types (List[str] | None) – If provided, constrains search to these types.
allowed_relationship_types (List[str] | None) – If provided, constrains search to these types.
entity_examples (str | None) – A string containing any number of additional examples to provide as context for entity extraction.
entity_name_examples (str) – A string appended to prompts.NAME_EXTRACTION_INSTRUCTIONS containing examples.
validate (bool) – If True, entity schema will be validated on every insert or update.
validation_action (str) – One of {“warn”, “error”}. If “warn”, the default, documents will be inserted but errors logged. If “error”, an exception will be raised if any document does not match the schema.
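As a minimal sketch of how these parameters fit together, the helper below constructs a store from a connection URI. The helper name `build_graph_store`, the URI, the database and collection names, and the allowed type lists are all illustrative assumptions; only the keyword arguments themselves come from the signature above. The chat model is passed in by the caller, so any BaseChatModel may be used.

```python
def build_graph_store(chat_model, uri="mongodb://localhost:27017"):
    """Return a MongoDBGraphStore configured with illustrative values."""
    # Imported lazily so that defining this helper does not require the package.
    from langchain_mongodb.graphrag.graph import MongoDBGraphStore

    return MongoDBGraphStore(
        connection_string=uri,
        database_name="graphrag_demo",  # hypothetical database name
        collection_name="entities",  # hypothetical collection name
        entity_extraction_model=chat_model,
        max_depth=2,  # same as the documented default
        allowed_entity_types=["Person", "Organization"],
        allowed_relationship_types=["employee", "colleague"],
        validate=True,  # validate the entity schema on every insert or update
        validation_action="error",  # raise instead of logging a warning
    )
```

Passing an existing `collection` instead of the three connection arguments is the documented alternative when a client is already open.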
- add_documents(documents: Document | List[Document]) List[BulkWriteResult]
Extract entities and upsert into the collection.
Each entity is represented by a single MongoDB Document. Existing entities identified in documents will be updated.
- Parameters:
documents (Document | List[Document]) – A Document, or a list of Documents, containing text and associated metadata.
- Returns:
List containing metadata on entities inserted and updated, one value for each input document.
- Return type:
List[BulkWriteResult]
- chat_response(query: str, chat_model: BaseChatModel | None = None, prompt: ChatPromptTemplate | None = None) BaseMessage
Responds to a query given information found in Knowledge Graph.
- Parameters:
query (str) – Prompt before it is augmented by Knowledge Graph.
chat_model (BaseChatModel | None) – ChatBot. Defaults to entity_extraction_model.
prompt (ChatPromptTemplate | None) – Alternative Prompt Template. Defaults to prompts.rag_prompt.
- Returns:
Response Message. response.content contains text.
- Return type:
BaseMessage
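A typical flow is to ingest documents and then ask a question. The sketch below assumes `store` behaves like a MongoDBGraphStore; the helper name `ingest_and_ask` and the shape of its return value are hypothetical. It relies on the `upserted_count` and `modified_count` attributes of pymongo's BulkWriteResult to summarize what add_documents changed.

```python
def ingest_and_ask(store, documents, question):
    """Upsert entities from documents, then answer a question from the graph."""
    # add_documents returns one BulkWriteResult per input document.
    results = store.add_documents(documents)
    upserted = sum(result.upserted_count for result in results)
    modified = sum(result.modified_count for result in results)

    # chat_response returns a message; the answer text is in its content.
    response = store.chat_response(question)
    return {"upserted": upserted, "modified": modified, "answer": response.content}
```

With a real store, `documents` would be langchain_core Document objects, for example `Document(page_content="Jane Smith works at MongoDB.")`.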
- extract_entities(raw_document: str, **kwargs: Any) List[Entity]
Extract entities and their relations using chosen prompt and LLM.
- Parameters:
raw_document (str) – A single text document as a string. Typically prose.
kwargs (Any)
- Returns:
List of Entity dictionaries.
- Return type:
List[Entity]
- extract_entity_names(raw_document: str, **kwargs: Any) List[str]
Extract entity names from a document for similarity_search.
The second entity extraction has a different form and purpose than the first as we are looking for starting points of our search and paths to follow. We aim to find source nodes, but no target nodes or edges.
- Parameters:
raw_document (str) – A single text document as a string. Typically prose.
kwargs (Any)
- Returns:
List of entity names / _ids.
- Return type:
List[str]
- find_entity_by_name(name: str) Entity | None
Utility to get Entity dict from Knowledge Graph / Collection.
- Parameters:
name (str) – _id string to look for.
- Returns:
The Entity dict whose _id matches name, or None if there is no match.
- Return type:
Optional[Entity]
- classmethod from_connection_string(connection_string: str, database_name: str, collection_name: str, entity_extraction_model: BaseChatModel, entity_prompt: ChatPromptTemplate = ChatPromptTemplate(input_variables=['allowed_entity_types', 'allowed_relationship_types', 'entity_examples', 'entity_schema', 'input_document'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['allowed_entity_types', 'allowed_relationship_types', 'entity_examples', 'entity_schema'], input_types={}, partial_variables={}, template='\n## Overview\nYou are a meticulous analyst tasked with extracting information from unstructured text\nto build a knowledge graph in a structured json format of entities (nodes) and their relationships (edges).\nThe graph will be stored in a MongoDB Collection and traversed using $graphLookup\nfrom starting points of entity nodes matching names found in a query, and follow their relationships.\n\nUse the following as guidelines.\n\n- Simplicity: The graph should have as few entities and relationship types as needed to convey the information in the input.\n- Consistency: Connections can only be made if entities and relationships use consistent naming.\n- Generality: The graph should be useful for describing the concepts in not just this document but other similar documents.\n- Accuracy: Do not add any information that is not explicitly mentioned in the text.\n\nINPUT: You will be provided a text document.\nOUTPUT:\n- You will produce valid json according the "Output Schema" section below.\n- Your response **must be** a **valid JSON document** with NO extra text, explanations, or markdown formatting.\n- The extracted entities and relationships **MUST STRICTLY CONFORM** to the constraints outlined below.\n- Any entities or relationships not matching the allowed types must be **EXCLUDED**.\n\n\n## Entities\nAn entity in a knowledge graph is a uniquely identifiable object or concept\n(such as a person, 
organization, location, object, or event),\nrepresented as a node with attributes (properties) and relationships to other entities.\n\nUse the reserved field name `_id` for the name. It will be a unique primary key,\nand MongoDB automatically creates an index for the `_id` field.\n\nMaintain Entity Consistency when extracting entities. If an entity, such as "John Doe",\nis mentioned multiple times in the text but is referred to by different names or pronouns (e.g., "John", "Mr Doe", "he"),\nalways use the most complete identifier for that entity throughout the knowledge graph.\nIn this example, use "John Doe" as the entity `_id.`\n\n**Allowed Entity Types**:\n- Extract ONLY entities whose `type` matches one of the following: {allowed_entity_types}.\n- NOTE: If this list is empty, ANY `type` is permitted.\n\n### Examples of Exclusions:\n- If `allowed_entity_types` is `["Person", "Organization"]`, and the text mentions "Event" or "Location",\n these entities must **NOT** be included in the output.\n\n## Relationships\nRelationships represent edges in the knowledge graph. Relationships describe a specific edge type.\nRelationships MUST include a target entity, but Entities can be extracted that DO NOT have relationships!\nEnsure consistency and generality in relationship names when constructing knowledge schemas.\nInstead of using specific and momentary types such as \'worked_at\', use more general and timeless relationship types\nlike \'employee\'. Add details as attributes. Make sure to use general and timeless relationship types!\n\n**Allowed Relationship Types**:\n- Extract ONLY relationships whose `type` matches one of the following: {allowed_relationship_types}.\n- If this list is empty, ANY relationship type is permitted.\n- Map synonymous or related terms to the closest matching allowed type. 
For example:\n\t-\t“works for” or “employed by” → employee\n\t-\t“manages” or “supervises” → manager\n- If a relationship cannot be named with one of the allowed keys, **DO NOT include it**.\n- An entity need not have a relationships object if no relationship is found that matches the allowed relation types.\n\n### Examples of Exclusions:\n- If `allowed_relationship_types` is `["employs", "friend"]` and the text implies a "partner" relationship,\n the entities can be added, but the "partner" relationship must **NOT** be included.\n\n## Validation\nBefore producing the final output:\n1. Validate that all extracted entities have an `_id` and `type`.\n2. Validate that all `type` values are in {allowed_entity_types}.\n3. Validate that all relationships use keys in {allowed_relationship_types}.\n4. Exclude any entities or relationships failing validation.\n\n## Output Schema\nOutput a valid JSON document with a single top-level key, `entities`, as an array of objects.\nEach object must conform to the following schema:\n{entity_schema}\n\n{entity_examples}\n'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input_document'], input_types={}, partial_variables={}, template='{input_document}'), additional_kwargs={})]), query_prompt: ChatPromptTemplate = ChatPromptTemplate(input_variables=['allowed_entity_types', 'entity_name_examples', 'input_document'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['allowed_entity_types', 'entity_name_examples'], input_types={}, partial_variables={}, template='\nYou are a meticulous analyst tasked with extracting information from documents to form\nknowledge graphs of entities (nodes) and their relationships (edges).\n\nYou will be provided a short document (query) from which you infer the entity names.\nYou need not think about relationships between the entities. 
You only need names.\n\nProvide your response as a valid JSON Array of entity names\nor human-readable identifiers, found in the text.\n\n**Allowed Entity Types**:\n- Extract ONLY entities whose `type` matches one of the following: {allowed_entity_types}.\n- NOTE: If this list is empty, ANY `type` is permitted.\n\n### Examples of Exclusions:\n- If `allowed_entity_types` is `["Person", "Organization"]`, and the text mentions "Event" or "Location",\n these entities must **NOT** be included in the output.\n\n ## Examples:\n Example 1: `allowed_entity_types` is `[]`\n input: "John Doe works at ACME in New York"\n output: ["John Doe", "ACME", "New York"]\n\n In this example, you would identify 3 entities:\n John Doe of type person; ACME of type organization; New York of type place.\n\nExample 2: `allowed_entity_types` is `[organization, place]`\n input: "John Doe works at ACME in New York"\n output: ["ACME", "New York"]\n\n In this example, you would identify only 2 entities:\n ACME of type organization; New York of type place.\n John Doe, of type person, would be excluded.\n\n 2. input: "In what continent is Brazil?\n output: ["Brazil"]\n\nThis example is in the form of a question. There is one entity,\n\n3. 
input: "For legal and operational purposes, many governments and organizations adopt specific definitions."\n output: []\n\nIn the final example, there are no entities.\nThough there are concepts and nouns that might be types or attributes of entities,\nthere is nothing here that could be seen as being a unique identifier or name.\n\n### (Optional) Additional Examples\n\n{entity_name_examples}\n'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input_document'], input_types={}, partial_variables={}, template='{input_document}'), additional_kwargs={})]), max_depth: int = 2, allowed_entity_types: List[str] | None = None, allowed_relationship_types: List[str] | None = None, entity_examples: str | None = None, entity_name_examples: str = '', validate: bool = False, validation_action: str = 'warn') MongoDBGraphStore
Construct a MongoDB Knowledge Graph for RAG from a MongoDB connection URI.
- Parameters:
connection_string (str) – A valid MongoDB connection URI.
database_name (str) – The name of the database to connect to.
collection_name (str) – The name of the collection to connect to.
entity_extraction_model (BaseChatModel) – LLM for converting documents into Graph of Entities and Relationships.
entity_prompt (ChatPromptTemplate) – Prompt used to fill the graph store with entities following the schema.
query_prompt (ChatPromptTemplate) – Prompt that extracts entities and relationships as search starting points.
max_depth (int) – Maximum recursion depth in graph traversal.
allowed_entity_types (List[str] | None) – If provided, constrains search to these types.
allowed_relationship_types (List[str] | None) – If provided, constrains search to these types.
entity_examples (str | None) – A string containing any number of additional examples to provide as context for entity extraction.
entity_name_examples (str) – A string appended to prompts.NAME_EXTRACTION_INSTRUCTIONS containing examples.
validate (bool) – If True, entity schema will be validated on every insert or update.
validation_action (str) – One of {“warn”, “error”}. If “warn”, the default, documents will be inserted but errors logged. If “error”, an exception will be raised if any document does not match the schema.
- Returns:
A new MongoDBGraphStore instance.
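- Return type:
MongoDBGraphStore
- related_entities(starting_entities: List[str], max_depth: int | None = None) List[Entity]
Traverse Graph along relationship edges to find connected entities.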
- Parameters:
starting_entities (List[str]) – Traversal begins with documents whose _id fields match these strings.
max_depth (Optional[int]) – Recursion continues until no more matching documents are found, or until the operation reaches a recursion depth specified by this parameter.
- Returns:
List of connected entities.
- Return type:
List[Entity]
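The default entity prompt states that the graph is traversed using $graphLookup. A traversal of this kind might be expressed as the aggregation pipeline below. This is a sketch under assumptions, not the library's actual query: the helper name `graph_lookup_pipeline`, the edge field `relationships.target_ids`, and the output field names `connections` and `depth` are all hypothetical.

```python
def graph_lookup_pipeline(collection_name, starting_entities, max_depth):
    """Build an aggregation pipeline that walks edges from the given _ids."""
    return [
        # Traversal begins with documents whose _id matches a starting entity.
        {"$match": {"_id": {"$in": starting_entities}}},
        {
            "$graphLookup": {
                "from": collection_name,  # the graph lives in one collection
                "startWith": "$relationships.target_ids",  # assumed edge field
                "connectFromField": "relationships.target_ids",  # assumed edge field
                "connectToField": "_id",
                "as": "connections",  # hypothetical output field
                "maxDepth": max_depth,  # caps the recursion depth
                "depthField": "depth",  # hypothetical output field
            }
        },
    ]
```

Such a pipeline would be run with pymongo's `collection.aggregate(pipeline)`; recursion stops when no more matching documents are found or when `maxDepth` is reached.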
- similarity_search(input_document: str) List[Entity]
Retrieve list of connected Entities found via traversal of KnowledgeGraph.
Use LLM & Prompt to find entities within the input_document itself.
Find Entity Nodes that match those found in the input_document.
Traverse the graph using these as starting points.
- Parameters:
input_document (str) – String to find relevant documents for.
- Returns:
List of connected Entity dictionaries.
- Return type:
List[Entity]
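The three steps above can be sketched by composing two other documented methods. This illustrates the described flow and is not a copy of the library's code; the helper name `similarity_search_sketch` is hypothetical, and returning an empty list when no names are found is an assumption.

```python
def similarity_search_sketch(store, input_document):
    """Extract entity names from the text, then traverse the graph from them."""
    # Step 1: use the LLM and query prompt to find entity names in the input.
    names = store.extract_entity_names(input_document)
    if not names:
        return []  # assumption: nothing to traverse from
    # Steps 2 and 3: match those names to entity nodes and follow their edges.
    return store.related_entities(names)
```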