

GraphMind: A Full-Stack Agentic Architecture for Deep Retrieval and Multi-Hop Reasoning

AI大模型观察站

2025-12-28

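Overview: Complex questions call for reasoning across documents, across multiple entities, and over multiple steps. This article introduces GraphMind's full-stack agentic architecture and explains how it combines deep retrieval, knowledge-graph structure, and multi-hop reasoning to reach more accurate and explainable decisions, suited to complex question answering and enterprise-grade…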

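This post walks through a complete architecture intended for production use, one that relies on a GraphRAG pipeline to turn complex unstructured data into accurate, retrievable knowledge. It shows how Chonkie performs semantic chunking to preserve context, how Neo4j stores both vector and graph representations to enable dual retrieval, and how LiteLLM orchestrates the reasoning flow. It also explains how the system uses Intelligent Routing to switch dynamically between fast direct answers and deep multi-hop reasoning agents. By the end, you should understand how to build a robust agentic RAG system that combines vector search, graph traversal, and orchestrated LLM reasoning.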

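Architecture Overview

The architecture first ingests raw documents and processes them with Chonkie, a semantic chunking engine that respects sentence boundaries and logical flow, so that context survives the split. An LLM called through LiteLLM then extracts Atomic Facts and entities from these chunks and writes them to Neo4j. The result is a "Knowledge Graph" in which data points are no longer isolated vectors but interconnected nodes.

When a user submits a query, the Intelligent Router analyzes its complexity. Simple questions take the "Fast Path": a direct vector search that typically returns results within milliseconds. Complex questions trigger the "GraphReader Agent", which performs multi-hop reasoning: it traverses the graph, collects relevant facts, and synthesizes an in-depth answer. The system is therefore both efficient (simple queries are cheap) and reliable (complex queries get deep reasoning).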

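Why GraphRAG?

GraphRAG offers several advantages that fit enterprise-grade infrastructure.

Deep Reasoning: Standard RAG retrieves documents by surface similarity between the query and individual chunks. GraphRAG models relationships. It "knows" that "PD-1" is related to "Immunotherapy" even when the two never appear in the same sentence.
Cost Efficiency: With Intelligent Routing, simple questions do not burn expensive tokens. The heavyweight agentic reasoning runs only when it is actually needed.
Traceability: Every answer is built from a concrete path of facts. You can trace exactly which nodes the agent visited, which helps reduce hallucinations.
Robustness: The system includes a "Hybrid Fallback" mechanism. If the advanced agent fails, it degrades gracefully to standard vector retrieval so the user still gets an answer.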

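Tech Stack

We use a modern, modular stack to build this "Cognitive Graph". Each component was chosen for its specific strength in the pipeline.

Chonkie: a dedicated chunking library that splits long documents into meaningful "chunks" (rather than arbitrary cuts) and preserves semantic context.
PyMuPDF4LLM: robust PDF parsing that preserves layout and tables.
Neo4j: the brain. It stores both the text embeddings (Vector) and the relationships (Graph) in a single database, which enables hybrid retrieval.
LiteLLM: the reasoning engine. It provides a unified interface to multiple LLM providers and models (Ollama, Mistral, GPT-4) for agentic decisions.
LangChain & LangSmith: the orchestrator and the debugger. LangChain manages state and flow; LangSmith provides observability into the agent's "thought process".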

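Data Ingestion Deep Dive

Before we can answer questions, we need to "build the brain". The ingestion pipeline uses the GraphReader Extractor to turn raw text into a structured Knowledge Graph.

1. Extracting Atomic Facts

Rather than dumping whole blocks of text into the database, we use an LLM to distill "Atomic Facts": the smallest, indivisible units of information. This helps avoid the "lost in the middle" problem that is common with large text blocks.

Example input text: "Keytruda is an immunotherapy that targets PD-1 to treat lung cancer."

Extracted Atomic Facts (JSON):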


   
   
{
    "atomic_facts": [
        {
            "key_elements": ["Keytruda", "Immunotherapy"],
            "atomic_fact": "Keytruda is a type of immunotherapy."
        },
        {
            "key_elements": ["Keytruda", "PD-1", "Lung Cancer"],
            "atomic_fact": "Keytruda targets PD-1 to treat lung cancer."
        }
    ]
}
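
To make the mapping from this JSON to the graph concrete, here is a minimal sketch that flattens the extractor output into one row per (fact, entity) pair. The function name `facts_to_rows`, the `name` property on entities, and the `WRITE_FACTS` statement are assumptions for illustration; only the `FactNode`, `EntityNode`, and `HAS_ENTITY` names and the `fact` property come from the traversal query shown later in this post.

```python
import json


def facts_to_rows(payload: str) -> list[dict]:
    """Flatten extractor JSON into one row per (fact, entity) pair."""
    rows = []
    for item in json.loads(payload).get("atomic_facts", []):
        fact = item["atomic_fact"].strip()
        for entity in item.get("key_elements", []):
            rows.append({"fact": fact, "entity": entity.strip()})
    return rows


# Cypher that could consume the rows as a $rows parameter (not executed here).
# The `name` property on EntityNode is an assumption.
WRITE_FACTS = """
UNWIND $rows AS row
MERGE (f:FactNode {fact: row.fact})
MERGE (e:EntityNode {name: row.entity})
MERGE (f)-[:HAS_ENTITY]->(e)
"""

sample = """
{"atomic_facts": [
  {"key_elements": ["Keytruda", "Immunotherapy"],
   "atomic_fact": "Keytruda is a type of immunotherapy."},
  {"key_elements": ["Keytruda", "PD-1", "Lung Cancer"],
   "atomic_fact": "Keytruda targets PD-1 to treat lung cancer."}
]}
"""
rows = facts_to_rows(sample)
for row in rows:
    print(row["entity"], "<-", row["fact"])
```

Using MERGE rather than CREATE in such a statement would keep "Keytruda" as a single entity node even though it appears in both facts.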

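2. Entity Disambiguation (Deduplication)

In real data, "Google", "Google Inc.", and "Alphabet" may all refer to the same entity. We use spaCy and Fuzzy Search to identify and merge these duplicates.

Technique: Semantic Similarity combined with Levenshtein Distance
Threshold: a strict similarity threshold (for example, 0.95) to avoid false merges
Result: a cleaner graph in which every fact about "Google" points to a single node instead of being scattered across three.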
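
The string-matching half of this step can be sketched in plain Python. This is an illustration under assumptions, not the pipeline's actual resolver: the helper names are hypothetical, the similarity is edit distance normalized by the longer string, and the semantic-similarity signal (spaCy) is not modeled.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1], case- and whitespace-insensitive."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def resolve(names: list[str], threshold: float = 0.95) -> dict[str, str]:
    """Map each name to the first earlier name it matches (its canonical form)."""
    canonical: list[str] = []
    mapping: dict[str, str] = {}
    for name in names:
        match = next((c for c in canonical if similarity(name, c) >= threshold), None)
        if match is None:
            canonical.append(name)
            match = name
        mapping[name] = match
    return mapping


mapping = resolve(["Google", "google ", "Google Inc."])
print(mapping)
```

With a threshold of 0.95, edit distance alone merges only near-identical spellings. In this sketch "Google Inc." stays separate from "Google", and "Alphabet" would too, which is presumably why the pipeline pairs string matching with semantic similarity.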

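3. Indexing Strategy

We build not one index but two, to maximize recall:

Vector Index: 1024-dimensional embeddings generated via Mistral/Ollama, searched with cosine similarity to find semantically similar facts.
Keyword Index (Fulltext): a standard Lucene index for exact matches (such as drug names or identifiers), covering cases that vector search may miss.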


   
   
// Creating the Vector Index in Neo4j
CREATE VECTOR INDEX fact_embeddings IF NOT EXISTS
FOR (n:FactNode) ON (n.embedding)
OPTIONS {indexConfig: {
  `vector.dimensions`: 1024,
  `vector.similarity_function`: 'cosine'
}}
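
The post does not say how hits from the two indexes are combined. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than the author's method. The index name `fact_fulltext`, the function `rrf_merge`, and its parameters are hypothetical; the Cypher strings use Neo4j's `db.index.vector.queryNodes` and `db.index.fulltext.queryNodes` procedures and are shown only for context, not executed.

```python
# Cypher strings for context only. `fact_fulltext` is a hypothetical index name;
# the `fact` property is taken from the traversal query later in this post.
CREATE_FULLTEXT = (
    "CREATE FULLTEXT INDEX fact_fulltext IF NOT EXISTS "
    "FOR (n:FactNode) ON EACH [n.fact]"
)
VECTOR_QUERY = (
    "CALL db.index.vector.queryNodes('fact_embeddings', $k, $embedding) "
    "YIELD node, score RETURN node.fact AS fact, score"
)
FULLTEXT_QUERY = (
    "CALL db.index.fulltext.queryNodes('fact_fulltext', $text) "
    "YIELD node, score RETURN node.fact AS fact, score LIMIT $k"
)


def rrf_merge(vector_hits, keyword_hits, rrf_k=60, top_n=5):
    """Fuse two ranked lists of facts with reciprocal rank fusion."""
    scores = {}
    for hits in (vector_hits, keyword_hits):
        for rank, fact in enumerate(hits, start=1):
            # Each list contributes 1 / (rrf_k + rank) for every fact it contains
            scores[fact] = scores.get(fact, 0.0) + 1.0 / (rrf_k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


merged = rrf_merge(["A", "B", "C"], ["B", "D"])
print(merged)
```

RRF works on ranks rather than raw scores, which sidesteps the fact that cosine similarity and Lucene scores are not on the same scale.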

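Retrieval Deep Dive

1. GraphReader Retriever

This is the agentic core. Unlike an ordinary retriever that just runs a query and returns results, GraphReaderRetriever is a class that implements a full "Thinking Loop" (Discovery -> Traversal -> Evaluation).

2. Vector Retrieval with Scoring

We do not blindly take the top results. Instead, we inspect the similarity score to judge confidence.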


   
   
# Inside graph_reader_retriever.py
def initial_discovery(self, state):
    # ... setup retriever ...
    results = retriever.search(query_vector=question_embedding, top_k=5)

    # Extract scores to gauge relevance
    scores = [item.score for item in results.items]
    print(f"Vector hits: {len(results.items)} (Scores: {scores})")

    # Guard against an empty result set before calling max()
    if not scores or max(scores) < 0.7:
        print("Warning: Low confidence match. Expanding search...")
        # Trigger fallback or broader search

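Data Ingestion Pipeline

Before we can answer questions, we need to "build the brain". This ingestion pipeline converts raw text into a structured Knowledge Graph.

Workflow

The pipeline follows a strict order to ensure data quality:

Chunking: split documents into semantic chunks.
Extraction: use an LLM to identify entities (such as "PD-1") and Atomic Facts (such as "PD-1 inhibits T-cells").
Indexing: embed the facts as vectors for fast retrieval.
Resolution: merge duplicate entities with spaCy (so that "Google" and "Google Inc." converge to one node).

Code Snippet: Building the Pipeline

The following shows how the workflow is defined with LangGraph: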


   
   
from langgraph.graph import StateGraph, END

# GraphRAGState is the pipeline's state schema, defined elsewhere in the project.
def _build_workflow(self):
    """Build LangGraph workflow for ingestion."""
    workflow = StateGraph(GraphRAGState)

    # Add all nodes
    workflow.add_node("document_loader", self._load_documents)
    workflow.add_node("text_chunker", self._chunk_documents)
    workflow.add_node("entity_extractor", self._extract_graph)
    workflow.add_node("fact_extractor", self._extract_atomic_facts)
    workflow.add_node("fact_writer", self._write_atomic_facts)
    workflow.add_node("vector_indexer", self._create_vector_index)
    workflow.add_node("entity_resolver", self._resolve_entities)

    # Define the flow: a strictly linear chain from loading to entity resolution
    workflow.set_entry_point("document_loader")
    workflow.add_edge("document_loader", "text_chunker")
    workflow.add_edge("text_chunker", "entity_extractor")
    workflow.add_edge("entity_extractor", "fact_extractor")
    workflow.add_edge("fact_extractor", "fact_writer")
    workflow.add_edge("fact_writer", "vector_indexer")
    workflow.add_edge("vector_indexer", "entity_resolver")
    workflow.add_edge("entity_resolver", END)

    return workflow.compile()

Implementation

First, the project layout:


   
   
.
├── src
│   └── GraphRAG
│       └── Unstructured-kgpipeline-lexical
│           ├── langchain_graphrag_pipeline.py  # Core Pipeline
│           ├── graph_reader_retriever.py       # The Agentic Retriever
│           ├── retrieval_chain.py              # Routing Logic
│           ├── langchain_config.yaml           # Configuration
│           └── requirements.txt
├── .env
└── data
    └── medical_research.pdf

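Stage 1: Intelligent Router (retrieval_chain.py)

This is where it gets interesting. Rather than treating every query the same way, we classify the question with a lightweight LLM call. It acts as a "Traffic Cop", directing simple queries to the Fast Path and complex ones to the Agent.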


   
   
import json

def _classify_question_complexity(self, question: str) -> str:
    """Classify question complexity using LLM."""
    try:
        prompt = f"""Analyze this question and classify its complexity:
        Question: {question}
        Criteria:
        - SIMPLE: Single fact lookup, direct entity query (e.g., "What is X?")
        - COMPLEX: Multi-hop reasoning, comparative analysis (e.g., "How does X affect Y?")
        Respond with ONLY a JSON object: {{"complexity": "simple"}} or {{"complexity": "complex"}}"""
        response = self.llm.invoke(prompt, format="json")
        result = json.loads(response.content)
        complexity = str(result.get("complexity", "simple")).lower()
        # Accept only the two known labels
        return complexity if complexity in ("simple", "complex") else "simple"
    except Exception:
        # Any failure (LLM error, malformed JSON) defaults to the cheap path
        return "simple"
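
Models sometimes wrap their JSON in a code fence or return a label outside the expected set. The following is a minimal sketch of stricter parsing for the classifier's reply; the function name `parse_complexity` and the `VALID_LABELS` constant are hypothetical and not part of the project code.

```python
import json
import re

VALID_LABELS = {"simple", "complex"}


def parse_complexity(raw: str, default: str = "simple") -> str:
    """Extract the complexity label from an LLM reply, falling back to `default`."""
    # Grab the outermost {...} span so surrounding prose or code fences are ignored
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if not match:
        return default
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return default
    if not isinstance(data, dict):
        return default
    label = str(data.get("complexity", default)).strip().lower()
    return label if label in VALID_LABELS else default


print(parse_complexity('```json\n{"complexity": "COMPLEX"}\n```'))
```

Defaulting to "simple" mirrors the fallback in the classifier above. It keeps costs low, at the risk of under-serving a hard question whose classification failed.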

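Stage 2: Routing Logic (retrieval_chain.py)

Once the question is classified, the system routes it. Note the robust fallback: if the Agentic path fails (for example, a graph query raises an error), the system automatically falls back to the standard Hybrid Retriever.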


   
   
from typing import Optional

def answer_question(self, question: str, use_agentic: Optional[bool] = None) -> str:
    # 1. Determine complexity (an explicit use_agentic flag overrides the classifier)
    if use_agentic is None:
        complexity = self._classify_question_complexity(question)
    else:
        complexity = "complex" if use_agentic else "simple"

    # 2. Try GraphReader (primary)
    if self.agentic_enabled:
        try:
            return self.graph_reader_retriever.run_retrieval(question, complexity=complexity)
        except Exception as e:
            print(f"[INFO] GraphReader failed ({e}); falling back to standard hybrid retrieval")

    # 3. Fallback (secondary)
    return self.hybrid_retriever.answer_question(question)
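
The fallback behavior can be exercised in isolation with stand-in components. This is a small sketch under assumptions: `route`, `broken_agent`, and the lambdas are hypothetical stubs, not the project's classes.

```python
from typing import Callable


def route(question: str,
          classify: Callable[[str], str],
          agentic: Callable[[str, str], str],
          fallback: Callable[[str], str]) -> tuple[str, str]:
    """Return (path_taken, answer); an agentic failure never reaches the caller."""
    complexity = classify(question)
    try:
        return "graph_reader", agentic(question, complexity)
    except Exception:
        return "hybrid_fallback", fallback(question)


def broken_agent(question: str, complexity: str) -> str:
    # Simulates a failing graph query
    raise RuntimeError("graph query failed")


path, answer = route(
    "What is PD-1?",
    classify=lambda q: "simple",
    agentic=broken_agent,
    fallback=lambda q: "PD-1 is an immune checkpoint receptor.",
)
print(path, "->", answer)
```

Returning the path taken alongside the answer is a design choice of this sketch; it makes it easy to log how often the fallback fires.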

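Stage 3: Agentic Core (graph_reader_retriever.py)

This is the "brain" of the whole operation. The GraphReader is not just a function; it is a state machine with nodes for Discovery, Analysis, and Evaluation.

The Hop Analyzer (The Explorer): This node does the "thinking". It reviews the facts found so far and decides where to go next in the graph.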


   
   
def hop_analyzer(self, state: RetrievalState) -> RetrievalState:
    """Agentic Traversal: Performs undirected hop to find related facts."""
    # Cypher query: find facts that share an entity with any current fact,
    # skipping facts that are already in the working set
    query = """
        MATCH (f:FactNode)-[:HAS_ENTITY]->(e:EntityNode)<-[:HAS_ENTITY]-(neighbor:FactNode)
        WHERE f.fact IN $current_facts
          AND NOT neighbor.fact IN $current_facts
        RETURN DISTINCT neighbor.fact AS new_fact
    """
    # ... execute query and update state ...
    return state
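
To see what this traversal returns without a database, the same "facts that share an entity" hop can be sketched over an in-memory mapping. The function `hop` and the entity assignments in `graph` are illustrative assumptions; the fact texts are borrowed from the execution trace later in this post.

```python
def hop(current_facts: set[str], fact_entities: dict[str, set[str]]) -> set[str]:
    """Return facts that share at least one entity with any current fact."""
    frontier_entities: set[str] = set()
    for fact in current_facts:
        frontier_entities |= fact_entities.get(fact, set())
    return {
        fact
        for fact, entities in fact_entities.items()
        if fact not in current_facts and entities & frontier_entities
    }


# Hypothetical fact -> entities mapping standing in for HAS_ENTITY edges
graph = {
    "PD-1 is an immune checkpoint receptor.": {"PD-1"},
    "PD-1 acts as a brake on T-cells.": {"PD-1", "T-cells"},
    "Blocking PD-1 releases this brake.": {"PD-1"},
    "Immunotherapy uses checkpoint inhibitors.": {"Immunotherapy", "Checkpoint Inhibitors"},
}

new_facts = hop({"PD-1 is an immune checkpoint receptor."}, graph)
print(sorted(new_facts))
```

Starting from the single PD-1 fact, one hop reaches the two other facts that mention PD-1 and leaves the immunotherapy fact untouched, because it shares no entity in this toy mapping.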

The Evaluator (The Judge): Before answering, the agent pauses to check itself. Is there enough information? If not, it loops back and continues.


   
   
import json

def evaluate_answer(self, state: RetrievalState) -> dict:
    """Decides if answer is sufficient, needs deepdive, or needs more hops."""
    prompt = f"""
    Original Question: {state['original_question']}
    Current Facts: {json.dumps(state['notebook'])}

    Is the question answered?
    1. 'sufficient': Answer is complete.
    2. 'hop_more': Need more related facts.
    3. 'deepdive': Need a new line of questioning.
    """
    # ... invoke LLM and return decision ...
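
How the evaluator's decision could drive the loop is sketched below with stub components. This is an assumption-laden illustration, not the GraphReader state machine: `thinking_loop`, `max_hops`, and the stubs are hypothetical, and for brevity the sketch treats 'deepdive' the same as 'hop_more'.

```python
def thinking_loop(question, seed_facts, hop, evaluate, max_hops=3):
    """Alternate evaluation and hops until the answer suffices or the budget runs out."""
    notebook = list(seed_facts)
    hops_used = 0
    while hops_used < max_hops:
        decision = evaluate(question, notebook)
        if decision not in ("hop_more", "deepdive"):
            break  # 'sufficient' or an unrecognized label: stop
        new_facts = [f for f in hop(notebook) if f not in notebook]
        if not new_facts:
            break  # nothing new reachable in the graph
        notebook.extend(new_facts)
        hops_used += 1
    return notebook, hops_used


# Stubs: a linear chain of facts and an evaluator that is satisfied once "c" is found
chain = {"a": ["b"], "b": ["c"], "c": []}
stub_hop = lambda notebook: chain[notebook[-1]]
stub_evaluate = lambda question, notebook: "sufficient" if "c" in notebook else "hop_more"

facts, hops = thinking_loop("q", ["a"], stub_hop, stub_evaluate)
print(facts, hops)
```

The hop budget is the important design choice here: without it, an evaluator that keeps answering 'hop_more' would traverse the graph indefinitely and run up LLM costs.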

Results: Multi-Hop Reasoning in Action

Here is a real execution trace for a complex medical question: "How does immunotherapy target PD-1 to treat lung cancer?"

Step 1: Initial Discovery

The system finds its entry-point nodes.

Found: "PD-1 is an immune checkpoint receptor."
Found: "Immunotherapy uses checkpoint inhibitors."

Step 2: Hop (Graph Traversal)

The Hop Analyzer sees the link between "PD-1" and "T-cells" and decides to follow that path.

Hop: "PD-1 acts as a brake on T-cells."
Hop: "Blocking PD-1 releases this brake."

Step 3: Synthesis (Final Answer)

The Context Manager consolidates these facts into the final response.

Final Answer: Immunotherapy targets PD-1, a checkpoint protein on immune T-cells, to treat lung cancer by blocking its interaction with PD-L1 on cancer cells. Normally, this interaction acts as an "off switch" that prevents T-cells from attacking the cancer. By using checkpoint inhibitors (a form of immunotherapy), this signal is blocked, allowing the T-cells to recognize and destroy the lung cancer cells.

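Conclusion

This implementation shows how a GraphRAG pipeline can substantially improve the quality and reliability of enterprise AI systems. By combining Chonkie's semantic chunking, Neo4j's graph capabilities, and Agentic Routing, we built a stack that can "see, read, and reason" more precisely than a text-only system.

The end result is a robust workflow that treats "thinking" as a first-class citizen. The intelligent routing strategy gives each query an appropriate amount of compute: simple questions are answered quickly, and complex ones get deeper reasoning. The architecture suggests that the next evolution of RAG is not just better embeddings but a genuine "cognitive architecture".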
