Since Microsoft's GraphRAG paper reports only vaguely defined gains, I measured them myself: GraphRAG improved faithfulness but no other RAGAS metric, so the ROI of knowledge graphs may not justify the hype.
Compared to vector-based RAG, GraphRAG (with both graph creation and retrieval done in Neo4j via Cypher) improves faithfulness (a precision-like RAGAS metric, i.e., whether the answer accurately reflects the information in the RAG documents) but leaves the other RAGAS metrics unchanged. Given the performance overhead, it may not deliver enough ROI to justify the hype around its accuracy advantage.
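For reference, RAGAS computes faithfulness as the share of claims in the generated answer that can be inferred from the retrieved context:
faithfulness = (number of claims in the answer supported by the retrieved context) / (total number of claims in the answer)
A faithfulness of 0.54, for example, means roughly half of the generated claims were grounded in the retrieved context.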
Implications (see the list of potential biases in this analysis at the bottom of the article):
Improved accuracy: GraphRAG may suit domains that demand high precision, such as medical or legal applications.
Complex relationships: it may excel in scenarios involving intricate entity relationships, such as analyzing social networks or supply chains.
Trade-off: the faithfulness gain comes at the cost of greater complexity in building and maintaining the knowledge graph, so the hype may not be warranted.

Introduction:
This post is a follow-up to Part 1 of my GraphRAG analysis, which ran RAG over the transcript of the U.S. presidential debate between Biden and Trump (a document that, as of this blog post, is not in any model's training data), comparing Neo4j's vector store (a graph database) against a FAISS vector store (a non-graph database). That gave a clean database-to-database comparison. In this post (Part 2), the comparison pairs knowledge graph creation and Cypher-based retrieval in Neo4j against the same FAISS baseline, to evaluate how the two approaches score on RAGAS metrics over the same document.
The code walkthrough follows below; the notebook is hosted on my GitHub.
Setting Up the Environment
First, let's set up the environment and import the necessary libraries:
import warnings
warnings.filterwarnings('ignore')
import os
import asyncio
import nest_asyncio
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from dotenv import load_dotenv
from typing import List, Dict, Union
from langchain_openai import OpenAIEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Neo4jVector, FAISS
from langchain_core.retrievers import BaseRetriever
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.schema import Document
from neo4j import GraphDatabase
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_relevancy, context_recall
from datasets import Dataset
import random
import re
from tqdm.asyncio import tqdm
from concurrent.futures import ThreadPoolExecutor
# API keys
load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")
neo4j_url = os.getenv("NEO4J_URL")
neo4j_user = os.getenv("NEO4J_USER")
neo4j_password = os.getenv("NEO4J_PASSWORD")
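For load_dotenv() to pick these up, a .env file should sit next to the notebook. A sketch with placeholder values (the variable names must match the os.getenv calls above; the URL shown is the default for a local instance):
OPENAI_API_KEY=sk-...
NEO4J_URL=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password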
Setting Up the Neo4j Connection
To use Neo4j as our graph database, let's set up the connection and create some utility functions:
# Connection strings
driver = GraphDatabase.driver(neo4j_url, auth=(neo4j_user, neo4j_password))

# Function to clear the Neo4j instance
def clear_neo4j_data(tx):
    tx.run("MATCH (n) DETACH DELETE n")

# Ensure vector index exists in Neo4j
def ensure_vector_index(recreate=False):
    with driver.session() as session:
        result = session.run("""
            SHOW INDEXES
            YIELD name, labelsOrTypes, properties
            WHERE name = 'entity_index'
              AND labelsOrTypes = ['Entity']
              AND properties = ['embedding']
            RETURN count(*) > 0 AS exists
        """).single()
        index_exists = result['exists'] if result else False
        if index_exists and recreate:
            session.run("DROP INDEX entity_index")
            print("Existing vector index 'entity_index' dropped.")
            index_exists = False
        if not index_exists:
            session.run("""
                CALL db.index.vector.createNodeIndex(
                    'entity_index',
                    'Entity',
                    'embedding',
                    1536,
                    'cosine'
                )
            """)
            print("Vector index 'entity_index' created successfully.")
        else:
            print("Vector index 'entity_index' already exists. Skipping creation.")

# Add embeddings to entities in Neo4j (up to 100 entities per call)
def add_embeddings_to_entities(tx, embeddings):
    # Match each entity by name so every node receives the embedding of
    # its own name (rather than one embedding overwriting a whole batch)
    query = """
    MATCH (e:Entity {name: $name})
    WHERE e.embedding IS NULL
    SET e.embedding = $embedding
    """
    entities = tx.run("MATCH (e:Entity) WHERE e.embedding IS NULL RETURN e.name AS name LIMIT 100").data()
    for entity in tqdm(entities, desc="Adding embeddings"):
        embedding = embeddings.embed_query(entity['name'])
        tx.run(query, name=entity['name'], embedding=embedding)
These functions help us manage the Neo4j database, ensuring a clean slate on each run and that our vector index is set up correctly.
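In the notebook these helpers run before ingestion; a minimal usage sketch consistent with the definitions above:
# Wipe data from any previous run, then (re)create the vector index
with driver.session() as session:
    session.execute_write(clear_neo4j_data)
ensure_vector_index(recreate=True)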
Data Processing and Graph Creation
Now, let's load the data and create the knowledge graph:
# Load and process the PDF
pdf_path = "debate_transcript.pdf"
loader = PyPDFLoader(pdf_path)
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

# Function to create graph structure
def create_graph_structure(tx, texts):
    llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
    for text in tqdm(texts, desc="Creating graph structure"):
        prompt = ChatPromptTemplate.from_template(
            "Given the following text, identify key entities and their relationships. "
            "Format the output as a list of tuples, each on a new line: (entity1, relationship, entity2)\n\n"
            "Text: {text}\n\n"
            "Entities and Relationships:"
        )
        response = llm(prompt.format_messages(text=text.page_content))
        # Process the response and create nodes and relationships
        lines = response.content.strip().split('\n')
        for line in lines:
            if line.startswith('(') and line.endswith(')'):
                parts = line[1:-1].split(',')
                if len(parts) == 3:
                    entity1, relationship, entity2 = [part.strip() for part in parts]
                    # Create nodes and relationship
                    query = (
                        "MERGE (e1:Entity {name: $entity1}) "
                        "MERGE (e2:Entity {name: $entity2}) "
                        "MERGE (e1)-[:RELATED {type: $relationship}]->(e2)"
                    )
                    tx.run(query, entity1=entity1, entity2=entity2, relationship=relationship)
This approach uses GPT-3.5-Turbo to extract entities and relationships from the text, building a dynamic knowledge graph from the document's content.
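To make the parsing contract concrete, here is a hypothetical example of the kind of output the prompt asks for and how the loop above interprets it (the sample lines are invented for illustration):
sample_response = """(Biden, debated, Trump)
(CNN, hosted, the debate)
Commentary lines like this one are skipped."""
for line in sample_response.strip().split('\n'):
    if line.startswith('(') and line.endswith(')'):
        parts = line[1:-1].split(',')
        if len(parts) == 3:
            entity1, relationship, entity2 = [p.strip() for p in parts]
            print(f"{entity1} -[:RELATED {{type: {relationship}}}]-> {entity2}")
# Biden -[:RELATED {type: debated}]-> Trump
# CNN -[:RELATED {type: hosted}]-> the debate
One caveat of this simple parser: tuples whose entity names themselves contain commas split into more than three parts and are silently dropped.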
Setting Up the Retrievers
We'll set up two types of retrievers: one using FAISS for vector-based retrieval and one using Neo4j for graph-based retrieval.
# Embeddings model
embeddings = OpenAIEmbeddings()

# Create FAISS retriever
faiss_vector_store = FAISS.from_documents(texts, embeddings)
faiss_retriever = faiss_vector_store.as_retriever(search_kwargs={"k": 2})

# Neo4j retriever
def create_neo4j_retriever():
    # Clear existing data
    with driver.session() as session:
        session.run("MATCH (n) DETACH DELETE n")  # equivalent to the clear_neo4j_data function defined earlier
    # Create graph structure
    with driver.session() as session:
        session.execute_write(create_graph_structure, texts)
    # Add embeddings to entities
    with driver.session() as session:
        max_attempts = 10
        attempt = 0
        while attempt < max_attempts:
            count = session.execute_read(lambda tx: tx.run("MATCH (e:Entity) WHERE e.embedding IS NULL RETURN COUNT(e) AS count").single()['count'])
            if count == 0:
                break
            session.execute_write(add_embeddings_to_entities, embeddings)
            attempt += 1
        if attempt == max_attempts:
            print("Warning: Not all entities have embeddings after maximum attempts.")
    # Create Neo4j retriever
    neo4j_vector_store = Neo4jVector.from_existing_index(
        embeddings,
        url=neo4j_url,
        username=neo4j_user,
        password=neo4j_password,
        index_name="entity_index",
        node_label="Entity",
        text_node_property="name",
        embedding_node_property="embedding"
    )
    return neo4j_vector_store.as_retriever(search_kwargs={"k": 2})

# Cypher-based retriever
def cypher_retriever(search_term: str) -> List[Document]:
    with driver.session() as session:
        result = session.run(
            """
            MATCH (e:Entity)
            WHERE e.name CONTAINS $search_term
            RETURN e.name AS name,
                   [(e)-[r:RELATED]->(related) | related.name + ' (' + r.type + ')'] AS related
            LIMIT 2
            """,
            search_term=search_term
        )
        documents = []
        for record in result:
            content = f"Entity: {record['name']}\nRelated: {', '.join(record['related'])}"
            documents.append(Document(page_content=content))
        return documents
The FAISS retriever uses vector similarity to find relevant information, while the Cypher-based Neo4j retriever leverages the graph structure to find related entities and their relationships.
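As a quick sanity check, the two retrieval styles can be exercised side by side (the queries here are illustrative, not from the notebook):
# Vector retrieval: top-2 chunks by embedding similarity
faiss_docs = faiss_retriever.get_relevant_documents("What did the candidates say about the economy?")
# Graph retrieval: entities whose names contain the search term, plus their relationships
graph_docs = cypher_retriever("economy")
for doc in faiss_docs + graph_docs:
    print(doc.page_content[:120])
Note that CONTAINS matching in the Cypher retriever is case-sensitive and keyword-oriented, so it behaves very differently from embedding similarity when handed a full natural-language question.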
Creating the RAG Chains
Now, let's create our RAG chains:
def create_rag_chain(retriever):
    llm = ChatOpenAI(model_name="gpt-3.5-turbo")
    template = """Answer the question based on the following context:
{context}
Question: {question}
Answer:"""
    prompt = PromptTemplate.from_template(template)
    if callable(retriever):
        # For the Cypher retriever (a plain function taking the question string)
        retriever_func = lambda q: retriever(q)
    else:
        # For the FAISS/Neo4j vector retrievers (Runnable objects)
        retriever_func = retriever
    return (
        {"context": retriever_func, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )

# Create RAG chains
faiss_rag_chain = create_rag_chain(faiss_retriever)
cypher_rag_chain = create_rag_chain(cypher_retriever)
These chains pair each retriever with the language model to generate answers grounded in the retrieved context.
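Since both chains end in StrOutputParser, they are invoked identically (the question below is a hypothetical example):
question = "Who were the moderators of the debate?"
print(faiss_rag_chain.invoke(question))   # context from vector similarity over chunks
print(cypher_rag_chain.invoke(question))  # context from entity/relationship matches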
Evaluation Setup
To evaluate our RAG systems, we'll create a ground truth dataset and use the RAGAS framework:
def create_ground_truth(texts: List[Union[str, Document]], num_questions: int = 100) -> List[Dict]:
    llm_ground_truth = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.2)

    def get_text(item):
        return item.page_content if isinstance(item, Document) else item

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    all_splits = text_splitter.split_text(' '.join(get_text(doc) for doc in texts))
    ground_truth = []
    question_prompt = ChatPromptTemplate.from_template(
        "Given the following text, generate {num_questions} diverse and specific questions that can be answered based on the information in the text. "
        "Provide the questions as a numbered list.\n\nText: {text}\n\nQuestions:"
    )
    all_questions = []
    for split in tqdm(all_splits, desc="Generating questions"):
        response = llm_ground_truth(question_prompt.format_messages(num_questions=3, text=split))
        questions = response.content.strip().split('\n')
        all_questions.extend([q.split('. ', 1)[1] if '. ' in q else q for q in questions])
    random.shuffle(all_questions)
    selected_questions = all_questions[:num_questions]
    llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
    for question in tqdm(selected_questions, desc="Generating ground truth"):
        answer_prompt = ChatPromptTemplate.from_template(
            "Given the following question, provide a concise and accurate answer based on the information available. "
            "If the answer is not directly available, respond with 'Information not available in the given context.'\n\nQuestion: {question}\n\nAnswer:"
        )
        answer_response = llm(answer_prompt.format_messages(question=question))
        answer = answer_response.content.strip()
        context_prompt = ChatPromptTemplate.from_template(
            "Given the following question and answer, provide a brief, relevant context that supports this answer. "
            "If no relevant context is available, respond with 'No relevant context available.'\n\n"
            "Question: {question}\nAnswer: {answer}\n\nRelevant context:"
        )
        context_response = llm(context_prompt.format_messages(question=question, answer=answer))
        context = context_response.content.strip()
        ground_truth.append({
            "question": question,
            "answer": answer,
            "context": context,
        })
    return ground_truth
async def evaluate_rag_async(rag_chain, ground_truth, name):
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    generated_answers = []
    for item in tqdm(ground_truth, desc=f"Evaluating {name}"):
        question = splitter.split_text(item["question"])[0]
        try:
            answer = await rag_chain.ainvoke(question)
        except AttributeError:
            answer = rag_chain.invoke(question)
        truncated_answer = splitter.split_text(str(answer))[0]
        truncated_context = splitter.split_text(item["context"])[0]
        truncated_ground_truth = splitter.split_text(item["answer"])[0]
        generated_answers.append({
            "question": question,
            "answer": truncated_answer,
            "contexts": [truncated_context],
            "ground_truth": truncated_ground_truth
        })
    dataset = Dataset.from_pandas(pd.DataFrame(generated_answers))
    result = evaluate(
        dataset,
        metrics=[
            context_relevancy,
            faithfulness,
            answer_relevancy,
            context_recall,
        ]
    )
    return {name: result}

async def run_evaluations(rag_chains, ground_truth):
    results = {}
    for name, chain in rag_chains.items():
        result = await evaluate_rag_async(chain, ground_truth, name)
        results.update(result)
    return results

# Main execution function
async def main():
    # Ensure vector index
    ensure_vector_index(recreate=True)
    # Create retrievers
    neo4j_retriever = create_neo4j_retriever()
    # Create RAG chains
    faiss_rag_chain = create_rag_chain(faiss_retriever)
    neo4j_rag_chain = create_rag_chain(neo4j_retriever)
    # Generate ground truth
    ground_truth = create_ground_truth(texts)
    # Run evaluations
    rag_chains = {
        "FAISS": faiss_rag_chain,
        "Neo4j": neo4j_rag_chain
    }
    results = await run_evaluations(rag_chains, ground_truth)
    return results

# Run the main function
if __name__ == "__main__":
    nest_asyncio.apply()
    try:
        results = asyncio.run(asyncio.wait_for(main(), timeout=7200))  # 2 hour timeout
        plot_results(results)
        # Print detailed results
        for name, result in results.items():
            print(f"Results for {name}:")
            print(result)
            print()
    except asyncio.TimeoutError:
        print("Evaluation timed out after 2 hours.")
    finally:
        # Close the Neo4j driver
        driver.close()
This setup creates a ground truth dataset, evaluates our RAG chains with RAGAS metrics, and visualizes the results.
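One gap in the listing: the main block calls plot_results, which isn't defined above and would need to exist before the main block runs. A minimal sketch, assuming each RAGAS result behaves like a dict of metric scores (as in the older ragas releases these imports come from):
def plot_results(results):
    # One bar group per RAGAS metric, one bar per RAG system
    df = pd.DataFrame({name: dict(result) for name, result in results.items()})
    df.plot(kind="bar", figsize=(10, 6))
    plt.title("RAGAS metrics: FAISS vs. Neo4j GraphRAG")
    plt.ylabel("Score")
    plt.ylim(0, 1)
    plt.tight_layout()
    plt.show()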

Results and Analysis
The analysis revealed surprisingly similar performance between GraphRAG and vector-based RAG on most metrics, with one key difference:
Faithfulness:
Neo4j GraphRAG significantly outperformed FAISS (0.54 vs. 0.18).
The graph-based approach excels at faithfulness, likely because it preserves the relational context of the information. When retrieving, it can follow explicit relationships between entities, keeping the retrieved context more closely aligned with how the information was originally structured in the document.
Implications and Use Cases
While the overall similarity in performance suggests that for many applications the choice between graph-based and vector-based RAG may not significantly affect results, GraphRAG's faithfulness advantage could be decisive in some specific situations:
Faithfulness-critical applications: in domains where preserving exact relationships and context is essential (e.g., legal or medical), GraphRAG can offer significant benefits.
Complex relational queries: for scenarios involving intricate connections between entities (e.g., investigating financial networks or analyzing social relationships), GraphRAG's ability to traverse relationships could be advantageous.
Maintenance and updates: vector-based systems like FAISS may be easier to maintain and update, especially for frequently changing datasets.
Computational resources: the similar performance on most metrics suggests that the added complexity of setting up and maintaining a graph database may not always be justified, depending on the specific use case and available resources.
A Note on Potential Biases:
Knowledge graph creation: the graph structure was created with GPT-3.5-Turbo, which may introduce its own biases or inconsistencies in how entities and relationships are extracted.
Retrieval methods: the FAISS retriever uses vector similarity search while the Neo4j retriever uses Cypher queries. These fundamentally different approaches may favor certain types of queries or information structures, but that difference is exactly what is being evaluated.
Context window limitations: both approaches use a fixed context window size, which may not capture the full complexity of the knowledge graph structure where more would be needed.
Dataset specificity: as with any analysis of AI tools, results are specific to the data used; this analysis was performed on a single document (a debate transcript), which may not represent all potential use cases.

