Tutorial: Building a Podcast Knowledge Base with the Gemini File Search Tool


索引目录

2025-11-19


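A new File Search tool was recently launched: a fully managed RAG system built directly into the Gemini API that abstracts away the retrieval pipeline so you can focus on building. For full details, see the blog post, or read on for Mark McDonald's tutorial.

Imagine effortlessly recalling a specific detail from your favorite podcast, or getting a quick recap when you return to a long-running series. An AI conversation that has the podcast's full context makes this easy.

In this tutorial, we build a tool for exactly this scenario. We create a Python application that takes a podcast RSS feed, transcribes each episode, and indexes the transcripts with the File Search tool. We can then ask questions in natural language and get answers grounded in the podcast's actual content, with citations that point to specific episodes.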

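Solution overview

The application has two parts:

Ingestion (ingest.py):
Downloads the episodes, transcribes them, and uploads the transcripts to a File Search store.
Query (query.py):
Takes a user question, searches the File Search store, and uses Gemini to generate an answer.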
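The first thing the ingestion script has to do is turn an RSS feed into a list of episodes. The sketch below shows one way to do that with only the Python standard library. The `Episode` class, the `parse_feed` function, and the sample feed are hypothetical names and data introduced for illustration; the tutorial's own snippets read fields such as `ep.title` and `ep.tags`, so the real ingest.py may well rely on a feed-parsing library instead.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Episode:
    title: str
    audio_url: Optional[str]


def parse_feed(rss_xml: str, limit: Optional[int] = None) -> List[Episode]:
    """Return episodes in feed order, keeping at most `limit` of them."""
    channel = ET.fromstring(rss_xml).find("channel")
    if channel is None:
        return []
    episodes = []
    for item in channel.findall("item"):
        # The audio file of an RSS item is carried by its <enclosure> element.
        enclosure = item.find("enclosure")
        episodes.append(Episode(
            title=(item.findtext("title") or "").strip(),
            audio_url=enclosure.get("url") if enclosure is not None else None,
        ))
    return episodes[:limit] if limit is not None else episodes


# Made-up feed used only to exercise the function.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example Podcast</title>
  <item><title>Episode 2</title>
    <enclosure url="https://example.com/ep2.mp3" type="audio/mpeg"/></item>
  <item><title>Episode 1</title>
    <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg"/></item>
</channel></rss>"""

episodes = parse_feed(SAMPLE_FEED, limit=1)
```

Feeds usually list the newest episode first, so taking the first `limit` items is a reasonable approximation of "the most recent episodes", though that ordering is a convention rather than a guarantee.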

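Step 1: Create a File Search store

A File Search store is a container that scopes your documents. In this example we use a single store for all podcasts so that we can search across all of them at once.

First, install the Python SDK (the google-genai package), then import it and create a client.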

from google import genai
from google.genai import types

client = genai.Client()

To create a new store, we use client.file_search_stores.create. We pass the optional display name to identify our podcast index.

store = client.file_search_stores.create(
    config={'display_name': 'My Podcast Store'}
)
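Because the ingestion script is likely to be run more than once, it can help to reuse an existing store rather than create a new one on every run. The following is a minimal sketch of that idea, not the tutorial's own code: `get_or_create_store` is a hypothetical helper, and it assumes that `client.file_search_stores.list()` yields store objects exposing a `display_name` attribute.

```python
def get_or_create_store(client, display_name: str):
    """Reuse the first store with a matching display name, else create one."""
    # Assumption: list() iterates over all File Search stores in the project.
    for store in client.file_search_stores.list():
        if store.display_name == display_name:
            return store
    return client.file_search_stores.create(
        config={'display_name': display_name}
    )


# Usage (requires a configured client):
#   store = get_or_create_store(genai.Client(), 'My Podcast Store')
```

Display names are not guaranteed to be unique, so this picks the first match; a stricter version could persist the store's resource name instead.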

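Step 2: Transcribe the episodes

To index the content, we need to convert the audio to text. We download the audio files and transcribe them with the Gemini 2.5 Flash-Lite model. We chose Flash-Lite because it is fast and cost-effective, which suits this task well.

In ingest.py, the transcribe_audio function handles this step. You can add any prompt instructions for Gemini to help control the quality of the transcript, such as skipping the intro or labeling speakers.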

response = client.models.generate_content(
    model='gemini-2.5-flash-lite',
    contents=[
        types.Part.from_uri(
            file_uri=audio_file.uri,
            mime_type=audio_file.mime_type
        ),
        "Transcribe this audio. Output only the transcription. Label the speakers. Do not include any obvious ad-reads or promotional segments in the transcription (if unsure, leave them in)."
    ]
)

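Step 3: Upload the transcripts with metadata

Once we have the transcript, we can upload it to our store. A powerful feature of the File Search tool is that you can attach custom metadata, which can be used to filter at generation time and restrict the sources to a specific podcast or date range.

To upload a file, we use client.file_search_stores.upload_to_file_search_store. It uploads the file content and attaches the custom metadata in a single call.

Here is an example from ingest.py that prepares the metadata and uploads the file. The full code adds a number of other fields.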

metadata = [
    {'key': 'title', 'string_value': ep.title},
    {'key': 'podcast', 'string_value': feed_info.title},
]

# Bring any tags from the feed itself
if 'tags' in ep:
    for tag in ep.tags:
        metadata.append({'key': 'tag', 'string_value': tag.term})

op = client.file_search_stores.upload_to_file_search_store(
    file_search_store_name=store_name,
    file=transcript_filename,
    config={
        'custom_metadata': metadata,
        'display_name': ep.title
    }
)
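The upload call above returns an operation object (`op`), and the snippet does not show what happens to it. Assuming it behaves like other long-running operations in the SDK, with a `done` flag that is refreshed through `client.operations.get`, a script would typically wait for indexing to finish before querying. The helper below is a hypothetical sketch of that polling loop; `wait_for_operation`, `poll_seconds`, and `timeout` are names chosen for illustration.

```python
import time


def wait_for_operation(client, op, poll_seconds: float = 5.0, timeout: float = 600.0):
    """Poll a long-running operation until it is done or the timeout passes."""
    deadline = time.monotonic() + timeout
    while not op.done:
        if time.monotonic() >= deadline:
            raise TimeoutError("upload did not finish within the timeout")
        time.sleep(poll_seconds)
        # Assumption: operations.get returns a refreshed copy of the operation.
        op = client.operations.get(op)
    return op


# Usage (after the upload call):
#   op = wait_for_operation(client, op)
```

The timeout keeps a stuck upload from blocking the whole ingestion run; the default values here are arbitrary.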

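Step 4: Query the store

Now for the fun part: asking questions!

To enable File Search in a generation request, we pass a FileSearch tool that defines which File Search stores to search, along with any filters we need.

From query.py: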

# Default to no filter so metadata_filter is always defined
metadata_filter = None
if args.podcast:
    metadata_filter = f"podcast = {args.podcast}"

file_search = types.FileSearch(
    file_search_store_names=[store.name],
    metadata_filter=metadata_filter  # Optional filter
)
tool = types.Tool(file_search=file_search)

response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents=question,
    config=types.GenerateContentConfig(
        tools=[tool]
    )
)

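When we call client.models.generate_content with this tool, Gemini automatically searches our store for relevant information to answer the user's question.

Step 5: Display the results and citations

Gemini's response includes not only the answer but also citations that show exactly which parts of the uploaded files were used.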

print("\nAnswer:")
print(response.text)

print("\nCitations:")
for i, chunk in enumerate(response.candidates[0].grounding_metadata.grounding_chunks):
    if chunk.retrieved_context:
        title = chunk.retrieved_context.title or "Unknown Episode"
        print(f"\nCitation {i+1}:")
        print(f"Episode: {title}")
        print(f"Text: {chunk.retrieved_context.text}")
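The loop above assumes the response always carries grounding metadata. If no passage was retrieved, that field may be missing, so a more defensive variant can be useful. The sketch below is one possible approach: `collect_citations` is a hypothetical helper that reads the same attributes as the snippet above and returns an empty list instead of raising when any of them is absent.

```python
from typing import List


def collect_citations(response) -> List[dict]:
    """Return one {'episode', 'text'} dict per grounded chunk, or [] if none."""
    candidates = getattr(response, 'candidates', None) or []
    if not candidates:
        return []
    grounding = getattr(candidates[0], 'grounding_metadata', None)
    chunks = getattr(grounding, 'grounding_chunks', None) or []
    citations = []
    for chunk in chunks:
        context = getattr(chunk, 'retrieved_context', None)
        if context is None:
            continue  # Skip chunks that did not come from retrieved files.
        citations.append({
            'episode': context.title or 'Unknown Episode',
            'text': context.text,
        })
    return citations
```

Returning plain dictionaries keeps the printing logic separate, so the same data could also be rendered as JSON or HTML.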

This lets users verify the answer and explore the source material further.

Running the application

Ingest a podcast:

python ingest.py "https://feeds.example.com/podcast.rss" --limit 5

This downloads the 5 most recent episodes, transcribes them, and uploads them to the podcast store.

Ask a question:

python query.py "Why are red delicious apples so bad?" --podcast="..."

Gemini retrieves relevant passages from the indexed transcripts, passes them as input context for the query, and returns an answer with citations.
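The query command takes a positional question and an optional --podcast flag. A minimal sketch of how query.py might parse them with argparse follows; `parse_query_args` is a hypothetical function, and "Example Podcast" is a placeholder value, not the name used in the original command.

```python
import argparse
from typing import List, Optional


def parse_query_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the question and the optional podcast filter."""
    parser = argparse.ArgumentParser(
        description="Ask a question about indexed podcasts."
    )
    parser.add_argument("question", help="Natural-language question to answer")
    parser.add_argument(
        "--podcast",
        default=None,
        help="Only search episodes of this podcast",
    )
    return parser.parse_args(argv)


# Placeholder arguments; on the command line, argv defaults to sys.argv[1:].
args = parse_query_args(
    ["Why are red delicious apples so bad?", "--podcast=Example Podcast"]
)
```

Leaving `--podcast` at `None` when it is omitted lets the calling code decide whether to build a metadata filter at all.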

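What's next?

With the Gemini File Search API, we turned a collection of audio files into a rich, searchable knowledge base. We did not have to worry about chunking, embeddings, or setting up a vector database, because the API handles all of that automatically. With metadata added, we built a powerful search tool with very little code.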