

Accelerating Vector Search: hipVS and hipRAFT on AMD

AMD Developer Center

2025-12-02

Original authors: Sukriti Choudhary, Sujin Philip, Kevin Joseph, Fabricio Flores, Eliot Li, Lalith Narasimhan, Phani Vaddadi, Vish Vadlamani.

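In this post, you will get hands-on with hipVS, AMD's GPU-accelerated vector search library, and learn how it relates to hipRAFT, the foundational library for hipVS and other ROCm-DS projects. Using an interactive Jupyter Notebook, we demonstrate four mainstream vector search methods in hipVS: Brute-Force KNN, IVF-Flat, IVF-PQ, and CAGRA. Each makes a different trade-off among accuracy, performance, and GPU memory. You will learn how to build and query vector indexes with the hipVS API for semantic search, recommendation systems, and retrieval-augmented generation (RAG). Because the API is compatible with NVIDIA's cuVS, migrating to AMD GPUs with the same environment and dependencies usually requires only minor changes.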

PART 01

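What Is Vector Search?

Vector search retrieves items by similarity in a high-dimensional embedding space. Every item and every query is represented as a numeric vector, and measuring the distance or similarity between vectors surfaces semantic or contextual relationships. The approach is particularly well suited to unstructured data such as text, images, and audio, and is commonly used for semantic search, recommendation systems, and context retrieval for large language models (LLMs).

The accompanying figure shows an image-based vector search example: the query image (a tennis stroke) appears at the top, followed by the top five matches, all visually similar tennis scenes. The scatter plot below them projects the embedding vectors to 2D; light-blue points are all indexed vectors, and the query and its nearest neighbors are marked in red (the query is the larger, outlined point). The red points cluster near the query, which indicates that semantically similar frames lie close together in the embedding space.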
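To make ranking by similarity concrete, the following minimal sketch scores a few toy vectors against a query using cosine similarity. The vectors, the labels, and the `cosine_similarity` helper are illustrative assumptions and are not part of hipVS; real embeddings have hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(query, items):
    """Cosine similarity between one query vector and each row of items."""
    query = query / np.linalg.norm(query)
    items = items / np.linalg.norm(items, axis=1, keepdims=True)
    return items @ query

# Toy 3-dimensional "embeddings" (hypothetical values for illustration only).
labels = ["tennis serve", "tennis rally", "cooking pasta"]
items = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.3, 0.1],
    [0.0, 0.2, 0.9],
])
query = np.array([1.0, 0.2, 0.0])  # stands in for the embedding of a tennis image

scores = cosine_similarity(query, items)
ranking = np.argsort(-scores)  # highest similarity first
for i in ranking:
    print(f"{labels[i]}: {scores[i]:.3f}")
```

The two tennis vectors score far higher than the unrelated one, which is the same effect the scatter plot illustrates at a larger scale.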

PART 02

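What Is hipVS?

hipVS joins the AMD ROCm Data Science Toolkit (ROCm-DS) [1] and brings GPU-accelerated vector search to the ROCm ecosystem. It is a HIP port of NVIDIA's RAPIDS cuVS that provides high-throughput approximate nearest neighbor (ANN) indexing and search on AMD GPUs. It implements modern ANN methods such as IVF-Flat, IVF-PQ, and the graph-based CAGRA, and supports compressed or low-precision embeddings as well as efficient batched queries.

What Is hipRAFT, and How Does It Relate to hipVS?

hipRAFT is the HIP/ROCm port of RAPIDS RAFT. It provides reusable GPU building blocks and utilities (resource handles, streams, memory allocators, multi-GPU communication) for building high-performance data science algorithms. hipVS is built on hipRAFT: it relies on hipRAFT's resource handle to coordinate streams and memory pools, and uses its math, graph, and nearest-neighbor utilities and communication backends to support both single-GPU and multi-GPU execution on a single node. In short, hipRAFT supplies the infrastructure and general-purpose kernels, and hipVS builds efficient ANN indexes such as IVF-Flat, IVF-PQ, and CAGRA on top of it.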

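Key features of hipVS and hipRAFT:

API compatibility: Migrating from NVIDIA cuVS to hipVS, and from RAFT to hipRAFT, is nearly seamless, which makes it easy to bring workloads to AMD GPUs.

Multi-language APIs: hipRAFT provides native Python and C++ APIs; hipVS provides C, C++, Python, and Rust APIs to suit a variety of workloads.

Part of ROCm-DS: hipVS and hipRAFT are core members of the unified, open-source ROCm-DS components, which cover end-to-end GPU acceleration for data science, analytics, and AI.

Open-source collaboration: Both are released under the Apache-2.0 and MIT licenses, and community contributions are welcome.

GPU building blocks for AI and data science: For example, they underpin libraries such as hipGraph, accelerating scalable analytics and machine learning across domains.

For more information, see the hipVS documentation [2] and the hipRAFT documentation [3].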

PART 03

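hipVS Use Cases

hipVS can be used in many kinds of AI applications. Common ones include:

1. Generative AI and RAG

o Powers RAG with very fast vector search over massive embedding collections.

o Supports context retrieval, document similarity, and semantic search.

2. Recommendation systems

o Accelerates user-item similarity search and content retrieval.

o Supports personalization at scale, for example matching users in real time with the most relevant products or media in e-commerce, streaming, and advertising.

3. Computer vision and multimedia retrieval

o Accelerates similarity search over images, video, and audio using deep learning embeddings.

o Used for visual search, content deduplication, and media recommendation.

4. Scientific and industrial AI

o Supports large-scale clustering and similarity mapping in genomics, materials science, manufacturing quality inspection, and automation systems.

o Used for defect detection, pattern recognition, and model validation on high-dimensional data.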

PART 04

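Environment Setup

Next, we walk through a Jupyter Notebook that demonstrates the vector search methods available in the hipVS Python API: Brute-Force KNN, IVF-Flat, IVF-PQ, and CAGRA. Each makes different trade-offs among accuracy, speed, and GPU memory usage.

Requirements

AMD GPU: See the ROCm documentation [4] for supported hardware and operating systems.

ROCm 7.0.2: See the ROCm installation guide for Linux [5].

Docker: See the Docker installation guide for Ubuntu [6].

This post uses a custom Dockerfile [7] that contains the instructions needed to build the image, so you can run the examples without installing dependencies by hand. We recommend this container environment as the simplest and most reliable setup.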

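Setting Up the Jupyter Notebook Environment

Clone the repository and change into the blog directory:

git clone https://github.com/ROCm/rocm-blogs.git
cd rocm-blogs/blogs/software-tools-optimization/hipvs

Build and start the container. For build details, see hipvs/docker/Dockerfile [7].

cd src/docker
docker compose build
docker compose up

Open http://127.0.0.1:8888/lab in a browser and open /src/hipvs.ipynb.

Note: In the notebook, the package imports use the name "cuvs". This is because hipVS adopts the widely used cuVS API. This API compatibility lets existing cuVS workloads move to AMD devices easily and perform the same data processing tasks on the AMD ROCm platform.

With the environment ready, we can start exploring the vector search methods hipVS provides.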

PART 05

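Overview of Vector Search Indexes

A vector search index organizes embeddings for efficient similarity search. In ANN search, indexes use different techniques to trade off speed, memory, and accuracy. hipVS provides several index types, each suited to different scenarios. Common ones include:

Flat/Exact: Computes the distance between the query vector and every vector in the dataset to find the nearest neighbors. Simple to implement and exact, but slow on large datasets.

Inverted File (IVF): Partitions the data into clusters and searches only the clusters most relevant to the query. Faster than Flat, but may sacrifice some accuracy.

Product Quantization (PQ): Compresses vectors into smaller codes to speed up approximate distance computation. Uses less memory and is fast, but accuracy may drop.

Graph-based indexes: Build a similarity graph whose nodes are vectors and whose edges connect nearby vectors. Neighbor search proceeds by graph traversal and can combine high accuracy with high speed, but the graph structure takes additional memory. CAGRA is a representative method.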

PART 06

Encoder Model and Dataset

We use the simplewiki-2020-11-01 [8] dataset, which comes from Simple English Wikipedia and is well suited to NLP experiments. The data consists of cleaned Simple English articles stored in JSON Lines [9] format. Articles are split into paragraphs and encoded with a transformer encoder model.

A sample of the simplewiki dataset looks like this:

{'id': '9822', 'title': 'Ted Cassidy', 'paragraphs': ['Ted Cassidy (July 31, 1932 - January 16, 1979) was an American actor. He was best known for his roles as Lurch and Thing on "The Addams Family".']}
...
{'id': '9850', 'title': 'Crater', 'paragraphs': ["A crater is a round dent on a planet. They are usually shaped like a circle or an oval. They are usually made by something like a meteor hitting the surface of a planet. Underground activity such as volcanoes or explosions can also cause them but it's not as likely."]}
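The notebook's loading helpers are not shown in this post. As a rough illustration of how records of this shape could be flattened into [title, paragraph] pairs, here is a minimal sketch. The `read_passages` function and the toy records are assumptions made for this example, not the notebook's actual code; the data is compressed in memory so the snippet runs without downloading anything.

```python
import gzip
import io
import json

def read_passages(fileobj):
    """Flatten JSON Lines records into [title, paragraph] pairs."""
    passages = []
    for line in fileobj:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for paragraph in record["paragraphs"]:
            passages.append([record["title"], paragraph])
    return passages

# Toy records with the same keys as the simplewiki sample (hypothetical content).
records = [
    {"id": "1", "title": "Crater",
     "paragraphs": ["A crater is a round dent on a planet.",
                    "They are usually shaped like a circle or an oval."]},
    {"id": "2", "title": "Tide",
     "paragraphs": ["A tide is the periodic rising and falling of the ocean surface."]},
]
raw = "\n".join(json.dumps(r) for r in records).encode("utf-8")
compressed = gzip.compress(raw)

# gzip.open accepts a file object, so the same code works for a .jsonl.gz file on disk.
with gzip.open(io.BytesIO(compressed), mode="rt", encoding="utf-8") as f:
    passages = read_passages(f)

print(len(passages))
print(passages[0])
```

Each article contributes one passage per paragraph, so the number of passages is generally larger than the number of articles.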

Encoding (of both the dataset and the queries) uses nq-distilbert-base-v1 [10] from Hugging Face. This is a sentence-transformers model that maps sentences and paragraphs to 768-dimensional dense vectors and is suitable for tasks such as clustering and semantic search. The following code shows the first simplewiki entry and a slice of its encoding:

from sentence_transformers import SentenceTransformer
simplewiki_save_path = './data/simplewiki-2020-11-01.jsonl.gz'
simplewiki_url = 'http://sbert.net/datasets/simplewiki-2020-11-01.jsonl.gz'
# This is the encoder-transformer model from Hugging Face
model_name = 'sentence-transformers/nq-distilbert-base-v1'
encoder = SentenceTransformer(model_name)
# get_simplewiki_dataset and create_and_encode_passages are helpers that accompany the notebook (not shown here)
get_simplewiki_dataset(simplewiki_url, simplewiki_save_path)
passages, corpus_embeddings = create_and_encode_passages(simplewiki_save_path, encoder)
print(f'\nNumber of passages: {len(passages)}')
print(f'\nExample of passage:\n{passages[0]}')
print(f'\nExample of embedded passage:\n{corpus_embeddings[0][:10]}')

Running the code above produces output similar to:

Number of passages: 509663
Example of passage:
['Ted Cassidy', 'Ted Cassidy (July 31, 1932 - January 16, 1979) was an American actor. He was best known for his roles as Lurch and Thing on "The Addams Family".']
Example of embedded passage:
tensor([-0.7203,  0.7746, -0.8595, -0.3508,  0.6317,  0.0244, -0.6441,  0.9293,
        -0.6116, -0.3703], device='cuda:0')

With data exploration and encoding complete, we can start trying the different vector search methods through the hipVS Python API.

PART 07

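Vector Search with Brute-Force KNN

In hipVS, brute-force KNN [11] finds exact nearest neighbors by computing, on the GPU, the distance or similarity between each query Q and every indexed vector x. The computation essentially reduces to the dense matrix product QX^T. It suits cases that need high accuracy and whose dataset fits in GPU memory.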
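The link between brute-force search and the product QX^T can be seen in a small CPU sketch. The code below is an illustration written with NumPy, not the hipVS implementation: it uses the identity ||q - x||^2 = ||q||^2 + ||x||^2 - 2 q·x, so all pairwise squared Euclidean distances come from a single matrix multiply. The function name `brute_force_knn` and the random data are assumptions for this example.

```python
import numpy as np

def brute_force_knn(queries, dataset, k):
    """Exact k-NN under squared Euclidean distance using one matrix multiply."""
    q_norms = np.sum(queries ** 2, axis=1, keepdims=True)   # shape (m, 1)
    x_norms = np.sum(dataset ** 2, axis=1)                  # shape (n,)
    # ||q - x||^2 = ||q||^2 + ||x||^2 - 2 q.x, with q.x taken from Q @ X.T
    dists = q_norms + x_norms - 2.0 * (queries @ dataset.T)
    dists = np.maximum(dists, 0.0)  # clamp tiny negatives caused by rounding
    neighbors = np.argsort(dists, axis=1)[:, :k]
    distances = np.take_along_axis(dists, neighbors, axis=1)
    return distances, neighbors

rng = np.random.default_rng(0)
dataset = rng.standard_normal((1000, 64))
queries = dataset[:3] + 0.01  # queries that sit very close to rows 0, 1, and 2

distances, neighbors = brute_force_knn(queries, dataset, k=5)
print(neighbors[:, 0])  # nearest neighbor id for each query
```

Because every query is compared with every indexed vector, the cost grows linearly with the dataset size, which is why the approximate indexes in the following sections exist.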

First, instantiate the resources object:

from cuvs.common import Resources
# Resources is a lightweight Python wrapper around the corresponding
# C++ Resources class. It stores and manages the runtime state (e.g., memory
# allocations, device handles, and other context data) needed to ensure safe
# resource reuse across multiple hipVS calls.
# This instance is passed to each `algorithm.build` method.
resources = Resources()

Note: Resources is a lightweight Python wrapper around the hipRAFT C++ resources class and is described in the hipVS documentation [12].

Build the index:

from cuvs.neighbors import brute_force
bf_index = brute_force.build(corpus_embeddings, metric='sqeuclidean', resources=resources)
# This function is asynchronous, so we need to explicitly synchronize the GPU before we can measure the execution time
resources.sync()

Prepare the query vector:

query = "What is creating tides?"
question_embedding = encoder.encode(query, convert_to_tensor=True)

Run the search, returning the top 5 results, and time it:

%%time
top_k = 5
distances, neighbors = brute_force.search(bf_index, question_embedding[None], top_k)

Sample output:

CPU times: user 43.1 ms, sys: 27.8 ms, total: 70.9 ms
Wall time: 69 ms

Look at the five nearest results:

for k in range(top_k):
    print(f'Distance: {distances[0][k]}', f'Neighbor: {passages[neighbors[0][k]]}\n')

Sample output (excerpt):

Distance: 94.91021728515625 Neighbor: ['Tide', "A tide is the periodic rising and falling of Earth's ocean surface caused mainly by the gravitational pull of the Moon acting on the oceans. Tides cause changes in the depth of marine and estuarine (river mouth) waters. Tides also make oscillating currents known as tidal streams (~'rip tides'). This means that being able to predict the tide is important for coastal navigation. The strip of seashore that is under water at high tide and exposed at low tide, called the intertidal zone, is an important ecological product of ocean tides."]
Distance: 159.54246520996094 Neighbor: ['Tidal energy', "Many things affect tides. The pull of the Moon is the largest effect and most of the energy comes from the slowing of the Earth's spin."]
Distance: 159.74078369140625 Neighbor: ['Storm surge', 'A storm surge is a sudden rise of water hitting areas close to the coast. Storm surges are usually created by a hurricane or other tropical cyclone. The surge happens because a storm has fast winds and low atmospheric pressure. Water is pushed on shore and the water level rises. Strong storm surges can flood coastal towns and destroy homes. A storm surge is considered the deadliest part of a hurricane. They kill many people each year.']
Distance: 178.28079223632812 Neighbor: ['Sea', 'Wind blowing over the surface of a body of water forms waves. The friction between air and water caused by a gentle breeze on a pond causes ripples to form. A strong blow over the ocean causes larger waves as the moving air pushes against the raised ridges of water. The waves reach their greatest height when the rate at which they travel nearly matches the speed of the wind. The waves form at right angles to the direction from which the wind blows. In open water, if the wind continues to blow, as happens in the Roaring Forties in the southern hemisphere, long, organized masses of water called swell roll across the ocean. If the wind dies down, the wave formation is reduced but waves already formed continue to travel in their original direction until they meet land. Small waves form in small areas of water with islands and other landmasses but large waves form in open stretches of sea where the wind blows steadily and strongly. When waves meet other waves coming from different directions, interference between the two can produce broken, irregular seas.']
Distance: 181.4980010986328 Neighbor: ['Tidal force', 'Tidal force is caused by gravity and makes tides happen. This is because the gravitational field changes across the middle of a body (the diameter).']

PART 08

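Vector Search with IVF-Flat

hipVS IVF-Flat [13] is an ANN index based on an inverted file (IVF). It uses k-means to partition the vectors into n_lists clusters and records the list of vectors in each cluster. At query time, it probes only the n_probes clusters nearest to the query and runs an exact brute-force search within those clusters. Adjusting the number of clusters scanned trades speed against recall.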
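The cluster-then-probe idea can be sketched on the CPU in a few lines. This is a simplified illustration with NumPy and a plain Lloyd's k-means, not the hipVS implementation; the helper names `kmeans` and `ivf_flat_search`, the random data, and the parameter values are assumptions chosen to keep the example small.

```python
import numpy as np

def kmeans(data, n_lists, n_iters, rng):
    """Plain Lloyd's k-means; returns centroids and the cluster label of each row."""
    centroids = data[rng.choice(len(data), size=n_lists, replace=False)].copy()
    for _ in range(n_iters):
        d = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(n_lists):
            members = data[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    d = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d.argmin(axis=1)

def ivf_flat_search(query, data, centroids, lists, n_probes, k):
    """Probe the n_probes closest clusters, then search them exhaustively."""
    centroid_d = ((centroids - query) ** 2).sum(axis=1)
    probed = np.argsort(centroid_d)[:n_probes]
    candidates = np.concatenate([lists[c] for c in probed])
    d = ((data[candidates] - query) ** 2).sum(axis=1)
    order = np.argsort(d)[:k]
    return d[order], candidates[order]

rng = np.random.default_rng(0)
data = rng.standard_normal((2000, 16))
n_lists = 20
centroids, labels = kmeans(data, n_lists, n_iters=10, rng=rng)
# Inverted lists: for each cluster, the row ids of the vectors assigned to it.
lists = [np.flatnonzero(labels == c) for c in range(n_lists)]

query = rng.standard_normal(16)
exact_d, exact_i = ivf_flat_search(query, data, centroids, lists, n_probes=n_lists, k=5)
approx_d, approx_i = ivf_flat_search(query, data, centroids, lists, n_probes=3, k=5)
print("probing all lists:", exact_i)
print("probing 3 lists  :", approx_i)
```

Probing every list reproduces the exact brute-force result, while probing only a few lists scans a fraction of the data and may miss some true neighbors. That is the speed-recall trade-off that n_probes controls.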

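Creating the index requires several parameters: the number of clusters n_lists, the distance metric, the number of k-means iterations, the fraction of the dataset sampled for training, and so on.

Note: Index training (for IVF-Flat, IVF-PQ, or CAGRA) builds the index structure (such as centroids, codebooks, or a k-NN graph) from the full dataset or a representative subset. On large datasets, sampling is commonly used to shorten build time, provided the sample reflects the overall distribution. The remaining data (or a subset of it) can serve as a validation set for tuning parameters such as n_lists and n_probes to balance recall, latency, and memory.

from cuvs.neighbors import ivf_flat
index_params = ivf_flat.IndexParams(n_lists=1024,
                                    metric='sqeuclidean',
                                    kmeans_n_iters=20,
                                    kmeans_trainset_fraction=0.5)
ivf_flat_index = ivf_flat.build(index_params, corpus_embeddings, resources=resources)
resources.sync()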

Set n_probes for the search and encode the query:

# n_probes is the number of clusters we select in the first (coarse) search step.
# This is the only hyperparameter for search.
search_params = ivf_flat.SearchParams(n_probes=30)
query = "What is creating tides?"
question_embedding = encoder.encode(query, convert_to_tensor=True)

Run the search, returning the top 5 results, and time it:

%%time
# Search top 5 nearest neighbors.
top_k = 5
distances, indices = ivf_flat.search(search_params, ivf_flat_index, question_embedding[None], k=top_k)

Sample output:

CPU times: user 47.3 ms, sys: 11.9 ms, total: 59.2 ms
Wall time: 56.5 ms

As expected, IVF-Flat is faster than brute force in this run (56.5 ms versus 69 ms wall time). The results:

Distance: 94.91002655029297 Neighbor: ['Tide', "A tide is the periodic rising and falling of Earth's ocean surface caused mainly by the gravitational pull of the Moon acting on the oceans. (...) and exposed at low tide, called the intertidal zone, is an important ecological product of ocean tides."]
...
Distance: 185.30377197265625 Neighbor: ['Ocean surface wave', 'Ocean surface waves are surface (...) When a wave hits shallow water, it "breaks" because the bottom moves more slowly than the top.']
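The note earlier in this section mentions tuning n_lists and n_probes against recall. In the sample outputs, the last IVF-Flat result ('Ocean surface wave') differs from the last brute-force result ('Tidal force'), so this run did not recover every exact neighbor. A common way to quantify that is recall@k, sketched below. The `recall_at_k` helper and the neighbor ids are hypothetical and only illustrate the calculation; in practice the exact ids would come from brute-force search and the approximate ids from the index being tuned.

```python
import numpy as np

def recall_at_k(approx_neighbors, exact_neighbors):
    """Fraction of exact top-k neighbors that the approximate search also returned."""
    hits = 0
    for approx_row, exact_row in zip(approx_neighbors, exact_neighbors):
        hits += len(set(approx_row.tolist()) & set(exact_row.tolist()))
    return hits / exact_neighbors.size

# Hypothetical neighbor ids for two queries with k = 5.
exact = np.array([[10, 4, 7, 1, 9],
                  [3, 8, 2, 6, 5]])
approx = np.array([[10, 4, 7, 9, 12],   # 4 of the 5 exact neighbors found
                   [3, 8, 2, 6, 5]])    # 5 of the 5 exact neighbors found

print(recall_at_k(approx, exact))
```

Here nine of the ten exact neighbors are recovered, so the recall is 0.9. Sweeping n_probes while tracking recall and latency on held-out queries is one way to choose an operating point.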

PART 09

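Vector Search with IVF-PQ

hipVS IVF-PQ [14] combines IVF with PQ. As in IVF-Flat, the vectors are first clustered with k-means and each is assigned to its nearest cluster, and a query searches only a few of the nearest clusters. The difference is that PQ splits each vector into smaller subvectors and maps each subvector to a codeword learned during training, so vectors are stored as compact codes and distances are approximated directly from the quantized codes. For details on product quantization, see the paper [15].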
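The quantization step can be illustrated with a small NumPy sketch. It covers only the PQ part (no IVF clustering) and is not the hipVS implementation: the helper names, the random data, and the choice of 8 subspaces with 16 codewords each are assumptions made to keep the example short.

```python
import numpy as np

def train_codebooks(data, pq_dim, n_codes, n_iters, rng):
    """Train one small k-means codebook per subspace."""
    sub_len = data.shape[1] // pq_dim
    codebooks = np.empty((pq_dim, n_codes, sub_len))
    for m in range(pq_dim):
        sub = data[:, m * sub_len:(m + 1) * sub_len]
        centers = sub[rng.choice(len(sub), size=n_codes, replace=False)].copy()
        for _ in range(n_iters):
            d = ((sub[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            for c in range(n_codes):
                members = sub[labels == c]
                if len(members) > 0:  # keep the old codeword if it has no members
                    centers[c] = members.mean(axis=0)
        codebooks[m] = centers
    return codebooks

def encode(data, codebooks):
    """Replace each subvector by the index of its nearest codeword."""
    pq_dim, _, sub_len = codebooks.shape
    codes = np.empty((len(data), pq_dim), dtype=np.uint8)
    for m in range(pq_dim):
        sub = data[:, m * sub_len:(m + 1) * sub_len]
        d = ((sub[:, None, :] - codebooks[m][None, :, :]) ** 2).sum(axis=2)
        codes[:, m] = d.argmin(axis=1)
    return codes

def approx_distances(query, codes, codebooks):
    """Approximate squared distances from a query using per-subspace lookup tables."""
    pq_dim, _, sub_len = codebooks.shape
    # table[m, c] = squared distance between query subvector m and codeword c
    table = np.stack([
        ((codebooks[m] - query[m * sub_len:(m + 1) * sub_len]) ** 2).sum(axis=1)
        for m in range(pq_dim)
    ])
    return table[np.arange(pq_dim), codes].sum(axis=1)

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 32))
codebooks = train_codebooks(data, pq_dim=8, n_codes=16, n_iters=5, rng=rng)
codes = encode(data, codebooks)

query = rng.standard_normal(32)
approx = approx_distances(query, codes, codebooks)
exact = ((data - query) ** 2).sum(axis=1)
print("bytes per vector, raw float64:", data[0].nbytes)
print("bytes per vector, PQ codes   :", codes[0].nbytes)
print("correlation exact vs approx  :", np.corrcoef(exact, approx)[0, 1])
```

The approximate distance is the exact distance between the query and the reconstruction of each vector from its codewords, so accuracy depends on how well the codebooks fit the data. Storage drops from one float per dimension to one small integer per subspace.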

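Create the index and set its parameters:

from cuvs.neighbors import ivf_pq
# Choose pq_dim as the largest power of two below the embedding dimension
pq_dim = 1
while pq_dim * 2 < corpus_embeddings.shape[1]:
    pq_dim = pq_dim * 2
index_params = ivf_pq.IndexParams(n_lists=1024, metric='sqeuclidean', pq_dim=pq_dim)
index = ivf_pq.build(index_params, corpus_embeddings, resources=resources)
resources.sync()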

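Set the search parameters (the query embedding from the previous section is reused):

search_params = ivf_pq.SearchParams()
# show_properties is a notebook helper (not shown here)
show_properties(search_params)

Run the search, returning the top 5 results, and time it:

%%time
top_k = 5
distances, neighbors = ivf_pq.search(search_params, index, question_embedding[None], top_k, resources=resources)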

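Sample timing:

CPU times: user 53.3 ms, sys: 8.19 ms, total: 61.5 ms
Wall time: 60.2 ms

IVF-PQ search time is typically lower than brute force and slightly higher than IVF-Flat. Because the index stores quantized vectors, it significantly reduces GPU memory usage, which makes large-scale search easier to fit in GPU memory.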
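A back-of-the-envelope calculation gives a feel for the memory saving on this dataset. The passage count and embedding dimension come from the notebook output above, and the pq_dim loop is the one used to build the index. Two things are assumptions: that the raw embeddings are stored as 32-bit floats, and that each PQ code uses 8 bits. The estimate also ignores codebooks, cluster centroids, and vector ids, so it is a rough lower bound on the index size rather than a measurement.

```python
n_vectors = 509_663        # number of passages reported by the notebook
dim = 768                  # embedding dimension of nq-distilbert-base-v1
bytes_per_float32 = 4      # ASSUMPTION: raw embeddings are stored as float32

# Same loop as the notebook: largest power of two strictly below dim.
pq_dim = 1
while pq_dim * 2 < dim:
    pq_dim *= 2

pq_bits = 8                # ASSUMPTION: 8 bits per PQ code

flat_bytes = n_vectors * dim * bytes_per_float32
pq_bytes = n_vectors * pq_dim * pq_bits // 8

print(f"pq_dim              : {pq_dim}")
print(f"raw float32 vectors : {flat_bytes / 2**20:.0f} MiB")
print(f"PQ codes            : {pq_bytes / 2**20:.0f} MiB")
print(f"compression ratio   : {flat_bytes / pq_bytes:.1f}x")
```

Under these assumptions pq_dim is 512 and the codes take one sixth of the space of the raw vectors. A smaller pq_dim would compress further at the cost of coarser distance estimates.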

The results:

for k in range(top_k):
    print(f'Distance: {distances[0][k]}', f'Neighbor: {passages[neighbors[0][k]]}\n')

Distance: 96.26384735107422 Neighbor: ['Tide', "A tide is the periodic rising an (...) is under water at high tide and exposed at low tide, called the intertidal zone, is an important ecological product of ocean tides."]
...
Distance: 180.87840270996094 Neighbor: ['Tidal force', 'Tidal force is caused by gravity and makes tides happen. This is because the gravitational field changes across the middle of a body (the diameter).']

PART 10

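Vector Search with CAGRA

hipVS CAGRA is a graph-based ANN index optimized for GPUs. It builds a k-NN graph that links each vector to its nearest neighbors, and searches by traversing graph edges toward the query. With GPU parallelism, CAGRA can achieve high recall and low latency on large datasets. See the CAGRA paper [16] for details.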
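The general idea of graph-based search can be sketched on the CPU as follows. This is a heavily simplified illustration and not CAGRA itself: it builds an exact k-NN graph by brute force and runs a sequential best-first traversal, whereas CAGRA adds graph optimization and GPU-parallel search as described in the paper [16]. The function names, the random data, and the parameter values (`degree`, `beam_width`, the number of entry points) are all assumptions for this example.

```python
import heapq
import numpy as np

def build_knn_graph(data, degree):
    """Exact k-NN graph: row i lists the ids of the `degree` nearest rows to i."""
    sq = (data ** 2).sum(axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * (data @ data.T)
    np.fill_diagonal(d, np.inf)  # a vector is not its own neighbor
    return np.argsort(d, axis=1)[:, :degree]

def graph_search(query, data, graph, k, beam_width, entry_points):
    """Best-first traversal that keeps the beam_width closest vectors seen so far."""
    def dist(i):
        return float(((data[i] - query) ** 2).sum())

    visited = set(int(i) for i in entry_points)
    frontier = [(dist(i), i) for i in visited]   # min-heap of nodes to expand
    heapq.heapify(frontier)
    best = sorted(frontier)[:beam_width]         # current best candidates
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(best) == beam_width and d > best[-1][0]:
            break  # closest unexpanded node is worse than every kept candidate
        for nb in graph[node]:
            nb = int(nb)
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb)
            if len(best) < beam_width or d_nb < best[-1][0]:
                heapq.heappush(frontier, (d_nb, nb))
                best = sorted(best + [(d_nb, nb)])[:beam_width]
    ids = np.array([i for _, i in best[:k]])
    dists = np.array([d for d, _ in best[:k]])
    return dists, ids

rng = np.random.default_rng(0)
data = rng.standard_normal((2000, 16))
graph = build_knn_graph(data, degree=16)
query = rng.standard_normal(16)
entry_points = rng.choice(len(data), size=8, replace=False)

dists, ids = graph_search(query, data, graph, k=5, beam_width=32, entry_points=entry_points)
exact_ids = np.argsort(((data - query) ** 2).sum(axis=1))[:5]
print("graph search:", ids)
print("exact       :", exact_ids)
```

The traversal computes distances only for vectors reachable from the entry points, which is typically a small fraction of the dataset. The result is approximate: a wider beam or more entry points usually improves recall at the cost of more distance computations.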

Create the index and the search parameters:

from cuvs.neighbors import cagra
# Set the index parameters
build_params = cagra.IndexParams(metric="sqeuclidean")
# Build the index
index = cagra.build(build_params, corpus_embeddings)
# Set the search parameters
search_params = cagra.SearchParams()
show_properties(search_params)

Encode the query, run the search, and return the top 5 results:

# Encoding the query
query = "What is creating tides?"
question_embedding = encoder.encode(query, convert_to_tensor=True)

%%time
# Search and return the top five closest elements
top_k = 5
distances, neighbors = cagra.search(search_params, index, question_embedding[None], top_k)

Sample timing:

CPU times: user 3.39 ms, sys: 7.8 ms, total: 11.2 ms
Wall time: 10.3 ms

The results:

for k in range(top_k):
    print(f'Distance: {distances[0][k]}', f'Neighbor: {passages[neighbors[0][k]]}\n')

Distance: 94.91004180908203 Neighbor: ['Tide', "A tide is (...) and exposed at low tide, called the intertidal zone, is an important ecological product of ocean tides."]
...
Distance: 181.4979705810547 Neighbor: ['Tidal force', 'Tidal force is (...) This is because the gravitational field changes across the middle of a body (the diameter).']

PART 11

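Summary

This post introduced hipVS and hipRAFT on AMD GPUs and used a notebook to run four search methods: Brute-Force KNN, IVF-Flat, IVF-PQ, and CAGRA. This first public release focuses on functional availability and API compatibility, so that existing workloads can move to AMD GPUs with minimal changes. Performance optimization, new features, and documentation improvements will follow. We encourage you to try hipVS and hipRAFT on your own vector search workloads and share your feedback. Stay tuned for hipVS feature updates, benchmark results, and best-practice guides from AMD.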

PART 12

Acknowledgments

Thanks to the broader AMD team for their contributions to the implementation of hipVS and hipRAFT: Philipp Samfass, Dominic Etienne Charrier, Michael Obersteiner, Mohammad NorouziArab, Lior Galanti, Matthew Cordery, Jason Riedy, Marco Grond, Bhavesh Lad, Pankaj Gupta, Bhanu Kiran Atturu, Ritesh Hiremath, Radha Srimanthula, Randy Hartgrove, Amit Kumar, Ram Seenivasan, Saad Rahim, Ehud Sharlin, Ramesh Mantha.

PART 13

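Disclaimer

Third-party content is licensed to you directly by the third party that owns the content and is not licensed to you by AMD. All linked third-party content is provided "as is" without warranty of any kind. Use of such third-party content is at your sole discretion, and you assume all liability arising from its use. Under no circumstances will AMD be liable for third-party content.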

PART 14

References

1. ROCm-DS: https://rocm.docs.amd.com/projects/rocm-ds/en/latest/

2. hipVS documentation: https://rocm.docs.amd.com/projects/hipVS/en/latest/

3. hipRAFT documentation: https://rocm.docs.amd.com/projects/hipRaft/en/latest

5. ROCm installation (Linux): https://rocm.docs.amd.com/en/latest/

6. Docker installation (Ubuntu): https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository

7. Custom Dockerfile: https://github.com/ROCm/rocm-blogs/tree/release/blogs

8. simplewiki-2020-11-01: https://sbert.net/datasets/simplewiki-2020-11-01.jsonl.gz

9. JSON Lines: https://jsonlines.org/

10. nq-distilbert-base-v1: https://huggingface.co/sentence-transformers/nq-distilbert-base-v1

11. Brute-force KNN documentation: https://rocm.docs.amd.com/projects/hipVS/en/latest/reference/python_api/neighbors_brute_force.html

12. Resources documentation: https://rocm.docs.amd.com/projects/hipVS/en/latest/reference/python_api/common.html#cuvs.common.Resources

13. IVF-Flat documentation: https://rocm.docs.amd.com/projects/hipVS/en/latest/reference/python_api/neighbors_ivf_flat.html

14. IVF-PQ documentation: https://rocm.docs.amd.com/projects/hipVS/en/latest/reference/python_api/neighbors_ivf_pq.html

15. Product quantization paper: https://inria.hal.science/inria-00514462v2/document

16. CAGRA paper: https://arxiv.org/pdf/2308.15136
