Understanding Memory Management on Hardware-Coherent Platforms

Editor's note: to follow along, you first need to have this kind of NVIDIA hardware on hand.
Oct 14, 2025

TL;DR
NVIDIA's Coherent Driver-based Memory Management (CDMM) mode is an alternative to NUMA mode that lets the NVIDIA driver, rather than the operating system, control and manage GPU memory, which is especially useful for applications such as Kubernetes.
CDMM mode prevents GPU memory from being exposed to the OS as a software NUMA node, separating it from CPU system memory and giving the NVIDIA driver full control over how GPU memory is used.
In CDMM mode, system-allocated memory is not migrated to the GPU, although the GPU can still access it over the C2C link, and tools such as numactl or mbind do not affect GPU memory management.

References:
  • Non-uniform memory access, Wikipedia: https://en.wikipedia.org/wiki/Non-uniform_memory_access

  • CDMM whitepaper: https://nvda.ws/4nDHMTt

  • NVIDIA NVLink-C2C: https://www.nvidia.com/en-us/data-center/nvlink-c2c/

  • NVIDIA GPU Operator documentation: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html



If you’re an application developer or a cluster administrator, you’ve likely seen how non-uniform memory access (NUMA) can impact system performance. When an application is not fully NUMA-aware, performance can be inconsistent and unpredictable.


Because of these challenges, NVIDIA released the Coherent Driver-based Memory Management (CDMM) mode for the NVIDIA driver for platforms that are hardware-coherent, such as GH200, GB200, and GB300. CDMM allows the NVIDIA driver, instead of the OS, to control and manage the GPU memory. This permits much more fine-grained control by the application to put data in the appropriate memory space and subsequently extract maximum performance.


In this blog we’re going to describe the differences between NUMA and CDMM and how they can impact application performance. We also published a whitepaper on this topic that you can check out for even more information. 


What is NUMA?  


NUMA mode is the current default for the NVIDIA Driver on hardware coherent platforms.  NUMA exposes the entire CPU (host) memory and GPU (device) memory to the OS. This means that standard Linux APIs such as malloc and mmap, as well as CUDA APIs, can allocate memory on both the CPU and GPU. It also facilitates dynamic memory migration between CPU and GPU via user space APIs, or automatically by the kernel to optimize resource utilization.  


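To make this concrete, here is a minimal CUDA C++ sketch of the kind of placement NUMA mode enables: memory is requested on a specific NUMA node through a standard OS API (libnuma) and then touched from both the CPU and a GPU kernel. The node number, buffer size, and the lack of error handling are illustrative assumptions; on a real system the GPU's NUMA node id should be taken from numactl -H.

    // Illustrative sketch only: allocate on a chosen NUMA node with a standard
    // OS API, then touch the same pages from the CPU and from a GPU kernel.
    // Assumes a hardware-coherent platform in NUMA mode, and that the GPU's
    // memory shows up as NUMA node 1 (an assumption; check `numactl -H`).
    // Build (assumption): nvcc -o numa_sketch numa_sketch.cu -lnuma
    #include <cstdio>
    #include <cuda_runtime.h>
    #include <numa.h>   // libnuma: numa_available, numa_alloc_onnode, numa_free

    __global__ void add_one(int *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1;   // GPU dereferences OS-allocated memory directly
    }

    int main() {
        const int n = 1 << 20;
        const int gpu_node = 1;    // ASSUMPTION: NUMA node id of the GPU's memory
        if (numa_available() < 0) { printf("libnuma not available\n"); return 1; }

        // In NUMA mode the OS can satisfy this request from GPU memory,
        // because that memory is exposed as an ordinary NUMA node.
        int *data = static_cast<int *>(numa_alloc_onnode(n * sizeof(int), gpu_node));
        if (!data) { printf("allocation failed\n"); return 1; }

        for (int i = 0; i < n; ++i) data[i] = i;    // CPU writes the pages

        add_one<<<(n + 255) / 256, 256>>>(data, n); // GPU reads/writes the same pages
        cudaDeviceSynchronize();

        printf("data[7] = %d (expected 8)\n", data[7]);
        numa_free(data, n * sizeof(int));
        return 0;
    }

The same placement could also be requested from the command line with numactl --membind, which is exactly the class of tooling that stops affecting GPU memory once CDMM mode is enabled (more on that below).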


An important side effect to consider, though, is that NUMA mode causes GPU memory to be treated as a generic memory pool, meaning that the ability to strictly isolate GPU memory from general OS system functions is limited. In typical NUMA behavior, memory may spill onto the GPU, which may not be desirable for application performance.  


That’s why NVIDIA provides an alternative: Coherent Driver-based Memory Management (CDMM) mode.


What are hardware-coherent platforms?


Several NVIDIA systems, including the GH200, GB200, and GB300, contain direct NVLink chip-to-chip (C2C) connections between the CPU and the GPU. That introduces a powerful capability not present on PCIe-connected systems: hardware-coherent memory. It allows both CPU and GPU memory to be directly addressed from either processor.


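As a minimal illustration of what "directly addressed from either processor" means in practice, the sketch below passes an ordinary malloc'd buffer to a GPU kernel: the CPU writes it, the GPU updates it in place, and the CPU reads the result, with no cudaMemcpy or special allocator involved. This is a sketch that assumes a hardware-coherent system such as GH200; on platforms without this capability you would typically use managed memory or explicit copies instead.

    // Sketch: CPU and GPU touching the same heap allocation on a
    // hardware-coherent platform. No cudaMemcpy, no special allocator.
    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    __global__ void scale(float *v, int n, float f) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) v[i] *= f;          // GPU reads and writes CPU-allocated pages
    }

    int main() {
        const int n = 1 << 16;
        float *v = static_cast<float *>(malloc(n * sizeof(float)));  // ordinary heap memory
        for (int i = 0; i < n; ++i) v[i] = 1.0f;                     // CPU writes

        scale<<<(n + 255) / 256, 256>>>(v, n, 2.0f);                 // GPU updates in place
        cudaDeviceSynchronize();

        printf("v[0] = %.1f (expected 2.0)\n", v[0]);                // CPU reads the result
        free(v);
        return 0;
    }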


This can have some unintended consequences for applications that rely on specific behaviors of NUMA. In particular, the operating system may select GPU memory for unexpected or surprising uses, such as caching files or avoiding out-of-memory (OOM) conditions from an allocation request. For some applications and workflows, especially those that have been optimized for a particular layout of CPU and GPU memory (like Kubernetes), these differences may be undesirable. 


The new CDMM mode addresses these challenges and will be particularly useful for applications like Kubernetes.


How NUMA impacts Kubernetes


Because Kubernetes is such a ubiquitous way to operate large GPU clusters, there are some specific and unexpected behaviors that can be encountered when running Kubernetes in NUMA mode. These behaviors may hurt performance and even application functionality.


  • Memory over-reporting: Kubernetes incorrectly includes GPU memory in its system memory count, leading to pods requesting more memory than available and causing OOM failures (see the sketch after this list).

  • Pod memory limits apply to GPU memory, not just system memory: Kubernetes pod memory limits, designed for system memory, incorrectly apply to both system and GPU memory when system-allocated memory is used, as each GPU is exposed as a NUMA node. This breaks the intended Pod spec API contract. 

  • Isolating GPU memory amongst pods: Kubernetes pods, by default, can access all memory across NUMA nodes, including GPU memory. This allows containers to allocate memory on GPUs they don’t have access to, breaking isolation.

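The over-reporting issue in the first bullet comes from node-level accounting: in NUMA mode the GPUs' memory shows up in the same per-node totals the OS reports for system RAM. The short host-side sketch below (libnuma) walks the NUMA nodes and sums their sizes the way naive capacity accounting would. It is not how kubelet itself computes capacity; it only illustrates why totals become inflated when GPU memory is exposed as NUMA nodes.

    // Sketch: why node-level memory totals are inflated in NUMA mode.
    // On a hardware-coherent system in NUMA mode, the GPU memory nodes
    // contribute to this sum; in CDMM mode they present no memory to the OS.
    // Build (assumption): g++ -o node_totals node_totals.cpp -lnuma
    #include <cstdio>
    #include <numa.h>   // numa_available, numa_max_node, numa_node_size64

    int main() {
        if (numa_available() < 0) { printf("libnuma not available\n"); return 1; }

        long long total = 0;
        for (int node = 0; node <= numa_max_node(); ++node) {
            long long free_bytes = 0;
            long long size = numa_node_size64(node, &free_bytes);  // -1 if the node has no memory
            if (size > 0) {
                printf("node %d: %lld MiB (%lld MiB free)\n",
                       node, size >> 20, free_bytes >> 20);
                total += size;
            }
        }
        // Anything that treats this sum as "system memory" will over-report on a
        // NUMA-mode system, because the GPU nodes are included in it.
        printf("sum across nodes: %lld MiB\n", total >> 20);
        return 0;
    }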


For these reasons, we recommend using CDMM mode when using Kubernetes.


What is CDMM?  


CDMM is an alternative operating mode for NVIDIA drivers that prevents GPU memory from being exposed to the operating system as a software NUMA node. Instead, the NVIDIA device driver directly manages GPU memory, separating it from the CPU’s system memory. This approach is inspired by the PCIe-attached GPU model, where GPU memory is distinct from system memory.


In CDMM mode, the CPU memory is managed by the Linux kernel and the GPU memory is managed by the NVIDIA driver. This means the NVIDIA driver, not the OS, is responsible for managing the GPU memory and has full control over how the GPU memory is used, thereby offering greater control and often better application performance.


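Under this model the division of labor follows the familiar PCIe-style pattern sketched below: GPU memory is requested from and accounted for by the CUDA driver and runtime, while CPU memory comes from the OS allocator. This is a generic CUDA illustration of that separation rather than a CDMM-specific API; cudaMalloc and cudaMemGetInfo behave this way in either mode.

    // Sketch of the "separate memories" model that CDMM enforces: device memory
    // is obtained from and tracked by the NVIDIA driver, host memory from the OS.
    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    int main() {
        size_t free_b = 0, total_b = 0;
        cudaMemGetInfo(&free_b, &total_b);              // the driver's view of GPU memory
        printf("GPU memory (driver-managed): %zu MiB free of %zu MiB\n",
               free_b >> 20, total_b >> 20);

        float *d_buf = nullptr;
        cudaMalloc(&d_buf, 256u << 20);                 // GPU memory, via the driver
        float *h_buf = static_cast<float *>(malloc(256u << 20));  // CPU memory, via the OS

        cudaMemGetInfo(&free_b, &total_b);              // the driver accounts for the allocation
        printf("after cudaMalloc: %zu MiB free\n", free_b >> 20);

        cudaFree(d_buf);
        free(h_buf);
        return 0;
    }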


How CDMM affects CUDA developers


The primary impact of CDMM is in the migration of system allocated memory. In the current implementation of CDMM, system allocated memory will not be migrated to the GPU. The GPU can still access system allocated memory across the C2C link, but memory pages will not be migrated. 


For example, when an application uses hints to encourage migration using functions such as cudaMemPrefetchAsync(), cudaMemPrefetchBatchAsync(), cudaMemDiscardAndPrefetchBatchAsync(), and cudaMemAdvise(SetPreferredLocation), the pages will not migrate.


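The sketch below shows what that looks like from application code: a system allocation gets a prefetch hint and is then read by a GPU kernel. In NUMA mode the hint can migrate the pages into GPU memory; in CDMM mode the same call leaves them in CPU memory and the kernel reads them over the C2C link. The code is illustrative only; prefetch overloads differ between CUDA toolkit versions (the long-standing pointer/size/device/stream form is shown), and error handling is omitted.

    // Sketch: a migration hint on system-allocated memory. Under NUMA mode the
    // pages may move into GPU memory; under CDMM mode they stay in CPU memory
    // and the GPU accesses them across the C2C link. The prefetch overload shown
    // here is the classic (pointer, size, device, stream) form; newer toolkits
    // also offer location-based variants.
    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    __global__ void sum_all(const double *x, int n, double *out) {
        double s = 0.0;
        for (int i = 0; i < n; ++i) s += x[i];   // single-thread sketch, not a real reduction
        *out = s;
    }

    int main() {
        const int n = 1 << 20;
        int device = 0;
        cudaGetDevice(&device);

        double *x = static_cast<double *>(malloc(n * sizeof(double)));  // system-allocated
        for (int i = 0; i < n; ++i) x[i] = 1.0;

        double *out = nullptr;
        cudaMallocManaged(&out, sizeof(double));

        // Hint that these pages would be better off near the GPU. In CDMM mode
        // this does not migrate system-allocated pages; in NUMA mode it can.
        cudaMemPrefetchAsync(x, n * sizeof(double), device, 0);

        sum_all<<<1, 1>>>(x, n, out);
        cudaDeviceSynchronize();
        printf("sum = %.0f (expected %d)\n", *out, n);

        cudaFree(out);
        free(x);
        return 0;
    }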


How CDMM affects system administration


When the system is in CDMM mode, there will still be NUMA nodes corresponding to the GPUs, but they will not present any memory to the OS. Using tools such as numactl or mbind won’t have any effect when applied to GPU memory. We recommend these tools NOT be used in CDMM mode for any GPU memory management. They can still be used to manage system memory.  


CDMM is currently the default mode for Kubernetes-based GPU Operator deployments starting with Linux driver 580.65.06 and greater. To enable CDMM, you need to pass a kernel module parameter and value when the driver is loaded. For the exact command and syntax to enable CDMM mode, please see the CDMM whitepaper.


Guidelines for CDMM and NUMA usage


The following highlights the main differences between CDMM and NUMA modes, and when to consider using one mode or the other.


Application-specific memory management


  • NUMA mode: Best for applications using OS NUMA APIs and relying on OS management of total system memory (CPU memory + GPU memory).

  • CDMM mode: Ideal for applications needing direct GPU memory control, bypassing the OS.


Memory pooling


  • NUMA mode: Allows GPU and CPU memory to form a larger single pool. Workloads benefit from aggregated memory and bandwidth management.

  • CDMM mode: Driver-managed, preventing the OS from using GPU memory as part of a larger pool. GPU memory is dedicated to GPU-specific data.


GPU memory usage: visibility and measurement


  • NUMA mode: Standard tools report GPU memory use within the integrated pool, filterable by NUMA node, thereby providing an overall system memory view.

  • CDMM mode: Offers fine-grained control and visibility into GPU memory. Driver-managed GPU memory gives administrators and developers clear understanding of consumption for performance diagnosis and optimization.


Summary


The following table highlights the major differences in how memory is handled between NUMA and CDMM modes.


Table 1. Summary of differences in memory behavior between NUMA and CDMM modes


By understanding and strategically implementing CDMM, developers and administrators can unlock the full potential of NVIDIA hardware-coherent memory architectures, ensuring optimal performance and control for their GPU-accelerated workloads.


If you’re using a hardware-coherent platform such as GH200, GB200 or GB300, take a look at the whitepaper. And consider enabling CDMM mode to allow for fine-grained application control of GPU memory, especially if you’re using Kubernetes.

