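01/
Background
When you run AI inference services such as vLLM on Kubernetes, cold start time comes not only from model loading but also from the container image itself. Inference images typically bundle PyTorch, CUDA, Python dependencies, and system libraries, so they easily reach several GB or even more than ten GB. On the traditional containerd/overlayfs path, a node must download and unpack the entire image before the Pod can start. This slows down elastic scale-out and GPU node cold starts, and indirectly hurts first-request latency.
Lazy loading splits this process apart: mount the image filesystem from an index so the container can start early, then read only the files that are actually accessed from the registry, on demand. SOCI, eStargz, and Nydus have all proven the value of this approach, but adopting them in practice often introduces new complexity: building indexes, converting images, maintaining extra tags, and changing CI/CD pipelines or application image references.
Hermes (https://github.com/cloudpilot-ai/hermes) aims to make this as simple as possible. Application teams keep publishing and using their original OCI images, with no Dockerfile changes, no image rebuilds, and no CI changes. The platform team only defines a HermesPolicy, and Hermes automatically builds, caches, and serves SOCI indexes inside the cluster. The Hermes daemon on each node fetches these indexes and then lazy loads image data from the original registry. In other words, Hermes turns lazy loading from "an image workflow that application teams must adopt" into "a policy-driven Kubernetes cluster capability". That said, a faster Pod Ready does not automatically mean a faster first token, so validating Hermes also requires tracking container startup, vLLM readiness, first-request TTFT, and real request latency after warmup.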
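As a rough illustration of the TTFT part of that checklist, the sketch below approximates first-token latency as the total latency of a non-streaming request that generates exactly one token. It assumes a vLLM OpenAI-compatible server is actually running and reachable; the test workload later in this article only runs sleep, so this does not apply to it directly. The function name approx_ttft, the base URL, and the model name are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hermes or vLLM): approximate TTFT as the
# total latency of a non-streaming completion limited to a single token.
# Assumes the server exposes the OpenAI-compatible /v1/completions endpoint.
approx_ttft() {
  local base_url="$1" model="$2"
  # -o /dev/null discards the response body; -w prints curl's total time in seconds.
  curl -sS -o /dev/null -w '%{time_total}\n' \
    -H 'Content-Type: application/json' \
    -d "{\"model\": \"${model}\", \"prompt\": \"Hello\", \"max_tokens\": 1}" \
    "${base_url}/v1/completions"
}
```

A call would look like `approx_ttft http://localhost:8000 my-model`, where both arguments are placeholders. Calling it once right after the server becomes ready and again after a few warmup requests gives a first-request versus warmed-up comparison.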
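Next, we use EKS + Karpenter + Hermes to verify the effect of lazy loading.
02/
Experiment Steps
Step 1: Create a test EKS cluster
Download the code at https://github.com/cloudpilot-ai/examples/tree/main/clusters/eks-spot, then run the following command to create the cluster: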
terraform apply --auto-approve
Then fetch the kubeconfig:
export KUBECONFIG=~/.kube/eks
aws eks update-kubeconfig --name cluster-jw --region us-east-2
Optional: clean up old test resources
kubectl -n default delete deployment hermes-vllm-workload --ignore-not-found
kubectl delete hermespolicy prod-large-images --ignore-not-found || true
kubectl delete nodeclaim -l karpenter.sh/nodepool=hermes --ignore-not-found || true
kubectl delete nodeclaim -l karpenter.sh/nodepool=non-hermes --ignore-not-found || true
Step 2: Install Karpenter on EKS
Follow the official documentation to install Karpenter: https://karpenter.sh/docs/getting-started/getting-started-with-karpenter/
Step 3: Install the Hermes controller and CRD
The Hermes daemon runs on every Hermes-enabled node. The controller and CRD must be deployed to the cluster first; the controller watches HermesPolicy objects and Pods, and builds, caches, and serves SOCI indexes.
git clone https://github.com/cloudpilot-ai/hermes.git
cd hermes
kubectl apply -f deploy/hermespolicy-crd.yaml
kubectl apply -f deploy/hermes-controller-eks.yaml
kubectl -n hermes-system rollout status deploy/hermes-controller
kubectl -n hermes-system get svc hermes-controller -o wide
By default, the Hermes controller exposes index/ztoc artifacts through a NodePort. The hermes-daemon on each node reaches the controller through the NodePort on its local/node IP, and then lazy loads image data from the original OCI registry.
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: non-hermes
spec:
  amiSelectorTerms:
    - alias: al2023@v20260423
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        encrypted: true
        volumeSize: 100Gi
        volumeType: gp3
  kubelet:
    evictionHard:
      memory.available: 10%
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required
  role: CloudPilotNodeRole-cluster-jw
  securityGroupSelectorTerms:
    - tags:
        cluster.cloudpilot.ai/cluster-jw: "true"
  subnetSelectorTerms:
    - tags:
        cluster.cloudpilot.ai/cluster-jw: "true"
  tags:
    cloudpilot.ai/managed: "true"
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: non-hermes
spec:
  disruption:
    budgets:
      - nodes: "2"
    consolidateAfter: 60m
    consolidationPolicy: WhenEmptyOrUnderutilized
  template:
    metadata:
      labels:
        node.cloudpilot.ai/managed: "true"
    spec:
      expireAfter: Never
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: non-hermes
      requirements:
        - key: karpenter.k8s.aws/instance-gpu-count
          operator: DoesNotExist
        - key: karpenter.k8s.aws/instance-category
          operator: NotIn
          values:
            - a
            - t
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
        - key: karpenter.k8s.aws/instance-memory
          operator: Lt
          values:
            - "32769"
        - key: karpenter.k8s.aws/instance-cpu
          operator: Lt
          values:
            - "17"
        - key: beta.kubernetes.io/instance-type
          operator: NotIn
          values:
            - c1.medium
            - m1.small
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - c5a
  weight: 2
Configuration with Hermes enabled (remember to adjust the security groups, subnets, and role in the NodeClass):
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: hermes
spec:
  amiSelectorTerms:
    - alias: al2023@v20260423
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        encrypted: true
        volumeSize: 100Gi
        volumeType: gp3
  kubelet:
    evictionHard:
      memory.available: 10%
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required
  role: CloudPilotNodeRole-cluster-jw
  securityGroupSelectorTerms:
    - tags:
        cluster.cloudpilot.ai/cluster-jw: "true"
  subnetSelectorTerms:
    - tags:
        cluster.cloudpilot.ai/cluster-jw: "true"
  tags:
    cloudpilot.ai/managed: "true"
  userData: |-
    #!/bin/bash
    set -euxo pipefail
    export HERMES_INSTALLER_URL="https://raw.githubusercontent.com/cloudpilot-ai/hermes/main/hack/eks/install-hermes-daemon.sh"
    export HERMES_DAEMON_URL="https://github.com/cloudpilot-ai/hermes/releases/download/v0.0.1-alpha.1/hermes-daemon-linux-amd64.tar.gz"
    export HERMES_DAEMON_SHA256="93ea8d73e1c8b5324c8ee8ba9b4a5f50d686d60ba8453547460987d7d54ba861"
    curl -fsSL "${HERMES_INSTALLER_URL}" | \
      HERMES_DAEMON_URL="${HERMES_DAEMON_URL}" \
      HERMES_DAEMON_SHA256="${HERMES_DAEMON_SHA256}" \
      bash -s --
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: hermes
spec:
  disruption:
    budgets:
      - nodes: "2"
    consolidateAfter: 60m
    consolidationPolicy: WhenEmptyOrUnderutilized
  template:
    metadata:
      labels:
        node.cloudpilot.ai/managed: "true"
    spec:
      expireAfter: Never
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: hermes
      requirements:
        - key: karpenter.k8s.aws/instance-gpu-count
          operator: DoesNotExist
        - key: karpenter.k8s.aws/instance-category
          operator: NotIn
          values:
            - a
            - t
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
        - key: karpenter.k8s.aws/instance-memory
          operator: Lt
          values:
            - "32769"
        - key: karpenter.k8s.aws/instance-cpu
          operator: Lt
          values:
            - "17"
        - key: beta.kubernetes.io/instance-type
          operator: NotIn
          values:
            - c1.medium
            - m1.small
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - c5a
  weight: 2
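The userData above pins the daemon tarball to a SHA-256 digest through HERMES_DAEMON_SHA256. How the installer script uses that value internally is not shown in this article; the following is only a sketch of what such a check commonly looks like with the standard sha256sum tool from GNU coreutils. The function name verify_sha256 is hypothetical and not part of Hermes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hermes): verify a downloaded file against
# an expected SHA-256 digest before installing it.
verify_sha256() {
  local file="$1" expected="$2"
  # sha256sum -c reads "<digest>  <path>" lines from stdin and exits non-zero
  # on a mismatch; --status suppresses output so only the exit code matters.
  echo "${expected}  ${file}" | sha256sum -c --status -
}
```

A caller could then abort installation when the digest does not match, for example `verify_sha256 hermes-daemon-linux-amd64.tar.gz "$HERMES_DAEMON_SHA256" || exit 1`.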
$ kubectl get nodepool -A
NAME         NODECLASS    NODES   READY   AGE
hermes       hermes       1       True    11h
non-hermes   non-hermes   1       True    11h
export NAMESPACE=default
export ECR_REGION=us-east-1
export ECR_REGISTRY=763104351884.dkr.ecr.us-east-1.amazonaws.com
export SECRET_NAME=hermes-ecr-us-east-1
kubectl -n "$NAMESPACE" create secret docker-registry "$SECRET_NAME" \
  --docker-server="$ECR_REGISTRY" \
  --docker-username=AWS \
  --docker-password="$(aws ecr get-login-password --region "$ECR_REGION")" \
  --dry-run=client -o yaml | kubectl apply -f -
Finally, deploy the following YAML; the tests below use it:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hermes-vllm-workload
  namespace: default
  labels:
    app: hermes-vllm-workload
spec:
  replicas: 0
  selector:
    matchLabels:
      app: hermes-vllm-workload
  template:
    metadata:
      labels:
        app: hermes-vllm-workload
        hermes.cloudpilot.ai/test: vllm
    spec:
      imagePullSecrets:
        - name: hermes-ecr-us-east-1
      nodeSelector:
        karpenter.sh/nodepool: non-hermes
      containers:
        - name: vllm
          image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm:0.9-gpu-py312-ec2
          imagePullPolicy: Always
          command:
            - sh
            - -lc
            - sleep 3600
          resources:
            requests:
              cpu: 4
Step 5: Test without lazy loading
kubectl delete nodeclaim -l karpenter.sh/nodepool=non-hermes
Run the following commands:
kubectl -n default patch deployment hermes-vllm-workload --type='merge' -p '{"spec":{"template":{"spec":{"nodeSelector":{"karpenter.sh/nodepool":"non-hermes"}}}}}'
kubectl scale deploy/hermes-vllm-workload --replicas=1
Watch the time until the Pod becomes Ready:
$ kubectl get pod -owide -w
NAME                                    READY   STATUS              RESTARTS   AGE     IP           NODE                                       NOMINATED NODE   READINESS GATES
hermes-vllm-workload-784449c98d-bkpj8   0/1     Pending             0          9s      <none>       <none>                                     <none>           <none>
hermes-vllm-workload-784449c98d-bkpj8   0/1     Pending             0          20s     <none>       <none>                                     <none>           <none>
hermes-vllm-workload-784449c98d-bkpj8   0/1     Pending             0          29s     <none>       ip-10-0-3-237.us-east-2.compute.internal   <none>           <none>
hermes-vllm-workload-784449c98d-bkpj8   0/1     ContainerCreating   0          29s     <none>       ip-10-0-3-237.us-east-2.compute.internal   <none>           <none>
hermes-vllm-workload-784449c98d-bkpj8   0/1     ContainerCreating   0          4m39s   <none>       ip-10-0-3-237.us-east-2.compute.internal   <none>           <none>
hermes-vllm-workload-784449c98d-bkpj8   1/1     Running             0          5m4s    10.0.11.32   ip-10-0-3-237.us-east-2.compute.internal   <none>           <none>
$ kubectl get nodeclaim -A
NAME               TYPE          CAPACITY    ZONE         NODE                                       READY   AGE
non-hermes-ls7hq   c5a.2xlarge   on-demand   us-east-2a   ip-10-0-3-237.us-east-2.compute.internal   True    5m9s
From successful scheduling to Ready, the time in between is spent pulling the image: roughly 5m4s - 29s = 4m35s.
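The AGE column of a watch is a coarse way to read this interval. As an alternative, the sketch below derives it from the Pod's own condition timestamps (PodScheduled and Ready), which are standard fields of the Pod status. The helper names seconds_between and pod_scheduled_to_ready are hypothetical, and the script assumes GNU date for parsing RFC 3339 timestamps.

```shell
#!/usr/bin/env bash
# Seconds between two RFC 3339 timestamps (requires GNU date for -d).
seconds_between() {
  local start="$1" end="$2"
  echo $(( $(date -u -d "$end" +%s) - $(date -u -d "$start" +%s) ))
}

# Hypothetical helper: time from PodScheduled to Ready for a single Pod.
# Call it only after the Pod has become Ready.
pod_scheduled_to_ready() {
  local pod="$1" ns="${2:-default}" scheduled ready
  scheduled="$(kubectl -n "$ns" get pod "$pod" \
    -o jsonpath='{.status.conditions[?(@.type=="PodScheduled")].lastTransitionTime}')"
  ready="$(kubectl -n "$ns" get pod "$pod" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastTransitionTime}')"
  seconds_between "$scheduled" "$ready"
}
```

For the Pod in the output above, this would be invoked as `pod_scheduled_to_ready hermes-vllm-workload-784449c98d-bkpj8 default`. Condition timestamps have one-second resolution, so the result can differ slightly from the AGE-based estimate.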
Step 6: Test Hermes lazy loading
apiVersion: hermes.cloudpilot.ai/v1alpha1
kind: HermesPolicy
metadata:
  name: prod-large-images
spec:
  paused: false
  imageSelectors:
    - imageRegex: ".*vllm.*"
    - imageRegex: ".*nginx.*"
  platforms:
    - linux/amd64
Finally, watch this CR until its status looks like the following, with phase showing Ready:
$ kubectl get hermespolicy -oyaml
apiVersion: v1
items:
- apiVersion: hermes.cloudpilot.ai/v1alpha1
  kind: HermesPolicy
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"hermes.cloudpilot.ai/v1alpha1","kind":"HermesPolicy","metadata":{"annotations":{},"name":"prod-large-images"},"spec":{"imageSelectors":[{"imageRegex":".*vllm.*"},{"imageRegex":".*nginx.*"}],"paused":false,"platforms":["linux/amd64"]}}
    creationTimestamp: "2026-05-27T15:13:46Z"
    generation: 1
    name: prod-large-images
    resourceVersion: "243525"
    uid: efa35cb4-2911-4b33-94a1-3408b7d84fd1
  spec:
    imageSelectors:
    - imageRegex: .*vllm.*
    - imageRegex: .*nginx.*
    paused: false
    platforms:
    - linux/amd64
  status:
    images:
    - imageDigestRef: 763104351884.dkr.ecr.us-east-1.amazonaws.com/vllm@sha256:7ca69228a9066855929a9260bed4f8f076f3433f57fc0c05cc1ae425fd19d2b9
      lastBuildTime: "2026-05-28T02:51:11Z"
      phase: Ready
      platform: linux/amd64
    observedGeneration: 1
    ready: 1
kind: List
metadata:
  resourceVersion: ""
Ready here means the SOCI artifact has been built and cached, so subsequent Pod starts go through Hermes lazy loading.
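Rather than re-running the get command by hand, the wait can be scripted. The sketch below polls the status.images[*].phase field shown in the output above until every image reports Ready. The helper names all_phases_ready and wait_for_hermespolicy and the default timeout and interval are hypothetical choices, not something Hermes provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the input is a non-empty,
# space-separated list in which every word is "Ready".
all_phases_ready() {
  local phases="$1" phase
  [ -n "$phases" ] || return 1   # no images reported yet
  for phase in $phases; do
    [ "$phase" = "Ready" ] || return 1
  done
  return 0
}

# Hypothetical helper: poll a HermesPolicy until all of its images are Ready,
# or give up after the timeout (both values in seconds).
wait_for_hermespolicy() {
  local name="$1" timeout="${2:-600}" interval="${3:-5}" waited=0 phases
  while [ "$waited" -lt "$timeout" ]; do
    # jsonpath prints the phase of each image, separated by spaces.
    phases="$(kubectl get hermespolicy "$name" \
      -o jsonpath='{.status.images[*].phase}' 2>/dev/null || true)"
    if all_phases_ready "$phases"; then
      return 0
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
  return 1
}
```

For the policy in this article, `wait_for_hermespolicy prod-large-images 900 10` would block until the index is built or 15 minutes have passed. Note that the status only lists images the controller has already discovered, so an empty list is treated as not ready.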
Then run the following commands:
kubectl scale deploy/hermes-vllm-workload --replicas=0
kubectl -n default patch deployment hermes-vllm-workload --type='merge' -p '{"spec":{"template":{"spec":{"nodeSelector":{"karpenter.sh/nodepool":"hermes"}}}}}'
Likewise, to keep the Hermes node from reusing a local image cache, make sure the test Pod has been deleted before timing starts, and have the hermes NodePool use a fresh NodeClaim.
kubectl wait --for=delete pod -l app=hermes-vllm-workload -n default --timeout=180s || true
kubectl delete nodeclaim -l karpenter.sh/nodepool=hermes
kubectl scale deploy/hermes-vllm-workload --replicas=1
Finally, watch the time until the Pod becomes Ready:
$ kubectl get pod -owide -w
NAME                                    READY   STATUS              RESTARTS   AGE   IP            NODE                                       NOMINATED NODE   READINESS GATES
hermes-vllm-workload-544dfbcc66-nwd2h   0/1     Pending             0          9s    <none>        <none>                                     <none>           <none>
hermes-vllm-workload-544dfbcc66-nwd2h   0/1     Pending             0          21s   <none>        <none>                                     <none>           <none>
hermes-vllm-workload-544dfbcc66-nwd2h   0/1     Pending             0          30s   <none>        ip-10-0-2-194.us-east-2.compute.internal   <none>           <none>
hermes-vllm-workload-544dfbcc66-nwd2h   0/1     ContainerCreating   0          30s   <none>        ip-10-0-2-194.us-east-2.compute.internal   <none>           <none>
hermes-vllm-workload-544dfbcc66-nwd2h   1/1     Running             0          44s   10.0.12.224   ip-10-0-2-194.us-east-2.compute.internal   <none>           <none>
$ kubectl get nodeclaim -A
NAME           TYPE          CAPACITY    ZONE         NODE                                       READY   AGE
hermes-t4mk2   c5a.2xlarge   on-demand   us-east-2a   ip-10-0-2-194.us-east-2.compute.internal   True    56s
From successful scheduling to Ready, the lazy loading path takes roughly 44s - 30s = 14s.
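The two subtractions can be reproduced directly from the AGE values in the watch output. The sketch below converts kubectl-style age strings to seconds and compares the two runs; age_to_seconds is a hypothetical helper, and the figures come from a single run of each path, so treat the ratio as indicative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a kubectl AGE string such as "5m4s", "44s"
# or "1h2m" to seconds.
age_to_seconds() {
  local rest="$1" total=0 num unit
  while [ -n "$rest" ]; do
    num="${rest%%[!0-9]*}"            # leading run of digits
    if [ -z "$num" ]; then echo "unsupported age: $1" >&2; return 1; fi
    rest="${rest#"$num"}"
    unit="${rest%"${rest#?}"}"        # first character after the digits
    rest="${rest#?}"
    case "$unit" in
      d) total=$(( total + num * 86400 )) ;;
      h) total=$(( total + num * 3600 )) ;;
      m) total=$(( total + num * 60 )) ;;
      s) total=$(( total + num )) ;;
      *) echo "unsupported age: $1" >&2; return 1 ;;
    esac
  done
  echo "$total"
}

# Scheduled-to-Ready intervals taken from the two watch outputs.
baseline=$(( $(age_to_seconds 5m4s) - $(age_to_seconds 29s) ))   # full image pull
lazy=$(( $(age_to_seconds 44s) - $(age_to_seconds 30s) ))        # Hermes lazy loading
echo "baseline=${baseline}s lazy=${lazy}s"                       # baseline=275s lazy=14s
awk -v b="$baseline" -v l="$lazy" 'BEGIN { printf "speedup=%.1fx\n", b / l }'   # speedup=19.6x
```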
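03/
Summary
In this test, the time from successful scheduling to Ready dropped from about 4m35s with a full image pull to about 14s with Hermes lazy loading, without any change to the image itself.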