
Container Security Hardening in Practice: A Complete Guide to Image Vulnerability Scanning and Zero-Trust Runtime Protection

外贸达人Cici
2025-10-18

1. Use Cases & Prerequisites

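Use cases: hardening production container environments, automated vulnerability detection in CI/CD, image supply-chain security, and defense against runtime attacks.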

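Prerequisites

  • OS: RHEL 8+/Ubuntu 20.04+; kernel 4.18+ (RHEL) or 5.4+ (Ubuntu) with seccomp support, plus SELinux (RHEL) or AppArmor (Ubuntu)
  • Container runtime: Docker 20.10+/containerd 1.6+/CRI-O 1.24+
  • Kubernetes: 1.25+ (PodSecurityPolicy was removed in 1.25; use Pod Security Admission)
  • Permissions: root or docker/kubectl admin access; read/write access to the image registry
  • Network: access to vulnerability data sources such as NVD and GitHub Advisory (Trivy downloads its prebuilt DB from ghcr.io), or an offline mirror
  • Tools: Trivy 0.48+, Cosign 2.0+, Falco 0.36+ (optional)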

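2. Environment and Version Matrix

Component | RHEL 8/9 | Ubuntu 20.04/22.04 | Requirement | Minimum spec
OS kernel | 4.18+ | 5.4+ | seccomp-bpf support | -
Docker | 20.10.23+ | 24.0+ | includes BuildKit | 2C/4G/50G
containerd | 1.6.20+ | 1.7+ | requires runc 1.1+ | 2C/4G
Kubernetes | 1.25-1.30 | 1.25-1.30 | Pod Security Admission available | control plane: 4C/8G
Trivy | 0.48+ | 0.48+ | automatic DB updates | 1C/2G/10G (DB)
Grype | 0.74+ | 0.74+ | Syft 0.100+ | 1C/2G
Cosign | 2.0+ | 2.0+ | Rekor transparency log | 1C/1G
Falco | 0.36+ (optional) | 0.36+ | kernel module or eBPF driver | 2C/4G
Harbor | 2.8+ | 2.8+ | Trivy integrated as scanner | 4C/8G/100G

Minimum spec format: CPU cores/memory/disk (for example, 2C/4G/50G = 2 cores, 4 GB RAM, 50 GB disk).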

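3. Quick Checklist

  1. Install a scanner (Trivy/Grype) and verify that it detects vulnerabilities
  2. Apply a Dockerfile security baseline (multi-stage builds, non-root user)
  3. Integrate scanning into CI/CD to block images with high-severity vulnerabilities
  4. Deploy image signing and verification (Cosign + admission control with OPA Gatekeeper or Sigstore Policy Controller)
  5. Configure the Kubernetes SecurityContext (readOnlyRootFilesystem/runAsNonRoot)
  6. Use NetworkPolicy to minimize network exposure
  7. Enable Pod Security Admission (Restricted profile)
  8. Deploy runtime protection (Falco rules + alerting)
  9. Secure the image registry (Harbor vulnerability scanning + signature verification)
  10. Establish a recurring scan-and-remediate process (CVE tracking + image rebuilds)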

4. Implementation Steps

Step 1: Install and Configure the Trivy Scanner

RHEL/CentOS:

# Install Trivy
sudo rpm -ivh https://github.com/aquasecurity/trivy/releases/download/v0.48.3/trivy_0.48.3_Linux-64bit.rpm

# Verify the installation
trivy --version

# Download/update the vulnerability database
trivy image --download-db-only

# Check the database location
ls -lh ~/.cache/trivy/db/

Ubuntu/Debian:

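# Install Trivy
wget https://github.com/aquasecurity/trivy/releases/download/v0.48.3/trivy_0.48.3_Linux-64bit.deb
sudo dpkg -i trivy_0.48.3_Linux-64bit.deb

# Offline environments (optional): download the DB on a connected host,
# copy /opt/trivy-db to the offline host, then scan with --skip-db-update
trivy image --download-db-only --cache-dir /opt/trivy-db
export TRIVY_CACHE_DIR=/opt/trivy-db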

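Key parameters:

  • --severity CRITICAL,HIGH: report only high-severity findings
  • --exit-code 1: return a non-zero exit code when vulnerabilities are found (used to block CI)
  • --ignore-unfixed: skip CVEs that have no fixed version yet

Verify that scanning works: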

# Scan the official Nginx image
trivy image nginx:latest

# Show only CRITICAL and HIGH findings
trivy image --severity CRITICAL,HIGH nginx:latest

# JSON output
trivy image -f json -o nginx-scan.json nginx:latest

# Inspect the output (vulnerability count of the first target)
jq '.Results[0].Vulnerabilities | length' nginx-scan.json

Sample output (counts depend on the image digest and the DB version at scan time):

nginx:latest (debian 12.2)
Total: 87 (CRITICAL: 2, HIGH: 15, MEDIUM: 70)
┌────────────┬───────────────┬──────────┬──────────────────┬───────────────┐
│  Library   │ Vulnerability │ Severity │ Installed Version│ Fixed Version │
├────────────┼───────────────┼──────────┼──────────────────┼───────────────┤
│ openssl    │ CVE-2023-5678 │ CRITICAL │ 3.0.9-1          │ 3.0.11-1      │
└────────────┴───────────────┴──────────┴──────────────────┴───────────────┘
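The jq one-liner above only counts the first target. When a pipeline needs a per-severity summary across every target, a few lines of Python over the same JSON report are enough. This is a sketch that assumes the report layout shown above (`Results[].Vulnerabilities[].Severity`); clean targets may omit `Vulnerabilities` or set it to null, so both cases are treated as empty. The function name `count_by_severity` is illustrative.

```python
import json
from collections import Counter


def count_by_severity(report: dict) -> Counter:
    """Count vulnerabilities per severity across all targets of a Trivy JSON report."""
    counts = Counter()
    for result in report.get("Results") or []:
        # Targets without findings have no "Vulnerabilities" key, or a null one.
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


if __name__ == "__main__":
    # Minimal stand-in for nginx-scan.json; EXAMPLE-0001 is a placeholder ID.
    sample = json.loads("""
    {"Results": [
      {"Target": "nginx:latest (debian 12.2)",
       "Vulnerabilities": [
         {"VulnerabilityID": "CVE-2023-5678", "Severity": "CRITICAL"},
         {"VulnerabilityID": "EXAMPLE-0001", "Severity": "HIGH"}]},
      {"Target": "app/go.sum", "Vulnerabilities": null}
    ]}
    """)
    print(dict(count_by_severity(sample)))  # {'CRITICAL': 1, 'HIGH': 1}
```

For a real report, pass `json.load(open("nginx-scan.json"))` instead of the inline sample.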

Step 2: Dockerfile Security Baseline

Multi-stage build + minimal image + non-root runtime:

# Build stage: full toolchain image
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Static binary (no cgo) so it runs on distroless/static
RUN CGO_ENABLED=0 GOOS=linux go build -a -ldflags '-extldflags "-static"' -o myapp

# Runtime stage: distroless
FROM gcr.io/distroless/static-debian12:nonroot
# Unprivileged user (UID 65532 in the :nonroot variant)
USER nonroot:nonroot
WORKDIR /app
COPY --from=builder --chown=nonroot:nonroot /app/myapp .
# With a read-only root filesystem, mount a writable volume at /tmp (see Step 5)
EXPOSE 8080
ENTRYPOINT ["/app/myapp"]

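Key security practices:

  • Multi-stage builds: separate build-time from runtime dependencies to shrink the attack surface
  • Distroless images: no shell or package manager, which typically cuts the number of reported vulnerabilities substantially (roughly 70% in common comparisons)
  • Non-root user: use a UID ≥ 10000 to limit what an attacker gains after a container escape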

Scan and verify:

# Build the image
docker build -t myapp:secure .

# Scan for comparison
trivy image --severity HIGH,CRITICAL myapp:secure

# Check the configured user
docker inspect myapp:secure | jq '.[0].Config.User'
# Expected output: "nonroot:nonroot"
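With `runAsNonRoot: true`, Kubernetes can only confirm that a container is not root when the UID is numeric; an image whose `USER` is a name (such as `nonroot:nonroot` here) is refused at container start unless the Pod also sets `runAsUser`, as Step 5 does. The sketch below approximates that rule for a pre-deploy check on the `Config.User` value printed above; `verifiable_non_root` is a hypothetical helper, not kubelet code.

```python
from typing import Optional


def verifiable_non_root(image_user: str, run_as_user: Optional[int] = None) -> bool:
    """Approximate whether runAsNonRoot=true would be satisfied for a container."""
    if run_as_user is not None:       # runAsUser in the Pod spec overrides the image USER
        return run_as_user != 0
    uid = image_user.split(":", 1)[0]  # USER may be "uid" or "uid:gid"
    if not uid:                        # empty USER: the image runs as root
        return False
    if not uid.isdigit():              # names like "nonroot" cannot be verified
        return False
    return int(uid) != 0


if __name__ == "__main__":
    for user in ["nonroot:nonroot", "65532:65532", "0", ""]:
        print(repr(user), verifiable_non_root(user))
    print("with runAsUser=10000:", verifiable_non_root("nonroot:nonroot", 10000))
```

Using a numeric `USER` (for example `65532:65532`) in the Dockerfile avoids depending on `runAsUser` being set.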

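Alternative based on Alpine:

FROM alpine:3.19
# Fixed numeric UID/GID ≥ 10000 so Kubernetes runAsNonRoot can verify the user
RUN apk add --no-cache ca-certificates && \
    addgroup -g 10001 -S appgroup && adduser -u 10001 -S -G appgroup appuser

COPY --chown=10001:10001 myapp /app/
USER 10001:10001
ENTRYPOINT ["/app/myapp"]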

Step 3: CI/CD Pipeline Integration (GitLab CI Example)

.gitlab-ci.yml:

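stages:
  - build
  - scan
  - sign
  - deploy

variables:
  IMAGE_NAME: myapp
  IMAGE_TAG: $CI_COMMIT_SHORT_SHA
  REGISTRY: registry.example.com

build:
  stage: build
  image: docker:24
  services:
    - docker:24-dind          # Docker daemon used for the build
  script:
    # Assumes the job is already authenticated to $REGISTRY (docker login)
    - docker build -t $REGISTRY/$IMAGE_NAME:$IMAGE_TAG .
    - docker push $REGISTRY/$IMAGE_NAME:$IMAGE_TAG

security-scan:
  stage: scan
  image:
    name: aquasec/trivy:latest
    entrypoint: [""]          # the image's entrypoint is trivy; GitLab needs a shell
  script:
    # Write the report first so it is kept even when the gate below fails
    - trivy image --severity HIGH,CRITICAL --format json --output scan-report.json $REGISTRY/$IMAGE_NAME:$IMAGE_TAG
    - trivy image --exit-code 1 --severity CRITICAL $REGISTRY/$IMAGE_NAME:$IMAGE_TAG
  artifacts:
    when: always
    paths:
      - scan-report.json      # plain Trivy JSON, not GitLab's container_scanning schema
    expire_in: 30 days
  allow_failure: false        # block the pipeline on CRITICAL findings

sign-image:
  stage: sign
  image:
    name: gcr.io/projectsigstore/cosign:v2.2
    entrypoint: [""]
  script:
    # cosign.key comes from a CI/CD file variable; COSIGN_PASSWORD must be set.
    # --yes skips the interactive transparency-log confirmation in Cosign 2.x
    - cosign sign --yes --key cosign.key $REGISTRY/$IMAGE_NAME:$IMAGE_TAG
  only:
    - main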

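Key settings:

  • --exit-code 1: CRITICAL vulnerabilities block deployment
  • allow_failure: false: the pipeline cannot continue until the findings are fixed
  • Signing runs only on the main branch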
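The gate these flags implement can be spelled out explicitly, which helps when a team wants a rule that flags alone cannot express. This sketch assumes the Trivy JSON fields `Severity` and `FixedVersion` and treats an empty `FixedVersion` as "no fix available", approximating `--ignore-unfixed`; `should_block` and `exit_code` are illustrative names.

```python
def should_block(vulns, block_on=("CRITICAL",), ignore_unfixed=False) -> bool:
    """True if any finding should fail the pipeline (like trivy --exit-code 1)."""
    for v in vulns:
        if ignore_unfixed and not v.get("FixedVersion"):
            continue  # no fixed version published yet
        if v.get("Severity") in block_on:
            return True
    return False


def exit_code(vulns, **kwargs) -> int:
    return 1 if should_block(vulns, **kwargs) else 0


if __name__ == "__main__":
    findings = [
        {"Severity": "CRITICAL", "FixedVersion": ""},     # unfixed upstream
        {"Severity": "HIGH", "FixedVersion": "3.0.11-1"},
    ]
    print(exit_code(findings))                          # 1
    print(exit_code(findings, ignore_unfixed=True))     # 0
```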

Equivalent GitHub Actions step:

- name: Scan with Trivy
  uses: aquasecurity/trivy-action@master   # pin to a release tag in production
  with:
    image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
    severity: 'CRITICAL,HIGH'
    exit-code: '1'

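Step 4: Image Signing and Verification (Cosign)

Generate a signing key pair:

# Generate a Cosign key pair
cosign generate-key-pair

# Produces cosign.key (private key) and cosign.pub (public key)
# Store the private key and its password in GitLab/GitHub Secrets

# Sign the image
cosign sign --key cosign.key registry.example.com/myapp:v1.0

# Verify the signature
cosign verify --key cosign.pub registry.example.com/myapp:v1.0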

Kubernetes admission control (OPA Gatekeeper + Cosign):

# Install Gatekeeper
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/v3.14.0/deploy/gatekeeper.yaml

# Signature-verification policy (conceptual sketch). Gatekeeper's supported way to
# consult an external service is its External Data feature rather than http.send
# in Rego; the Sigstore Policy Controller below is the simpler option.
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: cosignsignedimages
spec:
  crd:
    spec:
      names:
        kind: CosignSignedImages
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package cosignsignedimages
        violation[{"msg": msg}] {
          input.review.object.kind == "Pod"
          image := input.review.object.spec.containers[_].image
          not cosign_verify(image)
          msg := sprintf("Image %v is not signed", [image])
        }
        cosign_verify(image) {
          # Call an external Cosign verification service (hypothetical endpoint)
          response := http.send({
            "method": "GET",
            "url": sprintf("http://cosign-verifier.default.svc/verify?image=%v", [image])
          })
          response.status_code == 200
        }

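Deploy a verification service (simplified example):

# Use Policy Controller (the official Sigstore project)
kubectl apply -f https://github.com/sigstore/policy-controller/releases/download/v0.8.0/release.yaml

# Policy Controller only enforces in namespaces that opt in
kubectl label namespace production policy.sigstore.dev/include=true

# Cluster-wide signature-verification policy
# (sed indents every line of the multi-line PEM key under "data: |")
kubectl apply -f - <<EOF
apiVersion: policy.sigstore.dev/v1beta1
kind: ClusterImagePolicy
metadata:
  name: require-signed-images
spec:
  images:
  - glob: "registry.example.com/**"
  authorities:
  - key:
      data: |
$(sed 's/^/        /' cosign.pub)
EOF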

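Verify that unsigned images are blocked:

# Try to deploy an unsigned image into the opted-in namespace
# (registry.example.com/myapp:unsigned stands for any unsigned image)
kubectl run test -n production --image=registry.example.com/myapp:unsigned

# Expected error (wording varies by version):
# Error from server: admission webhook ... denied the request:
# validation failed: ... no matching signatures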
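Because a tag can be moved to a different image after it was signed, a quick audit is to list every image reference that is not pinned by digest. The sketch assumes references of the form `name@sha256:<64 hex chars>`; it checks pinning only and does not verify signatures, which remains Cosign's job. The helper names are illustrative.

```python
import re

# A digest reference ends in "@sha256:" followed by exactly 64 lowercase hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    return bool(DIGEST_RE.search(image_ref))


def unpinned(images):
    """Return the references that still rely on a mutable tag."""
    return [img for img in images if not is_digest_pinned(img)]


if __name__ == "__main__":
    refs = [
        "registry.example.com/myapp:v1.0",
        "registry.example.com/myapp@sha256:" + "a" * 64,
    ]
    print(unpinned(refs))  # ['registry.example.com/myapp:v1.0']
```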

Step 5: Kubernetes SecurityContext Configuration

Pod security context template:

apiVersion: v1
kind: Pod
metadata:
  name: secure-app
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10000
    fsGroup: 10000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: registry.example.com/myapp:v1.0
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
        add:
        - NET_BIND_SERVICE   # only needed to bind ports below 1024
      runAsNonRoot: true
    volumeMounts:
    - name: tmp
      mountPath: /tmp
    - name: cache
      mountPath: /app/cache
  volumes:
  - name: tmp
    emptyDir: {}
  - name: cache
    emptyDir: {}

Key parameters:

  • readOnlyRootFilesystem: true: prevents tampering with the container filesystem at runtime
  • capabilities.drop: ALL: removes all Linux capabilities
  • seccompProfile: RuntimeDefault: enables the runtime's default seccomp syscall filter

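Verify the configuration (these commands need id, touch and capsh inside the image; distroless images ship none of them, so test with an image that has a shell or use kubectl debug):

# Check the user the Pod runs as
kubectl exec secure-app -- id
# Expected: uid=10000 (also set runAsGroup if you need a specific primary GID)

# Try to write to the root filesystem (should fail)
kubectl exec secure-app -- touch /test.txt
# Expected error: touch: /test.txt: Read-only file system

# Inspect capabilities
kubectl exec secure-app -- capsh --print
# Expected: cap_net_bind_service in the bounding set; for a non-root UID the
# current (effective) set is usually empty, because added capabilities are not
# made effective for non-root processes without file capabilities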
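When `capsh` is unavailable, the same information is exposed as hex bitmasks in the `CapEff`, `CapPrm` and `CapBnd` lines of `/proc/<pid>/status`, readable through `kubectl debug` or from the node. The sketch decodes such a mask using the Linux capability bit numbers (bit 0 is CAP_CHOWN, bit 40 is CAP_CHECKPOINT_RESTORE); `decode_caps` is an illustrative name, and bits newer than the table are reported by number.

```python
# Linux capability names indexed by bit number (include/uapi/linux/capability.h).
CAP_NAMES = [
    "CHOWN", "DAC_OVERRIDE", "DAC_READ_SEARCH", "FOWNER", "FSETID", "KILL",
    "SETGID", "SETUID", "SETPCAP", "LINUX_IMMUTABLE", "NET_BIND_SERVICE",
    "NET_BROADCAST", "NET_ADMIN", "NET_RAW", "IPC_LOCK", "IPC_OWNER",
    "SYS_MODULE", "SYS_RAWIO", "SYS_CHROOT", "SYS_PTRACE", "SYS_PACCT",
    "SYS_ADMIN", "SYS_BOOT", "SYS_NICE", "SYS_RESOURCE", "SYS_TIME",
    "SYS_TTY_CONFIG", "MKNOD", "LEASE", "AUDIT_WRITE", "AUDIT_CONTROL",
    "SETFCAP", "MAC_OVERRIDE", "MAC_ADMIN", "SYSLOG", "WAKE_ALARM",
    "BLOCK_SUSPEND", "AUDIT_READ", "PERFMON", "BPF", "CHECKPOINT_RESTORE",
]


def decode_caps(hex_mask: str) -> list:
    """Decode a CapEff/CapPrm/CapBnd hex mask from /proc/<pid>/status into names."""
    mask = int(hex_mask, 16)
    names = [f"CAP_{name}" for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1]
    # Bits beyond this table (newer kernels) are reported by number, not dropped.
    names += [f"bit{bit}" for bit in range(len(CAP_NAMES), mask.bit_length())
              if (mask >> bit) & 1]
    return names


if __name__ == "__main__":
    status = "CapEff:\t0000000000000000\nCapBnd:\t0000000000000400\n"
    for line in status.splitlines():
        field, value = line.split(":", 1)
        print(field, decode_caps(value.strip()))
    # CapEff []
    # CapBnd ['CAP_NET_BIND_SERVICE']
```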

Step 6: Minimize Exposure with NetworkPolicy

Default-deny policy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

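Fine-grained allow-list policy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-egress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes:
  - Egress
  egress:
  # Allow the internal database
  - to:
    - podSelector:
        matchLabels:
          app: mysql
    ports:
    - protocol: TCP
      port: 3306
  # Allow DNS: both selectors in ONE entry are ANDed (kube-dns pods in kube-system);
  # as separate list items they would be ORed and open far more than DNS
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  # Allow an external API (restricted by CIDR)
  - to:
    - ipBlock:
        cidr: 203.0.113.0/24
    ports:
    - protocol: TCP
      port: 443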
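Whether `namespaceSelector` and `podSelector` sit in the same `to` entry or in separate entries changes the meaning: within one entry they are ANDed, while separate entries are ORed. The sketch below models only `matchLabels` peers (no `matchExpressions`, no `ipBlock`) to make the difference concrete for the DNS rule; pod names and labels are illustrative.

```python
def _matches(selector: dict, labels: dict) -> bool:
    """matchLabels semantics only; an empty selector ({}) matches everything."""
    return all(labels.get(k) == v for k, v in (selector.get("matchLabels") or {}).items())


def peer_selects(peer: dict, policy_ns: str, pod: dict) -> bool:
    """Does one entry of an egress `to:` list select the destination pod?"""
    ns_sel, pod_sel = peer.get("namespaceSelector"), peer.get("podSelector")
    if ns_sel is None and pod_sel is None:
        return False  # ipBlock peers are not modelled here
    if ns_sel is not None:
        ns_ok = _matches(ns_sel, pod["ns_labels"])
    else:
        ns_ok = pod["namespace"] == policy_ns  # podSelector alone: policy's namespace
    pod_ok = _matches(pod_sel, pod["labels"]) if pod_sel is not None else True
    return ns_ok and pod_ok  # selectors inside ONE entry are ANDed


def egress_allowed(to_peers: list, policy_ns: str, pod: dict) -> bool:
    return any(peer_selects(p, policy_ns, pod) for p in to_peers)  # entries are ORed


KUBE_SYSTEM = {"kubernetes.io/metadata.name": "kube-system"}
ns_only = {"namespaceSelector": {"matchLabels": KUBE_SYSTEM}}
dns_pods = {"podSelector": {"matchLabels": {"k8s-app": "kube-dns"}}}
OR_FORM = [ns_only, dns_pods]        # two separate list items
AND_FORM = [{**ns_only, **dns_pods}]  # one item carrying both selectors

metrics = {"namespace": "kube-system", "ns_labels": KUBE_SYSTEM,
           "labels": {"k8s-app": "metrics-server"}}
coredns = {"namespace": "kube-system", "ns_labels": KUBE_SYSTEM,
           "labels": {"k8s-app": "kube-dns"}}

if __name__ == "__main__":
    print(egress_allowed(OR_FORM, "production", metrics))   # True: any kube-system pod
    print(egress_allowed(AND_FORM, "production", metrics))  # False
    print(egress_allowed(AND_FORM, "production", coredns))  # True
```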

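Verify isolation (myapp-pod stands for a Pod labeled app=myapp):

# Test an allowed connection (should succeed)
kubectl exec -n production myapp-pod -- nc -zv mysql-service 3306

# Test a connection that is not allowed (should time out)
kubectl exec -n production myapp-pod -- nc -zv redis-service 6379
# Expected output: Connection timed out

# Confirm the policies are in place
kubectl get netpol -n production
kubectl describe netpol allow-app-egress -n production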

Step 7: Pod Security Admission (Replaces PSP)

Enable PSA (enabled by default in Kubernetes 1.25+):

# Show the current PSA labels
kubectl get ns production -o yaml | grep pod-security

# Configure the policy at namespace level
kubectl label namespace production \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/warn=restricted

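What the Restricted profile enforces (it includes all Baseline checks):

  • No privileged containers (privileged: true)
  • No host network/IPC/PID namespaces
  • No host ports
  • No hostPath volumes; only a restricted set of volume types (configMap, emptyDir, secret, PVC and similar)
  • Containers must run as non-root (runAsNonRoot: true)
  • allowPrivilegeEscalation must be false
  • seccompProfile must be RuntimeDefault or Localhost
  • Capabilities must drop ALL; only NET_BIND_SERVICE may be added

Note: readOnlyRootFilesystem is not part of the Restricted profile; set it explicitly, as in Step 5.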

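Test that the policy blocks violations:

# Try to deploy a privileged container
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: privileged-test
  namespace: production
spec:
  containers:
  - name: test
    image: nginx
    securityContext:
      privileged: true
EOF

# Expected error (abbreviated):
# Error from server (Forbidden): pods "privileged-test" is forbidden:
# violates PodSecurity "restricted:latest": privileged (container "test" must not set securityContext.privileged=true), ...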
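Exempting specific workloads (use with care):

apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
# Exemptions cannot be set with namespace labels; they live in the kube-apiserver
# AdmissionConfiguration (--admission-control-config-file).
# usernames match the user making the request, not the Pod's ServiceAccount.
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1
    kind: PodSecurityConfiguration
    exemptions:
      usernames: ["system:serviceaccount:monitoring:prometheus-sa"]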

Step 8: Runtime Protection (Falco Rules)

Install Falco (Helm):

# Add the Helm repository
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update

# Install Falco (eBPF driver)
helm install falco falcosecurity/falco \
  --namespace falco --create-namespace \
  --set driver.kind=ebpf \
  --set falcosidekick.enabled=true \
  --set falcosidekick.webui.enabled=true

# Verify the DaemonSet Pods are running
kubectl get pods -n falco

Custom rules (/etc/falco/rules.d/custom.yaml; with the Helm chart, supply them through the customRules value):

- rule: Unauthorized Process in Container
  desc: Detect shell or package manager execution in production containers
  condition: >
    spawned_process and
    container and
    (proc.name in (sh, bash, ash, zsh, apt, apt-get, yum, dnf)) and
    container.image.repository != "debug-tools"
  output: >
    Unauthorized process started (user=%user.name command=%proc.cmdline
    container=%container.name image=%container.image.repository)
  priority: WARNING
  tags: [process, mitre_execution]

- rule: Write to Non-Whitelisted Directory
  desc: Detect file writes outside /tmp or /app/cache
  condition: >
    open_write and
    container and
    not fd.directory in (/tmp, /app/cache) and
    not fd.name startswith /proc
  output: >
    File write to unexpected location (file=%fd.name command=%proc.cmdline
    container=%container.name)
  priority: ERROR
  tags: [filesystem, mitre_persistence]

- rule: Outbound Connection to Suspicious IP
  desc: Detect connections to known malicious IPs or unusual ports
  condition: >
    outbound and
    container and
    (fd.sport in (22, 3389, 4444, 6667) or
     fd.snet in (198.51.100.0/24))
  output: >
    Suspicious outbound connection (ip=%fd.sip port=%fd.sport
    command=%proc.cmdline container=%container.name)
  priority: CRITICAL
  tags: [network, mitre_command_and_control]
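The third rule flags a connection when the remote port is on a deny list or the remote address falls inside a blocked range. The sketch below expresses the same predicate with Python's `ipaddress` module, which is handy as a unit test for the ranges before they go into the rule; the port and network lists are copied from the rule, and `is_suspicious` is an illustrative name.

```python
import ipaddress

SUSPICIOUS_PORTS = {22, 3389, 4444, 6667}
SUSPICIOUS_NETS = [ipaddress.ip_network("198.51.100.0/24")]


def is_suspicious(server_ip: str, server_port: int) -> bool:
    """Suspicious remote port OR remote address inside a listed network."""
    ip = ipaddress.ip_address(server_ip)
    return server_port in SUSPICIOUS_PORTS or any(ip in net for net in SUSPICIOUS_NETS)


if __name__ == "__main__":
    for ip, port in [("198.51.100.7", 443), ("203.0.113.10", 4444), ("203.0.113.10", 443)]:
        print(ip, port, is_suspicious(ip, port))
```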

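Verify that the rules fire:

# Trigger the shell-execution alert (needs an image that contains /bin/sh)
kubectl exec -it myapp-pod -- /bin/sh

# Check the Falco logs
kubectl logs -n falco -l app.kubernetes.io/name=falco | grep "Unauthorized process"

# Example output (illustrative; container is the container name, image the repository):
# 16:45:23.456789: Warning Unauthorized process started
# (user=... command=sh container=app image=registry.example.com/myapp)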

Alert routing (Falcosidekick → Slack/PagerDuty):

# Helm values
falcosidekick:
  config:
    slack:
      webhookurl: "https://hooks.slack.com/services/XXX"
      minimumpriority: "warning"
    pagerduty:
      integrationkey: "YOUR_KEY"
      minimumpriority: "error"

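Step 9: Image Registry Security (Harbor)

Enable automatic scanning:

# Configure the project through the Harbor API (change the default admin password in real deployments)
# auto_scan: scan on push; prevent_vul + severity: block pulls of images with High or higher findings
curl -u admin:Harbor12345 -X PUT \
  "https://harbor.example.com/api/v2.0/projects/myproject" \
  -H "Content-Type: application/json" \
  -d '{
    "metadata": {
      "auto_scan": "true",
      "severity": "high",
      "prevent_vul": "true"
    }
  }'

# Verify the configuration
curl -u admin:Harbor12345 \
  "https://harbor.example.com/api/v2.0/projects/myproject" | jq '.metadata'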

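Signature verification:

# Harbor's Notary integration is deprecated; use Cosign instead.
# Harbor webhooks only send notifications: the (hypothetical) cosign-verifier service
# can check signatures after a push and raise alerts, but cannot reject the push.
# Deploy-time enforcement comes from the admission controller (Step 4).
curl -u admin:Harbor12345 -X POST \
  "https://harbor.example.com/api/v2.0/projects/myproject/webhook/policies" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Verify Cosign Signature",
    "targets": [
      {
        "type": "http",
        "address": "http://cosign-verifier.default.svc/webhook",
        "skip_cert_verify": false
      }
    ],
    "event_types": ["PUSH_ARTIFACT"],
    "enabled": true
  }'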

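Blocking pulls of high-risk images (prevent_vul above does the blocking; the CVE allowlist exempts individually assessed CVEs from it):

# System-level CVE allowlist; CVE-2023-1234 has been assessed as acceptable risk
# expires_at 1735689600 = 2025-01-01 00:00:00 UTC (JSON bodies cannot carry comments)
curl -u admin:Harbor12345 -X PUT \
  "https://harbor.example.com/api/v2.0/system/CVEAllowlist" \
  -H "Content-Type: application/json" \
  -d '{
    "items": [
      {"cve_id": "CVE-2023-1234"}
    ],
    "expires_at": 1735689600
  }'

# Pulling an image above the threshold should be rejected
docker pull harbor.example.com/myproject/vulnerable-app:v1.0
# Expected: an error saying the image has vulnerabilities at or above the configured severity (wording varies by Harbor version)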
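Harbor's `prevent_vul` setting combined with the CVE allowlist boils down to a simple rule: refuse the pull if any non-allowlisted finding is at or above the configured severity, and stop honoring the allowlist once it has expired. The sketch approximates that rule for reasoning about policy changes; the severity ordering and the field names `id` and `severity` are assumptions of this example, not Harbor's API schema.

```python
import time

# Harbor severity names from lowest to highest (ordering assumed for this sketch).
LEVELS = ["None", "Unknown", "Negligible", "Low", "Medium", "High", "Critical"]


def pull_allowed(vulns, threshold="High", allowlist=(), expires_at=None, now=None) -> bool:
    """Approximate Harbor's 'prevent vulnerable images' decision for one artifact."""
    now = time.time() if now is None else now
    # An expired allowlist no longer exempts anything.
    active = set(allowlist) if expires_at is None or now < expires_at else set()
    limit = LEVELS.index(threshold)
    for v in vulns:
        if v["id"] in active:
            continue
        if LEVELS.index(v["severity"]) >= limit:
            return False
    return True


if __name__ == "__main__":
    findings = [{"id": "CVE-2023-1234", "severity": "High"}]
    print(pull_allowed(findings))  # False
    print(pull_allowed(findings, allowlist=["CVE-2023-1234"],
                       expires_at=1735689600, now=1735689599))  # True
```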

Step 10: Recurring Scanning and Remediation

Automated scan script (cron job):

#!/bin/bash
# /opt/scripts/scan-running-images.sh

set -euo pipefail

NAMESPACE="production"
REPORT_DIR="/var/log/trivy"
SLACK_WEBHOOK="https://hooks.slack.com/services/XXX"

mkdir -p "$REPORT_DIR"

# Collect every image currently used by Pods in the namespace
IMAGES=$(kubectl get pods -n "$NAMESPACE" -o jsonpath='{.items[*].spec.containers[*].image}' | tr ' ' '\n' | sort -u)

# Scan each image
for IMAGE in $IMAGES; do
    SAFE_NAME=$(echo "$IMAGE" | tr '/:@' '___')
    REPORT="$REPORT_DIR/${SAFE_NAME}_$(date +%Y%m%d).json"

    echo "Scanning $IMAGE..."
    # One failed scan should not abort the whole run under set -e
    if ! trivy image --severity CRITICAL,HIGH --format json --output "$REPORT" "$IMAGE"; then
        echo "Scan failed for $IMAGE, skipping" >&2
        continue
    fi

    # "[]?" tolerates targets whose Vulnerabilities field is null
    CRITICAL=$(jq '[.Results[]?.Vulnerabilities[]? | select(.Severity=="CRITICAL")] | length' "$REPORT")
    HIGH=$(jq '[.Results[]?.Vulnerabilities[]? | select(.Severity=="HIGH")] | length' "$REPORT")

    if [[ $CRITICAL -gt 0 || $HIGH -gt 5 ]]; then
        MESSAGE="⚠️ Image $IMAGE has $CRITICAL CRITICAL and $HIGH HIGH vulnerabilities. Report: $REPORT"
        curl -X POST -H 'Content-type: application/json' \
          --data "{\"text\":\"$MESSAGE\"}" "$SLACK_WEBHOOK"
    fi
done

# Remove reports older than 30 days
find "$REPORT_DIR" -name "*.json" -mtime +30 -delete
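The script builds the Slack payload by string interpolation, which yields invalid JSON if an image name or message ever contains a double quote or backslash. If the alerting step moves into a small helper, serializing with a JSON library avoids that class of bug. The sketch keeps the script's thresholds (any CRITICAL, or more than 5 HIGH); `needs_alert` and `build_alert` are hypothetical helper names.

```python
import json


def needs_alert(critical: int, high: int, max_high: int = 5) -> bool:
    # Same rule as the script: any CRITICAL finding, or more than max_high HIGH findings.
    return critical > 0 or high > max_high


def build_alert(image: str, critical: int, high: int, report_path: str):
    """Return a Slack webhook JSON body, or None when no alert is needed."""
    if not needs_alert(critical, high):
        return None
    text = (f"Image {image} has {critical} CRITICAL and {high} HIGH "
            f"vulnerabilities. Report: {report_path}")
    return json.dumps({"text": text})  # escapes quotes and backslashes correctly


if __name__ == "__main__":
    print(build_alert('registry.example.com/odd"name:1', 1, 0, "/var/log/trivy/x.json"))
```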

Kubernetes CronJob deployment (the script is mounted from a ConfigMap named scan-script that holds the key scan-running-images.sh):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: image-security-scan
  namespace: security
spec:
  schedule: "0 2 * * *"  # daily at 02:00 (controller time zone, usually UTC)
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: image-scanner
          containers:
          - name: scanner
            image: aquasec/trivy:latest
            command: ["/bin/sh", "-c"]
            args:
            - |
              # The Trivy image is Alpine-based: add bash (the script's interpreter) and tools
              apk add --no-cache bash curl jq kubectl
              /opt/scripts/scan-running-images.sh
            volumeMounts:
            - name: scan-script
              mountPath: /opt/scripts
            - name: reports
              mountPath: /var/log/trivy
          restartPolicy: OnFailure
          volumes:
          - name: scan-script
            configMap:
              name: scan-script
              defaultMode: 0755
          - name: reports
            persistentVolumeClaim:
              claimName: scan-reports
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: image-scanner
  namespace: security
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: image-scanner
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: image-scanner
subjects:
- kind: ServiceAccount
  name: image-scanner
  namespace: security
roleRef:
  kind: ClusterRole
  name: image-scanner
  apiGroup: rbac.authorization.k8s.io

5. Monitoring and Alerting

Prometheus metric collection (Falco Exporter):

# ServiceMonitor for Falco
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: falco
  namespace: falco
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: falco
  endpoints:
  - port: metrics
    interval: 30s

Key metrics and alert rules (Falco metric and label names depend on the exporter you run, and kube_pod_security_context_run_as_non_root is not a stock kube-state-metrics metric; adapt the expressions to the series your cluster actually exposes):

# /etc/prometheus/rules/container-security.yaml
groups:
- name: container-security
  interval: 30s
  rules:
  - alert: HighSeverityVulnerabilitiesDetected
    # increase() counts events inside the window; a bare sum() of a counter only grows
    expr: sum(increase(falco_events_total{priority="Critical"}[5m])) > 10
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High severity security events detected"
      description: "{{ $value }} critical Falco events in last 5 minutes"

  - alert: UnauthorizedProcessExecution
    expr: rate(falco_events_total{rule="Unauthorized Process in Container"}[5m]) > 0
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "Shell execution detected in production container"

  - alert: ImagePullFromUntrustedRegistry
    expr: |
      kube_pod_container_info{image!~"registry.example.com/.*"} == 1
    labels:
      severity: warning
    annotations:
      summary: "Pod using image from untrusted registry"
      description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} uses image {{ $labels.image }}"

  - alert: PodRunningAsRoot
    expr: |
      kube_pod_container_status_running{} == 1
      and on(namespace, pod, container)
      kube_pod_container_info{container_id!="", image!~".*debug.*"}
      unless on(namespace, pod, container)
      kube_pod_security_context_run_as_non_root == 1
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Container running as root user"
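`falco_events_total` is a counter: it only grows until the exporter restarts, so comparing a bare `sum()` of it with a threshold stays true once the total has ever crossed it, whereas `increase(...[5m])` measures growth inside the window. The sketch computes a simplified increase over (timestamp, value) samples, treating a drop as a counter reset the way Prometheus does but without PromQL's boundary extrapolation, so its numbers can differ slightly from Prometheus; `simple_increase` is an illustrative name.

```python
def simple_increase(samples):
    """Counter growth across consecutive (timestamp, value) samples, with reset handling."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from 0, so the new value is all growth.
        total += cur - prev if cur >= prev else cur
    return total


if __name__ == "__main__":
    window = [(0, 120.0), (60, 121.0), (120, 121.0), (180, 123.0)]
    print("raw counter:", window[-1][1])        # 123.0: a "> 10" check on this stays true
    print("increase:", simple_increase(window))  # 3.0: below a threshold of 10
```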

Key Grafana dashboard panels:

{
  "panels": [
    {
      "title": "Top 10 Vulnerable Images",
      "targets": [{
        "expr": "topk(10, sum by (image) (trivy_vulnerability_count{severity=\"Critical\"}))"
      }]
    },
    {
      "title": "Falco Security Events",
      "targets": [{
        "expr": "sum(rate(falco_events_total[5m])) by (priority, rule)"
      }]
    },
    {
      "title": "Images Not Pinned by Digest",
      "targets": [{
        "expr": "count(kube_pod_container_info{image!~\".*@sha256:.*\"})"
      }]
    }
  ]
}

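PromQL query examples (trivy_vulnerability_count assumes an exporter that publishes per-image vulnerability counts; the Trivy CLI itself exports no metrics):

# CRITICAL vulnerabilities per namespace
sum by (namespace) (trivy_vulnerability_count{severity="Critical"})

# Falco alert growth over the past 24 hours
increase(falco_events_total[24h])

# Percentage of containers without runAsNonRoot
(count(kube_pod_container_info{})
 - count(kube_pod_security_context_run_as_non_root == 1))
/ count(kube_pod_container_info{}) * 100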

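6. Performance and Capacity

Scan performance baseline:

# Measure Trivy scan time
time trivy image nginx:latest

# Typical results:
# - Small images (Alpine-based): 3-5 s
# - Medium images (Debian/Ubuntu): 10-20 s
# - Large images (>1 GB): 30-60 s

# Concurrent scan test (parallel runs sharing one cache directory may wait on
# the cache lock; for real parallelism consider Trivy's client/server mode)
seq 1 10 | xargs -P 5 -I {} sh -c 'time trivy image nginx:latest > /dev/null'

# Effect of the DB cache (first run vs later runs)
rm -rf ~/.cache/trivy  # clear the cache
time trivy image nginx:latest  # first run: ~15 s (includes the DB download)
time trivy image nginx:latest  # cached: ~3 s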

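Resource consumption baseline:

Scenario | CPU avg | CPU peak | Memory avg | Disk I/O | Network
Trivy scan, medium image | 0.5 core | 1.2 cores | 1.2 GB | 50 MB/s read | 10 MB download
Falco runtime monitoring | 0.1 core | 0.3 core | 200 MB | 5 MB/s write | negligible
Harbor auto-scan (10 concurrent) | 5 cores | 8 cores | 10 GB | 200 MB/s read | 50 MB per image
Cosign signature verification | 0.05 core | 0.1 core | 50 MB | < 1 MB/s | 5 KB per verification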

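Tuning parameters:

# Trivy offline mode (avoids DB download latency)
trivy image --skip-db-update --cache-dir /mnt/fast-ssd/trivy-cache nginx:latest

# Falco dropped-event handling (/etc/falco/falco.yaml)
# threshold is the drop ratio above which the actions fire; it controls
# reporting only and does not by itself reduce Falco's CPU usage
syscall_event_drops:
  threshold: 0.1  # act when more than 10% of events are dropped
  actions:
    - log
    - alert

# Harbor scan concurrency (harbor.yml)
jobservice:
  max_job_workers: 10  # default 10; adjust to the number of CPU cores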

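Capacity planning:

  • Small teams (< 100 images): single-node Trivy + Harbor; 4C/8G is sufficient
  • Mid-size teams (100-500 images): highly available Harbor (2 nodes) plus a dedicated 8C/16G scan node
  • Large teams (> 500 images): Trivy server cluster with a distributed (Redis) cache, 16C/32G per node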

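7. Security and Compliance

Image supply-chain best practices:

# 1. Generate an SBOM (software bill of materials)
trivy image --format cyclonedx --output sbom.json myapp:v1.0

# 2. Store the SBOM in the OCI registry as a signed attestation
cosign attest --key cosign.key --type cyclonedx --predicate sbom.json registry.example.com/myapp:v1.0

# 3. Verify the SBOM attestation
cosign verify-attestation --key cosign.pub \
  --type cyclonedx registry.example.com/myapp:v1.0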

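Compliance checklist:

CIS Docker 4.1: run containers as a non-root user
  Implementation: runAsNonRoot: true
  Verification: kubectl get pod <pod> -o yaml | grep runAsNonRoot
CIS Docker 5.12: read-only root filesystem
  Implementation: readOnlyRootFilesystem: true
  Verification: kubectl exec <pod> -- touch /test (should fail)
NIST SP 800-190: scan images for vulnerabilities before release
  Implementation: Trivy integrated into CI/CD
  Verification: check the GitLab job artifacts
PCI-DSS 6.2: patch high-severity vulnerabilities promptly (within 30 days)
  Implementation: scheduled CronJob scans + automated PRs
  Verification: review scan-report.json
SOC 2: image signing and access auditing
  Implementation: Cosign + Harbor audit logs
  Verification: kubectl logs <harbor-core-pod> | grep pull
GDPR (indirectly): encryption of cluster data at rest
  Implementation: etcd encryption at rest for Secrets
  Verification: read a Secret directly from etcd (etcdctl get /registry/secrets/<namespace>/<name>); the stored value should start with k8s:enc: instead of plain text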

Least-privilege RBAC (only a dedicated ServiceAccount may read the registry pull secret; the kubelet itself pulls with imagePullSecrets, independent of RBAC):

# Only app-sa may read the registry credentials in production
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: image-puller
  namespace: production
rules:
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["registry-credentials"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-image-puller
  namespace: production
subjects:
- kind: ServiceAccount
  name: app-sa
  namespace: production
roleRef:
  kind: Role
  name: image-puller
  apiGroup: rbac.authorization.k8s.io

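Audit logging (Kubernetes API server):

# /etc/kubernetes/audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Pod creation and exec; the Request level records the Pod spec, including its images
- level: Request
  verbs: ["create"]
  resources:
  - group: ""
    resources: ["pods", "pods/exec"]
  omitStages:
  - RequestReceived

# Access to the registry secret. Metadata only: RequestResponse would write the
# secret's contents into the audit log
- level: Metadata
  verbs: ["get", "list", "watch"]
  resources:
  - group: ""
    resources: ["secrets"]
    resourceNames: ["registry-credentials"]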

Enable audit logging:

# kube-apiserver flags (/etc/kubernetes/manifests/kube-apiserver.yaml);
# the policy file and log directory must also be mounted into the static Pod
--audit-policy-file=/etc/kubernetes/audit-policy.yaml
--audit-log-path=/var/log/kubernetes/audit.log
--audit-log-maxage=30
--audit-log-maxbackup=10
--audit-log-maxsize=100

# Query which images were deployed (relies on the Request level for pods)
jq 'select(.verb=="create" and .objectRef.resource=="pods" and .objectRef.subresource==null) | {time: .requestReceivedTimestamp, user: .user.username, image: .requestObject.spec.containers[].image}' /var/log/kubernetes/audit.log

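8. Troubleshooting

Each entry lists the diagnostic command, likely root cause, quick fix and permanent fix.

Trivy scan times out
  Diagnose: trivy image --debug nginx:latest
  Likely cause: restricted network access to the vulnerability DB download
  Quick fix: --skip-db-update to use the cached DB
  Permanent fix: deploy an offline DB mirror
Cosign verification fails
  Diagnose: cosign verify --key cosign.pub <image>
  Likely cause: key mismatch, or the tag now points to a different image
  Quick fix: confirm cosign.pub matches the signing key
  Permanent fix: reference images by digest (sha256) instead of tag
Pod fails to start (blocked by PSA)
  Diagnose: kubectl get events -n <namespace> (look for FailedCreate on the ReplicaSet)
  Likely cause: the spec violates the Restricted profile
  Quick fix: temporarily relax enforce to baseline
  Permanent fix: fix the securityContext in the Deployment
Falco high CPU usage
  Diagnose: top -p $(pgrep falco)
  Likely cause: very high syscall event rate
  Quick fix: disable noisy rules or narrow their conditions
  Permanent fix: use the eBPF driver instead of the kernel module
NetworkPolicy blocks legitimate traffic
  Diagnose: kubectl logs <pod> | grep -iE "timed out|timeout"
  Likely cause: a required Egress rule is missing
  Quick fix: temporarily apply an extra allow policy for the affected Pod (there is no per-Pod exemption label)
  Permanent fix: update the NetworkPolicy allow-list
Harbor scan queue backs up
  Diagnose: curl -u admin:<password> https://harbor.example.com/api/v2.0/jobservice/queues
  Likely cause: insufficient scan concurrency
  Quick fix: trigger scans manually through the Harbor UI or scan API
  Permanent fix: raise max_job_workers and add nodes
Image rejected at admission (signature verification)
  Diagnose: kubectl get events --field-selector reason=FailedCreate
  Likely cause: Policy Controller rejects unsigned images
  Quick fix: temporarily disable the ClusterImagePolicy
  Permanent fix: sign all production images
Trivy reports a vulnerability that has already been fixed
  Diagnose: trivy image --ignore-unfixed <image>
  Likely cause: the CVE is fixed but the local DB is outdated
  Quick fix: trivy image --download-db-only (refresh the DB)
  Permanent fix: after review, add the CVE to .trivyignore
Falco false positives (legitimate processes)
  Diagnose: kubectl logs -n falco <pod> | grep <rule>
  Likely cause: the rule matches too broadly
  Quick fix: temporarily disable the rule (enabled: false)
  Permanent fix: add exceptions by image or namespace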

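Debugging tips:

# 1. Check the Trivy DB version (prints the vulnerability DB version and UpdatedAt)
trivy --version

# 2. Check that the Cosign key pair matches (local test, no transparency log)
echo "test" > /tmp/cosign-test.txt
cosign sign-blob --yes --tlog-upload=false --key cosign.key --output-signature /tmp/cosign-test.sig /tmp/cosign-test.txt
cosign verify-blob --insecure-ignore-tlog=true --key cosign.pub --signature /tmp/cosign-test.sig /tmp/cosign-test.txt

# 3. Test traffic under NetworkPolicy (the label makes the test Pod match the app's policy)
kubectl run test-$RANDOM -n production --labels=app=myapp --image=busybox --rm -it --restart=Never -- \
  nc -zv <target-service> <port>

# 4. Check the Falco kernel module (only when using the kernel-module driver)
lsmod | grep falco
dmesg | grep falco

# 5. Dry-run a Pod Security Admission level (warns about existing Pods that would violate it)
kubectl label --dry-run=server --overwrite ns production pod-security.kubernetes.io/enforce=restricted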

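9. Change and Rollback Playbook

Canary release of an image update:

# Step 1: scan the new version (--exit-code 1 fails on any CRITICAL/HIGH finding)
if trivy image --exit-code 1 --severity CRITICAL,HIGH registry.example.com/myapp:v2.0 > scan-v2.0.txt; then
  echo "✓ No critical vulnerabilities"
else
  echo "✗ Vulnerabilities found, blocking deployment"
  exit 1
fi

# Step 2: sign the new version
cosign sign --key cosign.key registry.example.com/myapp:v2.0

# Step 3: canary (one extra Pod, roughly 10% of traffic with 10 replicas)
kubectl patch deployment myapp -p '{"spec":{"strategy":{"rollingUpdate":{"maxSurge":1,"maxUnavailable":0}}}}'
# Update image and version label together, then pause immediately
# (assumes the Deployment selector matches only app=myapp)
kubectl patch deployment myapp -p '{"spec":{"template":{"metadata":{"labels":{"version":"v2.0"}},"spec":{"containers":[{"name":"app","image":"registry.example.com/myapp:v2.0"}]}}}}'
kubectl rollout pause deployment/myapp

# Wait for the canary Pod
kubectl wait --for=condition=ready pod -l app=myapp,version=v2.0 --timeout=300s

# Step 4: health check (5 minutes)
for i in {1..10}; do
    ERROR_COUNT=$(kubectl logs -l app=myapp,version=v2.0 --tail=100 | grep -c ERROR || true)
    if [[ $ERROR_COUNT -gt 5 ]]; then
        echo "✗ High error rate, rolling back"
        # A paused Deployment cannot be rolled back; resume it first
        kubectl rollout resume deployment/myapp
        kubectl rollout undo deployment/myapp
        exit 1
    fi
    sleep 30
done

# Step 5: full rollout
kubectl rollout resume deployment/myapp
kubectl rollout status deployment/myapp --timeout=600s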

Rollback playbook (runtime exploit detected):

#!/bin/bash
# Emergency rollback script
# Usage: ./rollback.sh [revision]   (defaults to the previous revision)

set -euo pipefail

NAMESPACE="production"
DEPLOYMENT="myapp"
REGISTRY="registry.example.com"
QUARANTINE_TAG="quarantine-$(date +%Y%m%d%H%M)"
TARGET_REVISION="${1:-}"
: "${SLACK_WEBHOOK:?set SLACK_WEBHOOK}"

# 1. Record the current (suspect) image
CURRENT_IMAGE=$(kubectl get deployment "$DEPLOYMENT" -n "$NAMESPACE" -o jsonpath='{.spec.template.spec.containers[0].image}')
echo "Current image: $CURRENT_IMAGE"

# 2. Preserve the suspect image under a quarantine tag for forensics
docker pull "$CURRENT_IMAGE"
docker tag "$CURRENT_IMAGE" "$REGISTRY/$DEPLOYMENT:$QUARANTINE_TAG"
docker push "$REGISTRY/$DEPLOYMENT:$QUARANTINE_TAG"

# 3. Roll back to the given known-good revision, or to the previous one
if [[ -n "$TARGET_REVISION" ]]; then
  kubectl rollout undo deployment/"$DEPLOYMENT" -n "$NAMESPACE" --to-revision="$TARGET_REVISION"
else
  kubectl rollout undo deployment/"$DEPLOYMENT" -n "$NAMESPACE"
fi

# 4. Confirm the rollback completed
kubectl rollout status deployment/"$DEPLOYMENT" -n "$NAMESPACE" --timeout=300s

# 5. Scan the image that was rolled back from
REPORT=/tmp/vulnerable-image-report.json
trivy image --severity CRITICAL,HIGH --format json --output "$REPORT" "$CURRENT_IMAGE"

# 6. Send the alert
curl -X POST -H 'Content-type: application/json' \
  --data "{\"text\":\"⚠️ EMERGENCY ROLLBACK: $DEPLOYMENT rolled back from $CURRENT_IMAGE due to security issue. Report: $REPORT\"}" \
  "$SLACK_WEBHOOK"

Configuration backup and restore:

# Back up workloads (including their securityContext) and all NetworkPolicies
kubectl get deploy,sts,ds -n production -o yaml > /backup/security-configs-$(date +%Y%m%d).yaml
kubectl get networkpolicy -A -o yaml > /backup/netpol-$(date +%Y%m%d).yaml

# Restore
kubectl apply -f /backup/security-configs-20250101.yaml
kubectl apply -f /backup/netpol-20250101.yaml
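`kubectl rollout undo --to-revision` in the rollback playbook needs a revision number, and those numbers are not contiguous: when an older template is rolled back to, it is renumbered as the newest revision and its old number disappears. Picking the second-highest number from `kubectl rollout history` is therefore safer than subtracting one. The sketch assumes the default two-column output (`REVISION  CHANGE-CAUSE`); `previous_revision` is an illustrative name.

```python
def previous_revision(history_output: str):
    """Second-highest REVISION in `kubectl rollout history` output, or None."""
    revisions = []
    for line in history_output.splitlines():
        parts = line.split()
        if parts and parts[0].isdigit():  # skip the header and the resource name line
            revisions.append(int(parts[0]))
    revisions.sort()
    return revisions[-2] if len(revisions) >= 2 else None


if __name__ == "__main__":
    sample = """deployment.apps/myapp
REVISION  CHANGE-CAUSE
3         <none>
5         <none>
6         <none>
"""
    print(previous_revision(sample))  # 5
```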

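10. Best Practices

  1. Image builds: multi-stage builds with distroless/Alpine base images, which typically shrink the vulnerability surface substantially (roughly 70% in common comparisons).
  2. Scan thresholds: block CRITICAL in CI, alert on HIGH, ignore MEDIUM unless compliance requires otherwise.
  3. Signature verification: mandatory in production, optional in development; reference images by digest instead of tag.
  4. Runtime policy: Restricted PSA by default + readOnlyRootFilesystem + non-root user.
  5. Network isolation: default-deny per namespace + fine-grained egress allow-lists.
  6. Scan frequency: every CI/CD build + a daily full scan of production + real-time monitoring of critical services.
  7. Vulnerability response SLA: fix CRITICAL within 24 hours and HIGH within 7 days; assess MEDIUM within 30 days.
  8. False-positive management: a .trivyignore allowlist + Jira tickets that track each assessment.
  9. Least privilege: pull images with a dedicated ServiceAccount + imagePullSecrets; never use the default ServiceAccount.
  10. Audit trail: enable the Kubernetes audit log and Harbor access logs, and retain them for at least 90 days.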
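The response SLA in item 7 translates directly into a due date for each finding. The sketch encodes the three windows from the list (24 hours, 7 days, 30 days) and returns `None` for severities without an SLA; how findings and detection times are stored is left open, and the names are illustrative.

```python
from datetime import datetime, timedelta

# Response windows from best-practice item 7.
SLA = {
    "CRITICAL": timedelta(hours=24),  # fix within 24 hours
    "HIGH": timedelta(days=7),        # fix within 7 days
    "MEDIUM": timedelta(days=30),     # assess within 30 days
}


def due_by(severity: str, detected_at: datetime):
    """Deadline for a finding, or None when no SLA applies (e.g. LOW)."""
    window = SLA.get(severity.upper())
    return detected_at + window if window is not None else None


def overdue(severity: str, detected_at: datetime, now: datetime) -> bool:
    deadline = due_by(severity, detected_at)
    return deadline is not None and now > deadline


if __name__ == "__main__":
    found = datetime(2025, 10, 18, 9, 0)
    for sev in ("CRITICAL", "HIGH", "MEDIUM", "LOW"):
        print(sev, due_by(sev, found))
```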

11. Appendix

Complete Trivy CI/CD template (Jenkins Pipeline):

pipeline {
    agent any
    environment {
        IMAGE_NAME = "myapp"
        IMAGE_TAG = "${env.GIT_COMMIT.take(8)}"
        REGISTRY = "registry.example.com"
    }
    stages {
        stage('Build') {
            steps {
                sh "docker build -t ${REGISTRY}/${IMAGE_NAME}:${IMAGE_TAG} ."
            }
        }
        stage('Security Scan') {
            steps {
                // Scans the locally built image; --exit-code 1 fails the stage on CRITICAL/HIGH
                sh """
                    trivy image \
                      --severity CRITICAL,HIGH \
                      --exit-code 1 \
                      --format json \
                      --output trivy-report.json \
                      ${REGISTRY}/${IMAGE_NAME}:${IMAGE_TAG}
                """
            }
            post {
                always {
                    archiveArtifacts artifacts: 'trivy-report.json', allowEmptyArchive: false
                }
                failure {
                    slackSend(
                        color: 'danger',
                        message: "Security scan failed for ${IMAGE_NAME}:${IMAGE_TAG}. Check artifacts."
                    )
                }
            }
        }
        // Push before signing: Cosign stores the signature next to the image in the registry
        stage('Push to Registry') {
            steps {
                sh "docker push ${REGISTRY}/${IMAGE_NAME}:${IMAGE_TAG}"
            }
        }
        stage('Sign Image') {
            when {
                branch 'main'
            }
            steps {
                withCredentials([file(credentialsId: 'cosign-key', variable: 'COSIGN_KEY')]) {
                    // Single quotes let the shell expand the secret path instead of Groovy;
                    // COSIGN_PASSWORD must also be present in the environment
                    sh 'cosign sign --yes --key "$COSIGN_KEY" "$REGISTRY/$IMAGE_NAME:$IMAGE_TAG"'
                }
            }
        }
    }
}

Complete secure Kubernetes Deployment template:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app
  namespace: production
  labels:
    app: secure-app
    version: v1.0.0
spec:
  replicas: 3
  selector:
    matchLabels:
      app: secure-app
  template:
    metadata:
      labels:
        app: secure-app
        version: v1.0.0
      # Signature enforcement comes from the namespace label
      # policy.sigstore.dev/include=true plus a ClusterImagePolicy (Step 4),
      # not from a Pod annotation
    spec:
      serviceAccountName: secure-app-sa
      automountServiceAccountToken: false
      securityContext:
        runAsNonRoot: true
        runAsUser: 10000
        fsGroup: 10000
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: app
        image: registry.example.com/myapp@sha256:abc123  # pin by digest (placeholder; use the full digest)
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          capabilities:
            drop:
            - ALL
            add:
            - NET_BIND_SERVICE
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: cache
          mountPath: /app/cache
        env:
        - name: APP_ENV
          value: "production"
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: db-password
      volumes:
      - name: tmp
        emptyDir: {}
      - name: cache
        emptyDir: {}
      imagePullSecrets:
      - name: registry-credentials
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: secure-app-sa
  namespace: production
automountServiceAccountToken: false

Complete Prometheus alert rules (trivy_vulnerability_count and kube_pod_container_security_context_privileged come from custom exporters rather than stock Trivy or kube-state-metrics; adjust the names to your exporters):

groups:
- name: container-security-critical
  interval: 30s
  rules:
  - alert: CriticalVulnerabilityInProduction
    expr: |
      trivy_vulnerability_count{severity="Critical", namespace="production"} > 0
    for: 5m
    labels:
      severity: critical
      team: security
    annotations:
      summary: "Critical vulnerability detected in production"
      description: "{{ $labels.image }} has {{ $value }} CRITICAL vulnerabilities"
      runbook: "https://wiki.example.com/security/critical-vuln-response"

  - alert: UnsignedImageRunning
    expr: |
      kube_pod_container_info{image!~".*@sha256:.*", namespace="production"} == 1
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Tag-based (not digest-pinned) image in production"
      description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} uses {{ $labels.image }}"

  - alert: FalcoRuntimeThreat
    expr: |
      rate(falco_events_total{priority="Critical"}[5m]) > 0
    for: 1m
    labels:
      severity: critical
      team: sre
    annotations:
      summary: "Runtime security threat detected by Falco"
      description: "Rule: {{ $labels.rule }} triggered in pod {{ $labels.k8s_pod_name }}"

  - alert: PrivilegedContainerRunning
    expr: |
      kube_pod_container_status_running{} == 1
      and on(namespace, pod, container)
      kube_pod_container_security_context_privileged == 1
    for: 5m
    labels:
      severity: high
    annotations:
      summary: "Privileged container detected"
      description: "Container {{ $labels.container }} in {{ $labels.namespace }}/{{ $labels.pod }}"

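Trivy ignore file (.trivyignore):

# CVEs assessed as acceptable risk (each must be recorded in a JIRA ticket)
CVE-2023-12345  # verified not to affect our usage (JIRA-1234)
CVE-2024-67890  # vendor confirmed false positive (JIRA-5678)

# Temporary exemption (to be fixed by 2025-12-31); exp: makes Trivy report it again afterwards
CVE-2024-11111 exp:2025-12-31  # waiting for an upstream patch (JIRA-9999)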
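Trivy's ignore file accepts an optional `exp:YYYY-MM-DD` field after the ID so that an exemption lapses automatically. To see which entries are still in effect on a given date (for example in a reminder job), the file can be parsed with a few lines of Python. This sketch assumes that the first field is the ID, that `#` starts a full-line comment, and that an entry stops applying from its `exp:` date onward; Trivy's exact handling of the expiry day itself is an assumption here, and `active_ignores` is an illustrative name.

```python
from datetime import date


def active_ignores(text: str, today: date) -> set:
    """IDs from a .trivyignore body that are still in effect on `today` (sketch)."""
    ids = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or full-line comment
        fields = line.split()
        expiry = next((f[len("exp:"):] for f in fields[1:] if f.startswith("exp:")), None)
        # Assumption: the entry no longer applies from its exp: date onward.
        if expiry is not None and today >= date.fromisoformat(expiry):
            continue
        ids.add(fields[0])
    return ids


if __name__ == "__main__":
    body = "CVE-2023-12345  # accepted\nCVE-2024-11111 exp:2025-12-31  # upstream fix pending\n"
    print(sorted(active_ignores(body, date(2025, 10, 18))))
    print(sorted(active_ignores(body, date(2026, 1, 1))))
```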

Harbor bulk-scan script:

#!/bin/bash
# Trigger a scan for every artifact in every Harbor project
set -euo pipefail

HARBOR_URL="https://harbor.example.com"
HARBOR_USER="admin"
HARBOR_PASS="${HARBOR_PASS:?set HARBOR_PASS}"   # avoid hard-coding credentials
API="$HARBOR_URL/api/v2.0"

# List projects (page_size raises the default page size of 10; paginate beyond 100)
PROJECTS=$(curl -s -u "$HARBOR_USER:$HARBOR_PASS" "$API/projects?page_size=100" | jq -r '.[].name')

for PROJECT in $PROJECTS; do
    echo "Scanning project: $PROJECT"

    # Repositories in the project (names come back as "project/repo")
    REPOS=$(curl -s -u "$HARBOR_USER:$HARBOR_PASS" \
      "$API/projects/$PROJECT/repositories?page_size=100" | jq -r '.[].name')

    for REPO in $REPOS; do
        # Strip the project prefix; nested repo paths need "/" double-encoded as %252F
        REPO_PATH="${REPO#*/}"
        REPO_ENC="${REPO_PATH//\//%252F}"

        # Use digests so untagged artifacts are scanned as well
        DIGESTS=$(curl -s -u "$HARBOR_USER:$HARBOR_PASS" \
          "$API/projects/$PROJECT/repositories/$REPO_ENC/artifacts?page_size=100" | jq -r '.[].digest')

        for DIGEST in $DIGESTS; do
            echo "  Triggering scan: ${HARBOR_URL#https://}/$PROJECT/$REPO_PATH@$DIGEST"
            curl -s -u "$HARBOR_USER:$HARBOR_PASS" -X POST \
              "$API/projects/$PROJECT/repositories/$REPO_ENC/artifacts/$DIGEST/scan"
        done
    done
done

echo "All scans triggered. Check Harbor UI for results."

Test environment: all commands in this article were tested in October 2025 on Ubuntu 22.04 + Kubernetes 1.28 + Trivy 0.48.3 + Cosign 2.2.

[Disclaimer] Content sourced from the internet.