Cloud Sandbox Engineering Plan for Code Agents (Claude Agent SDK + Coding Skill)

Original work by the 灵阙 teaching and research team

S · Featured, Advanced | About 8 min read | Updated 2026-01-05

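Goal: provide a secure, reproducible, and observable cloud sandbox for executing and modifying code, so that an agent can read and write a repository, run commands, generate patches, and deliver results. The capability is built on the Claude Agent SDK, with the coding ability packaged as a Skill (SKILL.md).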

Least-privilege allowed_tools · Strongly isolated Runner execution · Skills loaded from the filesystem · Audit + observability · Multi-tenant and evolvable

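1. Key facts and constraints (official highlights)

The Claude Agent SDK provides the same kind of tool calling, agent loop, and context management as Claude Code, and can be used from Python or TypeScript.
Agent Skills exist as filesystem artifacts: each one is defined in .claude/skills/<skill-name>/SKILL.md. The SDK offers no programmatic registration API; Skills must be loaded from the filesystem via setting_sources=["user","project"] (Python) or settingSources (TS), and "Skill" must be enabled in allowed_tools.
The official open-source repositories (Python/TypeScript) can serve as the engineering dependency and the version baseline.
Engineering principle: a production-grade agent needs least privilege, observability, and replayability designed in from the start.

Key point: Skills are loaded from the filesystem, not registered at runtime. In practice, make sure the repo workspace carries .claude/skills and that the SDK options enable it.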
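Because a Skill that is missing from the workspace fails silently (the agent simply never sees it), a preflight check before starting a session can help. The following is a minimal sketch using only the standard library; the function names `find_skills` and `missing_skill_fields` are hypothetical, and the frontmatter parsing is deliberately naive (top-level `key: value` lines only), not a full YAML parser.

```python
from pathlib import Path


def find_skills(workspace: str) -> dict[str, dict[str, str]]:
    """Return {skill_dir_name: frontmatter_fields} for every SKILL.md found."""
    skills: dict[str, dict[str, str]] = {}
    root = Path(workspace) / ".claude" / "skills"
    if not root.is_dir():
        return skills
    for skill_md in sorted(root.glob("*/SKILL.md")):
        lines = skill_md.read_text(encoding="utf-8").splitlines()
        fields: dict[str, str] = {}
        # Frontmatter is the block between the first two '---' lines.
        if lines and lines[0].strip() == "---":
            for line in lines[1:]:
                if line.strip() == "---":
                    break
                # Only top-level "key: value" lines; indented continuations are skipped.
                if ":" in line and not line.startswith((" ", "\t")):
                    key, _, value = line.partition(":")
                    fields[key.strip()] = value.strip()
        skills[skill_md.parent.name] = fields
    return skills


def missing_skill_fields(workspace: str, required=("name", "description")) -> list[str]:
    """List problems that would likely keep a skill from being found or selected."""
    found = find_skills(workspace)
    if not found:
        return ["no .claude/skills/*/SKILL.md in workspace"]
    problems = []
    for skill, fields in found.items():
        for key in required:
            if key not in fields:
                problems.append(f"{skill}: missing '{key}' in frontmatter")
    return problems
```

An orchestrator could call `missing_skill_fields(workspace_path)` right after cloning the repo and fail the job early if the list is non-empty.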

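2. Overall architecture (control plane + execution plane + resource plane)

2.1 Control Plane

API Gateway + Auth (JWT/OAuth, organization/project isolation, RBAC)
Job Orchestrator (task dispatch, queuing, timeout/retry, state machine)
Audit & Policy (who triggered what, which tools were used, which files were changed)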
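The Job Orchestrator's state machine with timeouts and retries can be illustrated with a small sketch. The state names, the transition table, and the `Job` class below are assumptions made for illustration; the original does not prescribe a specific set of states.

```python
from enum import Enum


class JobState(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    TIMED_OUT = "timed_out"


# Allowed transitions; FAILED/TIMED_OUT may go back to QUEUED as a retry.
ALLOWED = {
    JobState.QUEUED: {JobState.RUNNING},
    JobState.RUNNING: {JobState.SUCCEEDED, JobState.FAILED, JobState.TIMED_OUT},
    JobState.FAILED: {JobState.QUEUED},
    JobState.TIMED_OUT: {JobState.QUEUED},
    JobState.SUCCEEDED: set(),
}


class Job:
    def __init__(self, job_id: str, max_retries: int = 2):
        self.job_id = job_id
        self.state = JobState.QUEUED
        self.retries = 0
        self.max_retries = max_retries
        self.history = [JobState.QUEUED]  # kept for audit and replay

    def transition(self, new_state: JobState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        # Re-queuing a finished job counts against the retry budget.
        if new_state is JobState.QUEUED:
            if self.retries >= self.max_retries:
                raise ValueError("retry budget exhausted")
            self.retries += 1
        self.state = new_state
        self.history.append(new_state)
```

Keeping the transition table as data makes illegal moves (for example, QUEUED straight to SUCCEEDED) fail loudly, and the `history` list gives the audit component a per-job trail.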

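2.2 Execution Plane

Agent Service (runs the Claude Agent SDK; handles reasoning and tool orchestration)
Sandbox Runner (the sandboxed executor that runs commands, reads and writes files, and runs tests)
Artifact Store (diffs, logs, test reports, build outputs)

2.3 Resources

Git Provider (GitHub/GitLab/self-hosted Gitea)
Caches: dependency caches (pip/npm/maven), build caches (ccache, bazel cache)
Secrets: Vault/KMS (short-lived tokens, least privilege)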

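3. Choosing a sandbox isolation approach (held to an "untrusted code" standard)

Your scenario usually amounts to running untrusted or semi-trusted code with file operations allowed (and possibly network access), so isolation needs to be stronger than in ordinary CI.

Approach	Advantages	Risk/cost	Suitable stage
Docker + seccomp/AppArmor (fastest)	Mature, low cost, easy to operate	Relatively higher container-escape risk; multi-tenancy needs extreme caution	MVP / single tenant
gVisor (runsc, user-space kernel)	Stronger isolation than plain Docker; integrates easily with K8s	Compatibility and performance may suffer	Multi-tenant transition
Kata / Firecracker MicroVM (strong isolation)	Close to VM-level isolation; suits multi-tenancy and untrusted execution	Images, startup, caching, and scheduling are more complex	Production multi-tenant

Recommended path: for the MVP, use Docker with strong constraints (rootless, read-only root FS, no privileged mode, strict resource limits, network off by default), then migrate step by step to gVisor or Kata/Firecracker.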
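The MVP constraints listed above map onto standard `docker run` flags. The helper below is a sketch that only builds the argument list; the function name, default limits, and the `/workspace` mount target are assumptions. Note that rootless mode itself is a property of how the Docker daemon is installed, not a per-container flag; `--user` here only ensures the process inside the container is not root.

```python
def docker_run_argv(
    image: str,
    workspace: str,
    command: list[str],
    *,
    memory: str = "2g",
    cpus: str = "2",
    pids_limit: int = 256,
    uid_gid: str = "1000:1000",
) -> list[str]:
    """Build a locked-down `docker run` command line for one sandboxed command."""
    return [
        "docker", "run", "--rm",
        "--read-only",                                  # read-only root filesystem
        "--tmpfs", "/tmp:rw,noexec,nosuid,size=256m",   # small scratch space
        "--cap-drop=ALL",                               # drop all Linux capabilities
        "--security-opt", "no-new-privileges",          # block setuid escalation
        "--network", "none",                            # network off by default
        "--user", uid_gid,                              # non-root inside the container
        "--memory", memory,
        "--cpus", cpus,
        "--pids-limit", str(pids_limit),
        "--mount", f"type=bind,source={workspace},target=/workspace",
        "--workdir", "/workspace",
        image,
        *command,
    ]
```

For example, `docker_run_argv("python:3.12-slim", "/workspaces/job-123", ["pytest", "-q"])` yields a list that can be passed to `subprocess.run` without a shell, which also avoids quoting problems in the command itself.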

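4. Execution model: decoupling the tool-calling Agent from the Sandbox

4.1 Why decouple

The Agent Service is responsible for thinking and deciding: choosing tools, planning changes, generating patches.
The Sandbox Runner is responsible for execution and isolation: Bash, file reads and writes, test runs.
Benefits: untrusted execution converges on a single security boundary, auditing is more complete, and scaling is simpler.

4.2 Recommended shape for tool calls

On the SDK side, keep the tool semantics (Read/Write/Bash/Skill).
In deployment, map tool execution onto the Runner API (the Agent Service never runs commands directly on the host).

Strong recommendation: the Runner layer must enforce a path allowlist, size caps, timeouts, and resource limits. Do not rely on Prompt/Skill constraints alone.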
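The path allowlist and size cap can be enforced in a few lines on the Runner side. This is a sketch under assumptions: `PolicyError`, `resolve_in_workspace`, `read_file`, and the 1 MB default are illustrative names and values, and the "allowlist" here is simply "anything inside the job's workspace".

```python
from pathlib import Path

MAX_READ_BYTES = 1_000_000  # assumed cap; tune per deployment


class PolicyError(Exception):
    """Raised when a request violates Runner policy."""


def resolve_in_workspace(workspace: str, requested: str) -> Path:
    """Resolve `requested` and refuse anything that lands outside the workspace."""
    root = Path(workspace).resolve()
    # resolve() follows symlinks and collapses '..', so escapes become visible.
    target = (root / requested).resolve()
    if target != root and root not in target.parents:
        raise PolicyError(f"path escapes workspace: {requested}")
    return target


def read_file(workspace: str, requested: str, max_bytes: int = MAX_READ_BYTES) -> bytes:
    """Read a workspace file, enforcing the path check and a size cap."""
    target = resolve_in_workspace(workspace, requested)
    if target.stat().st_size > max_bytes:
        raise PolicyError(f"file larger than {max_bytes} bytes: {requested}")
    return target.read_bytes()
```

Both relative escapes (`../etc/passwd`) and absolute paths (`/etc/passwd`) are rejected, because joining an absolute path onto `root` replaces it. A check-then-read like this still leaves a small race window if the workspace is being modified concurrently, which is one more reason to keep the filesystem-level restrictions in the sandbox itself.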

5. Packaging the "Coding Skill" (modularizing coding ability with SKILL.md)

5.1 Directory layout (the skill lives in the same repo as the code)

repo/
  .claude/
    skills/
      coding/
        SKILL.md
        templates/
        scripts/
  src/...

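5.2 What SKILL.md is responsible for

State when this skill should be invoked (description)
Lay out a strict workflow: how to change code, how to test it, how to output the patch
Declare the safety boundaries: no leaking of secrets, no external network by default, and so on (the policy that actually applies is whatever the runner enforces)

5.3 Example: .claude/skills/coding/SKILL.md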

---
name: coding
description: |
  When the user asks to implement, refactor, debug, or test code in this repository,
  use this skill to plan changes, modify files safely, run tests, and return a patch/diff.
---

## Operating Rules
- Always read relevant files before editing.
- Prefer minimal, reversible changes.
- After changes, run the smallest relevant test command.
- Produce final output as:
  1) Summary
  2) Commands executed + results
  3) Git diff / patch
  4) Follow-ups / risks

## Allowed Workflow
1) Inspect repo structure and locate entry points.
2) Apply edits with clear rationale.
3) Run format/lint if available.
4) Run unit/integration tests if available.
5) If tests fail, revert or iterate.

## Safety
- Do not exfiltrate secrets or print environment variables.
- Do not access the network unless explicitly allowed by policy.
- Never run destructive commands outside the workspace.

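6. Deploying the Agent Service (Claude Agent SDK)

6.1 Key configuration points

cwd points at the workspace (the mounted repo)
setting_sources=["project","user"] loads .claude/skills
allowed_tools opened as narrowly as possible (Skill + Read/Write/Bash, plus only what you really need)
Tool adapter layer: forwards Read/Write/Bash and similar requests to the Runner API

6.2 Python skeleton (pseudocode-level, showing the integration points)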

import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions

async def run_agent(prompt: str, workspace_path: str) -> None:
    """Run one agent session against a mounted workspace and stream its messages."""
    options = ClaudeAgentOptions(
        cwd=workspace_path,
        setting_sources=["project"],  # load .claude/skills from repo
        allowed_tools=["Skill", "Read", "Write", "Bash"],  # minimize in production
    )

    async for msg in query(prompt=prompt, options=options):
        # Stream messages; tool calls should be routed to your runner
        print(msg)

if __name__ == "__main__":
    asyncio.run(run_agent("Fix failing tests in this repo", "/workspaces/job-123"))

In practice: do not let the Agent Service run bash directly on the host; route everything through the Runner for isolation and auditing.

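7. Sandbox Runner design (security, resources, observability)

7.1 Runner external API (suggested)

POST /jobs/{id}/exec: run a command (fixed cwd, timeout, resource limits)
POST /jobs/{id}/read: read a file (path allowlist, size cap)
POST /jobs/{id}/write: write a file (path allowlist, audit record)
POST /jobs/{id}/diff: generate a git diff
POST /jobs/{id}/artifacts: upload test reports and logs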
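One way to picture the tool adapter is as a pure function from an SDK tool call to one of the Runner endpoints listed above. Everything in this sketch is an assumption for illustration: the function name `to_runner_request`, the payload keys (`command`, `timeout_s`, `path`, `content`), and the tool-input field names (`command`, `file_path`, `content`), which should be checked against the SDK version you pin.

```python
def to_runner_request(job_id: str, tool_name: str, tool_input: dict) -> tuple[str, dict]:
    """Map one tool call to (Runner URL path, JSON payload). All requests are POSTs."""
    if tool_name == "Bash":
        # The Runner, not the agent, decides the effective timeout ceiling.
        return f"/jobs/{job_id}/exec", {
            "command": tool_input["command"],
            "timeout_s": 120,
        }
    if tool_name == "Read":
        return f"/jobs/{job_id}/read", {"path": tool_input["file_path"]}
    if tool_name == "Write":
        return f"/jobs/{job_id}/write", {
            "path": tool_input["file_path"],
            "content": tool_input["content"],
        }
    # Anything not explicitly mapped is refused rather than passed through.
    raise ValueError(f"tool not routed to runner: {tool_name}")
```

Refusing unmapped tools by default keeps the adapter consistent with the least-privilege allowed_tools list: adding a new tool requires a deliberate change in both places.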

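7.2 Runner internal security policy (mandatory)

Resource limits: CPU, memory, disk, process count, open file count
Timeouts: a per-command timeout plus a total duration cap per job
Filesystem: the workspace is writable; everything else is read-only; never mount sensitive host directories
Privileges: rootless; drop capabilities; no privileged containers
Network: no external network by default; if dependencies must be downloaded, use a proxy with a domain allowlist and traffic auditing
Audit: command, exit code, stdout/stderr (truncated), duration, trace id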

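8. Data flow and sequence diagram (from task to patch)

Below is a Mermaid code block (it displays as plain text in offline environments; to render it, include Mermaid as described in the comment at the top of the page).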

sequenceDiagram
  participant U as User
  participant API as Control API
  participant OR as Orchestrator
  participant AG as Agent Service (Claude Agent SDK)
  participant RN as Sandbox Runner
  participant ST as Artifact Store

  U->>API: Create coding task (repo + prompt)
  API->>OR: enqueue(job)
  OR->>RN: provision sandbox (workspace mount)
  OR->>AG: start agent session (cwd=workspace, skills=project)
  AG->>RN: Read/Write/Bash tool calls (via adapter)
  RN-->>AG: tool results (stdout/stderr/files)
  AG->>RN: request git diff + test report
  RN->>ST: upload logs/artifacts/diff
  OR-->>API: job completed (links)
  API-->>U: summary + diff + artifacts

9. Project layout and deployment shape (K8s reference implementation)

9.1 Repo (infrastructure + services)

infra/
  terraform/
  helm/
services/
  control-api/
  orchestrator/
  agent-service/
  sandbox-runner/
shared/
  proto-or-openapi/
  policy/
  observability/

9.2 K8s deployment suggestions

agent-service: Deployment (stateless, scales horizontally)
sandbox-runner: DaemonSet or a dedicated NodePool (close to the isolation runtime)
orchestrator: Deployment + queue (Redis/NATS/Kafka)
artifact store: S3/MinIO
ingress: API Gateway

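10. Observability and evaluation (being able to "see" the agent after launch)

10.1 Required logs

Every conversation turn: prompt and model output (mind the redaction)
Every tool call: tool name, parameter summary, duration, result summary
File changes: list of changed files and the diff (with a configurable retention policy)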
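A per-tool-call log record with parameter summaries and redaction could be produced along these lines. The key-name heuristic, the 200-character summary limit, and the record fields are assumptions for this sketch; a real deployment would likely add pattern-based secret scanning on values as well.

```python
import json

SENSITIVE_KEYS = ("token", "secret", "password", "api_key", "authorization")


def summarize(text: str, limit: int = 200) -> str:
    """Keep log lines bounded; long values are cut and marked."""
    return text if len(text) <= limit else text[:limit] + "..."


def redact(params: dict) -> dict:
    """Mask values whose key looks sensitive; shorten long strings."""
    clean = {}
    for key, value in params.items():
        if any(marker in key.lower() for marker in SENSITIVE_KEYS):
            clean[key] = "[REDACTED]"
        elif isinstance(value, dict):
            clean[key] = redact(value)
        elif isinstance(value, str):
            clean[key] = summarize(value)
        else:
            clean[key] = value
    return clean


def tool_call_record(trace_id: str, tool: str, params: dict,
                     result: str, duration_ms: int) -> str:
    """Serialize one tool call as a single JSON log line."""
    return json.dumps({
        "trace_id": trace_id,
        "tool": tool,
        "params": redact(params),
        "result_summary": summarize(result),
        "duration_ms": duration_ms,
    }, ensure_ascii=False)
```

Emitting one JSON object per line keeps the records easy to ship to whatever log backend is in use and to join on `trace_id` later.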

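10.2 Quality evaluation

Offline replay: a fixed repo plus a fixed task set (regression)
Metrics: task success rate (tests/build pass), iteration count (tool calls), human intervention rate, duration and cost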
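The metrics above reduce to simple aggregates over per-job results from a replay run. The `JobResult` fields and the `summarize_runs` function are hypothetical names chosen for this sketch; the original only names the metrics, not a schema.

```python
from dataclasses import dataclass


@dataclass
class JobResult:
    tests_passed: bool       # did tests/build pass at the end of the job
    tool_calls: int          # proxy for iteration count
    human_intervened: bool   # a person had to step in
    duration_s: float
    cost_usd: float


def summarize_runs(results: list[JobResult]) -> dict[str, float]:
    """Aggregate one regression run into the headline metrics."""
    n = len(results)
    if n == 0:
        raise ValueError("no results to summarize")
    return {
        "success_rate": sum(r.tests_passed for r in results) / n,
        "avg_tool_calls": sum(r.tool_calls for r in results) / n,
        "human_intervention_rate": sum(r.human_intervened for r in results) / n,
        "avg_duration_s": sum(r.duration_s for r in results) / n,
        "total_cost_usd": sum(r.cost_usd for r in results),
    }
```

Running the same fixed task set before and after a change to the Skill, prompt, or SDK version and comparing these dictionaries gives a basic regression signal.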

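11. Rollout milestones (split by deliverable)

Phase 1: MVP (single tenant / trusted repos)

Docker runner + strict resource limits
Agent Service wired up to the Claude Agent SDK
Coding skill working: can read and write files, run tests, and produce a diff

Phase 2: Multi-tenancy and security hardening

Switch the runner to gVisor or Kata (stronger isolation)
Network allowlist and dependency proxy
Full auditing, quotas, billing/rate limiting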
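For the network allowlist in Phase 2, the core check a dependency proxy performs can be sketched as follows. The host list is an example only, and `is_egress_allowed` is a hypothetical name. The check uses exact host matching on purpose, since suffix or substring matching would let a host like `pypi.org.evil.example` through.

```python
from urllib.parse import urlsplit

# Example allowlist only; replace with the registries your builds really need.
ALLOWED_HOSTS = frozenset({"pypi.org", "files.pythonhosted.org", "registry.npmjs.org"})


def is_egress_allowed(url: str, allowed: frozenset = ALLOWED_HOSTS) -> bool:
    """Allow only HTTPS requests whose host exactly matches the allowlist."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # hostname is lowercased by urlsplit and excludes port and credentials.
    host = (parts.hostname or "").rstrip(".")
    return host in allowed
```

In a real deployment this decision belongs in the proxy or network policy outside the sandbox, so that code running inside the sandbox cannot bypass it; every allow and deny decision is also a natural audit event.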

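Phase 3: Productization

Web UI: view sessions, logs, and diffs; apply a patch with one click
PR automation: optional (the control plane commits on the user's behalf)
Enterprise: SSO, organization policies, private networking, image registry integration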

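12. Minimum viable configuration checklist

[ ] .claude/skills/coding/SKILL.md (per the template above)
[ ] Agent Service: enable setting_sources=["project"] + allowed_tools=["Skill","Read","Write","Bash"]
[ ] Runner: rootless + read-only root fs + resource limits + timeouts + network blocked by default
[ ] Artifacts: diff + execution logs + test reports (stored in one place)
[ ] Audit: the tool-call chain is traceable (trace id)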

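References (suggested for the end of your internal docs/README)

Claude Agent SDK overview (SDK capabilities and positioning)
How Skills are used in the SDK (SKILL.md, setting_sources, allowed_tools constraints)
Claude Agent SDK (Python/TypeScript) open-source repositories
Anthropic engineering article: principles and best-practice directions for building agents

Note: when adopting this internally, replace the references above with links or mirror addresses that your compliance rules allow, and pin versions (tag/commit) for reproducibility.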