AI Video Agent: A Product-Grade White Paper

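SOTA Video Agent Blueprint: a replayable production system that takes a one-sentence request to a cinema-grade final cut. Its core combines a multi-agent film crew, three bibles with anchor-based consistency, No-Rollback versioning, a QC attribution-and-repair loop, and budget-adaptive scheduling.
Core: Planner Orchestrator
Pipeline: Brief → Story → Shot → Assets → Edit → QC → Publish
Deterministic: Remotion/Code for Text & UI
Artifacts: Contract-First JSON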

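0. Abstract

This white paper proposes a product architecture for an AI video agent (Video Agent) aimed at a "cinema-grade experience". The system does not target "video generation by a single model"; its core is controllable production. It turns user intent into an executable film-production pipeline, and relies on a multi-agent film crew with clear division of labor, contract-based tool calls, immutable asset versioning (No-Rollback), and a quality loop (QC → attribution → automatic repair) to deliver a SOTA experience that is highly consistent, highly stable, iterable, extensible, and replayable.

Core conclusion: real SOTA comes from systems engineering; the "production system" matters more than the "generation model". What you deliver is an explainable, repairable, reproducible pipeline for finished videos.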

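1. Experience Definition and Success Metrics

1.1 Experience definition: users want "delivery", not "generation"

  • Behaves like a production team: it can plan, explain, repair, and reproduce
  • Stable delivery: failures recover automatically, without a human rewriting prompts over and over
  • Controllable and tunable: explicit trade-offs among quality, cost, and speed
  • Reusable: characters, brands, and styles become long-lived assets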

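1.2 Core metrics (must be measurable)

  • Semantic alignment: script key-point coverage; intent hit rate
  • Consistency: drift rate for character, world, color grading, and brand
  • Stability: first-pass success rate, average retries, automatic repair rate
  • Controllability: style, pacing, aspect ratio, and subtitle language take effect reliably
  • Cost efficiency: cost per minute, cache reuse rate
  • Latency: time to first preview, time to final delivery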

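2. Product Form and User Journey

2.1 Three modes (one foundation, different levels of exposure)

  • One-Click (default): one sentence → a draft plus 3 style candidates within 30 seconds
  • Pro (controllable): shows the "director's plan": script, shotlist, style bible, and a budget slider (fast / stable / expensive)
  • Studio (orchestrable): editable shotlist, template system, asset-library reuse, brand packs, multilingual voiceover, batch generation

2.2 The "director's plan" is the key to trust

Before every generation the system outputs: Plan (how it will be done) + Trade-off (quality/cost/speed) + Failover (how failures are repaired automatically).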

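3. End-to-End Pipeline: from Brief to Publish

3.1 Overview: production, not generation

Brief → Story → Shot → Assets → Edit → QC → Publish

3.2 Artifact list (Artifacts)

  • brief.json: user intent and constraints (duration, aspect ratio, audience, taboos, brand)
  • script.md / script.json: narration, dialogue, information density, emotional arc
  • style_bible.json: style bible (color, camera language, lighting, materials, fonts)
  • character_bible.json: character bible (locked attributes, outfits, motion templates)
  • world_bible.json: world bible (scene rules, props, atmosphere)
  • shotlist.json: shotlist (an executable spec for each shot)
  • edit_manifest.json: edit manifest (equivalent to an EDL / Remotion props)
  • qc_report.json: quality report (scores, attribution, repair suggestions)
  • final.mp4: final cut, plus final_meta.json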

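4. Multi-Agent "Film Crew" Architecture

4.1 Roles (every agent outputs an evaluable artifact)

  • Producer: budget, latency, and risk; concurrency and degradation strategy
  • Director: narrative and camera language; overall style control
  • Scriptwriter: script, narration, emotional arc, information density
  • Storyboarder: script → shotlist (structured JSON)
  • Art Director: the three bibles (Style/Character/World) and the anchor strategy
  • Runner (shot execution): per-shot model calls that generate clips
  • Editor: pacing, transitions, subtitle templates, beat alignment
  • QC Inspector: scoring, attribution, triggering automatic repair and retries

4.2 Collaboration principles

  • Contract-First: all agents align through JSON contracts, not by "guessing"
  • Deterministic Assembly: subtitles, graphics, and UI are rendered by code, avoiding blurry or jittery AI-generated text
  • No-Rollback: a failure never overwrites old assets; it only adds a new version, which keeps everything traceable and reproducible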

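5. Consistency System: Anchor + Bible Set the Ceiling

5.1 The three bibles (Bibles)

  • Style Bible: color, contrast, grain, camera language, lighting rules, fonts and typography, safe area
  • Character Bible: facial features, hairstyle, outfit sets, posture, expression range, locked attributes
  • World Bible: scene assets, prop list, period style, physical rules, ambient atmosphere (rain/fog/dust)

5.2 Anchor (Anchors) layers

  • Character anchor: three-view turnaround + keyframes for key expressions and motions
  • Scene anchor: scene look-setting image + LUT / grading rules
  • Prop anchor: locks the appearance of key props
  • Voice anchor (optional): narration timbre and emotional baseline

Consistency first: anchor (Anchor) → then drive (Drive) → then assemble (Assemble).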

6. Shotlist Specification: the System's Core Executable Document

6.1 Minimal shotlist fields (example)

{
  "shot_id": "S03",
  "duration_sec": 4.0,
  "type": "character|world|vfx|ui",
  "intent": "convey a key selling point / emotional turn / information point",
  "prompt": {
    "visual": "...",
    "motion": "...",
    "camera": "35mm, dolly-in, shallow DOF",
    "lighting": "soft key, warm rim",
    "style_refs": ["style_bible:v1"],
    "anchors": ["char_anchor:v2", "scene_anchor:v1"]
  },
  "audio": {
    "voiceover": "narration text",
    "sfx": ["whoosh_soft"],
    "music_cue": "beat@12.5"
  },
  "subtitle": {
    "text": "subtitle text",
    "template": "kinetic_01",
    "safe_area": true
  },
  "quality_target": {
    "min_score": 0.82,
    "critical": ["identity", "readability"]
  }
}
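
Because the shot spec is a contract between agents, a cheap structural check before generation is one way to catch malformed shots early. The following is a minimal sketch in Python, assuming every top-level field of the example above is required and that scores lie in [0, 1]; `REQUIRED_FIELDS`, `SHOT_TYPES`, and `validate_shot` are hypothetical names, not part of any specified API.

```python
# Top-level fields copied from the example shot above. Treating all of them
# as required is an assumption of this sketch, not a rule from the spec.
REQUIRED_FIELDS = {
    "shot_id": str,
    "duration_sec": (int, float),
    "type": str,
    "intent": str,
    "prompt": dict,
    "audio": dict,
    "subtitle": dict,
    "quality_target": dict,
}
SHOT_TYPES = {"character", "world", "vfx", "ui"}


def validate_shot(shot: dict) -> list:
    """Return a list of contract violations (empty when the shot is valid)."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in shot:
            errors.append(f"missing field: {name}")
        elif not isinstance(shot[name], expected):
            errors.append(f"wrong type for field: {name}")
    if isinstance(shot.get("type"), str) and shot["type"] not in SHOT_TYPES:
        errors.append("type must be one of: " + ", ".join(sorted(SHOT_TYPES)))
    duration = shot.get("duration_sec")
    if isinstance(duration, (int, float)) and duration <= 0:
        errors.append("duration_sec must be positive")
    target = shot.get("quality_target")
    if isinstance(target, dict):
        score = target.get("min_score")
        # Assumption: scores are normalized to the range [0, 1].
        if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
            errors.append("quality_target.min_score must be in [0, 1]")
    return errors
```

A Storyboarder output would pass through `validate_shot` before the Runner is invoked; any non-empty result is returned to the producing agent instead of being "guessed" around.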

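6.2 Shot grading (cost/quality adaptive)

  • Grade S: hero shots (product core / character close-ups) → stronger model + more samples + stricter QC
  • Grade A: shots that advance the narrative → balanced cost
  • Grade B: transitions / atmosphere / B-roll → cheaper model or stock-library reuse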

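7. Post-Production and Assembly: Remotion as the Deterministic Render Engine

7.1 Why subtitles and graphics must be rendered by code

  • AI-generated text tends to be blurry and jittery, contains typos, and has uncontrollable layout
  • Engineered rendering guarantees sharpness, readability, safe-area compliance, and brand consistency
  • Remotion/FFmpeg handle final compositing, loudness normalization, and multi-bitrate packaging

7.2 Edit manifest (Edit Manifest)

The in/out points of every clip, transitions, subtitle timeline, UI overlays, and BGM alignment points are written to edit_manifest.json, from which the render can be reproduced in one step.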
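
To make the manifest idea concrete, here is a minimal Python sketch that lays accepted clips end to end and emits timeline entries with the `asset`, `start_frame`, and `duration_frames` keys used by edit_manifest.json in the appendix. It assumes hard cuts only (no overlapping transitions); `build_timeline` and the `asset`/`duration_sec` input keys are illustrative choices, not a defined interface.

```python
def build_timeline(shots, fps=30):
    """Lay clips end to end and return edit-manifest timeline entries.

    Each input shot needs an "asset" path and a "duration_sec" value.
    Overlapping transitions are ignored in this sketch (hard cuts only).
    """
    timeline = []
    cursor = 0  # first free frame on the timeline
    for shot in shots:
        frames = round(shot["duration_sec"] * fps)
        timeline.append({
            "asset": shot["asset"],
            "start_frame": cursor,
            "duration_frames": frames,
        })
        cursor += frames
    return timeline


shots = [
    {"asset": "assets/clips/S01_take2.mp4", "duration_sec": 4.0},
    {"asset": "assets/clips/S02_take1.mp4", "duration_sec": 2.5},
]
timeline = build_timeline(shots, fps=30)
```

Because the frame positions are derived purely from the shot durations and the fps, rerunning the function on the same inputs yields the same timeline, which is what makes the render reproducible.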

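8. Quality Loop (QC): the Source of SOTA Stability

8.1 Quality dimensions (at least 8 recommended)

  • Semantic alignment: whether the shot conveys the script's key points
  • Character consistency: drift in face, hairstyle, outfit, or body shape
  • Visual stability: flicker, warping, ghosting
  • Motion plausibility: physics, pose, lip sync
  • Subtitle readability: occlusion, safe area, line breaks
  • Pacing: average shot length, pauses, information density
  • Audio-visual consistency: narration match, beat alignment
  • Brand consistency: color, font, and logo rules

8.2 QC output (qc_report.json)

  • Per-shot scores + overall score
  • Failure attribution labels, such as identity_drift / flicker / subtitle_occlusion / off_brief
  • Repair suggestions: automatically generated repair strategies and retry parameters

8.3 Failure attribution → automatic repair matrix (core)

Failure type | Typical symptom | Automatic repair actions
identity_drift | Character's face drifts | Resample from the character anchor; raise the anchor weight; constrain outfit and hairstyle
flicker/warp | Flicker or warping | Change parameters; shorten the shot; switch to B-roll; deflicker in post
off_brief | Does not match the intent | Rewrite the shot's intent and prompt; change the shot type
subtitle_occlusion | Subtitle covers the subject | Remotion template repositions the subtitle automatically and avoids the subject
pacing_bad | Poor pacing | Trim, reorder, or add B-roll; align to music beats
audio_mismatch | Audio and picture do not match | Rewrite the narration or replace the shot; redo the beat alignment

Without automatic attribution and repair, there is no stable delivery.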
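
One straightforward way to implement the matrix is a lookup table from attribution label to an ordered list of repair actions. The sketch below is illustrative only: the action names are hypothetical identifiers that paraphrase the matrix rows, and returning unknown labels separately (rather than failing) is an assumption of this example.

```python
# Action names are hypothetical identifiers paraphrasing the matrix rows.
REPAIR_MATRIX = {
    "identity_drift": ["resample_from_character_anchor", "raise_anchor_weight",
                       "constrain_outfit_and_hair"],
    "flicker": ["change_params", "shorten_shot", "switch_to_broll",
                "deflicker_postprocess"],
    "off_brief": ["rewrite_intent_and_prompt", "change_shot_type"],
    "subtitle_occlusion": ["reposition_subtitle_avoiding_subject"],
    "pacing_bad": ["trim_or_reorder", "add_broll", "align_to_music_beats"],
    "audio_mismatch": ["rewrite_voiceover_or_replace_shot", "realign_beats"],
}
# The matrix treats flicker and warp as one row, so they share the actions.
REPAIR_MATRIX["warp"] = REPAIR_MATRIX["flicker"]


def plan_repairs(issues):
    """Split QC issue labels into (repair plan, labels with no known repair)."""
    plan, unknown = {}, []
    for label in issues:
        if label in REPAIR_MATRIX:
            plan[label] = list(REPAIR_MATRIX[label])  # copy, callers may edit
        else:
            unknown.append(label)
    return plan, unknown
```

Labels that land in the `unknown` list are the cases where the scheduler would have to fall back to explicit degradation, since no automatic repair is defined for them.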

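9. No-Rollback Immutable Versioning: Replayable and Auditable

9.1 Versioning principles

  • All assets are immutable: write to artifacts/{job_id}/v{n}/ (n is zero-padded in the layouts in this document, e.g. v0001/)
  • A failure only adds a new version; it never overwrites old files
  • Any final cut must be replayable (Replayable) in one step from edit_manifest.json + assets

9.2 Suggested directory layout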

artifacts/job_001/
  brief.json
  style_bible/v1.json
  character_bible/v2.json
  shotlist/v3.json
  assets/
    anchors/...
    clips/...
    audio/...
  v0001/
    edit_manifest.json
    qc_report.json
    render_log.txt
    final.mp4
  v0002/ ...
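
The no-overwrite rule can be enforced mechanically at write time. A minimal sketch, assuming render versions live in zero-padded `v000N` directories as in the layout above; `next_version_dir` and `write_once` are hypothetical helper names, and the sketch does not handle concurrent writers.

```python
from pathlib import Path


def next_version_dir(job_dir: Path) -> Path:
    """Create and return the next vNNNN directory; existing ones are never reused."""
    existing = [
        int(p.name[1:])
        for p in job_dir.iterdir()
        if p.is_dir() and p.name.startswith("v") and p.name[1:].isdigit()
    ]
    version = max(existing, default=0) + 1
    new_dir = job_dir / f"v{version:04d}"
    new_dir.mkdir(exist_ok=False)  # fail loudly instead of reusing a directory
    return new_dir


def write_once(path: Path, data: bytes) -> None:
    """Write a new artifact; mode "xb" raises FileExistsError if it already exists."""
    with open(path, "xb") as handle:
        handle.write(data)
```

With these two helpers a failed render simply produces `v0002/` next to `v0001/`, and any attempt to rewrite `v0001/qc_report.json` raises instead of silently replacing history.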

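10. Scheduling and Cost Control: Budget-Aware Scheduler

10.1 Scheduling goals

  • The first version must be fast: prioritize a previewable draft
  • High-value shots must be stable: resources are weighted by S/A/B grade
  • Failures must be bounded: cap the number of retries; degrade explicitly when necessary

10.2 Key strategies

  • Concurrency: shot-level parallelism (concurrency capped by budget)
  • Caching: reuse character/scene anchors, LUTs, subtitle templates, and music segments
  • Degradation: hero shot fails → retry with stronger constraints; secondary shot fails → substitute stock footage, B-roll, or an animated still
  • Budget slider: fast/balanced/premium map to sample count, model choice, QC threshold, and retry cap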
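
The budget slider and the S/A/B grades can be combined into a single per-shot policy lookup. The sketch below shows the shape of such a mapping; every number in it is an illustrative placeholder (the text names the knobs but not their values), and `ShotPolicy`, `MODE_BASE`, and `resolve_policy` are hypothetical names. Model choice is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShotPolicy:
    samples: int      # candidates generated per attempt
    min_score: float  # QC threshold a candidate must reach
    max_retries: int  # retry cap for this shot


# Placeholder values: the document does not specify concrete numbers.
MODE_BASE = {
    "fast": ShotPolicy(samples=1, min_score=0.75, max_retries=1),
    "balanced": ShotPolicy(samples=2, min_score=0.80, max_retries=2),
    "premium": ShotPolicy(samples=3, min_score=0.84, max_retries=3),
}
# Grade S shots get one extra sample, grade B shots one fewer (assumption).
GRADE_EXTRA_SAMPLES = {"S": 1, "A": 0, "B": -1}


def resolve_policy(mode: str, grade: str) -> ShotPolicy:
    """Combine the budget mode with the shot grade into one policy."""
    base = MODE_BASE[mode]
    samples = max(1, base.samples + GRADE_EXTRA_SAMPLES[grade])
    # Secondary shots are degraded rather than retried repeatedly.
    retries = min(base.max_retries, 1) if grade == "B" else base.max_retries
    return ShotPolicy(samples, base.min_score, retries)
```

Keeping the mapping in data rather than in branching code makes the trade-off visible in the director's plan: the same table that drives the scheduler can be shown to the user.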

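11. Safety and Compliance

  • Copyright: label asset sources and keep them traceable; avoid directly replicating protected works
  • Likeness/trademarks: obtain explicit user authorization; brand-pack and logo usage rules are configurable
  • Content safety: sensitive-content detection; region- and industry-specific compliance policies
  • Audit: record key decisions and generation parameters in the job log (for post-mortems and risk control)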

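12. Rollout Roadmap (MVP → Beta → Studio)

12.1 MVP (2–4 weeks)

  • One-Click: one sentence → script → shotlist → 6–10 shots → Remotion compositing
  • Basic No-Rollback + artifacts persisted to disk
  • Basic QC: three classes (character drift, flicker, subtitle occlusion)
Acceptance: first preview < 60 s; first-pass success rate > 60%

12.2 Beta (4–8 weeks)

  • Pro mode: visible director's plan + budget slider
  • Three-bible system + anchor reuse
  • QC extended to 8 dimensions + automatic repair matrix
Acceptance: first-pass success rate > 80%; average retries < 1.5; consistency drift drops significantly

12.3 Studio (8–12 weeks)

  • Workbench: script / shotlist / bibles / asset library / version comparison
  • Team collaboration, batch generation, brand-pack management
  • Replayable publishing: any version can be reproduced in one step
Acceptance: consistency holds across 10 consecutive videos of the same character/brand; production at scale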

13. Appendix: Contracts and Templates

This appendix provides engineering-ready schemas, template packs, a draft QC algorithm, and the automatic repair patch specification.

A. Schema specification (core files)

brief.json (user intent and constraints)
{
  "job_id": "job_20251224_0001",
  "request": {
    "prompt": "original one-sentence request",
    "goal": "promo|edu|demo|drama|report|mashup",
    "audience": "general|professional|teen|enterprise",
    "tone": "premium|fun|serious|warm|energetic",
    "language": "zh-CN",
    "duration_sec": 45,
    "aspect_ratio": "9:16|16:9|1:1|4:3",
    "platform": "douyin|bilibili|youtube|xiaohongshu"
  },
  "constraints": {
    "must_have": ["show the product logo", "emphasize selling point A"],
    "must_not": ["gore", "specific sensitive terms"],
    "brand_pack": "brand_x_v3",
    "music_style": "cinematic|lofi|upbeat",
    "subtitle": { "enabled": true, "style": "kinetic_01" },
    "voiceover": { "enabled": true, "speaker": "female_01", "emotion": "confident" }
  },
  "budget": {
    "mode": "fast|balanced|premium",
    "max_retries_per_shot": 2,
    "max_total_cost": 8.0,
    "deadline_sec": 120
  }
}
style_bible.json (style bible)
{
  "version": "v1",
  "look": {
    "palette": { "primary": "#8b5cf6", "bg": "#09090b", "text": "#e4e4e7" },
    "contrast": "medium-high",
    "grain": "subtle",
    "lut": "teal_orange_soft"
  },
  "cinematography": {
    "lens": ["35mm", "50mm"],
    "camera_moves": ["dolly-in", "slow-pan"],
    "do_not": ["handheld_shaky", "fisheye"]
  },
  "lighting": { "key": "soft", "temperature": "warm", "rim": true },
  "typography": {
    "font_zh": "NotoSansSC",
    "font_en": "Inter",
    "subtitle_safe_area": true
  },
  "composition": { "subject_rule": "center-third", "headroom": "medium" }
}
character_bible.json (character bible)
{
  "version": "v2",
  "characters": [
    {
      "id": "char_01",
      "name": "protagonist",
      "anchors": {
        "sheet": "assets/anchors/char_01_sheet_v2.png",
        "expressions": ["assets/anchors/char_01_smile.png"]
      },
      "lock": { "hair": true, "outfit": true, "face": true, "body": true },
      "outfits": ["purple_jacket_v1"],
      "do_not": ["change_gender", "change_age_group", "tattoos"]
    }
  ]
}
shotlist.json (shotlist)
{
  "version": "v3",
  "global": { "fps": 30, "style_bible": "style_bible/v1.json", "audio_bpm": 120 },
  "shots": [
    {
      "shot_id": "S01",
      "grade": "S",
      "duration_sec": 4.0,
      "type": "character",
      "intent": "opening shot that establishes the protagonist and the setting",
      "inputs": {
        "char_id": "char_01",
        "anchors": ["assets/anchors/char_01_sheet_v2.png"],
        "scene_anchor": "assets/anchors/scene_citynight_v1.png"
      },
      "gen": {
        "prompt": {
          "visual": "cinematic night city, neon, ...",
          "motion": "walk toward camera, confident",
          "camera": "35mm, dolly-in, shallow DOF"
        },
        "model_policy": { "preferred": ["kling_2_6_pro"], "fallback": ["veo_3_1"] },
        "sampling": { "n": 2, "seed_policy": "locked_after_pass" }
      },
      "audio": { "voiceover": "one line of narration", "sfx": ["whoosh_soft"], "music_cue": "beat@0.0" },
      "subtitle": { "text": "subtitle", "template": "kinetic_01" },
      "quality_target": { "min_score": 0.84, "critical": ["identity", "readability"] }
    }
  ]
}
edit_manifest.json (edit manifest / Remotion props)
{
  "fps": 30,
  "resolution": { "w": 1080, "h": 1920 },
  "timeline": [
    {
      "asset": "assets/clips/S01_take2.mp4",
      "start_frame": 0,
      "duration_frames": 120,
      "subtitle": { "text": "subtitle", "template": "kinetic_01", "pos": "auto_avoid_subject" },
      "overlays": [{ "type": "logo", "asset": "brand/logo.png", "pos": "top_right" }]
    }
  ],
  "audio": {
    "music": "assets/audio/bgm.mp3",
    "voiceover": "assets/audio/vo.wav",
    "mix": { "ducking": true, "loudness_target_lufs": -14 }
  }
}
qc_report.json (quality report)
{
  "overall_score": 0.86,
  "gates": { "pass": true, "critical_fail": [] },
  "shot_scores": [
    {
      "shot_id": "S01",
      "score": 0.88,
      "metrics": {
        "on_brief": 0.9,
        "identity": 0.92,
        "stability": 0.8,
        "readability": 0.95,
        "audio_match": 0.85
      },
      "issues": [],
      "repair_suggestions": []
    }
  ],
  "summary": { "retries_used": 1, "cost_est": 2.1, "latency_sec": 58 }
}

B. Template packs (4 sets, ready to run)

short_edu_9x16_v1 (short-form explainer video)
{
  "template_id": "short_edu_9x16_v1",
  "defaults": {
    "request": { "goal": "edu", "duration_sec": 40, "aspect_ratio": "9:16", "tone": "energetic" },
    "constraints": { "subtitle": { "enabled": true, "style": "kinetic_02" }, "voiceover": { "enabled": true, "emotion": "confident" } },
    "budget": { "mode": "balanced", "max_retries_per_shot": 1, "deadline_sec": 90 }
  },
  "shot_pattern": [
    { "id": "Hook", "type": "ui", "sec": 3, "grade": "A" },
    { "id": "Point1", "type": "ui", "sec": 7, "grade": "A" },
    { "id": "Broll1", "type": "world", "sec": 4, "grade": "B" },
    { "id": "Point2", "type": "ui", "sec": 7, "grade": "A" },
    { "id": "Broll2", "type": "world", "sec": 4, "grade": "B" },
    { "id": "Point3", "type": "ui", "sec": 7, "grade": "A" },
    { "id": "CTA", "type": "ui", "sec": 6, "grade": "A" }
  ],
  "qc_gate": { "hard": ["readability", "on_brief", "safe_area"], "soft": ["stability"] }
}
brand_film_16x9_v1 (brand film)
{
  "template_id": "brand_film_16x9_v1",
  "defaults": {
    "request": { "goal": "promo", "duration_sec": 55, "aspect_ratio": "16:9", "tone": "premium" },
    "constraints": { "subtitle": { "enabled": false }, "voiceover": { "enabled": true, "emotion": "warm" } },
    "budget": { "mode": "premium", "max_retries_per_shot": 2, "deadline_sec": 180 }
  },
  "shot_pattern": [
    { "id": "MoodOpen", "type": "world", "sec": 8, "grade": "A" },
    { "id": "HeroShot", "type": "character", "sec": 6, "grade": "S" },
    { "id": "Value1", "type": "world", "sec": 7, "grade": "A" },
    { "id": "Value2", "type": "world", "sec": 7, "grade": "A" },
    { "id": "Proof", "type": "ui", "sec": 8, "grade": "A" },
    { "id": "Closing", "type": "ui", "sec": 6, "grade": "A" },
    { "id": "EndCard", "type": "ui", "sec": 3, "grade": "A" }
  ],
  "qc_gate": { "hard": ["brand_consistency", "stability", "on_brief"], "soft": ["audio_match"] }
}
product_demo_ui_v1 (product feature walkthrough)
{
  "template_id": "product_demo_ui_v1",
  "defaults": {
    "request": { "goal": "demo", "duration_sec": 70, "aspect_ratio": "16:9", "tone": "serious" },
    "constraints": { "subtitle": { "enabled": true, "style": "clean_lowerthird" }, "voiceover": { "enabled": true, "emotion": "neutral" } },
    "budget": { "mode": "balanced", "max_retries_per_shot": 1, "deadline_sec": 150 }
  },
  "shot_pattern": [
    { "id": "IntroUI", "type": "ui", "sec": 6, "grade": "A" },
    { "id": "Step1", "type": "ui", "sec": 12, "grade": "A" },
    { "id": "Step2", "type": "ui", "sec": 12, "grade": "A" },
    { "id": "Step3", "type": "ui", "sec": 12, "grade": "A" },
    { "id": "Broll", "type": "world", "sec": 6, "grade": "B" },
    { "id": "Summary", "type": "ui", "sec": 10, "grade": "A" },
    { "id": "CTA", "type": "ui", "sec": 12, "grade": "A" }
  ],
  "qc_gate": { "hard": ["readability", "safe_area", "on_brief"], "soft": ["stability"] }
}
micro_drama_character_v1 (micro drama)
{
  "template_id": "micro_drama_character_v1",
  "defaults": {
    "request": { "goal": "drama", "duration_sec": 60, "aspect_ratio": "9:16", "tone": "warm" },
    "constraints": { "subtitle": { "enabled": true, "style": "dialogue_bubble" }, "voiceover": { "enabled": false } },
    "budget": { "mode": "premium", "max_retries_per_shot": 3, "deadline_sec": 240 }
  },
  "shot_pattern": [
    { "id": "Setup", "type": "character", "sec": 6, "grade": "S" },
    { "id": "Beat1", "type": "character", "sec": 8, "grade": "S" },
    { "id": "Reaction", "type": "character", "sec": 6, "grade": "S" },
    { "id": "Beat2", "type": "character", "sec": 10, "grade": "S" },
    { "id": "Turn", "type": "vfx", "sec": 6, "grade": "A" },
    { "id": "Resolve", "type": "character", "sec": 12, "grade": "S" },
    { "id": "End", "type": "ui", "sec": 12, "grade": "A" }
  ],
  "qc_gate": { "hard": ["identity", "stability", "readability"], "soft": ["on_brief"] }
}
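
Each template carries a `defaults` object with the same top-level sections as brief.json (request, constraints, budget), which suggests the two are merged before planning. The merge rule is not spelled out, so the sketch below assumes a recursive merge in which values set in the brief override template defaults; `merge_defaults` is a hypothetical helper.

```python
import copy


def merge_defaults(defaults: dict, brief: dict) -> dict:
    """Return a new dict: template defaults, overridden by values in the brief.

    Nested objects are merged key by key; any other value in the brief
    replaces the default outright. Neither input is modified.
    """
    merged = copy.deepcopy(defaults)
    for key, value in brief.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_defaults(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {
    "request": {"goal": "edu", "duration_sec": 40, "aspect_ratio": "9:16"},
    "budget": {"mode": "balanced", "max_retries_per_shot": 1},
}
brief = {"request": {"duration_sec": 30, "language": "zh-CN"}}
effective = merge_defaults(defaults, brief)
```

Here the user's 30-second request wins over the template's 40-second default, while the goal, aspect ratio, and budget fall through from the template.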

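C. Draft QC scoring algorithm (implementable in the MVP)

Principle: Hard Gate first (block disasters), then Soft Score (ranking and optimization). The MVP does not aim for perfect understanding; it aims to attribute, repair, and deliver reliably.

C1. Metrics and how they are computed (key points)

  • readability: subtitle safe area and subject occlusion (IoU between the subtitle box and the subject box)
  • identity: similarity between the anchor face embedding and each keyframe (min(sim))
  • stability: SSIM/LPIPS between adjacent frames, or optical-flow consistency (averaged)
  • on_brief: embedding similarity between keyframe captions and the intent
  • safe_area: rule check (out of bounds is a hard failure)
  • audio_match (Beta): ASR text alignment + beat alignment error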
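
Two of these metrics reduce to small geometric and vector computations once boxes and embeddings are available. The sketch below uses plain Python and assumes boxes are `(x1, y1, x2, y2)` tuples and embeddings are equal-length lists of floats; how the boxes and embeddings are obtained (subject detector, face model) is out of scope, and the function names are illustrative.

```python
import math


def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def identity_score(anchor_embedding, frame_embeddings):
    """Worst-case similarity over the keyframes, i.e. min(sim)."""
    return min(cosine(anchor_embedding, f) for f in frame_embeddings)
```

Taking the minimum rather than the mean means a single drifted keyframe is enough to fail the identity check. For readability, a lower subtitle/subject IoU is better; how that overlap is mapped to a 0 to 1 score is left open here.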

C2. Overall score aggregation (example)

overall = 0.25*on_brief + 0.25*identity + 0.20*stability + 0.20*readability + 0.10*audio_match
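
The "hard gate first, then soft score" principle can be sketched as follows. The weights are the ones in the formula above; the per-metric `hard_threshold` is an assumption of this sketch, since the text does not say at what value a hard-gated metric fails. The return value mirrors the `overall_score` and `gates` fields of qc_report.json.

```python
WEIGHTS = {
    "on_brief": 0.25,
    "identity": 0.25,
    "stability": 0.20,
    "readability": 0.20,
    "audio_match": 0.10,
}


def overall_score(metrics):
    """Weighted sum of the five metrics, using the example weights above."""
    return sum(weight * metrics[name] for name, weight in WEIGHTS.items())


def evaluate(metrics, hard_gates, min_score, hard_threshold):
    """Hard gates first, then the soft score.

    hard_threshold is a hypothetical per-metric floor for gated metrics.
    """
    critical_fail = [name for name in hard_gates if metrics[name] < hard_threshold]
    score = overall_score(metrics)
    passed = not critical_fail and score >= min_score
    return {
        "overall_score": round(score, 4),
        "gates": {"pass": passed, "critical_fail": critical_fail},
    }
```

Applied to the S01 metrics from the qc_report.json example (0.9, 0.92, 0.8, 0.95, 0.85), these weights give 0.89 rather than the 0.88 shown there, so the sample report values appear to be illustrative rather than computed from this formula.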

D. Automatic repair patch specification (executable)

A repair is a local patch to the shotlist, which keeps it replayable and auditable.

{
  "shot_id": "S01",
  "reason": "identity_drift",
  "actions": [
    { "op": "set", "path": "gen.sampling.n", "value": 3 },
    { "op": "set", "path": "gen.sampling.seed_policy", "value": "unlocked" },
    { "op": "append", "path": "gen.prompt.visual", "value": "keep same face, same outfit, consistent character identity" },
    { "op": "set", "path": "quality_target.min_score", "value": 0.86 }
  ]
}
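
A patch in this shape can be applied with a few lines of code. The sketch below supports the two ops that appear in the example (`set` and `append`) and returns a patched copy, leaving the input shot untouched in keeping with No-Rollback. Joining appended prompt text with ", " is an assumption, as is the name `apply_patch`.

```python
import copy


def apply_patch(shot: dict, patch: dict) -> dict:
    """Return a patched copy of the shot; the original is not modified."""
    if patch["shot_id"] != shot["shot_id"]:
        raise ValueError("patch targets a different shot")
    patched = copy.deepcopy(shot)
    for action in patch["actions"]:
        *parents, leaf = action["path"].split(".")
        node = patched
        for key in parents:  # walk down the dotted path
            node = node[key]
        if action["op"] == "set":
            node[leaf] = action["value"]
        elif action["op"] == "append":
            if isinstance(node[leaf], list):
                node[leaf].append(action["value"])
            else:
                # Assumption: strings are extended with a ", " separator.
                node[leaf] = f"{node[leaf]}, {action['value']}"
        else:
            raise ValueError(f"unknown op: {action['op']}")
    return patched
```

Storing the patch next to the shotlist version it produced gives the audit trail: shotlist v(n) plus the patch yields shotlist v(n+1), and both remain on disk.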

E. End-to-end job example (minimal loop)

artifacts/job_20251225_0001/
  brief.json
  template.json
  style_bible/v1.json
  shotlist/v1.json
  assets/
    anchors/char_01_sheet_v1.png
    clips/S01_take1.mp4
    clips/S02_take1.mp4
    audio/vo.wav
    audio/bgm.mp3
  v0001/
    edit_manifest.json
    qc_report.json
    render_log.txt
    final.mp4
MVP execution order: brief + template → generate bibles + shotlist → generate shots in parallel → shot-level QC (fail → patch → retry) → edit_manifest → Remotion render → final QC → publish.
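
The shot-level "fail → patch → retry" step is the part of this sequence most worth pinning down. A minimal sketch of that loop follows, with the generator, QC inspector, and patch logic passed in as callables so the control flow can be read and tested on its own. All four callables and the name `run_shot` are hypothetical; only the bounded-retry behavior comes from the text.

```python
def run_shot(shot, generate, inspect, make_patch, apply_patch, max_retries=2):
    """Generate a shot, patching and retrying until QC passes or retries run out.

    Returns (clip, history). Every attempt is kept in history and nothing is
    overwritten; clip is None when all attempts fail, so the caller can degrade.
    """
    history = []
    current = shot
    for attempt in range(max_retries + 1):
        clip = generate(current)
        report = inspect(current, clip)
        history.append(
            {"attempt": attempt, "shot": current, "clip": clip, "report": report}
        )
        if report["pass"]:
            return clip, history
        if attempt < max_retries:
            current = apply_patch(current, make_patch(current, report))
    return None, history
```

A `None` result is the signal for the degradation path described in section 10.2 (stock footage, B-roll, or an animated still for secondary shots).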