Slide Deck Generator (HTML): Implementation Guide

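Image models: NanoBanana Pro, Image 1.5
Reasoning models: Gemini 3 Pro, Claude Opus 4.5, GPT-5.2 Thinking
Goal: using only the models above, reproduce the engineering logic behind "adapts to any style + consistently strong layout", producing HTML output (exportable to PDF, with optional PPTX export).
Core idea: style tokens and the layout engine are decoupled; the LLM handles only structured planning and repair suggestions; pixel-level layout is done by a deterministic engine.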

1) Goals and Constraints

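Goals (Must)
  • Any-style adaptation: reskin through Theme Tokens plus (optionally) extraction from a brandbook or sample deck
  • Stable layout: no overflow, no overlap, consistent alignment and whitespace; two density modes, Presenter and Detailed
  • Explainable and repairable: every failure emits issues, which feed an automatic repair loop
  • Model constraint: use only the 5 specified models (3 reasoning + 2 image)
Out of scope (Not now)
  • No reliance on free-form generation where the model emits pixel coordinates directly (uncontrollable)
  • No key text rendered into images (not searchable, not editable, and worse across languages)
  • No additional third-party generative models (strict adherence to the model whitelist)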

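Design principles (mandatory)

  • P1 LLM output must be structured JSON and must pass schema validation before entering the next stage
  • P2 Layout is done by a deterministic layout engine that supports text measurement and overflow fallback
  • P3 Images serve only as background, illustration, decoration, or icons; they never carry body content
  • P4 Repair loop: rules first; the LLM is only responsible for rewriting, splitting slides, and regrouping key points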

2) Overall Architecture and Data Flow

Inputs (sources + user prompt + optional brandbook/sample deck)
        |
        v
[1] Planner (LLM) ---------> Slide Outline IR (v0)
        |
        v
[2] Style Builder (LLM+rules) -> Theme Tokens
        |
        v
[3] Composer (rules+LLM opt) -> Component Tree IR (v1)
        |
        v
[4] Layout Engine (deterministic) -> Layout IR (bboxes + line breaks)
        |
        v
[5] Renderer (HTML) -> deck.html
        |
        v
[6] Validators (rules) -> issues
        |
        v
Repair Loop:
  - rule fixes (fast) OR
  - LLM rewrite/split/template swap
        |
        v
Re-layout -> Re-validate -> Done / Fallback
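Implementation note: "any style" comes from tokens; "great layout" comes from templates, constraints, measurement, and fallback; the two are fully decoupled.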

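3) Model Roles and Routing Strategy

3.1 Reasoning model responsibilities

Model | Recommended responsibilities | Output characteristics | Failure fallback
GPT-5.2 Thinking | Complex planning, slide-splitting strategy, repair-decision arbitration | High-quality structured reasoning, strong constraint adherence | Claude for rewriting; Gemini for fast completion
Claude Opus 4.5 | Wording refinement, tone control, summarization and bullet tightening | Natural language, stable style | GPT-5.2 for hard-constraint repair and structural reordering
Gemini 3 Pro | Lightweight planning, information extraction, template-candidate scoring (parallelizable) | Speed / cost-efficiency route (configure to your budget) | GPT-5.2 for final arbitration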

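3.2 Image model responsibilities

Model | Recommended responsibilities | Prompt input | Output requirements
NanoBanana Pro | Stylized illustrations, background textures, icon sets | tokens + slide intent + imagery style | No body text in the image; output 16:9, in transparent and/or opaque versions as needed
Image 1.5 | Realistic stock-style imagery, general-purpose pictures, background variants | tokens + scene + composition hints | Likewise no body text; prefer compositions with ample whitespace so text can be overlaid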

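3.3 Routing strategy (a minimal, implementable rule set)

One model per task (MVP default)
  • Planner: GPT-5.2 Thinking
  • Rewrite: Claude Opus 4.5
  • Extraction / scoring: Gemini 3 Pro (optional)
Dual-model redundancy (more robust)
  • Planner: run GPT-5.2 and Gemini together; take the output with the more complete schema
  • Rewrite: run Claude and GPT-5.2 together; take the shorter one that loses no information
  • Arbitration: GPT-5.2 only decides "which candidate to pick + whether to split the slide or swap the template"
// Router pseudocode (sketch)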
task = {type:"planner"|"rewrite"|"extract"|"arbiter", difficulty, latency_budget}
if task.type=="planner" and task.difficulty=="high": use GPT_5_2
else if task.type=="rewrite": use Claude_Opus
else if task.type=="extract": use Gemini_3
else: use GPT_5_2

// Optional: parallel run + arbitration
candidates = run_parallel([primary, secondary])
best = pick_by(schema_valid && fewer_issues && shorter_text && preserves_keys)
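
The routing rules and the candidate arbitration can be made concrete in a few lines. The following is a minimal Python sketch, not a definitive implementation: the `Candidate` fields, `route`, `preserves_keys`, and `pick_best` are hypothetical names, and "shorter" is approximated by character count.

```python
from dataclasses import dataclass
from typing import Optional

# Model identifiers follow the names used in the Model Router request body.
GPT, CLAUDE, GEMINI = "gpt_5_2_thinking", "claude_opus_4_5", "gemini_3_pro"


def route(task_type: str, difficulty: str = "normal") -> str:
    """Map a task to its primary model, mirroring the pseudocode above."""
    if task_type == "planner" and difficulty == "high":
        return GPT
    if task_type == "rewrite":
        return CLAUDE
    if task_type == "extract":
        return GEMINI
    return GPT  # arbiter and every remaining case


@dataclass
class Candidate:
    """One model reply plus the checks already run on it (hypothetical shape)."""
    model: str
    text: str
    schema_valid: bool
    issue_count: int


def preserves_keys(text: str, must_keep: list) -> bool:
    """True if every must_keep string still appears verbatim in the text."""
    return all(key in text for key in must_keep)


def pick_best(candidates: list, must_keep: list) -> Optional[Candidate]:
    """Drop invalid candidates, then prefer fewer issues, then shorter text."""
    usable = [c for c in candidates
              if c.schema_valid and preserves_keys(c.text, must_keep)]
    if not usable:
        return None  # caller should retry or escalate to the arbiter
    return min(usable, key=lambda c: (c.issue_count, len(c.text)))
```

Treating schema validity and key preservation as hard filters, and issue count and length only as tie-breakers, keeps a short but lossy rewrite from ever winning.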

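4) Unified Output Contract (Schema/JSON)

Mandatory: every reasoning model must output JSON only (no natural language mixed in); otherwise the call is treated as a failure and triggers a "format repair" retry.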
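
One way to enforce the JSON-only rule is to parse the whole reply strictly and retry with the rejection reason appended. This is a sketch under assumptions: `call_model` is a hypothetical callable (prompt text in, raw reply text out), and the key check stands in for full schema validation, which a real pipeline would add.

```python
import json


class FormatError(ValueError):
    """Raised when a model reply is not a bare JSON object."""


def parse_json_only(raw: str, required_keys: tuple = ()) -> dict:
    """Accept the reply only if the entire string is one JSON object."""
    try:
        data = json.loads(raw)  # fails on surrounding prose or code fences
    except json.JSONDecodeError as exc:
        raise FormatError(f"not pure JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise FormatError("top-level value must be an object")
    missing = [k for k in required_keys if k not in data]
    if missing:
        raise FormatError(f"missing keys: {missing}")
    return data


def call_with_format_retry(call_model, prompt: str,
                           required_keys: tuple = (), max_retries: int = 2) -> dict:
    """Retry the call, telling the model why the previous reply was rejected."""
    last_error = None
    for _ in range(max_retries + 1):
        suffix = ""
        if last_error is not None:
            suffix = f"\nPrevious reply was rejected ({last_error}). Output JSON only."
        try:
            return parse_json_only(call_model(prompt + suffix), required_keys)
        except FormatError as exc:
            last_error = exc
    raise FormatError(f"gave up after {max_retries + 1} attempts: {last_error}")
```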

4.1 Slide IR (v1): minimum fields

{
  "deck": {
    "meta": {
      "title": "string",
      "language": "zh-CN",
      "mode": "presenter|detailed",
      "aspect": "16:9",
      "pageSize": { "w": 1920, "h": 1080 }
    },
    "themeRef": "theme.default",
    "slides": [
      {
        "id": "s1",
        "intent": "cover|section|concept|comparison|process|timeline|data|quote|summary",
        "title": "string",
        "subtitle": "string?",
        "bullets": ["string"],
        "callouts": ["string?"],
        "notes": "string?",
        "assets": [
          { "role":"bg|illustration|icon|chart", "ref":"assetId?", "styleHint":"string?" }
        ],
        "constraints": { "maxBullets": 5, "maxWordsPerBullet": 14 }
      }
    ]
  }
}

4.2 Layout IR (output to the Renderer)

{
  "page": { "w":1920, "h":1080, "safe":48, "grid":8 },
  "slides": [
    {
      "id":"s1",
      "nodes":[
        {
          "nodeId":"title",
          "type":"text|list|image|shape|chart",
          "bbox":{ "x":96, "y":96, "w":1200, "h":140 },
          "styleRole":"h1|h2|body|caption",
          "text":"...",
          "font":{ "family":"Inter", "size":44, "weight":700, "lineHeight":1.15 },
          "lines":["..."]  // optional: pre-wrapped lines
        }
      ]
    }
  ]
}

4.3 Issues (Validators output)

{
  "issues":[
    {"type":"overflow","slideId":"s2","nodeId":"bullets","excessPx":72,"severity":"high"},
    {"type":"overlap","slideId":"s3","nodeA":"title","nodeB":"visual","severity":"high"},
    {"type":"contrast","slideId":"s1","nodeId":"title","ratio":2.3,"severity":"med"}
  ]
}

5) Prompt Conventions and Templates

5.1 Planner Prompt (for GPT-5.2 / Gemini)

{
  "task":"planner",
  "input":{
    "mode":"presenter|detailed",
    "language":"zh-CN",
    "audience":"string",
    "goal":"string",
    "sources":[{"id":"p1","text":"..."}],
    "constraints":{
      "slidesMin":8,
      "slidesMax":12,
      "maxBulletsPresenter":4,
      "maxBulletsDetailed":6,
      "maxWordsPerBulletPresenter":12,
      "maxWordsPerBulletDetailed":18
    }
  },
  "output_format":"ONLY_JSON_MATCHING_SCHEMA(deck_ir_v1)"
}

5.2 Rewrite Prompt (for Claude)

{
  "task":"rewrite",
  "input":{
    "target":"shorter|clearer|more_formal|more_playful",
    "text":"original bullet/paragraph",
    "must_keep":["keyword A","number B","conclusion C"],
    "limits":{"maxWords":12}
  },
  "output_format":"ONLY_JSON {\"text\":\"...\"}"
}

5.3 Repair Prompt (for GPT-5.2 as arbiter)

{
  "task":"repair",
  "input":{
    "deck_ir":"...",
    "issues":"...",
    "allowed_actions":[
      "rewrite(nodeId,target)",
      "trim_bullets(nodeId,keepTopK)",
      "swap_template(slideId,templateId)",
      "split_slide(slideId,intoN)"
    ],
    "hard_limits":{
      "minFontBody":14,
      "maxIterations":4
    }
  },
  "output_format":"ONLY_JSON {\"actions\":[...]}"
}
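Common constraint: every prompt states explicitly that only JSON may be output, and names the schema or gives an example structure, to keep the model from drifting.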

6) Style System (Tokens)

6.1 Required token fields

{
  "themeId":"theme.brandA",
  "palette":{"bg":"#0B1220","text":"#E6EDF3","primary":"#7AA2FF","surface":"#101826","line":"#22324A"},
  "typography":{"titleFont":"Inter","bodyFont":"Inter","scale":{"h1":44,"h2":32,"body":18,"small":14}},
  "spacing":{"grid":8,"safeMargin":48,"gap":16,"lineHeight":1.25},
  "shape":{"radius":16,"stroke":1,"shadow":"soft"},
  "imagery":{"style":"flat|handdrawn|realistic|chalk|manga","bgPattern":"none|grain|dots"},
  "charts":{"axis":"minimal|full","labels":"compact|full"}
}

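6.2 Engineering approach to "any style"

  • Start with a preset theme library (10 to 30 token sets) that maps common styles to tokens (minimal, tech, retro, hand-drawn, chalkboard, etc.)
  • Then add brandbook extraction (a reasoning model can parse the textual rules) → token overrides
  • Style changes only the tokens, never the templates or layout logic (this is what keeps layout stable)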
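
The "preset first, brandbook overrides second" step amounts to a recursive merge of two token dictionaries. A minimal sketch, assuming the brandbook extractor returns a partial tokens object; `merge_tokens` and the sample override values are illustrative, not part of the original design.

```python
import copy


def merge_tokens(base: dict, override: dict) -> dict:
    """Return base with override applied recursively; inputs are not mutated."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_tokens(merged[key], value)  # merge nested groups
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists replace outright
    return merged


preset = {
    "themeId": "theme.default",
    "palette": {"bg": "#0B1220", "text": "#E6EDF3", "primary": "#7AA2FF"},
    "typography": {"titleFont": "Inter", "bodyFont": "Inter",
                   "scale": {"h1": 44, "body": 18}},
}
brand_override = {  # hypothetical output of brandbook extraction
    "themeId": "theme.brandA",
    "palette": {"primary": "#FF6B35"},
    "typography": {"scale": {"h1": 48}},
}
theme = merge_tokens(preset, brand_override)
```

Because the override only needs to name what differs, a partially extracted brandbook still yields a complete theme.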

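7) Template Library and Component Tree

MVP: 20 templates are enough to get the pipeline running end to end. Expanding to 30 to 80 templates covers 90% of scenarios.

7.1 Template DSL (structure + constraints)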

{
  "templateId":"concept.2col.imageRight",
  "intent":"concept",
  "grid":{"cols":12,"rows":12,"safe":48},
  "slots":[
    {"name":"title","type":"text","area":[1,1,7,2],"styleRole":"h2"},
    {"name":"bullets","type":"list","area":[1,3,7,8],"styleRole":"body","maxLines":10},
    {"name":"visual","type":"media","area":[8,2,11,10],"fit":"cover","minVisible":0.6}
  ],
  "constraints":{"align":"baselineGrid","minGap":16,"noOverlap":true}
}
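
To turn a slot's `area` into pixels, the layout engine needs a fixed reading of the four numbers. The sketch below assumes `area` means `[colStart, rowStart, colEnd, rowEnd]`, 1-based and inclusive, which is the reading under which the sample slots fit a 12x12 grid; the document does not state this explicitly. Gutters (`minGap`) are ignored here.

```python
def area_to_bbox(area, cols=12, rows=12, page_w=1920, page_h=1080, safe=48):
    """Convert a grid area to a pixel bbox inside the safe margin.

    Assumption: area = [colStart, rowStart, colEnd, rowEnd], 1-based, inclusive.
    """
    col_start, row_start, col_end, row_end = area
    if not (1 <= col_start <= col_end <= cols and 1 <= row_start <= row_end <= rows):
        raise ValueError(f"area {area} is outside the {cols}x{rows} grid")
    cell_w = (page_w - 2 * safe) / cols  # 152 px on a 1920-wide page
    cell_h = (page_h - 2 * safe) / rows  # 82 px on a 1080-high page
    return {
        "x": round(safe + (col_start - 1) * cell_w),
        "y": round(safe + (row_start - 1) * cell_h),
        "w": round((col_end - col_start + 1) * cell_w),
        "h": round((row_end - row_start + 1) * cell_h),
    }


title_box = area_to_bbox([1, 1, 7, 2])
visual_box = area_to_bbox([8, 2, 11, 10])
```

With this reading the title slot ends exactly where the visual slot begins, so a real engine would still inset each box by half of `minGap` to honor the template's spacing constraint.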

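7.2 Component tree assembly rules

  • Pick template candidates by intent first (3 to 5), then score them on content density and media availability
  • Tag priorities when filling slots: title (highest) > key conclusion > bullets > decoration
  • Leave hooks for Repair: bullets support trim, visual supports scaling/replacement, and split is available when necessary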

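8) Deterministic Layout Engine (critical)

8.1 Required capabilities

  • Text measurement: given font, size, and width, compute line breaks and height
  • Baseline grid: elements snap to the grid (8px) to keep everything aligned
  • Overflow fallback: rewrite → downscale → trim → swap template → split
  • Lower bound: body font size never goes below minFontBody (for example, 14)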
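
Text measurement is the piece everything else depends on. The sketch below shows the shape of a greedy line breaker in Python; it is only an approximation, because it replaces real font metrics with assumed advance widths (1.0 em for wide CJK characters, 0.55 em for everything else). A production engine would read the widths from the actual font file.

```python
import math
import unicodedata


def char_width_em(ch: str) -> float:
    """Assumed advance width in em: 1.0 for wide/fullwidth, 0.55 otherwise."""
    return 1.0 if unicodedata.east_asian_width(ch) in ("W", "F") else 0.55


def text_width(text: str, font_size: float) -> float:
    return sum(char_width_em(ch) for ch in text) * font_size


def tokenize(text: str) -> list:
    """Break opportunities: at spaces, and around every wide (CJK) character."""
    tokens, word = [], ""
    for ch in text:
        if ch == " " or char_width_em(ch) == 1.0:
            if word:
                tokens.append(word)
            word = ""
            tokens.append(ch)
        else:
            word += ch
    if word:
        tokens.append(word)
    return tokens


def wrap(text: str, font_size: float, max_width: float) -> list:
    """Greedy wrap; a single token wider than max_width stays on its own line."""
    lines, line = [], ""
    for tok in tokenize(text):
        if line and text_width((line + tok).rstrip(), font_size) > max_width:
            lines.append(line.rstrip())
            line = "" if tok == " " else tok
        else:
            line += tok
    if line.strip():
        lines.append(line.rstrip())
    return lines


def measure(text: str, font_size: float, line_height: float, max_width: float) -> dict:
    """Return the wrapped lines and the block height in pixels."""
    lines = wrap(text, font_size, max_width)
    return {"lines": lines, "height": math.ceil(len(lines) * font_size * line_height)}
```

The returned `lines` can be stored directly in the Layout IR, so the renderer never has to re-wrap text and the browser cannot disagree with the engine about line breaks.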

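8.2 Fallback strategy (recommended order)

1) Rewrite (Claude or GPT-5.2): shorter and clearer, without dropping must_keep
2) Downscale (rule): reduce font size down to the lower bound
3) Trim (rule): keep the top-k bullets and move the rest to a new slide
4) Swap template (rule): switch to a more text-friendly template
5) Split slide (GPT-5.2 arbitration + rule execution): split into two or more slides
6) Fallback: summary/section catch-all template
Forbidden: shrinking the font without limit to fix overflow (it destroys the professional look outright).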
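
The rule-based rungs of this ladder (downscale, then trim, then escalate) can be sketched as one function. This is an illustration under assumptions: `measure_height(bullets, font_size)` is a hypothetical layout callback returning pixels, the 2px step and the minimum of 3 bullets are arbitrary choices, and the LLM rewrite rung is assumed to have run already.

```python
def resolve_overflow(bullets, font_size, box_h, measure_height,
                     min_font=14, min_bullets=3, step=2):
    """Apply the rule-based rungs in order and report what was done."""
    actions = []

    # Rung 2, downscale: shrink in steps but never below the floor.
    while measure_height(bullets, font_size) > box_h and font_size - step >= min_font:
        font_size -= step
        actions.append(("downscale", font_size))

    # Rung 3, trim: carry trailing bullets over to a continuation slide.
    carried = []
    while measure_height(bullets, font_size) > box_h and len(bullets) > min_bullets:
        carried.insert(0, bullets[-1])
        bullets = bullets[:-1]
    if carried:
        actions.append(("trim_bullets", len(bullets)))

    # Rungs 4 and 5 are decided elsewhere; only signal that they are needed.
    fits = measure_height(bullets, font_size) <= box_h
    if not fits:
        actions.append(("escalate", "swap_template_or_split"))

    return {"bullets": bullets, "carried": carried, "fontSize": font_size,
            "fits": fits, "actions": actions}
```

Returning the action list alongside the result gives the repair log the same vocabulary as the Repair Actions schema.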

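9) Image Generation Pipeline (NanoBanana Pro / Image 1.5)

9.1 Usage boundaries

  • Allowed: background textures, illustrations, icons, abstract decoration, text-free stock imagery
  • Not recommended: writing titles or body text into images (unless you explicitly accept that they become uneditable)
  • Recommended composition: reserve whitespace for overlaid text; keep backgrounds from getting too busy (avoids contrast problems)

9.2 Image prompt input (unified format)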

{
  "task":"image_generate",
  "model":"nanobanana_pro|image_1_5",
  "input":{
    "slideId":"s3",
    "intent":"concept",
    "imagery":{
      "style":"flat|handdrawn|realistic|chalk|manga",
      "mood":"calm|bold|playful|serious",
      "composition":"leave whitespace on left for text",
      "avoid":"no text, no watermark, no logos"
    },
    "palette":{"primary":"#7AA2FF","bg":"#0B1220","accent":"#34D399"},
    "size":{"w":1920,"h":1080},
    "transparent": false
  },
  "output_format":"image_binary_or_url"
}

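9.3 Model selection guidance (engineering rules)

  • Need strong stylistic consistency (one illustration style across the whole deck): prefer NanoBanana Pro
  • Need general-purpose realistic imagery or background variants: prefer Image 1.5
  • Image generation fails or is non-compliant: degrade to a solid color, gradient, or geometric texture (rule-generated, no model needed)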

10) Validators + Repair Loop

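10.1 Validators (rules)

Rule | Condition | Repair actions
Overflow | bbox exceeds the safe area | downscale / trim / swap / split / LLM rewrite
Overlap | key elements intersect | reposition / shrink image / swap template
Contrast | contrast ratio too low | change color / add backing panel / adjust background
Alignment | not snapped to grid | snap-to-grid
Density | text density above threshold | trim / split / rewrite
Consistency | violates the tokens whitelist | roll back to tokens/role styles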
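
Three of these checks are pure geometry or color arithmetic and need no model at all. The sketch below emits issues in the shape shown in section 4.3. The contrast formula is the WCAG 2.x relative-luminance definition; the 4.5 threshold (WCAG AA for normal-size text) and the per-node `fg`/`bg` fields are assumptions, not something the design above prescribes.

```python
def overflow_px(bbox, page_w=1920, page_h=1080, safe=48):
    """Largest distance by which bbox sticks out of the safe area (0 if inside)."""
    return max(0,
               safe - bbox["x"],
               safe - bbox["y"],
               bbox["x"] + bbox["w"] - (page_w - safe),
               bbox["y"] + bbox["h"] - (page_h - safe))


def overlaps(a, b):
    """True if two bboxes intersect; touching edges do not count."""
    return (a["x"] < b["x"] + b["w"] and b["x"] < a["x"] + a["w"]
            and a["y"] < b["y"] + b["h"] and b["y"] < a["y"] + a["h"])


def luminance(hex_color):
    """WCAG 2.x relative luminance of an sRGB color such as '#0B1220'."""
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5)]
    lin = [c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
           for c in channels]
    return 0.2126 * lin[0] + 0.7152 * lin[1] + 0.0722 * lin[2]


def contrast_ratio(fg, bg):
    hi, lo = sorted((luminance(fg), luminance(bg)), reverse=True)
    return (hi + 0.05) / (lo + 0.05)


def validate_slide(slide_id, nodes, min_contrast=4.5):
    """nodes: [{"nodeId", "bbox", optional "fg"/"bg"}]; returns a list of issues."""
    issues = []
    for n in nodes:
        excess = overflow_px(n["bbox"])
        if excess > 0:
            issues.append({"type": "overflow", "slideId": slide_id,
                           "nodeId": n["nodeId"], "excessPx": excess,
                           "severity": "high"})
        if "fg" in n and "bg" in n:
            ratio = contrast_ratio(n["fg"], n["bg"])
            if ratio < min_contrast:
                issues.append({"type": "contrast", "slideId": slide_id,
                               "nodeId": n["nodeId"], "ratio": round(ratio, 1),
                               "severity": "med"})
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if overlaps(a["bbox"], b["bbox"]):
                issues.append({"type": "overlap", "slideId": slide_id,
                               "nodeA": a["nodeId"], "nodeB": b["nodeId"],
                               "severity": "high"})
    return issues
```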

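10.2 Repair Loop (convergence mechanism)

  • At most 3 to 5 iterations; every iteration must re-run Layout + Validate
  • Rule fixes come first; the LLM only does text rewriting, split suggestions, and template arbitration
  • If it does not converge: force a split or fall back (summary) to guarantee stable output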
{
  "actions":[
    {"type":"trim_bullets","slideId":"s2","nodeId":"bullets","keepTopK":4},
    {"type":"llm_rewrite","model":"claude_opus_4_5","slideId":"s2","nodeId":"bullets","target":"shorter"},
    {"type":"swap_template","slideId":"s3","templateId":"concept.singleCol.textHeavy"}
  ]
}
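
The convergence mechanism is essentially a bounded loop with a guaranteed exit. A minimal sketch, in which `layout`, `validate`, `plan_actions`, `apply_actions`, and `fallback` are all hypothetical hooks into the corresponding pipeline stages:

```python
def repair_loop(deck, layout, validate, plan_actions, apply_actions, fallback,
                max_iterations=4):
    """Return (deck, converged, history); always terminates with a usable deck."""
    history = []
    for iteration in range(max_iterations + 1):
        issues = validate(layout(deck))  # every round re-runs layout + validate
        if not issues:
            return deck, True, history
        if iteration == max_iterations:
            break  # repair budget spent and the deck still fails
        actions = plan_actions(deck, issues)  # rules first, LLM arbiter second
        history.append({"iteration": iteration + 1,
                        "issues": issues, "actions": actions})
        if not actions:
            break  # nothing left to try
        deck = apply_actions(deck, actions)
    return fallback(deck), False, history
```

The `history` list doubles as the per-round repair log, and the unconditional `fallback` on exit is what guarantees that a deck is always produced.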

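11) HTML Rendering and Export

11.1 HTML rendering recommendations

  • One <section class="slide"> per slide, fixed size (1920x1080 recommended)
  • tokens → CSS variables; styleRole → class; bbox → inline style (left/top/width/height)
  • Export-friendly: make sure fonts have finished loading before taking screenshots or printing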
<section class="slide" style="width:1920px;height:1080px">
  <div class="node title h1" style="left:96px;top:96px;width:1200px;height:140px">...</div>
  <ul class="node bullets body" style="left:96px;top:260px;width:920px;height:700px">...</ul>
  <img class="node visual" style="left:1100px;top:200px;width:720px;height:760px;object-fit:cover" />
</section>
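
The three mappings (tokens → CSS variables, styleRole → class, bbox → inline style) are mechanical enough to show in a short Python sketch. It handles text nodes only, and the variable names `--color-*` and `--font-size-*` are an assumed naming scheme rather than one the design prescribes.

```python
from html import escape


def tokens_to_css(tokens):
    """Flatten the palette and type scale into CSS custom properties on :root."""
    decls = [f"--color-{name}: {value};"
             for name, value in tokens["palette"].items()]
    decls += [f"--font-size-{role}: {px}px;"
              for role, px in tokens["typography"]["scale"].items()]
    return ":root { " + " ".join(decls) + " }"


def render_node(node):
    """Render one Layout IR text node as an absolutely positioned div."""
    b = node["bbox"]
    style = f"left:{b['x']}px;top:{b['y']}px;width:{b['w']}px;height:{b['h']}px"
    classes = f"node {node['nodeId']} {node['styleRole']}"
    return f'<div class="{escape(classes)}" style="{style}">{escape(node["text"])}</div>'


def render_slide(slide, page):
    """Wrap the slide's nodes in a fixed-size section."""
    body = "\n".join("  " + render_node(n) for n in slide["nodes"])
    size = f"width:{page['w']}px;height:{page['h']}px"
    return f'<section class="slide" id="{escape(slide["id"])}" style="{size}">\n{body}\n</section>'
```

For the inline `left`/`top` values to take effect, the stylesheet has to give `.slide` a positioning context (for example `position: relative`) and `.node` `position: absolute`. Escaping all text on the way out keeps model-written content from injecting markup.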

11.2 PDF export (recommended)

  • Playwright/Puppeteer: load the HTML → document.fonts.ready → pdf or per-slide screenshots
  • Print backgrounds: printBackground:true
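
With Playwright's Python bindings, the export sequence might look like the sketch below. It assumes the third-party `playwright` package and a Chromium build are installed, and that the deck's stylesheet places each slide on its own printed page (for example with `break-after: page` on `.slide`); `pdf_options` and `export_pdf` are hypothetical helper names. In the Python API the option is spelled `print_background`.

```python
from pathlib import Path


def pdf_options(page_w=1920, page_h=1080, out_path="deck.pdf"):
    """Keyword arguments for page.pdf(): one PDF page per slide-sized sheet."""
    return {
        "path": out_path,
        "width": f"{page_w}px",
        "height": f"{page_h}px",
        "print_background": True,  # keep slide backgrounds in the PDF
    }


def export_pdf(html_path, out_path="deck.pdf", page_w=1920, page_h=1080):
    """Load deck.html in headless Chromium, wait for fonts, then print to PDF."""
    # Imported lazily so the module loads even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(viewport={"width": page_w, "height": page_h})
        page.goto(Path(html_path).resolve().as_uri())
        # Resolve only after every web font has finished loading.
        page.evaluate("() => document.fonts.ready.then(() => true)")
        page.pdf(**pdf_options(page_w, page_h, out_path))
        browser.close()
```

Waiting on `document.fonts.ready` matters because a PDF printed before fonts load uses fallback metrics, which can break line breaks the layout engine already fixed.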

12) Engineering: Service Split, Caching, Logging, Evaluation

12.1 Service split (recommended)

/api/plan        (LLM: GPT-5.2 or Gemini)  -> deck_ir_v1
/api/style       (rules + LLM optional)    -> tokens
/api/compose     (rules + LLM optional)    -> deck_ir_v1 (slots filled)
/api/layout      (deterministic)           -> layout_ir
/api/validate    (rules)                   -> issues
/api/repair      (LLM arbiter + rules)     -> actions
/api/image       (NanoBanana/Image1.5)     -> asset
/api/render/html (deterministic)           -> html

12.2 Cache keys (strongly recommended)

  • LLM: hash(prompt + schemaVersion + model + temperature)
  • Text measurement: hash(fontFamily + size + width + text)
  • Images: hash(tokens + intent + prompt + size + model)
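
A practical detail when building these keys: hash a canonical serialization rather than concatenated strings, so that field order and field boundaries cannot cause collisions or misses. A minimal sketch using SHA-256 over sorted-key JSON; `cache_key` and the `kind` prefix are illustrative choices.

```python
import hashlib
import json


def cache_key(kind, **parts):
    """Stable key: SHA-256 of canonical JSON (sorted keys, compact separators)."""
    payload = json.dumps({"kind": kind, **parts}, sort_keys=True,
                         ensure_ascii=False, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{kind}:{digest}"


llm_key = cache_key("llm", prompt="...", schemaVersion="deck_ir_v1",
                    model="gpt_5_2_thinking", temperature=0.2)
measure_key = cache_key("measure", fontFamily="Inter", size=18,
                        width=920, text="Stable layout")
```

Plain concatenation would make `("ab", "c")` and `("a", "bc")` hash identically; serializing named fields avoids that. For text measurement it may also be worth adding font weight to the key, since weight changes glyph widths.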

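12.3 Evaluation and regression (required)

  • Golden cases: long titles, too many bullets, no image, full-bleed image, CJK, RTL
  • Property tests: random text length/density, asserting "no overflow / no overlap"
  • Visual regression: screenshot diff (with a threshold)
Logging recommendation: for every repair round, record issues, actions, and whether the loop finally converged; this is how you track down "why does this slide always break".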

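13) MVP Delivery Checklist (shortest path)

MVP-1: stabilize layout first (no image dependency)
  • IR + tokens + 20 templates
  • Layout: text measurement + fallback strategy
  • HTML renderer + PDF export
  • Validators: overflow/overlap/alignment
MVP-2: connect reasoning models (structured generation + repair)
  • Planner: GPT-5.2 (primary) + Gemini (optional, in parallel)
  • Rewrite: Claude (primary)
  • Repair arbitration: GPT-5.2
MVP-3: connect image models (consistent style)
  • NanoBanana: stylized illustrations/backgrounds
  • Image 1.5: general-purpose imagery/background variants
  • Automatic contrast and whitespace checks
MVP-4: optional PPTX export
  • Layout IR → pptx coordinate mapping
  • Body text stays as text boxes (editable)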
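
For the Layout IR → pptx mapping, the arithmetic is a unit conversion from pixels to EMU (English Metric Units: 914,400 per inch, 12,700 per point). The sketch below assumes the target is PowerPoint's default 16:9 widescreen slide (13.333 in x 7.5 in, i.e. 12,192,000 x 6,858,000 EMU); a different slide size would change the constants.

```python
SLIDE_W_EMU, SLIDE_H_EMU = 12_192_000, 6_858_000  # default 16:9 widescreen slide
EMU_PER_POINT = 12_700


def px_to_emu(px, page_px=1920, slide_emu=SLIDE_W_EMU):
    """Scale a pixel length on the HTML page to EMU on the pptx slide."""
    return round(px * slide_emu / page_px)


def bbox_to_emu(bbox, page_w=1920, page_h=1080):
    """Map a Layout IR bbox to left/top/width/height in EMU."""
    return {
        "left": px_to_emu(bbox["x"], page_w, SLIDE_W_EMU),
        "top": px_to_emu(bbox["y"], page_h, SLIDE_H_EMU),
        "width": px_to_emu(bbox["w"], page_w, SLIDE_W_EMU),
        "height": px_to_emu(bbox["h"], page_h, SLIDE_H_EMU),
    }


def font_px_to_pt(px, page_w=1920):
    """Font size in points that looks the same on the scaled-down slide."""
    return px_to_emu(px, page_w, SLIDE_W_EMU) / EMU_PER_POINT


title_emu = bbox_to_emu({"x": 96, "y": 96, "w": 1200, "h": 140})
```

On a 1920x1080 page both axes scale by 6,350 EMU per pixel, so a 44px title becomes 22pt. Font sizes have to be rescaled along with the boxes, otherwise text that fit in HTML overflows in the exported slide.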

Appendix: Example Schemas/Interfaces

A) Model Router request body (unified)

{
  "model":"gpt_5_2_thinking|claude_opus_4_5|gemini_3_pro|nanobanana_pro|image_1_5",
  "task":"planner|rewrite|repair|extract|image_generate",
  "schema":"deck_ir_v1|repair_actions_v1|rewrite_v1|...",
  "input":{ "...": "..." },
  "settings":{
    "temperature": 0.2,
    "maxTokens": 4000,
    "timeoutMs": 45000
  }
}

B) Repair Actions Schema (simplified)

{
  "actions":[
    {"type":"llm_rewrite","model":"claude_opus_4_5","slideId":"s2","nodeId":"bullets","target":"shorter"},
    {"type":"trim_bullets","slideId":"s2","nodeId":"bullets","keepTopK":4},
    {"type":"swap_template","slideId":"s3","templateId":"process.vertical.clean"},
    {"type":"split_slide","slideId":"s4","intoN":2}
  ]
}
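Bottom line: as long as you deliver "LLM outputs structured IR + deterministic layout + converging validators/repair", you can still reliably reproduce the "any style + great layout" experience even with a limited set of image and reasoning models.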
This document relies strictly on the 5 specified models as its only capability sources.