Presentation Automation: From Text to Slides

Content parsing, slide structure planning, LLM-driven layout, image generation integration, and style consistency


1. The Core Problem of Automation

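Turning a passage of text into a presentation is, at its core, an "information restructuring" problem:

  • Content restructuring: extract a hierarchy (title -> key points -> details) from continuous text
  • Spatial restructuring: map linear information onto a two-dimensional canvas
  • Visual restructuring: choose a suitable visual form for abstract information

This is not a simple format conversion. When a 2,000-character text becomes a 10-slide deck, each slide can carry only about 50-80 meaningful characters. Deciding what to show and what to hide matters more than layout.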
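To make that budget concrete, the arithmetic can be sketched as below. The helper name `slideBudget` and its parameters are illustrative assumptions, not part of the pipeline described here.

```typescript
// Hypothetical helper: how much of the source text can actually appear on slides.
interface Budget {
  minChars: number;
  maxChars: number;
  minRatio: number; // lower bound on the fraction of the source that survives
  maxRatio: number; // upper bound on that fraction
}

function slideBudget(
  sourceChars: number,
  pages: number,
  charsPerPage: [number, number] = [50, 80], // the per-slide range quoted in the text
): Budget {
  const [lo, hi] = charsPerPage;
  const minChars = pages * lo;
  const maxChars = pages * hi;
  return {
    minChars,
    maxChars,
    minRatio: minChars / sourceChars,
    maxRatio: maxChars / sourceChars,
  };
}

// 10 slides x 50-80 characters = 500-800 characters,
// i.e. 25%-40% of a 2,000-character source.
const budget = slideBudget(2000, 10);
```

Under these numbers, well over half of the source never reaches a slide, which is why content selection dominates the design.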

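Processing stages from text to slides

Raw text / outline / document
       |
       v
  [Stage 1: Content parsing]
  Extract topics, hierarchy, key information
       |
       v
  [Stage 2: Slide planning]
  Decide page count, slide types, content allocation
       |
       v
  [Stage 3: LLM layout]
  Choose the best layout and visual elements per slide
       |
       v
  [Stage 4: Image integration]
  AI-generated illustrations / charts / icons
       |
       v
  [Stage 5: Style unification]
  Keep the whole deck visually consistent
       |
       v
  Final presentation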

2. Stage 1: Content Parsing

2.1 LLM-driven structured extraction

// content-parser.ts
interface ParsedContent {
  title: string;
  subtitle?: string;
  audience: string;
  keyThemes: string[];
  sections: ContentSection[];
  suggestedPageCount: number;
}

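// Shape inferred from the "dataPoints" example in the prompt below.
interface DataPoint {
  label: string;
  value: number;
  unit?: string;
}

interface ContentSection {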
  heading: string;
  level: 1 | 2 | 3;
  points: string[];
  dataPoints?: DataPoint[];
  suggestedVisual?: 'chart' | 'image' | 'diagram' | 'icon' | 'none';
}

async function parseContent(
  rawText: string,
  targetPages: number = 10,
  language: string = 'zh',
): Promise<ParsedContent> {
  const prompt = `Analyze the following text and extract structured presentation content.

Target: ${targetPages} slides for a professional presentation.
Language: ${language}

Input text:
${rawText}

Output as JSON with this structure:
{
  "title": "main presentation title",
  "subtitle": "optional subtitle",
  "audience": "who is this for",
  "keyThemes": ["theme1", "theme2"],
  "sections": [
    {
      "heading": "section title",
      "level": 1,
      "points": ["bullet point 1", "bullet point 2"],
      "dataPoints": [{"label": "...", "value": 42, "unit": "%"}],
      "suggestedVisual": "chart"
    }
  ],
  "suggestedPageCount": ${targetPages}
}

Rules:
- Each bullet point should be concise (under 15 words)
- Identify data that could be visualized as charts
- Suggest visual types for each section
- Maintain the logical flow of the original text`;

  // `llm` is assumed to be a pre-configured chat client defined elsewhere.
  const response = await llm.chat({
    model: 'gemini-2.5-flash',
    messages: [{ role: 'user', content: prompt }],
    responseFormat: { type: 'json_object' },
    temperature: 0.2,
  });

  return JSON.parse(response.content) as ParsedContent;
}
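`parseContent` casts the parsed JSON straight to `ParsedContent`, so a malformed model response would only surface later in the pipeline. One way to guard against that is a small runtime check, sketched below. The names `SectionLike`, `isSection`, and `parseSections` are hypothetical; the fields mirror `ContentSection` above.

```typescript
// Hypothetical runtime guard for LLM output; not part of the original pipeline.
type Visual = 'chart' | 'image' | 'diagram' | 'icon' | 'none';
const VISUALS: readonly string[] = ['chart', 'image', 'diagram', 'icon', 'none'];

interface SectionLike {
  heading: string;
  level: 1 | 2 | 3;
  points: string[];
  suggestedVisual?: Visual;
}

function isSection(value: unknown): value is SectionLike {
  if (typeof value !== 'object' || value === null) return false;
  const { heading, level, points, suggestedVisual } =
    value as Record<string, unknown>;
  return (
    typeof heading === 'string' &&
    (level === 1 || level === 2 || level === 3) &&
    Array.isArray(points) &&
    points.every((p) => typeof p === 'string') &&
    (suggestedVisual === undefined ||
      (typeof suggestedVisual === 'string' &&
        VISUALS.indexOf(suggestedVisual) !== -1))
  );
}

// Returns only the sections that pass validation.
// Throws if the payload is not an object with a "sections" array.
function parseSections(json: string): SectionLike[] {
  const data: unknown = JSON.parse(json); // throws SyntaxError on malformed JSON
  if (typeof data !== 'object' || data === null) {
    throw new Error('LLM output is not a JSON object');
  }
  const { sections } = data as Record<string, unknown>;
  if (!Array.isArray(sections)) {
    throw new Error('LLM output has no "sections" array');
  }
  return sections.filter(isSection);
}
```

Dropping invalid sections rather than failing the whole run is a design choice; a stricter pipeline might retry the LLM call instead.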

2.2 Content density assessment

After parsing, assess the information density of each section to decide whether it should be split or merged:

function assessContentDensity(
  sections: ContentSection[],
  targetPages: number,
): ContentSection[] {
  /**
   * Ensure each section fits well into 1-2 slides.
   * Split dense sections, merge sparse ones.
   */
  const result: ContentSection[] = [];

  for (const section of sections) {
    // p.length counts characters, not words, which suits CJK text.
    const charCount = section.points.reduce(
      (sum, p) => sum + p.length, 0
    );
    const hasData = (section.dataPoints?.length ?? 0) > 0;

    if (charCount > 200 || section.points.length > 6) {
      // Split: too dense for one slide
      const mid = Math.ceil(section.points.length / 2);
      result.push({
        ...section,
        heading: section.heading + ' (1/2)',
        points: section.points.slice(0, mid),
        dataPoints: undefined, // keep the data on the second half only
      });
      result.push({
        ...section,
        heading: section.heading + ' (2/2)',
        points: section.points.slice(mid),
        dataPoints: section.dataPoints,
        suggestedVisual: hasData ? 'chart' : section.suggestedVisual,
      });
    } else if (charCount < 30 && section.points.length < 2) {
      // Mark for potential merge with adjacent section
      result.push({ ...section, _sparse: true } as any);
    } else {
      result.push(section);
    }
  }

  return mergeSparse(result, targetPages);
}
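The body of `mergeSparse` is not shown in this article. A minimal sketch of what it might do follows, assuming that sparse sections are folded into the preceding section only while the deck is over its page budget. The simplified `Section` type and the `maxPoints` parameter are assumptions made for illustration.

```typescript
// Hypothetical sketch of mergeSparse; the article does not show its real body.
// Section is a cut-down stand-in for ContentSection.
interface Section {
  heading: string;
  points: string[];
  _sparse?: boolean; // set by the density pass for very short sections
}

function mergeSparse(
  sections: Section[],
  targetPages: number,
  maxPoints: number = 6, // assumed cap, matching the split threshold above
): Section[] {
  const merged: Section[] = [];
  let remaining = sections.length; // section count if merging stopped now

  for (const section of sections) {
    const prev = merged[merged.length - 1];
    if (
      section._sparse &&
      remaining > targetPages &&
      prev &&
      prev.points.length + section.points.length <= maxPoints
    ) {
      // Fold the sparse section's points into the previous section.
      merged[merged.length - 1] = {
        heading: prev.heading,
        points: [...prev.points, ...section.points],
      };
      remaining--;
    } else {
      // Copy without the internal _sparse marker.
      merged.push({ heading: section.heading, points: [...section.points] });
    }
  }
  return merged;
}
```

A sparse first section has no predecessor and is therefore kept as-is in this sketch; merging forward would be an equally reasonable choice.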

3. Stage 2: Slide Planning

3.1 Slide type assignment

// slide-planner.ts
type SlideType =
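  | 'cover'           // Cover
  | 'agenda'          // Table of contents / agenda
  | 'section-divider' // Section divider
  | 'content'         // Content slide (mostly text)
  | 'data'            // Data slide (mostly charts)
  | 'image-feature'   // Image combined with text
  | 'comparison'      // Comparison
  | 'quote'           // Quotation
  | 'summary'         // Summary
  | 'closing';        // Closing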

interface SlidePlan {
  index: number;
  type: SlideType;
  title: string;
  content: SlideContent;
  notes?: string;      // Speaker notes
}

function planSlides(
  parsed: ParsedContent,
  targetPages: number,
): SlidePlan[] {
  const plans: SlidePlan[] = [];

  // Page 1: Cover
  plans.push({
    index: 0,
    type: 'cover',
    title: parsed.title,
    content: {
      title: parsed.title,
      subtitle: parsed.subtitle ?? parsed.audience,
    },
  });

  // Page 2: Agenda (if >= 8 pages)
  if (targetPages >= 8) {
    plans.push({
      index: 1,
      type: 'agenda',
      title: 'Agenda',
      content: {
        title: 'Agenda',
        bullets: parsed.sections
          .filter(s => s.level === 1)
          .map(s => s.heading),
      },
    });
  }

  // Content pages
  // Note: the loop below emits one slide per section and does not itself
  // enforce targetPages; the density pass in 2.2 is expected to size the list.
  let pageIndex = plans.length;

  for (const section of parsed.sections) {
    // Add section divider for level-1 sections
    if (section.level === 1 && pageIndex > 2) {
      plans.push({
        index: pageIndex++,
        type: 'section-divider',
        title: section.heading,
        content: { title: section.heading },
      });
    }

    // Content page
    const slideType = determineSlideType(section);
    plans.push({
      index: pageIndex++,
      type: slideType,
      title: section.heading,
      content: {
        title: section.heading,
        bullets: section.points,
        chartData: section.dataPoints,
        imageHint: section.suggestedVisual === 'image'
          ? section.heading
          : undefined,
      },
    });
  }

  // Last page: Closing
  plans.push({
    index: pageIndex,
    type: 'closing',
    title: 'Thank you',
    content: {
      title: 'Thank you',
      subtitle: 'Questions welcome',
    },
  });

  return plans;
}

function determineSlideType(section: ContentSection): SlideType {
  if (section.dataPoints && section.dataPoints.length > 0) return 'data';
  if (section.suggestedVisual === 'image') return 'image-feature';
  if (section.suggestedVisual === 'diagram') return 'comparison';
  return 'content';
}

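4. Stage 3: LLM-Driven Intelligent Layout

4.1 Content-aware layout selection

Different kinds of content call for different layout strategies. The key insight: layout selection should be driven by content, not chosen randomly or in a fixed sequence.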

// layout-intelligence.ts

interface LayoutDecision {
  layoutType: string;
  reasoning: string;
  emphasisElement: string;    // What should draw attention first
  visualBalance: 'left' | 'right' | 'center' | 'split';
}

async function intelligentLayout(
  slidePlan: SlidePlan,
  previousLayout?: string,
  sequencePosition: number = 0,
): Promise<LayoutDecision> {
  /**
   * Rule-based fast path for layout decisions; an LLM call is only needed
   * for cases these rules do not cover.
   * Considers: content type and the previous slide's layout (avoid repetition).
   */
  const bulletCount = slidePlan.content.bullets?.length ?? 0;
  const hasImage = !!slidePlan.content.imageHint;
  // An empty chartData array must not trigger the data layout.
  const hasChart = (slidePlan.content.chartData?.length ?? 0) > 0;
  const textLength = (slidePlan.content.bullets ?? [])
    .reduce((s, b) => s + b.length, 0);

  // Rule-based fast path for obvious cases
  if (slidePlan.type === 'cover') {
    return {
      layoutType: 'title',
      reasoning: 'Cover slide always uses title layout',
      emphasisElement: 'title',
      visualBalance: 'center',
    };
  }

  if (hasChart) {
    return {
      layoutType: 'data',
      reasoning: 'Data points present, using chart-focused layout',
      emphasisElement: 'chart',
      visualBalance: 'split',
    };
  }

  // For content slides: alternate image position for rhythm
  if (hasImage) {
    const side = previousLayout === 'image-left' ? 'image-right' : 'image-left';
    return {
      layoutType: side,
      reasoning: `Alternating image position from previous: ${previousLayout}`,
      emphasisElement: 'image',
      visualBalance: side === 'image-left' ? 'left' : 'right',
    };
  }

  // Many bullets: use two-column
  if (bulletCount > 4) {
    return {
      layoutType: 'two-column',
      reasoning: `${bulletCount} bullets split across two columns`,
      emphasisElement: 'body',
      visualBalance: 'split',
    };
  }

  return {
    layoutType: 'content',
    reasoning: 'Standard content layout',
    emphasisElement: 'title',
    visualBalance: 'left',
  };
}
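The function above resolves every case with rules. Where a slide is ambiguous, an LLM could be consulted, provided its answer is checked against a closed list of layouts. The sketch below shows one way to do that. `askLayout` and the injected `ask` callback are hypothetical stand-ins for whatever chat client the pipeline uses.

```typescript
// Hypothetical LLM fallback for ambiguous slides. `ask` is injected so the
// function does not depend on any particular client library.
type Ask = (prompt: string) => Promise<string>;

// Layout names taken from the rule-based function above.
const ALLOWED_LAYOUTS = ['content', 'two-column', 'image-left', 'image-right'] as const;
type LayoutName = (typeof ALLOWED_LAYOUTS)[number];

async function askLayout(
  title: string,
  bullets: string[],
  previousLayout: string | undefined,
  ask: Ask,
): Promise<LayoutName> {
  const prompt = [
    `Pick one layout for this slide from: ${ALLOWED_LAYOUTS.join(', ')}.`,
    `Title: ${title}`,
    `Bullets: ${bullets.join(' | ')}`,
    `Previous layout: ${previousLayout ?? 'none'} (avoid repeating it).`,
    'Answer with the layout name only.',
  ].join('\n');

  try {
    const answer = (await ask(prompt)).trim().toLowerCase();
    const match = ALLOWED_LAYOUTS.find((name) => name === answer);
    if (match) return match; // accept only names from the closed list
  } catch {
    // Network or model failure: fall through to the deterministic default.
  }
  return 'content';
}
```

Because unknown answers and failures both collapse to `'content'`, the renderer downstream never sees a layout name it cannot handle.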

4.2 Visual rhythm

A good deck does not use the same layout on every slide; it has a sense of rhythm:

Cover  ->  Agenda  ->  Divider  ->  Content  ->  Image  ->  Content
  |          |           |           |           |           |
center    center      center       left        right       left
(strong)  (medium)   (strong)    (standard)  (visual)   (standard)

                   visual rhythm: strong-medium-strong-standard-visual-standard

function applyVisualRhythm(
  plans: SlidePlan[],
  layouts: LayoutDecision[],
): LayoutDecision[] {
  /**
   * Post-process layout decisions to ensure visual rhythm.
   * Avoid: 3+ consecutive same-type layouts.
   * Ensure: visual variety every 2-3 slides.
   */
  for (let i = 2; i < layouts.length; i++) {
    const prev2 = layouts[i - 2].layoutType;
    const prev1 = layouts[i - 1].layoutType;
    const current = layouts[i].layoutType;

    if (prev2 === prev1 && prev1 === current && current === 'content') {
      // Three consecutive content layouts -> force variety
      if (plans[i].content.bullets && plans[i].content.bullets!.length >= 2) {
        layouts[i] = {
          ...layouts[i],
          layoutType: 'two-column',
          reasoning: 'Forced variety to avoid 3 consecutive content layouts',
        };
      }
    }
  }

  return layouts;
}

5. Stage 4: Image Generation Integration

5.1 On-demand generation strategy

Not every slide needs an AI image. Too many images distract from the message:

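Slide type        Image policy                 Source priority
Cover             Required                     AI-generated (matched to the topic)
Section divider   Optional                     Solid or gradient background
Content           image-feature layout only    AI-generated
Data              Charts instead of images     Chart.js / Mermaid
Closing           Optional                     Brand image / solid color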
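The table can be encoded as a lookup so that the decision lives in data rather than scattered conditionals. This is a sketch only: `IMAGE_STRATEGY` and `shouldGenerateImage` are hypothetical names, and treating "optional" as "do not generate by default" is an assumption.

```typescript
// Hypothetical encoding of the strategy table; keys follow the SlideType names.
type ImagePolicy = 'required' | 'optional' | 'image-feature-only' | 'chart-instead';

interface ImageStrategy {
  policy: ImagePolicy;
  source: string;
}

const IMAGE_STRATEGY: Record<string, ImageStrategy> = {
  cover: { policy: 'required', source: 'AI-generated, matched to the topic' },
  'section-divider': { policy: 'optional', source: 'solid or gradient background' },
  content: { policy: 'image-feature-only', source: 'AI-generated' },
  data: { policy: 'chart-instead', source: 'Chart.js / Mermaid' },
  closing: { policy: 'optional', source: 'brand image or solid color' },
};

function shouldGenerateImage(slideType: string): boolean {
  // The "content" row: only the image-feature variant gets an AI image.
  if (slideType === 'image-feature') return true;
  // Assumption: optional and chart-instead slides skip AI generation by default.
  return IMAGE_STRATEGY[slideType]?.policy === 'required';
}
```

Unknown slide types fall through to `false`, which keeps image cost bounded when new types are added.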

5.2 Style-consistent batch generation

// batch-image-gen.ts

async function generateSlideImages(
  plans: SlidePlan[],
  template: PPTTemplate,
  concurrency: number = 2,
): Promise<Map<number, string>> {
  /**
   * Generate images for slides that need them.
   * Ensure style consistency across all images using template's stylePrompt.
   */
  const imageMap = new Map<number, string>();

  // Filter slides that need images
  const needsImage = plans.filter(p =>
    p.type === 'cover' ||
    p.type === 'image-feature' ||
    (p.content.imageHint && p.type !== 'data')
  );

  // Generate in batches
  for (let i = 0; i < needsImage.length; i += concurrency) {
    const batch = needsImage.slice(i, i + concurrency);
    const results = await Promise.allSettled(
      batch.map(async (plan) => {
        const prompt = [
          template.stylePrompt,
          `Topic: ${plan.title}`,
          plan.content.imageHint || plan.title,
          `Colors: ${template.colorScheme.primary}, ${template.colorScheme.secondary}`,
          'Clean professional illustration, no text, high quality',
        ].join('. ');

        const imageUrl = await generateImage(prompt, {
          width: 1920,
          height: 1080,
        });

        return { index: plan.index, url: imageUrl };
      })
    );

    for (const result of results) {
      if (result.status === 'fulfilled') {
        imageMap.set(result.value.index, result.value.url);
      }
    }
  }

  return imageMap;
}
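`generateSlideImages` silently skips any slide whose generation is rejected, so a transient failure leaves a gap in the deck. One possible mitigation is to wrap each call in a retry with exponential backoff, as sketched below. `withRetry` is a hypothetical helper, and the attempt count and delays are illustrative defaults.

```typescript
// Hypothetical retry wrapper; `task` would wrap the pipeline's generateImage call.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelayMs: number = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // every attempt failed; surface the last error
}
```

Inside the batch loop, the call would then read `withRetry(() => generateImage(prompt, { width: 1920, height: 1080 }))`, with `Promise.allSettled` still catching slides that fail after all attempts.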

6. Stage 5: Style Consistency

6.1 Design token system

// design-tokens.ts

interface DesignTokens {
  // Spacing scale (8pt grid)
  spacing: {
    xs: 8;
    sm: 16;
    md: 24;
    lg: 32;
    xl: 48;
    xxl: 64;
  };

  // Border radius
  radius: {
    none: 0;
    sm: 4;
    md: 8;
    lg: 16;
    full: 9999;
  };

  // Shadow
  shadow: {
    none: string;
    sm: string;
    md: string;
    lg: string;
  };

  // Color (from template)
  color: ColorScheme;

  // Typography (from template)
  typography: TypographyConfig;
}

function createDesignTokens(template: PPTTemplate): DesignTokens {
  return {
    spacing: { xs: 8, sm: 16, md: 24, lg: 32, xl: 48, xxl: 64 },
    radius: { none: 0, sm: 4, md: 8, lg: 16, full: 9999 },
    shadow: {
      none: 'none',
      sm: '0 1px 2px rgba(0,0,0,0.05)',
      md: '0 4px 6px rgba(0,0,0,0.07)',
      lg: '0 10px 15px rgba(0,0,0,0.1)',
    },
    color: template.colorScheme,
    typography: template.typography,
  };
}
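Tokens only help if element geometry actually lands on them. As an illustration, element boxes could be snapped onto the 8-unit spacing grid before rendering. `snapToGrid`, `snapBox`, and the `Box` type below are hypothetical and not part of the token definitions above.

```typescript
// Hypothetical helpers: snap arbitrary coordinates onto the 8-unit spacing grid.
const GRID = 8;

function snapToGrid(value: number, grid: number = GRID): number {
  return Math.round(value / grid) * grid;
}

interface Box {
  x: number;
  y: number;
  w: number;
  h: number;
}

function snapBox(box: Box): Box {
  return {
    x: snapToGrid(box.x),
    y: snapToGrid(box.y),
    w: Math.max(GRID, snapToGrid(box.w)), // never collapse an element to zero size
    h: Math.max(GRID, snapToGrid(box.h)),
  };
}

// { x: 61, y: 3, w: 2, h: 100 } becomes { x: 64, y: 0, w: 8, h: 104 }
const snapped = snapBox({ x: 61, y: 3, w: 2, h: 100 });
```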

6.2 Consistency validation

function validateConsistency(slides: ResolvedSlide[]): string[] {
  /**
   * Check that all slides maintain visual consistency.
   * Returns list of warnings.
   */
  const warnings: string[] = [];
  const fonts = new Set<string>();
  const colors = new Set<string>();

  for (const slide of slides) {
    for (const element of slide.elements) {
      if (element.fontFamily) fonts.add(element.fontFamily);
      if (element.color) colors.add(element.color);
    }
  }

  // Too many fonts
  if (fonts.size > 3) {
    warnings.push(
      `Too many fonts (${fonts.size}). Limit to 2-3 for consistency.`
    );
  }

  // Too many colors
  if (colors.size > 6) {
    warnings.push(
      `Too many colors (${colors.size}). Stick to template palette.`
    );
  }

  // Check alignment consistency
  const titlePositions = slides
    .map(s => s.elements.find(e => e.role === 'title'))
    .filter(Boolean)
    .map(e => ({ x: e!.x, y: e!.y }));

  const uniqueX = new Set(titlePositions.map(p => Math.round(p.x / 10) * 10));
  if (uniqueX.size > 2) {
    warnings.push(
      'Title positions are inconsistent across slides. Align to grid.'
    );
  }

  return warnings;
}

7. Complete Example: End-to-End Generation

Input: a 1,500-character product launch article

-> Stage 1 (content parsing, ~5s):
   Extract 5 themes + 12 key points + 3 data sets

-> Stage 2 (slide planning, ~1s):
   Plan 10 slides: cover + agenda + 2 dividers + 5 content + closing

-> Stage 3 (layout decisions, ~2s):
   cover(center) -> agenda(center) -> divider(center)
   -> content(left) -> image-right -> two-column -> data(split)
   -> divider(center) -> content(left) -> closing(center)

-> Stage 4 (image generation, ~30-60s):
   Generate 3 AI images (cover + 2 content pages)

-> Stage 5 (style unification, ~1s):
   Apply the corporate-blue template
   Validation: 2 fonts, 5 colors, titles aligned -> PASS

-> Export PPTX (~2s)

Total time: ~40-70s

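8. Key Decisions for Better Quality

Content selection > layout technique

The core value of a slide deck is not that it looks good but that it is clear. The most important decisions in automated generation are:

  1. Convey one core message per slide
  2. Use at most 5 bullets per slide (the lower bound of Miller's Law, 7 +/- 2)
  3. Show data as charts rather than text
  4. Put details in Speaker Notes, not on the slide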
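Rules 2 and 4 can be enforced mechanically after planning. The sketch below caps the bullets on a slide and moves the overflow into speaker notes. `SlideDraft` and `enforceBulletLimit` are hypothetical names, with fields that echo `SlidePlan`.

```typescript
// Hypothetical post-processing step; not part of the planner shown earlier.
interface SlideDraft {
  title: string;
  bullets: string[];
  notes?: string;
}

const MAX_BULLETS = 5;

function enforceBulletLimit(
  slide: SlideDraft,
  max: number = MAX_BULLETS,
): SlideDraft {
  if (slide.bullets.length <= max) return slide;
  const kept = slide.bullets.slice(0, max);
  const overflow = slide.bullets.slice(max);
  // Overflow is preserved in speaker notes instead of being dropped.
  const notes = [slide.notes, ...overflow].filter(Boolean).join('\n');
  return { ...slide, bullets: kept, notes };
}
```

Truncating by position assumes bullets arrive in priority order; a pipeline that cannot guarantee this might ask the LLM to rank them first.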

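The LLM's role in the pipeline

The LLM should not generate the PPT file directly. It is best used for:

  • Content parsing: extract structure from unstructured text
  • Layout suggestions: choose layouts based on content semantics
  • Copy editing: compress long sentences into bullet points
  • Image prompts: write prompts for image generation

Formatting, layout, and file generation should be handled by deterministic code rather than relying on the LLM's non-deterministic output.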


Maurice | maurice_wen@proton.me