AI PPT Generation Engine Design

An engineering architecture for the template system, layout algorithm, content-aware design, image placement, and export pipeline


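1. The Core Challenge of PPT Generation

PPT generation is not just "putting text on slides". It is a constrained layout optimization problem: within a limited canvas, text, images, charts, and other elements must be arranged into a layout that is visually harmonious and informationally clear.

The problem is hard because it spans three domains at once:

  1. Content understanding: extracting structured information (titles, key points, data) from the input text
  2. Visual design: mapping that information to visual elements (typography, color, hierarchy)
  3. Format engineering: emitting the design in standard file formats (PPTX, PDF, images)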

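System Architecture Overview

User input (text / outline / file)
       |
       v
  [Content parsing engine]
       |
       v
  [Slide planner] -- decides page count, slide type per page, content allocation
       |
       v
  [Layout engine] -- selects the best layout from template + content
       |
       v
  [Image generation] -- AI-generated illustrations / chart rendering
       |
       v
  [Style engine] -- applies color scheme, fonts, spacing
       |
       v
  [Render & export] -- PPTX / PDF / PNG
       |
       v
  Final file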
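
The flow above can be read as a chain of stages that each take the deck state and return an updated copy. The following is a minimal sketch of that idea, not the engine's actual code: `DeckState`, `stage`, and `runPipeline` are hypothetical names, the stages are placeholders, and everything is synchronous for brevity (real stages that call an LLM or an image API would be asynchronous).

```typescript
// Hypothetical deck state passed from stage to stage.
interface DeckState {
  input: string;
  outline: string[]; // one entry per planned slide
  log: string[];     // names of the stages that have run, in order
}

type Stage = (state: DeckState) => DeckState;

// Wrap a transformation as a named stage that records itself in the log.
function stage(name: string, fn: (s: DeckState) => Partial<DeckState>): Stage {
  return (s) => ({ ...s, ...fn(s), log: s.log.concat(name) });
}

// Run the stages strictly in order, mirroring the arrows in the diagram.
function runPipeline(input: string, stages: Stage[]): DeckState {
  let state: DeckState = { input, outline: [], log: [] };
  for (const s of stages) {
    state = s(state);
  }
  return state;
}

// Placeholder stages: only the parser does any work in this sketch.
const stages: Stage[] = [
  stage('parse', (s) => ({
    outline: s.input.split('\n').filter((line) => line.trim() !== ''),
  })),
  stage('plan', () => ({})),
  stage('layout', () => ({})),
  stage('images', () => ({})),
  stage('style', () => ({})),
  stage('export', () => ({})),
];
```

Keeping each stage a plain function of the state makes it possible to test or replace one stage (for example, swapping the image provider) without touching the others.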

2. Template System Design

2.1 Template Schema

A template is not a fixed PPTX file but a set of declarative layout rules:

// types/template.ts
interface PPTTemplate {
  id: string;
  name: string;
  description: string;
  category: 'business' | 'education' | 'creative' | 'minimal';

  // Visual identity
  colorScheme: ColorScheme;
  typography: TypographyConfig;

  // Layout rules
  layouts: SlideLayout[];

  // Style generation prompt (for AI image generation)
  stylePrompt: string;
  keywords: string[];

  // Dimensions
  width: number;   // pixels (default: 1920)
  height: number;  // pixels (default: 1080)
}

interface ColorScheme {
  primary: string;      // Main brand color
  secondary: string;    // Accent color
  background: string;   // Slide background
  text: string;         // Body text
  heading: string;      // Heading text
  accent: string;       // Highlights, links
  gradient?: {
    from: string;
    to: string;
    angle: number;
  };
}

interface TypographyConfig {
  headingFont: string;
  bodyFont: string;
  headingSize: number;   // px
  bodySize: number;      // px
  lineHeight: number;    // multiplier
  headingWeight: number; // 400-900
}

interface SlideLayout {
  type: 'title' | 'content' | 'two-column' | 'image-left'
    | 'image-right' | 'full-image' | 'comparison' | 'data'
    | 'quote' | 'closing';
  zones: LayoutZone[];
  padding: { top: number; right: number; bottom: number; left: number };
}

interface LayoutZone {
  id: string;
  role: 'title' | 'subtitle' | 'body' | 'image' | 'chart' | 'icon';
  bounds: { x: number; y: number; width: number; height: number }; // 0-1 normalized
  style?: Record<string, string>;
  optional?: boolean;
}
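
Because zone bounds are normalized to the 0-1 range, a malformed template can be caught before any rendering happens. The sketch below is one possible validation pass, assuming the `LayoutZone` shape above; `validateZones` and the sample `imageLeftZones` values are illustrative and not part of the original code.

```typescript
// Minimal copies of the schema types so the sketch is self-contained.
interface Bounds { x: number; y: number; width: number; height: number }
interface Zone { id: string; role: string; bounds: Bounds; optional?: boolean }

// Hypothetical helper: returns human-readable problems; an empty list
// means every zone stays inside the normalized canvas.
function validateZones(zones: Zone[]): string[] {
  const problems: string[] = [];
  const eps = 1e-9; // tolerate floating-point noise in sums such as 0.1 + 0.9
  for (const zone of zones) {
    const b = zone.bounds;
    const values = [b.x, b.y, b.width, b.height];
    if (values.some((v) => !isFinite(v) || v < 0 || v > 1)) {
      problems.push(`${zone.id}: bounds must be within [0, 1]`);
      continue;
    }
    if (b.width === 0 || b.height === 0) problems.push(`${zone.id}: zero-sized zone`);
    if (b.x + b.width > 1 + eps) problems.push(`${zone.id}: overflows right edge`);
    if (b.y + b.height > 1 + eps) problems.push(`${zone.id}: overflows bottom edge`);
  }
  return problems;
}

// Example zones for an 'image-left' layout: image on the left 45%,
// title and body stacked on the right half.
const imageLeftZones: Zone[] = [
  { id: 'img', role: 'image', bounds: { x: 0, y: 0, width: 0.45, height: 1 } },
  { id: 'title', role: 'title', bounds: { x: 0.5, y: 0, width: 0.5, height: 0.2 } },
  { id: 'body', role: 'body', bounds: { x: 0.5, y: 0.25, width: 0.5, height: 0.75 } },
];
```

Running a check like this when templates are loaded, rather than at render time, keeps a bad template from failing halfway through a deck.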

2.2 Template Resolution and Application

// template-engine.ts

class TemplateEngine {
  private templates: Map<string, PPTTemplate>;

  constructor(templates: PPTTemplate[]) {
    this.templates = new Map(templates.map(t => [t.id, t]));
  }

  resolveTemplate(templateId: string): PPTTemplate {
    const template = this.templates.get(templateId);
    if (!template) {
      throw new Error(`Template not found: ${templateId}`);
    }
    return template;
  }

  selectLayout(
    template: PPTTemplate,
    slideContent: SlideContent,
  ): SlideLayout {
    /**
     * Content-aware layout selection.
     * Choose the best layout based on what content is available.
     */
    const hasImage = !!slideContent.image;
    const hasChart = !!slideContent.chartData;
    const bulletCount = slideContent.bullets?.length ?? 0;
    const isTitle = slideContent.slideType === 'title';

    if (isTitle) {
      return this.findLayout(template, 'title');
    }

    if (hasChart) {
      return this.findLayout(template, 'data');
    }

    if (hasImage && bulletCount > 0) {
      // Alternate image position for visual rhythm
      return this.findLayout(
        template,
        slideContent.index % 2 === 0 ? 'image-left' : 'image-right'
      );
    }

    if (hasImage && bulletCount === 0) {
      return this.findLayout(template, 'full-image');
    }

    if (bulletCount > 4) {
      return this.findLayout(template, 'two-column');
    }

    return this.findLayout(template, 'content');
  }

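  private findLayout(template: PPTTemplate, type: SlideLayout['type']): SlideLayout {
    // Fall back to the generic 'content' layout, and fail loudly if the
    // template defines neither instead of returning undefined.
    const layout = template.layouts.find(l => l.type === type)
      ?? template.layouts.find(l => l.type === 'content');
    if (!layout) {
      throw new Error(`Template ${template.id} has no '${type}' or 'content' layout`);
    }
    return layout;
  }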

  applyColorScheme(
    baseColors: ColorScheme,
    overrides?: Partial<ColorScheme>,
  ): ColorScheme {
    /**
     * Apply color overrides, filtering out undefined values.
     * This prevents the { ...defaults, ...overrides } trap
     * where undefined clobbers defaults.
     */
    if (!overrides) return baseColors;

    const filtered = Object.fromEntries(
      Object.entries(overrides).filter(([_, v]) => v !== undefined)
    );

    return { ...baseColors, ...filtered };
  }
}
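
The trap that `applyColorScheme` guards against is easy to reproduce. The sketch below contrasts a naive spread merge with a filtered merge; `Colors` and `mergeDefined` are illustrative names, and the color values are borrowed from the palettes in this article.

```typescript
interface Colors { primary: string; background: string }

const defaults: Colors = { primary: '#1a365d', background: '#ffffff' };

// An override object as it might arrive when one field was left unset.
const overrides: Partial<Colors> = { primary: '#6b21a8', background: undefined };

// Naive merge: the explicit undefined overwrites the default, and the
// compiler still reports `background` as a string.
const naive = { ...defaults, ...overrides };

// Filtered merge: copy only the keys whose value is actually defined.
function mergeDefined<T extends object>(base: T, patch?: Partial<T>): T {
  const result = { ...base };
  if (!patch) return result;
  for (const key of Object.keys(patch) as Array<keyof T>) {
    const value = patch[key];
    if (value !== undefined) result[key] = value as T[keyof T];
  }
  return result;
}

const safe = mergeDefined(defaults, overrides);

console.log(naive.background); // undefined
console.log(safe.background);  // #ffffff
```

Note that the type checker does not flag the naive version, which is why the filtering has to be done explicitly at runtime.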

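3. Layout Algorithm

3.1 Constraint-Satisfaction Layout

Slide layout is essentially a constraint satisfaction problem (CSP): each element has position and size constraints, elements must not overlap, and the slide as a whole needs visual balance.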

// layout-solver.ts

interface LayoutConstraint {
  element: string;
  type: 'position' | 'size' | 'alignment' | 'spacing';
  value: unknown;
}

class LayoutSolver {
  private readonly canvasWidth: number;
  private readonly canvasHeight: number;
  private readonly padding: { top: number; right: number; bottom: number; left: number };

  constructor(width: number, height: number, padding: typeof LayoutSolver.prototype.padding) {
    this.canvasWidth = width;
    this.canvasHeight = height;
    this.padding = padding;
  }

  solve(zones: LayoutZone[], content: SlideContent): ResolvedElement[] {
    const elements: ResolvedElement[] = [];
    const usableWidth = this.canvasWidth - this.padding.left - this.padding.right;
    const usableHeight = this.canvasHeight - this.padding.top - this.padding.bottom;

    for (const zone of zones) {
      // Skip optional zones with no content
      if (zone.optional && !this.hasContentForZone(zone, content)) {
        continue;
      }

      const resolved: ResolvedElement = {
        id: zone.id,
        role: zone.role,
        x: this.padding.left + zone.bounds.x * usableWidth,
        y: this.padding.top + zone.bounds.y * usableHeight,
        width: zone.bounds.width * usableWidth,
        height: zone.bounds.height * usableHeight,
        content: this.getContentForZone(zone, content),
        style: zone.style ?? {},
      };

      // Auto-adjust text size to fit
      if (zone.role === 'body' || zone.role === 'title') {
        resolved.fontSize = this.calculateFontSize(
          resolved.content as string,
          resolved.width,
          resolved.height,
          zone.role === 'title' ? 48 : 24,
        );
      }

      elements.push(resolved);
    }

    return elements;
  }

  private calculateFontSize(
    text: string,
    maxWidth: number,
    maxHeight: number,
    idealSize: number,
  ): number {
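    /**
     * Linear search downward from idealSize (in 2px steps) for the
     * largest font size that fits the box.
     * Approximate: assume average char width = fontSize * 0.6, which
     * suits Latin text; full-width CJK glyphs are closer to 1.0 * fontSize.
     */
    const lines = text.split('\n');
    let fontSize = idealSize;

    while (fontSize > 12) {
      const charWidth = fontSize * 0.6;
      const lineHeight = fontSize * 1.5;
      // Guard against a zero divisor when the box is narrower than one glyph
      const charsPerLine = Math.max(1, Math.floor(maxWidth / charWidth));

      let totalLines = 0;
      for (const line of lines) {
        // An empty line still occupies one row
        totalLines += Math.max(1, Math.ceil(line.length / charsPerLine));
      }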

      if (totalLines * lineHeight <= maxHeight) {
        return fontSize;
      }

      fontSize -= 2;
    }

    return 12; // minimum readable size
  }

  private hasContentForZone(zone: LayoutZone, content: SlideContent): boolean {
    switch (zone.role) {
      case 'image': return !!content.image;
      case 'chart': return !!content.chartData;
      case 'subtitle': return !!content.subtitle;
      default: return true;
    }
  }

  private getContentForZone(zone: LayoutZone, content: SlideContent): unknown {
    switch (zone.role) {
      case 'title': return content.title;
      case 'subtitle': return content.subtitle;
      case 'body': return content.bullets?.join('\n') ?? content.bodyText ?? '';
      case 'image': return content.image;
      case 'chart': return content.chartData;
      default: return '';
    }
  }
}
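
`LayoutSolver` maps each zone to pixels independently, so the "elements must not overlap" constraint from the start of this section is not checked anywhere in it. A minimal sketch of such a check over resolved rectangles follows; `Rect`, `overlaps`, and `findOverlaps` are hypothetical names, not part of the solver.

```typescript
interface Rect { id: string; x: number; y: number; width: number; height: number }

// Two axis-aligned rectangles overlap only if they overlap on both axes.
// Rectangles that merely share an edge are not counted as overlapping.
function overlaps(a: Rect, b: Rect): boolean {
  return a.x < b.x + b.width && b.x < a.x + a.width
    && a.y < b.y + b.height && b.y < a.y + a.height;
}

// Compare every pair once; quadratic cost is fine for the handful of
// zones found on a single slide.
function findOverlaps(rects: Rect[]): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  rects.forEach((a, i) => {
    rects.slice(i + 1).forEach((b) => {
      if (overlaps(a, b)) pairs.push([a.id, b.id]);
    });
  });
  return pairs;
}

// Example: the image intrudes into both text zones, while title and
// body only touch along an edge.
const resolved: Rect[] = [
  { id: 'title', x: 0, y: 0, width: 100, height: 20 },
  { id: 'body', x: 0, y: 20, width: 100, height: 80 },
  { id: 'image', x: 50, y: 10, width: 50, height: 50 },
];
console.log(findOverlaps(resolved)); // [['title', 'image'], ['body', 'image']]
```

A check like this could run after `solve()` as an assertion, or feed a repair step that shrinks or moves the offending zone.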

4. Color Scheme Generation

4.1 Theme-Based Automatic Color Selection

// color-generator.ts

interface ColorPalette {
  primary: string;
  secondary: string;
  accent: string;
  background: string;
  text: string;
  heading: string;
}

function generateColorScheme(
  topic: string,
  mood: 'professional' | 'creative' | 'warm' | 'cool' | 'dark',
): ColorPalette {
  /**
   * Select a predefined palette by mood.
   * The topic parameter is currently unused: selection depends on
   * mood only.
   */
  const palettes: Record<string, ColorPalette> = {
    professional: {
      primary: '#1a365d',
      secondary: '#2b6cb0',
      accent: '#3182ce',
      background: '#ffffff',
      text: '#2d3748',
      heading: '#1a202c',
    },
    creative: {
      primary: '#6b21a8',
      secondary: '#a855f7',
      accent: '#f59e0b',
      background: '#faf5ff',
      text: '#374151',
      heading: '#1f2937',
    },
    warm: {
      primary: '#c2410c',
      secondary: '#ea580c',
      accent: '#f97316',
      background: '#fffbeb',
      text: '#451a03',
      heading: '#7c2d12',
    },
    cool: {
      primary: '#0e7490',
      secondary: '#06b6d4',
      accent: '#22d3ee',
      background: '#ecfeff',
      text: '#164e63',
      heading: '#155e75',
    },
    dark: {
      primary: '#f8fafc',
      secondary: '#94a3b8',
      accent: '#3b82f6',
      background: '#0f172a',
      text: '#cbd5e1',
      heading: '#f1f5f9',
    },
  };

  return palettes[mood] ?? palettes.professional;
}

function ensureContrast(
  foreground: string,
  background: string,
  minRatio: number = 4.5,
): string {
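  /**
   * WCAG contrast check.
   * If contrast is insufficient, adjust the foreground color.
   * Note: this is a single-step adjustment and the result is not
   * re-checked, so callers that need a guarantee should verify the
   * returned color. calculateContrastRatio, relativeLuminance, darken
   * and lighten are assumed to be defined elsewhere.
   */
  const ratio = calculateContrastRatio(foreground, background);
  if (ratio >= minRatio) return foreground;

  // Darken the foreground on light backgrounds, lighten it on dark ones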
  const bgLuminance = relativeLuminance(background);
  if (bgLuminance > 0.5) {
    return darken(foreground, (minRatio - ratio) * 10);
  } else {
    return lighten(foreground, (minRatio - ratio) * 10);
  }
}
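
`ensureContrast` relies on helpers that are not shown. The sketch below fills them in using the WCAG 2.x definitions of relative luminance and contrast ratio, and adds a stepwise variant that re-checks the result after each adjustment. The function names and the 20-step blend toward black or white are assumptions made for illustration, not the original implementation.

```typescript
// Parse '#rrggbb' (or 'rrggbb') into [r, g, b], each channel 0..255.
function parseHex(hex: string): [number, number, number] {
  const h = hex.trim().replace(/^#/, '');
  if (!/^[0-9a-fA-F]{6}$/.test(h)) {
    throw new Error(`Expected a 6-digit hex color, got: ${hex}`);
  }
  const n = parseInt(h, 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

// Convert one 0..255 sRGB channel to linear light (WCAG 2.x definition).
function linearize(channel: number): number {
  const s = channel / 255;
  return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  const [r, g, b] = parseHex(hex);
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Ratio of the lighter to the darker luminance, each offset by 0.05.
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// Linear blend in sRGB space; t = 0 gives `from`, t = 1 gives `to`.
function mix(from: string, to: string, t: number): string {
  const a = parseHex(from);
  const b = parseHex(to);
  const channels = a.map((c, i) => Math.round(c + (b[i] - c) * t));
  return '#' + channels.map((c) => ('0' + c.toString(16)).slice(-2)).join('');
}

// Move the foreground toward black or white (whichever contrasts more
// with the background) until the ratio is met, checking at every step.
function ensureContrastStepwise(fg: string, bg: string, minRatio = 4.5): string {
  if (contrastRatio(fg, bg) >= minRatio) return fg;
  const target =
    contrastRatio('#000000', bg) >= contrastRatio('#ffffff', bg) ? '#000000' : '#ffffff';
  for (let step = 1; step <= 20; step++) {
    const candidate = mix(fg, target, step / 20);
    if (contrastRatio(candidate, bg) >= minRatio) return candidate;
  }
  return target; // best effort when even pure black or white falls short
}
```

For the default ratio of 4.5 the loop always succeeds, because either black or white reaches at least that contrast against any background; stricter ratios such as 7 can fall through to the best-effort return.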

5. Image Placement and AI Image Generation Integration

5.1 Content-Aware Image Placement

// image-placement.ts

interface ImagePlacement {
  x: number;
  y: number;
  width: number;
  height: number;
  objectFit: 'cover' | 'contain' | 'fill';
  mask?: 'none' | 'rounded' | 'circle' | 'blob';
}

function calculateImagePlacement(
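  // Must be in pixels (or another square unit): 0-1 normalized bounds
  // on a non-square canvas would distort zoneAspect.
  zoneBounds: { x: number; y: number; width: number; height: number },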
  imageAspect: number, // width / height
  layoutType: string,
): ImagePlacement {
  const zoneAspect = zoneBounds.width / zoneBounds.height;

  if (layoutType === 'full-image') {
    // Full bleed: cover the entire zone
    return {
      ...zoneBounds,
      objectFit: 'cover',
      mask: 'none',
    };
  }

  if (layoutType === 'image-left' || layoutType === 'image-right') {
    // Side image: contain within zone, center vertically
    if (imageAspect > zoneAspect) {
      // Image is wider than zone
      const height = zoneBounds.width / imageAspect;
      const yOffset = (zoneBounds.height - height) / 2;
      return {
        x: zoneBounds.x,
        y: zoneBounds.y + yOffset,
        width: zoneBounds.width,
        height,
        objectFit: 'contain',
        mask: 'rounded',
      };
    } else {
      const width = zoneBounds.height * imageAspect;
      const xOffset = (zoneBounds.width - width) / 2;
      return {
        x: zoneBounds.x + xOffset,
        y: zoneBounds.y,
        width,
        height: zoneBounds.height,
        objectFit: 'contain',
        mask: 'rounded',
      };
    }
  }

  // Default: contain with center alignment
  return {
    ...zoneBounds,
    objectFit: 'contain',
    mask: 'rounded',
  };
}
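
A worked example makes the contain-and-center arithmetic concrete. The sketch below first scales a normalized zone to pixels, since the aspect comparison only makes sense in a square unit, then fits a landscape and a portrait image into it. `toPixels` and `fitContain` are illustrative helpers that ignore slide padding; they are not the functions above.

```typescript
interface Box { x: number; y: number; width: number; height: number }

// Scale 0-1 normalized bounds to pixel bounds on a given canvas.
function toPixels(b: Box, canvasW: number, canvasH: number): Box {
  return {
    x: b.x * canvasW,
    y: b.y * canvasH,
    width: b.width * canvasW,
    height: b.height * canvasH,
  };
}

// Largest box with the image's aspect ratio that fits inside the zone,
// centered along the axis that has spare room.
function fitContain(zone: Box, imageAspect: number): Box {
  const zoneAspect = zone.width / zone.height;
  if (imageAspect > zoneAspect) {
    // Image is relatively wider: fill the width, center vertically.
    const height = zone.width / imageAspect;
    return { x: zone.x, y: zone.y + (zone.height - height) / 2, width: zone.width, height };
  }
  // Image is relatively taller: fill the height, center horizontally.
  const width = zone.height * imageAspect;
  return { x: zone.x + (zone.width - width) / 2, y: zone.y, width, height: zone.height };
}

// Left half of a 1920x1080 slide: 960 x 1080 px, aspect about 0.89.
const leftHalf = toPixels({ x: 0, y: 0, width: 0.5, height: 1 }, 1920, 1080);

const wide = fitContain(leftHalf, 2);   // 2:1 landscape image
const tall = fitContain(leftHalf, 0.5); // 1:2 portrait image

console.log(wide); // { x: 0, y: 300, width: 960, height: 480 }
console.log(tall); // { x: 210, y: 0, width: 540, height: 1080 }
```

The normalized zone has an aspect of 0.5, while the pixel zone is about 0.89, so skipping the conversion would send the portrait image down the wrong branch.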

5.2 AI Image Generation Integration

// slide-image-generator.ts

async function generateSlideImage(
  content: SlideContent,
  template: PPTTemplate,
  quality: '2k' | '4k' = '2k',
): Promise<string> {
  /**
   * Generate an image that fits the slide's visual context.
   * The prompt incorporates template style for consistency.
   */
  const sizeMap = {
    '2k': { width: 1920, height: 1080 },
    '4k': { width: 3840, height: 2160 },
  };
  const size = sizeMap[quality];

  const prompt = buildImagePrompt(content, template);

  // Try primary provider, fallback to secondary
  try {
    return await generateWithGoogle(prompt, size);
  } catch {
    return await generateWithPoe(prompt, size);
  }
}

function buildImagePrompt(
  content: SlideContent,
  template: PPTTemplate,
): string {
  /**
   * Construct image generation prompt that maintains
   * visual consistency with the template style.
   */
  const parts = [
    template.stylePrompt,
    `Subject: ${content.title}`,
    content.imageHint ? `Visual: ${content.imageHint}` : '',
    `Color palette: ${template.colorScheme.primary}, ${template.colorScheme.secondary}`,
    template.keywords.join(', '),
    'Professional quality, clean composition, no text overlay',
  ];

  return parts.filter(Boolean).join('. ');
}

6. Export Pipeline

6.1 PPTX Generation (Using python-pptx)

# pptx_exporter.py
from pptx import Presentation
from pptx.util import Inches, Pt, Emu
from pptx.dml.color import RGBColor
from pptx.enum.text import PP_ALIGN
from io import BytesIO
import requests


def export_pptx(
    slides_data: list[dict],
    template_config: dict,
    output_path: str,
) -> str:
    """Export resolved slides to PPTX file."""
    prs = Presentation()
    prs.slide_width = Emu(template_config['width'] * 914400 // 96)
    prs.slide_height = Emu(template_config['height'] * 914400 // 96)

    colors = template_config['colorScheme']

    for slide_data in slides_data:
        slide_layout = prs.slide_layouts[6]  # Blank layout
        slide = prs.slides.add_slide(slide_layout)

        # Set background
        bg = slide.background
        fill = bg.fill
        fill.solid()
        fill.fore_color.rgb = RGBColor.from_string(
            colors['background'].lstrip('#')
        )

        # Add elements
        for element in slide_data['elements']:
            if element['role'] in ('title', 'subtitle', 'body'):
                add_text_element(slide, element, colors, template_config)
            elif element['role'] == 'image':
                add_image_element(slide, element)

    prs.save(output_path)
    return output_path


def add_text_element(
    slide, element: dict, colors: dict, config: dict,
) -> None:
    """Add a text box to the slide."""
    left = Emu(int(element['x'] * 914400 / 96))
    top = Emu(int(element['y'] * 914400 / 96))
    width = Emu(int(element['width'] * 914400 / 96))
    height = Emu(int(element['height'] * 914400 / 96))

    txBox = slide.shapes.add_textbox(left, top, width, height)
    tf = txBox.text_frame
    tf.word_wrap = True

    # Determine text properties based on role.
    # fontSize arrives in pixels; at 96 px per inch, 1 px = 0.75 pt.
    typography = config['typography']
    if element['role'] == 'title':
        font_px = element.get('fontSize', 48)
        font_color = colors['heading']
        font_bold = True
        font_name = typography['headingFont']
    elif element['role'] == 'subtitle':
        font_px = element.get('fontSize', 24)
        font_color = colors['text']
        font_bold = False
        font_name = typography['bodyFont']
    else:
        font_px = element.get('fontSize', 18)
        font_color = colors['text']
        font_bold = False
        font_name = typography['bodyFont']

    font_size = Pt(font_px * 72 / 96)
    alignment = PP_ALIGN.LEFT

    # Split text into paragraphs
    text = str(element.get('content', ''))
    for i, line in enumerate(text.split('\n')):
        if i == 0:
            p = tf.paragraphs[0]
        else:
            p = tf.add_paragraph()

        p.text = line
        p.font.size = font_size
        p.font.color.rgb = RGBColor.from_string(font_color.lstrip('#'))
        p.font.bold = font_bold
        p.alignment = alignment
        p.font.name = font_name


def add_image_element(slide, element: dict) -> None:
    """Add an image to the slide."""
    image_url = element.get('content')
    if not image_url:
        return

    # Download image; fail fast on HTTP errors instead of embedding an error page
    response = requests.get(image_url, timeout=30)
    response.raise_for_status()
    image_stream = BytesIO(response.content)

    left = Emu(int(element['x'] * 914400 / 96))
    top = Emu(int(element['y'] * 914400 / 96))
    width = Emu(int(element['width'] * 914400 / 96))
    height = Emu(int(element['height'] * 914400 / 96))

    slide.shapes.add_picture(image_stream, left, top, width, height)
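
The exporter converts pixel coordinates to EMU at 96 pixels per inch (914400 EMU per inch). The arithmetic is sketched below, written in TypeScript to match the rest of the article's examples; the helper names are illustrative. On this scale a pixel font size is not a point size: 48 px corresponds to 36 pt, so a pixel value passed unchanged as points would render a third larger than the layout assumed.

```typescript
const EMU_PER_INCH = 914400;
const PX_PER_INCH = 96; // pixel density assumed by the exporter above
const PT_PER_INCH = 72;

// Pixels to EMU, rounded because EMU values are integers.
const pxToEmu = (px: number): number => Math.round((px * EMU_PER_INCH) / PX_PER_INCH);

// Pixels to typographic points on the same 96 px/inch scale.
const pxToPt = (px: number): number => (px * PT_PER_INCH) / PX_PER_INCH;

const emuToInches = (emu: number): number => emu / EMU_PER_INCH;

console.log(pxToEmu(1920));              // 18288000
console.log(emuToInches(pxToEmu(1920))); // 20
console.log(emuToInches(pxToEmu(1080))); // 11.25
console.log(pxToPt(48));                 // 36
```

So a 1920x1080 template becomes a 20 in by 11.25 in slide, and every pixel measurement shares a single scale factor of 9525 EMU per pixel.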

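6.2 Multi-Format Export

Format    Tool                        Use case                     Quality
PPTX      python-pptx                 Editable presentation        Native vector
PDF       LibreOffice headless        Non-editable distribution
PNG/JPG   Puppeteer / wkhtmltoimage   Social media thumbnails      Depends on resolution
HTML      Custom renderer             Web preview                  Pixel-level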

7. End-to-End Flow Example

Input: "Make me a PPT about 2026 AI trends, 10 pages, business style"

Step 1 - Content parsing:
  LLM generates a 10-page outline (title + bullets for each page)

Step 2 - Template selection:
  Match "business" category -> "corporate-blue" template

Step 3 - Layout planning:
  Page 1: title layout
  Pages 2-8: content / two-column / image-left (auto-selected)
  Page 9: data layout (with chart)
  Page 10: closing layout

Step 4 - Image generation:
  Generate AI images for the 5 pages that need illustrations (parallel, 2 at a time)

Step 5 - Style application:
  Apply corporate-blue color scheme + typography

Step 6 - Render and export:
  Generate PPTX file -> Upload to R2 -> Return download URL
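
Step 4 generates images "parallel, 2 at a time". One common way to get that behavior is a small worker pool, sketched below. `runWithLimit` and `fakeGenerate` are hypothetical: the latter stands in for a real provider call and returns a placeholder string instead of an image URL.

```typescript
// Run async tasks with at most `limit` in flight; results keep input order.
// If any task rejects, the returned promise rejects with that error.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array<T>(tasks.length);
  let next = 0; // index of the next task to claim

  // Each worker repeatedly claims the next unclaimed task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  const workers: Array<Promise<void>> = [];
  for (let i = 0; i < workerCount; i++) workers.push(worker());
  await Promise.all(workers);
  return results;
}

// Stand-in for an image provider call; also tracks peak concurrency.
let inFlight = 0;
let peakInFlight = 0;
const fakeGenerate = (page: number) => async (): Promise<string> => {
  inFlight++;
  peakInFlight = Math.max(peakInFlight, inFlight);
  await Promise.resolve(); // yield, as a network call would
  inFlight--;
  return `image-for-page-${page}`;
};

// Usage: five pages need images, two requests at a time.
const imageJobs = [3, 4, 5, 6, 7].map(fakeGenerate);
const pendingImages = runWithLimit(imageJobs, 2);
```

Because each worker claims an index before awaiting, results land in the same order as the input pages regardless of which request finishes first.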

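8. Common Pitfalls and Lessons Learned

The undefined trap in template resolution

When the frontend sends { id: "template-id" } instead of the full template data, the backend must resolve the full template via findTemplateById(). Merging the request object directly with a spread leaves fields such as stylePrompt and colors undefined, and the generated PPT loses all of its styling.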
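
A minimal sketch of that resolution step follows. It assumes an in-memory registry and treats the server-side definition as the source of truth; the `Template` shape, the `TEMPLATES` registry, the sample style prompt, and `resolveTemplateRef` are all illustrative.

```typescript
interface Template {
  id: string;
  stylePrompt: string;
  colors: { primary: string };
}

// What the frontend may send: always an id, possibly nothing else.
type TemplateRef = { id: string } & Partial<Template>;

// Hypothetical registry; a real system might load this from storage.
const TEMPLATES: Record<string, Template> = {
  'corporate-blue': {
    id: 'corporate-blue',
    stylePrompt: 'clean corporate style', // placeholder value
    colors: { primary: '#1a365d' },
  },
};

function findTemplateById(id: string): Template | undefined {
  return Object.prototype.hasOwnProperty.call(TEMPLATES, id) ? TEMPLATES[id] : undefined;
}

// Always look the template up by id and fail loudly when it is unknown,
// rather than trusting whatever fields the request happened to carry.
function resolveTemplateRef(ref: TemplateRef): Template {
  const full = findTemplateById(ref.id);
  if (!full) {
    throw new Error(`Unknown template id: ${ref.id}`);
  }
  return full;
}

const ref: TemplateRef = { id: 'corporate-blue' };
console.log(ref.stylePrompt);                     // undefined
console.log(resolveTemplateRef(ref).stylePrompt); // clean corporate style
```

Throwing on an unknown id surfaces the problem at the API boundary instead of producing an unstyled deck several stages later.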

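Chinese font handling

When using Chinese fonts in PPTX:

  • The corresponding font (e.g. Source Han Sans) must be installed on the system
  • Or use web font embedding
  • Font names may differ across operating systems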

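Image resolution

Generated images must meet the resolution requirements of the final output:

  • Standard mode (1920x1080): images at least 2K
  • HD mode (3840x2160): images at least 4K
  • Images below the requirement must be rejected and must not enter the rendering flow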
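
The rule in the last bullet can be enforced with a small gate in front of the renderer. The sketch below assumes that "2K" and "4K" mean the 1920x1080 and 3840x2160 sizes used by the sizeMap in section 5.2; `OutputMode`, `meetsResolution`, and `gateImages` are hypothetical names.

```typescript
type OutputMode = 'standard' | 'hd';
interface Size { width: number; height: number }

// Minimum image size per output mode (assumed to match section 5.2).
const REQUIRED_SIZE: Record<OutputMode, Size> = {
  standard: { width: 1920, height: 1080 },
  hd: { width: 3840, height: 2160 },
};

function meetsResolution(image: Size, mode: OutputMode): boolean {
  const min = REQUIRED_SIZE[mode];
  return image.width >= min.width && image.height >= min.height;
}

// Split candidate images into accepted and rejected before rendering,
// so that undersized images never reach the export stage.
function gateImages<T extends Size>(
  images: T[],
  mode: OutputMode,
): { accepted: T[]; rejected: T[] } {
  const accepted: T[] = [];
  const rejected: T[] = [];
  for (const img of images) {
    (meetsResolution(img, mode) ? accepted : rejected).push(img);
  }
  return { accepted, rejected };
}

const candidates = [
  { page: 2, width: 1920, height: 1080 },
  { page: 3, width: 1280, height: 720 },
  { page: 4, width: 3840, height: 2160 },
];
const gated = gateImages(candidates, 'standard');
console.log(gated.rejected.map((img) => img.page)); // [3]
```

Rejected images could then be regenerated at a higher quality setting or replaced by a layout that does not need an image.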

Maurice | maurice_wen@proton.me