Structured Output Techniques: Getting Reliable JSON from LLMs

An engineering guide to JSON Mode, Function Calling, schema validation, and error recovery | 2026-02


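1. Why Structured Output Is Needed

An LLM's default output is natural-language text. In an engineering system, however, downstream components need structured data that can be parsed, validated, and typed. Unstructured output causes the following problems:

  1. Parse failures: malformed JSON (trailing commas, unclosed brackets)
  2. Missing fields: the LLM "forgets" to output some required fields
  3. Type errors: numbers emitted as strings, booleans emitted as "yes/no"
  4. Hallucinated fields: extra fields that do not exist in the schema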
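
The four failure modes can be told apart mechanically. The following is a minimal sketch using only the standard library; the field names in EXPECTED_TYPES and the function name diagnose are hypothetical and exist only for this illustration.

```python
import json

# Hypothetical expected shape, used only for this illustration.
EXPECTED_TYPES = {"name": str, "score": float, "recommend": bool}


def diagnose(raw: str) -> list[str]:
    """Return the failure modes found in one raw model output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["parse_failure"]
    if not isinstance(data, dict):
        return ["not_an_object"]

    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in data:
            problems.append(f"missing_field:{field}")
            continue
        value = data[field]
        if expected is float:
            # bool is a subclass of int in Python, so exclude it explicitly.
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        else:
            ok = isinstance(value, expected)
        if not ok:
            problems.append(f"type_error:{field}")
    for field in data:
        if field not in EXPECTED_TYPES:
            problems.append(f"extra_field:{field}")
    return problems


print(diagnose('{"name": "A", "score": "0.9", "recommend": "yes"}'))
```

An empty list means the output passed all four checks; anything else names the failure mode and, where relevant, the offending field.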

2. Comparison of Approaches

2.1 Five Main Approaches

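Approach | How it works | Reliability | Flexibility | Provider support
Prompt-only constraints | Ask for JSON in the prompt | Low (60-80%) | | All
JSON Mode | API parameter forces JSON output | Medium (90-95%) | | OpenAI/Anthropic
Function Calling | Define a function schema | High (95-99%) | | OpenAI/Anthropic/Google
Structured Output | Strict schema adherence | Highest (99.9%) | | OpenAI (most mature)
Constrained decoding | Token-level constraints at inference time | Highest (100%) | | SGLang/vLLM/Outlines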

2.2 Decision Tree

Do you need 100% valid JSON?
|
+-- No (tolerance for occasional failures):
|   Use JSON Mode + retry
|
+-- Yes:
    |
    +-- Using OpenAI?
    |   -> Structured Outputs (response_format with schema)
    |
    +-- Using self-hosted model?
    |   -> Constrained Decoding (SGLang/Outlines)
    |
    +-- Using other provider?
        -> Function Calling + Pydantic validation + retry
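
The same tree can be written as a small routing function, which is handy when the choice is made from configuration. This is only a sketch: the function name and the provider labels "openai" and "self-hosted" are assumptions made for the example.

```python
def choose_strategy(need_strict: bool, provider: str) -> str:
    """Mirror the decision tree: strictness first, then provider."""
    if not need_strict:
        # Occasional failures are tolerable.
        return "JSON Mode + retry"
    if provider == "openai":
        return "Structured Outputs (response_format with schema)"
    if provider == "self-hosted":
        return "Constrained Decoding (SGLang/Outlines)"
    # Any other hosted provider.
    return "Function Calling + Pydantic validation + retry"


print(choose_strategy(need_strict=True, provider="self-hosted"))
```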

3. Provider Approaches in Detail

3.1 OpenAI Structured Outputs

from openai import OpenAI
from pydantic import BaseModel

class ProductReview(BaseModel):
    sentiment: str          # "positive" | "negative" | "neutral"
    score: float            # 0.0 - 1.0
    key_topics: list[str]   # Extracted topics
    summary: str            # One-sentence summary
    recommendation: bool    # Would recommend?

client = OpenAI()

response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Analyze the product review."},
        {"role": "user", "content": "This laptop is amazing! Great battery..."},
    ],
    response_format=ProductReview,  # Pydantic model directly
)

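# parsed is a typed Pydantic object; it is None if the model refused to answer
message = response.choices[0].message
if message.parsed is None:
    raise RuntimeError(f"No structured result: {message.refusal}")
review: ProductReview = message.parsed
print(f"Sentiment: {review.sentiment}, Score: {review.score}")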
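
When the schema is written by hand as JSON Schema instead of being derived from a Pydantic model, OpenAI's strict mode expects every object to list all of its properties under "required" and to set "additionalProperties" to false. The helper below is a sketch of that tightening step; to_strict_schema is a hypothetical name, not part of the OpenAI SDK.

```python
import copy


def to_strict_schema(schema: dict) -> dict:
    """Return a copy in which every object requires all of its properties
    and forbids properties that are not declared."""
    result = copy.deepcopy(schema)
    _tighten(result)
    return result


def _tighten(node) -> None:
    if isinstance(node, dict):
        if node.get("type") == "object" and "properties" in node:
            node["required"] = list(node["properties"])
            node["additionalProperties"] = False
        for value in list(node.values()):
            _tighten(value)
    elif isinstance(node, list):
        for item in node:
            _tighten(item)


loose = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "author": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
        },
    },
}
strict = to_strict_schema(loose)
print(strict["required"], strict["properties"]["author"]["required"])
```

Because every field becomes required, an optional value is usually modelled as a union with null rather than by leaving the key out.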

3.2 Anthropic Tool Use

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{
        "name": "extract_review",
        "description": "Extract structured data from a product review",
        "input_schema": {
            "type": "object",
            "properties": {
                "sentiment": {
                    "type": "string",
                    "enum": ["positive", "negative", "neutral"],
                },
                "score": {
                    "type": "number",
                    "minimum": 0, "maximum": 1,
                },
                "key_topics": {
                    "type": "array",
                    "items": {"type": "string"},
                },
                "summary": {"type": "string"},
                "recommendation": {"type": "boolean"},
            },
            "required": ["sentiment", "score", "key_topics", "summary", "recommendation"],
        },
    }],
    tool_choice={"type": "tool", "name": "extract_review"},  # Force tool use
    messages=[
        {"role": "user", "content": "Analyze: This laptop is amazing! Great battery..."},
    ],
)

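# With a forced tool_choice the reply contains a tool_use block. Its input is
# model-generated, so validate it yourself rather than assuming it conforms.
tool_use = next(block for block in response.content if block.type == "tool_use")
tool_input = tool_use.input  # dict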
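
A reply can hold text blocks next to the tool_use block, and the tool input is generated by the model, so it is safer to locate the block by type and check it before use. The sketch below works on plain dicts shaped like the JSON body of a Messages API response (the Python SDK exposes the same fields as attributes); extract_tool_input and the sample payload are hypothetical.

```python
def extract_tool_input(content: list[dict], tool_name: str, required: list[str]) -> dict:
    """Find the tool_use block for tool_name and check its required keys."""
    for block in content:
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            payload = block.get("input", {})
            missing = [key for key in required if key not in payload]
            if missing:
                raise ValueError(f"tool input is missing fields: {missing}")
            return payload
    raise LookupError(f"no tool_use block named {tool_name!r}")


# Illustrative content list, not real API output.
sample_content = [
    {"type": "text", "text": "Extracting the review."},
    {
        "type": "tool_use",
        "id": "toolu_example",
        "name": "extract_review",
        "input": {"sentiment": "positive", "score": 0.9},
    },
]
print(extract_tool_input(sample_content, "extract_review", ["sentiment", "score"]))
```

In a real pipeline the required-key check would normally be replaced by full validation, for example with the Pydantic model that mirrors the tool schema.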

3.3 Constrained Decoding

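# Using the Outlines library for output that always matches the schema.
# Written against the Outlines 0.x interface; later releases changed the entry points.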
import outlines
from pydantic import BaseModel, Field

class Invoice(BaseModel):
    vendor: str = Field(description="Vendor name")
    amount: float = Field(ge=0, description="Total amount")
    currency: str = Field(pattern="^(CNY|USD|EUR)$")
    items: list[str] = Field(min_length=1)
    date: str = Field(pattern=r"^\d{4}-\d{2}-\d{2}$")

model = outlines.models.transformers("Qwen/Qwen2.5-7B-Instruct")

# Generator with schema constraint
generator = outlines.generate.json(model, Invoice)

# Every token is constrained to valid JSON matching the schema
result: Invoice = generator(
    "Extract invoice data: Beijing Tech Co., CNY 15000, office supplies..."
)
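# The constraint guarantees structure, not the correctness of the values:
# result.amount is a float (e.g. 15000.0)
# result.currency is one of CNY, USD, EUR (enforced by the regex pattern)
# result.date matches YYYY-MM-DD, but the prompt contains no date, so verify its value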
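
To show what "token-level constraint" means, here is a toy sketch that does not use Outlines at all. A string of ranked candidate characters stands in for the model's preferences at each step, and the decoder only ever picks a character that keeps the output a prefix of an allowed value. All names are hypothetical, and real libraries apply the same idea to tokenizer vocabularies and full grammars.

```python
ALLOWED = ["CNY", "USD", "EUR"]


def allowed_next_chars(prefix: str) -> set[str]:
    """Characters that keep prefix extendable to one of the allowed values."""
    return {
        value[len(prefix)]
        for value in ALLOWED
        if value.startswith(prefix) and len(value) > len(prefix)
    }


def constrained_decode(rankings: list[str]) -> str:
    """rankings[i] lists candidate characters for step i, most preferred first."""
    out = ""
    step = 0
    while True:
        allowed = allowed_next_chars(out)
        if not allowed:
            # Nothing can follow, so out is a complete allowed value.
            return out
        ranking = rankings[step] if step < len(rankings) else ""
        # Take the model's best candidate that survives the mask; if none
        # does, fall back to a deterministic allowed character.
        choice = next((c for c in ranking if c in allowed), min(allowed))
        out += choice
        step += 1


print(constrained_decode(["RUC", "MSN", "BDY"]))
```

Whatever the stand-in model prefers, the result is always one of the allowed values, which is why the guarantee covers format only and says nothing about whether the value is the right one.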

4. Schema Design Best Practices

4.1 Schema Design Principles

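Principle | Explanation | Example
Self-explanatory field names | The LLM infers meaning from the name | customer_name rather than cn
Restrict options with enum | Reduces hallucinated values | "status": enum["pending", "done"]
Add a description | Guides the LLM's interpretation of the field | "score": {description: "0-1 scale"}
Declare required fields explicitly | Prevents omissions | "required": ["name", "score"]
Keep nesting shallow | No more than 3 levels | Prefer flat structures
Provide default values | Handles uncertain information | "confidence": 0.5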
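
Several of these principles can be checked automatically before a schema ships. The linter below is a rough sketch over a JSON Schema dict; the function name, the 3-character name threshold, and the warning wording are all assumptions chosen for illustration.

```python
def lint_schema(schema: dict, max_depth: int = 3, min_name_len: int = 3) -> list[str]:
    """Return warnings for names, descriptions, required lists and nesting."""
    warnings: list[str] = []

    def walk(node: dict, path: str, depth: int) -> None:
        kind = node.get("type")
        if kind == "object":
            label = path or "<root>"
            if depth > max_depth:
                warnings.append(f"{label}: nested deeper than {max_depth} levels")
            if "required" not in node:
                warnings.append(f"{label}: no explicit 'required' list")
            for name, sub in node.get("properties", {}).items():
                child = f"{path}.{name}" if path else name
                if len(name) < min_name_len:
                    warnings.append(f"{child}: field name may be too short to be self-explanatory")
                if "description" not in sub:
                    warnings.append(f"{child}: missing description")
                walk(sub, child, depth + 1)
        elif kind == "array":
            # Arrays do not add a nesting level of their own in this sketch.
            walk(node.get("items", {}), f"{path}[]", depth)

    walk(schema, "", 1)
    return warnings


bad = {"type": "object", "properties": {"cn": {"type": "string"}}}
for warning in lint_schema(bad):
    print(warning)
```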

4.2 Complex Schema Example

from pydantic import BaseModel, Field
from enum import Enum
from typing import Optional

class Priority(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

class ActionItem(BaseModel):
    """A single action item extracted from meeting notes."""
    description: str = Field(description="What needs to be done")
    assignee: str = Field(description="Person responsible")
    deadline: Optional[str] = Field(
        None, description="Due date in YYYY-MM-DD format",
        pattern=r"^\d{4}-\d{2}-\d{2}$",
    )
    priority: Priority = Field(
        default=Priority.MEDIUM,
        description="Urgency level",
    )

class MeetingSummary(BaseModel):
    """Structured summary of a meeting."""
    title: str = Field(description="Meeting title or topic")
    date: str = Field(description="Meeting date", pattern=r"^\d{4}-\d{2}-\d{2}$")
    attendees: list[str] = Field(
        description="List of attendee names",
        min_length=1,
    )
    key_decisions: list[str] = Field(
        description="Important decisions made",
        min_length=0,
    )
    action_items: list[ActionItem] = Field(
        description="Tasks assigned during the meeting",
        min_length=0,
    )
    next_meeting: Optional[str] = Field(
        None, description="Next meeting date if scheduled",
    )

5. Error Recovery Strategies

5.1 Layered Defense Architecture

Error Recovery Pipeline

LLM Output
     |
     v
[Layer 1: Parse JSON]
     |-- Success -> [Layer 2]
     |-- Fail -> [Repair JSON] -> [Layer 2]
                      |-- Fail -> [Retry with stricter prompt]
                                      |-- Fail -> [Fallback/Error]
     v
[Layer 2: Schema Validation]
     |-- Valid -> Return result
     |-- Invalid -> [Fix missing fields]
                      |-- Fixed -> Return result
                      |-- Cannot fix -> [Retry]
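
The control flow of the diagram can be sketched as one function with the individual steps injected as callables, which makes each layer easy to test in isolation. The names recover, repair, is_valid and fix are hypothetical; the returned outcome tells the caller whether a retry is needed.

```python
import json
from typing import Callable, Optional


def recover(
    raw: str,
    repair: Callable[[str], Optional[str]],
    is_valid: Callable[[dict], bool],
    fix: Callable[[dict], Optional[dict]],
) -> tuple[str, Optional[dict]]:
    """Return (outcome, data); outcome is 'ok', 'repaired', 'fixed' or 'retry'."""
    # Layer 1: parse, falling back to repair.
    outcome = "ok"
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        repaired = repair(raw)
        if repaired is None:
            return "retry", None
        try:
            data = json.loads(repaired)
        except json.JSONDecodeError:
            return "retry", None
        outcome = "repaired"

    # Layer 2: schema validation, falling back to field fixing.
    if is_valid(data):
        return outcome, data
    fixed = fix(data)
    if fixed is not None and is_valid(fixed):
        return "fixed", fixed
    return "retry", None


# Deliberately simple stand-ins for the real repair and validation steps.
def strip_trailing_comma(text: str) -> Optional[str]:
    return text.replace(",}", "}")


def has_name(data) -> bool:
    return isinstance(data, dict) and "name" in data


def add_default_name(data) -> Optional[dict]:
    return {**data, "name": "unknown"} if isinstance(data, dict) else None


print(recover('{"name": "a",}', strip_trailing_comma, has_name, add_default_name))
```

The "retry" outcome corresponds to the two retry branches of the diagram; the retry loop and the final fallback stay with the caller.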

5.2 JSON Repair Implementation

import json
import re
from typing import get_origin

from pydantic import BaseModel, ValidationError

def _load_object(text: str) -> dict | None:
    """Parse text and return it only if it is a JSON object."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None

def repair_json(raw: str) -> dict | None:
    """Attempt to repair malformed JSON from LLM output.

    The regex fixes are heuristics: they can also alter text inside string
    values (apostrophes, the word "True"), so treat the result as best effort.
    """
    # Step 1: Extract JSON from markdown code blocks
    json_match = re.search(r'```(?:json)?\s*([\s\S]*?)```', raw)
    if json_match:
        raw = json_match.group(1)

    # Step 2: Try direct parse
    data = _load_object(raw)
    if data is not None:
        return data

    # Step 3: Common fixes, applied in order
    fixes = [
        # Remove // comments first so a trailing comma hidden behind one is
        # still caught; the lookbehind leaves "https://..." values intact
        (r'(?<!:)//.*$', '', re.MULTILINE),
        # Remove trailing commas before } or ]
        (r',\s*([}\]])', r'\1'),
        # Add missing quotes around keys
        (r'([{,])\s*(\w+)\s*:', r'\1"\2":'),
        # Replace single quotes with double
        (r"'", '"'),
        # Convert Python literals to JSON literals
        (r'\bTrue\b', 'true'),
        (r'\bFalse\b', 'false'),
        (r'\bNone\b', 'null'),
    ]

    fixed = raw
    for pattern, replacement, *flags in fixes:
        flag = flags[0] if flags else 0
        fixed = re.sub(pattern, replacement, fixed, flags=flag)

    return _load_object(fixed)

def validate_and_fix(
    data: dict, schema: type[BaseModel],
) -> BaseModel | None:
    """Validate against Pydantic schema, attempt to fix issues."""
    try:
        return schema.model_validate(data)
    except ValidationError:
        pass

    # Fill missing required fields with neutral placeholders. Fields that have
    # a default are filled in by Pydantic itself, so only required ones matter.
    # Placeholders can hide real omissions, so log them in production.
    placeholders = {str: "", int: 0, list: []}
    fixed = data.copy()
    for field_name, field_info in schema.model_fields.items():
        if field_name in fixed or not field_info.is_required():
            continue
        # get_origin maps list[str] to list; plain types such as str give None
        base_type = get_origin(field_info.annotation) or field_info.annotation
        if base_type in placeholders:
            fixed[field_name] = placeholders[base_type]

    try:
        return schema.model_validate(fixed)
    except ValidationError:
        return None

5.3 Smart Retry

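from openai import AsyncOpenAI
from pydantic import BaseModel

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def structured_generate(
    model: str,
    messages: list[dict],
    schema: type[BaseModel],
    max_retries: int = 3,
) -> BaseModel:
    """Generate structured output with automatic retry.

    JSON mode requires the word "JSON" to appear somewhere in the messages,
    so the caller's prompt should ask for JSON explicitly.
    Relies on repair_json and validate_and_fix from section 5.2.
    """
    last_error = None

    for attempt in range(max_retries):
        # On a retry, tell the model what went wrong with the previous output
        feedback = [{
            "role": "user",
            "content": f"Previous output had this error: {last_error}. "
                       f"Please fix and output valid JSON.",
        }] if last_error else []

        response = await client.chat.completions.create(
            model=model,
            messages=messages + feedback,
            response_format={"type": "json_object"},
            temperature=0.0 if attempt > 0 else 0.3,  # Reduce randomness on retry
        )

        raw = response.choices[0].message.content or ""
        parsed = repair_json(raw)

        # Compare with None so that an empty object {} still reaches validation
        if parsed is not None:
            result = validate_and_fix(parsed, schema)
            if result is not None:
                return result
            last_error = f"Schema validation failed for: {parsed}"
        else:
            last_error = f"JSON parse failed for output starting with: {raw[:100]}"

    raise ValueError(f"Failed after {max_retries} attempts: {last_error}")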

6. Provider Comparison

6.1 Structured Output Capability Matrix

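Capability OpenAI Anthropic Google Open-source models
JSON Mode No (use tool_use) Requires a framework
Structured Output Yes (strongest) Yes (Gemini) Outlines/SGLang
Function Calling Yes (tool_use) Yes (partial)
Strict schema adherence 99.9% 95%+ 95%+ 100% (constrained decoding)
Nested schema
Enum support
Regex constraints Yes (Outlines)
Recursive schema Partial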

6.2 Performance and Cost Impact

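Dimension | Pure Prompt | JSON Mode | Function Call | Structured Output
Extra latency | 0 | ~5% | ~10% | ~5%
Extra tokens | 0 | ~5% | ~15% (schema) | ~10%
Parse success rate | 60-80% | 90-95% | 95-99% | 99.9%
Retry cost | | | | Very low
Overall cost | Highest (many retries) | | | Lowest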
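
The link between success rate and overall cost can be made concrete with a little arithmetic. If each attempt succeeds independently with probability p and at most n attempts are made, attempt k+1 only happens when the first k failed, so the expected number of calls is 1 + (1-p) + ... + (1-p)^(n-1). The sketch below multiplies that by the token overhead; the midpoint values in ASSUMED are taken from the table's ranges as an assumption, and the independence of attempts is a simplification.

```python
def expected_calls(p_success: float, max_attempts: int) -> float:
    """Expected API calls when retrying until success, capped at max_attempts."""
    q = 1.0 - p_success
    return sum(q ** k for k in range(max_attempts))


def relative_cost(p_success: float, token_overhead: float, max_attempts: int = 3) -> float:
    """Cost relative to one plain call: expected calls times token overhead."""
    return expected_calls(p_success, max_attempts) * (1.0 + token_overhead)


# (success rate, extra token fraction): assumed midpoints of the table ranges.
ASSUMED = {
    "prompt_only": (0.70, 0.00),
    "json_mode": (0.925, 0.05),
    "function_call": (0.97, 0.15),
    "structured_output": (0.999, 0.10),
}

for name, (p, overhead) in ASSUMED.items():
    print(f"{name}: {relative_cost(p, overhead):.3f}")
```

With these assumed midpoints the prompt-only approach comes out most expensive and Structured Output cheapest, which matches the direction of the table. The model ignores latency and the cost of bad data reaching downstream systems.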

7. Production Recommendations

7.1 Tiered Strategy

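Tier | Strategy | When to use
First choice | Provider-native structured output | OpenAI Structured Output
Second choice | Function Calling + validation | Multi-provider compatibility
Fallback | JSON Mode + repair + retry | General purpose
Self-hosted | Constrained decoding | Very high reliability requirements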

7.2 Key Metrics to Monitor

# Key metrics to track for structured output quality
metrics = {
    "parse_success_rate": "JSON successfully parsed / total requests",
    "validation_success_rate": "Schema valid / total parsed",
    "retry_rate": "Requests needing retry / total requests",
    "repair_rate": "Requests needing JSON repair / total requests",
    "avg_retries": "Average retries per request (should be < 0.1)",
    "field_completeness": "Percentage of non-null required fields",
}
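
As a rough sketch of how most of these metrics could be computed, assume each request leaves behind one small log record. The RequestLog fields and the summarize function are hypothetical, and field_completeness is left out because it needs the validated payload itself.

```python
from dataclasses import dataclass


@dataclass
class RequestLog:
    """Hypothetical per-request record written by the generation pipeline."""
    parsed: bool    # JSON parsed, possibly after repair
    valid: bool     # passed schema validation
    repaired: bool  # JSON repair was needed
    retries: int    # number of extra attempts


def summarize(logs: list[RequestLog]) -> dict[str, float]:
    total = len(logs)
    if total == 0:
        return {}
    parsed = sum(log.parsed for log in logs)
    return {
        "parse_success_rate": parsed / total,
        "validation_success_rate": sum(log.valid for log in logs) / parsed if parsed else 0.0,
        "retry_rate": sum(log.retries > 0 for log in logs) / total,
        "repair_rate": sum(log.repaired for log in logs) / total,
        "avg_retries": sum(log.retries for log in logs) / total,
    }


logs = [
    RequestLog(parsed=True, valid=True, repaired=False, retries=0),
    RequestLog(parsed=True, valid=False, repaired=True, retries=1),
    RequestLog(parsed=False, valid=False, repaired=False, retries=2),
    RequestLog(parsed=True, valid=True, repaired=False, retries=0),
]
print(summarize(logs))
```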

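8. Summary

Structured output is a cornerstone of LLM engineering. OpenAI's Structured Outputs offers the highest reliability of the hosted options, constrained decoding offers a 100% guarantee in theory, and Function Calling is the best balance for multi-provider compatibility.

Core recommendation: do not rely on prompt constraints to guarantee output format. Define expectations with a schema, enforce them with tooling, and use validation as the safety net.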


Maurice | maurice_wen@proton.me