Structured Output Techniques: Getting LLMs to Return Reliable JSON

Original | 灵阙 Teaching and Research Team

A | Recommended | Advanced | About 8 min read | Updated 2026-02-28

Engineering practice for JSON Mode, Function Calling, schema validation, and fault-tolerant recovery | 2026-02

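1. Why Structured Output Is Needed

By default, an LLM outputs natural-language text. In an engineering system, however, downstream components need structured data that can be parsed, validated, and typed. Unstructured output causes four kinds of problems:

Parse failures: malformed JSON (trailing commas, unclosed brackets)
Missing fields: the LLM "forgets" to output some required fields
Type errors: numbers emitted as strings, booleans emitted as "yes/no"
Hallucinated fields: extra fields that do not exist in the schema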
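
The four failure modes above can be detected mechanically. The following is a minimal sketch using only the standard library. `diagnose_output`, `NUMBER`, and the `expected` mapping of field names to Python types are hypothetical names introduced for illustration; they are not part of any provider SDK.

```python
import json

NUMBER = (int, float)  # JSON does not distinguish 1 from 1.0

def diagnose_output(raw: str, expected: dict) -> list[str]:
    """Return a list of problems found in a raw LLM response."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"parse failure: {exc.msg}"]
    if not isinstance(data, dict):
        return ["parse failure: top-level value is not an object"]

    problems = []
    for name, expected_type in expected.items():
        if name not in data:
            problems.append(f"missing field: {name}")
            continue
        value = data[name]
        # bool is a subclass of int in Python, so reject it unless bool is expected
        is_stray_bool = isinstance(value, bool) and expected_type is not bool
        if is_stray_bool or not isinstance(value, expected_type):
            problems.append(f"type error: {name}")
    for name in data:
        if name not in expected:
            problems.append(f"unexpected field: {name}")
    return problems

expected = {"score": NUMBER, "recommend": bool, "summary": str}
print(diagnose_output('{"score": 0.9, "recommend": true,}', expected))
print(diagnose_output('{"score": "0.9", "recommend": "yes", "mood": "happy"}', expected))
```

The first call reports a parse failure caused by the trailing comma. The second reports two type errors, one missing field, and one unexpected field.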

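2. Comparison of Approaches

2.1 Five Main Approaches

Approach	Mechanism	Reliability	Flexibility	Provider support
Prompt-only constraints	Ask for JSON in the prompt	Low (60-80%)	High	All
JSON Mode	API parameter forces JSON	Medium (90-95%)	Medium	OpenAI/Google
Function Calling	Define a function schema	High (95-99%)	High	OpenAI/Anthropic/Google
Structured Output	Strict schema adherence	Very high (99.9%)	Medium	OpenAI (most mature)
Constrained decoding	Token-level constraints at inference time	Highest (100%)	Low	SGLang/vLLM/Outlines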

2.2 Decision Tree

Do you need 100% valid JSON?
|
+-- No (tolerance for occasional failures):
|   Use JSON Mode + retry
|
+-- Yes:
    |
    +-- Using OpenAI?
    |   -> Structured Outputs (response_format with schema)
    |
    +-- Using self-hosted model?
    |   -> Constrained Decoding (SGLang/Outlines)
    |
    +-- Using other provider?
        -> Function Calling + Pydantic validation + retry
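
The decision tree above can be written as a small function. This is only a sketch: `choose_strategy` and the provider labels `"openai"` and `"self-hosted"` are hypothetical names chosen for illustration.

```python
def choose_strategy(need_guaranteed_json: bool, provider: str) -> str:
    """Map the decision tree to a strategy name."""
    if not need_guaranteed_json:
        # Occasional failures are tolerable, so a retry loop is enough
        return "JSON Mode + retry"
    if provider == "openai":
        return "Structured Outputs (response_format with schema)"
    if provider == "self-hosted":
        return "Constrained Decoding (SGLang/Outlines)"
    # Any other hosted provider
    return "Function Calling + Pydantic validation + retry"

for provider in ("openai", "self-hosted", "anthropic"):
    print(provider, "->", choose_strategy(True, provider))
```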

3. Provider Approaches in Detail

3.1 OpenAI Structured Outputs

from openai import OpenAI
from pydantic import BaseModel

class ProductReview(BaseModel):
    sentiment: str          # "positive" | "negative" | "neutral"
    score: float            # 0.0 - 1.0
    key_topics: list[str]   # Extracted topics
    summary: str            # One-sentence summary
    recommendation: bool    # Would recommend?

client = OpenAI()

response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Analyze the product review."},
        {"role": "user", "content": "This laptop is amazing! Great battery..."},
    ],
    response_format=ProductReview,  # Pydantic model directly
)

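# parsed is a typed Pydantic object; it is None when the model refuses to answer
message = response.choices[0].message
if message.refusal:
    raise RuntimeError(f"Model refused: {message.refusal}")
review: ProductReview = message.parsed
print(f"Sentiment: {review.sentiment}, Score: {review.score}")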

3.2 Anthropic Tool Use

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{
        "name": "extract_review",
        "description": "Extract structured data from a product review",
        "input_schema": {
            "type": "object",
            "properties": {
                "sentiment": {
                    "type": "string",
                    "enum": ["positive", "negative", "neutral"],
                },
                "score": {
                    "type": "number",
                    "minimum": 0, "maximum": 1,
                },
                "key_topics": {
                    "type": "array",
                    "items": {"type": "string"},
                },
                "summary": {"type": "string"},
                "recommendation": {"type": "boolean"},
            },
            "required": ["sentiment", "score", "key_topics", "summary", "recommendation"],
        },
    }],
    tool_choice={"type": "tool", "name": "extract_review"},  # Force tool use
    messages=[
        {"role": "user", "content": "Analyze: This laptop is amazing! Great battery..."},
    ],
)

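# Extract the tool_use block by type rather than by position
tool_block = next(block for block in response.content if block.type == "tool_use")
tool_input = tool_block.input  # dict that follows the schema; validate it before use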

3.3 Constrained Decoding

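# Using the Outlines library so that decoding can only produce schema-conforming JSON
# (outlines.models / outlines.generate is the pre-1.0 Outlines API; newer releases may differ)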
import outlines
from pydantic import BaseModel, Field

class Invoice(BaseModel):
    vendor: str = Field(description="Vendor name")
    amount: float = Field(ge=0, description="Total amount")
    currency: str = Field(pattern="^(CNY|USD|EUR)$")
    items: list[str] = Field(min_length=1)
    date: str = Field(pattern=r"^\d{4}-\d{2}-\d{2}$")

model = outlines.models.transformers("Qwen/Qwen2.5-7B-Instruct")

# Generator with schema constraint
generator = outlines.generate.json(model, Invoice)

# Every token is constrained to valid JSON matching the schema
result: Invoice = generator(
    "Extract invoice data: Beijing Tech Co., CNY 15000, office supplies..."
)
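# Illustrative values. The constraints guarantee the format, not the correctness:
# result.amount = 15000.0  (always a valid number)
# result.currency = "CNY"  (guaranteed to match the pattern)
# result.date = "2026-02-15"  (guaranteed to match the YYYY-MM-DD pattern)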
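
To make the idea of token-level constraints concrete, here is a toy, character-level sketch that does not use Outlines or a real model. At every step the decoder may only pick a character that keeps the output a valid prefix of one of the allowed values. `allowed_next_chars`, `constrained_decode`, and the `preference` string (standing in for model scores, most preferred first) are hypothetical names for illustration only.

```python
def allowed_next_chars(prefix: str, choices: list[str]) -> list[str]:
    """Characters that keep `prefix` a valid prefix of at least one choice."""
    return sorted({
        choice[len(prefix)]
        for choice in choices
        if choice.startswith(prefix) and len(choice) > len(prefix)
    })

def constrained_decode(preference: str, choices: list[str]) -> str:
    """Greedy decoding where invalid characters are masked out at each step.

    Assumes `choices` is non-empty and no choice is a prefix of another.
    """
    def rank(ch: str) -> int:
        # Lower rank means more preferred; unknown characters rank last
        return preference.index(ch) if ch in preference else len(preference)

    out = ""
    while out not in choices:
        out += min(allowed_next_chars(out, choices), key=rank)
    return out

currencies = ["CNY", "USD", "EUR"]
# The toy "model" prefers X, which is never allowed, then U, then E
print(constrained_decode("XUE", currencies))
```

Even though the toy model prefers `X`, that character is masked out, so the result is always one of the three allowed currencies. Real constrained decoders apply the same masking to the token vocabulary using a grammar or regular expression derived from the schema.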

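4. Schema Design Best Practices

4.1 Schema Design Principles

Principle	Explanation	Example
Self-explanatory field names	The LLM infers semantics from the name	`customer_name` is better than `cn`
Restrict options with enum	Reduces hallucination	`"status": {"enum": ["pending", "done"]}`
Add a description	Guides the LLM's understanding of the field	`"score": {"description": "0-1 scale"}`
Declare required fields explicitly	Avoids omissions	`"required": ["name", "score"]`
Moderate nesting	No more than 3 levels	Prefer flat structures
Provide default values	Handles uncertain information	`"confidence": 0.5`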
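
As a sketch of how these principles look in a JSON Schema, and how the nesting guideline could be checked automatically, consider the following. `ticket_schema` and `nesting_depth` are hypothetical names for illustration. Depth is counted here as the number of nested object levels, which is one possible reading of the "3 levels" rule.

```python
ticket_schema = {
    "type": "object",
    "properties": {
        # Self-explanatory name plus a description
        "customer_name": {"type": "string", "description": "Full name of the customer"},
        # Enum restricts the model to known values
        "status": {"type": "string", "enum": ["pending", "done"]},
        "score": {"type": "number", "description": "Satisfaction on a 0-1 scale"},
        # One level of nesting
        "contact": {
            "type": "object",
            "properties": {"email": {"type": "string"}},
            "required": ["email"],
        },
    },
    # Required fields are declared explicitly
    "required": ["customer_name", "status", "score"],
}

def nesting_depth(schema: dict) -> int:
    """Count nested object levels in a JSON Schema given as a dict."""
    kind = schema.get("type")
    if kind == "object":
        children = schema.get("properties", {}).values()
        return 1 + max((nesting_depth(child) for child in children), default=0)
    if kind == "array":
        # Arrays do not add a level themselves; look at their items
        return nesting_depth(schema.get("items", {}))
    return 0

depth = nesting_depth(ticket_schema)
print(f"depth={depth}, within guideline: {depth <= 3}")
```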

4.2 Complex Schema Example

from pydantic import BaseModel, Field
from enum import Enum
from typing import Optional

class Priority(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

class ActionItem(BaseModel):
    """A single action item extracted from meeting notes."""
    description: str = Field(description="What needs to be done")
    assignee: str = Field(description="Person responsible")
    deadline: Optional[str] = Field(
        None, description="Due date in YYYY-MM-DD format",
        pattern=r"^\d{4}-\d{2}-\d{2}$",
    )
    priority: Priority = Field(
        default=Priority.MEDIUM,
        description="Urgency level",
    )

class MeetingSummary(BaseModel):
    """Structured summary of a meeting."""
    title: str = Field(description="Meeting title or topic")
    date: str = Field(description="Meeting date", pattern=r"^\d{4}-\d{2}-\d{2}$")
    attendees: list[str] = Field(
        description="List of attendee names",
        min_length=1,
    )
    key_decisions: list[str] = Field(
        description="Important decisions made",
        min_length=0,
    )
    action_items: list[ActionItem] = Field(
        description="Tasks assigned during the meeting",
        min_length=0,
    )
    next_meeting: Optional[str] = Field(
        None, description="Next meeting date if scheduled",
    )

5. Error Recovery Strategies

5.1 Multi-Layer Defense Architecture

Error Recovery Pipeline

LLM Output
     |
     v
[Layer 1: Parse JSON]
     |-- Success -> [Layer 2]
     |-- Fail -> [Repair JSON] -> [Layer 2]
                      |-- Fail -> [Retry with stricter prompt]
                                      |-- Fail -> [Fallback/Error]
     v
[Layer 2: Schema Validation]
     |-- Valid -> Return result
     |-- Invalid -> [Fix missing fields]
                      |-- Fixed -> Return result
                      |-- Cannot fix -> [Retry]

5.2 JSON Repair Implementation

import json
import re

from pydantic import BaseModel

def repair_json(raw: str) -> dict | None:
    """Attempt to repair malformed JSON from LLM output."""
    # Step 1: Extract JSON from markdown code blocks
    json_match = re.search(r'```(?:json)?\s*([\s\S]*?)```', raw)
    if json_match:
        raw = json_match.group(1)

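    # Step 2: Try direct parse (only a JSON object counts as success)
    try:
        parsed = json.loads(raw)
        return parsed if isinstance(parsed, dict) else None
    except json.JSONDecodeError:
        pass

    # Step 3: Common fixes. These regexes are not string-aware, so they can
    # alter text inside string values; always validate the result afterwards.
    fixes = [
        # Remove // comments first (but keep the "//" in URLs such as https://...)
        (r'(?<!:)//.*$', '', re.MULTILINE),
        # Remove trailing commas before } or ]
        (r',\s*([}\]])', r'\1'),
        # Add missing quotes around keys
        (r'(\{|\,)\s*(\w+)\s*:', r'\1"\2":'),
        # Replace single quotes with double
        (r"'", '"'),
        # Fix Python-style literals
        (r'\bTrue\b', 'true'),
        (r'\bFalse\b', 'false'),
        (r'\bNone\b', 'null'),
    ]

    fixed = raw
    for pattern, replacement, *flags in fixes:
        flag = flags[0] if flags else 0
        fixed = re.sub(pattern, replacement, fixed, flags=flag)

    try:
        parsed = json.loads(fixed)
        return parsed if isinstance(parsed, dict) else None
    except json.JSONDecodeError:
        return None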

def validate_and_fix(
    data: dict, schema: type[BaseModel],
) -> BaseModel | None:
    """Validate against Pydantic schema, attempt to fix issues."""
    try:
        return schema.model_validate(data)
    except Exception:
        # Attempt to fix common issues
        fixed = data.copy()

        # Fill missing required fields with neutral placeholders. Fields that
        # have a default are skipped because Pydantic applies the default itself.
        # Placeholders hide missing data, so log whenever this path is taken.
        for field_name, field_info in schema.model_fields.items():
            if field_name in fixed or not field_info.is_required():
                continue
            annotation = field_info.annotation
            # list[str] reports list as its origin; plain types have no origin
            origin = getattr(annotation, "__origin__", annotation)
            if origin in (str, int, float, bool, list):
                fixed[field_name] = origin()  # "", 0, 0.0, False, or []

        try:
            return schema.model_validate(fixed)
        except Exception:
            return None

5.3 Smart Retry

from openai import AsyncOpenAI

client = AsyncOpenAI()

async def structured_generate(
    model: str,
    messages: list[dict],
    schema: type[BaseModel],
    max_retries: int = 3,
) -> BaseModel:
    """Generate structured output with automatic retry.

    JSON mode expects the word "JSON" to appear somewhere in `messages`.
    """
    last_error = None

    for attempt in range(max_retries):
        response = await client.chat.completions.create(
            model=model,
            messages=messages + (
                [{
                    "role": "user",
                    "content": f"Previous output had this error: {last_error}. "
                               f"Please fix and output valid JSON.",
                }] if last_error else []
            ),
            response_format={"type": "json_object"},
            temperature=0.0 if attempt > 0 else 0.3,  # Reduce randomness on retry
        )

        raw = response.choices[0].message.content or ""
        parsed = repair_json(raw)

        if parsed is not None:  # an empty dict is still a successful parse
            result = validate_and_fix(parsed, schema)
            if result is not None:
                return result
            last_error = f"Schema validation failed for: {parsed}"
        else:
            last_error = f"JSON parse failed for output starting with: {raw[:100]}"

    raise ValueError(f"Failed after {max_retries} attempts: {last_error}")

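6. Provider Comparison

6.1 Structured Output Capability Matrix

Capability	OpenAI	Anthropic	Google	Open-source models
JSON Mode	Yes	No (use tool_use)	Yes	Needs a framework
Structured Output	Yes (strongest)	No	Yes (Gemini)	Outlines/SGLang
Function Calling	Yes	Yes (tool_use)	Yes	Yes (some models)
Strict schema adherence	99.9%	95%+	95%+	100% (constrained decoding)
Nested schema	Yes	Yes	Yes	Yes
Enum support	Yes	Yes	Yes	Yes
Regex constraints	No	No	No	Yes (Outlines)
Recursive schema	Yes	No	No	Partial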

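6.2 Performance and Cost Impact

Dimension	Prompt only	JSON Mode	Function Calling	Structured Output
Added latency	0	~5%	~10%	~5%
Added tokens	0	~5%	~15% (schema)	~10%
Parse success rate	60-80%	90-95%	95-99%	99.9%
Retry cost	High	Medium	Low	Very low
Overall cost	Highest (many retries)	Medium	Low	Lowest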
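
The retry-cost row follows from the parse success rate. As a rough sketch, assume each attempt succeeds independently with probability p. Real retries are not fully independent, because failures tend to cluster on hard inputs. Under that assumption the expected number of calls until success is 1/p, and the chance that all k attempts fail is (1 - p)^k. `expected_attempts` and `all_attempts_fail` are hypothetical helper names, and the rates used are midpoints of the ranges in the table.

```python
def expected_attempts(success_rate: float) -> float:
    """Mean number of calls until the first success (geometric distribution)."""
    return 1.0 / success_rate

def all_attempts_fail(success_rate: float, max_attempts: int) -> float:
    """Probability that every one of `max_attempts` independent tries fails."""
    return (1.0 - success_rate) ** max_attempts

# Midpoints of the parse success rates listed in the table
rates = {
    "Prompt only": 0.70,
    "JSON Mode": 0.925,
    "Function Calling": 0.97,
    "Structured Output": 0.999,
}
for name, p in rates.items():
    print(f"{name}: {expected_attempts(p):.3f} calls on average, "
          f"{all_attempts_fail(p, 3):.2e} chance that 3 attempts all fail")
```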

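7. Production Recommendations

7.1 Tiered Strategy

Tier	Strategy	When to use
First choice	Provider-native structured output	OpenAI Structured Output
Second choice	Function Calling + validation	Multi-provider compatibility
Fallback	JSON Mode + repair + retry	General purpose
Self-hosted	Constrained decoding	Very high reliability requirements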
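
One way to wire the tiers together is a fallback chain that tries each strategy in order and moves on when one fails. The sketch below uses stub functions in place of real provider calls. `run_with_fallback`, the tier names, and the stubs are all hypothetical.

```python
from typing import Callable, Optional

Strategy = Callable[[str], Optional[dict]]

def run_with_fallback(
    prompt: str, strategies: list[tuple[str, Strategy]]
) -> tuple[str, dict]:
    """Try each (name, strategy) in order; return the first valid result."""
    errors = []
    for name, strategy in strategies:
        try:
            result = strategy(prompt)
        except Exception as exc:  # provider outage, validation error, etc.
            errors.append(f"{name}: {exc}")
            continue
        if result is not None:
            return name, result
        errors.append(f"{name}: no valid output")
    raise RuntimeError("all strategies failed: " + "; ".join(errors))

# Stubs standing in for real provider calls
def native_structured_output(prompt: str) -> Optional[dict]:
    raise RuntimeError("provider unavailable")

def function_calling(prompt: str) -> Optional[dict]:
    return None  # pretend validation failed

def json_mode_repair_retry(prompt: str) -> Optional[dict]:
    return {"echo": prompt}

tier, result = run_with_fallback("hi", [
    ("native_structured_output", native_structured_output),
    ("function_calling", function_calling),
    ("json_mode_repair_retry", json_mode_repair_retry),
])
print(tier, result)
```

Recording which tier produced each result makes it visible when traffic starts drifting toward the fallback tiers.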

7.2 Key Monitoring Metrics

# Key metrics to track for structured output quality
metrics = {
    "parse_success_rate": "JSON successfully parsed / total requests",
    "validation_success_rate": "Schema valid / total parsed",
    "retry_rate": "Requests needing retry / total requests",
    "repair_rate": "Requests needing JSON repair / total requests",
    "avg_retries": "Average retries per request (should be < 0.1)",
    "field_completeness": "Percentage of non-null required fields",
}
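
A minimal in-process sketch of how most of these metrics could be accumulated is shown below. `StructuredOutputStats` and its field names are hypothetical. A production system would more likely export counters to its existing metrics backend, and `field_completeness` is omitted because it depends on the schema in use.

```python
from dataclasses import dataclass

@dataclass
class StructuredOutputStats:
    requests: int = 0
    parsed: int = 0
    valid: int = 0
    repaired: int = 0
    retried: int = 0
    total_retries: int = 0

    def record(self, *, parsed: bool, valid: bool, repaired: bool, retries: int) -> None:
        """Record the outcome of one request."""
        self.requests += 1
        self.parsed += int(parsed)
        self.valid += int(parsed and valid)
        self.repaired += int(repaired)
        self.retried += int(retries > 0)
        self.total_retries += retries

    def summary(self) -> dict[str, float]:
        n = max(self.requests, 1)  # avoid division by zero before any traffic
        return {
            "parse_success_rate": self.parsed / n,
            "validation_success_rate": self.valid / max(self.parsed, 1),
            "retry_rate": self.retried / n,
            "repair_rate": self.repaired / n,
            "avg_retries": self.total_retries / n,
        }

stats = StructuredOutputStats()
stats.record(parsed=True, valid=True, repaired=False, retries=0)
stats.record(parsed=True, valid=True, repaired=True, retries=0)
stats.record(parsed=True, valid=False, repaired=False, retries=2)
stats.record(parsed=False, valid=False, repaired=False, retries=1)
print(stats.summary())
```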

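8. Summary

Structured output is a cornerstone of LLM engineering. OpenAI's Structured Outputs offers the highest reliability among hosted APIs, constrained decoding offers a theoretical 100% guarantee, and Function Calling is the best balance for multi-provider compatibility.

Key recommendation: do not rely on prompt constraints to guarantee output format. Define expectations with a schema, enforce them with tooling, and use validation as the safety net.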

Maurice | maurice_wen@proton.me