Structured Output Techniques: Getting Reliable JSON from LLMs
灵阙 Teaching and Research Team
Updated 2026-02-28
Engineering practice for JSON Mode, Function Calling, schema validation, and error recovery | 2026-02
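1. Why Structured Output Matters
By default, an LLM produces natural-language text. In an engineering system, however, downstream components need structured data that can be parsed, validated, and typed. Unstructured output causes several problems:
- Parse failures: malformed JSON (trailing commas, unclosed brackets)
- Missing fields: the LLM "forgets" to emit required fields
- Type errors: numbers emitted as strings, booleans emitted as "yes/no"
- Hallucinated fields: extra fields that do not exist in the schema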
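The four failure modes above can be detected mechanically. The following is a minimal sketch using only the standard library; the `EXPECTED` field map and the `diagnose` helper are hypothetical names chosen for illustration, not part of any provider SDK.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
EXPECTED = {"sentiment": str, "score": float, "recommendation": bool}


def diagnose(raw: str, expected: dict[str, type] = EXPECTED) -> list[str]:
    """Return the failure modes found in one raw LLM reply."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["parse_failure"]
    if not isinstance(data, dict):
        return ["type_error"]

    problems = []
    for name, typ in expected.items():
        if name not in data:
            problems.append(f"missing_field:{name}")
            continue
        value = data[name]
        if typ is float:
            # Accept ints for numeric fields, but bool is a subclass of int: reject it
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        else:
            ok = isinstance(value, typ)
        if not ok:
            problems.append(f"type_error:{name}")
    for name in data:
        if name not in expected:
            problems.append(f"extra_field:{name}")
    return problems
```

A reply such as `{"sentiment": "positive", "score": "0.9", "recommendation": "yes"}` parses as JSON yet still produces two type errors, which is why parsing alone is not a sufficient check.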
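2. Comparing the Approaches
2.1 Five Main Approaches
| Approach | How it works | Reliability | Flexibility | Provider support |
|---|---|---|---|---|
| Prompt-only constraints | Ask for JSON in the prompt | Low (60-80%) | High | All |
| JSON Mode | API parameter forces JSON output | Medium (90-95%) | Medium | OpenAI/Google |
| Function Calling | Define a function schema | High (95-99%) | High | OpenAI/Anthropic/Google |
| Structured Output | Strict schema adherence | Very high (99.9%) | Medium | OpenAI (most mature) |
| Constrained decoding | Token-level constraints at inference time | Highest (100%) | Low | SGLang/vLLM/Outlines |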
2.2 Decision Tree
Do you need 100% valid JSON?
|
+-- No (tolerance for occasional failures):
|     Use JSON Mode + retry
|
+-- Yes:
    |
    +-- Using OpenAI?
    |     -> Structured Outputs (response_format with schema)
    |
    +-- Using self-hosted model?
    |     -> Constrained Decoding (SGLang/Outlines)
    |
    +-- Using other provider?
          -> Function Calling + Pydantic validation + retry
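The same decision tree can be written as a small function, which is convenient when the choice has to be made per request. This is only a sketch: the function name and the provider labels (`"openai"`, `"self-hosted"`) are assumptions made for illustration.

```python
def choose_strategy(need_guaranteed_json: bool, provider: str) -> str:
    """Mirror the decision tree: tolerance first, then provider."""
    if not need_guaranteed_json:
        # Occasional failures are acceptable, so a retry loop is enough
        return "JSON Mode + retry"
    if provider == "openai":
        return "Structured Outputs"
    if provider == "self-hosted":
        return "Constrained Decoding"
    # Any other hosted provider
    return "Function Calling + Pydantic validation + retry"
```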
3. Provider Approaches in Detail
3.1 OpenAI Structured Outputs
from openai import OpenAI
from pydantic import BaseModel


class ProductReview(BaseModel):
    sentiment: str          # "positive" | "negative" | "neutral"
    score: float            # 0.0 - 1.0
    key_topics: list[str]   # Extracted topics
    summary: str            # One-sentence summary
    recommendation: bool    # Would recommend?


client = OpenAI()

response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Analyze the product review."},
        {"role": "user", "content": "This laptop is amazing! Great battery..."},
    ],
    response_format=ProductReview,  # Pydantic model directly
)

message = response.choices[0].message
if message.refusal:
    # A refusal carries no parsed payload
    raise RuntimeError(f"Model refused: {message.refusal}")

# parsed is a typed Pydantic object that conforms to the schema
review: ProductReview = message.parsed
print(f"Sentiment: {review.sentiment}, Score: {review.score}")
3.2 Anthropic Tool Use
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{
        "name": "extract_review",
        "description": "Extract structured data from a product review",
        "input_schema": {
            "type": "object",
            "properties": {
                "sentiment": {
                    "type": "string",
                    "enum": ["positive", "negative", "neutral"],
                },
                "score": {
                    "type": "number",
                    "minimum": 0, "maximum": 1,
                },
                "key_topics": {
                    "type": "array",
                    "items": {"type": "string"},
                },
                "summary": {"type": "string"},
                "recommendation": {"type": "boolean"},
            },
            "required": ["sentiment", "score", "key_topics", "summary", "recommendation"],
        },
    }],
    tool_choice={"type": "tool", "name": "extract_review"},  # Force tool use
    messages=[
        {"role": "user", "content": "Analyze: This laptop is amazing! Great battery..."},
    ],
)

# Locate the tool_use block by type instead of assuming it is the first block
tool_use = next(block for block in response.content if block.type == "tool_use")
# dict that follows the schema in most cases but is not guaranteed to: validate it
tool_input = tool_use.input
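Because tool input is produced by the model, it is worth checking it against the declared `input_schema` before using it. Below is a minimal sketch of such a check for a small JSON Schema subset (`type`, `enum`, `minimum`, `maximum`, `required`, `properties`, `items`). The helper `check_against_schema` and the trimmed `REVIEW_SCHEMA` are hypothetical; in practice a full validator such as the `jsonschema` package or a Pydantic model would be the more robust choice.

```python
TYPE_MAP = {
    "object": dict, "array": list, "string": str,
    "boolean": bool, "number": (int, float),
}

# Trimmed copy of the review schema, for illustration only
REVIEW_SCHEMA = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "score": {"type": "number", "minimum": 0, "maximum": 1},
        "key_topics": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["sentiment", "score", "key_topics"],
}


def check_against_schema(value, schema: dict, path: str = "$") -> list[str]:
    """Return a list of violations; an empty list means the value conforms."""
    expected = schema.get("type")
    if expected in TYPE_MAP:
        ok = isinstance(value, TYPE_MAP[expected])
        if expected == "number" and isinstance(value, bool):
            ok = False  # bool is a subclass of int in Python
        if not ok:
            return [f"{path}: expected {expected}"]

    errors = []
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: value not in enum")
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        if "minimum" in schema and value < schema["minimum"]:
            errors.append(f"{path}: below minimum")
        if "maximum" in schema and value > schema["maximum"]:
            errors.append(f"{path}: above maximum")
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                errors.append(f"{path}.{name}: missing required field")
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                errors.extend(check_against_schema(value[name], sub, f"{path}.{name}"))
    if isinstance(value, list) and isinstance(schema.get("items"), dict):
        for i, item in enumerate(value):
            errors.extend(check_against_schema(item, schema["items"], f"{path}[{i}]"))
    return errors
```

The returned messages include the path of each violation, so they can be fed back to the model in a retry prompt.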
3.3 Constrained Decoding
# Using the Outlines library for schema-constrained output.
# Note: outlines.models.transformers / outlines.generate.json is the Outlines 0.x
# interface; later releases changed it, so check the version you have installed.
import outlines
from pydantic import BaseModel, Field


class Invoice(BaseModel):
    vendor: str = Field(description="Vendor name")
    amount: float = Field(ge=0, description="Total amount")
    currency: str = Field(pattern="^(CNY|USD|EUR)$")
    items: list[str] = Field(min_length=1)
    date: str = Field(pattern=r"^\d{4}-\d{2}-\d{2}$")


model = outlines.models.transformers("Qwen/Qwen2.5-7B-Instruct")

# Generator with schema constraint
generator = outlines.generate.json(model, Invoice)

# Every token is constrained to valid JSON matching the schema
result: Invoice = generator(
    "Extract invoice data: Beijing Tech Co., CNY 15000, office supplies..."
)

# The structure is guaranteed; the extracted values still depend on the model:
# result.amount   -> a float, e.g. 15000.0
# result.currency -> "CNY", "USD", or "EUR" (enforced by the pattern)
# result.date     -> a string matching YYYY-MM-DD
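To make "token-level constraint" concrete, here is a toy sketch of the idea, not how Outlines or SGLang implement it. At every step the decoder masks out any token that would leave the set of valid outputs, then picks the best-scoring token among those that remain. The vocabulary, the `fake_logits` stand-in for a model, and the three allowed strings are all assumptions made for illustration.

```python
import random

# The only outputs the "grammar" accepts (a JSON string enum)
ALLOWED = ['"CNY"', '"USD"', '"EUR"']
# Toy vocabulary: one token per character
VOCAB = list('"ABCDEFGHIJKLMNOPQRSTUVWXYZ')


def allowed_next(prefix: str) -> set[str]:
    """Tokens that keep the output a prefix of some allowed string."""
    return {
        s[len(prefix)]
        for s in ALLOWED
        if s.startswith(prefix) and len(s) > len(prefix)
    }


def fake_logits(prefix: str, rng: random.Random) -> dict[str, float]:
    """Hypothetical stand-in for a language model: random scores per token."""
    return {tok: rng.random() for tok in VOCAB}


def constrained_decode(seed: int = 0) -> str:
    rng = random.Random(seed)
    out = ""
    while out not in ALLOWED:
        mask = allowed_next(out)
        logits = fake_logits(out, rng)
        # Greedy choice restricted to tokens the grammar permits
        out += max(mask, key=lambda tok: logits[tok])
    return out
```

Even though the scores are random, every run ends in one of the three allowed strings. That is the sense in which constrained decoding guarantees format: it constrains what can be emitted, not whether the chosen value is correct.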
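4. Schema Design Best Practices
4.1 Schema Design Principles
| Principle | Explanation | Example |
|---|---|---|
| Self-explanatory field names | The LLM infers meaning from the name | customer_name is better than cn |
| Restrict options with enum | Reduces hallucinated values | "status": {"enum": ["pending", "done"]} |
| Add a description | Guides how the LLM interprets the field | "score": {"description": "0-1 scale"} |
| Declare required fields explicitly | Prevents omissions | "required": ["name", "score"] |
| Keep nesting shallow | No more than 3 levels | Prefer flat structures |
| Provide default values | Handles uncertain information | "confidence": 0.5 |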
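Several of these principles can be checked automatically before a schema ships. The sketch below lints a JSON Schema dict for short field names, missing descriptions, a missing `required` list, and excessive nesting. The helper name `lint_schema` and its thresholds (names shorter than 3 characters, depth above 3) are illustrative assumptions, not established rules.

```python
def lint_schema(schema: dict, max_depth: int = 3, min_name_len: int = 3) -> list[str]:
    """Return warnings for a JSON Schema dict; thresholds are illustrative."""
    warnings: list[str] = []

    def walk(node: dict, path: str, depth: int) -> None:
        if node.get("type") == "array" and isinstance(node.get("items"), dict):
            # Arrays do not add an object level; descend into the item schema
            walk(node["items"], f"{path}[]", depth)
            return
        if node.get("type") != "object":
            return
        if depth > max_depth:
            warnings.append(f"{path}: nested deeper than {max_depth} levels")
        if "required" not in node:
            warnings.append(f"{path}: no explicit 'required' list")
        for name, sub in node.get("properties", {}).items():
            child = f"{path}.{name}"
            if len(name) < min_name_len:
                warnings.append(f"{child}: name may be too short to be self-explanatory")
            if "description" not in sub:
                warnings.append(f"{child}: missing description")
            walk(sub, child, depth + 1)

    walk(schema, "$", 1)
    return warnings


# Example: one terse, undocumented field and one well-described field
EXAMPLE = {
    "type": "object",
    "required": ["cn"],
    "properties": {
        "cn": {"type": "string"},
        "status": {
            "type": "string",
            "enum": ["pending", "done"],
            "description": "Order status",
        },
    },
}
```

Running `lint_schema(EXAMPLE)` flags only the `cn` field, once for its name and once for its missing description.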
4.2 A More Complex Schema Example
from enum import Enum
from typing import Optional

from pydantic import BaseModel, Field


class Priority(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class ActionItem(BaseModel):
    """A single action item extracted from meeting notes."""
    description: str = Field(description="What needs to be done")
    assignee: str = Field(description="Person responsible")
    deadline: Optional[str] = Field(
        None, description="Due date in YYYY-MM-DD format",
        pattern=r"^\d{4}-\d{2}-\d{2}$",
    )
    priority: Priority = Field(
        default=Priority.MEDIUM,
        description="Urgency level",
    )


class MeetingSummary(BaseModel):
    """Structured summary of a meeting."""
    title: str = Field(description="Meeting title or topic")
    date: str = Field(description="Meeting date", pattern=r"^\d{4}-\d{2}-\d{2}$")
    attendees: list[str] = Field(
        description="List of attendee names",
        min_length=1,
    )
    key_decisions: list[str] = Field(
        description="Important decisions made",
        min_length=0,
    )
    action_items: list[ActionItem] = Field(
        description="Tasks assigned during the meeting",
        min_length=0,
    )
    next_meeting: Optional[str] = Field(
        None, description="Next meeting date if scheduled",
    )
5. Error Recovery Strategies
5.1 Layered Defense Architecture
Error Recovery Pipeline

LLM Output
    |
    v
[Layer 1: Parse JSON]
    |-- Success -> [Layer 2]
    |-- Fail -> [Repair JSON] -> [Layer 2]
    |       |-- Fail -> [Retry with stricter prompt]
    |               |-- Fail -> [Fallback/Error]
    v
[Layer 2: Schema Validation]
    |-- Valid -> Return result
    |-- Invalid -> [Fix missing fields]
            |-- Fixed -> Return result
            |-- Cannot fix -> [Retry]
5.2 JSON Repair Implementation
import json
import re
from typing import get_origin

from pydantic import BaseModel, ValidationError


def repair_json(raw: str) -> dict | None:
    """Attempt to repair malformed JSON from LLM output."""
    # Step 1: Extract JSON from markdown code blocks
    json_match = re.search(r'```(?:json)?\s*([\s\S]*?)```', raw)
    if json_match:
        raw = json_match.group(1)

    # Step 2: Try direct parse
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass

    # Step 3: Common fixes. These are text-level heuristics, so they can also
    # alter string values (apostrophes, the word "True", and so on).
    fixes = [
        # Remove trailing commas before } or ]
        (r',\s*([}\]])', r'\1'),
        # Add missing quotes around keys
        (r'([{,])\s*(\w+)\s*:', r'\1"\2":'),
        # Replace single quotes with double
        (r"'", '"'),
        # Remove // comments, but leave URLs such as https://... alone
        (r'(?<!:)//.*$', '', re.MULTILINE),
        # Convert Python literals to JSON literals
        (r'\bTrue\b', 'true'),
        (r'\bFalse\b', 'false'),
        (r'\bNone\b', 'null'),
    ]
    fixed = raw
    for pattern, replacement, *flags in fixes:
        flag = flags[0] if flags else 0
        fixed = re.sub(pattern, replacement, fixed, flags=flag)

    try:
        return json.loads(fixed)
    except json.JSONDecodeError:
        return None


def validate_and_fix(
    data: dict, schema: type[BaseModel],
) -> BaseModel | None:
    """Validate against Pydantic schema, attempt to fix issues."""
    try:
        return schema.model_validate(data)
    except ValidationError:
        pass

    # Pydantic already applies declared defaults, so only required fields that
    # are missing need a placeholder. Placeholders hide missing data: log them.
    placeholders = {str: "", int: 0, float: 0.0, list: []}
    fixed = dict(data)
    for field_name, field_info in schema.model_fields.items():
        if field_name in fixed or not field_info.is_required():
            continue
        # get_origin maps list[str] to list; plain types have no origin
        base = get_origin(field_info.annotation) or field_info.annotation
        if base in placeholders:
            fixed[field_name] = placeholders[base]

    try:
        return schema.model_validate(fixed)
    except ValidationError:
        return None
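The regex fixes above do not cover unclosed brackets, one of the parse failures listed in section 1 and a common result of output being cut off at the token limit. A minimal sketch of one way to handle it follows: track open brackets and string state, then append whatever is still open. The helper name `close_brackets` is hypothetical, and the approach has limits; for example, output truncated right after a comma or a colon still fails to parse.

```python
import json


def close_brackets(raw: str) -> str:
    """Append the closers that a truncated JSON document is missing."""
    stack = []          # closers still owed, innermost last
    in_string = False
    escaped = False
    for ch in raw:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()

    if escaped:
        # Drop a dangling backslash so it cannot escape the quote added below
        raw = raw[:-1]
    suffix = '"' if in_string else ""
    return raw + suffix + "".join(reversed(stack))


# Usage: a reply cut off inside a nested array
truncated = '{"vendor": "Beijing Tech Co.", "items": ["paper", "pens"'
data = json.loads(close_brackets(truncated))
```

This step fits naturally between the direct parse and the regex fixes: it is cheap, and unlike the regexes it respects string boundaries.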
5.3 Smart Retry
from openai import AsyncOpenAI
from pydantic import BaseModel

client = AsyncOpenAI()


async def structured_generate(
    model: str,
    messages: list[dict],
    schema: type[BaseModel],
    max_retries: int = 3,
) -> BaseModel:
    """Generate structured output with automatic retry.

    JSON mode requires the word "JSON" to appear somewhere in the messages.
    """
    last_error = None
    for attempt in range(max_retries):
        response = await client.chat.completions.create(
            model=model,
            messages=messages + (
                [{
                    "role": "user",
                    "content": f"Previous output had this error: {last_error}. "
                               f"Please fix and output valid JSON.",
                }] if last_error else []
            ),
            response_format={"type": "json_object"},
            temperature=0.0 if attempt > 0 else 0.3,  # Reduce randomness on retry
        )
        raw = response.choices[0].message.content or ""
        parsed = repair_json(raw)
        # Compare with None: an empty dict is valid JSON but falsy
        if parsed is not None:
            result = validate_and_fix(parsed, schema)
            if result is not None:
                return result
            last_error = f"Schema validation failed for: {parsed}"
        else:
            last_error = f"JSON parse failed for output starting with: {raw[:100]}"
    raise ValueError(f"Failed after {max_retries} attempts: {last_error}")
6. Provider Comparison
6.1 Structured Output Capability Matrix
| Capability | OpenAI | Anthropic | Google | Open-source models |
|---|---|---|---|---|
| JSON Mode | Yes | No (use tool_use) | Yes | Requires a framework |
| Structured Output | Yes (strongest) | No | Yes (Gemini) | Outlines/SGLang |
| Function Calling | Yes | Yes (tool_use) | Yes | Yes (partial) |
| Strict schema adherence | 99.9% | 95%+ | 95%+ | 100% (constrained decoding) |
| Nested schemas | Yes | Yes | Yes | Yes |
| Enum support | Yes | Yes | Yes | Yes |
| Regex constraints | No | No | No | Yes (Outlines) |
| Recursive schemas | Yes | No | No | Partial |
6.2 Performance and Cost Impact
| Dimension | Prompt only | JSON Mode | Function Call | Structured Output |
|---|---|---|---|---|
| Extra latency | 0 | ~5% | ~10% | ~5% |
| Extra tokens | 0 | ~5% | ~15% (schema) | ~10% |
| Parse success rate | 60-80% | 90-95% | 95-99% | 99.9% |
| Retry cost | High | Medium | Low | Very low |
| Overall cost | Highest (many retries) | Medium | Low | Lowest |
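The claim that the prompt-only approach ends up most expensive can be checked with a back-of-the-envelope model. The sketch below assumes each attempt succeeds independently with the success rate from the table and that a retry costs the same as the first call; both are simplifying assumptions, and the function names are hypothetical. It also ignores the downstream cost of requests that still fail after all retries.

```python
def expected_calls(p_success: float, max_attempts: int) -> float:
    """Expected number of API calls when retrying up to max_attempts times."""
    q = 1.0 - p_success
    # Attempt k+1 only happens if the first k attempts all failed (probability q**k)
    return sum(q ** k for k in range(max_attempts))


def residual_failure(p_success: float, max_attempts: int) -> float:
    """Probability that every attempt fails."""
    return (1.0 - p_success) ** max_attempts


def relative_cost(p_success: float, token_overhead: float, max_attempts: int = 3) -> float:
    """Cost relative to one plain call: per-call overhead times expected calls."""
    return (1.0 + token_overhead) * expected_calls(p_success, max_attempts)


# Prompt only: 70% success (midpoint of 60-80%), no token overhead
prompt_only = relative_cost(0.70, 0.00)
# Structured Output: 99.9% success, ~10% extra tokens
structured = relative_cost(0.999, 0.10)
```

Under these assumptions the prompt-only approach costs about 1.39 plain calls per request against about 1.10 for Structured Output, and it still leaves 2.7% of requests failing after three attempts.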
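7. Production Recommendations
7.1 Tiered Strategy
| Tier | Strategy | When to use |
|---|---|---|
| First choice | Provider-native structured output | OpenAI Structured Output |
| Second choice | Function Calling + validation | Multi-provider compatibility |
| Fallback | JSON Mode + repair + retry | General purpose |
| Self-hosted | Constrained decoding | Very high reliability requirements |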
7.2 What to Monitor
# Key metrics to track for structured output quality
metrics = {
    "parse_success_rate": "JSON successfully parsed / total requests",
    "validation_success_rate": "Schema valid / total parsed",
    "retry_rate": "Requests needing retry / total requests",
    "repair_rate": "Requests needing JSON repair / total requests",
    "avg_retries": "Average retries per request (should be < 0.1)",
    "field_completeness": "Percentage of non-null required fields",
}
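As a sketch of how these ratios might be computed, the snippet below aggregates per-request log records. The `RequestLog` record and its fields are hypothetical; a real system would fill them from its own tracing or logging layer. `field_completeness` is left out because it depends on the specific schema.

```python
from dataclasses import dataclass


@dataclass
class RequestLog:
    """Hypothetical per-request record, for illustration only."""
    parsed: bool     # final output parsed as JSON
    valid: bool      # parsed output passed schema validation
    retries: int     # extra attempts beyond the first
    repaired: bool   # the JSON repair step was needed


def compute_metrics(logs: list[RequestLog]) -> dict[str, float]:
    total = len(logs)
    if total == 0:
        return {}
    parsed = sum(log.parsed for log in logs)
    valid = sum(log.parsed and log.valid for log in logs)
    return {
        "parse_success_rate": parsed / total,
        # Denominator is parsed requests, matching "Schema valid / total parsed"
        "validation_success_rate": valid / parsed if parsed else 0.0,
        "retry_rate": sum(log.retries > 0 for log in logs) / total,
        "repair_rate": sum(log.repaired for log in logs) / total,
        "avg_retries": sum(log.retries for log in logs) / total,
    }
```

Tracking `repair_rate` separately from `retry_rate` is useful because a rising repair rate often shows up before retries do, for instance after a model or prompt change.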
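8. Summary
Structured output is a cornerstone of LLM engineering. OpenAI's Structured Outputs offers the highest reliability among provider APIs, constrained decoding offers a theoretical 100% guarantee of format validity, and Function Calling is the best balance for multi-provider compatibility.
Key advice: do not rely on prompt constraints to guarantee the output format. Define expectations with a schema, enforce them with tooling, and back them up with validation.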
Maurice | maurice_wen@proton.me