Few-Shot Learning Patterns in Depth
灵阙教研团队 · Updated 2026-02-28 · ~8 min read
Example selection strategies, dynamic few-shot construction, and engineering practices for in-context learning | 2026-02
1. What Is Few-Shot Learning
Few-Shot Learning guides an LLM toward a specific task by providing a small number of examples directly in the prompt. Unlike traditional fine-tuning, few-shot prompting does not modify model weights; it exploits the LLM's in-context learning (ICL) ability.
Zero-Shot: task description -> model output
One-Shot: task description + 1 example -> model output
Few-Shot: task description + N examples -> model output (N = 2-10)
Many-Shot: task description + many examples -> model output (N > 10, exploits long context)
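The four modes differ only in how many input/output pairs precede the query. A minimal sketch of the message layout (the sentiment task and labels here are illustrative, not from the article):

```python
def build_prompt(task: str, examples: list[tuple[str, str]], query: str) -> list[dict]:
    """Build a chat prompt; len(examples) decides zero/one/few/many-shot."""
    messages = [{"role": "system", "content": task}]
    for inp, out in examples:
        messages.append({"role": "user", "content": inp})
        messages.append({"role": "assistant", "content": out})
    messages.append({"role": "user", "content": query})
    return messages

# Zero-shot: no examples; few-shot: N demonstration pairs
zero = build_prompt("Classify sentiment.", [], "It was okay")
few = build_prompt(
    "Classify sentiment.",
    [("Great movie!", "positive"), ("Terrible food", "negative")],
    "It was okay",
)
```

The same builder covers the whole spectrum; only the example list changes.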
2. Example Selection Strategies
2.1 Strategy Taxonomy
| Strategy | Principle | Best for | Effectiveness |
|---|---|---|---|
| Random | Sample uniformly from the pool | Baseline | Low |
| Similarity | Pick semantically closest examples | Classification / QA | High |
| Diversity | Cover distinct classes and patterns | Multi-class tasks | High |
| Hybrid | Similarity + diversity | General use | Highest |
| Difficulty progression | Order examples from easy to hard | Reasoning tasks | High |
| Adversarial | Include error-prone boundary cases | Precise classification | Medium-high |
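The difficulty-progression strategy is not implemented later in the article. A minimal sketch, assuming each pooled example carries a hypothetical `difficulty` score (e.g. from human annotation or a model's error rate on it):

```python
from dataclasses import dataclass

@dataclass
class ScoredExample:
    input: str
    output: str
    difficulty: float  # hypothetical field: higher = harder

def curriculum_select(pool: list[ScoredExample], k: int = 4) -> list[ScoredExample]:
    """Pick k examples spread across difficulty, ordered easy -> hard."""
    ordered = sorted(pool, key=lambda ex: ex.difficulty)
    # Take evenly spaced indices so the run spans the full difficulty range
    step = max(1, len(ordered) // k)
    return ordered[::step][:k]

pool = [ScoredExample(f"q{i}", f"a{i}", difficulty=float(i)) for i in range(8)]
picked = curriculum_select(pool, k=4)
```

The easy-to-hard ordering matters because later examples carry more weight (recency effect, see section 4.2).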
2.2 Similarity-Based Selection
import numpy as np
from dataclasses import dataclass

@dataclass
class Example:
    input: str
    output: str
    embedding: np.ndarray | None = None

class SimilaritySelector:
    """Select examples most similar to the input query."""

    def __init__(self, examples: list[Example], embed_fn):
        self.examples = examples
        self.embed_fn = embed_fn
        # Pre-compute embeddings for all examples
        for ex in self.examples:
            if ex.embedding is None:
                ex.embedding = self.embed_fn(ex.input)

    def select(self, query: str, k: int = 3) -> list[Example]:
        query_embedding = self.embed_fn(query)
        # Compute cosine similarity against every pooled example
        scores = []
        for ex in self.examples:
            similarity = np.dot(query_embedding, ex.embedding) / (
                np.linalg.norm(query_embedding) * np.linalg.norm(ex.embedding)
            )
            scores.append((similarity, ex))
        # Sort by similarity, return top-k
        scores.sort(key=lambda x: x[0], reverse=True)
        return [ex for _, ex in scores[:k]]
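As a self-contained sanity check of the cosine top-k logic, with toy 2-d vectors standing in for real embedding-model output:

```python
import numpy as np

def top_k_similar(query_vec: np.ndarray, example_vecs: list[np.ndarray], k: int = 2) -> list[int]:
    """Return indices of the k example vectors most cosine-similar to query_vec."""
    q = query_vec / np.linalg.norm(query_vec)
    sims = [float(np.dot(q, v / np.linalg.norm(v))) for v in example_vecs]
    return sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]

vecs = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 0.1])]
# The query points along vecs[0]; vecs[2] is the next closest direction
picked = top_k_similar(np.array([1.0, 0.0]), vecs)
```

In production the vectors would come from an embedding API, but the ranking step is exactly this.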
2.3 Diversity-Aware Selection
def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class DiversityAwareSelector:
    """Select examples balancing similarity and diversity."""

    def __init__(self, examples: list[Example], embed_fn):
        self.examples = examples
        self.embed_fn = embed_fn
        for ex in self.examples:
            if ex.embedding is None:
                ex.embedding = self.embed_fn(ex.input)

    def select(
        self, query: str, k: int = 5,
        lambda_diversity: float = 0.3,
    ) -> list[Example]:
        """MMR-style selection: Maximal Marginal Relevance."""
        query_emb = self.embed_fn(query)
        candidates = list(self.examples)
        selected = []
        for _ in range(k):
            if not candidates:
                break
            best_score = -float("inf")
            best_idx = -1
            for i, cand in enumerate(candidates):
                # Relevance to query
                relevance = cosine_sim(query_emb, cand.embedding)
                # Max similarity to already selected (redundancy)
                if selected:
                    redundancy = max(
                        cosine_sim(cand.embedding, s.embedding)
                        for s in selected
                    )
                else:
                    redundancy = 0.0
                # MMR score: balance relevance and diversity
                score = (1 - lambda_diversity) * relevance - \
                    lambda_diversity * redundancy
                if score > best_score:
                    best_score = score
                    best_idx = i
            selected.append(candidates.pop(best_idx))
        return selected
3. Dynamic Few-Shot Construction
3.1 Dynamic vs. Static
| Dimension | Static few-shot | Dynamic few-shot |
|---|---|---|
| Example selection | Fixed at authoring time | Chosen per input at runtime |
| Adaptability | Low | High |
| Token efficiency | Low (generic examples) | High (targeted examples) |
| Implementation complexity | Low | Medium |
| Effectiveness | Moderate | Markedly better |
3.2 A Complete Dynamic Few-Shot System
from typing import Callable

class DynamicFewShotPrompt:
    """Build prompts with dynamically selected examples."""

    def __init__(
        self,
        system_prompt: str,
        example_pool: list[Example],
        selector: Callable,
        formatter: Callable,
        max_examples: int = 5,
        max_tokens: int = 3000,  # Token budget for examples
    ):
        self.system_prompt = system_prompt
        self.example_pool = example_pool
        self.selector = selector
        self.formatter = formatter
        self.max_examples = max_examples
        self.max_tokens = max_tokens

    def build(self, query: str) -> list[dict]:
        """Build a complete prompt with dynamically selected examples."""
        # Select relevant examples
        examples = self.selector(query, k=self.max_examples)
        # Fit within token budget
        examples = self._fit_token_budget(examples)
        # Format into messages
        messages = [{"role": "system", "content": self.system_prompt}]
        for ex in examples:
            messages.append({"role": "user", "content": ex.input})
            messages.append({"role": "assistant", "content": ex.output})
        messages.append({"role": "user", "content": query})
        return messages

    def _fit_token_budget(self, examples: list[Example]) -> list[Example]:
        """Trim examples to fit within the token budget."""
        fitted = []
        total_tokens = 0
        for ex in examples:
            ex_tokens = estimate_tokens(
                self.formatter(ex.input, ex.output)
            )
            if total_tokens + ex_tokens > self.max_tokens:
                break
            fitted.append(ex)
            total_tokens += ex_tokens
        return fitted

# Usage (labeled_examples, embed_fn, and estimate_tokens assumed defined)
prompt_builder = DynamicFewShotPrompt(
    system_prompt="You are a sentiment classifier. Output: positive/negative/neutral",
    example_pool=labeled_examples,
    selector=SimilaritySelector(labeled_examples, embed_fn).select,
    formatter=lambda inp, out: f"Input: {inp}\nOutput: {out}",
)
messages = prompt_builder.build("This product is amazing but overpriced")
response = await client.chat.completions.create(  # client = AsyncOpenAI()
    model="gpt-4o-mini", messages=messages,
)
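The builder above relies on an `estimate_tokens` helper that is never defined. A crude character-count sketch is enough for budgeting (the ~4 characters/token ratio is a rough English-text heuristic, not exact — swap in a real tokenizer such as tiktoken for production budgets):

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)
```

Because the budget check only needs to be conservative-ish, an approximation is usually acceptable here; exact counts matter more when you are close to the context limit.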
4. In-Context Learning Theory
4.1 How ICL Works
In-Context Learning Mechanism (simplified)
Input to model:
[System] You classify sentiment.
[User] "Great movie!" -> positive
[User] "Terrible food" -> negative
[User] "It was okay" -> ?
What happens internally:
1. Attention mechanism identifies pattern:
Input text -> sentiment label
2. Model forms implicit "task vector" from examples
3. Task vector guides generation for new input
4. Output: "neutral"
Key insight: The model is NOT learning new weights.
It is performing approximate Bayesian inference
over possible input-output mappings.
4.2 Factors That Affect ICL Performance
| Factor | Effect | Recommendation |
|---|---|---|
| Example count | 3-5 works best; diminishing returns past ~10 | Default to 3-5 |
| Example quality | High quality beats high quantity | Prefer human-reviewed examples |
| Example order | The last examples weigh most (recency effect) | Put the most relevant last |
| Input-output format consistency | Inconsistent formats sharply degrade results | Enforce one strict format |
| Label distribution | Imbalanced labels bias predictions | Sample each class in equal proportion |
| Example relevance | Relevant examples far outperform random ones | Use similarity selection |
| Model size | Larger models are stronger at ICL | Give smaller models more examples |
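The label-distribution row can be enforced with a stratified pick before any similarity ranking. A minimal sketch, assuming each pool entry carries a `label` field (an assumption about how the pool is annotated):

```python
from collections import defaultdict

def stratified_select(pool: list[dict], per_class: int = 2) -> list[dict]:
    """Take up to per_class examples from each label to avoid majority-class bias."""
    by_label: dict[str, list[dict]] = defaultdict(list)
    for ex in pool:
        by_label[ex["label"]].append(ex)
    selected = []
    for label in sorted(by_label):  # deterministic class order
        selected.extend(by_label[label][:per_class])
    return selected

pool = [
    {"text": "great", "label": "positive"},
    {"text": "fine", "label": "positive"},
    {"text": "superb", "label": "positive"},
    {"text": "awful", "label": "negative"},
]
balanced = stratified_select(pool, per_class=1)
```

A practical combination is stratified-then-similar: balance classes first, then rank within each class by similarity to the query.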
5. Advanced Few-Shot Patterns
5.1 Chain-of-Thought Few-Shot
# Standard few-shot: input -> output
standard_example = Example(
    input="If a train travels 60 km/h for 2.5 hours, how far does it go?",
    output="150 km",
)

# CoT few-shot: input -> reasoning -> output
cot_example = Example(
    input="If a train travels 60 km/h for 2.5 hours, how far does it go?",
    output="""Let me think step by step:
1. Speed = 60 km/h
2. Time = 2.5 hours
3. Distance = Speed x Time = 60 x 2.5 = 150 km
The train travels 150 km.""",
)

# CoT significantly improves reasoning accuracy.
# Especially effective for math, logic, and multi-step problems.
5.2 Self-Consistency with Few-Shot
from collections import Counter

async def self_consistent_few_shot(
    query: str, examples: list[Example],
    n_samples: int = 5, temperature: float = 0.7,
) -> str:
    """Generate multiple answers and vote on the most common."""
    messages = build_few_shot_messages(examples, query)
    # Generate multiple responses at non-zero temperature
    responses = []
    for _ in range(n_samples):
        response = await client.chat.completions.create(  # client = AsyncOpenAI()
            model="gpt-4o",
            messages=messages,
            temperature=temperature,
        )
        answer = extract_final_answer(response.choices[0].message.content)
        responses.append(answer)
    # Majority vote
    votes = Counter(responses)
    best_answer, count = votes.most_common(1)[0]
    return best_answer  # Confidence = count / n_samples
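`extract_final_answer` is left undefined above; for numeric chain-of-thought outputs, a last-number regex is a common if imperfect sketch (the fallback to the last line is an assumption for non-numeric tasks):

```python
import re

def extract_final_answer(text: str) -> str:
    """Return the last number in the response, else the last non-empty line."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    if numbers:
        return numbers[-1]
    return text.strip().splitlines()[-1].strip()
```

Normalizing answers before voting matters: "150", "150 km", and "150.0" should count as the same vote, so strip units and canonicalize formats as far as your task allows.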
5.3 Many-Shot ICL (The Long-Context Era)
# With 128K+ context windows, we can now do many-shot ICL.
# Research shows performance can keep improving with 100+ examples.
class ManyShotBuilder:
    """Leverage long context for many-shot in-context learning."""

    def __init__(self, system_prompt, diversity_selector, similarity_selector):
        self.system_prompt = system_prompt
        self.diversity_selector = diversity_selector
        self.similarity_selector = similarity_selector

    def build(self, query: str, k: int = 100) -> list[dict]:
        """Build a prompt with up to k examples."""
        # Select diverse, high-quality examples
        examples = self.diversity_selector.select(query, k=k)
        # Order: diverse first, most similar last (recency bias)
        similar = self.similarity_selector.select(query, k=5)
        similar_inputs = {ex.input for ex in similar}
        diverse = [ex for ex in examples if ex.input not in similar_inputs][:k - 5]
        # Fit the token budget, leaving room for the output
        total = estimate_tokens(self.system_prompt) + estimate_tokens(query)
        fitted_examples = []
        for ex in diverse + similar:
            ex_tokens = estimate_tokens(format_example(ex))
            if total + ex_tokens > 100_000:
                break
            fitted_examples.append(ex)
            total += ex_tokens
        return self._format_messages(fitted_examples, query)
6. Evaluation and Optimization
6.1 Evaluating Few-Shot Strategies
async def evaluate_few_shot_strategy(
    strategies: dict[str, Callable],
    test_set: list[dict],
    model: str = "gpt-4o-mini",
) -> dict:
    """Compare different few-shot strategies on a labeled test set."""
    results = {}
    for name, strategy in strategies.items():
        scores = []
        for sample in test_set:
            examples = strategy(sample["input"])
            messages = build_messages(examples, sample["input"])
            response = await client.chat.completions.create(  # client = AsyncOpenAI()
                model=model, messages=messages, temperature=0,
            )
            prediction = response.choices[0].message.content
            score = evaluate_prediction(prediction, sample["expected"])
            scores.append(score)
        results[name] = {
            "accuracy": sum(scores) / len(scores),
            "avg_examples": avg_example_count(strategy, test_set),
            "avg_tokens": avg_token_usage(strategy, test_set),
        }
    return results

# Compare strategies
strategies = {
    "random_3": lambda q: random_selector.select(q, k=3),
    "similar_3": lambda q: similarity_selector.select(q, k=3),
    "diverse_5": lambda q: diversity_selector.select(q, k=5),
    "mmr_5": lambda q: mmr_selector.select(q, k=5),
    "many_shot_50": lambda q: many_shot_selector.select(q, k=50),
}
results = await evaluate_few_shot_strategy(strategies, test_data)
6.2 Optimization Dimensions
| Dimension | Direction | Tooling |
|---|---|---|
| Example count | Search the 3-50 range for the optimum | Grid search |
| Selection strategy | Weight similarity vs. diversity | A/B testing |
| Example quality | Human review + automatic filtering | LLM-as-judge |
| Ordering | Ascending vs. descending relevance | Ablation runs |
| Format design | Concise vs. verbose formats | A/B testing |
| Token efficiency | Compress examples without losing signal | Automatic summarization |
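The example-count search can be a plain grid loop; `run_eval` below is a stand-in for whatever accuracy harness you already use (such as the evaluator in 6.1):

```python
def grid_search_k(candidate_ks: list[int], run_eval) -> tuple[int, float]:
    """Return (best_k, best_score) by evaluating each candidate example count."""
    results = {k: run_eval(k) for k in candidate_ks}
    best_k = max(results, key=results.get)
    return best_k, results[best_k]

# Demo with canned scores standing in for real eval runs
scores = {3: 0.81, 5: 0.90, 10: 0.85}
best_k, best_score = grid_search_k([3, 5, 10], lambda k: scores[k])
```

Because each eval run costs real API tokens, a coarse grid (e.g. 3, 5, 10, 20, 50) followed by a finer pass around the winner is usually cheaper than an exhaustive sweep.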
7. Case Study
7.1 Chinese Invoice Classification
# Example pool for Chinese invoice classification
# (invoice strings and the system prompt stay in Chinese: they are the task data)
invoice_examples = [
    Example(
        input="增值税专用发票 / 广州某科技有限公司 / 办公用品 / 5000元",
        output='{"category": "办公费用", "tax_type": "增值税专用", "deductible": true}',
    ),
    Example(
        input="增值税普通发票 / 某餐饮管理公司 / 餐饮服务 / 800元",
        output='{"category": "业务招待费", "tax_type": "增值税普通", "deductible": false}',
    ),
    Example(
        input="增值税电子普通发票 / 中国石化 / 汽油 / 500元",
        output='{"category": "交通费用", "tax_type": "增值税电子普通", "deductible": false}',
    ),
    # ... 50+ examples covering all categories
]

# Dynamic selection ensures the most relevant examples are used
classifier = DynamicFewShotPrompt(
    system_prompt="你是一个发票分类助手。根据发票信息输出JSON分类结果。",
    example_pool=invoice_examples,
    selector=SimilaritySelector(invoice_examples, embed_fn).select,
    formatter=lambda i, o: f"发票:{i}\n分类:{o}",
    max_examples=5,
)
8. Summary
Few-shot learning is one of the most practical yet underrated techniques in LLM applications. Dynamic example selection can improve accuracy by 15-30% over static examples, and many-shot strategies in the long-context era raise the ceiling further.
Core recommendations:
- Always select dynamically: an MMR blend of similarity and diversity is the best starting point
- Quality beats quantity: 5 high-quality examples outperform 20 low-quality ones
- CoT is essential: reasoning tasks must demonstrate the chain of thought in the examples
- Let evaluation drive optimization: use automated evals to find the best example count and selection strategy
Maurice | maurice_wen@proton.me