A Complete Guide to Few-Shot Learning Patterns

Example selection strategies, dynamic few-shot construction, and engineering practices for in-context learning | 2026-02


1. What Is Few-Shot Learning?

Few-shot learning guides an LLM to perform a specific task by providing a small number of examples directly in the prompt. Unlike traditional fine-tuning, few-shot prompting does not modify model weights; instead it relies on the LLM's in-context learning (ICL) ability.

Zero-Shot:  task description -> model output
One-Shot:   task description + 1 example -> model output
Few-Shot:   task description + N examples -> model output (N = 2-10)
Many-Shot:  task description + many examples -> model output (N > 10, leveraging long context)

2. Example Selection Strategies

2.1 Strategy Taxonomy

Strategy   | How it works                        | Best for                    | Effect
Random     | draw at random from the example pool | baseline                    | —
Similarity | semantically closest examples        | classification / QA         | —
Diversity  | cover different classes and patterns | multi-class tasks           | —
Hybrid     | similarity + diversity               | general-purpose             | highest
Curriculum | ordered from easy to hard            | reasoning tasks             | —
Adversarial| include error-prone boundary cases   | fine-grained classification | medium-high

2.2 Similarity-Based Selection

import numpy as np
from dataclasses import dataclass

@dataclass
class Example:
    input: str
    output: str
    embedding: np.ndarray | None = None

class SimilaritySelector:
    """Select examples most similar to the input query."""

    def __init__(self, examples: list[Example], embed_fn):
        self.examples = examples
        self.embed_fn = embed_fn

        # Pre-compute embeddings for all examples
        for ex in self.examples:
            if ex.embedding is None:
                ex.embedding = self.embed_fn(ex.input)

    def select(self, query: str, k: int = 3) -> list[Example]:
        query_embedding = self.embed_fn(query)

        # Compute cosine similarity
        scores = []
        for ex in self.examples:
            similarity = np.dot(query_embedding, ex.embedding) / (
                np.linalg.norm(query_embedding) * np.linalg.norm(ex.embedding)
            )
            scores.append((similarity, ex))

        # Sort by similarity, return top-k
        scores.sort(key=lambda x: x[0], reverse=True)
        return [ex for _, ex in scores[:k]]

2.3 Diversity-Aware Selection

import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class DiversityAwareSelector:
    """Select examples balancing similarity and diversity."""

    def __init__(self, examples: list[Example], embed_fn):
        self.examples = examples
        self.embed_fn = embed_fn
        # Pre-compute embeddings for all examples
        for ex in self.examples:
            if ex.embedding is None:
                ex.embedding = self.embed_fn(ex.input)

    def select(
        self, query: str, k: int = 5,
        lambda_diversity: float = 0.3,
    ) -> list[Example]:
        """MMR-style selection: Maximal Marginal Relevance."""
        query_emb = self.embed_fn(query)
        candidates = list(self.examples)
        selected = []

        for _ in range(min(k, len(candidates))):
            best_score = -float("inf")
            best_idx = -1

            for i, cand in enumerate(candidates):
                # Relevance to query
                relevance = cosine_sim(query_emb, cand.embedding)

                # Max similarity to already selected (redundancy)
                if selected:
                    redundancy = max(
                        cosine_sim(cand.embedding, s.embedding)
                        for s in selected
                    )
                else:
                    redundancy = 0

                # MMR score: balance relevance and diversity
                score = (1 - lambda_diversity) * relevance - \
                        lambda_diversity * redundancy

                if score > best_score:
                    best_score = score
                    best_idx = i

            selected.append(candidates.pop(best_idx))

        return selected
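To see the lambda_diversity knob in action, here is a self-contained MMR sketch over toy 2-D embeddings (index-based so it is easy to check by hand, and separate from the classes above): with lambda = 0 pure relevance picks the two near-duplicate vectors, while a larger lambda swaps the second pick for the orthogonal candidate.

```python
import numpy as np

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def mmr_select(query_emb, cand_embs, k, lambda_diversity=0.3):
    """Return indices of k candidates chosen by Maximal Marginal Relevance."""
    remaining = list(range(len(cand_embs)))
    selected: list[int] = []
    for _ in range(min(k, len(remaining))):
        best_idx, best_score = remaining[0], -float("inf")
        for i in remaining:
            relevance = cosine_sim(query_emb, cand_embs[i])
            # Highest similarity to anything already selected (0 if none yet)
            redundancy = max(
                (cosine_sim(cand_embs[i], cand_embs[j]) for j in selected),
                default=0.0,
            )
            score = (1 - lambda_diversity) * relevance - lambda_diversity * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected

query = np.array([1.0, 0.0])
cands = [np.array([1.0, 0.0]),    # identical to query
         np.array([0.95, 0.31]),  # near-duplicate of the first
         np.array([0.0, 1.0])]    # orthogonal to the query
```

With `lambda_diversity=0.0` the selection is purely relevance-driven (indices 0 then 1); at `0.7` the redundancy penalty pushes the orthogonal candidate (index 2) into second place.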

3. Dynamic Few-Shot Construction

3.1 Dynamic vs. Static

Dimension                 | Static Few-Shot          | Dynamic Few-Shot
Example selection         | fixed at authoring time  | chosen at runtime per input
Adaptability              | low                      | high
Token efficiency          | low (generic examples)   | high (targeted examples)
Implementation complexity | low                      | higher
Effectiveness             | moderate                 | significantly better

3.2 A Complete Dynamic Few-Shot System

from typing import Callable

class DynamicFewShotPrompt:
    """Build prompts with dynamically selected examples."""

    def __init__(
        self,
        system_prompt: str,
        example_pool: list[Example],
        selector: Callable,
        formatter: Callable,
        max_examples: int = 5,
        max_tokens: int = 3000,  # Token budget for examples
    ):
        self.system_prompt = system_prompt
        self.example_pool = example_pool
        self.selector = selector
        self.formatter = formatter
        self.max_examples = max_examples
        self.max_tokens = max_tokens

    def build(self, query: str) -> list[dict]:
        """Build complete prompt with dynamically selected examples."""
        # Select relevant examples
        examples = self.selector(query, k=self.max_examples)

        # Fit within token budget
        examples = self._fit_token_budget(examples)

        # Format into messages
        messages = [{"role": "system", "content": self.system_prompt}]

        for ex in examples:
            messages.append({"role": "user", "content": ex.input})
            messages.append({"role": "assistant", "content": ex.output})

        messages.append({"role": "user", "content": query})
        return messages

    def _fit_token_budget(self, examples: list[Example]) -> list[Example]:
        """Trim examples to fit within token budget."""
        fitted = []
        total_tokens = 0

        for ex in examples:
            ex_tokens = estimate_tokens(
                self.formatter(ex.input, ex.output)
            )
            if total_tokens + ex_tokens > self.max_tokens:
                break
            fitted.append(ex)
            total_tokens += ex_tokens

        return fitted
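_fit_token_budget assumes an estimate_tokens helper, which the snippet does not define. A crude but serviceable stand-in is the rough 4-characters-per-token rule of thumb for English text (swap in a real tokenizer such as tiktoken for accurate counts):

```python
def estimate_tokens(text: str) -> int:
    """Heuristic token estimate: roughly 4 characters per token for English.
    A placeholder -- for accurate budgeting, use a real tokenizer (e.g. tiktoken)."""
    return max(1, len(text) // 4)
```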

# Usage
prompt_builder = DynamicFewShotPrompt(
    system_prompt="You are a sentiment classifier. Output: positive/negative/neutral",
    example_pool=labeled_examples,
    selector=SimilaritySelector(labeled_examples, embed_fn).select,
    formatter=lambda inp, out: f"Input: {inp}\nOutput: {out}",
)

messages = prompt_builder.build("This product is amazing but overpriced")
from openai import AsyncOpenAI

client = AsyncOpenAI()
response = await client.chat.completions.create(
    model="gpt-4o-mini", messages=messages,
)

4. In-Context Learning Theory

4.1 How ICL Works

In-Context Learning Mechanism (simplified)

Input to model:
  [System] You classify sentiment.
  [User]   "Great movie!" -> positive
  [User]   "Terrible food" -> negative
  [User]   "It was okay" -> ?

What happens internally:
  1. Attention mechanism identifies pattern:
     Input text -> sentiment label
  2. Model forms implicit "task vector" from examples
  3. Task vector guides generation for new input
  4. Output: "neutral"

Key insight: the model is NOT learning new weights.
One influential hypothesis is that it performs approximate
Bayesian inference over possible input-output mappings.

4.2 Factors Affecting ICL Performance

Factor                          | Effect                                            | Recommendation
Number of examples              | 3-5 works best; diminishing returns beyond 10     | default to 3-5
Example quality                 | high quality beats high quantity                  | prioritize human review
Example order                   | later examples carry more weight (recency effect) | put the most relevant last
Input-output format consistency | inconsistent formatting sharply degrades results  | enforce one strict format
Label distribution              | class imbalance biases predictions                | sample each class evenly
Example relevance               | relevant examples far outperform random ones      | use similarity-based selection
Model size                      | larger models have stronger ICL ability           | give smaller models more examples
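The label-distribution row deserves a concrete sketch: a round-robin pick across classes keeps the prompt balanced even when the example pool is skewed. (`balanced_select` and its (text, label) tuples are illustrative helpers, not part of the selector classes above.)

```python
from collections import defaultdict

def balanced_select(examples, k):
    """Pick up to k examples round-robin across labels so no class dominates.

    `examples` is a list of (text, label) tuples; returns a list of the same
    tuples, alternating between classes in first-seen label order.
    """
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex[1]].append(ex)   # bucket by label

    labels = list(by_label)
    selected, i = [], 0
    # Keep cycling through labels until we have k picks or every pool is empty
    while len(selected) < k and any(by_label[l] for l in labels):
        label = labels[i % len(labels)]
        if by_label[label]:
            selected.append(by_label[label].pop(0))
        i += 1
    return selected
```

Even with a 3:1 skew toward one class, the first picks alternate between classes before the minority pool runs dry.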

5. Advanced Few-Shot Patterns

5.1 Chain-of-Thought Few-Shot

# Standard Few-Shot: Input -> Output
standard_example = Example(
    input="If a train travels 60 km/h for 2.5 hours, how far does it go?",
    output="150 km",
)

# CoT Few-Shot: Input -> Reasoning -> Output
cot_example = Example(
    input="If a train travels 60 km/h for 2.5 hours, how far does it go?",
    output="""Let me think step by step:
1. Speed = 60 km/h
2. Time = 2.5 hours
3. Distance = Speed x Time = 60 x 2.5 = 150 km

The train travels 150 km.""",
)

# CoT significantly improves reasoning accuracy
# Especially effective for: math, logic, multi-step problems

5.2 Self-Consistency with Few-Shot

async def self_consistent_few_shot(
    client, query: str, examples: list[Example],
    n_samples: int = 5, temperature: float = 0.7,
) -> str:
    """Generate multiple answers and vote on the most common.

    `client` is an AsyncOpenAI instance.
    """
    messages = build_few_shot_messages(examples, query)

    # Generate multiple responses
    responses = []
    for _ in range(n_samples):
        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            temperature=temperature,
        )
        answer = extract_final_answer(response.choices[0].message.content)
        responses.append(answer)

    # Majority vote
    from collections import Counter
    votes = Counter(responses)
    best_answer, count = votes.most_common(1)[0]

    return best_answer  # Confidence = count / n_samples
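`extract_final_answer` above is task-specific and left undefined. For numeric reasoning tasks, a minimal placeholder might grab the last number in the completion, falling back to the last non-empty line for free-text labels (adapt the regex to your answer format):

```python
import re

def extract_final_answer(text: str) -> str:
    """Return the last number in the response, or the last non-empty line.
    A placeholder for numeric reasoning tasks -- adapt to your answer format."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    if numbers:
        return numbers[-1]
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return lines[-1] if lines else ""
```

Normalizing answers this way is what makes majority voting meaningful: two completions with different reasoning but the same final number count as the same vote.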

5.3 Many-Shot ICL (the Long-Context Era)

# With 128K+ context windows, we can now do many-shot ICL
# Research suggests performance can keep improving with 100+ examples

class ManyShotBuilder:
    """Leverage long context for many-shot in-context learning."""

    def __init__(self, system_prompt, diversity_selector, similarity_selector):
        self.system_prompt = system_prompt
        self.diversity_selector = diversity_selector
        self.similarity_selector = similarity_selector

    def build(self, query: str, k: int = 100) -> list[dict]:
        """Build a prompt with up to k examples."""
        # Select diverse, high-quality examples
        examples = self.diversity_selector.select(query, k=k)

        # Order: diverse first, most similar last (recency bias)
        diverse = examples[:k - 5]
        similar = self.similarity_selector.select(query, k=5)

        # Estimate tokens
        total = estimate_tokens(self.system_prompt) + estimate_tokens(query)
        fitted_examples = []

        for ex in diverse + similar:
            ex_tokens = estimate_tokens(format_example(ex))
            if total + ex_tokens > 100_000:  # Leave room for output
                break
            fitted_examples.append(ex)
            total += ex_tokens

        return self._format_messages(fitted_examples, query)

6. Evaluation and Optimization

6.1 Evaluating Few-Shot Performance

async def evaluate_few_shot_strategy(
    strategies: dict[str, Callable],
    test_set: list[dict],
    model: str = "gpt-4o-mini",
) -> dict:
    """Compare different few-shot strategies."""
    client = AsyncOpenAI()  # assumes: from openai import AsyncOpenAI
    results = {}

    for name, strategy in strategies.items():
        scores = []
        for sample in test_set:
            examples = strategy(sample["input"])
            messages = build_messages(examples, sample["input"])

            response = await client.chat.completions.create(
                model=model, messages=messages, temperature=0,
            )
            prediction = response.choices[0].message.content
            score = evaluate_prediction(prediction, sample["expected"])
            scores.append(score)

        results[name] = {
            "accuracy": sum(scores) / len(scores),
            "avg_examples": avg_example_count(strategy, test_set),
            "avg_tokens": avg_token_usage(strategy, test_set),
        }

    return results

# Compare strategies
strategies = {
    "random_3": lambda q: random_selector.select(q, k=3),
    "similar_3": lambda q: similarity_selector.select(q, k=3),
    "diverse_5": lambda q: diversity_selector.select(q, k=5),
    "mmr_5": lambda q: mmr_selector.select(q, k=5),
    "many_shot_50": lambda q: many_shot_selector.select(q, k=50),
}

results = await evaluate_few_shot_strategy(strategies, test_data)

6.2 Optimization Dimensions

Dimension          | Direction                                   | Tooling
Number of examples | search the 3-50 range for the optimum       | grid search
Selection strategy | tune the similarity vs. diversity weighting | A/B testing
Example quality    | human review + automatic filtering          | LLM-as-judge
Ordering           | ascending vs. descending relevance          | ablation studies
Format design      | terse vs. verbose formats                   | A/B testing
Token efficiency   | compress examples, preserve information     | automatic summarization
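The first row (grid searching the example count) is simple enough to sketch end to end; `evaluate` here stands for any callable that runs your eval harness for a given k and returns an accuracy, so the function itself is just the search loop:

```python
def grid_search_k(evaluate, ks=(3, 5, 10, 20, 50)):
    """Evaluate each candidate example count and keep the best by accuracy.

    `evaluate(k)` is assumed to return an accuracy in [0, 1] for that k.
    """
    best_k, best_acc = None, float("-inf")
    for k in ks:
        acc = evaluate(k)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc
```

In practice `evaluate` would wrap something like `evaluate_few_shot_strategy` above with the example count fixed to k.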

7. A Worked Case Study

7.1 Chinese Invoice Classification

# Example pool for Chinese invoice classification
invoice_examples = [
    Example(
        input="增值税专用发票 / 广州某科技有限公司 / 办公用品 / 5000元",
        output='{"category": "办公费用", "tax_type": "增值税专用", "deductible": true}',
    ),
    Example(
        input="增值税普通发票 / 某餐饮管理公司 / 餐饮服务 / 800元",
        output='{"category": "业务招待费", "tax_type": "增值税普通", "deductible": false}',
    ),
    Example(
        input="增值税电子普通发票 / 中国石化 / 汽油 / 500元",
        output='{"category": "交通费用", "tax_type": "增值税电子普通", "deductible": false}',
    ),
    # ... 50+ examples covering all categories
]

# Dynamic selection ensures the most relevant examples are used
classifier = DynamicFewShotPrompt(
    system_prompt="你是一个发票分类助手。根据发票信息输出JSON分类结果。",
    example_pool=invoice_examples,
    selector=SimilaritySelector(invoice_examples, embed_fn).select,
    formatter=lambda i, o: f"发票:{i}\n分类:{o}",
    max_examples=5,
)

8. Summary

Few-shot learning is one of the most practical yet most underrated techniques in LLM applications. Dynamic example selection can deliver sizable accuracy gains over static examples (improvements of 15-30% are commonly reported), and many-shot strategies in the long-context era push the ceiling higher still.

Core practical recommendations:

  1. Always select dynamically: an MMR-style similarity + diversity strategy is the best starting point
  2. Quality over quantity: 5 high-quality examples beat 20 low-quality ones
  3. CoT is essential: reasoning tasks must demonstrate the chain of thought in the examples
  4. Let evaluation drive optimization: use automated evaluation to find the optimal example count and selection strategy

Maurice | maurice_wen@proton.me