Agent Tool-Calling Patterns and Function Calling

Tool schema design, parallel tool calls, structured output, error recovery, and tool-selection strategies

Introduction

Tool calling (Function Calling / Tool Use) is the leap that takes an Agent from "can only talk" to "can actually do things". An LLM by itself only generates text, but through tool calls it can query databases, call APIs, and operate on the file system, turning thought into action.

But tool calling is far more than "pass a function name and arguments to the model". How should a tool schema be designed so the model understands it accurately? How do you run multiple tool calls in parallel? How do you recover gracefully when a tool fails? These engineering questions determine whether an Agent is actually usable.

Tool Schema Design

Schema Design Principles

| Principle | What it means | Good example | Bad example |
|---|---|---|---|
| Clear naming | verb + noun, obvious at a glance | `search_products` | `do_thing` |
| Rich description | states when to use it and its limits | "Search by name, max 50" | "Search" |
| Precise parameters | type + constraints + default + enum | `limit: int, 1-100, default 10` | `limit: any` |
| Right-sized granularity | one tool does one thing | `get_user` + `update_user` | `manage_user` |
| Idempotence first | read operations have no side effects | a GET request | a read that also cleans up state |
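
These principles map directly onto the JSON Schema that function-calling APIs consume. Below is a minimal sketch of a well-specified definition in the OpenAI `tools` wire format (the tool itself is illustrative; only `query` and `limit` are shown for brevity):

```python
# A tool definition following the principles above: verb+noun name,
# a description that says when (not) to use it, and constrained parameters.
search_products_schema = {
    "type": "function",
    "function": {
        "name": "search_products",  # clear naming: verb + noun
        "description": (
            "Search the product catalog by query string. "
            "Use for product questions; do NOT use for order queries."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Product name, category, or keywords",
                    "minLength": 1,
                    "maxLength": 200,
                },
                "limit": {
                    "type": "integer",
                    "description": "Maximum number of results",
                    "minimum": 1,
                    "maximum": 50,
                    "default": 10,
                },
            },
            "required": ["query"],
        },
    },
}
```

Constraint keywords such as `minLength` and `maximum` are standard JSON Schema; the model sees them and tends to respect them, but server-side validation is still required.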

Tool Definitions in TypeScript

// src/tools/definitions.ts
import { z } from "zod";

// Good: Clear schema with descriptions, constraints, and enums
const searchProductsTool = {
  name: "search_products",
  description: `Search the product catalog by query string.
Returns up to 'limit' products matching the search criteria.
Use this when the user asks about available products, prices, or product details.
Do NOT use this for order-related queries (use search_orders instead).`,
  parameters: z.object({
    query: z.string()
      .min(1)
      .max(200)
      .describe("Search query: product name, category, or keywords"),
    category: z.enum(["electronics", "clothing", "food", "books", "all"])
      .default("all")
      .describe("Filter by product category"),
    price_min: z.number()
      .min(0)
      .optional()
      .describe("Minimum price in USD"),
    price_max: z.number()
      .max(100000)
      .optional()
      .describe("Maximum price in USD"),
    sort_by: z.enum(["relevance", "price_asc", "price_desc", "rating"])
      .default("relevance")
      .describe("Sort order for results"),
    limit: z.number()
      .int()
      .min(1)
      .max(50)
      .default(10)
      .describe("Maximum number of results to return"),
  }),
};

// Tool for creating orders (with confirmation requirement)
const createOrderTool = {
  name: "create_order",
  description: `Create a new order for the customer.
IMPORTANT: Always confirm the order details with the user before calling this tool.
This action is NOT reversible. Returns order ID on success.`,
  parameters: z.object({
    product_id: z.string()
      .describe("Product ID from search results"),
    quantity: z.number()
      .int()
      .min(1)
      .max(100)
      .describe("Number of items to order"),
    shipping_address_id: z.string()
      .describe("ID of saved shipping address"),
    payment_method_id: z.string()
      .describe("ID of saved payment method"),
  }),
};

Tool Definitions in Python

# src/tools/product_tools.py
from langchain_core.tools import tool
from pydantic import BaseModel, Field
from typing import Optional, Literal

class SearchProductsInput(BaseModel):
    query: str = Field(
        ...,
        description="Search query: product name, category, or keywords",
        min_length=1,
        max_length=200,
    )
    category: Literal["electronics", "clothing", "food", "books", "all"] = Field(
        default="all",
        description="Filter by product category",
    )
    price_min: Optional[float] = Field(
        default=None,
        description="Minimum price in USD",
        ge=0,
    )
    price_max: Optional[float] = Field(
        default=None,
        description="Maximum price in USD",
        le=100000,
    )
    limit: int = Field(
        default=10,
        description="Maximum number of results to return",
        ge=1,
        le=50,
    )

@tool(args_schema=SearchProductsInput)
async def search_products(
    query: str,
    category: str = "all",
    price_min: Optional[float] = None,
    price_max: Optional[float] = None,
    limit: int = 10,
) -> str:
    """Search the product catalog by query string.
    Returns product names, prices, ratings, and IDs.
    Use this when the user asks about available products.
    Do NOT use this for order-related queries."""

    # product_service: the application's catalog backend (assumed to exist)
    products = await product_service.search(
        query=query,
        category=None if category == "all" else category,
        price_range=(price_min, price_max),
        limit=limit,
    )

    if not products:
        return "No products found matching your criteria."

    results = []
    for p in products:
        results.append(
            f"- {p.name} (ID: {p.id}): ${p.price:.2f}, Rating: {p.rating}/5"
        )

    return f"Found {len(products)} products:\n" + "\n".join(results)

Parallel Tool Calls

Native Parallel Calls

# OpenAI and Anthropic support parallel tool calls natively
from openai import AsyncOpenAI

client = AsyncOpenAI()

response = await client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo and London, and calculate 42*17?"},
    ],
    tools=[weather_tool_schema, calculator_tool_schema],
    parallel_tool_calls=True,  # Enabled by default
)

# Model returns multiple tool calls in a single response
# response.choices[0].message.tool_calls = [
#   ToolCall(id="call_1", function=Function(name="get_weather", arguments='{"city":"Tokyo"}')),
#   ToolCall(id="call_2", function=Function(name="get_weather", arguments='{"city":"London"}')),
#   ToolCall(id="call_3", function=Function(name="calculate", arguments='{"expression":"42*17"}')),
# ]

A Parallel Execution Engine

# src/tools/parallel_executor.py
import asyncio
import json
from typing import Callable

class ParallelToolExecutor:
    """Execute multiple tool calls concurrently."""

    def __init__(self, tools: dict[str, Callable], max_concurrent: int = 5):
        self.tools = tools
        self.semaphore = asyncio.Semaphore(max_concurrent)

    async def execute_all(
        self,
        tool_calls: list[dict],
        timeout: float = 30.0,
    ) -> list[dict]:
        """Execute all tool calls in parallel with timeout."""

        async def execute_one(call: dict) -> dict:
            async with self.semaphore:
                tool_name = call["function"]["name"]
                tool_fn = self.tools.get(tool_name)

                if not tool_fn:
                    return {
                        "tool_call_id": call["id"],
                        "role": "tool",
                        "content": f"Error: Unknown tool '{tool_name}'",
                    }

                try:
                    args = json.loads(call["function"]["arguments"])
                    result = await asyncio.wait_for(
                        tool_fn(**args),
                        timeout=timeout,
                    )
                    return {
                        "tool_call_id": call["id"],
                        "role": "tool",
                        "content": str(result),
                    }
                except asyncio.TimeoutError:
                    return {
                        "tool_call_id": call["id"],
                        "role": "tool",
                        "content": f"Error: Tool '{tool_name}' timed out after {timeout}s",
                    }
                except Exception as e:
                    return {
                        "tool_call_id": call["id"],
                        "role": "tool",
                        "content": f"Error executing '{tool_name}': {str(e)}",
                    }

        # Execute all calls concurrently
        results = await asyncio.gather(
            *[execute_one(call) for call in tool_calls],
            return_exceptions=False,
        )

        return results
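
A self-contained run of the same semaphore + `gather` pattern, with two dummy tools and hand-written call payloads (all names here are illustrative), shows the end-to-end flow:

```python
import asyncio
import json

# Dummy tools standing in for real backends (for the demo only).
async def get_weather(city: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"{city}: 21C"

async def calculate(expression: str) -> str:
    a, b = expression.split("*")  # toy "a*b" evaluator; never eval() model input
    return str(int(a) * int(b))

TOOLS = {"get_weather": get_weather, "calculate": calculate}

async def run_call(call: dict, sem: asyncio.Semaphore) -> dict:
    """Run one tool call under the concurrency limit, catching failures."""
    async with sem:
        try:
            args = json.loads(call["function"]["arguments"])
            fn = TOOLS[call["function"]["name"]]
            content = str(await asyncio.wait_for(fn(**args), timeout=5.0))
        except Exception as e:
            content = f"Error: {e}"
        return {"tool_call_id": call["id"], "role": "tool", "content": content}

async def main(calls: list[dict]) -> list[dict]:
    sem = asyncio.Semaphore(5)
    # gather preserves input order, so results line up with tool_call ids
    return await asyncio.gather(*[run_call(c, sem) for c in calls])

calls = [
    {"id": "call_1",
     "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}},
    {"id": "call_2",
     "function": {"name": "calculate", "arguments": '{"expression": "42*17"}'}},
]
results = asyncio.run(main(calls))
```

Note that errors are converted into tool-message content rather than raised: the model needs to see the failure in the transcript to react to it.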

Structured Output

Enforcing a JSON Schema on Output

# Using OpenAI Structured Outputs
from pydantic import BaseModel
from openai import AsyncOpenAI

class ProductRecommendation(BaseModel):
    products: list[dict]
    reasoning: str
    confidence: float

client = AsyncOpenAI()

response = await client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a product recommendation engine."},
        {"role": "user", "content": "Recommend laptops under $1000 for coding"},
    ],
    response_format=ProductRecommendation,
)

recommendation = response.choices[0].message.parsed
# Type-safe access: recommendation.products, recommendation.reasoning

Anthropic Tool Output

# Using Anthropic tool_choice for structured output
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{
        "name": "structured_response",
        "description": "Output a structured analysis",
        "input_schema": {
            "type": "object",
            "properties": {
                "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
                "key_topics": {"type": "array", "items": {"type": "string"}},
                "summary": {"type": "string"},
                "confidence": {"type": "number", "minimum": 0, "maximum": 1},
            },
            "required": ["sentiment", "key_topics", "summary", "confidence"],
        },
    }],
    tool_choice={"type": "tool", "name": "structured_response"},
    messages=[{"role": "user", "content": "Analyze this review: ..."}],
)

# Response will always use the structured_response tool
result = response.content[0].input  # Parsed JSON

Error Recovery Strategies

Three-Tier Error Handling

# src/tools/error_recovery.py
import asyncio
import json

class ToolErrorHandler:
    """Three-tier error recovery for tool calls."""

    async def handle_tool_error(
        self,
        tool_name: str,
        error: Exception,
        original_args: dict,
        messages: list,
        retry_count: int = 0,
    ) -> dict:
        """
        Tier 1: Auto-retry with same args (transient errors)
        Tier 2: Ask LLM to fix args (parameter errors)
        Tier 3: Report to user (unrecoverable)
        """

        error_type = classify_error(error)

        # Tier 1: Transient errors -> retry
        if error_type == "transient" and retry_count < 2:
            await asyncio.sleep(2 ** retry_count)
            return await self.retry_tool(tool_name, original_args)

        # Tier 2: Parameter errors -> ask LLM to fix
        if error_type == "parameter" and retry_count < 1:
            fixed_args = await self.ask_llm_to_fix(
                tool_name, original_args, str(error), messages
            )
            return await self.retry_tool(tool_name, fixed_args)

        # Tier 3: Unrecoverable -> inform the model
        return {
            "role": "tool",
            "content": f"Tool '{tool_name}' failed: {error}. "
                      f"Please try a different approach or inform the user.",
        }

    async def ask_llm_to_fix(
        self,
        tool_name: str,
        args: dict,
        error: str,
        messages: list,
    ) -> dict:
        """Ask LLM to correct the tool arguments."""
        fix_prompt = f"""The tool call failed. Fix the arguments.

Tool: {tool_name}
Arguments: {json.dumps(args)}
Error: {error}

Return corrected arguments as JSON."""

        # llm: any chat model client with an async `ainvoke` (assumed injected)
        response = await llm.ainvoke(fix_prompt)
        return json.loads(response.content)
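
The handler above assumes a `classify_error` helper. A minimal sketch that maps exception types onto the three tiers (the type list is an assumption; extend it to match your backends):

```python
def classify_error(error: Exception) -> str:
    """Map an exception onto a recovery tier (sketch; extend per backend)."""
    if isinstance(error, (TimeoutError, ConnectionError)):
        return "transient"    # Tier 1: retry with the same arguments
    if isinstance(error, (ValueError, TypeError, KeyError)):
        return "parameter"    # Tier 2: ask the LLM to fix the arguments
    return "unrecoverable"    # Tier 3: report back to the model/user
```

Classifying by exception type works for local tools; for HTTP-backed tools the status code (429/503 vs 400/422) is usually the better signal.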

Tool-Selection Strategies

Dynamic Tool Sets

# src/tools/tool_selector.py

class DynamicToolSelector:
    """Select relevant tools based on conversation context."""

    def __init__(self, all_tools: list, max_tools: int = 10):
        self.all_tools = all_tools
        self.max_tools = max_tools
        self.tool_embeddings = {}

    async def select_tools(self, query: str, context: str = "") -> list:
        """Select the most relevant tools for the current query."""

        # Always include core tools
        core_tools = [t for t in self.all_tools if t.metadata.get("always_available")]

        # Semantic matching for optional tools
        optional_tools = [t for t in self.all_tools if not t.metadata.get("always_available")]

        if not optional_tools:
            return core_tools

        # embed(): the application's embedding function (assumed to exist)
        query_embedding = await embed(query + " " + context)

        scored = []
        for tool in optional_tools:
            tool_embedding = await self.get_tool_embedding(tool)
            similarity = cosine_similarity(query_embedding, tool_embedding)
            scored.append((similarity, tool))

        scored.sort(key=lambda x: x[0], reverse=True)
        selected = [t for _, t in scored[:self.max_tools - len(core_tools)]]

        return core_tools + selected

    async def get_tool_embedding(self, tool) -> list[float]:
        key = tool.name
        if key not in self.tool_embeddings:
            text = f"{tool.name}: {tool.description}"
            self.tool_embeddings[key] = await embed(text)
        return self.tool_embeddings[key]
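
The selector assumes `embed` and `cosine_similarity` helpers. Cosine similarity itself is a few lines of stdlib math when embeddings are plain float lists:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

In production, precomputing tool embeddings as normalized numpy arrays reduces this to a single matrix-vector product per query.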

Design Checklist

| Check | Requirement | Priority |
|---|---|---|
| Tool naming | verb + noun, clear and unambiguous | Required |
| Parameter descriptions | every parameter has a `description` | Required |
| Type constraints | enums / ranges / default values | Required |
| Error handling | failed tools return meaningful error messages | Required |
| Idempotent design | read operations have no side effects | Recommended |
| Parallel calls | independent tools support parallel execution | Recommended |
| Timeout protection | every tool has an execution timeout | Required |
| Tool count | at most 20 tool definitions per request | Recommended |

Summary

  1. Schema quality determines call accuracy: clear naming, rich descriptions, and precise type constraints are the foundation of reliable tool calls.
  2. Parallel calls raise throughput: independent tool calls should execute concurrently, with a semaphore bounding the concurrency.
  3. Structured output removes parsing risk: forcing the output format with a schema is far more reliable than letting the model improvise and parsing afterwards.
  4. Error recovery should be tiered: retry transient errors automatically, have the LLM correct parameter errors, and report unrecoverable errors gracefully.
  5. Be restrained with tool count: too many tools degrades selection accuracy; use dynamic tool selection to keep it to 10-15.

Maurice | maurice_wen@proton.me