Designing and Deploying a Customer Service Agent System
灵阙教研团队
Updated 2026-02-28
Intent recognition, knowledge-base RAG, human-agent handoff, multi-turn dialog management, and the satisfaction feedback loop
Introduction
Customer service was one of the first scenarios where AI agents reached production at scale. The reason is straightforward: customer service conversations have clear intent boundaries and quantifiable outcome metrics (resolution rate / satisfaction / labor cost), and the margin for error is relatively controllable: a wrong answer can be escalated to a human, rather than causing irreversible losses the way a bad financial trade would.

But there is a wide gulf between "can answer questions" and "can replace an agent". A production-grade customer service agent has to solve: intent recognition accuracy, knowledge base freshness, multi-turn context maintenance, human handoff for sensitive scenarios, and a data loop for continuous improvement.

This article walks through the design and deployment of a customer service agent, from system architecture down to engineering implementation.
System Architecture
Overall Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│                        User Access Layer                        │
│   Web Chat | App SDK | WeChat Official Account | IVR | Email    │
└──────────────────────────┬──────────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────────┐
│                       Message Gateway                           │
│  Protocol adapters | Rate limiting | Session routing | Queues   │
└──────────────────────────┬──────────────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────────────┐
│                        Dialog Engine                            │
│                                                                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
│  │  Intent  │  │   Slot   │  │  Dialog  │  │  Reply   │         │
│  │  (NLU)   │  │ Filling  │  │  State   │  │  (NLG)   │         │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘         │
│       │             │             │             │               │
│  ┌────▼─────────────▼─────────────▼─────────────▼────┐          │
│  │               Agent Orchestrator                  │          │
│  │  Routing | Tool calls | Human handoff | Fallback  │          │
│  └──────────────────────┬────────────────────────────┘          │
└─────────────────────────┬───────────────────────────────────────┘
                          │
┌─────────────────────────▼───────────────────────────────────────┐
│                   Knowledge & Service Layer                     │
│                                                                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
│  │Knowledge │  │  Orders  │  │ Tickets  │  │  User    │         │
│  │  (RAG)   │  │ (Order)  │  │ (Ticket) │  │(Profile) │         │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘         │
└─────────────────────────────────────────────────────────────────┘
```
Core Module Responsibilities

| Module | Responsibility | Key metrics |
|---|---|---|
| Message gateway | Multi-channel protocol adaptation, rate limiting, session routing | Throughput, P99 latency |
| Intent recognition | Classify user intent (inquiry / complaint / refund / ...) | Accuracy > 95% |
| Slot filling | Extract key entities (order ID / product name / date) | Recall > 90% |
| Dialog state | Maintain multi-turn context and dialog phase | Context consistency |
| Knowledge RAG | Retrieve enterprise knowledge and generate replies | Answer accuracy, freshness |
| Human handoff | Detect the right moment to transfer and switch smoothly | Transfer success rate, wait time |
| Satisfaction loop | Collect feedback, label data, improve continuously | CSAT, NPS |
Intent Recognition and Slot Filling
Hybrid Intent Recognition
In production, neither pure rules nor a pure model is enough. The proven pattern is a hybrid: cheap deterministic rules run first, and the model handles everything they miss.
```python
# src/nlu/intent_classifier.py
import json
import re
from dataclasses import dataclass


@dataclass
class IntentResult:
    intent: str
    confidence: float
    slots: dict[str, str]
    source: str  # "rule" or "model"


class HybridIntentClassifier:
    """Rule-first, model-fallback intent classifier."""

    def __init__(self, llm_client, intent_config: dict):
        self.llm = llm_client
        self.config = intent_config
        self.rule_patterns = self._compile_rules(intent_config)

    def _compile_rules(self, config: dict) -> list[tuple[re.Pattern, str]]:
        """Compile regex patterns for rule-based matching."""
        patterns = []
        for intent, spec in config.items():
            for pattern in spec.get("patterns", []):
                patterns.append((re.compile(pattern, re.IGNORECASE), intent))
        return patterns

    async def classify(self, message: str, context: list[dict]) -> IntentResult:
        """Classify intent with rule-first, model-fallback strategy."""
        # Phase 1: Rule-based matching (fast, deterministic)
        for pattern, intent in self.rule_patterns:
            match = pattern.search(message)
            if match:
                slots = {k: v for k, v in match.groupdict().items() if v}
                return IntentResult(
                    intent=intent,
                    confidence=1.0,
                    slots=slots,
                    source="rule",
                )
        # Phase 2: LLM-based classification (flexible, handles ambiguity)
        return await self._llm_classify(message, context)

    async def _llm_classify(
        self, message: str, context: list[dict]
    ) -> IntentResult:
        """Use LLM for intent classification with structured output."""
        intent_list = "\n".join(
            f"- {name}: {spec['description']}"
            for name, spec in self.config.items()
        )
        prompt = f"""Classify the user's intent from the following message.

Available intents:
{intent_list}

Recent conversation:
{self._format_context(context[-3:])}

Current message: {message}

Respond in JSON:
{{"intent": "<intent_name>", "confidence": <0.0-1.0>, "slots": {{"key": "value"}}}}
"""
        response = await self.llm.chat(
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"},
            temperature=0,
        )
        result = json.loads(response.content)
        return IntentResult(
            intent=result["intent"],
            confidence=result["confidence"],
            slots=result.get("slots", {}),
            source="model",
        )

    def _format_context(self, messages: list[dict]) -> str:
        return "\n".join(
            f"{m['role']}: {m['content']}" for m in messages
        )
```
Intent Configuration
```python
# configs/intents.py
INTENT_CONFIG = {
    "query_order": {
        "description": "Query order status, logistics, delivery time",
        "patterns": [
            r"(?:我的)?订单.*(?:到哪|状态|物流|快递|发货)",
            r"(?:查|看|问).*订单(?P<order_id>\w{10,20})?",
        ],
        "required_slots": ["order_id"],
        "handler": "order_query_handler",
    },
    "request_refund": {
        "description": "Request refund or return for a product",
        "patterns": [
            r"(?:退款|退货|退钱|换货)",
            r"不想要了.*(?:怎么退|退掉)",
        ],
        "required_slots": ["order_id", "reason"],
        "handler": "refund_handler",
        "requires_human": True,  # high-risk, needs human approval
    },
    "product_consult": {
        "description": "Ask about product features, specs, availability",
        "patterns": [
            r"(?:这个|那个)?(?:商品|产品).*(?:怎么样|好不好|有没有)",
        ],
        "required_slots": [],
        "handler": "product_rag_handler",
    },
    "complaint": {
        "description": "User complaint about service or product quality",
        "patterns": [
            r"(?:投诉|差评|太差|垃圾|骗子|举报)",
        ],
        "required_slots": ["complaint_content"],
        "handler": "complaint_handler",
        "requires_human": True,
        "priority": "high",
    },
    "greeting": {
        "description": "General greeting or chitchat",
        "patterns": [r"^(?:你好|hi|hello|在吗|嗨)$"],
        "required_slots": [],
        "handler": "greeting_handler",
    },
}
Knowledge Base RAG Engine
Enterprise Knowledge Retrieval
Customer-service RAG differs from generic RAG in a few key ways: knowledge changes frequently (promotions, policies, and inventory update in real time), answers must be precise (no "roughly right"), and replies need source citations (so a human can verify them).
```python
# src/knowledge/rag_engine.py
import json
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class KnowledgeChunk:
    id: str  # stable chunk id, used for deduplication in hybrid retrieval
    content: str
    source: str
    category: str
    score: float
    updated_at: datetime
    metadata: dict


class CustomerServiceRAG:
    """RAG engine optimized for customer service scenarios."""

    def __init__(self, vector_store, llm_client, config: dict):
        self.vector_store = vector_store
        self.llm = llm_client
        self.config = config

    async def answer(
        self,
        query: str,
        intent: str,
        user_context: dict,
    ) -> dict:
        """Generate answer with knowledge retrieval and citation."""
        # Step 1: Query rewriting (handle colloquial expressions)
        rewritten = await self._rewrite_query(query, intent)

        # Step 2: Hybrid retrieval (semantic + keyword)
        chunks = await self._retrieve(rewritten, intent)

        # Step 3: Relevance filtering (remove stale/irrelevant results)
        filtered = self._filter_chunks(chunks)
        if not filtered:
            return {
                "answer": None,
                "confidence": 0,
                "fallback": "no_knowledge",
            }

        # Step 4: Answer generation with citations
        answer = await self._generate_answer(query, filtered, user_context)
        return answer

    async def _rewrite_query(self, query: str, intent: str) -> str:
        """Rewrite colloquial queries into formal search queries."""
        prompt = f"""Rewrite the customer query for knowledge base search.
Convert colloquial language to formal terms.
Add relevant keywords based on intent: {intent}

Original: {query}
Rewritten (Chinese, concise):"""
        response = await self.llm.chat(
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
            max_tokens=100,
        )
        return response.content.strip()

    async def _retrieve(
        self, query: str, intent: str, top_k: int = 5
    ) -> list[KnowledgeChunk]:
        """Hybrid retrieval: vector similarity + keyword boost."""
        # Vector search
        vector_results = await self.vector_store.search(
            query=query,
            top_k=top_k * 2,
            filter={"category": {"$in": self._intent_to_categories(intent)}},
        )
        # Keyword search (for exact matches like policy numbers, SKUs)
        keyword_results = await self.vector_store.keyword_search(
            query=query,
            top_k=top_k,
        )
        # Merge and deduplicate
        seen_ids = set()
        merged = []
        for chunk in vector_results + keyword_results:
            if chunk.id not in seen_ids:
                seen_ids.add(chunk.id)
                merged.append(chunk)
        # Sort by combined score
        merged.sort(key=lambda c: c.score, reverse=True)
        return merged[:top_k]

    def _filter_chunks(self, chunks: list[KnowledgeChunk]) -> list[KnowledgeChunk]:
        """Filter out stale or low-relevance chunks."""
        now = datetime.utcnow()
        filtered = []
        for chunk in chunks:
            # Skip low-confidence results
            if chunk.score < self.config.get("min_score", 0.6):
                continue
            # Warn about stale content (but still include)
            days_old = (now - chunk.updated_at).days
            if days_old > self.config.get("stale_days", 30):
                chunk.metadata["stale_warning"] = True
            filtered.append(chunk)
        return filtered

    async def _generate_answer(
        self,
        query: str,
        chunks: list[KnowledgeChunk],
        user_context: dict,
    ) -> dict:
        """Generate answer with explicit citations."""
        context_text = "\n\n".join(
            f"[Source {i+1}] ({c.source}, updated: {c.updated_at.strftime('%Y-%m-%d')})\n{c.content}"
            for i, c in enumerate(chunks)
        )
        prompt = f"""You are a customer service agent. Answer the customer's question based ONLY on the provided knowledge base.

Rules:
1. Answer in Chinese, professional but friendly tone
2. Cite sources using [Source N] format
3. If the knowledge base doesn't contain the answer, say so explicitly
4. Never fabricate information not in the sources
5. If any source has a stale warning, mention the information may need verification

Customer info: {json.dumps(user_context, ensure_ascii=False)}

Knowledge base:
{context_text}

Customer question: {query}

Answer:"""
        response = await self.llm.chat(
            messages=[{"role": "user", "content": prompt}],
            temperature=0.3,
            max_tokens=500,
        )
        # Extract cited source indices
        cited_sources = re.findall(r'\[Source (\d+)\]', response.content)
        citations = [
            {"source": chunks[int(i)-1].source, "content": chunks[int(i)-1].content[:100]}
            for i in cited_sources
            if int(i) <= len(chunks)
        ]
        return {
            "answer": response.content,
            "confidence": min(c.score for c in chunks),
            "citations": citations,
            "has_stale_content": any(c.metadata.get("stale_warning") for c in chunks),
        }

    def _intent_to_categories(self, intent: str) -> list[str]:
        """Map intent to knowledge base categories."""
        mapping = {
            "query_order": ["shipping", "logistics", "order_policy"],
            "request_refund": ["refund_policy", "return_policy"],
            "product_consult": ["product_info", "specs", "faq"],
            "complaint": ["complaint_policy", "escalation"],
        }
        return mapping.get(intent, ["general"])
```
Multi-Turn Dialog Management
Dialog State Machine
```python
# src/dialog/state_machine.py
from dataclasses import dataclass, field
from enum import Enum

from src.nlu.intent_classifier import IntentResult


class DialogPhase(Enum):
    GREETING = "greeting"
    INTENT_CONFIRM = "intent_confirm"
    SLOT_FILLING = "slot_filling"
    EXECUTING = "executing"
    CLARIFYING = "clarifying"
    HUMAN_HANDOFF = "human_handoff"
    RESOLVED = "resolved"
    CLOSED = "closed"


@dataclass
class DialogState:
    session_id: str
    phase: DialogPhase = DialogPhase.GREETING
    intent: str = ""
    slots: dict = field(default_factory=dict)
    missing_slots: list = field(default_factory=list)
    messages: list = field(default_factory=list)
    turn_count: int = 0
    created_at: float = 0
    metadata: dict = field(default_factory=dict)


class DialogManager:
    """Manage multi-turn dialog state transitions."""

    def __init__(self, intent_config: dict, max_turns: int = 20):
        self.intent_config = intent_config
        self.max_turns = max_turns

    async def process_turn(
        self,
        state: DialogState,
        user_message: str,
        intent_result: IntentResult,
    ) -> tuple[DialogState, str]:
        """Process one turn of conversation, return updated state and action."""
        state.turn_count += 1
        state.messages.append({"role": "user", "content": user_message})

        # Safety: prevent infinite loops
        if state.turn_count > self.max_turns:
            return state, "force_human_handoff"

        # State machine transitions (_handle_intent_confirm and
        # _handle_clarifying are omitted here for brevity)
        match state.phase:
            case DialogPhase.GREETING:
                return self._handle_greeting(state, intent_result)
            case DialogPhase.INTENT_CONFIRM:
                return self._handle_intent_confirm(state, user_message)
            case DialogPhase.SLOT_FILLING:
                return self._handle_slot_filling(state, intent_result)
            case DialogPhase.EXECUTING:
                return self._handle_executing(state)
            case DialogPhase.CLARIFYING:
                return self._handle_clarifying(state, user_message)
            case DialogPhase.RESOLVED:
                return self._handle_resolved(state, user_message)
            case _:
                return state, "error_unknown_phase"

    def _handle_greeting(
        self, state: DialogState, intent_result: IntentResult
    ) -> tuple[DialogState, str]:
        """From greeting, determine intent and check required slots."""
        if intent_result.intent == "greeting":
            return state, "respond_greeting"

        state.intent = intent_result.intent
        state.slots.update(intent_result.slots)

        config = self.intent_config.get(state.intent, {})
        required = set(config.get("required_slots", []))
        filled = set(state.slots.keys())
        state.missing_slots = list(required - filled)

        if intent_result.confidence < 0.7:
            state.phase = DialogPhase.INTENT_CONFIRM
            return state, "confirm_intent"
        if state.missing_slots:
            state.phase = DialogPhase.SLOT_FILLING
            return state, "ask_slot"
        state.phase = DialogPhase.EXECUTING
        return state, "execute_handler"

    def _handle_slot_filling(
        self, state: DialogState, intent_result: IntentResult
    ) -> tuple[DialogState, str]:
        """Collect missing slots from user responses."""
        # Update slots from new message
        state.slots.update(intent_result.slots)

        config = self.intent_config.get(state.intent, {})
        required = set(config.get("required_slots", []))
        filled = set(state.slots.keys())
        state.missing_slots = list(required - filled)

        if state.missing_slots:
            return state, "ask_slot"
        state.phase = DialogPhase.EXECUTING
        return state, "execute_handler"

    def _handle_executing(self, state: DialogState) -> tuple[DialogState, str]:
        """Execute the handler and transition to resolved or clarifying."""
        config = self.intent_config.get(state.intent, {})
        if config.get("requires_human"):
            state.phase = DialogPhase.HUMAN_HANDOFF
            return state, "transfer_to_human"
        state.phase = DialogPhase.RESOLVED
        return state, "execute_handler"

    def _handle_resolved(
        self, state: DialogState, user_message: str
    ) -> tuple[DialogState, str]:
        """After resolution, check if user has follow-up questions."""
        # Detect if user has new intent or is confirming resolution
        if self._is_satisfaction_signal(user_message):
            state.phase = DialogPhase.CLOSED
            return state, "close_session"
        # New question: reset to greeting phase
        state.phase = DialogPhase.GREETING
        return state, "reprocess"

    def _is_satisfaction_signal(self, message: str) -> bool:
        """Detect if user is satisfied and wants to end conversation."""
        signals = ["谢谢", "好的", "知道了", "没问题", "可以了", "解决了", "明白"]
        return any(s in message for s in signals)
```
Human Handoff
Recognizing When to Transfer
A transfer is not "admitting defeat"; it is a key link in service quality assurance. The system needs to decide accurately when to transfer, to whom, and with what context.
```python
# src/handoff/escalation.py
import json
from dataclasses import dataclass

from src.dialog.state_machine import DialogState
from src.nlu.intent_classifier import IntentResult


@dataclass
class HandoffDecision:
    should_handoff: bool
    reason: str
    urgency: str  # "low" | "medium" | "high" | "critical"
    target_group: str  # "general" | "refund" | "complaint" | "technical"
    context_summary: str


class EscalationEngine:
    """Decide when and how to transfer to human agents."""

    # Hard rules: always transfer
    HARD_TRANSFER_INTENTS = {"complaint", "request_refund"}
    HARD_TRANSFER_KEYWORDS = ["投诉", "315", "工商", "律师", "法院", "起诉"]

    # Soft rules: transfer based on conditions
    MAX_FAILED_ATTEMPTS = 3
    LOW_CONFIDENCE_THRESHOLD = 0.5
    MAX_BOT_TURNS = 10

    async def evaluate(
        self,
        state: DialogState,
        intent_result: IntentResult,
        answer_result: dict | None,
    ) -> HandoffDecision:
        """Evaluate whether to transfer to human agent."""
        # Rule 1: Hard transfer intents (high-risk operations)
        if state.intent in self.HARD_TRANSFER_INTENTS:
            return HandoffDecision(
                should_handoff=True,
                reason=f"High-risk intent: {state.intent}",
                urgency="high",
                target_group=self._intent_to_group(state.intent),
                context_summary=self._build_summary(state),
            )

        # Rule 2: Sensitive keywords detected
        last_message = state.messages[-1]["content"] if state.messages else ""
        if any(kw in last_message for kw in self.HARD_TRANSFER_KEYWORDS):
            return HandoffDecision(
                should_handoff=True,
                reason="Sensitive keywords detected",
                urgency="critical",
                target_group="complaint",
                context_summary=self._build_summary(state),
            )

        # Rule 3: User explicitly requests human
        if self._user_requests_human(last_message):
            return HandoffDecision(
                should_handoff=True,
                reason="User explicitly requested human agent",
                urgency="medium",
                target_group="general",
                context_summary=self._build_summary(state),
            )

        # Rule 4: Bot cannot answer (low confidence / no knowledge)
        if answer_result and answer_result.get("confidence", 1.0) < self.LOW_CONFIDENCE_THRESHOLD:
            state.metadata["failed_attempts"] = state.metadata.get("failed_attempts", 0) + 1
            if state.metadata["failed_attempts"] >= self.MAX_FAILED_ATTEMPTS:
                return HandoffDecision(
                    should_handoff=True,
                    reason=f"Failed {self.MAX_FAILED_ATTEMPTS} consecutive attempts",
                    urgency="medium",
                    target_group=self._intent_to_group(state.intent),
                    context_summary=self._build_summary(state),
                )

        # Rule 5: Too many turns without resolution
        if state.turn_count > self.MAX_BOT_TURNS:
            return HandoffDecision(
                should_handoff=True,
                reason="Exceeded maximum bot turns",
                urgency="low",
                target_group="general",
                context_summary=self._build_summary(state),
            )

        return HandoffDecision(
            should_handoff=False,
            reason="",
            urgency="low",
            target_group="",
            context_summary="",
        )

    def _user_requests_human(self, message: str) -> bool:
        patterns = ["转人工", "人工客服", "真人", "找你们领导", "经理"]
        return any(p in message for p in patterns)

    def _intent_to_group(self, intent: str) -> str:
        mapping = {
            "complaint": "complaint",
            "request_refund": "refund",
            "technical_issue": "technical",
        }
        return mapping.get(intent, "general")

    def _build_summary(self, state: DialogState) -> str:
        """Build concise context summary for human agent."""
        return (
            f"Intent: {state.intent}\n"
            f"Slots: {json.dumps(state.slots, ensure_ascii=False)}\n"
            f"Turns: {state.turn_count}\n"
            f"Key messages:\n"
            + "\n".join(
                f"  {m['role']}: {m['content'][:80]}"
                for m in state.messages[-5:]
            )
        )
```
Satisfaction Feedback Loop
Data Collection and Labeling
```python
# src/feedback/satisfaction.py
from src.dialog.state_machine import DialogPhase, DialogState


class SatisfactionCollector:
    """Collect and process customer satisfaction feedback."""

    def __init__(self, feedback_store):
        self.feedback_store = feedback_store

    async def collect_feedback(
        self,
        session_id: str,
        state: DialogState,
    ) -> dict:
        """Collect feedback after session ends."""
        # Auto-evaluate session quality
        auto_score = self._auto_evaluate(state)

        # Store for human review if low quality
        feedback_record = {
            "session_id": session_id,
            "auto_score": auto_score,
            "intent": state.intent,
            "turn_count": state.turn_count,
            "resolved_by": "bot" if state.phase == DialogPhase.CLOSED else "human",
            "messages": state.messages,
            "slots": state.slots,
            "needs_review": auto_score < 3,
        }
        await self.feedback_store.save(feedback_record)
        return feedback_record

    def _auto_evaluate(self, state: DialogState) -> int:
        """Auto-evaluate session quality (1-5 scale)."""
        score = 5
        # Penalty: too many turns
        if state.turn_count > 8:
            score -= 1
        if state.turn_count > 15:
            score -= 1
        # Penalty: transferred to human
        if state.phase == DialogPhase.HUMAN_HANDOFF:
            score -= 1
        # Penalty: repeated slot-filling attempts
        slot_asks = sum(
            1 for m in state.messages
            if m.get("type") == "ask_slot"
        )
        if slot_asks > 3:
            score -= 1
        # Bonus: quick resolution
        if state.turn_count <= 3 and state.phase == DialogPhase.CLOSED:
            score = min(score + 1, 5)
        return max(score, 1)
```
Continuous Improvement Metrics

| Metric | Formula | Target | Improvement levers |
|---|---|---|---|
| First-contact resolution (FCR) | resolved in first conversation / total conversations | > 70% | KB coverage, intent accuracy |
| Bot resolution rate | resolved by bot alone / total conversations | > 60% | RAG quality, dialog flow design |
| Average turns | total turns / total conversations | < 5 | Slot-filling efficiency, guidance wording |
| Handoff rate | transferred conversations / total conversations | < 30% | Intent coverage, fallback strategy |
| Customer satisfaction (CSAT) | satisfied ratings / total ratings | > 85% | Reply quality, response speed |
| First response time | user message to bot reply | < 2 s | Inference latency optimization |
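These ratios fall directly out of session records like the ones `SatisfactionCollector` saves. A small sketch (the record field names are illustrative, not a fixed schema; `first_contact` is an assumed flag):

```python
def service_metrics(sessions: list[dict]) -> dict:
    """Compute the funnel metrics in the table above from session records."""
    total = len(sessions)
    if total == 0:
        return {}
    return {
        # First-contact resolution: solved within the first conversation.
        "fcr": sum(1 for s in sessions if s.get("first_contact")) / total,
        "bot_resolution_rate": sum(1 for s in sessions if s["resolved_by"] == "bot") / total,
        "handoff_rate": sum(1 for s in sessions if s["resolved_by"] == "human") / total,
        "avg_turns": sum(s["turn_count"] for s in sessions) / total,
    }

records = [
    {"resolved_by": "bot", "turn_count": 3, "first_contact": True},
    {"resolved_by": "bot", "turn_count": 6, "first_contact": False},
    {"resolved_by": "human", "turn_count": 12, "first_contact": False},
    {"resolved_by": "bot", "turn_count": 2, "first_contact": True},
]
print(service_metrics(records))
# {'fcr': 0.5, 'bot_resolution_rate': 0.75, 'handoff_rate': 0.25, 'avg_turns': 5.75}
```

In practice these would be computed over a rolling window (daily / weekly) and broken down per intent, since an aggregate handoff rate hides which intents are underperforming.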
Data Flywheel
```
┌────────────────────────────────────────────────────────────┐
│                     Data Flywheel Loop                     │
│                                                            │
│  User dialogs ──→ Intent / reply quality labels ──→ Training data
│      ▲                          │                          │
│      │                          ▼                          │
│      │              Model fine-tuning / RAG updates        │
│      │                          │                          │
│      │                          ▼                          │
│      └──────── Evaluation ←──── Online A/B testing         │
│                                                            │
│  Cadence: weekly data labeling, monthly model updates      │
│  Automated: low-confidence dialogs auto-enter label queue  │
│  Manual: labeling team review + knowledge base updates     │
└────────────────────────────────────────────────────────────┘
```
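The automated leg of the flywheel, low-confidence dialogs flowing into the labeling queue, can be sketched as a simple filter over finished sessions (the threshold value and field names are assumptions, not from a specific system):

```python
# Sketch of the flywheel's auto-labeling step: finished sessions that
# look low quality are queued for human labeling.
from collections import deque

# Assumed threshold below which an answer is considered unreliable.
LABEL_CONFIDENCE_THRESHOLD = 0.6

labeling_queue: deque[dict] = deque()

def enqueue_for_labeling(session: dict) -> bool:
    """Queue a finished session for labeling if it looks low quality."""
    needs_label = (
        session.get("min_answer_confidence", 1.0) < LABEL_CONFIDENCE_THRESHOLD
        or session.get("resolved_by") == "human"   # handoffs are label-worthy
        or session.get("auto_score", 5) < 3        # flagged by auto-evaluation
    )
    if needs_label:
        labeling_queue.append({
            "session_id": session["session_id"],
            "messages": session.get("messages", []),
        })
    return needs_label

enqueue_for_labeling({"session_id": "s1", "min_answer_confidence": 0.4})
enqueue_for_labeling({"session_id": "s2", "min_answer_confidence": 0.9})
print([item["session_id"] for item in labeling_queue])  # ['s1']
```

A production version would publish to a message queue or labeling platform instead of an in-memory deque, but the selection criteria stay the same.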
Complete Dialog Engine
Main Orchestrator
```python
# src/engine/orchestrator.py
import time

from src.dialog.state_machine import DialogManager, DialogPhase, DialogState
from src.feedback.satisfaction import SatisfactionCollector
from src.handoff.escalation import EscalationEngine
from src.knowledge.rag_engine import CustomerServiceRAG
from src.nlu.intent_classifier import HybridIntentClassifier


class CustomerServiceAgent:
    """Main orchestrator for the customer service agent."""

    def __init__(
        self,
        classifier: HybridIntentClassifier,
        rag: CustomerServiceRAG,
        dialog_mgr: DialogManager,
        escalation: EscalationEngine,
        satisfaction: SatisfactionCollector,
    ):
        self.classifier = classifier
        self.rag = rag
        self.dialog_mgr = dialog_mgr
        self.escalation = escalation
        self.satisfaction = satisfaction
        self.sessions: dict[str, DialogState] = {}

    async def handle_message(
        self, session_id: str, message: str, user_context: dict
    ) -> dict:
        """Handle one user message end-to-end."""
        # Get or create session state
        state = self.sessions.get(session_id)
        if not state:
            state = DialogState(session_id=session_id, created_at=time.time())
            self.sessions[session_id] = state

        # Step 1: Intent classification
        intent_result = await self.classifier.classify(
            message, state.messages
        )

        # Step 2: Dialog state management
        state, action = await self.dialog_mgr.process_turn(
            state, message, intent_result
        )

        # Step 3: Check escalation
        answer_result = None
        if action == "execute_handler":
            answer_result = await self.rag.answer(
                query=message,
                intent=state.intent,
                user_context=user_context,
            )
        handoff = await self.escalation.evaluate(state, intent_result, answer_result)
        if handoff.should_handoff:
            state.phase = DialogPhase.HUMAN_HANDOFF
            return {
                "type": "handoff",
                "message": "正在为您转接人工客服,请稍候...",
                "context": handoff.context_summary,
                "target_group": handoff.target_group,
                "urgency": handoff.urgency,
            }

        # Step 4: Generate response based on action
        response = await self._execute_action(
            action, state, answer_result, user_context
        )
        state.messages.append({"role": "assistant", "content": response["message"]})

        # Step 5: Session cleanup
        if state.phase == DialogPhase.CLOSED:
            await self.satisfaction.collect_feedback(session_id, state)
            del self.sessions[session_id]
        return response

    async def _execute_action(
        self, action: str, state: DialogState,
        answer_result: dict | None, user_context: dict
    ) -> dict:
        """Execute the action determined by dialog manager."""
        match action:
            case "respond_greeting":
                name = user_context.get("name", "")
                greeting = f"您好{name}!请问有什么可以帮您?"
                return {"type": "text", "message": greeting}
            case "confirm_intent":
                desc = self.classifier.config[state.intent]["description"]
                return {
                    "type": "text",
                    "message": f"请问您是想{desc}吗?",
                }
            case "ask_slot":
                slot = state.missing_slots[0]
                prompts = {
                    "order_id": "请提供您的订单号(可在订单详情页查看)",
                    "reason": "请描述一下具体原因",
                    "product_name": "请告诉我商品名称",
                }
                return {
                    "type": "text",
                    "message": prompts.get(slot, f"请提供{slot}"),
                }
            case "execute_handler":
                if answer_result and answer_result.get("answer"):
                    return {
                        "type": "text",
                        "message": answer_result["answer"],
                        "citations": answer_result.get("citations", []),
                    }
                return {
                    "type": "text",
                    "message": "抱歉,我暂时无法回答这个问题。需要为您转接人工客服吗?",
                }
            case "close_session":
                return {
                    "type": "text",
                    "message": "感谢您的咨询,祝您生活愉快!如有其他问题随时联系我们。",
                }
            case _:
                return {"type": "text", "message": "请稍等,正在为您处理..."}
```
Design Checklist

| Check | Requirement | Priority |
|---|---|---|
| Intent recognition accuracy | > 95% (rules + model hybrid) | Required |
| Knowledge base freshness | Support real-time updates; mark stale content | Required |
| Human handoff | Mandatory for sensitive scenarios; pass full context | Required |
| Multi-turn state | Dialog state machine with loop protection, max-turn cap | Required |
| Satisfaction collection | Auto scoring + human review + data flywheel | Required |
| First response latency | < 2 s | Recommended |
| Fallback strategy | Offer a clear next step when unanswerable (handoff / leave a message) | Required |
| Multi-channel adaptation | One dialog engine; channel layer does protocol conversion only | Recommended |
Summary
- Hybrid intent recognition is the foundation: rules handle deterministic patterns (keywords / regex), the model handles ambiguous or complex intents; they complement each other rather than compete.
- RAG quality determines answer quality: customer-service RAG needs particular attention to knowledge freshness, query rewriting (colloquial -> standardized), and source citation.
- The dialog state machine keeps conversations under control: explicit state transitions are far more reliable than "letting the LLM improvise", and a max-turn cap is a mandatory safety net.
- Human handoff is a service guarantee: not the AI admitting defeat, but ensuring the user experience does not degrade at the boundary of the AI's capability. Full context must travel with every transfer.
- The data flywheel is the long-term moat: low-confidence dialogs automatically enter the labeling queue, and labeled data feeds back into the model and knowledge base, closing the continuous improvement loop.
Maurice | maurice_wen@proton.me