AI Error Handling UX Design

Original · Lingque Teaching and Research Team

S-tier Featured · Advanced | About a 10-minute read | Updated 2026-02-28

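Graceful degradation, retry patterns, and confidence indicators: how to keep user trust when AI gets it wrong

Errors Are the Norm in AI Products

Traditional software can push its error rate down to 0.01%, but the imperfection of an AI product is structural. Models hallucinate, inference times out, and confidence fluctuates. Accept that, then design an experience in which users still trust the product when the AI gets something wrong: that is the core competitive advantage of AI product UX.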

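1. A Complete Taxonomy of AI Errors

1.1 Error Dimension Matrix

Error type	Detectability	Impact	Recovery cost
Hard error (system crash)	High	Blocking	Low (retry)
Timeout (slow response)	High	Blocking	Low (wait/retry)
Hallucination (factual error)	Medium to low	High	Medium (verification cost)
Low quality (poor answer)	Low	Medium	Medium (regenerate)
Bias (biased output)	Very low	High	High (requires human review)
Safety block (content moderation)	High	Low	Low (rephrase the question)
Out of scope (cannot do it)	Medium	Medium	Low (guide to an alternative)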
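
The matrix above can be encoded as data so that UI code looks up a default recovery path for each error type. This is a minimal sketch: the names `ErrorKind`, `ErrorProfile`, `PROFILES`, and `needs_human_review` are hypothetical, and the values only restate the table.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorKind(Enum):
    HARD_ERROR = "hard_error"
    TIMEOUT = "timeout"
    HALLUCINATION = "hallucination"
    LOW_QUALITY = "low_quality"
    BIAS = "bias"
    SAFETY_BLOCK = "safety_block"
    OUT_OF_SCOPE = "out_of_scope"


@dataclass(frozen=True)
class ErrorProfile:
    detectability: str  # how easily the system can notice the error
    impact: str         # effect on the user
    recovery_cost: str  # cost of getting back to a good state
    recovery: str       # default recovery path from the table


# One entry per row of the matrix (hypothetical encoding).
PROFILES = {
    ErrorKind.HARD_ERROR: ErrorProfile("high", "blocking", "low", "retry"),
    ErrorKind.TIMEOUT: ErrorProfile("high", "blocking", "low", "wait_or_retry"),
    ErrorKind.HALLUCINATION: ErrorProfile("medium_to_low", "high", "medium", "verify"),
    ErrorKind.LOW_QUALITY: ErrorProfile("low", "medium", "medium", "regenerate"),
    ErrorKind.BIAS: ErrorProfile("very_low", "high", "high", "human_review"),
    ErrorKind.SAFETY_BLOCK: ErrorProfile("high", "low", "low", "rephrase"),
    ErrorKind.OUT_OF_SCOPE: ErrorProfile("medium", "medium", "low", "suggest_alternative"),
}


def needs_human_review(kind: ErrorKind) -> bool:
    """True when the table rates the recovery cost as high."""
    return PROFILES[kind].recovery_cost == "high"


print(PROFILES[ErrorKind.TIMEOUT].recovery)  # wait_or_retry
```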

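1.2 The Error Experience from the User's Perspective

User expectation                Actual AI behavior                   Perceived as
──────────────────────────────────────────────────────────────────────────────────
"Give me an answer"         ->  No response from the system     ->  "It's broken"
"Give me an answer"         ->  Answer arrives after 30 seconds ->  "Too slow"
"Give me the right answer"  ->  Gives wrong information         ->  "Unreliable"
"Give me a good answer"     ->  Poor-quality answer             ->  "Not good enough"
"Do this for me"            ->  Says "I can't do that"          ->  "Useless"
"Let me ask this"           ->  Request is blocked              ->  "Too restrictive"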

2. Graceful Degradation UI

2.1 Designing the Degradation Chain

            Primary model (Premium)
                       │
                  ┌────┴────┐
                  │ Failed? │
                  └────┬────┘
                       ▼
           Fallback model (Balanced)
                       │
                  ┌────┴────┐
                  │ Failed? │
                  └────┬────┘
                       ▼
          Cache match (Similar Query)
                       │
                  ┌────┴────┐
                  │ No hit? │
                  └────┬────┘
                       ▼
          Rule engine (Deterministic)
                       │
                  ┌────┴────┐
                  │ No rule?│
                  └────┬────┘
                       ▼
       Human fallback (Human Escalation)
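
The chain above can be sketched as an ordered list of handlers that is walked until one of them produces an answer. This is a minimal sketch under stated assumptions: `run_degradation_chain` and the stub handlers are hypothetical, and a real system would call actual models, a cache, and a rule engine instead.

```python
from typing import Callable, Optional

# Each handler returns an answer string, or None when it cannot answer.
Handler = Callable[[str], Optional[str]]


def run_degradation_chain(
    query: str, chain: list[tuple[str, Handler]]
) -> tuple[str, str]:
    """Try each level in order; return (level, answer) from the first success."""
    for level, handler in chain:
        try:
            answer = handler(query)
        except Exception:
            answer = None  # a crash at one level counts as a miss
        if answer is not None:
            return level, answer
    # Nothing automated could answer: hand the request to a person.
    return "human", "Your request has been forwarded to a human agent."


# Stub handlers standing in for real components (assumptions for the demo).
def failing_model(query: str) -> Optional[str]:
    raise TimeoutError("model timed out")


def cache_lookup(query: str) -> Optional[str]:
    return None  # no similar query cached


def rule_engine(query: str) -> Optional[str]:
    return "Standard VAT rate: 13%" if "VAT" in query else None


chain = [
    ("primary", failing_model),
    ("fallback", failing_model),
    ("cached", cache_lookup),
    ("rule", rule_engine),
]
print(run_degradation_chain("VAT rate?", chain))
# ('rule', 'Standard VAT rate: 13%')
```

Returning the level together with the answer lets the UI decide whether and how to disclose the degradation.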

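2.2 Degradation Transparency

Degradation step	Tell the user?	How
Primary -> fallback model	Optional	Label only when quality differs noticeably
Model -> cache	Should	"Cached answer based on a similar question"
AI -> rule engine	Should	"Answer based on preset rules"
AI -> human	Must	"Transferred to a human agent"

2.3 Degradation UI Component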
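
The disclosure rules in the table can be expressed as one small lookup. This is a sketch only: `disclosure_notice` and the `quality_differs` flag are hypothetical, and the wording of the backup-model notice is an assumption because the table does not specify it.

```python
from typing import Optional

# Notices quoted from the table; keys are hypothetical level names.
NOTICES = {
    "cached": "Cached answer based on a similar question",
    "rule": "Answer based on preset rules",
    "human": "Transferred to a human agent",
}


def disclosure_notice(level: str, quality_differs: bool = False) -> Optional[str]:
    """Return the notice to show for a degradation level, or None for no notice."""
    if level == "primary":
        return None
    if level == "fallback":
        # Disclosure is optional here: label only on a noticeable quality gap.
        # The wording below is an assumption, not taken from the table.
        return "Answered by a backup model" if quality_differs else None
    return NOTICES[level]


print(disclosure_notice("fallback"))  # None
print(disclosure_notice("cached"))    # Cached answer based on a similar question
```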

type DegradationLevel = 'primary' | 'fallback' | 'cached' | 'rule' | 'human';

function DegradedResponse({
  response,
  degradationLevel,
  onRetry,
}: {
  response: string;
  degradationLevel: DegradationLevel;
  onRetry?: () => void;  // Called when the user asks for the full model again
}) {
  // Badge per level; `icon` names are placeholders for whatever icon set is used
  const labels: Record<DegradationLevel, { text: string; color: string; icon: string } | null> = {
    primary: null,  // No label for normal responses
    fallback: { text: 'Fast Mode', color: 'blue', icon: 'zap' },
    cached: { text: 'Cached', color: 'yellow', icon: 'clock' },
    rule: { text: 'Standard', color: 'gray', icon: 'book' },
    human: { text: 'Human', color: 'green', icon: 'user' },
  };

  const label = labels[degradationLevel];

  return (
    <div className="ai-response">
      {label && (
        <span className={`badge badge-${label.color} mb-2`}>
          {label.text}
        </span>
      )}
      <div className="prose">{response}</div>
      {degradationLevel !== 'primary' && (
        <button className="text-sm text-blue-500 mt-2" onClick={onRetry}>
          Retry with full AI
        </button>
      )}
    </div>
  );
}

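3. Retry Patterns

3.1 Retry Strategies Compared

Strategy	Trigger	User perception	Best for
Automatic retry	System, automatically	None, or a slight delay	Transient network errors
One-click retry	User clicks	Simple action	Timeouts / unsatisfactory quality
Edit and retry	User edits the input	Requires thought	Misunderstood request
Switch model and retry	User chooses	Exploratory	Model capability mismatch
Split and retry	System suggests	Feels collaborative	Question too complex

3.2 Automatic Retry on the Backend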

import asyncio
from typing import AsyncGenerator, Sequence

# call_model, is_retryable and classify_error are application-specific helpers
# that this snippet assumes are defined elsewhere.

async def resilient_generate(
    prompt: str,
    models: Sequence[str] = ("gpt-4", "gpt-4o-mini", "claude-3-haiku"),  # tuple: no mutable default
    max_retries: int = 3,
    timeout: float = 30.0,
) -> AsyncGenerator[str, None]:
    """Generate with automatic retry and model fallback.

    Status notices are yielded in-band as bracketed lines. Tokens yielded
    before a failure are not retracted, so the caller should clear partial
    output when it sees a retry notice.
    """

    for model_idx, model in enumerate(models):
        has_next_model = model_idx < len(models) - 1
        for attempt in range(max_retries):
            try:
                # asyncio.timeout requires Python 3.11+
                async with asyncio.timeout(timeout):
                    stream = await call_model(model, prompt, stream=True)
                    async for token in stream:
                        yield token
                    return  # Success, exit entirely

            except asyncio.TimeoutError:
                if attempt < max_retries - 1:
                    yield f"\n[Retrying... attempt {attempt + 2}]\n"
                    await asyncio.sleep(0.5 * 2 ** attempt)  # Exponential backoff
                elif has_next_model:
                    yield "\n[Switching to faster model...]\n"
                    break  # Try next model

            except Exception as e:
                if is_retryable(e) and attempt < max_retries - 1:
                    await asyncio.sleep(0.5 * 2 ** attempt)  # Exponential backoff
                elif has_next_model:
                    break  # Try next model
                else:
                    yield f"\n[Unable to generate response. Error: {classify_error(e)}]\n"
                    return

    # Reached only when the last model timed out on its final attempt
    yield "\n[All models unavailable. Please try again later.]\n"
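
The checklist later in this article asks for exponential backoff between retries. The sketch below shows one way such delays could be computed, with optional random jitter so that many clients do not retry at the same instant. The function name `backoff_delay` and its defaults (`base`, `cap`) are assumptions chosen for illustration.

```python
import random


def backoff_delay(
    attempt: int, base: float = 0.5, cap: float = 8.0, jitter: bool = True
) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling doubles with each attempt and is capped at `cap`. With jitter
    enabled the delay is drawn uniformly from [0, ceiling].
    """
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling) if jitter else ceiling


# Deterministic schedule (jitter off) for the first six attempts.
print([backoff_delay(a, jitter=False) for a in range(6)])
# [0.5, 1.0, 2.0, 4.0, 8.0, 8.0]
```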

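3.3 Retry Interaction on the Frontend

Shown on failure:
┌──────────────────────────────────────────────────┐
│                                                  │
│  (!) Failed to generate a response               │
│                                                  │
│  Possible cause: service temporarily busy        │
│                                                  │
│  [Retry]  [Rephrase]  [Use fast mode]            │
│                                                  │
│  Or try simplifying your question:               │
│  Original: "Analyze the compliance of all VAT    │
│             invoices from the past three years   │
│             and generate a comparison report"    │
│                                                  │
│  Suggested split:                                │
│  1. [Analyze the 2024 invoices first]            │
│  2. [Then analyze the 2023 invoices]             │
│  3. [Finally, generate the comparison report]    │
│                                                  │
└──────────────────────────────────────────────────┘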

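4. Confidence Indicators

4.1 Ways to Visualize Confidence

Approach	Visual form	Pros	Cons
Numeric percentage	"92%"	Precise	Users do not understand what it means
Color coding	Green/yellow/red background	Intuitive	Oversimplified
Icon	Check mark / question mark / exclamation mark	Clear	Takes up space
Text label	"High confidence"	Easy to understand	Not precise enough
Progress bar	Filled bar	Easy to read at a glance	Can mislead
Hybrid	Color + text + icon	Most complete	Visual noise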

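4.2 Recommended Approach: Graded Labels

High confidence (> 90%):
  ┌────────────────────────────────────────────────────┐
  │ [Verified] This invoice is taxed at 13%            │
  │ Source: Provisional Regulations on VAT, Article 2  │
  └────────────────────────────────────────────────────┘

Medium confidence (70-90%):
  ┌────────────────────────────────────────────────────┐
  │ [Review] Transaction may qualify for the 9% rate   │
  │ Suggestion: verify the specific product category   │
  │ [View detailed analysis]                           │
  └────────────────────────────────────────────────────┘

Low confidence (< 70%):
  ┌────────────────────────────────────────────────────┐
  │ [Uncertain] AI cannot determine the right rate     │
  │ Reason: product description is not specific enough │
  │ [Add details] [Send to human review]               │
  └────────────────────────────────────────────────────┘

4.3 Computing Confidence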

def compute_display_confidence(
    model_confidence: float,
    source_quality: float,
    query_complexity: float,
    historical_accuracy: float
) -> dict:
    """
    Compute user-facing confidence from multiple signals.

    Returns:
      {"level": "high|medium|low", "score": 0.0-1.0, "label": str, "action": str}
    """
    # Weighted composite score
    score = (
        model_confidence * 0.3 +
        source_quality * 0.25 +
        (1 - query_complexity) * 0.2 +
        historical_accuracy * 0.25
    )

    if score >= 0.90:
        return {
            "level": "high",
            "score": score,
            "label": "Verified",
            "action": "direct_display"
        }
    elif score >= 0.70:
        return {
            "level": "medium",
            "score": score,
            "label": "Review Suggested",
            "action": "show_with_caveat"
        }
    else:
        return {
            "level": "low",
            "score": score,
            "label": "Uncertain",
            "action": "require_confirmation"
        }
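
A worked example makes the weighting concrete. The sketch below repeats the same weights and thresholds in two compact helpers (`composite_score` and `level_for` are hypothetical names) and evaluates one made-up set of inputs. It illustrates that strong model confidence alone does not reach the "high" band once query complexity is taken into account.

```python
def composite_score(
    model_confidence: float,
    source_quality: float,
    query_complexity: float,
    historical_accuracy: float,
) -> float:
    """Weighted composite using the weights 0.30 / 0.25 / 0.20 / 0.25."""
    return (
        model_confidence * 0.30
        + source_quality * 0.25
        + (1 - query_complexity) * 0.20  # simpler queries score higher
        + historical_accuracy * 0.25
    )


def level_for(score: float) -> str:
    """Map a score to the high / medium / low bands."""
    if score >= 0.90:
        return "high"
    if score >= 0.70:
        return "medium"
    return "low"


# Illustrative inputs (made up): 0.285 + 0.225 + 0.160 + 0.225 = 0.895
score = composite_score(0.95, 0.90, 0.20, 0.90)
print(round(score, 3), level_for(score))
# 0.895 medium
```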

5. User Feedback Collection

5.1 Feedback Levels

Level 1: Binary feedback (lowest friction)
  [Thumbs Up] [Thumbs Down]

Level 2: Categorized feedback (medium friction)
  [Accurate] [Partially Accurate] [Inaccurate]
  + Optional: Why? [Factual Error] [Incomplete] [Irrelevant] [Other]

Level 3: Detailed feedback (high friction, high value)
  Free text: "What was wrong with this response?"
  + Selection of specific incorrect parts

Level 4: Expert annotation (highest value)
  Provide the correct answer
  + Evidence/source

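5.2 When to Ask for Feedback

Moment	Method	Best for
Answer finished	Inline buttons	All answers
User regenerates	Pop-up panel	Dissatisfaction with quality
User copies	Micro-prompt	Confirming a positive signal
User edits before using	Automatic logging	Partially correct answers
Periodic sampling	Pop-up survey	Systematic evaluation
Session ends	Satisfaction rating	Overall experience

5.3 Implementing the Feedback Component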
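
Several rows in the table are implicit signals (copying, regenerating, editing) rather than explicit ratings. The sketch below shows one way to store both kinds in a single record. The `FeedbackEvent` shape and the sentiment mapping in `IMPLICIT_SIGNALS` are assumptions that follow the "Best for" column, not a prescribed schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Implicit signal -> inferred rating, following the table (assumed mapping).
IMPLICIT_SIGNALS = {
    "copy": "positive",          # user copied the answer
    "regenerate": "negative",    # user asked for a new answer
    "edit_then_use": "partial",  # user fixed the answer before using it
}


@dataclass
class FeedbackEvent:
    message_id: str
    source: str                   # e.g. "inline_button", "copy", "regenerate"
    rating: Optional[str] = None  # "positive", "negative" or "partial"
    reason: Optional[str] = None  # category chosen by the user, if any
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def from_implicit_signal(message_id: str, signal: str) -> FeedbackEvent:
    """Build a feedback record from a behavioral signal."""
    return FeedbackEvent(
        message_id=message_id, source=signal, rating=IMPLICIT_SIGNALS[signal]
    )


event = from_implicit_signal("msg-1", "edit_then_use")
print(event.source, event.rating)  # edit_then_use partial
```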

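import { useState } from 'react';

// Payload passed to onFeedback; `reason` is only set for negative ratings
type FeedbackData = {
  messageId: string;
  rating: 'positive' | 'negative';
  reason?: string;
};

function FeedbackWidget({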
  messageId,
  onFeedback,
}: {
  messageId: string;
  onFeedback: (data: FeedbackData) => void;
}) {
  const [step, setStep] = useState<'initial' | 'detail' | 'done'>('initial');
  const [rating, setRating] = useState<'positive' | 'negative' | null>(null);

  if (step === 'done') {
    return (
      <p className="text-sm text-gray-400">
        Thank you for your feedback
      </p>
    );
  }

  return (
    <div className="flex flex-col gap-2">
      {step === 'initial' && (
        <div className="flex gap-2 items-center">
          <span className="text-sm text-gray-500">Was this helpful?</span>
          <button
            onClick={() => { setRating('positive'); setStep('done'); onFeedback({ messageId, rating: 'positive' }); }}
            className="p-1 hover:bg-green-50 rounded"
          >
            [+]
          </button>
          <button
            onClick={() => { setRating('negative'); setStep('detail'); }}
            className="p-1 hover:bg-red-50 rounded"
          >
            [-]
          </button>
        </div>
      )}

      {step === 'detail' && (
        <div className="bg-gray-50 p-3 rounded-lg">
          <p className="text-sm font-medium mb-2">What went wrong?</p>
          {['Factual error', 'Incomplete', 'Not relevant', 'Too slow', 'Other'].map(reason => (
            <button
              key={reason}
              onClick={() => {
                onFeedback({ messageId, rating: 'negative', reason });
                setStep('done');
              }}
              className="block text-sm text-left px-3 py-1.5 hover:bg-gray-100 rounded w-full"
            >
              {reason}
            </button>
          ))}
        </div>
      )}
    </div>
  );
}

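6. Error Recovery Flows

6.1 Error Recovery Decision Tree

Abnormal AI output
    │
    ├── User did not notice?
    │     └── Correct automatically (e.g. spelling/formatting issues)
    │
    ├── User noticed but can continue?
    │     └── Inline hint + suggested fix
    │           "The AI is not fully sure about this answer; parts that need confirmation are marked"
    │
    ├── User cannot continue?
    │     ├── Retryable?
    │     │     └── Retry button + alternatives
    │     │
    │     └── Not retryable?
    │           └── Escalate to a human + process offline
    │
    └── Affects other features?
          └── Isolate the fault + global notice
               "Some features are temporarily unavailable; everything else works normally"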

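The decision tree can be sketched as a single function. The tree lists its four branches as siblings, so the order in which they are checked below (cross-feature impact first) is an assumption, as are the names `ErrorSituation` and `choose_recovery`.

```python
from dataclasses import dataclass


@dataclass
class ErrorSituation:
    affects_other_features: bool
    user_noticed: bool
    user_can_continue: bool
    retryable: bool


def choose_recovery(s: ErrorSituation) -> str:
    """Pick a recovery action by walking the decision tree."""
    if s.affects_other_features:
        return "isolate_fault_and_show_global_notice"
    if not s.user_noticed:
        return "auto_correct"
    if s.user_can_continue:
        return "inline_hint_with_suggested_fix"
    if s.retryable:
        return "retry_button_with_alternatives"
    return "escalate_to_human_and_process_offline"


blocked = ErrorSituation(
    affects_other_features=False,
    user_noticed=True,
    user_can_continue=False,
    retryable=False,
)
print(choose_recovery(blocked))  # escalate_to_human_and_process_offline
```
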
6.2 Hallucination Detection and Handling

def detect_and_handle_hallucination(response: str, context: dict) -> dict:
    """Post-processing layer for hallucination detection."""

    checks = {
        "factual_consistency": check_against_knowledge_base(response, context),
        "self_consistency": check_self_contradiction(response),
        "source_grounding": check_citation_validity(response, context.get("sources", [])),
        "numeric_sanity": check_numeric_claims(response),
    }

    risk_score = sum(c["risk"] for c in checks.values()) / len(checks)

    if risk_score > 0.7:
        return {
            "action": "block",
            "message": "AI response contained potential inaccuracies. Regenerating...",
            "auto_regenerate": True
        }
    elif risk_score > 0.4:
        return {
            "action": "warn",
            "message": "Some parts of this response may need verification",
            "highlights": [c["span"] for c in checks.values() if c["risk"] > 0.5]
        }
    else:
        return {
            "action": "pass",
            "confidence": 1 - risk_score
        }

6.3 Error Recovery UI State Machine

Normal -> Error -> Recovery -> Normal
  │                  │
  │                  ├── Retry Success -> Normal
  │                  ├── Retry Fail -> Degraded Mode
  │                  ├── User Edit -> Retry
  │                  └── Escalate -> Human Queue
  │
  └── Degraded -> Partial Recovery -> Normal
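
One reading of the diagram is a transition table keyed by (state, event). This is a sketch only: the event names are hypothetical, and the interpretation that "User Edit" loops back into recovery and that degraded mode returns to normal through a partial-recovery state is an assumption about the diagram.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("normal", "error_detected"): "error",
    ("error", "begin_recovery"): "recovery",
    ("recovery", "retry_success"): "normal",
    ("recovery", "retry_fail"): "degraded",
    ("recovery", "user_edit"): "recovery",  # edited input triggers another retry
    ("recovery", "escalate"): "human_queue",
    ("degraded", "partial_recover"): "partial_recovery",
    ("partial_recovery", "full_recover"): "normal",
}


def next_state(state: str, event: str) -> str:
    """Return the next UI state, rejecting transitions the diagram lacks."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None


def run(events: list[str], state: str = "normal") -> str:
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state


print(run(["error_detected", "begin_recovery", "retry_fail"]))  # degraded
```

Rejecting unknown transitions keeps the UI from drifting into a state the design never defined.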

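7. Global Error Monitoring

7.1 Error Metrics Dashboard

Metric	Definition	Threshold	Alert level
Error rate	Failed requests / total requests	< 1%	P1 (> 5%), P2 (> 1%)
Hallucination rate	Detected hallucinations / total answers	< 3%	P2 (> 5%)
Retry rate	User-clicked retries / total answers	< 10%	P3 (> 15%)
Negative feedback rate	Negative ratings / answers with feedback	< 15%	P3 (> 25%)
Degradation rate	Degraded answers / total answers	< 5%	P2 (> 10%)
MTTR	Mean time from error to recovery	< 2 min	P2 (> 5 min)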
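
The alert column can be evaluated with a small rule table. In this sketch the metric keys and the function `alert_level` are hypothetical, rates are expressed as fractions, and MTTR is in minutes. The thresholds themselves are the ones listed above.

```python
from typing import Optional

# metric -> [(threshold, level), ...], ordered from most to least severe
ALERT_RULES = {
    "error_rate": [(0.05, "P1"), (0.01, "P2")],
    "hallucination_rate": [(0.05, "P2")],
    "retry_rate": [(0.15, "P3")],
    "negative_feedback_rate": [(0.25, "P3")],
    "degradation_rate": [(0.10, "P2")],
    "mttr_minutes": [(5.0, "P2")],
}


def alert_level(metric: str, value: float) -> Optional[str]:
    """Return the most severe alert whose threshold is exceeded, else None."""
    for threshold, level in ALERT_RULES[metric]:
        if value > threshold:
            return level
    return None


print(alert_level("error_rate", 0.06))   # P1
print(alert_level("error_rate", 0.02))   # P2
print(alert_level("error_rate", 0.005))  # None
```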

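Design Checklist

Error Handling UX Checklist:

Graceful Degradation:
  [ ] Degradation chain configured (at least 3 levels)
  [ ] Users are informed when degradation happens
  [ ] Degraded answers offer a "retry with full version" entry point

Retry:
  [ ] Transient errors are retried automatically (invisible to the user)
  [ ] Timeouts show a clear message plus a manual retry button
  [ ] Complex questions get a "split it up" suggestion
  [ ] Exponential backoff between retries

Confidence:
  [ ] Low-confidence answers are clearly labeled
  [ ] High and low confidence look clearly different
  [ ] Low-confidence answers offer "add details" or "send to a human"

Feedback:
  [ ] Every answer has a feedback entry point
  [ ] Negative feedback offers categorized options
  [ ] Feedback data feeds the model improvement pipeline

Recovery:
  [ ] Hallucination detection is deployed
  [ ] Error messages explain "why" and "what to do"
  [ ] Human escalation channel is available

Monitoring:
  [ ] Error rate / hallucination rate / retry rate monitored in real time
  [ ] Alert thresholds configured
  [ ] Weekly error analysis report automated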

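Summary

The core philosophy of AI error handling UX:

The goal is not an AI that never errs (that is impossible),
but a product users still find trustworthy when the AI does err.

Trust = transparency x recovery speed x continuous improvement

  Transparency:           admit errors openly, without hiding or sugarcoating them
  Recovery speed:         one-click retry, automatic degradation, fast recovery
  Continuous improvement: close the feedback loop so every error makes the product better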

Maurice | maurice_wen@proton.me