AI Product Metrics Dashboard Design

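Original content by the 灵阙 (Lingque) teaching and research team

Featured (S tier), advanced level | About a 9-minute read | Updated 2026-02-28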

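From DAU to Cost-per-Query: Building Data Observability for AI Products

Why AI Products Need Their Own Metrics System

The core metrics of a traditional SaaS product are DAU, retention, and conversion. An AI product has to track all of these plus two dimensions of its own: model quality and inference cost. An AI product whose DAU grows 50% while its inference cost grows 200% may be heading toward failure.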
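To make that warning concrete, here is a minimal sketch of the arithmetic, assuming inference cost is tracked as a single daily total. The baseline figures and the function name are illustrative assumptions, not numbers from a real product.

```python
def cost_per_active_user(total_inference_cost: float, dau: int) -> float:
    """Inference cost attributed to each daily active user."""
    if dau <= 0:
        raise ValueError("dau must be positive")
    return total_inference_cost / dau


# Hypothetical baseline: 10,000 DAU and 1,000 yuan of inference cost per day.
baseline = cost_per_active_user(1_000.0, 10_000)

# DAU grows 50% (x1.5) while inference cost grows 200% (x3.0).
after = cost_per_active_user(1_000.0 * 3.0, int(10_000 * 1.5))

growth_factor = after / baseline
print(f"cost per active user: {baseline:.3f} -> {after:.3f} (x{growth_factor:.1f})")
```

Under these assumptions each active user costs twice as much to serve as before, so unless revenue per user also doubles, margins shrink as the product grows.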

This article covers metric design, dashboard layout, alert thresholds, the data pipeline, and hands-on setup with Grafana and Metabase.

1. A Layered Metrics Framework for AI Products

1.1 The Four-Layer Metrics Model

┌────────────────────────────────────────────────────────────────┐
│  Layer 1: Business Metrics                                     │
│  DAU/MAU, Revenue, Conversion, Churn                           │
│  -> Answers: Does the product have business value?             │
├────────────────────────────────────────────────────────────────┤
│  Layer 2: Product Metrics                                      │
│  Session Duration, Feature Usage, Task Success                 │
│  -> Answers: What are users using, and is it working?          │
├────────────────────────────────────────────────────────────────┤
│  Layer 3: AI Quality Metrics                                   │
│  Accuracy, Latency, Hallucination Rate, CSAT                   │
│  -> Answers: Is the AI good enough? Getting better or worse?   │
├────────────────────────────────────────────────────────────────┤
│  Layer 4: Infrastructure Metrics                               │
│  Cost/Query, GPU Util, Error Rate, Throughput                  │
│  -> Answers: Is the system healthy? Is the money well spent?   │
└────────────────────────────────────────────────────────────────┘

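1.2 Core Metrics Matrix

Metric	Layer	Collection method	Refresh	Healthy threshold
DAU/MAU	L1	Event tracking	Real time	DAU/MAU > 25%
Paid conversion rate	L1	Payment events	Daily	> 3%
Monthly churn rate	L1	Subscription status	Monthly	< 5%
Session completion rate	L2	Event tracking	Real time	> 80%
Feature adoption rate	L2	Event tracking	Weekly	Top 3 features > 60%
AI accuracy	L3	Human review + automated evaluation	Daily	> 90%
Latency (P95)	L3	APM	Real time	P95 < 5s
Hallucination rate	L3	Automated detection + human sampling	Daily	< 3%
CSAT	L3	User feedback	Weekly	> 4.0/5.0
Cost/Query	L4	Billing API	Real time	< ¥0.10
GPU utilization	L4	Monitoring agent	Real time	60-85%
Error rate	L4	Log aggregation	Real time	< 0.5%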
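The thresholds in this matrix can be encoded as data and checked against a metrics snapshot. The sketch below is one possible encoding, not a prescribed implementation: the metric keys, the `Threshold` class, and the sample snapshot are all hypothetical names chosen for illustration, and the feature adoption row is left out because it applies to a set of features rather than a single number.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Threshold:
    """Healthy range for one metric; None means that side is unbounded."""
    low: Optional[float] = None
    high: Optional[float] = None
    inclusive: bool = False  # True for ranges such as "60-85%"

    def is_healthy(self, value: float) -> bool:
        if self.inclusive:
            above = self.low is None or value >= self.low
            below = self.high is None or value <= self.high
        else:
            above = self.low is None or value > self.low
            below = self.high is None or value < self.high
        return above and below


# Thresholds copied from the matrix; the key names are assumptions.
THRESHOLDS: Dict[str, Threshold] = {
    "dau_mau_ratio": Threshold(low=0.25),
    "paid_conversion_rate": Threshold(low=0.03),
    "monthly_churn_rate": Threshold(high=0.05),
    "session_completion_rate": Threshold(low=0.80),
    "ai_accuracy": Threshold(low=0.90),
    "latency_p95_s": Threshold(high=5.0),
    "hallucination_rate": Threshold(high=0.03),
    "csat": Threshold(low=4.0),
    "cost_per_query_yuan": Threshold(high=0.10),
    "gpu_utilization": Threshold(low=0.60, high=0.85, inclusive=True),
    "error_rate": Threshold(high=0.005),
}


def unhealthy_metrics(snapshot: Dict[str, float]) -> List[str]:
    """Return the names of known metrics that fall outside their healthy range."""
    return sorted(
        name
        for name, value in snapshot.items()
        if name in THRESHOLDS and not THRESHOLDS[name].is_healthy(value)
    )


# Hypothetical snapshot for illustration only.
snapshot = {
    "dau_mau_ratio": 0.31,
    "ai_accuracy": 0.885,
    "latency_p95_s": 4.2,
    "cost_per_query_yuan": 0.067,
    "gpu_utilization": 0.91,
}
print(unhealthy_metrics(snapshot))
```

Keeping the thresholds in one table-like structure means the dashboard colors and the alert rules can read from the same source instead of drifting apart.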

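2. Dashboard Layout Design

2.1 Executive Dashboard (Executive View)

Show the 6-8 most critical metrics on a single screen so the whole picture can be read in 30 seconds: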

┌─────────────────────────────────────────────────────────┐
│  AI Product Executive Dashboard           2026-02-28    │
├─────────────┬───────────────┬───────────────────────────┤
│  DAU         │  Revenue       │  AI Quality Score        │
│  12,847      │  ¥485,200      │  ████████░░  82/100     │
│  +12% WoW   │  +8% MoM       │  +3 pts MoM             │
├─────────────┼───────────────┼───────────────────────────┤
│  Retention   │  Cost/Query    │  CSAT                    │
│  D7: 45%     │  ¥0.067        │  4.2 / 5.0              │
│  D30: 28%    │  -15% MoM      │  +0.1 MoM               │
├─────────────┴───────────────┴───────────────────────────┤
│  [7-Day Trend: DAU]   ▁▂▃▃▅▆█                          │
│  [7-Day Trend: Rev]   ▃▃▄▅▅▆▇                          │
│  [7-Day Trend: CSAT]  ▅▅▆▆▆▇▇                          │
├─────────────────────────────────────────────────────────┤
│  Active Alerts: 1 WARNING (P95 latency > 4s)           │
└─────────────────────────────────────────────────────────┘

2.2 Operations Dashboard (Operations View)

Focuses on user behavior and product usage:

┌─────────────────────────────────────────────────────────┐
│  Operations Dashboard                                    │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  [User Funnel]                                           │
│  Visit -> Signup -> Activate -> Retain -> Pay            │
│  100%  -> 22%    -> 68%      -> 45%    -> 8%            │
│                                                          │
│  [Feature Usage Heatmap]                                 │
│  Chat:           ████████████████  82%                   │
│  Doc Analysis:   ██████████░░░░░░  55%                   │
│  Report Gen:     ████████░░░░░░░░  42%                   │
│  API Access:     ████░░░░░░░░░░░░  18%                   │
│                                                          │
│  [Session Quality Distribution]                          │
│  Excellent (>0.8):  ████████░░  35%                      │
│  Good (0.5-0.8):    ██████████  45%                      │
│  Poor (<0.5):       ████░░░░░░  20%                      │
│                                                          │
│  [Top User Queries This Week]                            │
│  1. Invoice compliance check (2,847)                     │
│  2. Tax rate calculation (1,923)                         │
│  3. Report generation (1,456)                            │
│                                                          │
└─────────────────────────────────────────────────────────┘
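The funnel percentages in this mock-up read as step-to-step conversion rates (68% after 22% cannot be a share of all visitors). Assuming that reading, the sketch below converts them into cumulative shares of the original visitors; the function name is hypothetical.

```python
from typing import List, Tuple

# Step-to-step conversion rates from the sample funnel above.
# Assumption: each rate is relative to the previous step.
FUNNEL_STEPS: List[Tuple[str, float]] = [
    ("Visit", 1.00),
    ("Signup", 0.22),
    ("Activate", 0.68),
    ("Retain", 0.45),
    ("Pay", 0.08),
]


def cumulative_conversion(steps: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Multiply step rates together to get each step's share of all visitors."""
    result = []
    running = 1.0
    for name, step_rate in steps:
        if not 0.0 <= step_rate <= 1.0:
            raise ValueError(f"rate for {name!r} must be between 0 and 1")
        running *= step_rate
        result.append((name, running))
    return result


for name, share in cumulative_conversion(FUNNEL_STEPS):
    print(f"{name:<10}{share:8.2%}")
```

Read this way, roughly 0.54% of visitors end up paying, which is a more useful headline number for the funnel panel than the 8% final-step rate on its own.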

2.3 AI Quality Dashboard (Model Quality View)

This dashboard is unique to AI products:

┌─────────────────────────────────────────────────────────┐
│  AI Quality Dashboard                                    │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  [Model Performance by Category]                         │
│  Category          Accuracy  Latency  Hallucination     │
│  Tax Classification  94.2%    1.2s     1.8%             │
│  Invoice Parsing     91.7%    2.3s     2.5%             │
│  Compliance Check    88.5%    3.8s     3.2%             │
│  Report Generation   86.3%    5.1s     4.1%             │
│                                                          │
│  [Quality Trend (30 Days)]                               │
│  Accuracy:   ▁▂▂▃▃▃▄▄▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇▇█████           │
│  Latency:    █▇▇▆▆▆▅▅▅▅▄▄▄▄▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▁           │
│                                                          │
│  [User Feedback Distribution]                            │
│  Thumbs Up:    ████████████  72%                         │
│  Thumbs Down:  ████░░░░░░░░  15%                         │
│  Regenerated:  ███░░░░░░░░░  13%                         │
│                                                          │
│  [Hallucination Detection]                               │
│  Auto-detected:   45 / day                               │
│  User-reported:   12 / day                               │
│  False positive:  8%                                     │
│                                                          │
└─────────────────────────────────────────────────────────┘

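3. Alert Threshold Design

3.1 Tiered Alerting Strategy

Level	Condition	Notification channels	Response time
P0 Critical	Service completely unavailable / data breach	Phone call + SMS + DingTalk	5 minutes
P1 High	Accuracy drops sharply by > 10% / error rate > 5%	SMS + DingTalk	15 minutes
P2 Medium	P95 latency > 8s / cost anomaly > 50%	DingTalk + email	1 hour
P3 Low	Minor metric deviation / trend warning	Email + daily report	24 hours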
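The tiers above map naturally onto a small routing table. The following is a minimal sketch of that mapping; the channel identifiers, `AlertPolicy`, and `route_alert` are hypothetical names, and actual delivery (phone, SMS, DingTalk, email) is left out.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class AlertPolicy:
    channels: Tuple[str, ...]
    response_minutes: int


# Channels and response times taken from the tier table; names are assumptions.
ALERT_POLICIES: Dict[str, AlertPolicy] = {
    "P0": AlertPolicy(("phone", "sms", "dingtalk"), 5),
    "P1": AlertPolicy(("sms", "dingtalk"), 15),
    "P2": AlertPolicy(("dingtalk", "email"), 60),
    "P3": AlertPolicy(("email", "daily_report"), 24 * 60),
}


def route_alert(severity: str, message: str) -> Dict[str, object]:
    """Attach notification channels and a response deadline to an alert."""
    policy = ALERT_POLICIES.get(severity)
    if policy is None:
        raise ValueError(f"unknown severity: {severity!r}")
    return {
        "severity": severity,
        "message": message,
        "channels": list(policy.channels),
        "respond_within_minutes": policy.response_minutes,
    }


print(route_alert("P1", "Hallucination rate spiked to 6.2%"))
```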

3.2 AI-Specific Alert Rules

# alerting-rules.yaml
alerts:
  - name: accuracy_drop
    metric: ai.accuracy.rolling_24h
    condition: decrease > 5% compared to 7-day avg
    severity: P1
    message: "AI accuracy dropped {value}% in last 24h"

  - name: hallucination_spike
    metric: ai.hallucination.rate.1h
    condition: value > 5%
    severity: P1
    message: "Hallucination rate spiked to {value}%"

  - name: cost_anomaly
    metric: infra.cost_per_query.1h
    condition: value > 2x of 7-day avg
    severity: P2
    message: "Cost per query anomaly: {value} (avg: {avg})"

  - name: latency_degradation
    metric: ai.latency.p95.5m
    condition: value > 8000  # milliseconds
    severity: P2
    message: "P95 latency: {value}ms"

  - name: feedback_negative_surge
    metric: ai.feedback.negative_rate.1h
    condition: value > 25%
    severity: P2
    message: "Negative feedback rate: {value}%"

  - name: model_drift
    metric: ai.distribution.kl_divergence.daily
    condition: value > 0.15
    severity: P3
    message: "Model input distribution drift detected: KL={value}"
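The `model_drift` rule relies on a daily KL divergence value, but the rules file does not say which distribution is compared. The sketch below assumes it is the mix of query categories (for example the `feature` of each query) today versus a baseline window; the function names and sample data are hypothetical, and the epsilon smoothing is one common way to handle categories missing from either side.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def category_distribution(labels: Iterable[str]) -> Dict[str, float]:
    """Turn a stream of category labels into relative frequencies."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels provided")
    return {label: count / total for label, count in counts.items()}


def kl_divergence(
    current: Dict[str, float],
    baseline: Dict[str, float],
    epsilon: float = 1e-6,
) -> float:
    """KL(current || baseline) in nats, smoothed so unseen categories stay finite."""
    keys = set(current) | set(baseline)
    p = {k: current.get(k, 0.0) + epsilon for k in keys}
    q = {k: baseline.get(k, 0.0) + epsilon for k in keys}
    p_total = sum(p.values())
    q_total = sum(q.values())
    return sum(
        (p[k] / p_total) * math.log((p[k] / p_total) / (q[k] / q_total))
        for k in keys
    )


DRIFT_THRESHOLD = 0.15  # same value as the model_drift rule

# Hypothetical data: the share of report queries grows sharply.
baseline = category_distribution(["chat"] * 70 + ["doc_analysis"] * 20 + ["report"] * 10)
today = category_distribution(["chat"] * 40 + ["doc_analysis"] * 25 + ["report"] * 35)

drift = kl_divergence(today, baseline)
print(f"KL={drift:.3f}, alert={drift > DRIFT_THRESHOLD}")
```

With this sample shift the divergence lands above the 0.15 threshold, so the P3 rule would fire. KL divergence is asymmetric, so the direction (current against baseline) should be fixed once and kept consistent.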

4. Data Pipeline Architecture

4.1 End-to-End Data Flow

┌────────────┐    ┌────────────┐    ┌────────────┐    ┌────────────┐
│  Client    │    │  API       │    │  Stream    │    │  Storage   │
│  SDK       │───>│  Gateway   │───>│  Kafka     │───>│  ClickHouse│
│            │    │            │    │            │    │            │
│  Events:   │    │  Enrich:   │    │  Topics:   │    │  Tables:   │
│  - click   │    │  - user    │    │  - events  │    │  - events  │
│  - query   │    │  - geo     │    │  - metrics │    │  - metrics │
│  - feedback│    │  - device  │    │  - logs    │    │  - agg     │
│  - timing  │    │  - session │    │            │    │            │
└────────────┘    └────────────┘    └────────────┘    └────────────┘
                                                            │
                                                            ▼
                                                     ┌──────────────┐
                                                     │  Dashboard   │
                                                     │  Grafana /   │
                                                     │  Metabase    │
                                                     └──────────────┘

4.2 Event Tracking Schema

interface AIEvent {
  // Standard fields
  event_id: string;           // UUID
  timestamp: string;          // ISO 8601
  user_id: string;
  session_id: string;
  event_type: string;         // "query" | "feedback" | "action" | "error"

  // AI-specific fields
  model_id: string;           // "gpt-4" | "claude-3" | "custom-v2"
  prompt_tokens: number;
  completion_tokens: number;
  latency_ms: number;
  cost_cents: number;         // Cost in 1/100 of the billing currency; the query in 4.3 treats it as RMB fen

  // Quality fields
  confidence_score: number;   // 0.0 - 1.0
  hallucination_detected: boolean;
  user_feedback: "positive" | "negative" | "neutral" | null;
  regeneration_count: number;

  // Context
  feature: string;            // "chat" | "doc_analysis" | "report"
  input_type: string;         // "text" | "file" | "image"
  output_type: string;        // "text" | "table" | "chart"

  // Metadata
  metadata: Record<string, unknown>;
}
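On the producing side, events that follow this schema can be built and validated before they are sent to the gateway. The sketch below mirrors the interface in Python for a `query` event; `build_query_event` and its validation rules are assumptions for illustration, not part of any SDK.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Optional

ALLOWED_FEEDBACK = {"positive", "negative", "neutral", None}


def build_query_event(
    *,
    user_id: str,
    session_id: str,
    model_id: str,
    feature: str,
    prompt_tokens: int,
    completion_tokens: int,
    latency_ms: float,
    cost_cents: float,
    confidence_score: float,
    hallucination_detected: bool = False,
    user_feedback: Optional[str] = None,
    regeneration_count: int = 0,
    input_type: str = "text",
    output_type: str = "text",
    metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build one AIEvent-shaped dict for a query, rejecting obviously bad values."""
    if not 0.0 <= confidence_score <= 1.0:
        raise ValueError("confidence_score must be within [0.0, 1.0]")
    if user_feedback not in ALLOWED_FEEDBACK:
        raise ValueError(f"unexpected user_feedback: {user_feedback!r}")
    numeric = (prompt_tokens, completion_tokens, latency_ms, cost_cents, regeneration_count)
    if min(numeric) < 0:
        raise ValueError("token counts, latency, cost and regenerations must be >= 0")
    return {
        "event_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "session_id": session_id,
        "event_type": "query",
        "model_id": model_id,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "latency_ms": latency_ms,
        "cost_cents": cost_cents,
        "confidence_score": confidence_score,
        "hallucination_detected": hallucination_detected,
        "user_feedback": user_feedback,
        "regeneration_count": regeneration_count,
        "feature": feature,
        "input_type": input_type,
        "output_type": output_type,
        "metadata": metadata or {},
    }


event = build_query_event(
    user_id="u-1", session_id="s-1", model_id="custom-v2", feature="chat",
    prompt_tokens=320, completion_tokens=180, latency_ms=1240.0,
    cost_cents=6.7, confidence_score=0.91,
)
print(event["event_type"], event["timestamp"])
```

Rejecting malformed events at the source keeps the aggregation queries downstream from silently averaging over bad values.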

4.3 Example Aggregation Query

-- Daily AI quality metrics
SELECT
    toDate(timestamp) AS date,
    model_id,
    feature,
    count() AS total_queries,
    avg(latency_ms) AS avg_latency,
    quantile(0.95)(latency_ms) AS p95_latency,
    avg(confidence_score) AS avg_confidence,
    countIf(hallucination_detected) / count() AS hallucination_rate,
    countIf(user_feedback = 'positive') /
        nullIf(countIf(user_feedback IS NOT NULL), 0) AS positive_rate,
    sum(cost_cents) / 100.0 AS total_cost_yuan,
    sum(cost_cents) / count() / 100.0 AS cost_per_query_yuan
FROM ai_events
WHERE event_type = 'query'
  AND timestamp >= today() - INTERVAL 30 DAY
GROUP BY date, model_id, feature
ORDER BY date DESC, total_queries DESC;
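For unit tests or small offline checks it can help to reproduce the same aggregates without a database. The sketch below computes the query's metrics for a single group in plain Python; `summarize_queries` and the sample events are hypothetical, and the P95 uses a simple nearest-rank rule, so it will not match ClickHouse's approximate `quantile` exactly on large data.

```python
import math
from typing import Any, Dict, List


def summarize_queries(events: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Compute the aggregates of the SQL above for one (date, model, feature) group."""
    queries = [e for e in events if e["event_type"] == "query"]
    if not queries:
        raise ValueError("no query events to summarize")
    n = len(queries)
    latencies = sorted(e["latency_ms"] for e in queries)
    p95_index = max(0, math.ceil(0.95 * n) - 1)  # nearest-rank percentile
    rated = [e for e in queries if e.get("user_feedback") is not None]
    positive = sum(1 for e in rated if e["user_feedback"] == "positive")
    total_cost_cents = sum(e["cost_cents"] for e in queries)
    return {
        "total_queries": n,
        "avg_latency": sum(latencies) / n,
        "p95_latency": latencies[p95_index],
        "hallucination_rate": sum(1 for e in queries if e["hallucination_detected"]) / n,
        # None mirrors the SQL nullIf guard when nobody left feedback.
        "positive_rate": positive / len(rated) if rated else None,
        "total_cost_yuan": total_cost_cents / 100.0,
        "cost_per_query_yuan": total_cost_cents / n / 100.0,
    }


# Hypothetical events for illustration.
sample_events = [
    {"event_type": "query", "latency_ms": 1200, "cost_cents": 6,
     "hallucination_detected": False, "user_feedback": "positive"},
    {"event_type": "query", "latency_ms": 2300, "cost_cents": 8,
     "hallucination_detected": False, "user_feedback": None},
    {"event_type": "query", "latency_ms": 5100, "cost_cents": 10,
     "hallucination_detected": True, "user_feedback": "negative"},
    {"event_type": "feedback", "latency_ms": 0, "cost_cents": 0,
     "hallucination_detected": False, "user_feedback": "positive"},
]

summary = summarize_queries(sample_events)
print(summary)
```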

5. Grafana in Practice

5.1 Dashboard Organization

Grafana Folder Structure:
  AI Product/
    ├── Executive Overview          # Executive dashboard
    ├── User & Product Metrics      # User and product metrics
    ├── AI Quality Monitoring       # AI quality monitoring
    ├── Cost & Infrastructure       # Cost and infrastructure
    └── Alerts & Incidents          # Alerts and incidents

5.2 Key Panel Configuration

{
  "dashboard": {
    "title": "AI Quality Monitoring",
    "panels": [
      {
        "title": "Accuracy by Feature (7-Day Rolling)",
        "type": "timeseries",
        "datasource": "ClickHouse",
        "targets": [{
          "rawSql": "SELECT toStartOfHour(timestamp) AS time, feature, avg(confidence_score) AS accuracy FROM ai_events WHERE event_type = 'query' AND timestamp >= now() - INTERVAL 7 DAY GROUP BY time, feature ORDER BY time"
        }],
        "fieldConfig": {
          "defaults": {
            "min": 0.7,
            "max": 1.0,
            "thresholds": {
              "mode": "absolute",
              "steps": [
                { "value": null, "color": "red" },
                { "value": 0.90, "color": "yellow" },
                { "value": 0.95, "color": "green" }
              ]
            }
          }
        }
      },
      {
        "title": "Cost per Query (Hourly)",
        "type": "stat",
        "datasource": "ClickHouse",
        "targets": [{
          "rawSql": "SELECT sum(cost_cents)/count()/100.0 AS cost FROM ai_events WHERE event_type='query' AND timestamp >= now() - INTERVAL 1 HOUR"
        }]
      }
    ]
  }
}

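6. Metabase Setup for Business Analytics

6.1 Use-Case Comparison

Dimension	Grafana	Metabase
Positioning	Real-time monitoring + alerting	Business analytics + self-service queries
Users	Engineers / SRE	Product managers / operations / management
Data refresh	Seconds	Minutes
Visualization	Mainly time-series charts	Tables / funnels / maps
Alerting	Native support	Limited support
Self-service queries	Requires SQL	Visual drag-and-drop
Recommended use	L3/L4 metrics	L1/L2 metrics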

6.2 Core Metabase Question Configuration

Saved Questions:
  1. "Daily Active Users Trend"
     - Table: user_sessions
     - Group by: date, user_type
     - Visualization: Line chart

  2. "Feature Usage Breakdown"
     - Table: ai_events
     - Filter: event_type = 'query'
     - Group by: feature
     - Visualization: Bar chart

  3. "Conversion Funnel"
     - Custom SQL with CTE
     - Steps: Visit -> Signup -> First Query -> 10th Query -> Paid
     - Visualization: Funnel

  4. "Cost Analysis by Model"
     - Table: ai_events
     - Group by: model_id, week
     - Metrics: total_cost, avg_cost_per_query, total_queries
     - Visualization: Pivot table
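To sanity-check what the "Cost Analysis by Model" question should return, the same pivot can be computed offline. This is a sketch under assumptions: weeks are ISO weeks, `cost_cents` is in fen as in section 4.3, and `cost_by_model_and_week` plus the sample events are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Any, Dict, Iterable, Tuple


def cost_by_model_and_week(
    events: Iterable[Dict[str, Any]],
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Group query events by (model_id, ISO week) and compute cost metrics."""
    totals: Dict[Tuple[str, str], Dict[str, float]] = defaultdict(
        lambda: {"total_queries": 0, "total_cost_cents": 0.0}
    )
    for event in events:
        if event["event_type"] != "query":
            continue
        ts = datetime.fromisoformat(event["timestamp"])
        iso_year, iso_week, _ = ts.isocalendar()
        key = (event["model_id"], f"{iso_year}-W{iso_week:02d}")
        totals[key]["total_queries"] += 1
        totals[key]["total_cost_cents"] += event["cost_cents"]

    pivot = {}
    for key, agg in totals.items():
        pivot[key] = {
            "total_queries": agg["total_queries"],
            "total_cost": agg["total_cost_cents"] / 100.0,
            "avg_cost_per_query": agg["total_cost_cents"] / agg["total_queries"] / 100.0,
        }
    return pivot


# Hypothetical events for illustration.
sample_events = [
    {"event_type": "query", "model_id": "gpt-4", "cost_cents": 8,
     "timestamp": "2026-02-23T09:00:00+00:00"},
    {"event_type": "query", "model_id": "gpt-4", "cost_cents": 10,
     "timestamp": "2026-02-25T14:30:00+00:00"},
    {"event_type": "query", "model_id": "custom-v2", "cost_cents": 3,
     "timestamp": "2026-02-24T11:15:00+00:00"},
    {"event_type": "query", "model_id": "gpt-4", "cost_cents": 6,
     "timestamp": "2026-03-02T08:00:00+00:00"},
]

pivot = cost_by_model_and_week(sample_events)
for (model_id, week), row in sorted(pivot.items()):
    print(model_id, week, row)
```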

7. A Metrics-Driven Decision Framework

7.1 Common Decision Scenarios

Scenario	Metrics to watch	Decision criteria
Ship a new model?	Accuracy + Latency + Cost	Accuracy >= current, Latency <= 1.5x, Cost <= 2x
Roll out a new feature?	Feature Usage + CSAT + Retention Impact	Day 7 retention improves by > 2%
Adjust pricing?	Conversion + Churn + Revenue	Revenue +15% AND churn increase < 2%
Downgrade the model?	Cost/Query + Accuracy Drop	Cost decreases > 30% AND accuracy decreases < 3%
Scale up capacity?	GPU Util + P95 Latency + Error Rate	GPU > 80% or P95 > 5s
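Criteria like these are easiest to apply consistently when written as an explicit check. The sketch below encodes the "ship a new model" row; `ModelStats` and `should_ship_new_model` are hypothetical names, latency is assumed to be measured at P95, and the sample numbers are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ModelStats:
    accuracy: float          # 0.0 - 1.0
    latency_p95_ms: float
    cost_per_query: float    # same currency for both models


def should_ship_new_model(
    current: ModelStats,
    candidate: ModelStats,
    max_latency_ratio: float = 1.5,
    max_cost_ratio: float = 2.0,
) -> Tuple[bool, Dict[str, bool]]:
    """Apply: accuracy >= current, latency <= 1.5x current, cost <= 2x current."""
    checks = {
        "accuracy": candidate.accuracy >= current.accuracy,
        "latency": candidate.latency_p95_ms <= current.latency_p95_ms * max_latency_ratio,
        "cost": candidate.cost_per_query <= current.cost_per_query * max_cost_ratio,
    }
    return all(checks.values()), checks


current = ModelStats(accuracy=0.90, latency_p95_ms=3000, cost_per_query=0.067)
candidate = ModelStats(accuracy=0.93, latency_p95_ms=4200, cost_per_query=0.150)

decision, checks = should_ship_new_model(current, candidate)
print(decision, checks)
```

Returning the individual checks alongside the verdict makes it clear which criterion blocked a launch; in this sample the candidate passes on accuracy and latency but costs more than twice the current model.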

7.2 A/B Testing Framework

# AI-specific A/B test configuration
AB_TEST_CONFIG = {
    "model_comparison": {
        "control": "gpt-4-turbo",
        "treatment": "claude-3-opus",
        "metrics": {
            "primary": "user_satisfaction_score",
            "secondary": ["accuracy", "latency_p95", "cost_per_query"],
            "guardrail": ["hallucination_rate", "error_rate"]
        },
        "split": 50,  # 50/50 split
        "min_sample": 1000,  # queries per arm
        "duration_days": 14,
        "success_criteria": {
            "primary_lift": 0.05,     # 5% improvement
            "guardrail_max_increase": 0.01  # No more than 1% increase
        }
    }
}
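A configuration like this still needs an evaluation step once the test has run. The sketch below shows one way to apply the success criteria, under stated assumptions: the primary lift is relative to control, the guardrail limit is an absolute increase, and no statistical significance test is included (a real rollout decision would normally add one). `evaluate_ab_test` and the sample results are hypothetical.

```python
from typing import Dict, List, Tuple


def evaluate_ab_test(
    control: Dict[str, float],
    treatment: Dict[str, float],
    *,
    primary_metric: str,
    guardrails: List[str],
    min_sample: int,
    primary_lift: float,
    guardrail_max_increase: float,
) -> Tuple[str, List[str]]:
    """Return a verdict plus the reasons behind it."""
    if min(control["sample_size"], treatment["sample_size"]) < min_sample:
        return "keep_running", ["minimum sample size per arm not reached"]
    if control[primary_metric] <= 0:
        raise ValueError("control primary metric must be positive to compute lift")

    reasons = []
    lift = (treatment[primary_metric] - control[primary_metric]) / control[primary_metric]
    if lift < primary_lift:
        reasons.append(f"{primary_metric} lift {lift:.1%} is below {primary_lift:.0%}")
    for name in guardrails:
        increase = treatment[name] - control[name]
        if increase > guardrail_max_increase:
            reasons.append(f"guardrail {name} rose by {increase:.3f}")
    return ("ship_treatment" if not reasons else "keep_control"), reasons


# Hypothetical results for illustration.
control = {"sample_size": 1200, "user_satisfaction_score": 4.0,
           "hallucination_rate": 0.030, "error_rate": 0.004}
treatment = {"sample_size": 1150, "user_satisfaction_score": 4.3,
             "hallucination_rate": 0.034, "error_rate": 0.005}

verdict, reasons = evaluate_ab_test(
    control, treatment,
    primary_metric="user_satisfaction_score",
    guardrails=["hallucination_rate", "error_rate"],
    min_sample=1000,
    primary_lift=0.05,
    guardrail_max_increase=0.01,
)
print(verdict, reasons)
```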

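Summary

Core principles of an AI product metrics system:

Four distinct layers: business, product, AI quality, and infrastructure metrics each do their own job.
AI-specific metrics are indispensable: accuracy, hallucination rate, and Cost/Query are the lifeline of an AI product.
Two complementary systems: Grafana handles monitoring and alerting, Metabase handles business analytics.
Tiered alert response: P0 gets someone on the phone, P3 goes into the daily report; no one-size-fits-all handling.
Data-driven decisions: every decision scenario has a matching combination of metrics and decision criteria.

Metrics are not the goal; decisions are. The ultimate aim of building dashboards is to let the team see a problem within 30 seconds, locate the cause within 5 minutes, and drive a fix within 1 hour.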

Maurice | maurice_wen@proton.me