Playbook

Observability

Know what your system actually did.

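AI systems need deeper telemetry than standard apps, because the same request can produce different output depending on the prompt, model version, and source data. You need traces, redaction, and lineage.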

Tracing

  • Capture prompt, tools, and latency per step
  • Correlate requests with user and source data
  • Sample safely to manage cost
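
The tracing points above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a recommendation of a specific tracing product. The names `Trace`, `Span`, `start_trace`, and `should_sample` are hypothetical, and the 10% default sample rate is an assumption. Sampling is keyed on a hash of the request ID so that every step of a request gets the same keep-or-drop decision.

```python
import hashlib
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    """One step of a request: what was sent, which tools ran, how long it took."""
    name: str
    prompt: str
    tools: list
    latency_ms: float = 0.0


@dataclass
class Trace:
    """All steps of one request, correlated with the user and source data."""
    request_id: str
    user_id: str
    source_ids: list
    spans: list = field(default_factory=list)

    @contextmanager
    def step(self, name, prompt, tools=()):
        span = Span(name=name, prompt=prompt, tools=list(tools))
        start = time.perf_counter()
        try:
            yield span
        finally:
            # Record latency even if the step raised.
            span.latency_ms = (time.perf_counter() - start) * 1000.0
            self.spans.append(span)


def should_sample(request_id: str, rate: float) -> bool:
    """Deterministic sampling: the same request_id always gets the same decision."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return bucket < rate


def start_trace(user_id, source_ids, rate=0.1, request_id=None):
    """Return a Trace if this request is sampled, otherwise None."""
    request_id = request_id or uuid.uuid4().hex
    if not should_sample(request_id, rate):
        return None
    return Trace(request_id, user_id, list(source_ids))
```

A caller would wrap each model or tool call in `with trace.step("retrieve", prompt, tools=["search"]):` and ship `trace` to whatever backend it uses. A production setup would likely also force-sample errors and slow requests, which this sketch does not do.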

Logging

  • Emit structured logs for model calls
  • Store input/output hashes for audit
  • Tag runs with model and prompt version
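
As a rough sketch of the logging points above, the Python below writes one JSON record per model call. The field names (`input_sha256`, `prompt_version`, and so on) and the helper names `build_log_record` and `log_model_call` are assumptions made for illustration, not an established schema. The record stores SHA-256 hashes of the input and output rather than the raw text, so an auditor can confirm what was sent without the log holding the content itself.

```python
import hashlib
import json
import logging
import time

logger = logging.getLogger("model_calls")


def sha256_hex(text: str) -> str:
    """Hash text so the log can prove what was sent without storing it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_log_record(request_id, model, prompt_version, prompt, output, latency_ms):
    """Build one structured record for a model call (hypothetical field names)."""
    return {
        "event": "model_call",
        "ts": time.time(),
        "request_id": request_id,
        "model": model,                    # tag: which model served the call
        "prompt_version": prompt_version,  # tag: which prompt template was used
        "input_sha256": sha256_hex(prompt),
        "output_sha256": sha256_hex(output),
        "latency_ms": latency_ms,
    }


def log_model_call(**kwargs):
    """Emit the record as a single JSON line and return it."""
    record = build_log_record(**kwargs)
    logger.info(json.dumps(record, sort_keys=True))
    return record
```

One JSON object per line keeps the logs easy to filter by `model` or `prompt_version` when comparing runs.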

Redaction

  • Remove PII before storage
  • Centralize redaction rules
  • Audit redaction coverage regularly
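
One way to read the redaction points above is a single shared rule table that every write path calls before storage. The sketch below assumes three simplified regex rules for emails, US-style SSNs, and US-style phone numbers. The patterns are illustrative and far from complete PII coverage, and `REDACTION_RULES`, `redact`, and `audit_coverage` are hypothetical names.

```python
import re

# Hypothetical central rule set: (label, compiled pattern).
# Illustrative patterns only; real PII detection needs broader coverage.
REDACTION_RULES = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
]


def redact(text: str) -> str:
    """Replace every rule match with a placeholder such as [EMAIL]."""
    for label, pattern in REDACTION_RULES:
        text = pattern.sub(f"[{label}]", text)
    return text


def audit_coverage(stored_texts):
    """Return the fraction of stored texts with no remaining rule matches."""
    if not stored_texts:
        return 1.0
    clean = sum(
        1 for t in stored_texts
        if not any(p.search(t) for _, p in REDACTION_RULES)
    )
    return clean / len(stored_texts)
```

Running `audit_coverage` over a sample of stored records catches write paths that skipped `redact`. It cannot catch PII the rules themselves miss, so a periodic manual review of samples is still needed.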

Cost + latency

  • Set cost and latency budgets per request
  • Cache frequent paths
  • Alert on cost or latency spikes
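
The three controls above could look roughly like this in Python. Everything here is a sketch under stated assumptions: `RequestBudget`, `SpikeDetector`, and `cached_answer` are hypothetical names, the spike rule (a value more than 3x the rolling mean) is one simple choice among many, and the cached function returns a placeholder string where a real model call would go.

```python
import time
from collections import deque
from functools import lru_cache


class BudgetExceeded(Exception):
    """Raised when a request goes over its cost or latency budget."""


class RequestBudget:
    """Per-request ceiling on spend (USD) and wall-clock time (seconds)."""

    def __init__(self, max_cost_usd, max_latency_s):
        self.max_cost_usd = max_cost_usd
        self.max_latency_s = max_latency_s
        self.spent_usd = 0.0
        self.started = time.monotonic()

    def charge(self, cost_usd):
        """Record the cost of one step and fail fast if a limit is crossed."""
        self.spent_usd += cost_usd
        if self.spent_usd > self.max_cost_usd:
            raise BudgetExceeded("cost budget exceeded")
        if time.monotonic() - self.started > self.max_latency_s:
            raise BudgetExceeded("latency budget exceeded")


class SpikeDetector:
    """Flag a value that exceeds `factor` times the rolling mean."""

    def __init__(self, window=50, factor=3.0):
        self.values = deque(maxlen=window)
        self.factor = factor

    def observe(self, value):
        # Compare against the mean of earlier values, then add the new one.
        baseline = sum(self.values) / len(self.values) if self.values else None
        self.values.append(value)
        return baseline is not None and value > self.factor * baseline


@lru_cache(maxsize=1024)
def cached_answer(normalized_prompt: str) -> str:
    # Placeholder for a real model call; repeated prompts are served from cache.
    return f"answer for: {normalized_prompt}"
```

Call `budget.charge(cost)` after each model or tool step, and feed per-request cost or latency into `SpikeDetector.observe` to decide when to alert. Caching on the exact prompt string only helps if prompts are normalized first, and it is only appropriate for requests where a repeated answer is acceptable.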

Checklist

  • Tracing dashboard
  • Redaction rules documented
  • Prompt/model version tagging