Playbook
Observability
Know what your system actually did.
AI systems require deeper telemetry than standard apps. You need traces, redaction, and lineage.
Tracing
- Capture prompt, tools, and latency per step
- Correlate requests with user and source data
- Sample safely to manage cost
Logging
- Structured logs for model calls
- Store input/output hashes for audit
- Tag runs with model and prompt version
Redaction
- Remove PII before storage
- Centralize redaction rules
- Audit redaction coverage regularly
Cost + latency
- Set budgets per request
- Cache frequent paths
- Alert on spikes
Checklist
- Tracing dashboard
- Redaction rules documented
- Prompt/model version tagging