Playbook
RAG Systems
Retrieval that stays reliable as your docs change.
A working RAG system is more than embeddings. Treat ingestion, retrieval, and answer assembly as separate, testable layers.
Ingestion + chunking
Preserve structure and provenance.
- Normalize sources (PDF, DOCX, HTML) into a stable schema
- Chunk by headings or semantic boundaries, not fixed length
- Store metadata for ownership, access, and timestamps
- Version documents so you can roll back or compare
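The chunking and provenance points above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes sources have already been normalized to text with Markdown-style `#` headings, and the `Chunk` fields and the `chunk_by_headings` name are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumption: normalized documents mark headings with leading '#' characters.
HEADING = re.compile(r"^#{1,6}\s+(.+)$")

@dataclass(frozen=True)
class Chunk:
    doc_id: str    # which source document the chunk came from
    version: int   # document version, so chunks can be compared or rolled back
    heading: str   # nearest heading above the chunk text
    text: str

def chunk_by_headings(doc_id: str, version: int, text: str) -> list[Chunk]:
    """Split text at heading lines and attach provenance to every chunk."""
    chunks: list[Chunk] = []
    heading = "(preamble)"
    buf: list[str] = []

    def flush() -> None:
        # Emit the buffered lines as one chunk; skip empty sections.
        body = "\n".join(buf).strip()
        if body:
            chunks.append(Chunk(doc_id, version, heading, body))

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading = match.group(1).strip()
            buf = []
        else:
            buf.append(line)
    flush()
    return chunks
```

In a real pipeline each chunk would also carry the owner, access, and timestamp metadata listed above; they are left out here to keep the sketch short.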
Retrieval strategy
Get the right context before generation.
- Hybrid search (BM25 + vector) usually beats pure embedding search, especially on exact terms such as IDs and product names
- Use metadata filters and access control in retrieval
- Rerank top results with a lightweight model
- Cache frequent queries, and expire cached results within a freshness window so updated docs show up
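One common way to combine BM25 and vector results is reciprocal rank fusion (RRF), which needs only the two ranked lists, not comparable scores. The sketch below is one possible approach, not the only one. It assumes both retrievers return ranked lists of document IDs, and the `acl` mapping, `user_groups` set, and function names are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs. k=60 is a commonly used default."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))

def hybrid_retrieve(bm25_ranking, vector_ranking, acl, user_groups, top_n=5):
    """Apply the access filter first, then fuse the two rankings."""
    def allowed(doc_id):
        # Assumption: docs missing from the ACL are denied by default.
        return bool(acl.get(doc_id, set()) & user_groups)

    filtered = [
        [d for d in ranking if allowed(d)]
        for ranking in (bm25_ranking, vector_ranking)
    ]
    return reciprocal_rank_fusion(filtered)[:top_n]
```

Filtering before fusion keeps documents the user cannot see out of the candidate set entirely, so they never reach the reranker or the prompt.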
Answer assembly
Citations are not optional in production.
- Prompt with explicit citation requirements
- Refuse when sources are missing or retrieval confidence is low
- Use a strict answer schema to avoid drift
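The three rules above can be enforced in code after generation, not only in the prompt. A minimal sketch, assuming the model returns a draft answer plus the chunk IDs it cites, and that retrieval yields a relevance score per chunk. The `Answer` schema, the `min_score` threshold, and the refusal text are all illustrative.

```python
from dataclasses import dataclass, field

REFUSAL = "I can't answer that from the available sources."

@dataclass
class Answer:
    text: str
    citations: list[str] = field(default_factory=list)
    refused: bool = False

def assemble_answer(draft_text, cited_ids, retrieved, min_score=0.5):
    """Keep only citations that point at retrieved, sufficiently relevant chunks.

    retrieved maps chunk ID -> relevance score (assumed to lie in [0, 1]).
    """
    usable = {cid for cid, score in retrieved.items() if score >= min_score}
    valid = [cid for cid in cited_ids if cid in usable]
    if not valid:
        # No trustworthy source backs the draft, so refuse instead of guessing.
        return Answer(text=REFUSAL, refused=True)
    return Answer(text=draft_text, citations=valid)
```

Checking citations against the retrieved set also catches made-up chunk IDs, which prompt instructions alone will not prevent.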
Failure modes
- Stale or missing docs leading to hallucinations
- Overfetching irrelevant context
- Conflicting sources without disambiguation
- No visibility into retrieval quality
Checklist
- Test set that covers top queries and edge cases
- Retrieval quality dashboard
- Citation enforcement in prompts
- Access filters at retrieval time
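The test set and dashboard items need a metric to report. One possible starting point is recall@k and mean reciprocal rank (MRR) over labeled queries. The sketch below assumes a test set of (query, relevant doc IDs) pairs and a hypothetical `retrieve` callable that returns a ranked list of doc IDs.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant docs that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def evaluate(test_set, retrieve, k=5):
    """Average both metrics over (query, relevant_ids) pairs."""
    if not test_set:
        return {"recall_at_k": 0.0, "mrr": 0.0}
    recalls, rrs = [], []
    for query, relevant in test_set:
        results = retrieve(query)
        recalls.append(recall_at_k(results, relevant, k))
        rrs.append(reciprocal_rank(results, relevant))
    n = len(test_set)
    return {"recall_at_k": sum(recalls) / n, "mrr": sum(rrs) / n}
```

Running this on every index rebuild and plotting the two numbers over time gives a basic view of whether retrieval is degrading as docs change.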