RAG Is Not Dead — It Just Grew Up
Every jump in context length brings the same headline: RAG is obsolete, just stuff everything in the prompt. It never quite works out. Long context is a capability, not a strategy — and retrieval-augmented generation has quietly matured into the default way to keep LLM answers grounded, fresh, and affordable.
Why long context didn't replace retrieval
- Cost scales with every token you send — retrieval sends only what matters.
- Models still tend to lose track of material in the middle of very long contexts; a few relevant passages beat many marginal ones.
- Your data changes; an index updates, a prompt does not.
- Grounding and citations are easier when you control what was retrieved.
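To put the cost point in rough numbers, here is a back-of-the-envelope sketch. The price, token counts, and request volume are all made-up assumptions for illustration, not figures from any provider; plug in your own.

```python
# Hypothetical numbers for illustration only; substitute your provider's real pricing.
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # assumed USD price, not a quote from any vendor


def input_cost(tokens_per_request: int, requests: int) -> float:
    """Input-token cost in USD for a batch of requests."""
    return tokens_per_request * requests * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000


# Assumed workload: 10,000 requests, either with a 200k-token stuffed prompt
# or with a 4k-token prompt built from a handful of retrieved chunks.
full_context = input_cost(tokens_per_request=200_000, requests=10_000)
retrieved = input_cost(tokens_per_request=4_000, requests=10_000)

print(f"stuff everything:    ${full_context:,.2f}")
print(f"retrieve top chunks: ${retrieved:,.2f}")
print(f"ratio: {full_context / retrieved:.0f}x")
```

Under these assumed numbers the gap is simply the ratio of prompt sizes, which is the point: whatever the real price is, the bill tracks the tokens you send.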
What modern RAG actually looks like
The naive version — embed everything, top-k by cosine similarity, stuff into a prompt — is where most teams stop, and where most RAG disappoints. Production RAG in 2026 layers retrieval: hybrid lexical plus semantic search, a reranking pass, and query rewriting so the question you embed is the question worth answering.
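One common way to combine lexical and semantic results is reciprocal rank fusion (RRF). The sketch below is a minimal illustration, not a claim about any particular stack: the two hit lists are hypothetical stand-ins for a keyword search and an embedding search, and `k = 60` is just a widely used default. In a layered pipeline, the fused list would typically be handed to a reranker next; reranking and query rewriting are not shown here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, breaking ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a keyword (e.g. BM25) search and an embedding search.
lexical_hits = ["doc_a", "doc_b", "doc_c"]
semantic_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([lexical_hits, semantic_hits])
print(fused)
```

RRF only looks at ranks, so it sidesteps the problem that keyword scores and cosine similarities live on different scales. Documents that both retrievers agree on rise to the top.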
Where teams still get it wrong
Chunking is treated as a one-line decision even though it largely determines what retrieval can find. Evaluation is skipped, so nobody knows whether a change helped. And retrieval fails silently: when nothing relevant comes back, the model confidently answers from nothing. Fix those three and most RAG complaints disappear.
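Each of the three has a small, testable starting point. The sketch below is one possible shape, under stated assumptions: word-window chunking with overlap, recall@k against a hand-labeled set, and a score threshold that lets the caller abstain. The function names, the chunk sizes, and the `0.3` threshold are hypothetical placeholders to tune on your own data.

```python
from typing import Optional


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the known-relevant IDs that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def select_context(
    scored_chunks: list[tuple[str, float]], min_score: float = 0.3
) -> Optional[list[str]]:
    """Return chunks that clear the (assumed) score threshold, or None so the caller can abstain."""
    kept = [chunk for chunk, score in scored_chunks if score >= min_score]
    return kept or None


chunks = chunk_words(" ".join(f"w{i}" for i in range(10)), size=4, overlap=1)
print(chunks)

# Hypothetical labeled example: two chunks are known to be relevant, one was retrieved.
print(recall_at_k(["c1", "c7", "c3"], relevant={"c3", "c9"}, k=3))

context = select_context([("chunk about billing", 0.12)])
print("abstain: nothing relevant retrieved" if context is None else context)
```

Returning `None` rather than an empty prompt makes the failure visible: the calling code has to decide what to tell the user instead of letting the model improvise.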
The takeaway
RAG isn't a relic; it's the boring infrastructure that makes LLM features trustworthy. If you're building a grounded assistant or internal copilot and want a retrieval pipeline that holds up, that's the kind of system I design and ship.