RAG is Just Getting Started: Key Insights from Our Optimization & Evaluation Series
If you think "RAG is dead," think again. Benjamin Clavié and I compiled annotated notes from our RAG optimization and evaluation series, highlighting why Retrieval-Augmented Generation (RAG) is only beginning to realize its potential. We challenge the status quo of single dense vector representations and traditional IR metrics, arguing for richer, more nuanced approaches. Key takeaways include: the need for new evaluation metrics focused on coverage and diversity; the ability of retrieval models to reason and follow complex instructions; the superiority of late-interaction models that preserve token-level detail; and the importance of using multiple, specialized representations rather than searching for a one-size-fits-all embedding. Dive into our series to see why RAG’s future is brighter than ever.
1. RAG Optimization & Evaluation Series: Why RAG is Far From Dead
If you believe "RAG is dead," this post is for you. Benjamin Clavié and I assembled annotated notes from our RAG optimization & evaluation series, where we walk through the many reasons why RAG is just getting started (single dense vector representations are quite naive).
Overview of the series and takeaways
- We’ve been measuring wrong. Nandan Thakur showed that traditional IR metrics optimize for finding the #1 result. RAG needs different goals: coverage (getting all the facts), diversity (corroborating facts), and relevance. Models that ace BEIR benchmarks often fail at real RAG tasks (a toy sketch of set-level metrics follows this list).
- Retrieval can reason. Orion Weller's models understand instructions like “find documents about data privacy using metaphors.” His Rank1 system generates explicit reasoning traces about relevance. These models find documents that traditional systems never surface (the shape of that interface is sketched below).
- Single vectors lose information. Antoine Chaffin demonstrated how late-interaction models like ColBERT preserve token-level information. No more forcing everything into one conflicted representation. Result: 150M-parameter models outperforming 7B-parameter alternatives on reasoning tasks (the MaxSim scoring behind this is sketched below).
- One map isn’t enough. Bryan Bischof and Ayush Chaurasia showed why we need multiple representations. Their art search demo finds the same painting through literal descriptions, poetic interpretations, or similar images, each via a different index. Stop searching for the perfect embedding. Build specialized representations and route intelligently (a minimal routing sketch closes the list below).
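
To make the first point concrete, here is a minimal sketch of what set-level retrieval metrics can look like. The coverage and diversity definitions are illustrative rather than Nandan Thakur's exact formulations, and the `supports` mapping (each gold fact to the documents that state it) is an assumed annotation format.

```python
# Illustrative set-level RAG retrieval metrics (not the exact
# definitions from the series). Assumes `supports` maps each gold
# fact to the set of document ids that state it.

def coverage(retrieved_ids, supports):
    """Fraction of gold facts supported by at least one retrieved doc."""
    retrieved = set(retrieved_ids)
    hit = sum(1 for docs in supports.values() if docs & retrieved)
    return hit / len(supports)

def diversity(retrieved_ids, supports):
    """Average number of independent retrieved sources per covered fact."""
    retrieved = set(retrieved_ids)
    counts = [len(docs & retrieved) for docs in supports.values()]
    covered = [c for c in counts if c > 0]
    return sum(covered) / len(covered) if covered else 0.0

supports = {"fact_a": {"d1", "d2"}, "fact_b": {"d3"}}
print(coverage(["d1", "d3"], supports))         # 1.0: both facts found
print(diversity(["d1", "d2", "d3"], supports))  # 1.5: fact_a corroborated twice
```

Note how a retriever that returns ten near-duplicates of d1 would ace a rank-1 metric while leaving fact_b uncovered, which is exactly the failure mode the bullet describes.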
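The instruction-following point is easiest to see at the interface level. Below is the general shape of a reasoning reranker in the spirit of Rank1: the model writes out its relevance reasoning before giving a verdict. The prompt template, the `generate` callable, and the verdict parsing are all placeholders, not Rank1's actual prompt or training format.

```python
# Shape of a reasoning reranker in the spirit of Rank1. The prompt,
# `generate` callable, and verdict parsing are placeholders, not the
# system's actual interface.

PROMPT = """Instruction: {instruction}
Query: {query}
Document: {document}
Reason step by step about whether the document satisfies both the
query and the instruction, then end with "relevant" or "not relevant"."""

def judge(generate, instruction, query, document):
    """Return the model's reasoning trace and a binary relevance verdict."""
    trace = generate(PROMPT.format(instruction=instruction,
                                   query=query, document=document))
    lines = trace.strip().splitlines()
    last = lines[-1].lower() if lines else ""
    return trace, ("relevant" in last and "not relevant" not in last)
```

Because the verdict comes with an explicit trace, you can audit why a document like the "data privacy via metaphors" example was surfaced, which an opaque similarity score cannot tell you.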
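Late interaction itself fits in a few lines. ColBERT's MaxSim operator lets every query token pick its best-matching document token and sums those maxima; the NumPy toy below assumes L2-normalized token embeddings and uses random vectors as a stand-in for real learned encoders.

```python
import numpy as np

def maxsim_score(q_emb: np.ndarray, d_emb: np.ndarray) -> float:
    """ColBERT-style MaxSim: sum over query tokens of the best
    document-token similarity. q_emb is (n_query_tokens, dim),
    d_emb is (n_doc_tokens, dim), both L2-normalized."""
    sims = q_emb @ d_emb.T                # all token-pair similarities
    return float(sims.max(axis=1).sum())  # best doc token per query token

# Toy stand-in for real encoder output: random unit vectors.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8));  q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(12, 8)); d /= np.linalg.norm(d, axis=1, keepdims=True)
print(maxsim_score(q, d))
```

Because no pooling step collapses the token matrix into a single vector, a rare but decisive query token can still dominate the score; that is precisely what the single-vector bottleneck throws away.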
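Finally, "multiple representations, routed intelligently" is mostly glue code. This sketch is hypothetical, not the architecture of the Bischof/Chaurasia demo: `embed_stub`, the keyword router, and the index names are stand-ins for real text and image encoders and a real intent classifier.

```python
import hashlib
import numpy as np

class VectorIndex:
    """Tiny in-memory index over ONE representation of the corpus."""
    def __init__(self, embed):
        self.embed, self.vecs, self.items = embed, [], []

    def add(self, text, item):
        self.vecs.append(self.embed(text))
        self.items.append(item)

    def search(self, query, k=3):
        sims = np.array(self.vecs) @ self.embed(query)
        return [self.items[i] for i in np.argsort(-sims)[:k]]

def embed_stub(text, dim=16):
    """Deterministic stand-in for a real text or image encoder."""
    seed = int(hashlib.md5(text.encode()).hexdigest()[:8], 16)
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

# One index per representation of the same artworks.
literal = VectorIndex(embed_stub)  # factual captions
poetic = VectorIndex(embed_stub)   # interpretive descriptions
literal.add("portrait of a woman with an enigmatic smile", "Mona Lisa")
poetic.add("a gaze that keeps its own secret", "Mona Lisa")

def route(query):
    """Crude keyword router; a real system might use a small classifier."""
    index = poetic if any(w in query for w in ("evokes", "feels", "secret")) else literal
    return index.search(query)

print(route("woman smiling in a portrait"))  # served by the literal index
```

The point is the structure, not the stubs: each representation gets its own index tuned to one notion of similarity, and a cheap routing decision replaces the hunt for one embedding that does everything.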