The most surprising thing about RAG is that retrieval, its very foundation, is often its weakest link, and it has been largely ignored.

Let’s see this in action. Imagine a chatbot answering questions about a company’s internal documentation. Without RAG, it might hallucinate or give generic answers. With basic RAG, it retrieves relevant documents and synthesizes an answer.

```python
# Basic RAG setup (conceptual)
from some_rag_library import RetrievalQA, DocumentStore, Retriever, LLM

# Assume document_store, retriever, and llm are initialized
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff"
)

question = "What is our policy on remote work?"
response = qa_chain.run(question)
print(response)
```

This works reasonably well when the retriever returns clearly relevant documents. But what happens when the retrieved documents are only partially relevant, or worse, subtly misleading? The LLM, trusting the retrieved context, may still produce a plausible-sounding but incorrect answer. This is where Corrective RAG (CRAG) steps in. CRAG doesn’t just retrieve; it evaluates what was retrieved and adapts before passing context to the LLM.

The core idea is to decouple the retrieval and generation steps and introduce a "retrieval evaluator" or "corrector" module. This module, often another LLM or a specialized classifier, looks at the retrieved documents and the original question and asks: "Is this retrieval good enough?"

Here’s a simplified view of the CRAG flow:

  1. Initial Retrieval: A standard retriever (e.g., based on vector similarity) fetches candidate documents for the user’s query.
  2. Retrieval Evaluation: The CRAG module analyzes the retrieved documents against the query. It checks for relevance, factual consistency, and completeness.
  3. Decision Point: Based on the evaluation, the CRAG module decides:
    • Keep: If retrieval is good, pass the documents to the LLM for generation.
    • Re-retrieve: If retrieval is poor or incomplete, modify the query or parameters and re-run the retriever.
    • Generate without retrieval: If no relevant documents can be found, inform the user or generate a general answer.
    • Correct context: If documents are partially relevant but need refinement, the CRAG module might try to "fix" them or ask clarifying questions.
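The flow above can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not a reference implementation: `retrieve`, `evaluate`, and `rewrite_query` are hypothetical callables standing in for your retriever, evaluator, and query rewriter, and the 0.7 and 0.3 thresholds are arbitrary.

```python
from typing import Callable, List, Tuple


def crag_select_context(
    query: str,
    retrieve: Callable[[str], List[str]],      # hypothetical retriever
    evaluate: Callable[[str, str], float],     # hypothetical evaluator, returns a score in [0, 1]
    rewrite_query: Callable[[str], str],       # hypothetical query rewriter
    keep_threshold: float = 0.7,               # assumed cutoff for "good"
    drop_threshold: float = 0.3,               # assumed cutoff for "unusable"
    max_attempts: int = 2,
) -> Tuple[str, List[str]]:
    """Return (action, documents), where action names the branch taken."""
    current_query = query
    for _ in range(max_attempts):
        docs = retrieve(current_query)
        # Always score against the user's original question.
        scored = [(evaluate(query, doc), doc) for doc in docs]
        good = [doc for score, doc in scored if score >= keep_threshold]
        partial = [doc for score, doc in scored
                   if drop_threshold <= score < keep_threshold]
        if good:
            return "keep", good
        if partial:
            return "correct_context", partial
        # Nothing usable: re-retrieve with a modified query.
        current_query = rewrite_query(current_query)
    # Retries exhausted: generate without retrieval or inform the user.
    return "no_retrieval", []
```

The caller decides what each action means downstream, for example refining the documents on `"correct_context"` before generation. Capping `max_attempts` keeps the loop from retrying indefinitely.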

Let’s consider the levers you control in a CRAG system. The primary one is the retrieval evaluator. You can train or prompt this evaluator to be more or less strict. For example, you might tell it to flag documents that are only marginally relevant, or to prioritize documents that directly answer the question versus those that provide tangential information.
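If the evaluator is a prompted LLM, strictness can be as simple as a switch in the grading prompt. The sketch below only builds the prompt text. `build_grader_prompt` and its wording are illustrative assumptions, not part of any library, and you would send the result to whichever model you use as the grader.

```python
def build_grader_prompt(query: str, document: str, strict: bool = False) -> str:
    """Build a relevance-grading prompt; `strict` tightens the acceptance rule."""
    if strict:
        # Strict mode: only documents that directly answer the question pass.
        rule = "Answer 'yes' only if the document directly answers the question."
    else:
        # Lenient mode: tangentially related documents also pass.
        rule = "Answer 'yes' if the document is at least topically related to the question."
    return (
        "You are grading a retrieved document for relevance.\n"
        f"{rule}\n"
        "Reply with a single word: yes or no.\n\n"
        f"Question: {query}\n"
        f"Document: {document}\n"
    )
```

A strict grader rejects more documents and so triggers re-retrieval more often. A lenient one passes more context through to the generator.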

Another lever is the re-retrieval strategy. When the initial retrieval is deemed insufficient, how do you refine the query? Do you add keywords from the original query that were deemed important by the evaluator? Do you ask the evaluator to suggest a better query? Do you broaden the search scope?
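One simple refinement strategy is to focus the follow-up search on the query terms that the first batch of documents never mentioned. The sketch below is a lexical stand-in for what an evaluator might flag as important. The helper names and the tiny `STOPWORDS` set are assumptions made for illustration.

```python
import re
from typing import List

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "on", "in", "is", "are",
             "what", "our", "with", "to", "for", "and"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def uncovered_terms(query: str, docs: List[str]) -> List[str]:
    """Return content words from the query that appear in none of the documents."""
    doc_vocab = set()
    for doc in docs:
        doc_vocab.update(tokenize(doc))
    missing: List[str] = []
    for term in tokenize(query):
        if term not in STOPWORDS and term not in doc_vocab and term not in missing:
            missing.append(term)
    return missing


def refine_query(query: str, docs: List[str]) -> str:
    """Build a follow-up query from the terms the first retrieval missed."""
    missing = uncovered_terms(query, docs)
    # If every content word was covered, leave the query unchanged.
    return " ".join(missing) if missing else query
```

For example, if the first pass on "What is our policy on remote work?" only returns a travel policy document, the follow-up query becomes `remote work`.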

The CRAG system can also adapt the retrieval mechanism itself. Instead of just using vector similarity, you might incorporate keyword search, or use a hybrid approach, guided by the evaluator’s feedback.
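A hybrid approach can be sketched as a weighted blend of a vector score and a keyword score. This is a toy version under stated assumptions: documents are plain dicts with hypothetical `"text"` and `"vector"` keys, embeddings are precomputed, and `alpha` is an assumed tuning knob rather than a standard parameter.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query words that also appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def hybrid_rank(query: str, query_vec: Sequence[float],
                docs: List[Dict], alpha: float = 0.5) -> List[str]:
    """Rank document texts by alpha * vector score + (1 - alpha) * keyword score."""
    scored = []
    for doc in docs:
        score = (alpha * cosine(query_vec, doc["vector"])
                 + (1 - alpha) * keyword_score(query, doc["text"]))
        scored.append((score, doc["text"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]
```

One way to let evaluator feedback steer this is to lower `alpha` when the evaluator reports that the vector-only results were weak, so exact keyword matches carry more weight on the retry.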

Consider a scenario where a user asks, "What are the security implications of using third-party plugins with our internal CRM?" A basic RAG pipeline might retrieve documents about "CRM security" and "third-party plugins" separately. The CRAG evaluator, however, might notice that no single document addresses the intersection of the two topics. It could then ask the LLM to rewrite the query, for example as "security risks of external CRM add-ons," and send the refined query back to the retriever.

The CRAG module doesn’t just magically know what’s good. It often uses a small, fast LLM or a fine-tuned classification model. This model is given the query and the retrieved snippets, and it outputs a score or a category (e.g., "highly relevant," "partially relevant," "irrelevant," "contradictory"). This allows the system to dynamically adjust its behavior based on the quality of information it finds. For instance, if the evaluator gives a low score, the system might trigger a different retrieval strategy, like a broader search or a different embedding model, before committing to an answer.
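The evaluator's interface, a query and a snippet in and a score plus a category out, can be illustrated with a deliberately crude stand-in. The sketch below uses token overlap instead of an LLM or fine-tuned classifier, so it cannot detect the "contradictory" category. The `Grade` type, the function name, and both thresholds are assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Grade:
    score: float  # fraction of query tokens found in the snippet
    label: str    # one of the categories named above


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def grade_snippet(query: str, snippet: str,
                  high: float = 0.6, low: float = 0.2) -> Grade:
    """Toy lexical grader standing in for an LLM or classifier."""
    query_tokens = _tokens(query)
    if not query_tokens:
        return Grade(0.0, "irrelevant")
    score = len(query_tokens & _tokens(snippet)) / len(query_tokens)
    if score >= high:
        label = "highly relevant"
    elif score >= low:
        label = "partially relevant"
    else:
        label = "irrelevant"
    return Grade(score, label)
```

Swapping in a real model only requires keeping the same return shape, so the rest of the pipeline can branch on `label` or compare `score` against a threshold.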

The most counterintuitive part of CRAG is that sometimes the best action when retrieval is weak is not to retrieve more, but to use the LLM’s own reasoning to recognize that retrieval has failed. The evaluator, acting as a meta-reasoner, can explicitly state, "The retrieved documents do not contain sufficient information about X," which keeps the LLM from hallucinating an answer out of partial or irrelevant context. The system can then admit ignorance or request clarification instead of fabricating an answer.
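A minimal sketch of that abstain path, assuming the documents have already been scored: `generate` is a hypothetical callable wrapping your LLM, and `min_score` is an arbitrary cutoff.

```python
from typing import Callable, List, Tuple


def answer_or_abstain(
    query: str,
    scored_docs: List[Tuple[float, str]],          # (evaluator score, document text)
    generate: Callable[[str, List[str]], str],     # hypothetical LLM wrapper
    min_score: float = 0.5,                        # assumed cutoff
) -> str:
    """Generate only from sufficiently relevant documents; otherwise say so."""
    usable = [doc for score, doc in scored_docs if score >= min_score]
    if not usable:
        # No trustworthy context: admit it instead of calling the generator.
        return ("The retrieved documents do not contain sufficient "
                f"information about: {query}")
    return generate(query, usable)
```

The generator is never called on the abstain branch, so weak context cannot leak into the answer.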

The next frontier after implementing Corrective RAG is often managing the complexity of multiple retrieval and evaluation loops, which can impact latency.
