RAG Fusion is a technique that merges the ranked results of multiple searches into a single ranking, which is often more relevant than the results of any individual search method.

Let’s see RAG Fusion in action. Imagine you’re building a question-answering system. You’ve got a user query, say, "What are the best practices for cloud security compliance?" You might use several different search strategies to find documents that answer this:

  1. Keyword Search: A traditional search that looks for exact keyword matches.
  2. Semantic Search: A search that understands the meaning of the query and finds documents with similar semantic embeddings, even if they don’t share exact words.
  3. Hybrid Search: A combination of keyword and semantic search.

Each of these might return a list of documents, ranked by their perceived relevance. However, a document that ranks #1 in keyword search might be #5 in semantic search, and vice versa. RAG Fusion’s goal is to take these disparate rankings and merge them into a single ranking that is better than any one of them alone.

Here’s a simplified representation of results from three hypothetical search methods for our query:

Keyword Search Results:

  1. doc_A (Score: 0.95)
  2. doc_C (Score: 0.88)
  3. doc_B (Score: 0.72)
  4. doc_D (Score: 0.65)

Semantic Search Results:

  1. doc_B (Score: 0.98)
  2. doc_E (Score: 0.92)
  3. doc_A (Score: 0.85)
  4. doc_F (Score: 0.78)

Hybrid Search Results:

  1. doc_A (Score: 0.96)
  2. doc_B (Score: 0.91)
  3. doc_C (Score: 0.80)
  4. doc_G (Score: 0.75)

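The problem is: how do we combine these into one definitive "best" list? Averaging the scores is tempting, but each method scores on its own scale (a BM25 score and a cosine similarity aren’t directly comparable), a document missing from a list has no score to average, and averaging favors documents that are merely good in every list. What if a document is a perfect match for one method but only okay for the others? We want to capture that excellence.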
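To make the scale problem concrete, here’s a minimal Python sketch of naive score averaging over the three hypothetical result lists. It assumes a document missing from a list counts as 0, and then it pretends (hypothetically) that the keyword engine reports BM25-style scores ten times larger than shown. Nothing about the documents changes, yet the averaged ranking does.

```python
# Scores from the three hypothetical searches above, as {doc_id: score}.
keyword = {"doc_A": 0.95, "doc_C": 0.88, "doc_B": 0.72, "doc_D": 0.65}
semantic = {"doc_B": 0.98, "doc_E": 0.92, "doc_A": 0.85, "doc_F": 0.78}
hybrid = {"doc_A": 0.96, "doc_B": 0.91, "doc_C": 0.80, "doc_G": 0.75}


def average_scores(*score_maps):
    """Naive fusion: average raw scores, counting a missing document as 0."""
    docs = set().union(*score_maps)
    n = len(score_maps)
    return {d: sum(m.get(d, 0.0) for m in score_maps) / n for d in docs}


def ranking(scores):
    """Doc IDs sorted by descending score (ties broken by doc ID)."""
    return sorted(scores, key=lambda d: (-scores[d], d))


baseline = ranking(average_scores(keyword, semantic, hybrid))

# Hypothetical: the keyword engine reports scores ten times larger.
keyword_x10 = {d: s * 10 for d, s in keyword.items()}
rescaled = ranking(average_scores(keyword_x10, semantic, hybrid))

print(baseline[:3])  # ['doc_A', 'doc_B', 'doc_C']
print(rescaled[:3])  # ['doc_A', 'doc_C', 'doc_B']
```

Only the keyword engine’s units changed, but doc_C now outranks doc_B. Any fusion based on raw scores would need per-method normalization to avoid this.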

This is where Reciprocal Rank Fusion (RRF) comes in. RRF combines ranked lists without requiring them to share a scoring scale: it ignores the raw scores and uses only each item’s rank within its own list. The core idea is that an item’s position in any ranked list is informative, so RRF gives each item a score derived from its ranks across all the lists.

The RRF formula for an item x is:

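RRF_score(x) = sum(1 / (k + rank(x, L))) for all lists L that contain x, where rank(x, L) is x’s 1-based position in L and k is a constant (often set to 60). A list that doesn’t contain x contributes nothing to the sum.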
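The formula translates almost directly into code. This is a minimal sketch, assuming each result list is a Python list of document IDs ordered best first, so a document’s rank is its index plus one; `rrf_score` is an illustrative helper name, not a library function. A list that doesn’t contain the document simply adds nothing, which matches how the N/A cases are treated in the calculations below.

```python
def rrf_score(doc_id, ranked_lists, k=60):
    """RRF score of one document: sum of 1 / (k + rank) over the lists.

    Ranks are 1-based positions in each list; a list that does not
    contain the document contributes nothing (the "N/A" case).
    """
    score = 0.0
    for ranked in ranked_lists:
        if doc_id in ranked:
            rank = ranked.index(doc_id) + 1  # list index -> 1-based rank
            score += 1.0 / (k + rank)
    return score


# The three hypothetical result lists above, best match first.
keyword = ["doc_A", "doc_C", "doc_B", "doc_D"]
semantic = ["doc_B", "doc_E", "doc_A", "doc_F"]
hybrid = ["doc_A", "doc_B", "doc_C", "doc_G"]
lists = [keyword, semantic, hybrid]

print(round(rrf_score("doc_A", lists), 4))  # 0.0487
print(round(rrf_score("doc_C", lists), 4))  # 0.032
```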

Let’s calculate the RRF scores for our example documents. We’ll use k=60.

  • doc_A:

    • Keyword Rank: 1
    • Semantic Rank: 3
    • Hybrid Rank: 1
    • RRF Score = 1/(60+1) + 1/(60+3) + 1/(60+1) = 1/61 + 1/63 + 1/61 ≈ 0.0164 + 0.0159 + 0.0164 ≈ 0.0487
  • doc_B:

    • Keyword Rank: 3
    • Semantic Rank: 1
    • Hybrid Rank: 2
    • RRF Score = 1/(60+3) + 1/(60+1) + 1/(60+2) = 1/63 + 1/61 + 1/62 ≈ 0.0159 + 0.0164 + 0.0161 ≈ 0.0484
  • doc_C:

    • Keyword Rank: 2
    • Semantic Rank: N/A (not in the semantic results, so it contributes 0)
    • Hybrid Rank: 3
    • RRF Score = 1/(60+2) + 0 + 1/(60+3) = 1/62 + 1/63 ≈ 0.0161 + 0.0159 ≈ 0.0320
  • doc_D:

    • Keyword Rank: 4
    • Semantic Rank: N/A
    • Hybrid Rank: N/A
    • RRF Score = 1/(60+4) = 1/64 ≈ 0.0156
  • doc_E:

    • Keyword Rank: N/A
    • Semantic Rank: 2
    • Hybrid Rank: N/A
    • RRF Score = 1/(60+2) = 1/62 ≈ 0.0161
  • doc_F:

    • Keyword Rank: N/A
    • Semantic Rank: 4
    • Hybrid Rank: N/A
    • RRF Score = 1/(60+4) = 1/64 ≈ 0.0156
  • doc_G:

    • Keyword Rank: N/A
    • Semantic Rank: N/A
    • Hybrid Rank: 4
    • RRF Score = 1/(60+4) = 1/64 ≈ 0.0156

Now we sort the documents by RRF score (doc_D, doc_F, and doc_G tie at 1/64, so their relative order is arbitrary):

RAG Fusion (RRF) Merged Results:

  1. doc_A (RRF Score: 0.0487)
  2. doc_B (RRF Score: 0.0484)
  3. doc_C (RRF Score: 0.0320)
  4. doc_E (RRF Score: 0.0161)
  5. doc_D (RRF Score: 0.0156)
  6. doc_F (RRF Score: 0.0156)
  7. doc_G (RRF Score: 0.0156)
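Putting it together, here’s a sketch of the full fusion step in Python under the same assumptions (lists of document IDs, best first, k=60). The function name `reciprocal_rank_fusion` is illustrative. RRF itself doesn’t say how to order tied scores, so this sketch breaks ties by document ID to keep the output deterministic; that choice is an assumption, not part of the method.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc IDs with RRF.

    Returns (doc_id, score) pairs sorted by descending score; equal
    scores are ordered by doc ID (an assumed, deterministic tie-break).
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# The three hypothetical result lists from the example, best match first.
keyword = ["doc_A", "doc_C", "doc_B", "doc_D"]
semantic = ["doc_B", "doc_E", "doc_A", "doc_F"]
hybrid = ["doc_A", "doc_B", "doc_C", "doc_G"]

fused = reciprocal_rank_fusion([keyword, semantic, hybrid])
for position, (doc_id, score) in enumerate(fused, start=1):
    print(f"{position}. {doc_id} (RRF Score: {score:.4f})")
```

Running it reproduces the merged ranking above, including the three-way tie at 1/64.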

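Notice how close doc_A and doc_B are. Both appear in all three lists and each has one #3 finish, but doc_A ranked #1 twice while doc_B ranked #1 once and #2 once, which gives doc_A a slight edge. doc_C was #2 in one list and #3 in another but missing from the semantic results, giving it a solid mid-tier score. Documents that appeared in only one list score lower still: even doc_E’s #2 in semantic search (1/62 ≈ 0.0161) is worth only about half of doc_C’s two appearances.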
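You can check exactly how narrow that margin is with Python’s `fractions` module, which avoids rounding. Both documents share a #1 and a #3, so the gap comes down to doc_A’s second #1 (1/61) versus doc_B’s #2 (1/62).

```python
from fractions import Fraction

k = 60
# (keyword, semantic, hybrid) ranks from the worked example.
ranks = {"doc_A": (1, 3, 1), "doc_B": (3, 1, 2)}

# Exact RRF scores as fractions, one term per list.
exact = {doc: sum(Fraction(1, k + r) for r in rs) for doc, rs in ranks.items()}
gap = exact["doc_A"] - exact["doc_B"]

print(gap)                  # 1/3782
print(f"{float(gap):.6f}")  # 0.000264
```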

The constant k is crucial. It controls how quickly an item’s contribution falls off with rank. With a small k, the score drops steeply: at k=0, a #1 finish is worth 1 and a #2 finish only 1/2, so the top positions dominate the fused ranking. With a larger k, the curve flattens: at k=60, rank 1 (1/61) and rank 2 (1/62) are worth almost the same, so items further down the lists, and agreement across lists, count for relatively more. The value k=60 is a common heuristic that keeps a single #1 finish from overwhelming consistently good placement across several lists.
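To see how k shapes the scores, this small sketch compares what a #1 finish is worth relative to a #10 finish for a few values of k. The helper `rrf_term` is illustrative, and the values other than 60 are arbitrary picks for comparison.

```python
def rrf_term(rank, k):
    """Contribution of a single list position to an RRF score."""
    return 1.0 / (k + rank)


for k in (0, 10, 60):
    top, tenth = rrf_term(1, k), rrf_term(10, k)
    # How many times more a #1 finish is worth than a #10 finish.
    print(f"k={k:>2}: rank 1 / rank 10 = {top / tenth:.2f}")

# k= 0: rank 1 / rank 10 = 10.00
# k=10: rank 1 / rank 10 = 1.82
# k=60: rank 1 / rank 10 = 1.15
```

The ratio is (k + 10) / (k + 1), so it shrinks toward 1 as k grows: at k=60, a top result is worth only about 15% more than a tenth-place one.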

The real power of RRF is its ability to combine diverse retrieval systems, like a vector database, a BM25 index, and even a knowledge graph traversal, into a cohesive, superior ranked output without needing to normalize their disparate scoring scales.
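Because RRF uses only each retriever’s ordering, multiplying one retriever’s scores by a positive constant (or applying any other order-preserving transform) leaves the fused result unchanged. The sketch below illustrates this with hypothetical score maps: unbounded BM25-style scores for a keyword index and cosine similarities for a vector search. The `rrf_from_scores` helper is illustrative and breaks ties by document ID.

```python
def rrf_from_scores(score_maps, k=60):
    """Fuse {doc_id: score} maps with RRF using only each map's ordering."""
    fused = {}
    for scores in score_maps:
        ranked = sorted(scores, key=lambda d: (-scores[d], d))  # best first
        for rank, doc_id in enumerate(ranked, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical scores on very different scales.
bm25 = {"doc_A": 12.4, "doc_B": 9.1, "doc_C": 4.7}      # unbounded, BM25-style
cosine = {"doc_B": 0.98, "doc_E": 0.92, "doc_A": 0.85}  # cosine similarity

before = rrf_from_scores([bm25, cosine])
after = rrf_from_scores([{d: s * 1000 for d, s in bm25.items()}, cosine])

print(before)           # ['doc_B', 'doc_A', 'doc_E', 'doc_C']
print(before == after)  # True: rescaling BM25 scores changes nothing
```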

The next step after RAG Fusion is typically to pass the top-ranked merged documents to the language model as context for generating a response.
