RAG Hybrid Search: Combine BM25 and Semantic Retrieval (2026)

Hybrid search is how you get the best of both worlds: keyword matching and meaning matching.

Let’s see it in action. Imagine we have a small dataset of documents about various fruits.

[
  {"id": 1, "text": "Apples are crisp and come in red, green, and yellow varieties. They are a good source of fiber."},
  {"id": 2, "text": "Bananas are long, yellow fruits with a soft texture. They are known for their potassium content."},
  {"id": 3, "text": "Oranges are citrus fruits, round and orange in color. They are rich in Vitamin C."},
  {"id": 4, "text": "Pineapple is a tropical fruit, sweet and tart. It has a tough, spiky exterior and yellow flesh."},
  {"id": 5, "text": "Grapes grow in clusters and can be green, red, or purple. They are often used to make wine."}
]

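If we search for "red fruit", a pure keyword search (like BM25) would rank document 1 ("Apples are crisp and come in red…") and document 5 ("Grapes grow in clusters and can be green, red, or purple.") highly, because both contain the exact term "red". Depending on tokenization and stemming, it would also match the documents that mention "fruit" or "fruits" (2, 3, and 4) on the second term. A pure semantic search would recognize that every document is broadly about fruit, but it might not single out the ones that mention "red" if the embeddings don’t capture that detail.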
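To make the keyword side concrete, here is a minimal BM25 sketch in plain Python over the five documents above. It is an illustration under stated assumptions, not how any particular search engine implements BM25: the tokenizer, the crude plural stripping that stands in for stemming, and the function names are all made up for this example, and the IDF uses one common smoothed variant.

```python
import math
import re
from collections import Counter

# The five fruit documents from the example above.
DOCS = {
    1: "Apples are crisp and come in red, green, and yellow varieties. They are a good source of fiber.",
    2: "Bananas are long, yellow fruits with a soft texture. They are known for their potassium content.",
    3: "Oranges are citrus fruits, round and orange in color. They are rich in Vitamin C.",
    4: "Pineapple is a tropical fruit, sweet and tart. It has a tough, spiky exterior and yellow flesh.",
    5: "Grapes grow in clusters and can be green, red, or purple. They are often used to make wine.",
}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, strip a trailing plural 's'.

    The plural stripping is a deliberately crude stand-in for real stemming.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]


def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Return (doc_id, score) pairs sorted by descending BM25 score."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))

    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # b controls how strongly long documents are penalized.
            length_norm = 1 - b + b * len(toks) / avg_len
            # k1 controls how quickly repeated terms stop adding score.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = bm25_rank("red fruit", DOCS)
for doc_id, score in ranked:
    print(doc_id, round(score, 3))
```

With the plural stripping in place, "fruit" matches three documents and becomes a common, low-weight term, so the two documents containing "red" come out on top. Without it, "fruit" would occur only in document 4 and that document would outrank the "red" matches in this sketch, which shows how much tokenization choices shape keyword results.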

Now, let’s combine them. Tools such as Pinecone and Weaviate support hybrid search, but for this example we’ll walk through conceptually what happens rather than use a specific product.

How it Works Internally

Hybrid search typically involves two distinct retrieval steps:

Keyword (BM25) Retrieval: This uses a traditional inverted index to find documents that contain the exact query terms or their close variations (stemming, etc.). It’s fast and excellent for matching specific keywords, brand names, or technical jargon.
Semantic (Vector) Retrieval: This uses dense vector embeddings to find documents whose meaning is similar to the query. The query is also embedded, and then a similarity search (like cosine similarity) is performed against the document embeddings. This is great for understanding synonyms, paraphrases, and conceptual relationships.
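The vector step can be sketched in a few lines of plain Python. The "embeddings" below are hypothetical three-dimensional vectors invented purely for illustration; a real system would get high-dimensional vectors from an embedding model and would usually use an approximate nearest-neighbor index instead of this brute-force loop. The function names are also assumptions for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_rank(query_vec, doc_vecs, top_k=3):
    """Brute-force search: score every document, keep the top_k."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors, NOT produced by any real embedding model.
TOY_DOC_VECS = {
    1: [0.9, 0.1, 0.3],
    2: [0.1, 0.9, 0.2],
    3: [0.2, 0.3, 0.9],
}
TOY_QUERY_VEC = [0.8, 0.2, 0.4]

semantic_hits = vector_rank(TOY_QUERY_VEC, TOY_DOC_VECS)
print([doc_id for doc_id, _ in semantic_hits])
```

The output of this step is a ranked list of document ids, the same shape as the keyword retriever's output, which is what makes the two easy to fuse afterwards.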

Once both retrieval steps are done, the results are merged and re-ranked. A common fusion method for this is Reciprocal Rank Fusion (RRF). RRF takes the ranked lists from each retriever and assigns each document a score based on the reciprocal of its rank position, adjusted by a constant. The formula is roughly:

RRF_score = sum( (1 / (rank_i + k)) for each retriever i )

where rank_i is the 1-based rank of the document in retriever i, and k is a smoothing constant (often set to 60) that dampens the advantage of the very top positions. A document that a retriever does not return contributes nothing for that retriever. Documents that appear high in both lists get the highest fused scores.
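The formula translates almost directly into code. The following is a minimal sketch, assuming each retriever hands back a list of document ids ordered best-first; the function name and the two example rankings are hypothetical and not taken from any specific library.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first lists of doc ids into one scored ranking.

    Ranks are 1-based. A document missing from a list simply gets no
    contribution from that retriever.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank + k)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical retriever outputs for illustration.
bm25_ranking = [1, 5, 4]
vector_ranking = [4, 1, 2]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Documents 1 and 4 appear in both lists and end up well ahead of documents 5 and 2, which each appear in only one. Because RRF uses only rank positions, it needs no score normalization across retrievers.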

The Problem it Solves

Pure semantic search can miss exact keyword matches that are crucial for certain queries. For instance, a search for a specific product model number or a legal term might fail if the embeddings don’t represent that exact string well. Conversely, pure keyword search can be brittle; it won’t match "a fruit that is yellow and long" to a document that says only "banana" and never uses the words "yellow" or "long". Hybrid search addresses this by ensuring that both precise term matching and conceptual understanding contribute to the final results.

Levers You Control

Weighting/Fusion Algorithm: How much influence does each retriever have? RRF is common, but other fusion methods exist. Some systems allow you to directly assign weights to BM25 and vector scores before fusion.
BM25 Parameters: For the keyword component, you can tune parameters like k1 and b (standard BM25 parameters that control term frequency saturation and document length normalization).
Embedding Model: The quality of the semantic component depends heavily on the embedding model you choose. A model trained or fine-tuned on your specific domain will often perform better than a general-purpose one.
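Hybrid Thresholds: You might set a minimum score for each retriever before its results are considered for fusion. Because BM25 scores and vector similarities are on different scales, each retriever needs its own threshold.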
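Two of these levers, direct weighting and per-retriever thresholds, can be sketched together. This is one possible approach under stated assumptions, not the method of any particular product: scores are min-max normalized per retriever and then blended with a hypothetical alpha parameter, and all names and example scores here are invented for illustration.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} dict to [0, 1]; constant inputs map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_fusion(bm25_scores, vector_scores, alpha=0.5,
                    min_bm25=0.0, min_vector=0.0):
    """Blend two score dicts: alpha=1.0 is vector-only, alpha=0.0 is BM25-only.

    Thresholds are applied to the raw scores before normalization.
    """
    bm25_kept = {d: s for d, s in bm25_scores.items() if s >= min_bm25}
    vector_kept = {d: s for d, s in vector_scores.items() if s >= min_vector}
    bm25_norm = min_max_normalize(bm25_kept)
    vector_norm = min_max_normalize(vector_kept)

    fused = {}
    for doc_id in set(bm25_norm) | set(vector_norm):
        fused[doc_id] = (alpha * vector_norm.get(doc_id, 0.0)
                         + (1 - alpha) * bm25_norm.get(doc_id, 0.0))
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores from each retriever.
bm25_scores = {1: 0.85, 5: 0.80, 3: 0.56}
vector_scores = {4: 0.91, 1: 0.74, 2: 0.40}

keyword_heavy = weighted_fusion(bm25_scores, vector_scores, alpha=0.2)
semantic_heavy = weighted_fusion(bm25_scores, vector_scores, alpha=0.8)
filtered = weighted_fusion(bm25_scores, vector_scores, min_vector=0.5)
print(keyword_heavy[0][0], semantic_heavy[0][0])
```

Moving alpha from 0.2 to 0.8 changes which document comes first, and the min_vector threshold removes the weakest vector hit before fusion. Unlike RRF, this style of fusion is sensitive to the score distributions, which is why the normalization step matters.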

The Real Magic: Semantic Relevance Beats Keyword Density

When you combine BM25 and vector search, the gain is not simply more results; it is how those results are re-ranked. Plain RRF treats the two retrievers symmetrically: a document that ranks #1 for BM25 but #50 for vector similarity is pulled down by its weak vector rank, and a document that ranks #50 for BM25 but #2 for vector similarity is pulled up by its strong one, so with k = 60 both end up with similar fused scores. A document that closely matches the semantic intent of your query therefore has a good chance of being surfaced even if it doesn’t contain the exact keywords you typed, and that chance grows if the fusion step weights the vector retriever more heavily. This is particularly useful for long, complex queries or when users write in natural language.
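The two rank pairs mentioned above can be checked with a little arithmetic. The sketch below scores a single document from its rank in each retriever; the optional weights argument is a hypothetical extension for illustration and is not part of the plain RRF formula given earlier.

```python
def rrf_score(ranks, weights=None, k=60):
    """Score one document from its 1-based rank in each retriever.

    With weights=None every retriever counts equally (plain RRF).
    """
    if weights is None:
        weights = [1.0] * len(ranks)
    return sum(w / (rank + k) for rank, w in zip(ranks, weights))


# Ranks are (BM25 rank, vector rank).
keyword_favorite = (1, 50)
semantic_favorite = (50, 2)

plain_keyword = rrf_score(keyword_favorite)
plain_semantic = rrf_score(semantic_favorite)

# Hypothetical setting that counts the vector retriever twice as much.
boosted_keyword = rrf_score(keyword_favorite, weights=[1.0, 2.0])
boosted_semantic = rrf_score(semantic_favorite, weights=[1.0, 2.0])

print(round(plain_keyword, 5), round(plain_semantic, 5))
print(round(boosted_keyword, 5), round(boosted_semantic, 5))
```

Under plain RRF the two documents land very close together, with the keyword favorite slightly ahead because rank #1 beats rank #2. Once the vector retriever is weighted more heavily, the semantic favorite moves clearly in front, so how strongly semantic relevance wins depends on how the fusion is configured.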

The next step is often exploring how to tune the fusion parameters to optimize for different types of queries.