Pinecone’s sparse vectors aren’t just a fancy way to store BM25 scores; they’re a fundamental building block for achieving true hybrid search by allowing the vector database to directly participate in the ranking process.

Let’s see this in action. Imagine we have a collection of documents and we want to search for "machine learning algorithms."

First, we’ll index a document. In Pinecone, you’d send a request like this:

{
  "id": "doc1",
  "values": [0.1, 0.2, ..., 0.9], // Dense vector embedding
  "sparse_values": {
    "indices": [101, 256, 500],
    "values": [2.5, 1.8, 3.2] // BM25-like scores for terms
  }
}

Here, values is your traditional dense vector embedding (e.g., from a transformer model). The magic is in sparse_values: indices are integer IDs that stand in for the document’s terms (for example, vocabulary or hashed token IDs), and values are the weights of those terms, typically derived from TF-IDF or BM25. The two arrays are parallel, so indices[i] pairs with values[i].
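To make the sparse side concrete, here is a minimal sketch of how a text could be turned into indices and values. Everything in it is an assumption for illustration: toy_sparse_vector, the hashing scheme, vocab_size, and k1 are hypothetical, and a real BM25 encoder would also apply inverse document frequency and length normalization learned from a corpus.

```python
import re
import zlib
from collections import Counter

def toy_sparse_vector(text, vocab_size=1000, k1=1.2):
    """Hypothetical encoder: hashed token IDs with a BM25-style saturated term frequency."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    weights = {}
    for token, tf in Counter(tokens).items():
        # A stable hash maps each token to an integer index (assumption, not Pinecone's scheme).
        idx = zlib.crc32(token.encode("utf-8")) % vocab_size
        # Saturating term frequency: repeats count for more, with diminishing returns.
        weight = tf * (k1 + 1) / (tf + k1)
        # Tokens that collide on the same index have their weights summed.
        weights[idx] = weights.get(idx, 0.0) + weight
    indices = sorted(weights)
    return {"indices": indices, "values": [weights[i] for i in indices]}

sparse = toy_sparse_vector("machine learning algorithms for machine translation")
print(sparse)
```

The resulting dictionary has the same shape as the sparse_values field in the request above, with one weight per distinct index.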

When you query, you send a similar structure, with the dense part under vector and the sparse part under sparse_vector:

{
  "sparse_vector": {
    "indices": [101, 400, 500], // Query terms' indices
    "values": [3.1, 2.0, 2.8]   // Query terms' weights
  },
  "vector": [0.15, 0.25, ..., 0.85], // Dense query vector
  "top_k": 10,
  "hybrid_search_k": 100 // Illustrative candidate-pool size; not a documented Pinecone query parameter
}

Conceptually, two kinds of matching then feed into the result:

  1. Dense Search: It finds the top_k most similar dense vectors to your query dense vector.
  2. Sparse Search: It finds candidates based on the overlap between your query’s sparse terms and the indexed sparse terms, weighted by their scores. The hybrid_search_k value stands for how many of these sparse candidates are considered.
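The "overlap and scores" in the sparse step can be worked through with the numbers from the two requests above. This is a sketch that assumes the sparse match is scored as a dot product over shared indices; sparse_dot is a hypothetical helper, not a Pinecone function.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors given as {"indices": [...], "values": [...]}."""
    b_weights = dict(zip(b["indices"], b["values"]))
    return sum(
        v * b_weights[i]
        for i, v in zip(a["indices"], a["values"])
        if i in b_weights  # indices present in only one vector contribute nothing
    )

doc1 = {"indices": [101, 256, 500], "values": [2.5, 1.8, 3.2]}
query = {"indices": [101, 400, 500], "values": [3.1, 2.0, 2.8]}

# Only indices 101 and 500 overlap: 3.1 * 2.5 + 2.8 * 3.2
score = sparse_dot(query, doc1)
print(round(score, 2))  # 16.71
```

Index 256 (document only) and index 400 (query only) drop out, which is why a query term the document never contains cannot raise its sparse score.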

The crucial part is how the two signals are combined. Pinecone doesn’t just give you results from one or the other: each result is ranked by a score that blends its dense and sparse matches. With a simple linear weighting, the fusion looks like:

final_score = alpha * dense_score + (1 - alpha) * sparse_score

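The alpha is a tunable parameter between 0 and 1 that balances the contribution of dense and sparse retrieval: alpha = 1 is purely dense, alpha = 0 is purely sparse. In Pinecone’s sparse-dense indexes, which use the dotproduct metric, this weighting is typically applied on the client by scaling the query’s dense values by alpha and its sparse values by (1 - alpha) before the query is sent.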
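Here is a small sketch of that weighting, assuming dot-product scoring and that alpha is applied by scaling the query on the client. The helper names (scale_for_alpha, fused_score, dense_dot, sparse_dot) and the tiny two-dimensional vectors are made up for illustration.

```python
def dense_dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sparse_dot(a, b):
    b_weights = dict(zip(b["indices"], b["values"]))
    return sum(v * b_weights.get(i, 0.0) for i, v in zip(a["indices"], a["values"]))

def scale_for_alpha(dense, sparse, alpha):
    """Scale a query so that a plain dot product yields the alpha-weighted score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1.0 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse

def fused_score(dense_score, sparse_score, alpha):
    return alpha * dense_score + (1.0 - alpha) * sparse_score

# Hypothetical toy data, chosen so the arithmetic is easy to follow.
doc_dense, doc_sparse = [0.5, 0.25], {"indices": [1, 3], "values": [1.0, 5.0]}
q_dense, q_sparse = [1.0, 2.0], {"indices": [1, 2], "values": [2.0, 4.0]}
alpha = 0.75

expected = fused_score(dense_dot(q_dense, doc_dense), sparse_dot(q_sparse, doc_sparse), alpha)
sd, ss = scale_for_alpha(q_dense, q_sparse, alpha)
actual = dense_dot(sd, doc_dense) + sparse_dot(ss, doc_sparse)
print(expected, actual)
```

Both routes give the same number, which is why scaling the query up front is enough when the index simply sums the dense and sparse dot products.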

The problem this solves is the inherent limitation of each search type. Dense search excels at capturing semantic similarity and understanding nuanced meaning but can miss exact keyword matches. Sparse search (like BM25) is brilliant at keyword relevance and exact term matching but struggles with synonyms or conceptual understanding. Hybrid search, by integrating both, aims for the best of both worlds: precise keyword recall and broad semantic understanding.

Internally, Pinecone uses an inverted index for sparse vectors, similar to traditional IR systems. When you query, it quickly retrieves documents containing the query terms and scores them from the stored term weights, that is, the TF-IDF or BM25-style values you supplied at index time. This is done in parallel with the Approximate Nearest Neighbor (ANN) search on dense vectors. The results are then merged and re-ranked.
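The inverted-index lookup can be sketched in a few lines. This is a toy model under the assumption that scoring is a dot product over stored term weights; ToyInvertedIndex and the doc2 and doc3 records are hypothetical, and it is not meant to reflect Pinecone’s actual implementation.

```python
from collections import defaultdict

class ToyInvertedIndex:
    def __init__(self):
        # term index -> list of (doc_id, weight) postings
        self.postings = defaultdict(list)

    def add(self, doc_id, sparse):
        for idx, weight in zip(sparse["indices"], sparse["values"]):
            self.postings[idx].append((doc_id, weight))

    def search(self, sparse_query, top_k=10):
        scores = defaultdict(float)
        # Only postings for the query's own terms are visited.
        for idx, q_weight in zip(sparse_query["indices"], sparse_query["values"]):
            for doc_id, d_weight in self.postings.get(idx, []):
                scores[doc_id] += q_weight * d_weight
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

index = ToyInvertedIndex()
index.add("doc1", {"indices": [101, 256, 500], "values": [2.5, 1.8, 3.2]})
index.add("doc2", {"indices": [400, 777], "values": [1.0, 2.0]})  # hypothetical
index.add("doc3", {"indices": [900], "values": [4.0]})            # hypothetical

results = index.search({"indices": [101, 400, 500], "values": [3.1, 2.0, 2.8]})
print(results)
```

doc3 shares no term with the query, so it is never touched during the search. That is the property that keeps sparse retrieval fast on large collections.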

One subtlety is how the size of the sparse candidate pool (hybrid_search_k in the example above, an illustrative name rather than a documented Pinecone query parameter) influences the quality of the sparse retrieval phase. If it is too low, you might miss relevant documents that have a good sparse match but fall outside the top N sparse results. If it is too high, you increase the computational cost of the re-ranking step without necessarily improving recall, as the dense search also contributes. Experimenting with this value, alongside the alpha parameter in the fusion, is key to optimizing hybrid search performance for your specific dataset and query patterns.
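The effect of the candidate pool is easy to see with made-up numbers. In this sketch, pool_size plays the role the text gives to hybrid_search_k; hybrid_top_k and all scores are hypothetical, and the dense and sparse scores are assumed to be on comparable scales already.

```python
def hybrid_top_k(dense_scores, sparse_scores, alpha, pool_size, top_k):
    """Take the top pool_size docs from each list, fuse their scores, return the best top_k."""
    def top(scores):
        return sorted(scores, key=scores.get, reverse=True)[:pool_size]

    candidates = set(top(dense_scores)) | set(top(sparse_scores))
    fused = {
        doc: alpha * dense_scores.get(doc, 0.0) + (1 - alpha) * sparse_scores.get(doc, 0.0)
        for doc in candidates
    }
    return sorted(fused, key=fused.get, reverse=True)[:top_k]

# "c" is second-best in both lists but best overall once the scores are fused.
dense_scores = {"a": 1.0, "c": 0.75, "b": 0.0}
sparse_scores = {"b": 0.5, "c": 0.375, "a": 0.0}

small_pool = hybrid_top_k(dense_scores, sparse_scores, alpha=0.5, pool_size=1, top_k=1)
large_pool = hybrid_top_k(dense_scores, sparse_scores, alpha=0.5, pool_size=2, top_k=1)
print(small_pool, large_pool)
```

With a pool of one per list, "c" never becomes a candidate and the fused ranking cannot recover it. Widening the pool to two brings it in at the cost of scoring one more document.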

The next step after mastering hybrid search is often exploring different fusion strategies beyond simple linear interpolation, such as Reciprocal Rank Fusion (RRF).
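As a taste of what that looks like, here is a minimal RRF sketch. It works on ranks only, so the dense and sparse scores never need to share a scale. The function name and the example rankings are illustrative; k = 60 is the constant commonly used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs: each doc scores sum(1 / (k + rank))."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_ranking = ["a", "c", "b"]   # hypothetical dense results, best first
sparse_ranking = ["c", "b", "d"]  # hypothetical sparse results, best first

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Documents that appear near the top of both lists ("c", then "b") outrank a document that tops only one of them ("a").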
