Pinecone’s hybrid search isn’t just about mixing vector types; it’s a way to tell the system that not all matches are created equal, giving you fine-grained control over relevance.

Let’s see it in action. Imagine you have a dataset of product descriptions and you want to find items that are both semantically similar to a query ("red running shoes") and contain the exact keywords "nike" and "air max".

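First, you’d index your data. Each product would have a sparse vector representing keyword matches, stored as parallel lists of integer indices and float values (e.g., {"indices": [123, 456], "values": [1.0, 0.7]}), and a dense vector capturing semantic meaning (e.g., [0.1, 0.5, -0.2, ...]). Storing both on the same record requires an index that uses the dotproduct metric.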
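
As a toy illustration of where sparse keyword vectors can come from, the sketch below hashes each lowercase token to a 32-bit index with zlib.crc32 and uses the token's count as its value, returning the indices/values layout Pinecone uses for sparse values. The function name keyword_sparse_vector is hypothetical, and this is a stand-in for a real encoder such as BM25 or a learned sparse model, not something Pinecone does for you.

```python
import re
import zlib
from collections import Counter
from typing import Dict, List


def keyword_sparse_vector(text: str) -> Dict[str, List]:
    """Toy keyword encoder (hypothetical): hash each lowercase token to a
    32-bit index and use its term frequency as the value."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # zlib.crc32 is deterministic across runs (unlike Python's str hash)
    # and returns an unsigned 32-bit integer.
    counts = Counter(zlib.crc32(tok.encode("utf-8")) for tok in tokens)
    indices = sorted(counts)
    return {"indices": indices, "values": [float(counts[i]) for i in indices]}


print(keyword_sparse_vector("Nike Air Max, nike"))
```

Hashing avoids maintaining a vocabulary, at the cost of occasional collisions where two different tokens share an index.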

```python
import os

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ.get("PINECONE_API_KEY"))

index_name = "hybrid-search-example"

# Create the index if it doesn't exist.
# Sparse-dense (hybrid) records require the dotproduct metric.
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=8,  # Example dimension for the dense vectors
        metric="dotproduct",
        spec=ServerlessSpec(cloud="aws", region="us-west-2"),
    )

index = pc.Index(index_name)

# Example data: each record has a dense vector ("values") and a sparse vector
# given as parallel lists of integer indices and float values.
data = [
    {
        "id": "product1",
        "values": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
        "sparse_values": {"indices": [101, 202], "values": [1.0, 0.8]},
    },
    {
        "id": "product2",
        "values": [0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1],
        "sparse_values": {"indices": [101, 303], "values": [0.9, 0.5]},
    },
    {
        "id": "product3",
        "values": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
        "sparse_values": {"indices": [202, 404], "values": [1.0, 0.9]},
    },
    {
        "id": "product4",
        "values": [0.5, 0.4, 0.3, 0.2, 0.1, 0.8, 0.7, 0.6],
        "sparse_values": {"indices": [101, 202, 505], "values": [0.7, 0.6, 1.0]},
    },
]

# Upsert data
index.upsert(vectors=data)

print("Index created and data upserted.")
```

Now, when you query, you provide both types of vectors in a single call: the dense vector goes in the vector argument and the sparse vector in the sparse_vector argument.

```python
# Example query, using the same indices/values format as the records
query_sparse_values = {"indices": [101, 505], "values": [1.0, 0.9]}  # Stands in for keywords like "nike", "air max"
query_dense_values = [0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85]  # Stands in for the meaning of "red running shoes"

# Perform hybrid search. There is no separate weighting parameter: the index
# scores each record as dense dot product + sparse dot product, so passing
# both vectors unscaled weights them equally.
response = index.query(
    vector=query_dense_values,
    sparse_vector=query_sparse_values,
    top_k=3,
    include_values=True,
    include_metadata=True,
)

print("\nHybrid Search Results:")
for match in response.matches:
    print(f"ID: {match.id}, Score: {match.score}")
```

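How the two components are weighted is where the tuning happens, and it isn’t a query parameter you pass to Pinecone. For a sparse-dense record, the index scores a query as the dense dot product plus the sparse dot product. Because a dot product is linear, scaling a query vector scales its contribution by the same factor, so you apply the weighting yourself before querying: multiply the dense values by a weight alpha between 0 and 1, and the sparse values by 1 - alpha. The final score for each document is then alpha times its dense score plus (1 - alpha) times its sparse score. An alpha of 1 is pure semantic search, an alpha of 0 is pure keyword search, and 0.5 is an even 50/50 blend.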
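
The weighting can be sketched as a small helper that scales both parts of the query before they go to index.query. This is a minimal sketch: hybrid_convex_scale is a hypothetical name, alpha is the dense weight (the sparse side gets 1 - alpha), and the sparse vector is assumed to be in the indices/values format used for the records.

```python
from typing import Dict, List, Tuple

# Sparse vectors in indices/values layout: {"indices": [...], "values": [...]}
SparseVector = Dict[str, list]


def hybrid_convex_scale(
    dense: List[float], sparse: SparseVector, alpha: float
) -> Tuple[List[float], SparseVector]:
    """Scale a hybrid query (hypothetical helper) so that dot-product scores
    become alpha * dense_score + (1 - alpha) * sparse_score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1.0 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


# Example: lean toward semantic similarity (75% dense, 25% sparse).
dense_q, sparse_q = hybrid_convex_scale(
    [0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85],
    {"indices": [101, 505], "values": [1.0, 0.9]},
    alpha=0.75,
)
print(dense_q)
print(sparse_q)
```

Pass the returned dense list as vector and the returned sparse dict as sparse_vector. Scaling both parts by the same factor doesn’t change the ranking, so only the ratio of the two weights matters; tying them together as alpha and 1 - alpha leaves a single knob to tune.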

This lets you build systems that understand both the precise vocabulary of your data and the nuanced, contextual meaning of user queries. You can tune the weighting between the sparse and dense components to prioritize exact keyword matches or semantic similarity, depending on your use case. For instance, if you’re searching a legal document database, you might give the sparse component more weight so that critical legal terms are matched precisely. For a product recommendation engine, weighting the dense component more heavily might be better for capturing broader user intent.

Conceptually, a single sparse-dense index doesn’t merge two separately ranked result lists. Each record stores both components, and its score for a query is one number: the dense dot product plus the sparse dot product, with whatever scaling you applied to the query. This is why a document that’s a strong semantic match but misses a keyword, or vice versa, can still score highly: if the component it does match is strong enough under your chosen weighting, it outweighs the one it misses.
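
To see the weighting at work, the sketch below re-scores the four example records locally with an exhaustive loop. It assumes a record’s score is the dense dot product scaled by a weight alpha plus the sparse dot product scaled by 1 - alpha. It is a toy re-computation for intuition, not Pinecone’s search algorithm, which is approximate and runs server-side; the names RECORDS, hybrid_score, and rank are hypothetical.

```python
from typing import Dict, List, Tuple

# The four example records: (dense values, sparse {index: value}).
RECORDS: Dict[str, Tuple[List[float], Dict[int, float]]] = {
    "product1": ([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8], {101: 1.0, 202: 0.8}),
    "product2": ([0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1], {101: 0.9, 303: 0.5}),
    "product3": ([0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], {202: 1.0, 404: 0.9}),
    "product4": ([0.5, 0.4, 0.3, 0.2, 0.1, 0.8, 0.7, 0.6], {101: 0.7, 202: 0.6, 505: 1.0}),
}
QUERY_DENSE = [0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85]
QUERY_SPARSE = {101: 1.0, 505: 0.9}


def hybrid_score(dense: List[float], sparse: Dict[int, float], alpha: float) -> float:
    """alpha * dense dot product + (1 - alpha) * sparse dot product."""
    dense_score = sum(q * d for q, d in zip(QUERY_DENSE, dense))
    sparse_score = sum(w * sparse.get(i, 0.0) for i, w in QUERY_SPARSE.items())
    return alpha * dense_score + (1 - alpha) * sparse_score


def rank(alpha: float) -> List[str]:
    """Record ids sorted by hybrid score, best first (exhaustive, exact)."""
    return sorted(RECORDS, key=lambda rid: hybrid_score(*RECORDS[rid], alpha), reverse=True)


for alpha in (1.0, 0.5, 0.0):
    print(alpha, rank(alpha))
# 1.0 ['product3', 'product1', 'product4', 'product2']
# 0.5 ['product4', 'product1', 'product3', 'product2']
# 0.0 ['product4', 'product1', 'product2', 'product3']
```

With these numbers, product3 tops the pure-dense ranking, while product4, the only record matching both query keywords, tops the 50/50 and pure-sparse rankings. The 50/50 order is also what an unscaled query should produce, since halving both parts doesn’t change the ranking.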

What most users don’t realize is that the sparse vector doesn’t need to be a traditional TF-IDF or BM25 representation. You can build sparse vectors from any set of features or identifiers you want to weight explicitly, as long as each one maps to an integer index, effectively creating custom "keyword" signals that blend with dense semantic embeddings. This flexibility allows relevance tuning that goes beyond simple keyword spotting or pure semantic understanding.
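
As an illustration, the sketch below builds sparse vectors from named business features (brand, product line, category, stock status) using a fixed vocabulary that maps each feature to an integer index. The vocabulary FEATURE_IDS, the feature names, and the helper functions are all hypothetical; you would choose your own features and weights, and keep their indices from colliding with any keyword indices you store alongside them.

```python
from typing import Dict, List, Mapping

# Hypothetical feature vocabulary: each business signal gets a fixed sparse index.
FEATURE_IDS: Dict[str, int] = {
    "brand:nike": 1,
    "line:air_max": 2,
    "category:running": 3,
    "color:red": 4,
    "in_stock": 5,
}


def feature_sparse_vector(features: Mapping[str, float]) -> Dict[str, List]:
    """Build an indices/values sparse vector from named features and weights."""
    pairs = sorted((FEATURE_IDS[name], float(w)) for name, w in features.items())
    return {"indices": [i for i, _ in pairs], "values": [w for _, w in pairs]}


def sparse_dot(a: Dict[str, List], b: Dict[str, List]) -> float:
    """Dot product of two sparse vectors in indices/values form."""
    lookup = dict(zip(b["indices"], b["values"]))
    return sum(v * lookup.get(i, 0.0) for i, v in zip(a["indices"], a["values"]))


# A product record and a query that cares mostly about brand and category.
record = feature_sparse_vector(
    {"brand:nike": 1.0, "line:air_max": 1.0, "category:running": 0.5, "in_stock": 0.2}
)
query = feature_sparse_vector({"brand:nike": 1.0, "category:running": 2.0})
print(sparse_dot(query, record))  # 2.0
```

The query weights act as the explicit knobs: raising "category:running" in the query makes category matches count for more, independently of what the dense embedding says.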

The next step is exploring how to adjust the weighting between the sparse and dense components dynamically, based on query characteristics or user feedback.
