HyDE (Hypothetical Document Embeddings) is a technique that uses a large language model (LLM) to generate a hypothetical answer to a user’s query, then embeds that hypothetical answer and uses it, instead of the raw query, to retrieve relevant documents. This can improve recall in retrieval-augmented generation (RAG) systems, because the search looks for documents that are semantically similar to an ideal answer rather than to the user’s original question.

Let’s see it in action.

Imagine a user asks: "What are the health benefits of intermittent fasting?"

Without HyDE, a RAG system embeds the query directly and searches for documents with similar embeddings. A short question and a long explanatory passage often sit far apart in embedding space, so this can miss documents that discuss the benefits of intermittent fasting in different terms from the query.

With HyDE, the LLM first generates a hypothetical answer:

"Intermittent fasting, a pattern of cycling between periods of eating and voluntary fasting, offers several health benefits. These include improved insulin sensitivity, leading to better blood sugar control; promotion of cellular repair processes like autophagy; potential for weight loss due to reduced calorie intake and metabolic shifts; and enhanced brain health, possibly by increasing BDNF levels."

Now, the RAG system embeds this hypothetical answer and uses it to search for relevant documents. This hypothetical answer is more descriptive and covers a broader semantic space than the original query. The search might then retrieve documents that discuss "autophagy in relation to fasting," "BDNF and brain function," or "insulin sensitivity improvements from caloric restriction," even if the original query didn’t explicitly mention these terms.
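To make the effect concrete, here is a minimal sketch that scores one document against both the raw query and the hypothetical answer. It is a toy under loud assumptions: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, so it only measures word overlap, and the sample document is invented for illustration. A real dense embedding model behaves differently, but the direction of the effect is the same idea.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = "What are the health benefits of intermittent fasting?"

hypothetical = (
    "Intermittent fasting, a pattern of cycling between periods of eating and "
    "voluntary fasting, offers several health benefits. These include improved "
    "insulin sensitivity, leading to better blood sugar control; promotion of "
    "cellular repair processes like autophagy; potential for weight loss due to "
    "reduced calorie intake and metabolic shifts; and enhanced brain health, "
    "possibly by increasing BDNF levels."
)

# Invented sample document that never says "health benefits" or "intermittent".
document = (
    "Autophagy, a cellular repair process, increases during fasting "
    "and may improve insulin sensitivity."
)

query_score = cosine(toy_embed(query), toy_embed(document))
hyde_score = cosine(toy_embed(hypothetical), toy_embed(document))

print(f"query vs document:        {query_score:.3f}")
print(f"hypothetical vs document: {hyde_score:.3f}")
```

The hypothetical answer scores higher because it mentions autophagy, cellular repair, and insulin sensitivity, none of which appear in the query.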

The core problem HyDE solves is the mismatch between a user’s potentially vague or poorly phrased query and the specific, often keyword-rich content of documents. LLM embeddings are good at capturing semantic meaning, but a direct embedding of a short query might not fully represent the information space that would contain a comprehensive answer. By generating a hypothetical answer, we create a richer, more descriptive "proxy" for the user’s underlying information need. This proxy is more likely to share semantic space with the relevant documents.

Here’s how it works internally:

  1. Query Input: The user submits a query (e.g., "How does photosynthesis work?").
  2. Hypothetical Document Generation: The query is passed to an LLM. The LLM is prompted to generate a comprehensive, factual-sounding answer to the query. This is the "hypothetical document."
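  3. Embedding Generation: The hypothetical document is embedded into a vector representation, using the same embedding model that produced the document embeddings in the index. The original query only needs to be embedded if you use a hybrid retrieval strategy.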
  4. Retrieval: The embedding of the hypothetical document is used to query a vector database containing document embeddings. The system retrieves the top-k most similar document embeddings.
  5. Context Augmentation: The content of the retrieved documents is combined with the original query.
  6. Final Answer Generation: This augmented context is passed to a generative LLM to produce the final answer for the user.
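The six steps can be sketched as one function. This is an illustrative outline, not a reference implementation: `hyde_answer`, `generate`, `embed`, and the in-memory `index` are hypothetical names, the prompts are placeholders, and the stub functions at the bottom stand in for a real LLM and embedding model so the example runs without external services.

```python
import math
from typing import Callable, List, Tuple

Index = List[Tuple[str, List[float]]]  # (document text, document embedding)


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_answer(
    query: str,
    generate: Callable[[str], str],       # assumed LLM call: prompt -> text
    embed: Callable[[str], List[float]],  # assumed embedding call: text -> vector
    index: Index,
    k: int = 3,
) -> Tuple[str, List[str]]:
    # Step 2: generate the hypothetical document.
    hypothetical = generate(
        f"Write a detailed passage that answers the question.\nQuestion: {query}\nPassage:"
    )
    # Step 3: embed the hypothetical document (not the raw query).
    search_vector = embed(hypothetical)
    # Step 4: retrieve the top-k most similar documents.
    ranked = sorted(index, key=lambda item: cosine(search_vector, item[1]), reverse=True)
    retrieved = [text for text, _ in ranked[:k]]
    # Step 5: combine the retrieved content with the ORIGINAL query.
    context = "\n\n".join(retrieved)
    final_prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    # Step 6: generate the final answer.
    return generate(final_prompt), retrieved


# --- Stubs for demonstration only; swap in real model calls. ---
VOCAB = ["light", "chlorophyll", "tax"]


def stub_embed(text: str) -> List[float]:
    lowered = text.lower()
    return [float(lowered.count(word)) for word in VOCAB]


def stub_generate(prompt: str) -> str:
    if prompt.startswith("Context:"):
        return "stub final answer"
    return "Plants use chlorophyll to capture light and make sugar."


docs = [
    "Income tax returns are due in April.",
    "Chlorophyll absorbs light inside the chloroplast.",
]
index = [(doc, stub_embed(doc)) for doc in docs]

answer, retrieved = hyde_answer(
    "How does photosynthesis work?", stub_generate, stub_embed, index, k=1
)
print(retrieved)
```

Note that the hypothetical document is used only for the search. The final prompt is built from the retrieved documents and the original query, so any inaccuracies in the hypothetical text are not passed to the final generation step.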

The "levers" you control are primarily:

  • The LLM for Hypothetical Document Generation: A more capable LLM will produce a more accurate and comprehensive hypothetical answer, leading to better retrieval. You might fine-tune this LLM on a dataset of question-answer pairs relevant to your domain.
  • The Embedding Model: The choice of embedding model determines how "similarity" is calculated. Models trained specifically for semantic similarity or retrieval tasks tend to perform better here, and the same model should be used for the hypothetical document and for the indexed documents so that their vectors are comparable.
  • The Retrieval Strategy: While HyDE typically uses the hypothetical document’s embedding for retrieval, you could experiment with hybrid approaches, such as combining the original query’s embedding with the hypothetical document’s embedding for the search.
  • The Prompt for Hypothetical Document Generation: The prompt engineering here is crucial. You want to instruct the LLM to generate a detailed, informative, and factually plausible answer, not just a short sentence. For example, "Generate a detailed and informative answer to the following question, as if you were explaining it to an expert in the field: [user query]".
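Two of these levers, the prompt and the hybrid retrieval strategy, are easy to sketch in code. The snippet below is one possible shape, with assumptions stated plainly: `build_hyde_prompt` and `blend_embeddings` are hypothetical helper names, the template is the example prompt from the list above, and the linear blend with a default `query_weight` of 0.3 is an arbitrary starting point rather than a recommended value.

```python
from typing import List, Sequence

# Template taken from the example prompt above.
HYDE_PROMPT = (
    "Generate a detailed and informative answer to the following question, "
    "as if you were explaining it to an expert in the field: {query}"
)


def build_hyde_prompt(query: str) -> str:
    """Fill the user query into the hypothetical-document prompt."""
    return HYDE_PROMPT.format(query=query.strip())


def blend_embeddings(
    query_vec: Sequence[float],
    hypothetical_vec: Sequence[float],
    query_weight: float = 0.3,  # assumption: tune this on your own data
) -> List[float]:
    """Hybrid search vector: weighted average of query and hypothetical embeddings."""
    if len(query_vec) != len(hypothetical_vec):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= query_weight <= 1.0:
        raise ValueError("query_weight must be between 0 and 1")
    return [
        query_weight * q + (1.0 - query_weight) * h
        for q, h in zip(query_vec, hypothetical_vec)
    ]


prompt = build_hyde_prompt("  How does photosynthesis work?  ")
blended = blend_embeddings([1.0, 0.0], [0.0, 1.0], query_weight=0.5)
print(prompt)
print(blended)
```

Setting `query_weight` to 0 gives plain HyDE, and setting it to 1 falls back to ordinary query-embedding search. If your embedding model does not return unit-length vectors, normalize both inputs first so that neither dominates the blend.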

A common pitfall is using the original query’s embedding for retrieval after generating the hypothetical document. The whole point is to run the vector search with the richer semantic representation of the hypothetical answer. If you generate and embed the hypothetical document but then search with the embedding of the original query, you get none of the retrieval benefit of HyDE while still paying for the extra LLM call.

The next concept you’ll likely encounter is how to optimize the trade-off between the latency introduced by the LLM call for hypothetical document generation and the potential gains in retrieval accuracy.
