The most surprising thing about query rewriting for RAG is that the LLM often makes your search worse if you don’t guide it precisely.
Let’s see it in action. Imagine we have a document about the "Mars Rover Perseverance" and a user asks: "What’s the latest on that Mars robot?"
Without transformation, a simple keyword search might struggle. "Robot" is too generic, and "latest" gives no specific date. The retriever might pull up general info about Mars robots, or even unrelated robots on Earth.
But with a RAG query transformer, we can get much better results. Here’s a simplified flow:
User Query: "What’s the latest on that Mars robot?"
LLM (acting as transformer): Prompt: "You are a query rewriting assistant for a RAG system. Rewrite the following user query to be more specific and include relevant keywords for retrieving information about the Mars Rover Perseverance. Focus on recent activities and scientific findings. If the query is too vague, ask clarifying questions or generate multiple specific queries."
Output:
- "Recent scientific findings of Mars Rover Perseverance"
- "Latest mission updates for Perseverance rover"
- "What has the Perseverance rover discovered recently on Mars?"
Now our retriever gets these much more targeted queries. It can use keywords like "Perseverance," "recent discoveries," "mission updates," and "Mars" to find documents specifically about the rover’s latest activities. Several phrasings give the retriever more chances to hit the right document (higher recall), and the specific terms rule out unrelated robots (higher relevance).
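To make the flow above concrete, here is a minimal sketch of the transformation step. The `llm` argument is a hypothetical callable standing in for whatever model client you use, `fake_llm` is a stub that returns the example output from above, and the "one query per line" instruction is an assumption added to make the output easy to parse.

```python
from typing import Callable, List

# Prompt adapted from the example above. The final formatting instruction
# ("one query per line") is an assumption added so the output can be parsed.
REWRITE_PROMPT = (
    "You are a query rewriting assistant for a RAG system. "
    "Rewrite the following user query to be more specific and include "
    "relevant keywords for retrieving information about the Mars Rover "
    "Perseverance. Focus on recent activities and scientific findings. "
    "Return one query per line, each prefixed with '- '.\n\n"
    "User query: {query}"
)


def rewrite_query(user_query: str, llm: Callable[[str], str]) -> List[str]:
    """Ask the LLM for rewrites and parse its bulleted reply into a list."""
    raw = llm(REWRITE_PROMPT.format(query=user_query))
    queries = []
    for line in raw.splitlines():
        # Drop bullet markers, surrounding whitespace, and wrapping quotes.
        cleaned = line.strip().lstrip("-*• ").strip().strip('"')
        if cleaned:
            queries.append(cleaned)
    return queries


def fake_llm(prompt: str) -> str:
    """Stub model (hypothetical): returns the example output shown above."""
    return (
        '- "Recent scientific findings of Mars Rover Perseverance"\n'
        '- "Latest mission updates for Perseverance rover"\n'
        '- "What has the Perseverance rover discovered recently on Mars?"\n'
    )


rewritten = rewrite_query("What's the latest on that Mars robot?", fake_llm)
for q in rewritten:
    print(q)
```

In a real system you would swap `fake_llm` for a call to your model of choice; the parsing step stays the same as long as the model follows the formatting instruction.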
The Mental Model: From Vague to Pinpoint
The core problem RAG query transformation solves is the mismatch between natural, often ambiguous human language and the way retrieval systems actually match text: by term overlap in traditional search indexes, or by embedding similarity in vector databases. Either way, a query that names the right entities and concepts lands closer to the right documents. Users don’t think in terms of database schemas or search engine optimization; they think in concepts and context.
Here’s how it works internally:
- Initial Query: The user’s raw question enters the system.
- LLM as Interpreter: The LLM, guided by a carefully crafted prompt, acts as an intelligent intermediary. Its goal isn’t to answer the question directly, but to rephrase it into a format that a retriever can understand and act upon effectively.
- Prompt Engineering is Key: The prompt tells the LLM what to focus on. For example, it might instruct the LLM to:
  - Identify key entities (e.g., "Perseverance rover").
  - Infer temporal context (e.g., "latest," "recent").
  - Extract underlying intent (e.g., "discoveries," "mission status").
  - Generate multiple variations if the original query is ambiguous, increasing the chances of a hit.
  - Add relevant synonyms or related terms (e.g., "robot" -> "rover," "Perseverance").
- Transformed Queries: The LLM outputs one or more refined queries. These are not necessarily full sentences but are optimized for retrieval.
- Retriever Action: The retriever uses these transformed queries to search the knowledge base. Because the queries are more specific and contain better keywords, the retriever is more likely to fetch the most relevant chunks of information.
- Contextual Augmentation: These retrieved chunks are then passed to another LLM (the generator) along with the original query to synthesize a final, coherent answer.
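The steps above can be sketched end to end. This is a toy illustration under stated assumptions, not a production design: `keyword_retrieve` is a hypothetical word-overlap retriever standing in for a real search index or vector database, `CORPUS` is made-up sample data, and `transform` and `generate` are stubs for the two LLM calls.

```python
from typing import Callable, Dict, List

# Made-up sample knowledge base (assumption, for illustration only).
CORPUS: Dict[str, str] = {
    "perseverance": "perseverance rover collected new rock samples on mars",
    "curiosity": "curiosity rover climbs mount sharp on mars",
    "roomba": "household robot vacuum review",
}


def keyword_retrieve(query: str, corpus: Dict[str, str], top_k: int = 2) -> List[str]:
    """Toy retriever: rank documents by how many query words they contain."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in corpus.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken alphabetically by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def answer(
    user_query: str,
    transform: Callable[[str], List[str]],
    generate: Callable[[str], str],
    corpus: Dict[str, str],
) -> str:
    """Transform the query, retrieve per rewrite, merge, then generate."""
    doc_ids: List[str] = []
    for rewritten in transform(user_query):
        for doc_id in keyword_retrieve(rewritten, corpus):
            if doc_id not in doc_ids:  # de-duplicate, keep first-seen order
                doc_ids.append(doc_id)
    context = "\n".join(corpus[d] for d in doc_ids)
    # The generator sees the ORIGINAL query plus the retrieved context.
    return generate(f"Context:\n{context}\n\nQuestion: {user_query}")


def stub_transform(query: str) -> List[str]:
    """Stand-in for the LLM transformer (hypothetical fixed output)."""
    return ["perseverance rover recent findings", "perseverance mission updates mars"]


def stub_generate(prompt: str) -> str:
    """Stand-in for the generator LLM: echoes the prompt it would receive."""
    return prompt


result = answer("What's the latest on that Mars robot?", stub_transform, stub_generate, CORPUS)
print(result)
```

Note that the rewrites are used only for retrieval. The generator still receives the user’s original wording, so the final answer addresses what was actually asked.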
The exact levers you control are primarily in the prompt given to the LLM acting as the query transformer. You specify:
- Role: What kind of assistant is it? (e.g., "query optimizer," "information extractor").
- Target Domain/Knowledge Base: What kind of information should it focus on? (e.g., "technical documentation," "financial reports," "scientific papers").
- Transformation Goals: What should the output look like? (e.g., "keywords only," "question variations," "add synonyms," "specify date ranges").
- Constraints: What to avoid? (e.g., "don’t hallucinate," "don’t answer directly").
Consider this prompt snippet: "Rewrite the user's query to be a concise search query suitable for a vector database. Include named entities, key concepts, and temporal information. Avoid conversational filler."
This instructs the LLM to strip away politeness and focus on the search terms themselves.
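One way to keep these four levers explicit is to assemble the prompt from named parts rather than hand-editing one long string. The sketch below is an assumption about how you might organize this; `TransformerPromptSpec` and its field names are hypothetical, not part of any library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TransformerPromptSpec:
    """Hypothetical container for the four levers described above."""
    role: str
    domain: str
    goals: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)

    def render(self, user_query: str) -> str:
        """Build the full prompt text for the query-transformer LLM."""
        lines = [
            f"You are a {self.role} for a RAG system.",
            f"The knowledge base contains {self.domain}.",
            "Goals:",
            *[f"- {goal}" for goal in self.goals],
            "Constraints:",
            *[f"- {constraint}" for constraint in self.constraints],
            f"User query: {user_query}",
        ]
        return "\n".join(lines)


spec = TransformerPromptSpec(
    role="query optimizer",
    domain="scientific papers",
    goals=[
        "Include named entities, key concepts, and temporal information",
        "Avoid conversational filler",
    ],
    constraints=[
        "Do not answer the question directly",
        "Do not invent entities the user did not imply",
    ],
)
prompt = spec.render("What's the latest on that Mars robot?")
print(prompt)
```

Keeping the parts separate makes it easy to change one lever (say, the target domain) and compare retrieval quality without touching the rest of the prompt.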
The most counterintuitive part for many is that simply asking the LLM to "make the query better" is insufficient. Without explicit instructions on how to make it better (by identifying entities, adding context, or generating variations), the LLM might over-interpret or introduce its own biases, producing a query that is less effective than the original. For instance, if the user asks "Tell me about that thing on Mars," the LLM might confidently rewrite it as "Details on the Curiosity rover’s latest findings," even though the user may have meant Perseverance: "that thing" gives no basis for choosing one rover over the other. The prompt must guide the LLM to acknowledge ambiguity and, when it cannot infer the referent, to generate queries for several likely candidates.
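The "multiple likely candidates" idea can be illustrated without a model at all. The sketch below is a deliberately crude heuristic, not a recommended detector: `VAGUE_REFERENTS`, `fan_out`, and the candidate list are all hypothetical, and in practice you would ask the LLM itself to flag ambiguity and propose candidates.

```python
from typing import List

# Hypothetical list of phrases that signal an unresolved referent.
VAGUE_REFERENTS = ("that thing", "that robot", "that one")


def fan_out(user_query: str, candidates: List[str], intent: str) -> List[str]:
    """Return one query per candidate entity when the referent is vague.

    If the query does not contain a vague phrase, it is returned unchanged
    so that a specific query is never replaced by a guess.
    """
    lowered = user_query.lower()
    if not any(phrase in lowered for phrase in VAGUE_REFERENTS):
        return [user_query]
    return [f"{name} {intent}" for name in candidates]


mars_candidates = ["Perseverance rover", "Curiosity rover"]

vague = fan_out("Tell me about that thing on Mars", mars_candidates, "latest findings")
specific = fan_out("Perseverance sample caching status", mars_candidates, "latest findings")

print(vague)
print(specific)
```

The point of the design is that ambiguity widens the search instead of triggering a single confident guess: both rovers get a query, and the retriever and generator can sort out which results matter.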
The next step in optimizing RAG is often dealing with the quality of the retrieved chunks themselves, and how to effectively synthesize them into a cohesive answer.