Step-back prompting in RAG is less about polishing the user's exact query than about setting it aside: the initial retrieval runs on a more abstract version of the question instead.
Let’s see what this looks like in practice. Imagine we have a simple RAG system that answers questions about a fictional product called "Gizmo."
Here’s a snippet of our knowledge base:
[
  {
    "id": "gizmo_features_v1",
    "content": "The Gizmo v1 features a 5MP camera, 128GB storage, and a 3-hour battery life. It is available in blue and silver."
  },
  {
    "id": "gizmo_price_v1",
    "content": "The Gizmo v1 is priced at $299. It is currently on sale for $249 until the end of the month."
  },
  {
    "id": "gizmo_compatibility_v2",
    "content": "The Gizmo v2 is compatible with Android 10+ and iOS 14+. It supports Wi-Fi 6 and Bluetooth 5.0."
  },
  {
    "id": "gizmo_price_v2",
    "content": "The Gizmo v2 costs $499. No discounts are currently available."
  }
]
Scenario 1: Standard RAG
User Query: "What’s the price of the Gizmo?"
Standard RAG Process:
- Retrieval: The system searches the knowledge base for documents containing "price" and "Gizmo." It might retrieve gizmo_price_v1 and gizmo_price_v2.
- Generation: The LLM synthesizes an answer based on the retrieved chunks.
LLM Output: "The Gizmo v1 is priced at $299, on sale for $249. The Gizmo v2 costs $499."
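The retrieval step in Scenario 1 can be sketched with a toy keyword matcher. This is a minimal illustration, not how a production retriever works: the function name `keyword_retrieve`, the stopword list, and the choice to search the `id` field alongside `content` with substring matching are all assumptions made for this example. The last choice matters here, because the v2 price chunk says "costs" rather than "price" and would be missed by a match on content alone.

```python
import re

# Knowledge base from the snippet above.
KNOWLEDGE_BASE = [
    {"id": "gizmo_features_v1",
     "content": "The Gizmo v1 features a 5MP camera, 128GB storage, and a 3-hour battery life. It is available in blue and silver."},
    {"id": "gizmo_price_v1",
     "content": "The Gizmo v1 is priced at $299. It is currently on sale for $249 until the end of the month."},
    {"id": "gizmo_compatibility_v2",
     "content": "The Gizmo v2 is compatible with Android 10+ and iOS 14+. It supports Wi-Fi 6 and Bluetooth 5.0."},
    {"id": "gizmo_price_v2",
     "content": "The Gizmo v2 costs $499. No discounts are currently available."},
]

# Hypothetical stopword list, just large enough for this example.
STOPWORDS = {"what", "s", "the", "of", "is", "a"}


def keyword_retrieve(query, docs=KNOWLEDGE_BASE):
    """Return docs whose id or content contains every non-stopword query term."""
    terms = [t for t in re.findall(r"[a-z0-9]+", query.lower()) if t not in STOPWORDS]
    hits = []
    for doc in docs:
        # Assumption: ids are searchable too, and matching is by substring.
        searchable = f"{doc['id']} {doc['content']}".lower()
        if all(term in searchable for term in terms):
            hits.append(doc)
    return hits


retrieved = keyword_retrieve("What's the price of the Gizmo?")
print([doc["id"] for doc in retrieved])
# prints ['gizmo_price_v1', 'gizmo_price_v2']
```

The retrieved chunks would then be passed to the LLM together with the user query for the generation step.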
This works fine. But what if the query is more nuanced?
Scenario 2: Step-Back Prompting
User Query: "What’s the price of the Gizmo, considering both models?"
Step-Back Prompting Process:
- Abstraction Prompt: The LLM is first prompted to generate a more general, abstract question that captures the essence of the original query.
  - Prompt:
    Given the user's question, generate a more abstract, higher-level question that the user is trying to answer. This abstract question should not contain specific entities or details from the original question.
    Original question: "What's the price of the Gizmo, considering both models?"
    Abstract question:
  - LLM Output (Abstraction): "What are the pricing details for the product line?"
- Retrieval (using Abstract Query): The RAG system now uses this abstract query ("What are the pricing details for the product line?") to search the knowledge base.
  - This abstract query is more robust to variations in product names or model numbers. It effectively broadens the search scope.
- Generation: The LLM receives the original query and the retrieved chunks (which are likely to be more comprehensive because of the abstract search) and generates the final answer.
LLM Output (with Step-Back): "The Gizmo v1 is priced at $299, with a current sale price of $249. The Gizmo v2 is priced at $499."
The step-back prompt helps the system retrieve all relevant information about pricing, even if the original query was slightly ambiguous or only hinted at specific models. It forces the system to think about the underlying information need before diving into the details.
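The three steps of Scenario 2 can be wired together roughly as follows. This is a sketch under stated assumptions: `stub_llm` is a hypothetical stand-in for a real model call that simply returns the outputs quoted above, `toy_retrieve` approximates embedding search with crude 4-character prefix stems (so "pricing", "priced", and "price" match each other), and the generation prompt wording is invented for illustration. Only the abstraction prompt comes from the walkthrough.

```python
import re

KNOWLEDGE_BASE = [
    {"id": "gizmo_features_v1",
     "content": "The Gizmo v1 features a 5MP camera, 128GB storage, and a 3-hour battery life. It is available in blue and silver."},
    {"id": "gizmo_price_v1",
     "content": "The Gizmo v1 is priced at $299. It is currently on sale for $249 until the end of the month."},
    {"id": "gizmo_compatibility_v2",
     "content": "The Gizmo v2 is compatible with Android 10+ and iOS 14+. It supports Wi-Fi 6 and Bluetooth 5.0."},
    {"id": "gizmo_price_v2",
     "content": "The Gizmo v2 costs $499. No discounts are currently available."},
]

ABSTRACTION_PROMPT = (
    "Given the user's question, generate a more abstract, higher-level question "
    "that the user is trying to answer. This abstract question should not contain "
    "specific entities or details from the original question.\n"
    'Original question: "{question}"\n'
    "Abstract question:"
)

# Assumed wording; the walkthrough does not give a generation prompt.
GENERATION_PROMPT = (
    "Answer the original question using only the context below.\n"
    "Context:\n{context}\n"
    "Original question: {question}\n"
    "Answer:"
)

STOPWORDS = {"what", "are", "the", "for", "of", "is", "a", "s"}


def stems(text):
    """Crude 4-character prefix stems of the non-stopword tokens in text."""
    return {t[:4] for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def toy_retrieve(query, docs=KNOWLEDGE_BASE):
    """Stand-in for embedding search: rank docs by number of shared stems."""
    query_stems = stems(query)
    scored = [(len(query_stems & stems(f"{d['id']} {d['content']}")), d) for d in docs]
    ranked = sorted(scored, key=lambda pair: -pair[0])  # stable: ties keep KB order
    return [d for score, d in ranked if score > 0]


def stub_llm(prompt):
    """Hypothetical model call that replays the outputs from the walkthrough."""
    if prompt.endswith("Abstract question:"):
        return "What are the pricing details for the product line?"
    return ("The Gizmo v1 is priced at $299, with a current sale price of $249. "
            "The Gizmo v2 is priced at $499.")


def step_back_rag(question, llm=stub_llm, retrieve=toy_retrieve):
    # 1. Abstraction: ask the model for a higher-level question.
    abstract_question = llm(ABSTRACTION_PROMPT.format(question=question)).strip()
    # 2. Retrieval: search with the abstract question, not the original one.
    chunks = retrieve(abstract_question)
    # 3. Generation: answer the ORIGINAL question over the retrieved context.
    context = "\n".join(chunk["content"] for chunk in chunks)
    answer = llm(GENERATION_PROMPT.format(context=context, question=question))
    return abstract_question, chunks, answer


abstract_q, chunks, answer = step_back_rag(
    "What's the price of the Gizmo, considering both models?"
)
print(abstract_q)
print([chunk["id"] for chunk in chunks])
print(answer)
```

In a real system, `llm` and `retrieve` would be replaced by an actual model client and vector store; passing them in as parameters keeps the three-step flow independent of either choice.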
The Core Problem Solved: Standard RAG can struggle with queries that are underspecified, ambiguous, or require synthesizing information across multiple distinct entities that aren’t explicitly mentioned together. The retriever might return only the chunks that most directly match the query wording, missing related but crucial information. Step-back prompting addresses this by creating an intermediate, abstract query that is more likely to retrieve a broader set of relevant documents, which gives the final generation step a richer context.
Internal Mechanics: When you ask the LLM to "step back" or generate an abstract query, you’re essentially asking it to perform query generalization, a close relative of query expansion. Instead of directly mapping "Gizmo price" to specific documents, it first maps it to the concept of "product pricing information." A query phrased at that conceptual level tends to cast a wider net over relevant documents. This works because LLMs are generally good at inferring the abstract information need behind a specific question.
Levers You Control:
- The Abstraction Prompt: The wording of the prompt you give the LLM to generate the abstract question is critical. You can guide it towards more specific or more general abstractions. For example, you could ask it to "generate a question about the product’s lifecycle" instead of just "pricing details."
- The Retrieval Strategy: You can choose to use the abstract query only for retrieval, or you can combine it with the original query. Some systems might perform a hybrid retrieval, searching for both.
- The Generation Prompt: The prompt for the final generation step needs to instruct the LLM to answer the original user question using the retrieved (potentially broader) context.
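The second and third levers can be sketched together: a hybrid retrieval that merges hits for the original and the abstract query, followed by a generation prompt that keeps the original question in front of the model. Everything here is illustrative. `hybrid_retrieve`, `build_generation_prompt`, and the prompt wording are hypothetical names and text, and `stub_retrieve` returns made-up canned results (the original query hitting only the v1 chunk) purely to show the merge.

```python
PRICE_V1 = {"id": "gizmo_price_v1",
            "content": "The Gizmo v1 is priced at $299. It is currently on sale for $249 until the end of the month."}
PRICE_V2 = {"id": "gizmo_price_v2",
            "content": "The Gizmo v2 costs $499. No discounts are currently available."}

ORIGINAL_QUERY = "What's the price of the Gizmo, considering both models?"
ABSTRACT_QUERY = "What are the pricing details for the product line?"

# Made-up retrieval results, only to demonstrate merging and deduplication.
CANNED_RESULTS = {
    ORIGINAL_QUERY: [PRICE_V1],
    ABSTRACT_QUERY: [PRICE_V1, PRICE_V2],
}


def stub_retrieve(query):
    """Hypothetical retriever that looks up canned results for a query."""
    return CANNED_RESULTS.get(query, [])


def hybrid_retrieve(original_query, abstract_query, retrieve, limit=4):
    """Union of both result lists, original-query hits first, deduplicated by id."""
    seen, merged = set(), []
    for query in (original_query, abstract_query):
        for chunk in retrieve(query):
            if chunk["id"] not in seen:
                seen.add(chunk["id"])
                merged.append(chunk)
    return merged[:limit]


def build_generation_prompt(original_query, chunks):
    """Final prompt: broader context, but the ORIGINAL question is what gets answered."""
    context = "\n".join(f"- {chunk['content']}" for chunk in chunks)
    return (
        "Answer the user's question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {original_query}\n"
        "Answer:"
    )


chunks = hybrid_retrieve(ORIGINAL_QUERY, ABSTRACT_QUERY, stub_retrieve)
print([chunk["id"] for chunk in chunks])
# prints ['gizmo_price_v1', 'gizmo_price_v2']
print(build_generation_prompt(ORIGINAL_QUERY, chunks))
```

Putting original-query hits first is one possible ordering; a system that trusts the abstract query more could reverse the loop order or re-rank the merged list.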
The Counterintuitive Insight: The most surprising part is that by not using the user’s exact query for the initial retrieval, you often get a better and more complete answer. The key is the LLM’s ability to infer the underlying intent and generalize it into an abstract query. That abstraction widens the semantic match, surfacing relevant documents that a direct keyword or embedding match on the original wording might miss. The LLM stops being only the answer generator and also acts as a query rewriter ahead of retrieval.
The next hurdle is handling situations where the abstract query itself is too broad and retrieves irrelevant information.