The most surprising thing about managing multi-turn conversations with LLMs is that the model doesn’t "remember" anything between turns; you have to explicitly feed it the entire history every single time.
Let’s see this in action. Imagine a simple chatbot that helps users find movie showtimes.

    import os

    import openai

    # Uses the pre-1.0 openai SDK interface (openai.ChatCompletion).
    # In openai>=1.0 the equivalent call is client.chat.completions.create(...).
    # Assumes your OpenAI API key is set as an environment variable.
    openai.api_key = os.getenv("OPENAI_API_KEY")


    def get_movie_showtimes(user_query, conversation_history=None):
        """
        Simulates getting movie showtimes using an LLM.
        In a real scenario, this would involve a more complex prompt and potentially tool use.

        Returns the assistant's reply and the updated history.
        """
        # Avoid a mutable default argument; start from an empty history if none is given
        if conversation_history is None:
            conversation_history = []

        # The core of multi-turn: include the history, then the new user message
        messages = conversation_history + [{"role": "user", "content": user_query}]

        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",  # Or "gpt-4"
            messages=messages,
            temperature=0.7,
        )
        assistant_response = response.choices[0].message["content"]

        # Update history for the next turn
        updated_history = messages + [{"role": "assistant", "content": assistant_response}]
        return assistant_response, updated_history


    # --- Simulation ---

    # Turn 1
    print("--- Turn 1 ---")
    initial_query = "What are the showtimes for 'Inception' today?"
    history = []
    response, history = get_movie_showtimes(initial_query, history)
    print(f"User: {initial_query}")
    print(f"Assistant: {response}")
    print("\nConversation History after Turn 1:")
    for msg in history:
        print(msg)

    # Turn 2
    print("\n--- Turn 2 ---")
    follow_up_query = "And what about 'The Dark Knight' at the AMC Metreon?"
    response, history = get_movie_showtimes(follow_up_query, history)  # Pass the updated history
    print(f"User: {follow_up_query}")
    print(f"Assistant: {response}")
    print("\nConversation History after Turn 2:")
    for msg in history:
        print(msg)

    # Turn 3
    print("\n--- Turn 3 ---")
    another_query = "How late does 'Inception' play?"
    response, history = get_movie_showtimes(another_query, history)  # Pass the updated history
    print(f"User: {another_query}")
    print(f"Assistant: {response}")
    print("\nConversation History after Turn 3:")
    for msg in history:
        print(msg)

In this example, the conversation_history list is the explicit state. Each time get_movie_showtimes is called, it receives the history of the conversation so far, appends the new user query, and sends the combined list to the LLM. The LLM’s response is then appended as well, and the updated list is returned for use in the next turn. Because the model sees the previous turns as part of its input, it can resolve follow-up questions like "And what about…" or "How late does…".
This pattern addresses the stateless nature of LLM API calls: each call is independent, and the model has no memory of previous interactions. To create a conversational experience, we must engineer this memory by serializing the turns and re-injecting them on every call. The core challenge is balancing the need for sufficient context against the model’s context window limit and the cost and latency of longer prompts.
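To make the cost side of that trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes every turn adds the same number of tokens and that the full history is resent on each call; the function name and the tokens_per_turn parameter are illustrative and not part of any library.

```python
def cumulative_prompt_tokens(tokens_per_turn: int, turns: int) -> int:
    """Total input tokens sent across a conversation when each call resends the full history.

    Simplifying assumption: the prompt on turn i is about i * tokens_per_turn tokens.
    """
    total = 0
    prompt_size = 0
    for _ in range(turns):
        prompt_size += tokens_per_turn  # the prompt grows by one turn's worth of tokens
        total += prompt_size            # and the whole prompt is sent again
    return total


# Compare conversations of different lengths at an assumed 100 tokens per turn
for turns in (1, 10, 50):
    print(f"{turns} turns: {cumulative_prompt_tokens(100, turns)} input tokens in total")
```

Under these assumptions, a 10-turn conversation sends 5,500 input tokens in total rather than 1,000, because the history grows linearly while the cumulative input grows roughly quadratically. Real numbers depend on the actual message lengths and the tokenizer.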
The openai.ChatCompletion.create method (the pre-1.0 openai SDK interface; in 1.x the equivalent is client.chat.completions.create) expects a list of message objects, each with a role (system, user, or assistant) and content. The conversation_history list mirrors this structure exactly. The system message, if used, comes first and sets the initial instructions for the assistant; it is followed by alternating user and assistant messages representing the dialogue.
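The message structure described above can be sketched as a small helper that assembles the payload without calling the API. The helper name build_messages, the system prompt text, and the placeholder assistant reply are all made up for illustration.

```python
from typing import Dict, List

Message = Dict[str, str]


def build_messages(system_prompt: str, history: List[Message], user_query: str) -> List[Message]:
    """Assemble the payload: system message first, then prior turns, then the new user query."""
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)
    messages.append({"role": "user", "content": user_query})
    return messages


# Prior turns; the assistant content here is a placeholder, not real model output
history = [
    {"role": "user", "content": "What are the showtimes for 'Inception' today?"},
    {"role": "assistant", "content": "(assistant reply from turn 1)"},
]

payload = build_messages(
    "You help users find movie showtimes.",  # hypothetical system prompt
    history,
    "How late does 'Inception' play?",
)
for message in payload:
    print(f"{message['role']}: {message['content']}")
```

The resulting list is what would be passed as the messages argument. Keeping the system prompt out of the stored history, as this sketch does, is one possible design; storing it as the first history entry works just as well.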
The exact levers you control are:
- The prompt content: What the user says, and how you instruct the LLM via the system message.
- The history content: Which messages you include in conversation_history. This is where you manage state.
- History truncation/summarization: When the history gets too long, you need a strategy. This could be as simple as dropping the oldest messages, or it could use another LLM call to summarize older parts of the conversation.
- The model itself: Different models have different context window sizes (e.g., gpt-3.5-turbo-16k vs. gpt-4-32k).
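As one possible take on the truncation lever, the sketch below drops the oldest dialogue messages until the history fits a token budget, while always keeping any system message. The names truncate_history and estimate_tokens are hypothetical, and the four-characters-per-token estimate is a crude assumption; a real tokenizer would be more accurate.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(message: Message) -> int:
    # Crude assumption: roughly 4 characters per token. Use a real tokenizer in practice.
    return max(1, len(message["content"]) // 4)


def truncate_history(messages: List[Message], max_tokens: int) -> List[Message]:
    """Keep system messages plus the most recent messages that fit within max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(estimate_tokens(m) for m in system)

    kept: List[Message] = []
    # Walk backwards so the newest messages are kept first
    for message in reversed(dialogue):
        cost = estimate_tokens(message)
        if cost > budget:
            break  # everything older than this point is dropped
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))


conversation = [
    {"role": "system", "content": "You help users find movie showtimes."},
    {"role": "user", "content": "What are the showtimes for 'Inception' today?"},
    {"role": "assistant", "content": "(assistant reply from turn 1)"},
    {"role": "user", "content": "How late does 'Inception' play?"},
]
trimmed = truncate_history(conversation, max_tokens=25)
print([m["role"] for m in trimmed])
```

Note that dropping messages one at a time can leave an assistant reply without the user message that prompted it. Dropping whole user/assistant pairs, or replacing the dropped span with a summary message, are alternatives worth considering.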
The most common mistake is assuming the LLM "remembers" past turns without you providing that history explicitly. This leads to chatbots that only respond to the immediate query, ignoring all prior context. The conversation_history variable is your explicit mechanism for giving the LLM memory.
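One way to guard against that mistake is to let a small object own the history so callers cannot forget to pass it. The sketch below is an assumption-laden illustration: ChatSession and fake_complete are hypothetical names, and fake_complete stands in for the real API call so the example runs offline.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class ChatSession:
    """Owns the conversation history so every call automatically includes it."""

    def __init__(self, complete: Callable[[List[Message]], str], system_prompt: str = ""):
        self._complete = complete
        self.history: List[Message] = []
        if system_prompt:
            self.history.append({"role": "system", "content": system_prompt})

    def ask(self, user_query: str) -> str:
        messages = self.history + [{"role": "user", "content": user_query}]
        reply = self._complete(messages)
        # Commit the turn only after the call has succeeded
        self.history = messages + [{"role": "assistant", "content": reply}]
        return reply


def fake_complete(messages: List[Message]) -> str:
    # Stand-in for the real API call; reports how many messages it was shown
    return f"(model saw {len(messages)} messages)"


session = ChatSession(fake_complete, system_prompt="You help users find movie showtimes.")
print(session.ask("What are the showtimes for 'Inception' today?"))
print(session.ask("How late does 'Inception' play?"))
```

In real use, the complete argument would wrap the actual chat completion call and return the assistant's text. Because the turn is committed only after the call returns, a failed request leaves the history unchanged.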
When managing long conversations, a critical detail is the order of messages. The openai.ChatCompletion API expects messages in chronological order, with the most recent user message typically last in the list. If you inject new information or user utterances out of order, the model can misread the dialogue flow, which leads to nonsensical responses or a failure to grasp the current conversational intent.
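A lightweight sanity check before each call can catch ordering mistakes early. The sketch below encodes the conventions described in this article (system message first, alternating roles, user message last); these are not rules enforced by the API, and the function name validate_order is hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]


def validate_order(messages: List[Message]) -> List[str]:
    """Return a list of ordering problems; an empty list means the order looks sane."""
    if not messages:
        return ["message list is empty"]

    problems: List[str] = []
    for index, message in enumerate(messages):
        if message["role"] == "system" and index != 0:
            problems.append(f"system message at position {index}, expected position 0")

    dialogue = [m for m in messages if m["role"] != "system"]
    for previous, current in zip(dialogue, dialogue[1:]):
        if previous["role"] == current["role"]:
            problems.append(f"two consecutive {current['role']} messages")

    if not dialogue or dialogue[-1]["role"] != "user":
        problems.append("last message is not from the user")
    return problems


out_of_order = [
    {"role": "user", "content": "What are the showtimes for 'Inception' today?"},
    {"role": "user", "content": "How late does 'Inception' play?"},
    {"role": "assistant", "content": "(assistant reply from turn 1)"},
]
for problem in validate_order(out_of_order):
    print(problem)
```

Returning a list of problems rather than raising lets the caller decide whether to log, repair, or reject the payload. Some applications legitimately send consecutive messages with the same role, so treat these checks as a starting point.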
The next conceptual hurdle is implementing robust error handling and fallback mechanisms when the LLM fails to provide a coherent answer or when an external tool integration (if used) breaks.