OpenAI Assistants API: Build Stateful AI Applications (2026)

The OpenAI Assistants API doesn’t just give you a new way to chat; it’s a state machine that remembers your context and lets you attach tools to it.

Let’s see it in action. Imagine an assistant that can search a knowledge base and then answer questions based on that information.

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Create a vector store for our knowledge
vector_store = client.beta.vector_stores.create(
    name="my-knowledge-base",
)

# Upload files to the vector store
file_paths = ["./my_document.txt"]
file_streams = [open(path, "rb") for path in file_paths]

file_batch = client.beta.vector_stores.files.create_and_poll(
    vector_store_id=vector_store.id,
    files=file_streams,
)

# Create an assistant with the vector store attached
assistant = client.beta.assistants.create(
    name="Knowledge Navigator",
    instructions="You are a helpful assistant. Use your tools to answer questions based on the provided documents.",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
    model="gpt-4-turbo-preview",
)

# Create a thread (session)
thread = client.beta.threads.create()

# Add a message to the thread
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What are the key findings from the document?",
)

# Run the assistant
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# Poll for completion and retrieve messages
while run.status not in ["completed", "failed", "cancelled", "expired"]:
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    print(f"Run status: {run.status}")
    time.sleep(1)

if run.status == "completed":
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    for msg in messages.data:
        if msg.role == "assistant":
            print(f"Assistant: {msg.content[0].text.value}")
else:
    print(f"Run failed with status: {run.status}")

The core problem the Assistants API solves is managing conversational state and tool execution across multiple turns. Before, you’d have to manually store the entire conversation history, pass it back to the model on every request, and then parse the model’s output to decide which tool to call next. This became a complex orchestration problem. The Assistants API abstracts this away.

Internally, an Assistant object is essentially a pre-configured LLM with specific instructions, tools, and access to VectorStores. When you create an Assistant, you’re defining a persona and its capabilities. The model parameter (gpt-4-turbo-preview in the example) is the LLM underpinning it. instructions are the system-level prompts that guide its behavior. tools define what the assistant can do, like file_search for retrieving information or code_interpreter for running Python code. tool_resources link these tools to specific data, such as a VectorStore containing your documents.

A Thread represents a single, ongoing conversation. When a user sends a message, it’s added to the Thread. Crucially, the Thread persists this history. You don’t need to send the entire conversation every time; you just add the new message.

When you call client.beta.threads.runs.create, you’re telling the Assistant to process the current state of the Thread. The API then handles:

Context Management: It pulls the relevant history from the Thread.
Tool Selection: Based on the user’s message and its instructions, it decides if a tool needs to be invoked.
Tool Execution: If a tool is chosen (e.g., file_search), it executes that tool with the necessary parameters. For file_search, this involves querying the specified VectorStore.
Response Generation: After tool execution (or if no tool was needed), it uses the LLM to generate a natural language response, incorporating any information retrieved by tools.
State Update: The assistant’s response (and any tool outputs) are automatically appended to the Thread, maintaining the conversation’s state for the next turn.

The client.beta.vector_stores.files.create_and_poll method is key for knowledge retrieval. It takes your raw files, chunks them, generates embeddings, and stores them in a way that the file_search tool can efficiently query. The _and_poll part means it handles the asynchronous process of embedding and indexing, waiting until the files are ready for use.

The most surprising thing is how the Thread acts as a true persistent entity. You can create a thread, add messages, run the assistant, and then later retrieve that same thread and add more messages, continuing the conversation as if it were a single, unbroken session, all managed server-side by OpenAI. This allows for complex, multi-step workflows where the assistant might ask clarifying questions, perform multiple tool actions, and then present a final, synthesized answer, without the client needing to orchestrate each individual step.

The next challenge is handling asynchronous tool outputs and managing multiple tools within a single run.