The most surprising thing about OpenAI’s structured output feature is that it’s not really about generating JSON; it’s about guaranteeing the output is valid JSON before it ever leaves the API, and then validating its shape on your side.
Let’s see this in action. Imagine you want to extract a user’s name and their favorite color from a piece of text.
import json
from typing import Optional

import openai
from pydantic import BaseModel, Field

# Configure your OpenAI API key here, or set the OPENAI_API_KEY environment variable
# openai.api_key = "YOUR_API_KEY"


class UserInfo(BaseModel):
    name: str = Field(..., description="The full name of the user")
    favorite_color: Optional[str] = Field(None, description="The user's favorite color, if mentioned")


def extract_user_info(text: str) -> UserInfo:
    # Serialize the schema as real JSON; interpolating the dict directly would
    # embed its Python repr (single quotes) in the prompt.
    schema = json.dumps(UserInfo.model_json_schema(), indent=2)
    response = openai.chat.completions.create(
        model="gpt-4o",  # Or gpt-3.5-turbo-0125 for faster/cheaper
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": f"You are a helpful assistant designed to output JSON. Use the following JSON schema, generated from a Pydantic model, to structure your output:\n\n{schema}"},
            {"role": "user", "content": f"Extract user information from this text: {text}"}
        ]
    )

    # JSON mode makes the model return a JSON string. The schema in the prompt is
    # guidance for the model; the API does not check the output against it.
    json_output = response.choices[0].message.content

    # Pydantic parses the JSON string and validates it against the UserInfo model.
    # If the LLM returned invalid JSON or JSON that doesn't match the schema, Pydantic will raise a ValidationError.
    return UserInfo.model_validate_json(json_output)


# Example Usage
text1 = "My name is Alice and I love the color blue."
user_data1 = extract_user_info(text1)
print(f"User 1: Name={user_data1.name}, Favorite Color={user_data1.favorite_color}")

text2 = "Bob mentioned he's a big fan of green."
user_data2 = extract_user_info(text2)
print(f"User 2: Name={user_data2.name}, Favorite Color={user_data2.favorite_color}")

text3 = "The user's name is Carol."
user_data3 = extract_user_info(text3)
print(f"User 3: Name={user_data3.name}, Favorite Color={user_data3.favorite_color}")
The magic happens because setting response_format={"type": "json_object"} does more than ask the LLM to try to output JSON. It turns on JSON mode, in which the OpenAI API guarantees that a complete response parses as valid JSON. Two caveats apply: the word "JSON" must appear somewhere in your messages (the system prompt above takes care of that), and a response cut off by the token limit can still be incomplete JSON.
Then, you pair this with a Pydantic model. You give the LLM a description of your desired data structure, typically by including its JSON schema in the system prompt. JSON mode does not enforce that schema; the LLM treats it as an instruction and attempts to generate JSON that conforms to it.
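JSON mode’s guarantee has one documented exception: a response cut off by the token limit (finish_reason of "length") may contain incomplete JSON. A minimal sketch of a guard for that case follows. The helper name parse_json_mode_reply is hypothetical, and the string literals are hand-written stand-ins for response.choices[0].message.content and response.choices[0].finish_reason.

```python
import json


def parse_json_mode_reply(content: str, finish_reason: str) -> dict:
    """Parse a JSON-mode reply, refusing output that was cut off by the token limit."""
    if finish_reason == "length":
        # Truncated generations can end mid-object, so do not try to parse them.
        raise ValueError("response was truncated; raise max_tokens or shorten the input")
    return json.loads(content)


# Hand-written stand-ins for a real API response.
complete = parse_json_mode_reply('{"name": "Alice", "favorite_color": "blue"}', "stop")
print(complete["name"])

try:
    parse_json_mode_reply('{"name": "Ali', "length")
    truncated_rejected = False
except ValueError as exc:
    truncated_rejected = True
    print(f"Rejected: {exc}")
```

Checking finish_reason before parsing keeps the failure message specific, instead of surfacing as a generic JSON decoding error further down the line.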
The Pydantic model_validate_json method then takes the raw JSON string from the LLM and attempts to deserialize it into your Python object. Crucially, it also validates that the data types and structure match your Pydantic model. If the LLM generated JSON that looks like JSON but doesn’t have the right keys, or has keys with incorrect data types (e.g., favorite_color as a number instead of a string), Pydantic will raise a ValidationError. This gives you a robust way to ensure the data you get back is well-formed and correctly typed. It does not tell you whether the extracted values are factually right; that still depends on the model.
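To make the validation step concrete, here is a small sketch that runs three hand-written JSON strings through the same UserInfo model. The strings are illustrative stand-ins for LLM replies, not captured API output.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class UserInfo(BaseModel):
    name: str = Field(..., description="The full name of the user")
    favorite_color: Optional[str] = Field(None, description="The user's favorite color, if mentioned")


# Hand-written stand-ins for LLM replies; none of these came from a real API call.
replies = {
    "valid": '{"name": "Alice", "favorite_color": "blue"}',
    "missing name": '{"favorite_color": "blue"}',
    "wrong type": '{"name": "Bob", "favorite_color": 7}',
}

outcomes = {}
for label, raw in replies.items():
    try:
        UserInfo.model_validate_json(raw)
        outcomes[label] = "ok"
    except ValidationError as exc:
        # exc.errors() lists each failure with its location and error type.
        outcomes[label] = [(err["loc"], err["type"]) for err in exc.errors()]

print(outcomes)
```

Only the first reply validates. The other two are syntactically valid JSON, which is exactly the case JSON mode alone cannot catch.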
The core problem this solves is the inherent unreliability of LLMs when asked to produce structured data. Without this mechanism, you’d get varied responses: sometimes perfect JSON, sometimes a string that looks like JSON but has minor syntax errors (e.g., trailing commas), sometimes a conversational response that just happens to contain some JSON-like text, and sometimes no JSON at all. You’d then have to write brittle post-processing code to clean up the LLM’s output, which would constantly break as the LLM’s generation patterns subtly shift.
By using response_format={"type": "json_object"} and a Pydantic model, you split the work into three layers. The schema in the prompt steers the LLM toward the structure you want. The OpenAI API acts as a gatekeeper for syntax, ensuring that what passes through is valid JSON. Pydantic then acts as a final, deterministic validator in your application, catching any output that is valid JSON but does not match your schema.
The levers you control are primarily the Pydantic model itself and the system prompt. The Pydantic model defines the shape and types of the data you expect. The JSON schema derived from it (UserInfo.model_json_schema()) is what you provide to the LLM, often embedded in the system message. The LLM’s internal reasoning is then guided by this schema. You can add more fields, change types, add validation constraints (like Field(gt=0) for numbers), and use description fields within Pydantic to provide more context to the LLM about what each field represents, which improves its accuracy.
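To see what the LLM actually receives, it can help to print the generated schema. The sketch below uses a hypothetical OrderLine model (not part of the example above) with a Field(gt=0) constraint and descriptions, and shows that both end up in the schema.

```python
import json

from pydantic import BaseModel, Field


class OrderLine(BaseModel):  # hypothetical model for illustration
    product: str = Field(..., description="Name of the product as written in the text")
    quantity: int = Field(..., gt=0, description="Number of units ordered; must be positive")


schema = OrderLine.model_json_schema()
print(json.dumps(schema, indent=2))

# The constraint and the descriptions travel with the schema the LLM sees:
# gt=0 is expressed as the JSON Schema keyword "exclusiveMinimum".
quantity_schema = schema["properties"]["quantity"]
print(quantity_schema["exclusiveMinimum"], quantity_schema["description"])
```

Whatever the model does with the constraint, Pydantic still enforces it on the way back in, so a quantity of 0 in the reply would fail validation.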
The "magic" of the LLM adhering to the schema isn’t just about following instructions; it’s a consequence of the training data and the fine-tuning process. Models are trained on vast amounts of text, including code and structured data. When you provide a JSON schema, you’re essentially giving the LLM a very precise blueprint. For models like gpt-4o or gpt-3.5-turbo-0125 with native JSON mode, the API additionally guarantees the syntax, but conformance to your schema remains best-effort under json_object. OpenAI’s separate json_schema response format with strict mode is the option that enforces a schema at the API level, on models that support it.
One critical aspect often overlooked is how the LLM interprets the description fields in your Pydantic model. These aren’t just for human readability; they are a primary source of contextual information for the LLM. A well-crafted description can significantly improve the accuracy of the extracted data. For instance, if you have a status field, describing it as "The current status of the order, which can be one of 'pending', 'processing', 'shipped', or 'delivered'" is far more effective than just status: str. The LLM uses this to steer its output toward the allowed values, even if the input text is ambiguous. A description alone does not enforce them, though: with status: str, Pydantic accepts any string, so type the field as a Literal or Enum if other values must be rejected.
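If the allowed values need to be enforced and not only suggested, one option is to type the field as a Literal. The sketch below uses a hypothetical Order model; the allowed values then appear in the schema as an enum, and Pydantic rejects anything else.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class Order(BaseModel):  # hypothetical model for illustration
    status: Literal["pending", "processing", "shipped", "delivered"] = Field(
        ..., description="The current status of the order"
    )


# The allowed values are part of the schema the LLM is shown.
status_schema = Order.model_json_schema()["properties"]["status"]
print(status_schema["enum"])

# A value outside the allowed set is rejected at validation time.
try:
    Order.model_validate_json('{"status": "lost in transit"}')
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

The description and the type work together here: the description explains the field to the model, and the Literal guarantees your code never sees an unexpected status.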
The next step you’ll likely encounter is handling ValidationError exceptions gracefully when the LLM’s output, despite best efforts, doesn’t perfectly match your schema.
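As a preview, one common pattern is to retry a bounded number of times and give up cleanly. This is a minimal sketch, not a recommended production design: extract_with_retries and ask_llm are hypothetical names, and the LLM is replaced by a fake that returns canned strings so the example runs without an API key.

```python
from typing import Callable, Optional

from pydantic import BaseModel, Field, ValidationError


class UserInfo(BaseModel):
    name: str = Field(..., description="The full name of the user")
    favorite_color: Optional[str] = Field(None, description="The user's favorite color, if mentioned")


def extract_with_retries(
    ask_llm: Callable[[str], str], text: str, max_attempts: int = 3
) -> Optional[UserInfo]:
    """Return a validated UserInfo, or None if every attempt fails validation."""
    for _ in range(max_attempts):
        raw = ask_llm(text)
        try:
            return UserInfo.model_validate_json(raw)
        except ValidationError:
            continue  # try again; a real version might log the error first
    return None


# Fake LLM for demonstration: the first reply lacks the required name, the second is valid.
canned = iter(['{"favorite_color": "green"}', '{"name": "Bob", "favorite_color": "green"}'])
result = extract_with_retries(lambda _text: next(canned), "Bob mentioned he's a big fan of green.")
print(result)
```

In a real application, ask_llm would wrap the chat completion call, and returning None lets the caller decide whether to skip the record, fall back to a default, or surface an error.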