OpenAI’s o1 and o3 models aren’t just better versions of their predecessors; they represent a shift towards models that work through a task step by step before answering, rather than replying with the first plausible continuation.

Let’s see this in action. Imagine you want to extract structured data from messy, free-form text.

{
  "model": "o3",
  "reasoning_effort": "medium",
  "messages": [
    {"role": "system", "content": "You are a data extraction assistant. Extract the following information: Customer Name, Order ID, and Total Amount. Format the output as a JSON object."},
    {"role": "user", "content": "Hi, I'm Jane Doe. I placed an order yesterday, order #12345, and the total was $99.99. Can you confirm this?"}
  ]
}

Given this prompt, o3 at medium reasoning effort will typically produce something like:

{
  "Customer Name": "Jane Doe",
  "Order ID": "12345",
  "Total Amount": "$99.99"
}

This goes beyond keyword matching: the model picked up the intent of the request and the role each entity plays in the text, then used both to meet the structured output requirement.
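In practice you would not trust the reply blindly. Here is a minimal Python sketch of checking it, assuming the model returns the JSON as plain text, possibly wrapped in a Markdown code fence. The helper name parse_extraction and the REQUIRED_FIELDS tuple are made up for this illustration and are not part of any OpenAI library.

```python
import json

# Field names taken from the system prompt in the example above.
REQUIRED_FIELDS = ("Customer Name", "Order ID", "Total Amount")
FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the sample readable


def parse_extraction(reply_text: str) -> dict:
    """Parse the model's reply and check that every requested field is present."""
    text = reply_text.strip()
    # Models sometimes wrap JSON in a Markdown code fence; drop the fence lines if so.
    if text.startswith(FENCE):
        lines = text.splitlines()
        text = "\n".join(line for line in lines if not line.strip().startswith(FENCE))
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


reply = '{"Customer Name": "Jane Doe", "Order ID": "12345", "Total Amount": "$99.99"}'
order = parse_extraction(reply)
print(order["Order ID"])
```

Failing loudly on a missing field is a deliberate choice here: it is easier to retry a request than to debug a half-filled record downstream.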

The core problem these models address is the brittleness of traditional NLP pipelines. Previously, you needed separate components for entity recognition and intent classification, plus custom logic to stitch them together. Each component needed its own fine-tuning, and errors cascaded from one stage to the next. With o1 and o3, you can often achieve the same outcome with a single API call by providing clear instructions in the system prompt.
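To make the single-call idea concrete, here is a sketch that assembles the whole request body in one function. It only builds the payload; sending it with your HTTP client or SDK of choice is left out. The function name build_extraction_request is hypothetical, and the model ID passed in is an assumption you should replace with one available to you.

```python
import json


def build_extraction_request(text: str, fields: list[str], model: str) -> dict:
    """Assemble a Chat Completions-style request body for a one-call extraction."""
    instructions = (
        "You are a data extraction assistant. "
        f"Extract the following information: {', '.join(fields)}. "
        "Format the output as a JSON object."
    )
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": instructions},
            {"role": "user", "content": text},
        ],
    }


request_body = build_extraction_request(
    "Hi, I'm Jane Doe. I placed an order yesterday, order #12345, and the total was $99.99.",
    ["Customer Name", "Order ID", "Total Amount"],
    model="o3",  # assumption: substitute whichever reasoning model your account offers
)
print(json.dumps(request_body, indent=2))
```

Everything the old pipeline spread across several components now lives in the instructions string, so changing what gets extracted means editing the fields list rather than retraining anything.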

Internally, these models are built on transformer architectures. OpenAI has not published the architectural details, but it has said the models are trained with reinforcement learning to produce a chain of thought before answering. In practice, the model generates intermediate reasoning that does not appear in the final output, and this helps it follow multi-step instructions, perform logical deductions, and adhere to complex output formats. Think of it as giving the model a scratchpad to work out the problem before giving you the final answer.
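The scratchpad is visible in one place: the token accounting. As far as I understand the Chat Completions usage object, hidden reasoning tokens are counted inside completion_tokens and broken out under completion_tokens_details.reasoning_tokens. The sketch below separates the two; the helper name is hypothetical and the sample numbers are invented for illustration, not taken from a real response.

```python
def split_completion_tokens(usage: dict) -> tuple[int, int]:
    """Return (reasoning_tokens, visible_tokens) from a response's usage object."""
    total = usage["completion_tokens"]
    # The details block may be absent for models that do not report reasoning tokens.
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    return reasoning, total - reasoning


# Made-up numbers for illustration only, not a real API response.
sample_usage = {
    "prompt_tokens": 75,
    "completion_tokens": 420,
    "completion_tokens_details": {"reasoning_tokens": 384},
}
reasoning, visible = split_completion_tokens(sample_usage)
print(f"scratchpad: {reasoning} tokens, visible answer: {visible} tokens")
```

A large gap between the two numbers is normal for these models: most of the work happens before the first visible token.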

The primary lever you control is the system message (for o1 and later models, OpenAI’s API reference names this role developer, with the same purpose). This is where you define the model’s persona, its objective, and any constraints or desired output formats. For instance, you can specify:

  • Task Definition: "You are a customer support agent. Respond to user inquiries about product features."
  • Output Constraints: "Provide your answer as a bulleted list." or "Only respond with 'Yes' or 'No'."
  • Persona: "Act as a sarcastic but helpful chatbot."
  • Data Formatting: "Extract the date and time and present them in ISO 8601 format."
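These four ingredients combine naturally into one message. The following is a minimal sketch of doing that in Python; build_system_message and its parameter names are invented for this example, and the "system" role string is carried over from the request shown earlier.

```python
def build_system_message(task, persona=None, constraints=(), formatting=None) -> dict:
    """Join the pieces of a system prompt into a single chat message."""
    parts = [task]
    if persona:
        parts.append(persona)
    parts.extend(constraints)  # zero or more output constraints, in order
    if formatting:
        parts.append(formatting)
    return {"role": "system", "content": " ".join(parts)}


system_message = build_system_message(
    task="You are a customer support agent. Respond to user inquiries about product features.",
    persona="Act as a sarcastic but helpful chatbot.",
    constraints=["Provide your answer as a bulleted list."],
)
print(system_message["content"])
```

Keeping the pieces separate until the last moment makes it easy to vary one of them, say the persona, while holding the rest fixed when you compare outputs.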

The messages array, containing user and assistant turns, provides the context. The model uses this conversational history to understand the ongoing dialogue and refine its responses.
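Because the API itself is stateless, keeping that history is your job. Here is a small sketch of one way to do it; the Conversation class is hypothetical, and the assistant reply is a hard-coded stand-in for whatever the model would actually return.

```python
class Conversation:
    """Accumulates the messages array across turns."""

    def __init__(self, system_prompt: str) -> None:
        self.messages = [{"role": "system", "content": system_prompt}]

    def add_user(self, content: str) -> None:
        self.messages.append({"role": "user", "content": content})

    def add_assistant(self, content: str) -> None:
        # Store the model's reply so the next request includes it as context.
        self.messages.append({"role": "assistant", "content": content})


chat = Conversation("You are a data extraction assistant.")
chat.add_user("Hi, I'm Jane Doe. My order is #12345.")
chat.add_assistant('{"Customer Name": "Jane Doe", "Order ID": "12345"}')  # stand-in reply
chat.add_user("Add the total as well: it was $99.99.")

for message in chat.messages:
    print(message["role"], "->", message["content"])
```

The second user turn only makes sense because the first two are sent along with it; drop them and "add the total as well" has nothing to refer to.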

When using these models, it’s crucial to understand that "reasoning" doesn’t imply consciousness or understanding in the human sense. It’s a sophisticated form of pattern recognition and learned inference based on massive datasets. The model has learned to associate certain input structures and prompts with specific output structures and content, simulating a reasoning process. The quality of its "reasoning" depends heavily on the clarity and specificity of your prompt and on how well the training data covers the task. If the training data lacks examples of a particular type of reasoning, or if the prompt is ambiguous, the model will likely falter.

The next frontier you’ll encounter is multi-turn reasoning combined with tool use, where the model must decide which external API or function to call based on the user’s request.
