Chain-of-Thought Prompting: Make LLMs Show Their Work (2026)

Chain-of-Thought (CoT) prompting is the secret sauce that makes Large Language Models (LLMs) surprisingly good at tasks requiring multi-step reasoning. It works not by giving them more knowledge, but by getting them to show their work.

Let’s see it in action. Imagine we have a simple math problem:

Prompt: "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now? A:"

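Without CoT, an LLM might just blurt out an answer without working through the steps, and that answer can be wrong. A plausible slip is 7 (5 + 2), which treats each can as a single ball.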

Now, let’s try CoT by simply adding "Let’s think step by step." to the prompt:

Prompt: "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now? A: Let’s think step by step."

The LLM’s output becomes: Roger started with 5 tennis balls. He bought 2 cans of tennis balls. Each can has 3 tennis balls, so he bought 2 * 3 = 6 tennis balls. In total, he now has 5 + 6 = 11 tennis balls. The final answer is 11.

Notice how it breaks down the problem: identifying the initial state, calculating the new additions, and then combining them. This isn’t magic; it’s a structured way of eliciting reasoning.
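To make the zero-shot setup concrete, here is a minimal Python sketch of the two string-handling pieces around the model call: building the prompt and pulling the final answer out of the reasoning. The helper names `build_zero_shot_cot_prompt` and `extract_final_answer` are hypothetical. The model call itself is left out because it depends on which API you use. The extraction assumes the model ends with a "The final answer is N." sentence, as in the output above.

```python
import re
from typing import Optional

# Same trigger phrase as in the zero-shot example above.
COT_TRIGGER = "Let's think step by step."

# Assumption: the reasoning ends with a sentence like "The final answer is 11."
FINAL_ANSWER_RE = re.compile(r"The final answer is\s+(-?\d+(?:\.\d+)?)")


def build_zero_shot_cot_prompt(question: str) -> str:
    """Format a question in the Q:/A: style and append the CoT trigger."""
    return f"Q: {question} A: {COT_TRIGGER}"


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the number from the last 'The final answer is N' sentence, or None."""
    matches = FINAL_ANSWER_RE.findall(completion)
    return matches[-1] if matches else None


question = (
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?"
)
prompt = build_zero_shot_cot_prompt(question)

# Stand-in for the model's reply (the actual API call is omitted).
completion = (
    "Roger started with 5 tennis balls. He bought 2 cans of tennis balls. "
    "Each can has 3 tennis balls, so he bought 2 * 3 = 6 tennis balls. "
    "In total, he now has 5 + 6 = 11 tennis balls. The final answer is 11."
)
print(extract_final_answer(completion))  # prints 11
```

Taking the *last* match is a deliberate choice. If the model restates or corrects itself partway through, the closing statement is usually the one it commits to.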

The core problem CoT addresses is the LLM’s tendency to jump to conclusions. LLMs are fundamentally pattern-matching machines trained on vast amounts of text. When faced with a complex query, they may latch onto a superficial pattern that looks like the answer without actually performing the underlying logical operations. CoT makes the model generate intermediate reasoning steps before the answer, and step-by-step explanations like these are common in the kind of text models are trained on. Because the model generates text one token at a time, every step it writes becomes part of the context for the next one. In the example above, the intermediate result 2 * 3 = 6 is already on the page when the model computes the total. This is similar to how a person solves a problem by writing down partial results instead of doing everything in their head.

The main lever you control is the prompt itself, and there are a few common ways to use it for CoT:

Zero-shot CoT: As shown above, simply appending "Let’s think step by step." to the end of your question prompts the model to generate reasoning before answering, with no worked examples needed. This is the easiest method and is often surprisingly effective.
Few-shot CoT: You provide a few worked examples within the prompt that demonstrate the desired step-by-step reasoning. This gives the model more explicit guidance about both the reasoning style and the answer format.

Prompt: `Q: John has 3 apples. He gives 1 to Mary. How many does he have left? A: John started with 3 apples. He gave 1 away. So he has 3 - 1 = 2 apples left. The final answer is 2.

Q: Sarah has 5 cookies. She bakes 2 more. How many does she have now? A: Sarah started with 5 cookies. She baked 2 more. So she has 5 + 2 = 7 cookies now. The final answer is 7.

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now? A:`

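The LLM will then typically follow the pattern established by the examples: it writes out its reasoning for Roger’s problem and ideally ends with "The final answer is 11."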
Self-Consistency: This is a more advanced technique that builds on CoT. Instead of taking only the first reasoning path the LLM generates, you sample several reasoning paths from the same CoT prompt (zero-shot or few-shot) using a nonzero temperature, so that the runs actually differ. You then extract the final answer from each path and take a majority vote. This can substantially improve accuracy, especially on arithmetic and logic problems. The intuition is that a correct answer can usually be reached by several different lines of reasoning, while mistakes tend to scatter across different wrong answers. As a result, the most frequent answer is more likely to be right.
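The voting step of self-consistency is simple to sketch. The Python below is a minimal illustration, not a reference implementation. The completions are hard-coded stand-ins for several sampled runs, and the sampling call is omitted because it depends on your model API. The helper names are hypothetical. The sketch again assumes each path ends with "The final answer is N."

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumption: each reasoning path ends with "The final answer is N."
FINAL_ANSWER_RE = re.compile(r"The final answer is\s+(-?\d+(?:\.\d+)?)")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the number from the last 'The final answer is N' sentence, or None."""
    matches = FINAL_ANSWER_RE.findall(completion)
    return matches[-1] if matches else None


def self_consistency_vote(completions: Iterable[str]) -> Optional[str]:
    """Majority vote over the final answers of several sampled CoT completions."""
    answers = [extract_final_answer(c) for c in completions]
    answers = [a for a in answers if a is not None]  # drop unparsable paths
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Hypothetical samples standing in for several temperature > 0 runs of one prompt.
samples = [
    "He bought 2 * 3 = 6 balls, so 5 + 6 = 11. The final answer is 11.",
    "5 + 2 = 7 balls. The final answer is 7.",
    "2 cans of 3 balls is 6; 5 + 6 = 11. The final answer is 11.",
]
print(self_consistency_vote(samples))  # prints 11
```

Paths that never state a final answer are dropped rather than counted. Ties go to whichever answer appeared first, because `Counter.most_common` orders equal counts by first occurrence.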

The power of CoT lies in its ability to decompose complex problems into simpler, solvable sub-problems, allowing the LLM to leverage its pattern-matching capabilities more effectively for logical deduction rather than direct recall. It’s not about injecting new knowledge, but about structuring the application of existing knowledge.

When using few-shot CoT, the exact formatting of your examples matters. The LLM learns the structure of the reasoning, so a consistent format for introducing the problem, stating the intermediate steps, and presenting the final answer is crucial. If you use bullet points for steps in one example and a narrative paragraph in another, the model might struggle to generalize.
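One way to keep exemplar formatting consistent is to avoid writing exemplars by hand and instead render every one from the same template. The Python sketch below assumes a hypothetical `Exemplar` record holding a question, a list of reasoning sentences, and an answer. With the two exemplars from earlier, it reproduces the few-shot prompt shown above.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Exemplar:
    question: str
    steps: List[str]  # intermediate reasoning sentences, in order
    answer: str


def format_exemplar(ex: Exemplar) -> str:
    """Render every exemplar with the same layout: Q, narrative steps, final-answer line."""
    reasoning = " ".join(ex.steps)
    return f"Q: {ex.question} A: {reasoning} The final answer is {ex.answer}."


def build_few_shot_cot_prompt(exemplars: Sequence[Exemplar], question: str) -> str:
    """Join the formatted exemplars, then leave the target question open after 'A:'."""
    blocks = [format_exemplar(ex) for ex in exemplars]
    blocks.append(f"Q: {question} A:")
    return "\n\n".join(blocks)


exemplars = [
    Exemplar(
        "John has 3 apples. He gives 1 to Mary. How many does he have left?",
        ["John started with 3 apples.", "He gave 1 away.",
         "So he has 3 - 1 = 2 apples left."],
        "2",
    ),
    Exemplar(
        "Sarah has 5 cookies. She bakes 2 more. How many does she have now?",
        ["Sarah started with 5 cookies.", "She baked 2 more.",
         "So she has 5 + 2 = 7 cookies now."],
        "7",
    ),
]
prompt = build_few_shot_cot_prompt(
    exemplars,
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?",
)
print(prompt)
```

Because every exemplar passes through `format_exemplar`, switching the whole prompt from narrative sentences to bullet-style steps means changing one function, not editing each example.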

The next frontier is understanding how to automatically extract and verify these generated reasoning steps, moving beyond simple answer aggregation to true programmatic verification of LLM logic.
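As a small taste of what step-level verification could look like, the Python sketch below recomputes simple arithmetic steps found in a reasoning trace. It is a toy that rests on strong assumptions. The hypothetical `check_arithmetic_steps` helper only recognizes single binary operations on integers written as `a op b = c`, and it silently skips any reasoning it cannot parse.

```python
import re
from typing import List, Tuple

# Matches single binary integer steps such as "2 * 3 = 6" or "5 + 6 = 11".
STEP_RE = re.compile(r"(-?\d+)\s*([+\-*/])\s*(-?\d+)\s*=\s*(-?\d+)")


def check_arithmetic_steps(reasoning: str) -> List[Tuple[str, bool]]:
    """Recompute each 'a op b = c' step and report whether the claimed c is right."""
    results = []
    for a, op, b, c in STEP_RE.findall(reasoning):
        x, y, claimed = int(a), int(b), int(c)
        if op == "+":
            actual = x + y
        elif op == "-":
            actual = x - y
        elif op == "*":
            actual = x * y
        else:
            # Division by zero can never match a claimed result.
            actual = x / y if y != 0 else None
        results.append((f"{a} {op} {b} = {c}", actual == claimed))
    return results


reasoning = (
    "Each can has 3 tennis balls, so he bought 2 * 3 = 6 tennis balls. "
    "In total, he now has 5 + 6 = 11 tennis balls. The final answer is 11."
)
for step, ok in check_arithmetic_steps(reasoning):
    print(step, "OK" if ok else "MISMATCH")
# prints:
# 2 * 3 = 6 OK
# 5 + 6 = 11 OK
```

Even this crude check catches a trace such as "So he has 5 + 2 = 8 balls." It cannot tell whether a correctly computed step was the right step to take. That gap is what makes verifying reasoning harder than verifying arithmetic.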