Reasoning models like OpenAI's o3, Google's Gemini Thinking, and Anthropic's Claude Extended Thinking respond differently to prompts than standard models. Vague problems waste expensive thinking tokens; precisely framed problems unlock dramatically better results. Here's how to prompt reasoning models effectively.
What Makes Reasoning Models Different
Standard AI models generate responses by predicting the next token. Reasoning models add a "thinking" phase — they generate internal chain-of-thought computation before producing a final answer. This thinking phase can involve thousands of additional tokens of intermediate reasoning, self-correction, and verification.
The result is substantially better performance on hard problems — but at 3–10x the cost and with much slower response times. Using reasoning models on simple tasks wastes money. Using them without understanding how they process prompts wastes their capability.
Key Differences from Standard Model Prompting
| Aspect | Standard Models | Reasoning Models |
|---|---|---|
| Chain-of-thought prompting | Helps ("think step by step") | Redundant — they do this automatically |
| Self-verification requests | Useful ("check your answer") | Redundant — they self-verify automatically |
| Problem specificity | Moderate specificity needed | High specificity required (every constraint matters) |
| Optimal prompt length | Concise is often better | Comprehensive upfront context is better |
| Response time | Fast (seconds) | Slow (10–120 seconds for hard problems) |
| Best for | Most everyday tasks | Hard math, logic, architecture, analysis |
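The cost gap in the table above comes mostly from thinking tokens, which providers typically bill at the output-token rate even though you may never see them. A rough back-of-envelope calculator (the per-token rates below are placeholders, not real prices):

```python
def estimate_cost(input_tokens, visible_output_tokens, thinking_tokens,
                  input_rate_per_m, output_rate_per_m):
    """Rough request cost in dollars.

    Thinking tokens are typically billed at the output rate, so a response
    with 8,000 hidden thinking tokens and a 500-token visible answer costs
    far more than the visible answer alone would suggest.
    Rates are per million tokens and are placeholder values.
    """
    input_cost = input_tokens / 1e6 * input_rate_per_m
    output_cost = (visible_output_tokens + thinking_tokens) / 1e6 * output_rate_per_m
    return input_cost + output_cost

# A hard problem: small prompt, large hidden reasoning trace.
print(estimate_cost(1_000, 500, 8_000,
                    input_rate_per_m=2.0, output_rate_per_m=8.0))
```

Run the same numbers with `thinking_tokens=0` to see why a reasoning model on a trivial task is mostly paying for reasoning you didn't need.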
The Core Rule: Specify Everything Upfront
Reasoning models reason about the problem you give them — if your problem statement is ambiguous, the model will reason about one interpretation and may miss the one you intended. Unlike standard models where you can clarify in follow-up messages at low cost, with reasoning models every misinterpretation wastes expensive thinking tokens.
Before sending a reasoning model prompt, ask yourself:
- Have I stated all constraints the solution must satisfy?
- Have I defined all terms that could be interpreted multiple ways?
- Have I specified the desired output format?
- Is there any background context the model needs to reason correctly?
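One way to enforce that checklist mechanically is to assemble the prompt from a structured spec and refuse to send it while required fields are empty. Everything below (the `ReasoningPromptSpec` class and its field names) is an illustrative sketch, not a real library API:

```python
from dataclasses import dataclass, field

@dataclass
class ReasoningPromptSpec:
    """Illustrative container mirroring the checklist above."""
    task: str
    constraints: list = field(default_factory=list)   # every hard requirement
    definitions: dict = field(default_factory=dict)   # ambiguous terms, pinned down
    output_format: str = ""                           # e.g. "JSON with keys ..."
    background: str = ""                              # context the model can't infer

    def render(self) -> str:
        # Refuse to produce an underspecified prompt.
        missing = []
        if not self.constraints:
            missing.append("constraints")
        if not self.output_format:
            missing.append("output_format")
        if missing:
            raise ValueError(f"prompt is underspecified: {missing}")

        lines = [self.task, "", "Constraints:"]
        lines += [f"({i}) {c}" for i, c in enumerate(self.constraints, 1)]
        if self.definitions:
            lines.append("Definitions:")
            lines += [f"- {term}: {meaning}" for term, meaning in self.definitions.items()]
        if self.background:
            lines += ["Background:", self.background]
        lines += ["Output format:", self.output_format]
        return "\n".join(lines)
```

The point is not the formatting; it is that an empty `constraints` list fails loudly before you spend any thinking tokens.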
Framing Problems for Extended Reasoning
Mathematical Problems
State the problem completely, including all constraints, variable definitions, and the specific quantity to solve for. Don't abbreviate: reasoning models work better with full problem statements. The same framing applies to mathematically flavored coding tasks, such as complexity optimization:
Strong prompt: "I need to optimize this Python function for time complexity. Current complexity is O(n²). Constraints: (1) input is a sorted array of integers, (2) must return all pairs summing to target k, (3) output order doesn't matter, (4) can use O(n) additional space. Find the most efficient algorithm and explain why it's optimal. [code]"
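For reference, the answer that prompt is fishing for is the classic two-pointer scan over a sorted array, which satisfies all four stated constraints. This is a sketch of that algorithm, not the model's actual output:

```python
def pairs_summing_to(arr, k):
    """Return all distinct value pairs (a, b), a <= b, with a + b == k.

    Assumes arr is sorted ascending (constraint 1). Two-pointer scan:
    O(n) time, and at most O(n) extra space for the output (constraint 4).
    Duplicate values are skipped so each value pair appears once; the
    prompt's "all pairs" is read as distinct value pairs.
    """
    pairs = []
    lo, hi = 0, len(arr) - 1
    while lo < hi:
        s = arr[lo] + arr[hi]
        if s == k:
            pairs.append((arr[lo], arr[hi]))
            lo += 1
            hi -= 1
            # Skip runs of equal values to avoid emitting the same pair twice.
            while lo < hi and arr[lo] == arr[lo - 1]:
                lo += 1
            while lo < hi and arr[hi] == arr[hi + 1]:
                hi -= 1
        elif s < k:
            lo += 1   # sum too small: advance the low pointer
        else:
            hi -= 1   # sum too large: retreat the high pointer
    return pairs

print(pairs_summing_to([1, 2, 3, 4, 5, 6], 7))  # → [(1, 6), (2, 5), (3, 4)]
```

Note how every numbered constraint in the prompt maps to a line of the implementation; that is exactly the correspondence you should look for in a reasoning model's answer.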
Logic and Reasoning Problems
Define all entities and relationships explicitly. Reasoning models can handle complex multi-entity reasoning, but they need the problem fully specified, not inferred from context.
Code Architecture Problems
Provide full context: scale requirements, existing technology constraints, team capabilities, and performance requirements. Reasoning models produce better architecture recommendations with comprehensive constraints than with underspecified requirements.
Strong prompt: "I'm building a SaaS application that needs to: (1) store 100M+ user events per day, (2) support real-time queries for the last 7 days of events, (3) support batch analytics queries on historical data, (4) run on AWS with a budget of $2,000/month for database infrastructure. We have 2 engineers with PostgreSQL experience but no NoSQL experience. What database architecture should we use? Consider time-series databases, event streaming solutions, and hybrid approaches. Justify your recommendation."
What NOT to Do with Reasoning Models
- Don't add "think step by step": this phrase exists to trigger chain-of-thought in standard models. Reasoning models do it automatically, so the instruction is redundant.
- Don't add "check your work" or "verify your answer": Same — built into the reasoning process.
- Don't use for simple tasks: Asking a reasoning model to write a quick email or summarize a short document wastes money. Use standard models for everyday tasks.
- Don't expect fast responses: Hard problems can take 30–120 seconds. Don't interrupt or retry before the model completes its reasoning.
- Don't iterate excessively: Each iteration costs thinking tokens. Get the prompt right before running it, rather than refining through repeated runs.
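The cost discipline in this list can be encoded as a simple router: escalate to the reasoning model only when the task trips one of the triggers discussed above. The keyword heuristics and model names here are illustrative placeholders, not a real routing policy:

```python
def pick_model(task_description, prior_standard_attempt_failed=False,
               error_is_costly=False):
    """Route a task to a reasoning or standard model (illustrative heuristic).

    Reasoning models earn their cost on multi-constraint, multi-step
    problems; everything else goes to a cheaper, faster standard model.
    """
    reasoning_signals = ("prove", "optimize", "architecture", "trade-off",
                         "constraints", "debug", "complexity")
    text = task_description.lower()
    hits = sum(signal in text for signal in reasoning_signals)
    if prior_standard_attempt_failed or error_is_costly or hits >= 2:
        return "reasoning-model"   # placeholder name, e.g. an o-series model
    return "standard-model"        # placeholder name, e.g. a GPT-4o-class model

print(pick_model("Write a friendly reminder email"))               # standard-model
print(pick_model("Optimize this query under memory constraints"))  # reasoning-model
```

In production you would replace keyword matching with something sturdier (a cheap classifier, or user intent flags), but the shape of the decision is the same.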
Evaluating Chain-of-Thought Outputs
When reasoning models expose their thinking process (Claude Extended Thinking shows a summary; DeepSeek R1 shows the full chain-of-thought), reading the reasoning is valuable:
- The model's stated reasoning should make sense given your problem constraints
- If the model correctly identifies your key constraints and addresses them in sequence, the answer is likely correct
- If the reasoning takes an unexpected path, investigate whether the model misunderstood a constraint
- For math problems, verify the key calculation steps even if the final answer looks right
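When the provider exposes the trace as structured content blocks (Anthropic's API, for example, returns "thinking" blocks alongside the final "text" blocks), a quick audit can flag reasoning that never mentions a stated constraint. The block shapes below are a simplified stand-in for a real provider response, not an exact API schema:

```python
def audit_reasoning(content_blocks, constraints):
    """Split a response into thinking vs. answer and check constraint coverage.

    content_blocks: list of dicts with a "type" key ("thinking" or "text"),
    a simplified stand-in for a real provider response schema.
    Returns (final_answer, constraints never mentioned in the trace).
    """
    thinking = " ".join(b.get("thinking", "") for b in content_blocks
                        if b.get("type") == "thinking")
    answer = " ".join(b.get("text", "") for b in content_blocks
                      if b.get("type") == "text")
    unaddressed = [c for c in constraints if c.lower() not in thinking.lower()]
    return answer, unaddressed

blocks = [
    {"type": "thinking", "thinking": "The array is sorted, so two pointers..."},
    {"type": "text", "text": "Use a two-pointer scan; O(n) time."},
]
answer, missed = audit_reasoning(blocks, ["sorted", "O(n) space"])
print(missed)  # → ['O(n) space']
```

A constraint the trace never mentions is not proof of an error, but it is exactly the "unexpected path" worth investigating before trusting the answer.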
When to Use Which Reasoning Model
| Model | Best For | Cost Level |
|---|---|---|
| o3 (OpenAI) | Hardest math, science, research problems | Very high (~8x GPT-4o) |
| o4-mini (OpenAI) | Math, coding, reasoning at reasonable cost | Medium (~2x GPT-4o) |
| DeepSeek R1 | Math and reasoning, cost-sensitive applications | Low (cheaper than GPT-4o) |
| Gemini 2.0 Thinking | Multimodal reasoning, science with visual data | Medium (~3x GPT-4o) |
| Claude Extended Thinking | Complex analysis, legal reasoning, transparent logic | Medium-High |
Frequently Asked Questions
How do I know if my task needs a reasoning model?
Use a reasoning model when: the task requires many sequential reasoning steps, getting it wrong would be costly, the problem has complex constraints that all must be satisfied simultaneously, or a standard model has failed on the task. For everything else, use a standard model.
Can reasoning models still give wrong answers?
Yes. Reasoning models make fewer errors than standard models on hard problems, but they still fail — especially on novel problems outside their training distribution, problems with ambiguous constraint specifications, and problems requiring real-world knowledge beyond training cutoff.
Is DeepSeek R1 as good as o3?
On mathematical benchmarks, DeepSeek R1 matches o3. On broader science and reasoning benchmarks, o3 leads. For math-heavy tasks, and where DeepSeek's data residency and governance posture is acceptable for your use case, DeepSeek R1 offers near-o3 quality at dramatically lower cost.
Should I always wait for the full reasoning before reading the answer?
Yes. Reasoning models sometimes revise their answers partway through the thinking process. Reading the response before the model completes its reasoning can give you an intermediate, potentially incorrect answer. Wait for the full response.