Reasoning models like OpenAI's o3, Google's Gemini Thinking, and Anthropic's Claude Extended Thinking respond differently to prompts than standard models. Vague problems waste expensive thinking tokens; precisely framed problems unlock dramatically better results. Here's how to prompt reasoning models effectively.
What Makes Reasoning Models Different
Standard AI models generate responses by predicting the next token. Reasoning models add a "thinking" phase — they generate internal chain-of-thought computation before producing a final answer. This thinking phase can involve thousands of additional tokens of intermediate reasoning, self-correction, and verification.
The result is substantially better performance on hard problems — but at 3–10x the cost and with much slower response times. Using reasoning models on simple tasks wastes money. Using them without understanding how they process prompts wastes their capability.
Key Differences from Standard Model Prompting
| Aspect | Standard Models | Reasoning Models |
|---|---|---|
| Chain-of-thought prompting | Helps ("think step by step") | Redundant — they do this automatically |
| Self-verification requests | Useful ("check your answer") | Redundant — they self-verify automatically |
| Problem specificity | Moderate specificity needed | High specificity required (every constraint matters) |
| Optimal prompt length | Concise is often better | Comprehensive upfront context is better |
| Response time | Fast (seconds) | Slow (10–120 seconds for hard problems) |
| Best for | Most everyday tasks | Hard math, logic, architecture, analysis |
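The cost gap in the table above comes mostly from thinking tokens, which providers typically bill at the output-token rate even though you may never see them. A rough back-of-envelope calculator (the per-token rates below are placeholders, not real prices):

```python
def estimate_cost(input_tokens, visible_output_tokens, thinking_tokens,
                  input_rate_per_m, output_rate_per_m):
    """Rough request cost in dollars.

    Thinking tokens are typically billed at the output rate, so a response
    with 8,000 hidden thinking tokens and a 500-token visible answer costs
    far more than the visible answer alone would suggest.
    Rates are per million tokens and are placeholder values.
    """
    input_cost = input_tokens / 1e6 * input_rate_per_m
    output_cost = (visible_output_tokens + thinking_tokens) / 1e6 * output_rate_per_m
    return input_cost + output_cost

# A hard problem: small prompt, large hidden reasoning trace.
print(estimate_cost(1_000, 500, 8_000,
                    input_rate_per_m=2.0, output_rate_per_m=8.0))
```

Run the same numbers with `thinking_tokens=0` to see why a reasoning model on a trivial task is mostly paying for reasoning you didn't need.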
The Core Rule: Specify Everything Upfront
Reasoning models reason about the problem you give them — if your problem statement is ambiguous, the model will reason about one interpretation and may miss the one you intended. Unlike standard models where you can clarify in follow-up messages at low cost, with reasoning models every misinterpretation wastes expensive thinking tokens.
Before sending a reasoning model prompt, ask yourself:
- Have I stated all constraints the solution must satisfy?
- Have I defined all terms that could be interpreted multiple ways?
- Have I specified the desired output format?
- Is there any background context the model needs to reason correctly?
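One way to enforce that checklist mechanically is to assemble the prompt from a structured spec and refuse to send it while required fields are empty. Everything below (the `ReasoningPromptSpec` class and its field names) is an illustrative sketch, not a real library API:

```python
from dataclasses import dataclass, field

@dataclass
class ReasoningPromptSpec:
    """Illustrative container mirroring the checklist above."""
    task: str
    constraints: list = field(default_factory=list)   # every hard requirement
    definitions: dict = field(default_factory=dict)   # ambiguous terms, pinned down
    output_format: str = ""                           # e.g. "JSON with keys ..."
    background: str = ""                              # context the model can't infer

    def render(self) -> str:
        # Refuse to produce an underspecified prompt.
        missing = []
        if not self.constraints:
            missing.append("constraints")
        if not self.output_format:
            missing.append("output_format")
        if missing:
            raise ValueError(f"prompt is underspecified: {missing}")

        lines = [self.task, "", "Constraints:"]
        lines += [f"({i}) {c}" for i, c in enumerate(self.constraints, 1)]
        if self.definitions:
            lines.append("Definitions:")
            lines += [f"- {term}: {meaning}" for term, meaning in self.definitions.items()]
        if self.background:
            lines += ["Background:", self.background]
        lines += ["Output format:", self.output_format]
        return "\n".join(lines)
```

The point is not the formatting; it is that an empty `constraints` list fails loudly before you spend any thinking tokens.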
Framing Problems for Extended Reasoning
Mathematical Problems
State the problem completely, including all constraints, variable definitions, and the specific quantity to solve for. Don't abbreviate: reasoning models work better with full problem statements. The same framing applies to mathematically flavored coding tasks, such as complexity optimization:
Strong prompt: "I need to optimize this Python function for time complexity. Current complexity is O(n²). Constraints: (1) input is a sorted array of integers, (2) must return all pairs summing to target k, (3) output order doesn't matter, (4) can use O(n) additional space. Find the most efficient algorithm and explain why it's optimal. [code]"
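For reference, the answer that prompt is fishing for is the classic two-pointer scan over a sorted array, which satisfies all four stated constraints. This is a sketch of that algorithm, not the model's actual output:

```python
def pairs_summing_to(arr, k):
    """Return all distinct value pairs (a, b), a <= b, with a + b == k.

    Assumes arr is sorted ascending (constraint 1). Two-pointer scan:
    O(n) time, and at most O(n) extra space for the output (constraint 4).
    Duplicate values are skipped so each value pair appears once; the
    prompt's "all pairs" is read as distinct value pairs.
    """
    pairs = []
    lo, hi = 0, len(arr) - 1
    while lo < hi:
        s = arr[lo] + arr[hi]
        if s == k:
            pairs.append((arr[lo], arr[hi]))
            lo += 1
            hi -= 1
            # Skip runs of equal values to avoid emitting the same pair twice.
            while lo < hi and arr[lo] == arr[lo - 1]:
                lo += 1
            while lo < hi and arr[hi] == arr[hi + 1]:
                hi -= 1
        elif s < k:
            lo += 1   # sum too small: advance the low pointer
        else:
            hi -= 1   # sum too large: retreat the high pointer
    return pairs

print(pairs_summing_to([1, 2, 3, 4, 5, 6], 7))  # → [(1, 6), (2, 5), (3, 4)]
```

Note how every numbered constraint in the prompt maps to a line of the implementation; that is exactly the correspondence you should look for in a reasoning model's answer.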
Logic and Reasoning Problems
Define all entities and relationships explicitly. Reasoning models can handle complex multi-entity reasoning, but they need the problem fully specified, not inferred from context.
Code Architecture Problems
Provide full context: scale requirements, existing technology constraints, team capabilities, and performance requirements. Reasoning models produce better architecture recommendations with comprehensive constraints than with underspecified requirements.
Strong prompt: "I'm building a SaaS application that needs to: (1) store 100M+ user events per day, (2) support real-time queries for the last 7 days of events, (3) support batch analytics queries on historical data, (4) run on AWS with a budget of $2,000/month for database infrastructure. We have 2 engineers with PostgreSQL experience but no NoSQL experience. What database architecture should we use? Consider time-series databases, event streaming solutions, and hybrid approaches. Justify your recommendation."
What NOT to Do with Reasoning Models
- Don't add "think step by step": this phrase exists to trigger chain-of-thought in standard models. Reasoning models do it automatically, so the instruction is redundant.
- Don't add "check your work" or "verify your answer": Same — built into the reasoning process.
- Don't use for simple tasks: Asking a reasoning model to write a quick email or summarize a short document wastes money. Use standard models for everyday tasks.
- Don't expect fast responses: Hard problems can take 30–120 seconds. Don't interrupt or retry before the model completes its reasoning.
- Don't iterate excessively: Each iteration costs thinking tokens. Get the prompt right before running it, rather than refining through repeated runs.
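The cost discipline in this list can be encoded as a simple router: escalate to the reasoning model only when the task trips one of the triggers discussed above. The keyword heuristics and model names here are illustrative placeholders, not a real routing policy:

```python
def pick_model(task_description, prior_standard_attempt_failed=False,
               error_is_costly=False):
    """Route a task to a reasoning or standard model (illustrative heuristic).

    Reasoning models earn their cost on multi-constraint, multi-step
    problems; everything else goes to a cheaper, faster standard model.
    """
    reasoning_signals = ("prove", "optimize", "architecture", "trade-off",
                         "constraints", "debug", "complexity")
    text = task_description.lower()
    hits = sum(signal in text for signal in reasoning_signals)
    if prior_standard_attempt_failed or error_is_costly or hits >= 2:
        return "reasoning-model"   # placeholder name, e.g. an o-series model
    return "standard-model"        # placeholder name, e.g. a GPT-4o-class model

print(pick_model("Write a friendly reminder email"))               # standard-model
print(pick_model("Optimize this query under memory constraints"))  # reasoning-model
```

In production you would replace keyword matching with something sturdier (a cheap classifier, or user intent flags), but the shape of the decision is the same.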
Evaluating Chain-of-Thought Outputs
When reasoning models expose their thinking process (Claude Extended Thinking shows a summary; DeepSeek R1 shows the full chain-of-thought), reading the reasoning is valuable:
- The model's stated reasoning should make sense given your problem constraints
- If the model correctly identifies your key constraints and addresses them in sequence, the answer is likely correct
- If the reasoning takes an unexpected path, investigate whether the model misunderstood a constraint
- For math problems, verify the key calculation steps even if the final answer looks right
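When the provider exposes the trace as structured content blocks (Anthropic's API, for example, returns "thinking" blocks alongside the final "text" blocks), a quick audit can flag reasoning that never mentions a stated constraint. The block shapes below are a simplified stand-in for a real provider response, not an exact API schema:

```python
def audit_reasoning(content_blocks, constraints):
    """Split a response into thinking vs. answer and check constraint coverage.

    content_blocks: list of dicts with a "type" key ("thinking" or "text"),
    a simplified stand-in for a real provider response schema.
    Returns (final_answer, constraints never mentioned in the trace).
    """
    thinking = " ".join(b.get("thinking", "") for b in content_blocks
                        if b.get("type") == "thinking")
    answer = " ".join(b.get("text", "") for b in content_blocks
                      if b.get("type") == "text")
    unaddressed = [c for c in constraints if c.lower() not in thinking.lower()]
    return answer, unaddressed

blocks = [
    {"type": "thinking", "thinking": "The array is sorted, so two pointers..."},
    {"type": "text", "text": "Use a two-pointer scan; O(n) time."},
]
answer, missed = audit_reasoning(blocks, ["sorted", "O(n) space"])
print(missed)  # → ['O(n) space']
```

A constraint the trace never mentions is not proof of an error, but it is exactly the "unexpected path" worth investigating before trusting the answer.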
When to Use Which Reasoning Model
| Model | Best For | Cost Level |
|---|---|---|
| o3 (OpenAI) | Hardest math, science, research problems | Very high (~8x GPT-4o) |
| o4-mini (OpenAI) | Math, coding, reasoning at reasonable cost | Medium (~2x GPT-4o) |
| DeepSeek R1 | Math and reasoning, cost-sensitive applications | Low (cheaper than GPT-4o) |
| Gemini 2.0 Thinking | Multimodal reasoning, science with visual data | Medium (~3x GPT-4o) |
| Claude Extended Thinking | Complex analysis, legal reasoning, transparent logic | Medium-High |
Frequently Asked Questions
How do I know if my task needs a reasoning model?
Use a reasoning model when: the task requires many sequential reasoning steps, getting it wrong would be costly, the problem has complex constraints that all must be satisfied simultaneously, or a standard model has failed on the task. For everything else, use a standard model.
Can reasoning models still give wrong answers?
Yes. Reasoning models make fewer errors than standard models on hard problems, but they still fail — especially on novel problems outside their training distribution, problems with ambiguous constraint specifications, and problems requiring real-world knowledge beyond training cutoff.
Is DeepSeek R1 as good as o3?
On mathematical benchmarks, DeepSeek R1 matches o3. On broader science and reasoning benchmarks, o3 leads. For math-heavy tasks, and where DeepSeek's data residency and governance posture is acceptable for your use case, DeepSeek R1 offers near-o3 quality at dramatically lower cost.
Should I always wait for the full reasoning before reading the answer?
Yes. Reasoning models sometimes revise their answers partway through the thinking process. Reading the response before the model completes its reasoning can give you an intermediate, potentially incorrect answer. Wait for the full response.