GPT-5 Review: What's New, What's Better, and What to Know

GPT-5 brings significant capability improvements over GPT-4o across reasoning, coding, and multimodal tasks. We tested it thoroughly and compared it with Claude 4 and Gemini 2.5.

Travis Johnson

Founder, Deepest

January 23, 2026 · 12 min read

GPT-5 is OpenAI's most capable model, released in spring 2025, and represents a substantial capability jump over GPT-4o. Among the models compared here, it posts the top scores on MMLU, HumanEval, and MT-Bench, and sits within a point of Gemini 2.5 Ultra on MATH and GPQA. It also shows strong multimodal performance. The higher price point, however, makes it best suited for demanding professional and enterprise applications.

Note: This review covers GPT-5 as released in spring 2025. OpenAI updates models periodically; check OpenAI's documentation for current pricing and specifications.

GPT-5 Benchmark Performance

Benchmark | GPT-5 | GPT-4o | Claude 4 Opus | Gemini 2.5 Ultra
MMLU | 92.1% | 87.2% | 89.4% | 91.8%
HumanEval | 95.3% | 90.2% | 91.2% | 88.5%
MATH | 91.4% | 76.6% | 84.5% | 92.1%
GPQA | 75.8% | 53.6% | 70.1% | 76.2%
MT-Bench | 9.4 | 9.0 | 9.3 | 9.3

What's New in GPT-5

Substantially Better Reasoning

GPT-5's most notable improvement over GPT-4o is multi-step reasoning. Complex problems that required multiple retries or careful prompting with GPT-4o are handled more reliably by GPT-5 on the first attempt. Its 91.4% MATH score is roughly 15 percentage points above GPT-4o's 76.6%, reflecting real gains in structured reasoning.

Unlike o3, GPT-5 achieves this without extended chain-of-thought computation — it's a faster, more efficient reasoning improvement rather than inference-time scaling.
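
The gains quoted here can be checked directly against the benchmark table. The sketch below recomputes them from the published scores; the function names are our own, and the "error reduction" figure is an illustrative framing (the share of GPT-4o's missed questions that GPT-5 gets right), not a metric reported by OpenAI.

```python
# Scores copied from the benchmark table: (GPT-5, GPT-4o), in percent.
SCORES = {
    "MMLU": (92.1, 87.2),
    "HumanEval": (95.3, 90.2),
    "MATH": (91.4, 76.6),
    "GPQA": (75.8, 53.6),
}


def point_gain(gpt5: float, gpt4o: float) -> float:
    """Absolute gain in percentage points."""
    return round(gpt5 - gpt4o, 1)


def error_reduction(gpt5: float, gpt4o: float) -> float:
    """Percent of GPT-4o's errors eliminated (illustrative framing only)."""
    return round((gpt5 - gpt4o) / (100 - gpt4o) * 100, 1)


for name, (gpt5, gpt4o) in SCORES.items():
    print(f"{name}: +{point_gain(gpt5, gpt4o)} points, "
          f"{error_reduction(gpt5, gpt4o)}% fewer errors")
```

Reading gains as percentage points rather than relative percentages matters most on benchmarks that are close to saturated, where a small point gain can still remove a large share of the remaining errors.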

Significantly Better Coding

GPT-5's 95.3% HumanEval score places it among the best coding models available — second only to o3 (96.7%) among models we've tested. Real-world coding improvements are most visible in:

  • Architectural suggestions for complex systems
  • Better understanding of library-specific patterns and idioms
  • More accurate debugging on multi-function bugs
  • Stronger TypeScript and React code generation

Reduced Hallucination Rate

OpenAI's own reports and independent testing both indicate that GPT-5 has a meaningfully lower hallucination rate than GPT-4o. In our testing, factual accuracy on direct questions improved by approximately 15–20%. Citation accuracy (when the model is asked to cite sources for specific claims) also improved substantially.

Improved Instruction Following

GPT-5 follows complex, multi-part instructions more reliably than GPT-4o. The gap with Claude 3.5 Sonnet — which was previously better at instruction adherence — has narrowed significantly. GPT-5 now honors multi-constraint prompts almost as reliably as Claude.

GPT-5 vs GPT-4o: When to Upgrade

GPT-5 is better than GPT-4o across the board, but at roughly 3x the API cost. The upgrade is worth it for:

  • Complex coding tasks requiring architectural judgment
  • Research tasks where accuracy is critical
  • Long-form analysis requiring sustained reasoning quality
  • Tasks where GPT-4o frequently fails or requires multiple retries

For everyday queries, summarization, and simple writing, GPT-4o remains cost-effective. Don't default to GPT-5 for tasks where GPT-4o already performs well.
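
One way to apply this guidance in an application is a small routing function that sends only the demanding work to GPT-5. The following is a minimal sketch, not an OpenAI feature: the task category names, the model labels, and the 20% retry threshold are all hypothetical choices made for illustration.

```python
# Task categories where the upgrade is likely to pay off.
# These names are illustrative labels, not an official taxonomy.
UPGRADE_TASKS = {"complex_coding", "research", "long_form_analysis"}


def choose_model(task_type: str,
                 gpt4o_retry_rate: float = 0.0,
                 retry_threshold: float = 0.2) -> str:
    """Return a model label for a request.

    task_type: caller-supplied category for the request.
    gpt4o_retry_rate: observed fraction of GPT-4o calls for this task
        that failed or needed a retry (0.0 to 1.0).
    retry_threshold: assumed cutoff above which GPT-4o is considered
        unreliable for the task; tune it against your own logs.
    """
    if task_type in UPGRADE_TASKS or gpt4o_retry_rate >= retry_threshold:
        return "gpt-5"
    return "gpt-4o"


print(choose_model("summarization"))                        # cheap default
print(choose_model("complex_coding"))                       # upgrade by task type
print(choose_model("summarization", gpt4o_retry_rate=0.5))  # upgrade by failure rate
```

Routing on an observed retry rate keeps the default cheap and upgrades only the tasks where GPT-4o is measurably falling short.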

GPT-5 vs Claude 4 Opus

Both are frontier models. GPT-5 leads on coding and general knowledge benchmarks. Claude 4 Opus leads on instruction following precision and long-form writing quality. On GPQA, GPT-5 scores about six points higher (75.8% vs 70.1%), a modest edge in scientific reasoning.

The practical choice often depends on use case: for writing and precise instruction adherence, Claude 4 Opus. For coding, research breadth, and versatility, GPT-5.

Pricing

Model | Input (per M tokens) | Output (per M tokens)
GPT-5 | $7.50 | $30.00
GPT-4o | $2.50 | $10.00
GPT-4o mini | $0.15 | $0.60

GPT-5 costs 3x as much as GPT-4o per token, for both input and output. That places it between GPT-4o and Claude 4 Opus ($15/M input tokens), which is reasonable for frontier capability. For high-volume applications, the cost difference from GPT-4o is significant; budget accordingly.
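
To see what that difference means for a budget, the sketch below turns the per-token prices from the table into a cost estimate. The dictionary keys are informal labels rather than confirmed API model identifiers, and the monthly workload (50M input tokens, 10M output tokens) is a hypothetical example.

```python
# Prices in USD per million tokens, copied from the pricing table.
# Keys are informal labels, not necessarily the API model IDs.
PRICES = {
    "gpt-5": {"input": 7.50, "output": 30.00},
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for the given token volume, rounded to cents."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return round(total / 1_000_000, 2)


# Hypothetical monthly workload: 50M input tokens, 10M output tokens.
monthly_in, monthly_out = 50_000_000, 10_000_000
for model in PRICES:
    print(f"{model}: ${estimate_cost(model, monthly_in, monthly_out):,.2f}")
```

At that volume the estimate comes to $675 for GPT-5 against $225 for GPT-4o, so the 3x multiplier carries straight through to the monthly bill.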

Multimodal Improvements

GPT-5 processes images, audio, and text with improved accuracy compared to GPT-4o. Image understanding is more precise — particularly for complex diagrams, scientific figures, and technical documentation. Audio transcription accuracy improved meaningfully.

Video understanding remains an area where Gemini leads — GPT-5 can process images but not native video.

Context Window

GPT-5 supports a 128,000-token context window, the same as GPT-4o. This is sufficient for most tasks but falls behind Claude (200K) and Gemini (1M) for very long document work. If context length is your primary constraint, neither GPT model is the right choice.
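
A quick pre-flight check can tell you which of these models a long document will fit into. This is a rough sketch using the context sizes quoted in this section; the model labels, the function name, and the 4,000-token output reserve are assumptions, and the token count itself has to come from whatever tokenizer your provider uses.

```python
# Context window sizes in tokens, as quoted in this section.
# Labels are informal, not API model identifiers.
CONTEXT_LIMITS = {
    "gpt-5": 128_000,
    "gpt-4o": 128_000,
    "claude": 200_000,
    "gemini": 1_000_000,
}


def models_that_fit(prompt_tokens: int,
                    reserved_output_tokens: int = 4_000) -> list[str]:
    """Return the models whose context window holds the prompt plus
    an assumed reserve for the model's reply."""
    needed = prompt_tokens + reserved_output_tokens
    return [name for name, limit in CONTEXT_LIMITS.items() if needed <= limit]


print(models_that_fit(100_000))  # fits every model listed
print(models_that_fit(150_000))  # too long for the GPT models
print(models_that_fit(500_000))  # only the 1M-token window remains
```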

Real-World Task Performance Summary

Task | GPT-5 Score | vs GPT-4o
Complex coding | Excellent | +15%
Mathematical reasoning | Excellent | +15%
Scientific analysis | Excellent | +22%
Long-form writing | Very good | +8%
Instruction following | Excellent | +10%
Image understanding | Excellent | +12%
Simple Q&A | Excellent | +3% (marginal)

Frequently Asked Questions

Is GPT-5 available in ChatGPT?

Yes, GPT-5 is available to ChatGPT Plus and Pro subscribers. ChatGPT's consumer interface may use a slightly different version than the raw API model. Check OpenAI's documentation for current model availability in each tier.

Is GPT-5 better than o3?

Depends on the task. o3 is superior on hard math and logic tasks — its 97.1% MATH score versus GPT-5's 91.4% reflects the advantage of extended reasoning computation. For general tasks, writing, and speed, GPT-5 is better. o3 is slower and more expensive.

When was GPT-5 released?

GPT-5 was released in spring 2025 through OpenAI's API and ChatGPT. It was the first model in OpenAI's GPT-5 series, following GPT-4 and GPT-4o.

Can I fine-tune GPT-5?

OpenAI has expanded fine-tuning availability to more models over time. Check OpenAI's fine-tuning documentation for current availability and pricing. Fine-tuning costs are separate from inference pricing.
