o3
OpenAI
specialized
OpenAI's full-power reasoning model. Uses extended chain-of-thought to solve the hardest problems in math, science, and formal logic. Slower and more expensive than standard models, but achieves state-of-the-art accuracy on competition-level benchmarks. Best reserved for genuinely hard reasoning tasks.
Specifications
Context Window
200K tokens
Max Output
100K tokens
Input Price
$1.00 / 1M tokens
Output Price
$4.00 / 1M tokens
Speed
Slow (speed score: 3.5/10)
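As a rough illustration of what the per-token rates above mean for a single request, the arithmetic can be sketched as follows. This assumes the two listed prices are the input and output rates, in that order; the helper name `estimate_cost_usd` and the example token counts are hypothetical, not part of any OpenAI tooling.

```python
from decimal import Decimal

# Rates copied from the specifications above. Reading them as input and
# output prices (in that order) is an assumption.
INPUT_USD_PER_MILLION = Decimal("1.00")
OUTPUT_USD_PER_MILLION = Decimal("4.00")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MILLION / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MILLION / TOKENS_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a 20,000-token prompt that produces 30,000 output tokens.
    print(estimate_cost_usd(20_000, 30_000))
```

Because long chains of reasoning tend to inflate the output side, the output rate is likely to dominate the bill for hard problems, which is one reason the card recommends reserving o3 for tasks that need it.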
Capability Profile
Feature Support
Best Use Cases
Not Ideal For
Strengths
Weaknesses
Edge Cases & Notes
Provider Notes
Use o3 only when the task genuinely requires deep reasoning. For most coding and general tasks, GPT-5.4 or o4-mini offer better value. Available through the OpenAI API with Tier 3+ access.
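The guidance above amounts to a simple routing rule, which might be sketched like this. The task category labels and the `pick_model` helper are hypothetical, and the model names are the ones mentioned in this card rather than verified API identifiers.

```python
# Task categories are hypothetical labels chosen for illustration.
HARD_REASONING_TASKS = {"competition_math", "formal_logic", "scientific_reasoning"}


def pick_model(task_category: str, default_model: str = "o4-mini") -> str:
    """Route hard reasoning tasks to o3 and everything else to a cheaper default."""
    if task_category in HARD_REASONING_TASKS:
        return "o3"
    # The card suggests GPT-5.4 or o4-mini for coding and general tasks;
    # which one to default to is left to the caller.
    return default_model


if __name__ == "__main__":
    print(pick_model("formal_logic"))
    print(pick_model("general_coding"))
```

In practice the hard part is deciding which tasks count as hard; a fixed set of categories is only a placeholder for whatever classification a real system would use.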
Benchmarks
Benchmark Notes
GSM8K is near-saturated at 99.1%. AIME 2025 score is roughly 85%. GPQA Diamond is above 70%. Ranks as the top reasoning model on most hard benchmarks. The Arena Elo reflects the reasoning category specifically.
Research Meta
Last Evaluated
2026-03-15
Source Confidence
91%
Evaluation Method
AIME 2025, GPQA Diamond, GSM8K, MATH, SWE-bench, LMSYS Arena (reasoning category)
Needs Re-evaluation
No
Sources
- OpenAI o3 system card
- LMSYS Chatbot Arena
- AIME 2025 evaluation results
- Independent reasoning benchmarks