Model Intelligence
24 models across 8 providers. Filter by tier and capabilities; toggle routing per model.
Showing 24 models
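A minimal sketch of the capability filter and per-model routing toggle described above. The `Model` class, the `routing_enabled` flag, and `filter_models` are hypothetical names, not this tool's API. Scores are copied from the cards, which show only each model's top four capabilities, so an unlisted capability is treated as 0 here (an assumption).

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    provider: str
    scores: dict[str, int]        # capability name -> 0-10 score, as shown on the cards
    routing_enabled: bool = True  # hypothetical per-model routing toggle


def filter_models(models, min_scores, routable_only=True):
    """Return models that meet every minimum score in min_scores."""
    selected = []
    for m in models:
        if routable_only and not m.routing_enabled:
            continue
        # Assumption: a capability missing from the card counts as 0.
        if all(m.scores.get(cap, 0) >= floor for cap, floor in min_scores.items()):
            selected.append(m)
    return selected


catalog = [
    Model("GPT-5.4", "OpenAI",
          {"toolUse": 10, "reasoning": 10, "coding": 10, "longContext": 10}),
    Model("GPT-5.4 nano", "OpenAI",
          {"speed": 10, "costEfficiency": 10, "structuredOutput": 9, "toolUse": 9}),
    Model("Claude Opus 4.6", "Anthropic",
          {"coding": 10, "instructionFollowing": 10, "safetyEnterprise": 10, "reasoning": 10}),
]

# Toggle routing off for one model, then filter on a capability floor.
catalog[2].routing_enabled = False
names = [m.name for m in filter_models(catalog, {"toolUse": 9})]
print(names)  # ['GPT-5.4', 'GPT-5.4 nano']
```

Passing `routable_only=False` ignores the toggle, which is useful when browsing the full catalog rather than selecting a routing target.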
New since last refresh
Last 30 days
Outdated
This model may be behind newer options.
Gemini 3.1 Pro
Google's latest reasoning-first frontier model, still in preview. Built from the ground up for agentic workflows with native planning, tool orchestration, and self-verification. Early benchmarks suggest it rivals Claude Opus 4.6 on coding and exceeds Gemini 2.5 Pro on reasoning.
Context: 1M · Input / 1M tokens: $2.00 · Output / 1M tokens: $12.00 · Latency: 1000ms
Cutoff: 2026-01
Needs review · verified 13d ago
Top scores: reasoning 10 · longContext 10 · multimodal 10 · toolUse 10
Best for
Avoid for
All models
GPT-5.4
OpenAI's flagship model and one of the most capable general-purpose LLMs available. Natively multimodal with vision, audio, reasoning, tool use, computer use, and web search. Scores highly across virtually every dimension and offers a 1M-token context window with up to 128K output tokens.
Context: 1M · Input / 1M tokens: $2.50 · Output / 1M tokens: $15.00 · Latency: 800ms
Cutoff: 2025-11
High confidence · verified 13d ago
Top scores: toolUse 10 · reasoning 10 · coding 10 · longContext 10
Best for
Avoid for
GPT-5.4 mini
OpenAI's balanced mid-tier model that inherits many GPT-5.4 capabilities at roughly 70% reduced cost. Supports vision, tool use, and computer use with a 400K context window. The workhorse for production applications where quality and cost both matter.
Context: 400K · Input / 1M tokens: $0.750 · Output / 1M tokens: $4.50 · Latency: 500ms
Cutoff: 2025-11
High confidence · verified 13d ago
Top scores: structuredOutput 9 · instructionFollowing 9 · toolUse 9 · coding 9
Best for
Avoid for
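The "roughly 70% reduced cost" figure for GPT-5.4 mini can be checked against the list prices on the two cards. This is a small arithmetic sketch; `percent_reduction` is a hypothetical helper, and the prices are the per-1M-token values shown in this listing.

```python
def percent_reduction(base_price, new_price):
    """Percentage by which new_price is lower than base_price."""
    return (1 - new_price / base_price) * 100


# Prices per 1M tokens, copied from the GPT-5.4 and GPT-5.4 mini cards.
gpt_54 = {"in": 2.50, "out": 15.00}
gpt_54_mini = {"in": 0.75, "out": 4.50}

input_cut = percent_reduction(gpt_54["in"], gpt_54_mini["in"])
output_cut = percent_reduction(gpt_54["out"], gpt_54_mini["out"])
print(f"input: {input_cut:.0f}% lower, output: {output_cut:.0f}% lower")
# input: 70% lower, output: 70% lower
```

The same helper can be applied to any pair of cards to verify a relative-cost claim in a description.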
GPT-5.4 nano
OpenAI's ultra-efficient budget model designed for high-volume production workloads. Supports tool calling and MCP natively with a 400K context window at just $0.20/M input tokens. Replaces GPT-4o-mini as the go-to budget option.
Context: 400K · Input / 1M tokens: $0.200 · Output / 1M tokens: $1.25 · Latency: 300ms
Cutoff: 2025-11
High confidence · verified 13d ago
Top scores: speed 10 · costEfficiency 10 · structuredOutput 9 · toolUse 9
Best for
Avoid for
o3
OpenAI's full-power reasoning model. Uses extended chain-of-thought to solve the hardest problems in math, science, and formal logic. Slower and more expensive than standard models but achieves state-of-the-art accuracy on competition-level benchmarks. Best reserved for genuinely hard reasoning tasks.
Context: 200K · Input / 1M tokens: $1.00 · Output / 1M tokens: $4.00 · Latency: 1500ms
Cutoff: 2025-06
High confidence · verified 30d ago
Top scores: reasoning 10 · factuality 10 · coding 9 · safetyEnterprise 9
Best for
Avoid for
o4-mini
OpenAI's efficient reasoning model, offering o3-level thinking at significantly lower cost and latency. Matches o3 on many reasoning benchmarks while listing at 45% lower prices ($0.55/$2.20 versus $1.00/$4.00 per 1M tokens) and lower latency (1000ms versus 1500ms). The recommended reasoning model for most production use cases.
Context: 200K · Input / 1M tokens: $0.550 · Output / 1M tokens: $2.20 · Latency: 1000ms
Cutoff: 2025-06
High confidence · verified 30d ago
Top scores: reasoning 9 · coding 9 · factuality 9 · structuredOutput 8
Best for
Avoid for
Claude Opus 4.6
Anthropic's flagship model and widely regarded as the best coding model in the world. Achieves the highest SWE-bench Verified score of any model. Features a 1M context window (beta), native computer use, and Anthropic's industry-leading safety alignment. The premium choice for complex software engineering and enterprise applications.
Context: 1M · Input / 1M tokens: $5.00 · Output / 1M tokens: $25.00 · Latency: 1100ms
Cutoff: 2025-10
High confidence · verified 13d ago
Top scores: coding 10 · instructionFollowing 10 · safetyEnterprise 10 · reasoning 10
Best for
Avoid for
Claude Sonnet 4.6
Anthropic's balanced frontier model that approaches Opus 4.6 quality at 40% lower cost ($3.00/$15.00 versus $5.00/$25.00 per 1M tokens). An exceptional coder in its own right with strong reasoning, 1M context (beta), and Anthropic's safety alignment. The recommended default for most Anthropic API users.
Context: 1M · Input / 1M tokens: $3.00 · Output / 1M tokens: $15.00 · Latency: 800ms
Cutoff: 2025-10
High confidence · verified 13d ago
Top scores: coding 10 · structuredOutput 10 · instructionFollowing 10 · safetyEnterprise 10
Best for
Avoid for
Claude Haiku 4.5
Anthropic's fast and affordable model with industry-leading safety for its tier. Matches the original Claude 3.5 Sonnet on many tasks at a fraction of the cost. The go-to Anthropic model for high-volume safety-conscious deployments.
Context: 200K · Input / 1M tokens: $0.800 · Output / 1M tokens: $4.00 · Latency: 400ms
Cutoff: 2025-07
High confidence · verified 44d ago
Top scores: safetyEnterprise 10 · speed 9 · structuredOutput 9 · instructionFollowing 9
Best for
Avoid for
Gemini 2.5 Pro
Google's thinking model that combines strong reasoning with native multimodal understanding and a 1M+ token context window. Features built-in Google Search grounding and code execution. Excels at long-context analysis, multimodal reasoning, and complex STEM tasks.
Context: 1M · Input / 1M tokens: $1.25 · Output / 1M tokens: $10.00 · Latency: 1100ms
Cutoff: 2025-06
High confidence · verified 13d ago
Top scores: longContext 10 · multimodal 10 · reasoning 9 · factuality 9
Best for
Avoid for
Gemini 2.5 Flash
Google's fast and affordable thinking model with native multimodal support and a 1M token context window. Combines reasoning capabilities with exceptional speed and low cost. One of the best value models available for multimodal and long-context workloads.
Context: 1M · Input / 1M tokens: $0.150 · Output / 1M tokens: $0.600 · Latency: 300ms
Cutoff: 2025-06
High confidence · verified 13d ago
Top scores: speed 10 · longContext 9 · costEfficiency 9 · multimodal 9
Best for
Avoid for
Gemini 2.5 Flash Lite
Google's ultra-budget multimodal model. The cheapest model with native vision, audio, and video understanding available anywhere. Designed for extreme-volume workloads where cost is the primary constraint.
Context: 1M · Input / 1M tokens: $0.075 · Output / 1M tokens: $0.300 · Latency: 200ms
Cutoff: 2025-06
High confidence · verified 30d ago
Top scores: speed 10 · costEfficiency 10 · longContext 8 · multimodal 8
Best for
Avoid for
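For volume-driven workloads like the one this card targets, the per-1M-token prices translate into a per-request cost as sketched below. The token counts and request volume are made-up illustration values, and `request_cost` is a hypothetical helper; only the two prices come from the Gemini 2.5 Flash Lite card.

```python
def request_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Dollar cost of one request given prices per 1M input and output tokens."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000


# Prices from the Gemini 2.5 Flash Lite card.
IN_PRICE, OUT_PRICE = 0.075, 0.300

# Illustrative workload (assumed): 2,000 input and 500 output tokens per request.
per_request = request_cost(2_000, 500, IN_PRICE, OUT_PRICE)
monthly = per_request * 10_000_000  # assumed 10M requests per month

print(f"per request: ${per_request:.6f}")  # per request: $0.000300
print(f"per month:   ${monthly:,.2f}")     # per month:   $3,000.00
```

Swapping in another card's prices gives a like-for-like comparison for the same workload.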
Grok 4.20
xAI's flagship model with a massive 2M token context window, strong reasoning capabilities, and vision support. Known for its straightforward, less filtered conversational style and real-time information access through X integration. Competitive with GPT-5.4 on reasoning benchmarks.
Context: 2M · Input / 1M tokens: $2.00 · Output / 1M tokens: $6.00 · Latency: 800ms
Cutoff: 2025-12
High confidence · verified 30d ago
Top scores: longContext 10 · reasoning 9 · coding 9 · conversational 9
Best for
Avoid for
Grok 4.1 Fast
xAI's efficient budget model with the same 2M context window as Grok 4.20 at a fraction of the cost. Fast reasoning at $0.20 input / $0.50 output per 1M tokens makes it one of the cheapest models with genuine reasoning capability. Ideal for high-volume analytical workloads.
Context: 2M · Input / 1M tokens: $0.200 · Output / 1M tokens: $0.500 · Latency: 400ms
Cutoff: 2025-09
High confidence · verified 44d ago
Top scores: costEfficiency 10 · longContext 9 · speed 9 · reasoning 7
Best for
Avoid for
Llama 4 Scout
Meta's efficient MoE model with 109B total parameters but only 17B active per token. Features a groundbreaking 10M token context window and native multimodal support. Fits on a single GPU for inference, making it the premier self-hosted option for long-context multimodal work. Open-weight under Llama license.
Context: 10M · Input / 1M tokens: $0.150 · Output / 1M tokens: $0.400 · Latency: 600ms
Cutoff: 2025-08
High confidence · verified 30d ago
Top scores: longContext 10 · costEfficiency 9 · speed 8 · reasoning 7
Best for
Avoid for
Llama 4 Maverick
Meta's most capable open-weight model. A 400B MoE (17B active) with native multimodal support and a 1M token context window. Approaches closed-source frontier quality on many benchmarks while being fully open-weight. Competitive with GPT-5.4 mini and Claude Sonnet 4.6.
Context: 1M · Input / 1M tokens: $0.500 · Output / 1M tokens: $1.50 · Latency: 900ms
Cutoff: 2025-08
High confidence · verified 30d ago
Top scores: longContext 9 · reasoning 8 · coding 8 · costEfficiency 8
Best for
Avoid for
DeepSeek V3.2
DeepSeek's updated general-purpose MoE model with 671B total parameters. Offers frontier-competitive quality at ultra-low cost through its efficient MoE architecture. Open-weight and available through the DeepSeek API and numerous third-party providers. The cost-efficiency champion.
Context: 128K · Input / 1M tokens: $0.270 · Output / 1M tokens: $1.10 · Latency: 800ms
Cutoff: 2025-10
High confidence · verified 30d ago
Top scores: costEfficiency 10 · coding 9 · reasoning 8 · structuredOutput 8
Best for
Avoid for
DeepSeek R1
DeepSeek's reasoning-specialized model built on the 671B MoE architecture. Rivals OpenAI's o1 on reasoning benchmarks at a tiny fraction of the cost. Uses visible chain-of-thought reasoning (unlike o1/o3 where reasoning is hidden). Open-weight and fully inspectable.
Context: 128K · Input / 1M tokens: $0.550 · Output / 1M tokens: $2.19 · Latency: 1400ms
Cutoff: 2025-01
High confidence · verified 44d ago
Top scores: reasoning 10 · coding 9 · costEfficiency 9 · factuality 9
Best for
Avoid for
Mistral Large 3
Mistral's most capable model. A 675B MoE (41B active) with 256K context, native multimodal support, and full Apache 2.0 open-weight licensing. The strongest truly open-source frontier model. Excels at multilingual tasks with particular strength in European languages.
Context: 256K · Input / 1M tokens: $2.00 · Output / 1M tokens: $6.00 · Latency: 1000ms
Cutoff: 2025-09
High confidence · verified 30d ago
Top scores: reasoning 9 · coding 9 · longContext 8 · structuredOutput 8
Best for
Avoid for
Mistral Medium 3
Mistral's balanced mid-tier model, priced 8x lower than frontier models while retaining strong quality. Offers a 128K context window with good multilingual and coding capabilities. Designed as the everyday workhorse for Mistral API users.
Context: 128K · Input / 1M tokens: $0.400 · Output / 1M tokens: $2.00 · Latency: 600ms
Cutoff: 2025-10
High confidence · verified 44d ago
Top scores: costEfficiency 9 · speed 8 · structuredOutput 8 · instructionFollowing 8
Best for
Avoid for
Mistral Small 4
Mistral's efficient MoE model with 119B total parameters but only 6B active per token. Features a 256K context window, reasoning mode, and vision support. Open-weight under Apache 2.0. Designed for self-hosting on modest hardware while providing strong reasoning capabilities.
Context: 256K · Input / 1M tokens: $0.100 · Output / 1M tokens: $0.300 · Latency: 400ms
Cutoff: 2025-12
High confidence · verified 30d ago
Top scores: costEfficiency 10 · speed 9 · longContext 8 · structuredOutput 7
Best for
Avoid for
Qwen 3.5
Alibaba's open-weight MoE model with 397B total parameters (17B active). Supports 201 languages, making it the most multilingual model available. Features a 262K context window and Apache 2.0 licensing. Particularly strong on CJK languages and emerging market languages.
Context: 262K · Input / 1M tokens: $0.300 · Output / 1M tokens: $1.20 · Latency: 800ms
Cutoff: 2025-10
High confidence · verified 30d ago
Top scores: costEfficiency 9 · coding 8 · reasoning 8 · longContext 8
Best for
Avoid for
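Several descriptions above quote total and active parameter counts for MoE models. The sketch below computes the share of parameters active per token from those quoted figures. As a general property of MoE inference rather than a claim from this listing, per-token compute tracks the active count while memory footprint tracks the total, since all expert weights must be loaded. `active_fraction` is a hypothetical helper name.

```python
# (total, active) parameter counts in billions, as quoted in the descriptions above.
moe_models = {
    "Llama 4 Scout": (109, 17),
    "Llama 4 Maverick": (400, 17),
    "Mistral Large 3": (675, 41),
    "Mistral Small 4": (119, 6),
    "Qwen 3.5": (397, 17),
}


def active_fraction(total_b, active_b):
    """Share of a MoE model's parameters used for each token."""
    return active_b / total_b


for name, (total_b, active_b) in moe_models.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {active_b}B of {total_b}B active ({share:.1%})")
```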
Outdated
This model may be behind newer options.
Qwen 3.6 Plus
Alibaba's proprietary frontier model with a 1M context window. Delivers Claude Opus 4.5-class performance on SWE-bench and achieves strong results across coding, reasoning, and multilingual benchmarks. The most capable model from a Chinese AI lab, pushing into frontier territory previously dominated by Western providers.
Context: 1M · Input / 1M tokens: $1.50 · Output / 1M tokens: $6.00 · Latency: 900ms
Cutoff: 2026-01
Needs review · verified 13d ago
Top scores: reasoning 9 · coding 9 · longContext 9 · structuredOutput 9
Best for
Avoid for
Llama 3.3 70B
Meta's 70B dense model from the Llama 3.3 generation. Still widely used for self-hosted deployments due to its straightforward dense architecture, strong fine-tuning ecosystem, and proven reliability. No longer the most capable model, but the most battle-tested open-weight option with massive community support.
Context: 128K · Input / 1M tokens: $0.180 · Output / 1M tokens: $0.180 · Latency: 600ms
Cutoff: 2024-10
Medium confidence · verified 72d ago
Top scores: costEfficiency 9 · speed 8 · reasoning 7 · coding 7
Best for
Avoid for
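Each card carries a status line such as "High confidence · verified 13d ago". The sketch below shows one way to parse those lines and flag entries that may need re-checking. The 30-day threshold and the rule that "Needs review" always counts as stale are assumptions for illustration, not this tool's documented behavior; `parse_status` and `is_stale` are hypothetical names.

```python
import re

# Matches lines like "High confidence · verified 13d ago".
STATUS_RE = re.compile(r"^(?P<label>.+?) · verified (?P<days>\d+)d ago$")


def parse_status(line):
    """Split a card status line into (confidence label, days since verification)."""
    match = STATUS_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized status line: {line!r}")
    return match.group("label"), int(match.group("days"))


def is_stale(line, max_age_days=30):
    """Assumed rule: stale if marked 'Needs review' or verified too long ago."""
    label, days = parse_status(line)
    return label == "Needs review" or days > max_age_days


for status in [
    "High confidence · verified 13d ago",
    "Needs review · verified 13d ago",
    "Medium confidence · verified 72d ago",
]:
    print(status, "->", "stale" if is_stale(status) else "ok")
```

Under these assumed rules, the first line is reported as ok and the other two as stale.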