N
NexusRoute
Back to Models

Gemini 2.5 Pro

Googlefrontier

Google's thinking model that combines strong reasoning with native multimodal understanding and a 1M+ token context window. Features built-in Google Search grounding and code execution. Excels at long-context analysis, multimodal reasoning, and complex STEM tasks.

Released 2025-09-20Knowledge cutoff: 2025-06
Medium confidence|Updated 55d ago|90% source confidence

Specifications

Context Window

1.0M tokens

Max Output

65.5K tokens

Input Price

$1.25 / 1M tokens

Output Price

$10.00 / 1M tokens

Latency Tier

Moderate (speed score: 5.5/10)

Capability Profile

Long Context
10/10
Multimodal
9.5/10
Reasoning
9/10
Factuality
9/10
Coding
8.5/10
Structured Output
8.5/10
Instruction Following
8.5/10
Tool Use
8.5/10
Safety & Enterprise
8/10
Conversational
8/10
Creativity
7.5/10
Cost Efficiency
7/10
Speed
5.5/10

Feature Support

Vision Yes
Audio In Yes
Audio Out No
Video Yes
Image Generation No
Image Editing No
Function Calling Yes
JSON Mode Yes
Structured Output Yes
Streaming Yes
Reasoning Yes
Realtime No
Computer Use No
Web Search No

Best Use Cases

Analyzing entire codebases, books, or video corpora within its 1M context
Multimodal reasoning combining video, audio, and text in a single query
STEM research requiring thinking-mode reasoning with grounding
Multi-document synthesis and comparison across hundreds of sources
Data analysis with native code execution for verification

Not Ideal For

Simple tasks where its thinking overhead is wasteful and slow
Latency-sensitive interactive chat (thinking adds 5-20s)
Pure coding tasks where Claude Opus 4.6 is demonstrably better
Safety-critical enterprise use requiring Anthropic-level alignment

Strengths

Largest effective context window with strong recall — 1M tokens with good needle-in-haystack
Best-in-class multimodal understanding across text, images, audio, and video
Built-in thinking mode rivals o3 on many reasoning benchmarks
Google Search grounding reduces hallucination on factual queries
Native code execution allows self-verification of analytical results
Very competitive pricing for a frontier thinking model

Weaknesses

Thinking mode increases latency significantly (5-20s for complex queries)
Coding quality is strong but measurably below Claude Opus/Sonnet 4.6
Structured output compliance is less consistent than Anthropic models
Instruction following can be imprecise on complex multi-constraint prompts
Safety filtering can be unpredictable — occasionally over-refuses, occasionally under-refuses

Edge Cases & Notes

Thinking tokens are billed at the output rate — effective cost can be 2-4x visible output on hard problems
Google Search grounding adds latency but dramatically improves factuality on current events
Video analysis works best with clips under 30 minutes; quality degrades on very long videos
1M context pricing is tiered — significantly cheaper under 200K tokens

Provider Notes

Available through the Gemini API and Google Cloud Vertex AI. Free tier available with rate limits. Thinking mode can be controlled via API parameters. Google Search grounding requires separate API enablement.

Benchmarks

MMLU91.8%
HumanEval90.5%
Arena Elo1390

Benchmark Notes

MMLU-Pro 91.8%. Strong on GPQA Diamond (~68%). Best-in-class on long-context benchmarks (RULER, needle-in-haystack). Multimodal benchmarks are its strongest area. SWE-bench ~55%.

Research Meta

Last Evaluated

2026-04-01

Source Confidence

90%

Evaluation Method

LMSYS Arena, MMLU-Pro, GPQA, RULER long-context, multimodal evaluations, SWE-bench

Needs Re-evaluation

No

Sources

  • Google Gemini 2.5 Pro technical report
  • LMSYS Chatbot Arena
  • RULER long-context benchmark
  • Artificial Analysis