N
NexusRoute
Back to Models

GPT-5.4

OpenAIfrontier

OpenAI's flagship model and one of the most capable general-purpose LLMs available. Natively multimodal with vision, audio, reasoning, tool use, computer use, and web search. Excels across virtually every dimension with a 1M token context window and 128K output.

Released 2026-02-18Knowledge cutoff: 2025-11

Specifications

Context Window

1M tokens

Max Output

128K tokens

Input Price

$2.50 / 1M tokens

Output Price

$15.00 / 1M tokens

Latency Tier

Fast (speed score: 7/10)

Capability Profile

Tool Use
10/10
Reasoning
9.5/10
Coding
9.5/10
Long Context
9.5/10
Structured Output
9.5/10
Multimodal
9.5/10
Factuality
9.5/10
Instruction Following
9.5/10
Conversational
9.5/10
Creativity
9/10
Safety & Enterprise
9/10
Speed
7/10
Cost Efficiency
5.5/10

Feature Support

Vision Yes
Audio In Yes
Audio Out No
Video Yes
Image Generation No
Image Editing No
Function Calling Yes
JSON Mode Yes
Structured Output Yes
Streaming Yes
Reasoning Yes
Realtime No
Computer Use Yes
Web Search Yes

Best Use Cases

Complex agentic workflows requiring tool orchestration, web browsing, and computer use
Multimodal analysis combining text, images, audio, and video in a single turn
Enterprise-grade production systems needing the highest quality across all dimensions
Long-document reasoning over 500K+ token contexts with high recall
Research and analysis tasks requiring near-perfect factuality and citation

Not Ideal For

Ultra-low-latency applications where sub-second TTFT is required
High-volume bulk classification where GPT-5.4-nano is 12x cheaper
Extremely narrow math/logic tasks where o3 reasoning chains outperform
Budget-constrained hobbyist projects

Strengths

Best-in-class tool use and function calling reliability across all providers
Native computer use agent that can operate GUIs and browsers end-to-end
Integrated web search grounding reduces hallucination on current events
Massive 1M context with strong needle-in-haystack recall throughout
Equally strong across English and 40+ other languages

Weaknesses

Expensive at scale — 6x the cost of GPT-5.4-mini for marginal quality gains on simpler tasks
Latency is noticeable on complex reasoning chains (5-15s for hard problems)
Computer use agent is powerful but occasionally takes suboptimal action paths
Audio processing adds ~2s latency and is billed at a higher effective token rate

Edge Cases & Notes

Computer use capability is production-ready but should be sandboxed for safety
Web search grounding can be disabled via API parameter to reduce latency and cost
At 1M context, input pricing shifts to a tiered model above 200K tokens
Vision performance on handwritten text and low-res images has improved dramatically from GPT-4o

Provider Notes

Available via the OpenAI API with Tier 4+ access for full 1M context. Batch API at 50% discount. Azure OpenAI Service offers managed deployments with SLA. Rate limits scale with usage tier.

Benchmarks

MMLU92.8%
HumanEval95.1%
Arena Elo1420

Benchmark Notes

Top-3 on LMSYS Arena across all categories. MMLU-Pro 92.8%, HumanEval 95.1%. SWE-bench Verified 62.4%. Strong GPQA Diamond scores. Web-search grounding evaluation shows 94%+ factuality on current events.

Research Meta

Last Evaluated

2026-04-01

Source Confidence

93%

Evaluation Method

LMSYS Chatbot Arena, SWE-bench Verified, MMLU-Pro, GPQA Diamond, internal comparative testing

Needs Re-evaluation

No

Sources

  • OpenAI GPT-5.4 system card (Feb 2026)
  • LMSYS Chatbot Arena leaderboard
  • SWE-bench Verified leaderboard
  • Artificial Analysis quality index