GPT-5.4

OpenAI · frontier

OpenAI's flagship model and one of the most capable general-purpose LLMs available. Natively multimodal, with vision, audio input, video, reasoning, tool use, computer use, and web search. Scores 9/10 or higher on every capability dimension except speed and cost efficiency, with a 1M-token context window and a 128K-token maximum output.

Released 2026-02-18

Knowledge cutoff: 2025-11

Specifications

Context Window

1M tokens

Max Output

128K tokens

Input Price

$2.50 / 1M tokens

Output Price

$15.00 / 1M tokens

Latency Tier

Fast (speed score: 7/10)
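
As a rough illustration of how the listed prices translate into per-request cost, the sketch below applies the flat input and output rates from the specifications above. The function name `estimate_cost` is hypothetical, and the calculation assumes flat pricing: the tiered input pricing above 200K tokens noted under Edge Cases is not modeled because its rates are not given here.

```python
# Listed prices from the specifications above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 2.50
OUTPUT_PRICE_PER_M = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the flat listed rates.

    Assumption: flat pricing only. Tiered input pricing above 200K
    tokens is not modeled because its rates are not stated on this card.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 100K-token prompt with a 4K-token answer.
    print(f"${estimate_cost(100_000, 4_000):.4f}")
```

At these rates output tokens cost six times as much as input tokens, so long generations tend to dominate the bill even when prompts are large.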

Capability Profile

Tool Use

10/10

Reasoning

9.5/10

Coding

9.5/10

Long Context

9.5/10

Structured Output

9.5/10

Multimodal

9.5/10

Factuality

9.5/10

Instruction Following

9.5/10

Conversational

9.5/10

Creativity

9/10

Safety & Enterprise

9/10

Speed

7/10

Cost Efficiency

5.5/10

Feature Support

Vision Yes

Audio In Yes

Audio Out No

Video Yes

Image Generation No

Image Editing No

Function Calling Yes

JSON Mode Yes

Structured Output Yes

Streaming Yes

Reasoning Yes

Realtime No

Computer Use Yes

Web Search Yes

Best Use Cases

Complex agentic workflows requiring tool orchestration, web browsing, and computer use

Multimodal analysis combining text, images, audio, and video in a single turn

Enterprise-grade production systems needing the highest quality across all dimensions

Long-document reasoning over 500K+ token contexts with high recall

Research and analysis tasks requiring near-perfect factuality and citation

Not Ideal For

Ultra-low-latency applications where sub-second TTFT is required

High-volume bulk classification where GPT-5.4-nano is 12x cheaper

Extremely narrow math/logic tasks where o3 reasoning chains outperform

Budget-constrained hobbyist projects

Strengths

Best-in-class tool use and function calling reliability across all providers

Native computer use agent that can operate GUIs and browsers end-to-end

Integrated web search grounding reduces hallucination on current events

Massive 1M context with strong needle-in-haystack recall throughout

Equally strong across English and 40+ other languages

Weaknesses

Expensive at scale: 6x the cost of GPT-5.4-mini for marginal quality gains on simpler tasks

Latency is noticeable on complex reasoning chains (5-15s for hard problems)

Computer use agent is powerful but occasionally takes suboptimal action paths

Audio processing adds ~2s latency and is billed at a higher effective token rate

Edge Cases & Notes

Computer use capability is production-ready but should be sandboxed for safety

Web search grounding can be disabled via API parameter to reduce latency and cost

With the 1M context window, input pricing shifts to a tiered model above 200K tokens

Vision performance on handwritten text and low-resolution images has improved dramatically over GPT-4o
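
The limits mentioned on this card (1M context, 128K output, a pricing tier change above 200K input tokens) can be checked before a request is sent. The following is a minimal sketch, not an official client feature: `plan_request` is a hypothetical helper, and it assumes input and output tokens share the 1M window, which the card does not state.

```python
CONTEXT_WINDOW = 1_000_000  # tokens, from the specifications
MAX_OUTPUT = 128_000        # tokens, from the specifications
TIER_THRESHOLD = 200_000    # tokens; input pricing is tiered above this


def plan_request(input_tokens: int, requested_output: int) -> dict:
    """Check a planned request against the limits listed on this card.

    Assumption: input and output tokens count against the same 1M window.
    """
    if input_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    # Output can never exceed the model's maximum output length.
    output_tokens = min(requested_output, MAX_OUTPUT)
    return {
        "output_tokens": output_tokens,
        "fits_in_context": input_tokens + output_tokens <= CONTEXT_WINDOW,
        "tiered_pricing_applies": input_tokens > TIER_THRESHOLD,
    }


if __name__ == "__main__":
    print(plan_request(150_000, 200_000))
    print(plan_request(900_000, 128_000))
```

A caller could use `tiered_pricing_applies` as a prompt to trim or chunk the input before sending, since the rates above the threshold are not listed here.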

Provider Notes

Available via the OpenAI API with Tier 4+ access for full 1M context. Batch API at 50% discount. Azure OpenAI Service offers managed deployments with SLA. Rate limits scale with usage tier.
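
To see what the Batch API discount means for a recurring workload, the sketch below compares standard and batch totals. `workload_cost` is a hypothetical helper, and it assumes the 50% discount applies equally to input and output tokens, which the provider notes do not spell out.

```python
INPUT_PRICE_PER_M = 2.50   # USD per 1M input tokens, from the specifications
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M output tokens, from the specifications
BATCH_DISCOUNT = 0.50       # Batch API discount, from the provider notes


def workload_cost(requests: int, avg_input_tokens: int,
                  avg_output_tokens: int, batch: bool = False) -> float:
    """Estimate the USD cost of a workload at flat listed rates.

    Assumption: the batch discount applies to both input and output tokens.
    """
    total = (
        requests * avg_input_tokens * INPUT_PRICE_PER_M
        + requests * avg_output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000
    return total * (1 - BATCH_DISCOUNT) if batch else total


if __name__ == "__main__":
    # Example: 10,000 requests averaging 2K input and 500 output tokens.
    standard = workload_cost(10_000, 2_000, 500)
    batched = workload_cost(10_000, 2_000, 500, batch=True)
    print(f"standard: ${standard:.2f}, batch: ${batched:.2f}")
```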

Benchmarks

MMLU 92.8%

HumanEval 95.1%

Arena Elo 1420

Benchmark Notes

Top-3 on LMSYS Arena across all categories. MMLU-Pro 92.8%, HumanEval 95.1%. SWE-bench Verified 62.4%. Strong GPQA Diamond scores. Web-search grounding evaluation shows 94%+ factuality on current events.

Research Meta

Last Evaluated

2026-04-01

Source Confidence

93%

Evaluation Method

LMSYS Chatbot Arena, SWE-bench Verified, MMLU-Pro, GPQA Diamond, internal comparative testing

Needs Re-evaluation

Sources

OpenAI GPT-5.4 system card (Feb 2026)
LMSYS Chatbot Arena leaderboard
SWE-bench Verified leaderboard
Artificial Analysis quality index
