DeepSeek R1

Name: DeepSeek R1
Price: 0.55 USD per 1M input tokens
Author: DeepSeek

DeepSeek | specialized

DeepSeek's reasoning-specialized model built on the 671B MoE architecture. Rivals OpenAI's o1 on reasoning benchmarks at a tiny fraction of the cost. Uses visible chain-of-thought reasoning (unlike o1/o3 where reasoning is hidden). Open-weight and fully inspectable.

Released: 2025-01-20
Knowledge cutoff: 2025-01

Needs review | Updated 134d ago | 88% source confidence

Specifications

Context Window: 128K tokens
Max Output: 32K tokens
Input Price: $0.55 / 1M tokens
Output Price: $2.19 / 1M tokens
Latency Tier: Slow (speed score: 4/10)

Capability Profile

Reasoning: 9.5/10
Coding: 9/10
Cost Efficiency: 8.5/10
Factuality: 8.5/10
Instruction Following: 7/10
Structured Output: 6.5/10
Long Context: 6/10
Creativity: 5.5/10
Tool Use: 5.5/10
Safety & Enterprise: 5/10
Conversational: 4.5/10
Speed: 4/10
Multimodal: 1/10

Feature Support

Vision: No
Audio In: No
Audio Out: No
Video: No
Image Generation: No
Image Editing: No
Function Calling: Yes
JSON Mode: Yes
Structured Output: No
Streaming: Yes
Reasoning: Yes
Realtime: No
Computer Use: No
Web Search: No

Web Search No

Best Use Cases

Complex mathematical reasoning at a fraction of o3's cost

Science and logic problems requiring deep chain-of-thought

Coding tasks requiring careful multi-step analysis

Research problems where visible reasoning chains are valuable for verification

Budget-friendly reasoning for tasks that would otherwise need o3

Not Ideal For

Casual chat or conversational AI

Multimodal tasks (text only)

Enterprise deployments with safety/compliance requirements

Simple tasks where reasoning overhead is wasteful

Tasks requiring fast response times

Strengths

Reasoning quality rivaling o1/o3 at 5-20x lower cost

Visible chain-of-thought — you can inspect and verify the reasoning process

Open weights allow self-hosting and customization of the reasoning model

Strong on AIME, GPQA, and competition-level math benchmarks

Cost-effective alternative to o3 for most reasoning workloads

Weaknesses

Slow — reasoning chains can be very long and increase latency to 30-120s

Text-only with no multimodal capabilities

Safety alignment is minimal by Western enterprise standards

Chinese content censorship applies to the reasoning model too

Structured output compliance is weaker during reasoning mode

Conversational ability is poor — it's a reasoning specialist, not an assistant
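
Because structured output compliance is weaker in reasoning mode, it can help to separate the visible reasoning from the final answer and validate the answer before using it. The sketch below is one possible approach, not a documented DeepSeek interface. It assumes the serving stack returns the reasoning inline inside `<think>...</think>` tags, as raw output from the open weights typically does; hosted APIs may instead return the reasoning in a separate field. The function names `split_reasoning` and `parse_json_answer` are hypothetical.

```python
import json
import re
from typing import Optional, Tuple

# Assumption: reasoning arrives inline, wrapped in <think>...</think>.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

# Built at runtime so this example contains no literal Markdown fence.
FENCE = "`" * 3
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def split_reasoning(raw: str) -> Tuple[str, str]:
    """Return (reasoning, answer) from a raw completion string."""
    match = THINK_BLOCK.search(raw)
    if match is None:
        # No reasoning block found: treat the whole text as the answer.
        return "", raw.strip()
    reasoning = match.group(1).strip()
    answer = (raw[:match.start()] + raw[match.end():]).strip()
    return reasoning, answer


def parse_json_answer(raw: str) -> Optional[dict]:
    """Parse the final answer as a JSON object; return None if it is invalid."""
    _, answer = split_reasoning(raw)
    # Drop a surrounding Markdown fence if the model added one.
    fenced = FENCED_JSON.fullmatch(answer)
    if fenced:
        answer = fenced.group(1)
    try:
        parsed = json.loads(answer)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
```

A caller can retry or fall back to another model when `parse_json_answer` returns `None`, and keep the reasoning string for later inspection.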

Edge Cases & Notes

Visible reasoning tokens are part of the output and are billed accordingly

Reasoning chains can become very long on adversarial or ambiguous prompts

Distilled versions (R1-7B, R1-14B, R1-32B, R1-70B) are available for lighter deployments

The gap vs o3 is most visible on the hardest competition math problems
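
Since visible reasoning tokens are billed as output, a long reasoning chain can dominate the cost of a request. The sketch below estimates per-request cost from the input and output prices listed under Specifications. The function name `estimate_cost_usd` and the token counts in the example are hypothetical, and the formula assumes flat per-token pricing with no caching or other discounts.

```python
# Prices copied from the Specifications section (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.55
OUTPUT_PRICE_PER_M = 2.19


def estimate_cost_usd(input_tokens: int, reasoning_tokens: int, answer_tokens: int) -> float:
    """Estimate the cost of one request, billing reasoning tokens as output."""
    output_tokens = reasoning_tokens + answer_tokens
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 1,000 prompt tokens, 8,000 reasoning tokens,
    # and 500 answer tokens.
    cost = estimate_cost_usd(1_000, 8_000, 500)
    print(f"${cost:.4f}")
```

For this hypothetical request the estimate is about $0.0192, of which roughly $0.0175 comes from the 8,000 reasoning tokens alone.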

Provider Notes

Available through DeepSeek's API and third-party providers. Distilled versions available in multiple sizes. Self-hosting the full 671B model requires extensive GPU infrastructure. Data processing considerations apply.

Benchmarks

MMLU: 90.8%
HumanEval: 91.5%
Arena Elo: 1360
GSM8K: 97.5%

Benchmark Notes

GSM8K 97.5%. AIME ~79%. MMLU 90.8%. Rivals o1 on most reasoning benchmarks. The most cost-efficient reasoning model available. Arena Elo reflects reasoning-specific evaluation.

Research Meta

Last Evaluated: 2026-03-01
Source Confidence: 88%
Evaluation Method: AIME, GPQA Diamond, GSM8K, MATH, LMSYS Arena (reasoning), cost-reasoning Pareto analysis

Needs Re-evaluation

Sources

DeepSeek R1 technical report
LMSYS Chatbot Arena
AIME evaluation results
Open LLM Leaderboard
