Mistral Small 4

Name: Mistral Small 4
Price: 0.1 USD
Author: Mistral

Mistralbudget

Mistral's efficient MoE model with 119B total parameters but only 6B active per token. Features a 256K context window, reasoning mode, and vision support. Open-weight under Apache 2.0. Designed for self-hosting on modest hardware while providing strong reasoning capabilities.

Released 2026-02-20Knowledge cutoff: 2025-12

Needs review|Updated 116d ago|82% source confidence

Specifications

Context Window

256K tokens

Max Output

16.4K tokens

Input Price

$0.100 / 1M tokens

Output Price

$0.300 / 1M tokens

Latency Tier

Ultra Fast (speed score: 9/10)

Capability Profile

Cost Efficiency

9.5/10

Speed

9/10

Long Context

7.5/10

Structured Output

7/10

Instruction Following

7/10

Reasoning

6.5/10

Coding

6.5/10

Factuality

6.5/10

Tool Use

6.5/10

Safety & Enterprise

6.5/10

Conversational

6.5/10

Creativity

5.5/10

Multimodal

5/10

Feature Support

Vision Yes

Audio In No

Audio Out No

Video No

Image Generation No

Image Editing No

Function Calling Yes

JSON Mode Yes

Structured Output Yes

Streaming Yes

Reasoning Yes

Realtime No

Computer Use No

Web Search No

Best Use Cases

Self-hosted inference on a single GPU with reasoning and vision

Budget-friendly multilingual chatbots and assistants

Edge deployment where open-weight models with reasoning are needed

Cost-sensitive European enterprise deployments

Lightweight classification and extraction with reasoning mode for harder cases

Not Ideal For

Complex coding or software engineering tasks

Tasks requiring the highest factual accuracy

Audio or video processing

Enterprise-critical applications needing top-tier safety

Strengths

Apache 2.0 license with reasoning and vision at budget pricing

Runs on a single GPU — 6B active parameters per token despite 119B total

256K context window is impressive for a budget model

Built-in reasoning mode brings analytical depth to a budget tier

Good multilingual support inherited from the Mistral family

Weaknesses

Quality is noticeably below Mistral Large 3 on complex tasks

Vision capabilities are basic compared to Gemini or GPT models

Reasoning mode is helpful but well below o4-mini quality

Creative writing is mediocre

Smaller community and ecosystem than Llama

Edge Cases & Notes

6B active parameters means it can run on a single RTX 4090 with quantization

Reasoning mode adds latency but meaningfully improves accuracy on analytical tasks

Apache 2.0 licensing differentiates it from Meta's more restrictive Llama license

Provider Notes

Open-weight under Apache 2.0. Available through Mistral's La Plateforme, Ollama, and self-hosted. One of the best options for self-hosted reasoning on consumer hardware.

Benchmarks

MMLU78.5%

HumanEval76%

Arena Elo1180

Benchmark Notes

MMLU-Pro 78.5%. Impressive for 6B active parameters. Reasoning mode benchmarks show meaningful improvement over non-reasoning mode. Good multilingual benchmark scores.

Research Meta

Last Evaluated

2026-03-15

Source Confidence

82%

Evaluation Method

Open LLM Leaderboard, LMSYS Arena, self-hosting evaluation, cost analysis

Needs Re-evaluation

Sources

Mistral Small 4 technical report
Open LLM Leaderboard
LMSYS Chatbot Arena

Continue exploring

Route a prompt

See how Mistral Small 4 ranks

Compare models

Side-by-side analysis

Browse registry

Explore all 24 models