N
NexusRoute
Back to Models

Mistral Small 4

Mistralbudget

Mistral's efficient MoE model with 119B total parameters but only 6B active per token. Features a 256K context window, reasoning mode, and vision support. Open-weight under Apache 2.0. Designed for self-hosting on modest hardware while providing strong reasoning capabilities.

Released 2026-02-20Knowledge cutoff: 2025-12
Medium confidence|Updated 74d ago|82% source confidence

Specifications

Context Window

256K tokens

Max Output

16.4K tokens

Input Price

$0.100 / 1M tokens

Output Price

$0.300 / 1M tokens

Latency Tier

Ultra Fast (speed score: 9/10)

Capability Profile

Cost Efficiency
9.5/10
Speed
9/10
Long Context
7.5/10
Structured Output
7/10
Instruction Following
7/10
Reasoning
6.5/10
Coding
6.5/10
Factuality
6.5/10
Tool Use
6.5/10
Safety & Enterprise
6.5/10
Conversational
6.5/10
Creativity
5.5/10
Multimodal
5/10

Feature Support

Vision Yes
Audio In No
Audio Out No
Video No
Image Generation No
Image Editing No
Function Calling Yes
JSON Mode Yes
Structured Output Yes
Streaming Yes
Reasoning Yes
Realtime No
Computer Use No
Web Search No

Best Use Cases

Self-hosted inference on a single GPU with reasoning and vision
Budget-friendly multilingual chatbots and assistants
Edge deployment where open-weight models with reasoning are needed
Cost-sensitive European enterprise deployments
Lightweight classification and extraction with reasoning mode for harder cases

Not Ideal For

Complex coding or software engineering tasks
Tasks requiring the highest factual accuracy
Audio or video processing
Enterprise-critical applications needing top-tier safety

Strengths

Apache 2.0 license with reasoning and vision at budget pricing
Runs on a single GPU — 6B active parameters per token despite 119B total
256K context window is impressive for a budget model
Built-in reasoning mode brings analytical depth to a budget tier
Good multilingual support inherited from the Mistral family

Weaknesses

Quality is noticeably below Mistral Large 3 on complex tasks
Vision capabilities are basic compared to Gemini or GPT models
Reasoning mode is helpful but well below o4-mini quality
Creative writing is mediocre
Smaller community and ecosystem than Llama

Edge Cases & Notes

6B active parameters means it can run on a single RTX 4090 with quantization
Reasoning mode adds latency but meaningfully improves accuracy on analytical tasks
Apache 2.0 licensing differentiates it from Meta's more restrictive Llama license

Provider Notes

Open-weight under Apache 2.0. Available through Mistral's La Plateforme, Ollama, and self-hosted. One of the best options for self-hosted reasoning on consumer hardware.

Benchmarks

MMLU78.5%
HumanEval76%
Arena Elo1180

Benchmark Notes

MMLU-Pro 78.5%. Impressive for 6B active parameters. Reasoning mode benchmarks show meaningful improvement over non-reasoning mode. Good multilingual benchmark scores.

Research Meta

Last Evaluated

2026-03-15

Source Confidence

82%

Evaluation Method

Open LLM Leaderboard, LMSYS Arena, self-hosting evaluation, cost analysis

Needs Re-evaluation

No

Sources

  • Mistral Small 4 technical report
  • Open LLM Leaderboard
  • LMSYS Chatbot Arena