Llama 4 Scout
Meta · mid
Meta's efficient mixture-of-experts (MoE) model, with 109B total parameters but only 17B active per token. Features a 10M-token context window and native multimodal support. Fits on a single GPU for inference, making it a leading self-hosted option for long-context multimodal work. Open-weight under the Llama license.
Specifications
Context Window
10M tokens
Max Output
32K tokens
Input Price
$0.150 / 1M tokens
Output Price
$0.400 / 1M tokens
Speed
Fast (speed score: 8/10)
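As a rough illustration of what the listed rates mean per request, the sketch below estimates cost from token counts. It assumes the two rates are the input ($0.150) and output ($0.400) prices per 1M tokens, which the page does not state explicitly; the function name `estimate_cost` and the example request size are hypothetical.

```python
# Approximate rates from the Specifications above.
# Assumption: the lower rate applies to input tokens, the higher to output tokens.
INPUT_USD_PER_M = 0.150
OUTPUT_USD_PER_M = 0.400


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical long-context request: 1M tokens in, 2K tokens out.
    print(f"${estimate_cost(1_000_000, 2_000):.4f}")
```

Under these assumptions, input tokens dominate the bill for long-context work, so filling the full 10M-token window would cost roughly ten times the example above.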
Capability Profile
Feature Support
Best Use Cases
Not Ideal For
Strengths
Weaknesses
Edge Cases & Notes
Provider Notes
Open-weight under Meta's Llama license. Available through Together AI, Fireworks, and Replicate, or self-hosted. Self-hosting on a single H100 is feasible. Pricing shown is approximate (Together AI).
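To see what single-H100 self-hosting implies, the sketch below estimates weight memory at a few precisions. It is back-of-the-envelope arithmetic only: it assumes an 80 GB H100, counts weights alone (no KV cache, activations, or runtime overhead), and the names `weight_memory_gb` and `fits_on_gpu` are hypothetical helpers. In an MoE model all experts must be resident in memory, so the 109B total drives the footprint while the 17B active parameters drive per-token compute.

```python
TOTAL_PARAMS = 109e9    # total parameters (all experts must be loaded)
ACTIVE_PARAMS = 17e9    # parameters used per token
H100_MEMORY_GB = 80     # assumption: 80 GB H100 variant

# Bytes per parameter for common weight formats.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Weight-only memory in GB (10^9 bytes); excludes KV cache and activations."""
    return total_params * bytes_per_param / 1e9


def fits_on_gpu(precision: str, gpu_gb: float = H100_MEMORY_GB) -> bool:
    """True if the weights alone fit within the GPU's memory at this precision."""
    return weight_memory_gb(TOTAL_PARAMS, BYTES_PER_PARAM[precision]) < gpu_gb


if __name__ == "__main__":
    print(f"active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
    for name, nbytes in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(TOTAL_PARAMS, nbytes)
        print(f"{name}: {gb:.1f} GB, fits on one H100: {fits_on_gpu(name)}")
```

By this estimate only the 4-bit weights (about 54.5 GB) fit on one 80 GB card, so the single-GPU claim most plausibly refers to a quantized deployment, and long contexts need additional memory for the KV cache.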
Benchmarks
Benchmark Notes
Strong for a model that runs on a single GPU. MMLU-Pro: 83%. Long-context benchmarks are its standout: near-perfect recall at 1M tokens and good recall at 5M. The Open LLM Leaderboard shows competitive MoE performance.
Research Meta
Last Evaluated
2026-03-15
Source Confidence
85%
Evaluation Method
Open LLM Leaderboard, LMSYS Arena, long-context benchmarks, self-hosting evaluation
Needs Re-evaluation
No
Sources
- Meta Llama 4 technical report
- Open LLM Leaderboard
- LMSYS Chatbot Arena
- Together AI benchmarks