Open vs Closed Models: Capability Convergence Analysis
On standardized reasoning benchmarks, leading open-weight models now score within 10–15% of GPT, Claude, and Gemini, narrowing a gap that exceeded 50% a year ago.
Synthesized Answer
Default to open for high-volume inference, sensitive data, and customization-heavy workloads. Default to closed for frontier reasoning, complex agents, and fast-moving consumer features. A hybrid setup (open for 70%+ of tokens, closed for the hard tail) is the dominant production pattern.
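The hybrid pattern described above can be sketched as a simple workload router. This is a minimal illustration, not a production design: the workload tags, the `Request` type, and the rule that unknown workloads fall back to the open model are all assumptions made for the example, and the token counts are invented.

```python
from dataclasses import dataclass

# Hypothetical workload tags, loosely named after the categories in the
# recommendation above. They are illustrative labels, not a real taxonomy.
OPEN_DEFAULT = {"high_volume_inference", "sensitive_data", "customization_heavy"}
CLOSED_DEFAULT = {"frontier_reasoning", "complex_agent", "consumer_feature"}


@dataclass(frozen=True)
class Request:
    workload: str  # one of the hypothetical tags above
    tokens: int    # estimated token count for the request


def route(request: Request) -> str:
    """Return "open" or "closed". Unknown workloads default to open (an assumption)."""
    if request.workload in CLOSED_DEFAULT:
        return "closed"
    return "open"


def open_token_share(requests: list[Request]) -> float:
    """Fraction of all tokens that the router sends to the open-weight model."""
    total = sum(r.tokens for r in requests)
    if total == 0:
        return 0.0
    open_tokens = sum(r.tokens for r in requests if route(r) == "open")
    return open_tokens / total


# Invented traffic mix, used only to show how the share is computed.
traffic = [
    Request("high_volume_inference", 700_000),
    Request("sensitive_data", 100_000),
    Request("complex_agent", 200_000),
]
print(open_token_share(traffic))  # 0.8
```

Tracking the open-token share this way makes it easy to check whether a deployment actually sits in the "70%+ of tokens on open" range that the summary describes.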
Key Points
Enterprise open-weight production deployments tripled YoY.
Open models come close to closed models on benchmarks but underperform by 25–40% on production agentic workloads.
License terms now matter more than parameter count for enterprise procurement.
The EU AI Act exempts true open-source models from key GPAI obligations, a structural tailwind for open models in the EU.
Open-weight capability gap: 30% (−45% YoY; vs. leading closed frontier)
Enterprise open-weight adoption: 62% (+40pp YoY; at least one production model)
Open-weight inference cost: $0.18 / 1M tok (−71% YoY; self-hosted, blended)
DeepSeek V3 training cost: $5.6M (vs. ~$50M closed peer; disclosed estimates)
Post-training compute is scaling ~4× YoY at frontier labs; gains are concentrated in long-horizon reasoning and tool use.
62% of enterprises now run at least one open-weight model in production, up from 22% a year ago. Cost and data residency are the top drivers.
Llama 4 reaches 88% MMLU-Pro and 38% SWE-bench Verified; 405B variant approaches closed frontier on knowledge tasks.
DeepSeek V3 demonstrates that capable open-weight models can be trained at 5–10× lower cost via architectural and training innovations.
EU implementing acts exempt true open-source models from many GPAI obligations, creating a structural advantage for open ecosystems in Europe.
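As a quick sanity check, the headline figures quoted on this page can be cross-checked against each other with a few lines of arithmetic. The sketch below uses only numbers that appear above. The prior-year inference cost is derived here from the stated −71% change, not reported by the source, and the variable names are illustrative.

```python
# Figures as quoted on this page.
deepseek_v3_cost_musd = 5.6   # DeepSeek V3 training cost, in $M
closed_peer_cost_musd = 50.0  # "~$50M", an approximate disclosed estimate

# Training-cost ratio, to compare against the "5-10x lower cost" claim.
cost_ratio = closed_peer_cost_musd / deepseek_v3_cost_musd
print(f"training-cost ratio: {cost_ratio:.1f}x")  # 8.9x

# Enterprise adoption change in percentage points (62% now vs. 22% a year ago).
adoption_now, adoption_prior = 62, 22
print(f"adoption change: {adoption_now - adoption_prior} pp")  # 40 pp

# Prior-year inference cost implied by a 71% year-over-year drop (derived value).
cost_now_usd_per_mtok = 0.18
implied_prior_cost = cost_now_usd_per_mtok / (1 - 0.71)
print(f"implied prior-year cost: ${implied_prior_cost:.2f} / 1M tok")  # $0.62 / 1M tok
```

The ratio of roughly 8.9× falls inside the stated 5–10× range, and the 40-point adoption change matches the metric card.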
Demo mode · All sources, insights, and data are mock-generated for illustration.