Claude API models vs. open-source alternatives (June 2026)

| Model | Tier | Input /1M tok | Output /1M tok | Context | MMLU | GPQA Diamond | SWE-bench | Best For |
|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.8 (Anthropic · newest, May 2026) | Frontier | $5.00 | $25.00 | 1M | ~92% | ~88% | ~75% | Complex reasoning, agentic tasks, long-horizon coding, adaptive thinking. Most capable. |
| Claude Opus 4.7 (Anthropic · Apr 2026) | Frontier | $5.00 | $25.00 | 1M | ~91% | ~86% | ~73% | Legal and financial analysis, complex multi-step tasks, high-resolution vision. |
| Claude Sonnet 4.6 (Anthropic · recommended) | Balanced | $3.00 | $15.00 | 1M | ~88% | ~80% | ~65% | Best price/quality. General apps, coding, RAG, production services. Sweet spot for most businesses. |
| Claude Haiku 4.5 (Anthropic · cheapest current) | Fast / Low Cost | $1.00 | $5.00 | 200K | ~74% | ~55% | ~45% | High-volume simple tasks: classification, routing, summarisation, chatbots, serving free-tier users. |
| Claude Haiku 3 (legacy; Anthropic · absolute cheapest) | Fast / Low Cost | $0.25 | $1.25 | 200K | ~63% | ~42% | n/a | Ultra-budget bulk work. Being phased out; use Haiku 4.5 for new builds unless cost is the only factor. |
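The per-token prices above turn into a monthly bill with simple arithmetic: token count divided by one million, times the per-1M price, summed over input and output. A minimal Python sketch, assuming list prices only (no prompt-caching or batch discounts) and using hypothetical dictionary keys rather than official API model IDs:

```python
# Hypothetical model keys (not official API model IDs).
# Prices are USD per 1M tokens, (input, output), copied from the table above.
PRICES_PER_MTOK: dict[str, tuple[float, float]] = {
    "opus-4.8": (5.00, 25.00),
    "opus-4.7": (5.00, 25.00),
    "sonnet-4.6": (3.00, 15.00),
    "haiku-4.5": (1.00, 5.00),
    "haiku-3": (0.25, 1.25),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at list price (no caching or batch discounts)."""
    input_price, output_price = PRICES_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def monthly_cost(model: str, requests_per_day: int, input_tokens: int,
                 output_tokens: int, days: int = 30) -> float:
    """Projected monthly spend, assuming a steady daily request volume."""
    return request_cost(model, input_tokens, output_tokens) * requests_per_day * days


if __name__ == "__main__":
    # Example workload: 10k requests/day, 2,000 input + 500 output tokens each.
    for name in PRICES_PER_MTOK:
        print(f"{name:12s} ${monthly_cost(name, 10_000, 2_000, 500):>10,.2f} / month")
```

For that example workload the sketch gives about $4,050/month on Sonnet 4.6, $1,350 on Haiku 4.5 and $6,750 on Opus 4.8. Output tokens cost five times as much as input on every model in the table, so long responses push the bill up quickly.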

Open-source alternatives

| Model | Tier | Hosted API Cost | Self-Host Cost | License | Context | MMLU | GPQA Diamond | SWE-bench | Hardware (Self-Host) | Best For |
|---|---|---|---|---|---|---|---|---|---|---|
| DeepSeek V3.2 (DeepSeek · 671B MoE, 37B active) | Frontier OSS | ~$0.14-0.27/M | $0 API fees | MIT | 128K | 88.5% | 85%+ | 72%+ | Server-grade: ~140 GB VRAM (Q4); 4-8× A100 80GB or H100. Not laptop-viable; use the hosted API for most teams. | Best all-round open model. General reasoning, coding, agentic workflows. MIT license: commercial use fully permitted. |
| DeepSeek R1 (DeepSeek · 671B MoE, reasoning specialist) | Deep Reasoning | ~$0.55/M | $0 API fees | MIT | 128K | 84% | 71% | 49% | Server-grade: ~136 GB VRAM (Q4); 4-8× A100 80GB or H100. Smaller distilled versions (8B-70B) run on consumer hardware. | Math, logic, chain-of-thought. MATH-500: 97.3% (highest open model). Shows step-by-step reasoning. Great for tutoring and finance. |
| Qwen 3 235B (Alibaba · MoE, 22B active) | Reasoning | ~$0.14/M | $0 API fees | Apache 2.0 | 131K | 84.4% | 81.1% | ~60% | Server-grade: ~120 GB VRAM (Q4); 2-4× A100 80GB. The smaller Qwen3-32B runs on 2× RTX 4090. | Maths and multilingual. AIME 2025: 92.3%. Best fully Apache 2.0 option. Strong for non-English services. |
| Llama 4 Maverick (Meta · 400B MoE, 17B active) | General / Long-ctx | ~$0.20/M | $0 API fees | Llama 4 Community License | 1M | ~82% | ~70% | ~62% | Multi-GPU: ~80 GB VRAM (Q4); 2× A100 80GB or 4× RTX 4090. Mac Studio M4 Ultra (192GB) viable. | Large-document RAG with 1M context. Within 3-5% of Sonnet on most everyday tasks. Good production choice. |
| Llama 4 Scout (Meta · 109B MoE, 17B active) | Speed / Long-ctx | ~$0.10/M | $0 API fees | Llama 4 Community License | 10M (!) | ~79% | ~65% | ~55% | Multi-GPU: ~50-60 GB VRAM (Q4); 2× RTX 4090 (48GB total) or 1× A100 80GB. Mac Studio M2 Ultra or newer with 64GB RAM. | Longest context in this comparison (10M tokens). 2,600 tok/s throughput. Retrieval across massive document sets. |
| Mistral Small 4 (Mistral · 24B dense) | Single GPU | ~$0.10/M | $0 API fees | Apache 2.0 | 256K | ~72% | ~52% | ~38% | Single GPU: ~16-24 GB VRAM; RTX 3090 / 4090 (24GB), or Mac M2/M3 Pro with 18GB+ RAM. Intern-laptop-friendly. | Runs on one consumer GPU. Ideal for intern dev machines, low-traffic services, and European data-sovereignty needs. |
| Gemma 3 27B (Google · 27B dense) | Single GPU | ~$0.10/M | $0 API fees | Gemma Terms of Use | 128K | ~75% | ~55% | ~40% | Single GPU: ~16 GB VRAM; RTX 3090 / 4090, or Mac M2/M3 with 16GB unified memory. Most accessible option. | Needs only ~16GB VRAM. Runs on gaming PCs or Apple M-series Macs. Good for simple everyday consumer services. |
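The hardware column rests on a common rule of thumb: weight memory ≈ total parameters × bits per weight ÷ 8. For MoE models in a standard setup without CPU offloading, every expert has to sit in memory even though only a few are active per token. That means the total parameter count drives memory, not the active count. The sketch below is a rough, weights-only estimate. Its 1.2× overhead for KV cache and runtime buffers is an assumption, not a measured figure, and the function names are hypothetical. At 4 bits it gives about 336 GB of weights for the 671B DeepSeek models and about 200 GB for Llama 4 Maverick, so check the VRAM figures in those rows against the specific quantized build you plan to run.

```python
import math

# Total (not active) parameter counts from the table above, in billions.
MODELS: dict[str, float] = {
    "DeepSeek V3.2": 671,
    "DeepSeek R1": 671,
    "Qwen 3 235B": 235,
    "Llama 4 Maverick": 400,
    "Llama 4 Scout": 109,
    "Mistral Small 4": 24,
    "Gemma 3 27B": 27,
}


def weight_memory_gb(total_params_b: float, bits_per_weight: float = 4.0) -> float:
    """Weights-only memory in GB (1 GB = 1e9 bytes): billions of params x bits / 8."""
    return total_params_b * bits_per_weight / 8


def gpus_needed(total_params_b: float, bits_per_weight: float = 4.0,
                gpu_gb: float = 80.0, overhead: float = 1.2) -> int:
    """Minimum card count; `overhead` is an assumed fudge factor for KV cache etc."""
    needed_gb = weight_memory_gb(total_params_b, bits_per_weight) * overhead
    return math.ceil(needed_gb / gpu_gb)


if __name__ == "__main__":
    for name, params in MODELS.items():
        print(f"{name:18s} ~{weight_memory_gb(params):6.1f} GB weights @ 4-bit, "
              f"{gpus_needed(params)}x 80GB or "
              f"{gpus_needed(params, gpu_gb=24)}x 24GB card(s)")
```

Treat the GPU counts as a lower bound. Sharding across cards rarely splits perfectly, and long contexts grow the KV cache well beyond a fixed 20% margin.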