284B-parameter (13B activated) Mixture-of-Experts language model with a 1M-token context length. Features a hybrid attention architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA), along with Manifold-Constrained Hyper-Connections (mHC) and the Muon optimizer for faster convergence. Supports three reasoning effort modes: Non-think, Think High, and Think Max. Pre-trained on 32T+ tokens, with comprehensive post-training via GRPO and on-policy distillation.
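The reasoning effort modes are selected per request. Below is a minimal sketch of that selection, assuming an OpenAI-compatible chat completions endpoint and a hypothetical `reasoning_effort` field whose values map onto the Non-think / Think High / Think Max modes; the endpoint URL, field name, and accepted values are assumptions, not something this card specifies.

```python
# Minimal sketch: choosing a reasoning effort mode per request.
# Assumes an OpenAI-compatible /chat/completions endpoint and a
# hypothetical "reasoning_effort" field ("none" | "high" | "max");
# neither is confirmed by this model card.
import os
import requests

API_URL = "https://api.deepseek.com/chat/completions"  # assumed endpoint

def ask(prompt: str, effort: str = "none") -> str:
    """Send a single chat turn with the requested reasoning effort."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}"},
        json={
            "model": "deepseek-v4-flash",
            "messages": [{"role": "user", "content": prompt}],
            "reasoning_effort": effort,  # hypothetical parameter
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Prove that sqrt(2) is irrational.", effort="high"))
```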
| Feature | Details |
|---|---|
| Modalities | |
| Context window | 1,000,000 tokens |
| Pricing | input / output per 1M tokens |
| Streaming | Real-time token-by-token response streaming |
| Function calling | Connect the model to external tools and systems |
| Structured outputs | Return responses in JSON schema format (see the sketch after this table) |
| Fine-tuning | Custom model training on your data |
| Reasoning | Extended thinking before responding |
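A rough sketch of the structured-outputs feature above: the request constrains the response to a JSON schema. The `response_format` payload shape follows the OpenAI chat completions convention and is an assumption here, not something this card documents; function calling would follow the same request pattern with a `tools` array instead.

```python
# Sketch of structured outputs: require the reply to conform to a JSON schema.
# The "response_format" shape follows the OpenAI chat completions convention;
# whether this model accepts it exactly as shown is an assumption.
import json
import os
import requests

API_URL = "https://api.deepseek.com/chat/completions"  # assumed endpoint

city_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "population": {"type": "integer"},
    },
    "required": ["city", "population"],
    "additionalProperties": False,
}

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}"},
    json={
        "model": "deepseek-v4-flash",
        "messages": [{"role": "user", "content": "What is the largest city in Japan?"}],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "city_info", "schema": city_schema},
        },
    },
    timeout=120,
)
resp.raise_for_status()
print(json.loads(resp.json()["choices"][0]["message"]["content"]))
```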
| Benchmark | Score |
|---|---|
| MMLU-Pro | 86.4 |
| SimpleQA Verified | 34.1 |
| GPQA Diamond | 88.1 |
| HLE | 34.8 |
| LiveCodeBench | 91.6 |
| Codeforces rating | 3052 |
| HMMT Feb 2026 | 94.8 |
| IMO AnswerBench | 88.4 |
| SWE-bench Verified | 79.0 |
| SWE Pro | 52.6 |
| SWE Multilingual | 73.3 |
| Terminal-Bench 2 | 56.9 |
| BrowseComp | 73.2 |
| HLE (with tools) | 45.1 |
| MCP Atlas | 69.0 |
| Toolathlon | 47.8 |
| MRCR (1M) | 78.7 |
| CorpusQA (1M) | 60.5 |
| Field | Value |
|---|---|
| Release date | 2026-05-01 |
| Model ID | deepseek-v4-flash |
| Provider | DeepSeek |