
DeepSeek-V4-Flash by DeepSeek — 1.0M Context

A 284B-parameter (13B activated) Mixture-of-Experts language model with a 1M-token context length. It features a hybrid attention architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA), Manifold-Constrained Hyper-Connections (mHC), and the Muon optimizer for faster convergence. It supports three reasoning effort modes: Non-think, Think High, and Think Max. Pre-trained on 32T+ tokens, with comprehensive post-training via GRPO and on-policy distillation.

At a glance

Modalities:
Context window: 1,000,000 tokens
Pricing: / (input / output, per 1M tokens)
Reasoning: Enabled
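
Below is a minimal sketch of calling the model. Only the `deepseek-v4-flash` model ID comes from this page; the OpenAI-compatible client, base URL, and environment variable are assumptions based on how DeepSeek models are typically served.

```python
import os
from openai import OpenAI

# Assumed OpenAI-compatible endpoint; the base URL and key handling are
# illustrative, not confirmed by this page.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # hypothetical env var
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",  # model ID from the Details section
    messages=[{"role": "user", "content": "Summarize MoE routing in two sentences."}],
)
print(response.choices[0].message.content)
```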

Capabilities

Streaming: Real-time token-by-token response streaming
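
A sketch of the streaming capability, assuming the standard OpenAI-compatible `stream=True` flag (endpoint setup assumed as above):

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")  # assumed endpoint

# Print tokens as they arrive instead of waiting for the full response.
stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Explain sparse attention briefly."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```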

Function calling: Connect the model to external tools and systems
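
A hedged sketch of function calling using the OpenAI-compatible `tools` convention; the `get_weather` tool is a hypothetical example, not part of this model's API:

```python
import json
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")  # assumed endpoint

# A hypothetical tool declared in the OpenAI-compatible schema format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not a real API
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)

# Inspect the tool call the model requested (name and parsed arguments).
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```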

Structured outputs: Return responses in JSON Schema format
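
A sketch of JSON-schema-constrained output. The `response_format` shape below follows the OpenAI `json_schema` convention, which is an assumption; providers vary in the exact field layout:

```python
import json
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")  # assumed endpoint

# Constrain the response to a small JSON schema (illustrative schema).
schema = {
    "name": "model_summary",
    "schema": {
        "type": "object",
        "properties": {
            "total_params_b": {"type": "number"},
            "active_params_b": {"type": "number"},
        },
        "required": ["total_params_b", "active_params_b"],
        "additionalProperties": False,
    },
}

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Report this model's total and active parameter counts as JSON."}],
    response_format={"type": "json_schema", "json_schema": schema},  # assumed field layout
)
print(json.loads(response.choices[0].message.content))
```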

Fine-tuning: Custom model training on your data
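
Purely illustrative: the page lists fine-tuning as a capability but documents no API for it, so this sketch assumes an OpenAI-style files-plus-jobs workflow:

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")  # assumed endpoint

# Assumes an OpenAI-style fine-tuning API is exposed for this model,
# which this page does not confirm.
training_file = client.files.create(
    file=open("train.jsonl", "rb"),  # chat-format examples, one JSON object per line
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    model="deepseek-v4-flash",
    training_file=training_file.id,
)
print(job.id, job.status)
```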

Reasoning: Extended thinking before responding
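
The page names three reasoning effort modes (Non-think, Think High, Think Max) but not the API field that selects them. The sketch below assumes a `reasoning_effort` extra-body parameter with hypothetical mode strings:

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")  # assumed endpoint

# The parameter name and mode strings are assumptions mapped to the
# page's Non-think / Think High / Think Max modes.
for effort in ("none", "high", "max"):
    response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[{"role": "user", "content": "How many primes are below 100?"}],
        extra_body={"reasoning_effort": effort},  # hypothetical field
    )
    print(effort, "->", response.choices[0].message.content)
```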

Benchmarks

MMLU-Pro: 86.4
SimpleQA Verified: 34.1
GPQA Diamond: 88.1
HLE: 34.8
LiveCodeBench: 91.6
Codeforces rating: 3052
HMMT Feb 2026: 94.8
IMO-AnswerBench: 88.4
SWE-bench Verified: 79.0
SWE-bench Pro: 52.6
SWE-bench Multilingual: 73.3
Terminal-Bench 2: 56.9
BrowseComp: 73.2
HLE (with tools): 45.1
MCP Atlas: 69.0
Toolathlon: 47.8
MRCR (1M): 78.7
CorpusQA (1M): 60.5

Details

Release date: 2026-05-01
Model ID: deepseek-v4-flash
Provider: DeepSeek