
DeepSeek-V4-Flash by DeepSeek — 1.0M Context

A 284B-parameter (13B activated) Mixture-of-Experts language model with a 1M-token context length. It features a hybrid attention architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA), Manifold-Constrained Hyper-Connections (mHC), and the Muon optimizer for faster convergence. It supports three reasoning effort modes: Non-think, Think High, and Think Max. Pre-trained on 32T+ tokens, with comprehensive post-training via GRPO and on-policy distillation.

At a glance

Modalities:
Context window: 1,000,000 tokens
Pricing: / (input / output, per 1M tokens)
Reasoning: Enabled
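
Below is a minimal sketch of calling the model. Only the `deepseek-v4-flash` model ID comes from this page; the OpenAI-compatible client, base URL, and environment variable are assumptions based on how DeepSeek models are typically served.

```python
import os
from openai import OpenAI

# Assumed OpenAI-compatible endpoint; the base URL and key handling are
# illustrative, not confirmed by this page.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # hypothetical env var
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",  # model ID from the Details section
    messages=[{"role": "user", "content": "Summarize MoE routing in two sentences."}],
)
print(response.choices[0].message.content)
```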

Capabilities

Streaming: Real-time token-by-token response streaming
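
A sketch of the streaming capability, assuming the standard OpenAI-compatible `stream=True` flag (endpoint setup assumed as above):

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")  # assumed endpoint

# Print tokens as they arrive instead of waiting for the full response.
stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Explain sparse attention briefly."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```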

Function calling: Connect the model to external tools and systems
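
A hedged sketch of function calling using the OpenAI-compatible `tools` convention; the `get_weather` tool is a hypothetical example, not part of this model's API:

```python
import json
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")  # assumed endpoint

# A hypothetical tool declared in the OpenAI-compatible schema format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not a real API
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)

# Inspect the tool call the model requested (name and parsed arguments).
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```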

Structured outputs: Return responses in JSON Schema format
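
A sketch of JSON-schema-constrained output. The `response_format` shape below follows the OpenAI `json_schema` convention, which is an assumption; providers vary in the exact field layout:

```python
import json
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")  # assumed endpoint

# Constrain the response to a small JSON schema (illustrative schema).
schema = {
    "name": "model_summary",
    "schema": {
        "type": "object",
        "properties": {
            "total_params_b": {"type": "number"},
            "active_params_b": {"type": "number"},
        },
        "required": ["total_params_b", "active_params_b"],
        "additionalProperties": False,
    },
}

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Report this model's total and active parameter counts as JSON."}],
    response_format={"type": "json_schema", "json_schema": schema},  # assumed field layout
)
print(json.loads(response.choices[0].message.content))
```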

Fine-tuning: Custom model training on your data
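
Purely illustrative: the page lists fine-tuning as a capability but documents no API for it, so this sketch assumes an OpenAI-style files-plus-jobs workflow:

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")  # assumed endpoint

# Assumes an OpenAI-style fine-tuning API is exposed for this model,
# which this page does not confirm.
training_file = client.files.create(
    file=open("train.jsonl", "rb"),  # chat-format examples, one JSON object per line
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    model="deepseek-v4-flash",
    training_file=training_file.id,
)
print(job.id, job.status)
```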

Reasoning: Extended thinking before responding
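
The page names three reasoning effort modes (Non-think, Think High, Think Max) but not the API field that selects them. The sketch below assumes a `reasoning_effort` extra-body parameter with hypothetical mode strings:

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"], base_url="https://api.deepseek.com")  # assumed endpoint

# The parameter name and mode strings are assumptions mapped to the
# page's Non-think / Think High / Think Max modes.
for effort in ("none", "high", "max"):
    response = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[{"role": "user", "content": "How many primes are below 100?"}],
        extra_body={"reasoning_effort": effort},  # hypothetical field
    )
    print(effort, "->", response.choices[0].message.content)
```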

Benchmarks

MMLU-Pro: 86.4
SimpleQA Verified: 34.1
GPQA Diamond: 88.1
HLE: 34.8
LiveCodeBench: 91.6
Codeforces rating: 3052
HMMT Feb 2026: 94.8
IMO-AnswerBench: 88.4
SWE-bench Verified: 79.0
SWE-bench Pro: 52.6
SWE-bench Multilingual: 73.3
Terminal-Bench 2: 56.9
BrowseComp: 73.2
HLE (with tools): 45.1
MCP Atlas: 69.0
Toolathlon: 47.8
MRCR (1M): 78.7
CorpusQA (1M): 60.5

Details

Release date: 2026-05-01
Model ID: deepseek-v4-flash
Provider: DeepSeek