LLM Models/DeepSeek-V4-Flash

DeepSeek-V4-Flash by DeepSeek — 1.0M Context

284B parameter (13B activated) Mixture-of-Experts language model with 1M token context length. Features hybrid attention architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA), Manifold-Constrained Hyper-Connections (mHC), and Muon optimizer for faster convergence. Supports three reasoning effort modes: Non-think, Think High, and Think Max. Pre-trained on 32T+ tokens with comprehensive post-training via GRPO and on-policy distillation.

At a glance

Modalities

Context window

1,000,000

Pricing

/

input / output per 1M

Reasoning

Enabled

Capabilities

Streaming

Real-time token-by-token response streaming

Function calling

Connect the model to external tools and systems

Structured outputs

Return responses in JSON schema format

Fine-tuning

Custom model training on your data

Reasoning

Extended thinking before responding

Benchmarks

MMLU PRO 86.4
SIMPLEQA VERIFIED 34.1
GPQA DIAMOND 88.1
HLE 34.8
LIVECODEBENCH 91.6
CODEFORCES RATING 3052
HMMT 2026 FEB 94.8
IMO ANSWERBENCH 88.4
SWE BENCH VERIFIED 79.0
SWE PRO 52.6
SWE MULTILINGUAL 73.3
TERMINALBENCH 2 56.9
BROWSECOMP 73.2
HLE WITH TOOLS 45.1
MCP ATLAS 69.0
TOOLATHLON 47.8
MRCR 1M 78.7
CORPUSQA 1M 60.5

Details

Release date 2026-05-01
Model ID deepseek-v4-flash
Provider DeepSeek

What You Need to Know About DeepSeek-V4-Flash

Complete Overview of DeepSeek-V4-Flash by DeepSeek

Get detailed information about DeepSeek-V4-Flash, including its context window of 1000000 tokens, pricing per million tokens, supported input and output modalities, and benchmark scores. This model from DeepSeek offers specific capabilities for natural language processing, code generation, and complex reasoning tasks that set it apart from alternatives.

Pricing and Cost Analysis for DeepSeek-V4-Flash

Compare input and output token pricing for DeepSeek-V4-Flash against other models in its class. Understanding LLM pricing is essential for budgeting your AI applications at scale. We break down the cost per million tokens for both input and output so you can estimate the total cost of your workloads and compare value across providers.

Benchmarks and Performance Metrics for DeepSeek-V4-Flash

Review benchmark performance data for DeepSeek-V4-Flash across key evaluation metrics. Compare its reasoning, coding, and language understanding capabilities against competing models to determine if it is the right fit for your specific requirements, whether that involves complex analysis, creative generation, or efficient inference at scale.