25.2B total-parameter (3.8B active) Mixture-of-Experts model with a 256K-token context window. Each token activates 8 of 128 routed experts plus 1 shared expert. Inference speed is comparable to a 4B dense model while quality matches larger dense models. Supports text, image, and video input with text output, configurable thinking, native function calling, and multilingual use.
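The routing scheme above (top-8 of 128 routed experts plus an always-on shared expert) can be sketched as follows. This is a minimal illustration, not the model's actual implementation: the hidden size, the simple linear-softmax router, and the single-matrix experts are all assumptions for demonstration.

```python
# Hedged sketch of top-k MoE routing: 8 of 128 routed experts per token,
# plus 1 shared expert that is always active. All dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, TOP_K, D = 128, 8, 64  # expert counts from the model card; D is assumed

router_w = rng.standard_normal((D, NUM_EXPERTS)) / np.sqrt(D)
experts = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(NUM_EXPERTS)]
shared_expert = rng.standard_normal((D, D)) / np.sqrt(D)

def moe_layer(x):
    """Route one token vector x through the top-8 experts plus the shared expert."""
    logits = x @ router_w                            # (NUM_EXPERTS,) router scores
    top = np.argpartition(logits, -TOP_K)[-TOP_K:]   # indices of the 8 chosen experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                             # softmax over the chosen 8 only
    out = sum(g * (x @ experts[i]) for g, i in zip(gates, top))
    return out + x @ shared_expert                   # shared expert is always applied

y = moe_layer(rng.standard_normal(D))
print(y.shape)  # (64,)
```

Because only 8 of 128 routed experts run per token, the compute per token tracks the 3.8B active parameters rather than the 25.2B total, which is what makes the 4B-dense-model speed comparison plausible.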
| Modalities | Text, image, video input; text output |
| Context window | 256,000 tokens |
| Pricing | per 1M tokens, input / output |
- **Streaming**: Real-time token-by-token response streaming
- **Function calling**: Connect the model to external tools and systems
- **Structured outputs**: Return responses in JSON schema format
- **Fine-tuning**: Custom model training on your data
- **Reasoning**: Extended thinking before responding
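The function-calling round trip works in three steps: the runtime sends the model a tool schema, the model emits a tool call with JSON arguments, and the runtime executes the tool and feeds the result back. A minimal sketch, with the model's output mocked since no client library or endpoint is specified here; the tool-schema shape is an assumed JSON-schema convention, and `get_weather` is a hypothetical tool.

```python
# Hedged sketch of a function-calling round trip. The model call is mocked;
# only the dispatch logic on the runtime side is shown.
import json

# Tool definition the model would receive (assumed JSON-schema style).
get_weather_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city: str) -> dict:
    """Local stand-in for the external system the model would call."""
    return {"city": city, "temp_c": 21}

# Mocked model output: a tool call whose arguments arrive as a JSON string.
model_tool_call = {"name": "get_weather", "arguments": json.dumps({"city": "Zurich"})}

# Runtime side: parse the arguments, execute the tool, return the result
# to the model as the next turn's context.
args = json.loads(model_tool_call["arguments"])
tool_result = get_weather(**args)
print(tool_result)  # {'city': 'Zurich', 'temp_c': 21}
```

Structured outputs invert the same machinery: instead of a tool schema, the runtime supplies a response schema and the model constrains its final answer to match it.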
| Benchmark | Score |
|---|---|
| MMLU-Pro | 82.6 |
| AIME 2026 | 88.3 |
| LiveCodeBench v6 | 77.1 |
| Codeforces Elo | 1718 |
| τ²-bench (avg) | 68.2 |
| GPQA Diamond | 82.3 |
| HLE (no tools) | 8.7 |
| HLE (with search) | 17.2 |
| BigBench Extra Hard | 64.8 |
| MMMLU | 86.3 |
| MMMU-Pro | 73.8 |
| OmniDocBench 1.5 | 0.149 |
| MATH-Vision | 82.4 |
| MedXpertQA MM | 58.1 |
| Long-context MRCR v2 | 44.1 |
| Field | Value |
|---|---|
| Release date | 2026-05-01 |
| Model ID | `gemma-4-26b-a4b` |
| Provider | Google DeepMind |