A 4.5B effective-parameter on-device model (8B total with Per-Layer Embeddings) with a 128K-token context window. Supports text, image, and audio input with text output. Designed for efficient local execution on laptops and mobile devices, with native ASR and speech-to-translated-text capabilities. Features configurable thinking, function calling, and multilingual support across 140+ languages.
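For a sense of why the 4.5B effective parameter count matters for local execution, here is a rough back-of-envelope memory estimate. It assumes 4-bit quantized weights and ignores activations and KV cache; the figures are illustrative, not measured:

```python
# Rough memory estimate for the resident (effective) parameters,
# assuming 4-bit weight quantization. Illustrative only: real on-device
# memory also includes activations, KV cache, and runtime overhead.
effective_params = 4.5e9       # 4.5B effective parameters
bytes_per_param = 0.5          # 4-bit quantization = half a byte per weight
gib = effective_params * bytes_per_param / 2**30
print(f"{gib:.2f} GiB")        # → 2.10 GiB
```

At this size the weights fit comfortably in the RAM of a modern laptop or high-end phone, which is consistent with the on-device positioning above.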
Modalities
Text, image, and audio input; text output
Context window
128,000
Streaming
Real-time token-by-token response streaming
Function calling
Connect the model to external tools and systems
Structured outputs
Return responses in JSON schema format
Fine-tuning
Custom model training on your data
Reasoning
Extended thinking before responding
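To make the function-calling and structured-output features above concrete, here is a minimal sketch of the request/dispatch loop. The payload field names, the simulated model reply, and the `get_weather` helper are illustrative assumptions, not this model's documented API:

```python
import json

# Hypothetical request payload: field names are illustrative assumptions,
# not the documented API for gemma-4-e4b.
request = {
    "model": "gemma-4-e4b",
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": [
        {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {  # JSON Schema for the function's arguments
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
}

# A local registry maps tool names to real functions. The model only
# selects a name and arguments; the caller executes the function.
def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 7}  # stubbed result for the sketch

TOOLS = {"get_weather": get_weather}

# Simulated model reply: a structured tool call serialized as JSON.
reply = json.loads(
    '{"tool_call": {"name": "get_weather", "arguments": {"city": "Oslo"}}}'
)
call = reply["tool_call"]
result = TOOLS[call["name"]](**call["arguments"])
print(result)  # → {'city': 'Oslo', 'temp_c': 7}
```

The same JSON Schema mechanism underlies structured outputs: instead of constraining a tool's arguments, the schema constrains the model's final response.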
| Benchmark | Score |
| --- | --- |
| MMLU-Pro | 69.4 |
| AIME 2026 | 42.5 |
| LiveCodeBench v6 | 52.0 |
| Codeforces Elo | 940 |
| GPQA Diamond | 58.6 |
| Tau2-bench (avg) | 42.2 |
| BIG-Bench Extra Hard | 33.1 |
| MMMLU | 76.6 |
| MMMU-Pro | 52.6 |
| OmniDocBench 1.5 | 0.181 |
| MathVision | 59.5 |
| MedXpertQA MM | 28.7 |
| Long-context MRCR v2 | 25.4 |
| CoVoST | 35.54 |
| FLEURS | 0.08 |
| Field | Value |
| --- | --- |
| Release date | 2026-05-01 |
| Model ID | gemma-4-e4b |
| Provider | Google DeepMind |