A 4.5B-effective-parameter on-device model (8B parameters including Per-Layer Embeddings) with a 128K-token context window. It accepts text, image, and audio input and produces text output. It is designed for efficient local execution on laptops and mobile devices, with native automatic speech recognition (ASR) and speech-to-translated-text capabilities. It also offers configurable thinking, function calling, and support for more than 140 languages.
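To show why parameter count matters for on-device use, here is a rough sketch of weight storage. It assumes memory ≈ parameters × bytes per parameter. The precision table (bf16, int8, int4) is an illustrative assumption, because this page does not say which precisions or quantization formats are available. The estimate also ignores the KV cache, activations, runtime overhead, and however Per-Layer Embeddings are actually stored or loaded.

```python
# Rough, weight-only memory estimate (assumption: memory ~= params * bytes/param).
# Ignores KV cache, activations, runtime overhead, and PLE-specific loading.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}  # illustrative precisions


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Parameter counts taken from the description: 4.5B effective, 8B with PLE.
    for label, params in [("effective", 4.5e9), ("with PLE", 8e9)]:
        for precision in BYTES_PER_PARAM:
            print(f"{label:>9} @ {precision}: {weight_memory_gb(params, precision):.2f} GB")
```

For example, 8B parameters at 2 bytes each come to about 16 GB of weights. At 4 bits (0.5 bytes) per parameter, that drops to about 4 GB.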
Modalities
Input: text, image, and audio; output: text
Context window
128,000 tokens
Pricing
Input / output, per 1M tokens
Capabilities
Streaming: real-time, token-by-token response streaming
Function calling: connect the model to external tools and systems
Structured outputs: return responses that conform to a supplied JSON schema
Fine-tuning: custom model training on your data
Reasoning: extended thinking before responding
| Benchmark | Score |
| --- | --- |
| MMLU-Pro | 69.4 |
| AIME 2026 | 42.5 |
| LiveCodeBench v6 | 52.0 |
| Codeforces Elo | 940 |
| GPQA Diamond | 58.6 |
| τ²-bench (average) | 42.2 |
| BIG-Bench Extra Hard | 33.1 |
| MMMLU | 76.6 |
| MMMU-Pro | 52.6 |
| OmniDocBench 1.5 | 0.181 |
| MATH-Vision | 59.5 |
| MedXpertQA MM | 28.7 |
| MRCR v2 (long context) | 25.4 |
| CoVoST | 35.54 |
| FLEURS | 0.08 |
| Field | Value |
| --- | --- |
| Release date | 2026-05-01 |
| Model ID | gemma-4-e4b |
| Provider | Google DeepMind |
This page summarizes Gemma 4 E4B from Google DeepMind: its 128,000-token context window, per-million-token pricing, supported input and output modalities, and benchmark scores. The model targets on-device use and covers natural language processing, code generation, and reasoning tasks.
Compare input and output token pricing for Gemma 4 E4B against other models in its class. Understanding LLM pricing is essential for budgeting AI applications at scale. Prices are quoted per million tokens, separately for input and output, so the cost of a workload is (input tokens ÷ 1,000,000) × input price + (output tokens ÷ 1,000,000) × output price. This lets you estimate total cost and compare value across providers.
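Here is a minimal cost estimator that applies per-million-token input and output prices. The function name `workload_cost`, the request volume, and the example prices are hypothetical placeholders. They are not Gemma 4 E4B's actual rates, which this page does not list.

```python
def workload_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate workload cost from separate per-1M-token input and output prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical prices in USD per 1M tokens; substitute the real rates.
    HYPOTHETICAL_INPUT_PRICE = 0.10
    HYPOTHETICAL_OUTPUT_PRICE = 0.40
    # Hypothetical workload: 10,000 requests, ~200 input and ~50 output tokens each.
    requests = 10_000
    cost = workload_cost(requests * 200, requests * 50,
                         HYPOTHETICAL_INPUT_PRICE, HYPOTHETICAL_OUTPUT_PRICE)
    print(f"Estimated cost: ${cost:.2f}")  # prints "Estimated cost: $0.40"
```

Input and output are kept as separate terms because they are billed at separate rates. A workload heavy on long prompts and one heavy on long generations can cost quite differently at the same total token count.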
Review benchmark performance data for Gemma 4 E4B across key evaluation metrics. Compare its reasoning, coding, and language-understanding results against competing models to decide whether it fits your requirements, whether that involves complex analysis, creative generation, or efficient inference at scale.