A 4.5B effective-parameter on-device model (8B total with Per-Layer Embeddings) with a 128K-token context window. Supports text, image, and audio input with text output. Designed for efficient local execution on laptops and mobile devices, with native ASR and speech-to-translated-text capabilities. Features configurable thinking, function calling, and multilingual support across 140+ languages.
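For a sense of why the 4.5B effective parameter count matters for local execution, here is a rough back-of-envelope memory estimate. It assumes 4-bit quantized weights and ignores activations and KV cache; the figures are illustrative, not measured:

```python
# Rough memory estimate for the resident (effective) parameters,
# assuming 4-bit weight quantization. Illustrative only: real on-device
# memory also includes activations, KV cache, and runtime overhead.
effective_params = 4.5e9       # 4.5B effective parameters
bytes_per_param = 0.5          # 4-bit quantization = half a byte per weight
gib = effective_params * bytes_per_param / 2**30
print(f"{gib:.2f} GiB")        # → 2.10 GiB
```

At this size the weights fit comfortably in the RAM of a modern laptop or high-end phone, which is consistent with the on-device positioning above.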
Modalities
Text, image, and audio input; text output
Context window
128,000
Streaming
Real-time token-by-token response streaming
Function calling
Connect the model to external tools and systems
Structured outputs
Return responses in JSON schema format
Fine-tuning
Custom model training on your data
Reasoning
Extended thinking before responding
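To make the function-calling and structured-output features above concrete, here is a minimal sketch of the request/dispatch loop. The payload field names, the simulated model reply, and the `get_weather` helper are illustrative assumptions, not this model's documented API:

```python
import json

# Hypothetical request payload: field names are illustrative assumptions,
# not the documented API for gemma-4-e4b.
request = {
    "model": "gemma-4-e4b",
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": [
        {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {  # JSON Schema for the function's arguments
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
}

# A local registry maps tool names to real functions. The model only
# selects a name and arguments; the caller executes the function.
def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 7}  # stubbed result for the sketch

TOOLS = {"get_weather": get_weather}

# Simulated model reply: a structured tool call serialized as JSON.
reply = json.loads(
    '{"tool_call": {"name": "get_weather", "arguments": {"city": "Oslo"}}}'
)
call = reply["tool_call"]
result = TOOLS[call["name"]](**call["arguments"])
print(result)  # → {'city': 'Oslo', 'temp_c': 7}
```

The same JSON Schema mechanism underlies structured outputs: instead of constraining a tool's arguments, the schema constrains the model's final response.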
| Benchmark | Score |
| --- | --- |
| MMLU-Pro | 69.4 |
| AIME 2026 | 42.5 |
| LiveCodeBench v6 | 52.0 |
| Codeforces Elo | 940 |
| GPQA Diamond | 58.6 |
| Tau2-bench (avg) | 42.2 |
| BIG-Bench Extra Hard | 33.1 |
| MMMLU | 76.6 |
| MMMU-Pro | 52.6 |
| OmniDocBench 1.5 | 0.181 |
| MathVision | 59.5 |
| MedXpertQA MM | 28.7 |
| Long-context MRCR v2 | 25.4 |
| CoVoST | 35.54 |
| FLEURS | 0.08 |
| Field | Value |
| --- | --- |
| Release date | 2026-05-01 |
| Model ID | gemma-4-e4b |
| Provider | Google DeepMind |