Gemma 4 26B A4B by Google DeepMind — 256K Context

A 25.2B-total-parameter (3.8B active) Mixture-of-Experts model with a 256K-token context window. Each token is routed to 8 of 128 experts, plus 1 shared expert that is always active. Inference speed is comparable to a 4B dense model while output quality matches much larger dense models. Supports text, image, and video input with text output, configurable thinking, native function calling, and multilingual use.
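The speed claim follows from how MoE inference works: per token, only the router-selected experts (plus the shared expert and the non-expert layers) execute, so per-token compute scales with active parameters rather than total parameters. A back-of-the-envelope sketch using the figures above (the 2×params-per-token FLOPs rule is a standard approximation, not something stated on this page):

```python
# Back-of-the-envelope: why a 25.2B-total MoE can run like a ~4B dense model.
# Only the parameters activated for a given token contribute to that token's compute.
total_params_b = 25.2   # all experts + shared layers, in billions
active_params_b = 3.8   # routed experts + shared expert + attention, per token

active_fraction = active_params_b / total_params_b
print(f"active fraction: {active_fraction:.1%}")  # ~15% of weights touched per token

# Per-token forward FLOPs are roughly 2 * (active params), so the compute
# saving versus running all 25.2B densely is the inverse of that fraction.
compute_ratio = total_params_b / active_params_b
print(f"~{compute_ratio:.1f}x less compute per token than a dense 25.2B model")
```

This is why throughput tracks the 3.8B active figure while memory footprint still tracks the full 25.2B, since all experts must be resident to serve the router's choices.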

At a glance

Modalities: text, image, and video input; text output
Context window: 256,000 tokens
Pricing: / (input / output per 1M tokens)
Reasoning: Enabled

Capabilities

Streaming: real-time token-by-token response streaming
Function calling: connect the model to external tools and systems
Structured outputs: return responses conforming to a JSON schema
Fine-tuning: custom model training on your data
Reasoning: extended thinking before responding
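With structured outputs, the model is constrained to emit JSON conforming to a schema you supply, but it is still worth validating what comes back before using it. The schema, field names, and response text below are illustrative assumptions, not actual API output; a minimal validation sketch using only the standard library:

```python
import json

# Hypothetical schema you might request from the model (illustrative only):
# each required field mapped to the Python type it should parse to.
schema_required = {"title": str, "year": int, "tags": list}

# Stand-in for a model response that claims to follow the schema.
raw_response = '{"title": "Gemma 4 notes", "year": 2026, "tags": ["moe", "multimodal"]}'

def validate(raw: str, required: dict) -> dict:
    """Parse JSON and check that each required key exists with the expected type."""
    data = json.loads(raw)
    for key, expected_type in required.items():
        if not isinstance(data.get(key), expected_type):
            raise ValueError(f"field {key!r} missing or not {expected_type.__name__}")
    return data

parsed = validate(raw_response, schema_required)
print(parsed["title"])  # -> Gemma 4 notes
```

In practice a library such as jsonschema or pydantic would replace the hand-rolled check, but the principle is the same: schema-constrained generation reduces, not eliminates, the need for parsing defensively.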

Benchmarks

MMLU-Pro               82.6
AIME 2026              88.3
LiveCodeBench v6       77.1
Codeforces Elo         1718
GPQA Diamond           82.3
Tau2 (avg)             68.2
HLE (no tools)         8.7
HLE (with search)      17.2
BigBench Extra Hard    64.8
MMMLU                  86.3
MMMU-Pro               73.8
OmniDocBench 1.5       0.149
MathVision             82.4
MedXpertQA MM          58.1
Long-context MRCR v2   44.1

Details

Release date: 2026-05-01
Model ID: gemma-4-26b-a4b
Provider: Google DeepMind