25.2B total-parameter (3.8B active) Mixture-of-Experts model with a 256K-token context window. Each token activates 8 of 128 routed experts plus 1 shared expert. Inference speed is comparable to a 4B dense model while quality matches larger dense models. Supports text, image, and video input with text output, configurable thinking, native function calling, and multilingual use.
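The routing scheme above (top-8 of 128 routed experts plus an always-on shared expert) can be sketched as follows. This is a minimal illustration, not the model's actual implementation: the hidden size, the simple linear-softmax router, and the single-matrix experts are all assumptions for demonstration.

```python
# Hedged sketch of top-k MoE routing: 8 of 128 routed experts per token,
# plus 1 shared expert that is always active. All dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, TOP_K, D = 128, 8, 64  # expert counts from the model card; D is assumed

router_w = rng.standard_normal((D, NUM_EXPERTS)) / np.sqrt(D)
experts = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(NUM_EXPERTS)]
shared_expert = rng.standard_normal((D, D)) / np.sqrt(D)

def moe_layer(x):
    """Route one token vector x through the top-8 experts plus the shared expert."""
    logits = x @ router_w                            # (NUM_EXPERTS,) router scores
    top = np.argpartition(logits, -TOP_K)[-TOP_K:]   # indices of the 8 chosen experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                             # softmax over the chosen 8 only
    out = sum(g * (x @ experts[i]) for g, i in zip(gates, top))
    return out + x @ shared_expert                   # shared expert is always applied

y = moe_layer(rng.standard_normal(D))
print(y.shape)  # (64,)
```

Because only 8 of 128 routed experts run per token, the compute per token tracks the 3.8B active parameters rather than the 25.2B total, which is what makes the 4B-dense-model speed comparison plausible.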
| Modalities | Text, image, video input; text output |
| Context window | 256,000 tokens |
| Pricing | per 1M tokens, input / output |
- **Streaming**: Real-time token-by-token response streaming
- **Function calling**: Connect the model to external tools and systems
- **Structured outputs**: Return responses in JSON schema format
- **Fine-tuning**: Custom model training on your data
- **Reasoning**: Extended thinking before responding
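The function-calling round trip works in three steps: the runtime sends the model a tool schema, the model emits a tool call with JSON arguments, and the runtime executes the tool and feeds the result back. A minimal sketch, with the model's output mocked since no client library or endpoint is specified here; the tool-schema shape is an assumed JSON-schema convention, and `get_weather` is a hypothetical tool.

```python
# Hedged sketch of a function-calling round trip. The model call is mocked;
# only the dispatch logic on the runtime side is shown.
import json

# Tool definition the model would receive (assumed JSON-schema style).
get_weather_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city: str) -> dict:
    """Local stand-in for the external system the model would call."""
    return {"city": city, "temp_c": 21}

# Mocked model output: a tool call whose arguments arrive as a JSON string.
model_tool_call = {"name": "get_weather", "arguments": json.dumps({"city": "Zurich"})}

# Runtime side: parse the arguments, execute the tool, return the result
# to the model as the next turn's context.
args = json.loads(model_tool_call["arguments"])
tool_result = get_weather(**args)
print(tool_result)  # {'city': 'Zurich', 'temp_c': 21}
```

Structured outputs invert the same machinery: instead of a tool schema, the runtime supplies a response schema and the model constrains its final answer to match it.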
| Benchmark | Score |
|---|---|
| MMLU-Pro | 82.6 |
| AIME 2026 | 88.3 |
| LiveCodeBench v6 | 77.1 |
| Codeforces Elo | 1718 |
| τ²-bench (avg) | 68.2 |
| GPQA Diamond | 82.3 |
| HLE (no tools) | 8.7 |
| HLE (with search) | 17.2 |
| BigBench Extra Hard | 64.8 |
| MMMLU | 86.3 |
| MMMU-Pro | 73.8 |
| OmniDocBench 1.5 | 0.149 |
| MATH-Vision | 82.4 |
| MedXpertQA MM | 58.1 |
| Long-context MRCR v2 | 44.1 |
| Field | Value |
|---|---|
| Release date | 2026-05-01 |
| Model ID | `gemma-4-26b-a4b` |
| Provider | Google DeepMind |