30.7B-parameter dense multimodal model from Google DeepMind with 256K token context. Handles text, image, and video input with text output. Features hybrid attention (sliding window + global), configurable thinking mode, native function calling, and multilingual support in 140+ languages. Optimized for coding, reasoning, and agentic workflows.
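The hybrid attention design interleaves two mask patterns across layers: sliding-window layers attend only to recent tokens, while global layers attend to the full causal prefix. A minimal sketch of the two patterns (the window size here is illustrative, not the model's actual configuration):

```python
def causal_mask(n):
    # Global causal attention: token i attends to every token j <= i.
    return [[j <= i for j in range(n)] for i in range(n)]

def sliding_window_mask(n, window):
    # Local causal attention: token i attends only to the most recent
    # `window` tokens (including itself).
    return [[j <= i and j > i - window for j in range(n)] for i in range(n)]

# A hybrid stack alternates layers using these two mask types, which keeps
# most layers' attention cost linear in sequence length while the global
# layers preserve long-range access across the 256K context.
sw = sliding_window_mask(6, window=3)
full = causal_mask(6)
```

Here `sw[5]` allows only positions 3–5, while `full[5]` allows all six positions.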
| Spec | Value |
|---|---|
| Modalities | Text, image, and video input; text output |
| Context window | 256,000 tokens |
| Pricing | Input / output per 1M tokens |
- Streaming: real-time token-by-token response streaming
- Function calling: connect the model to external tools and systems
- Structured outputs: return responses conforming to a JSON Schema
- Fine-tuning: custom model training on your data
- Reasoning: extended thinking before responding
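For function calling, tools are typically described to the model with JSON-Schema-style parameter definitions, and the model replies with a tool name plus JSON arguments. The exact request wrapper varies by SDK, so the tool shape and helper below are illustrative (`get_weather` is a hypothetical tool, not part of any real API):

```python
import json

# Hypothetical tool definition in the JSON-Schema style most chat APIs accept.
get_weather_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

def parse_tool_call(raw_args, tool):
    # Parse the model-emitted argument string and check required fields
    # before dispatching to the real function.
    args = json.loads(raw_args)
    missing = [k for k in tool["parameters"]["required"] if k not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return args

args = parse_tool_call('{"city": "Oslo", "unit": "celsius"}', get_weather_tool)
```

Validating arguments before dispatch keeps a malformed tool call from reaching the underlying system.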
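Structured outputs work the same way from the caller's side: you supply a response schema with the request and the model is constrained to emit conforming JSON. How the schema is attached differs per SDK, so this sketch only shows a plausible schema shape and a local conformance check over a reply (`review_schema` and `conforms` are illustrative, not a library API):

```python
import json

# Illustrative response schema for a structured-output request.
review_schema = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string"},
        "score": {"type": "number"},
    },
    "required": ["sentiment", "score"],
}

def conforms(reply, schema):
    # Lightweight check: reply parses as JSON and every required key
    # is present with the expected primitive type.
    type_map = {"string": str, "number": (int, float)}
    try:
        obj = json.loads(reply)
    except json.JSONDecodeError:
        return False
    for key in schema["required"]:
        if key not in obj:
            return False
        expected = type_map[schema["properties"][key]["type"]]
        if not isinstance(obj[key], expected):
            return False
    return True

ok = conforms('{"sentiment": "positive", "score": 0.92}', review_schema)
```

A production system would use a full JSON Schema validator rather than this minimal key/type check.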
| Benchmark | Score |
|---|---|
| MMLU-Pro | 85.2 |
| AIME 2026 | 89.2 |
| LiveCodeBench v6 | 80.0 |
| Codeforces Elo | 2150 |
| GPQA Diamond | 84.3 |
| Tau2 (avg) | 76.9 |
| HLE (no tools) | 19.5 |
| HLE (with search) | 26.5 |
| BIG-Bench Extra Hard | 74.4 |
| MMMLU | 88.4 |
| MMMU-Pro | 76.9 |
| OmniDocBench 1.5 | 0.131 |
| MathVision | 85.6 |
| MedXpertQA MM | 61.3 |
| Long-context MRCR v2 | 66.4 |
| Detail | Value |
|---|---|
| Release date | 2026-05-01 |
| Model ID | gemma-4-31b |
| Provider | Google DeepMind |