A frontier multimodal model optimized specifically for high-performance agentic tool calling.
Modalities
Context window
2,000,000
Pricing
input / output per 1M
Reasoning
Streaming
Real-time token-by-token response streaming
Function calling
Connect the model to external tools and systems
Structured outputs
Return responses in JSON schema format
Reasoning
The model thinks before responding
Caching
Cache responses to reduce latency and costs
Web search
Search the internet for real-time information
Live search
Real-time web search with source citations
| Model ID | grok-4.1-fast |
| Provider | xAI |
| Tier | RPM | TPM | Batch queue |
|---|---|---|---|
| Default | 480 | 4,000,000 | — |