A 31B-parameter (3B active) Mamba2-Transformer hybrid MoE multimodal model that unifies video, audio, image, and text understanding. It supports enterprise-grade Q&A, summarization, transcription, OCR, document intelligence, GUI automation, and agentic workflows. Reasoning is on by default and can be toggled per request via `enable_thinking`. The model was trained on 354M+ samples (~717B tokens) across 1,395 datasets and is available in BF16, FP8, and NVFP4 precisions. Commercial use is permitted under the NVIDIA Open Model Agreement.
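The `enable_thinking` toggle can be sent per request. Below is a minimal sketch assuming the model is served behind an OpenAI-compatible chat completions endpoint; the base URL and the placement of `enable_thinking` under `chat_template_kwargs` are assumptions drawn from common open-model serving setups, not confirmed details of this model's API.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed endpoint; adjust for your deployment
    api_key="YOUR_API_KEY",
)

# Reasoning is on by default; this call turns it off for a single request.
# Passing enable_thinking via chat_template_kwargs is an assumption borrowed
# from common serving stacks; check the serving docs for the exact field.
response = client.chat.completions.create(
    model="nemotron-3-nano-omni",
    messages=[{"role": "user", "content": "Summarize this contract clause in two sentences."}],
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(response.choices[0].message.content)
```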
| Modalities | Video, audio, image, text |
| Context window | 256,000 tokens |
| Pricing | Input / output per 1M tokens |
| Reasoning | On by default (toggle via `enable_thinking`) |
| Feature | Description |
| --- | --- |
| Streaming | Real-time token-by-token response streaming |
| Function calling | Connect the model to external tools and systems (see the sketch after this table) |
| Structured outputs | Return responses in JSON Schema format |
| Fine-tuning | Custom model training on your data |
| Reasoning | Extended thinking before responding |
| Computer use | Control and interact with computer interfaces |
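Function calling and structured outputs are typically exposed through standard OpenAI-compatible request fields. The sketch below shows both; the endpoint URL, the `get_invoice_total` tool, and the JSON schema are illustrative assumptions, not details from this model card.

```python
from openai import OpenAI

client = OpenAI(base_url="https://integrate.api.nvidia.com/v1", api_key="YOUR_API_KEY")

# Hypothetical tool definition used for illustration only.
tools = [{
    "type": "function",
    "function": {
        "name": "get_invoice_total",
        "description": "Look up the total amount of an invoice by ID.",
        "parameters": {
            "type": "object",
            "properties": {"invoice_id": {"type": "string"}},
            "required": ["invoice_id"],
        },
    },
}]

# 1) Function calling: let the model decide whether to call the tool.
first = client.chat.completions.create(
    model="nemotron-3-nano-omni",
    messages=[{"role": "user", "content": "What is the total on invoice INV-1042?"}],
    tools=tools,
)
print(first.choices[0].message.tool_calls)

# 2) Structured outputs: constrain the reply to a JSON schema.
structured = client.chat.completions.create(
    model="nemotron-3-nano-omni",
    messages=[{"role": "user", "content": "Extract vendor and total from: 'ACME Corp, $1,204.50'."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "invoice_fields",
            "schema": {
                "type": "object",
                "properties": {"vendor": {"type": "string"}, "total": {"type": "number"}},
                "required": ["vendor", "total"],
            },
        },
    },
)
print(structured.choices[0].message.content)
```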
| Benchmark | Score |
| --- | --- |
| CVBench 2D | 83.95 |
| OCRBench v2 (EN) | 67.04 |
| OSWorld | 47.4 |
| CharXiv (reasoning) | 63.6 |
| MMLongBench-Doc | 57.5 |
| MathVista (mini) | 82.8 |
| OCR Reasoning | 54.14 |
| Video-MME | 72.2 |
| WorldSense | 55.4 |
| Daily-Omni | 74.52 |
| Voice Interaction | 89.39 |
| Release date | 2026-04-28 |
| Model ID | nemotron-3-nano-omni |
| Provider | NVIDIA |