NVIDIA announced significant performance and cost advances with its Blackwell Ultra platform, enabling organizations to scale agentic artificial intelligence (AI) workloads with dramatically higher throughput and lower token inference costs. Independent analysis shows that systems powered by NVIDIA Blackwell Ultra deliver up to 50× higher inference performance per megawatt and up to 35× lower cost per token than the previous-generation Hopper architecture, supporting real-time interactive and reasoning-based AI applications at scale.
AI agents and coding assistants continue to drive explosive growth in software-related queries, requiring both ultralow latency and extended context reasoning to support multistep workflows. These workloads demand infrastructure that can deliver high performance across both dimensions, and NVIDIA’s extreme codesign of chips, system architecture, and software accelerates throughput while reducing operational costs for organizations deploying agentic AI in production.
Benchmarking by SemiAnalysis shows that the GB300 NVL72 system, built on the Blackwell Ultra GPU architecture, delivers up to 50× better performance per megawatt than systems based on the Hopper platform. This gain translates into up to 35× lower cost per million tokens on latency-sensitive workloads that power reasoning and agent-driven coding assistants.
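The link between performance per megawatt and cost per token can be illustrated with a back-of-the-envelope calculation. All numbers below are hypothetical placeholders, not figures from SemiAnalysis or NVIDIA; the sketch only shows how a throughput gain per unit of power compounds into a lower cost per million tokens when power and facility spend dominate.

```python
# Hypothetical back-of-the-envelope: how performance per megawatt
# translates into cost per million tokens. All inputs are illustrative
# placeholders, not published NVIDIA or SemiAnalysis figures.

def cost_per_million_tokens(tokens_per_sec_per_mw: float,
                            dollars_per_mw_hour: float) -> float:
    """Cost (USD) to generate one million tokens, assuming spend is
    dominated by power/facility cost at a fixed $/MW-hour rate."""
    tokens_per_hour = tokens_per_sec_per_mw * 3600  # per MW of capacity
    return dollars_per_mw_hour / tokens_per_hour * 1_000_000

# Hypothetical throughputs per MW (tokens/s); only the 50x ratio mirrors
# the benchmark's performance-per-megawatt claim, the absolute values do not.
hopper_tps_per_mw = 20_000
blackwell_ultra_tps_per_mw = hopper_tps_per_mw * 50

rate = 300.0  # hypothetical all-in facility cost, $/MW-hour

hopper_cost = cost_per_million_tokens(hopper_tps_per_mw, rate)
ultra_cost = cost_per_million_tokens(blackwell_ultra_tps_per_mw, rate)

print(f"Hopper:          ${hopper_cost:.4f} per 1M tokens")
print(f"Blackwell Ultra: ${ultra_cost:.4f} per 1M tokens")
print(f"Cost ratio:      {hopper_cost / ultra_cost:.0f}x lower")
```

In practice the reported cost gap (35×) is smaller than the raw performance-per-megawatt gap (50×), since total cost per token also folds in hardware amortization and other fixed costs that differ between generations; the sketch isolates only the power-dominated term.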
By continuously optimizing key software layers such as TensorRT-LLM, Dynamo, and supporting libraries, NVIDIA ensures that the Blackwell Ultra platform drives consistent performance gains across a broad range of inference scenarios. These enhancements include higher-efficiency GPU kernels, improved memory access pathways, and advanced launch techniques that minimize idle time and maximize utilization across large GPU arrays.
In addition to performance improvements for short-latency inference, the Blackwell Ultra-based GB300 NVL72 system shows superior economics for long-context workloads, where reasoning across large token sequences is essential for agentic tasks such as comprehensive code analysis and multistep decision workflows. For example, in scenarios with extended token inputs and outputs, Blackwell Ultra delivers significantly lower cost per token compared to earlier Blackwell configurations.
Leading cloud providers, including Microsoft, CoreWeave, and Oracle Cloud Infrastructure, have begun deploying GB300 NVL72 systems to support low-latency and long-context use cases at scale, enabling a new class of real-time agentic applications that reason across expanded context windows without compromising responsiveness.
NVIDIA continues to innovate across its platform stack to unlock further performance and cost efficiencies for AI infrastructure, including next-generation systems such as the upcoming Rubin platform, which is expected to deliver future leaps in throughput and energy-efficient AI inference.