A deep technology conference for processor and system architects from industry and academia has become a key forum for the trillion-dollar data center computing market.
At Hot Chips 2024 next week, senior NVIDIA engineers will present the latest advancements powering the NVIDIA Blackwell platform, plus research on liquid cooling for data centers and AI agents for chip design.
They’ll share how:
- NVIDIA Blackwell brings together multiple chips, systems and NVIDIA CUDA software to power the next generation of AI across use cases, industries and countries.
- NVIDIA GB200 NVL72 — a multi-node, liquid-cooled, rack-scale solution that connects 72 Blackwell GPUs and 36 Grace CPUs — raises the bar for AI system design.
- NVLink interconnect technology provides all-to-all GPU communication, enabling record high throughput and low-latency inference for generative AI.
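- The NVIDIA Quasar Quantization System combines algorithmic innovations, software libraries and Blackwell’s second-generation Transformer Engine to accelerate AI computing with low-precision models while keeping accuracy high.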
- NVIDIA researchers are building AI models that help build processors for AI.
An NVIDIA Blackwell talk, taking place Monday, Aug. 26, will also spotlight new architectural details and examples of generative AI models running on Blackwell silicon.
It’s preceded by three tutorials on Sunday, Aug. 25, that will cover how hybrid liquid-cooling solutions can help data centers transition to more energy-efficient infrastructure and how AI models, including large language model (LLM)-powered agents, can help engineers design the next generation of processors.
Together, these presentations showcase the ways NVIDIA engineers are innovating across every area of data center computing and design to deliver unprecedented performance, efficiency and optimization.
NVIDIA Blackwell is the ultimate full-stack computing challenge. It comprises multiple NVIDIA chips, including the Blackwell GPU, Grace CPU, BlueField data processing unit, ConnectX network interface card, NVLink Switch, Spectrum Ethernet switch and Quantum InfiniBand switch.
Ajay Tirumala and Raymond Wong, directors of architecture at NVIDIA, will provide a first look at the platform and explain how these technologies work together to deliver a new standard for AI and accelerated computing performance while advancing energy efficiency.
The multi-node NVIDIA GB200 NVL72 solution is a perfect example. LLM inference requires low-latency, high-throughput token generation. GB200 NVL72 acts as a unified system to deliver up to 30x faster inference for LLM workloads, unlocking the ability to run trillion-parameter models in real time.
Tirumala and Wong will also discuss how the NVIDIA Quasar Quantization System — which brings together algorithmic innovations, NVIDIA software libraries and tools, and Blackwell’s second-generation Transformer Engine — supports high accuracy on low-precision models, highlighting examples using LLMs and visual generative AI.
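To see why accuracy at low precision depends on careful scaling, consider a generic illustration. The sketch below is not the Quasar Quantization System or any NVIDIA API. It is a minimal, textbook-style symmetric quantizer in which a block of values shares one scale factor, and the function names and sample weights are hypothetical.

```python
def quantize_block(values, bits=4):
    """Map a block of floats to signed integer codes that share one scale."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit, 127 for 8-bit
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0   # avoid dividing by zero
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize_block(codes, scale):
    """Recover approximate floats from integer codes and the shared scale."""
    return [c * scale for c in codes]


def max_abs_error(values, bits):
    """Largest round-trip error for a block at the given bit width."""
    codes, scale = quantize_block(values, bits)
    restored = dequantize_block(codes, scale)
    return max(abs(a - b) for a, b in zip(values, restored))


if __name__ == "__main__":
    weights = [1.0, -0.4, 0.26, 0.03]          # hypothetical sample block
    for bits in (8, 4):
        print(bits, "bits -> max error", max_abs_error(weights, bits))
```

Fewer bits mean a coarser grid, so small values such as 0.03 can round to zero. Production systems reduce that loss with finer-grained scaling and calibration, which this sketch does not attempt.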
The traditional hum of air-cooled data centers may become a relic of the past as researchers develop more efficient and sustainable solutions that use hybrid cooling, a combination of air and liquid cooling.
Liquid-cooling techniques move heat away from systems more efficiently than air, making it easier for computing systems to stay cool even while processing large workloads. The equipment for liquid cooling also takes up less space and consumes less power than air-cooling systems, allowing data centers to add more server racks — and therefore more compute power — in their facilities.
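A rough back-of-the-envelope calculation shows part of the reason liquid carries heat so much better than air. The sketch below uses approximate room-temperature textbook properties for water and air, not figures from the talk, and the 1,000-watt load and 10-kelvin coolant temperature rise are assumed purely for illustration.

```python
# Approximate room-temperature properties (textbook values, assumed here).
WATER = {"density_kg_m3": 997.0, "specific_heat_j_kg_k": 4180.0}
AIR = {"density_kg_m3": 1.2, "specific_heat_j_kg_k": 1005.0}


def heat_removed_watts(fluid, flow_m3_per_s, delta_t_kelvin):
    """Heat carried by a coolant stream: Q = rho * c_p * flow * delta_T."""
    return (fluid["density_kg_m3"] * fluid["specific_heat_j_kg_k"]
            * flow_m3_per_s * delta_t_kelvin)


def flow_needed_m3_per_s(fluid, heat_watts, delta_t_kelvin):
    """Volume flow required to carry away a given heat load."""
    return heat_watts / (fluid["density_kg_m3"]
                         * fluid["specific_heat_j_kg_k"] * delta_t_kelvin)


if __name__ == "__main__":
    load_watts, rise_kelvin = 1000.0, 10.0     # hypothetical load and rise
    water = flow_needed_m3_per_s(WATER, load_watts, rise_kelvin)
    air = flow_needed_m3_per_s(AIR, load_watts, rise_kelvin)
    print("water flow (L/s):", water * 1000)
    print("air flow (L/s):", air * 1000)
    print("air-to-water volume ratio:", air / water)
```

Under these assumptions, air has to move a few thousand times more volume than water to carry the same heat. Real systems also depend on pumps, fans, heat exchangers and facility design, which this estimate ignores.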
Ali Heydari, director of data center cooling and infrastructure at NVIDIA, will present several designs for hybrid-cooled data centers.
Some designs retrofit existing air-cooled data centers with liquid-cooling units, offering a quick and easy way to add liquid-cooling capabilities to existing racks. Other designs require either installing piping for direct-to-chip liquid cooling using cooling distribution units or submerging servers entirely in immersion cooling tanks. Although these options demand a larger upfront investment, they lead to substantial savings in both energy consumption and operational costs.
Heydari will also share his team’s work as part of COOLERCHIPS, a U.S. Department of Energy program to develop advanced data center cooling technologies. As part of the project, the team is using the NVIDIA Omniverse platform to create physics-informed digital twins that will help them model energy consumption and cooling efficiency to optimize their data center designs.
Semiconductor design is a mammoth challenge at microscopic scale. Engineers developing cutting-edge processors work to fit as much computing power as they can onto a piece of silicon a few inches across, testing the limits of what’s physically possible.
AI models are supporting their work by improving design quality and productivity, boosting the efficiency of manual processes and automating some time-consuming tasks. The models include prediction and optimization tools to help engineers rapidly analyze and improve designs, as well as LLMs that can assist engineers with answering questions, generating code, debugging design problems and more.
Mark Ren, director of design automation research at NVIDIA, will provide an overview of these models and their uses in a tutorial. In a second session, he’ll focus on agent-based AI systems for chip design.
AI agents powered by LLMs can be directed to complete tasks autonomously, unlocking broad applications across industries. In microprocessor design, NVIDIA researchers are developing agent-based systems that can reason and take action using customized circuit design tools, interact with experienced designers, and learn from a database of human and agent experiences.
NVIDIA experts aren’t just building this technology — they’re using it. Ren will share examples of how engineers can use AI agents for timing report analysis, cell cluster optimization processes and code generation. The cell cluster optimization work recently won best paper at the first IEEE International Workshop on LLM-Aided Design.
SOURCE: NVIDIA