Supermicro Launches Three NVIDIA-Based, Full-Stack, Turnkey Generative AI SuperClusters, Scalable from Enterprise to Large LLM Infrastructures

Supermicro, Inc., a provider of total IT solutions for AI, cloud, storage, and 5G/Edge, announces its latest portfolio to accelerate the deployment of generative AI. The Supermicro SuperCluster solutions provide foundational building blocks for current and future large language model (LLM) infrastructure.

Three powerful Supermicro SuperCluster solutions are now available for generative AI workloads. The 4U liquid-cooled systems and the 8U air-cooled systems are purpose-built for high-performance LLM training, as well as large-batch, high-volume LLM inference. A third SuperCluster, built on 1U air-cooled Supermicro NVIDIA MGX systems, is optimized for cloud-scale inference.

“In the age of AI, compute is measured by clusters, rather than just the number of servers. With our extensive global production capacity of 5,000 racks per month, we can deliver complete generative AI clusters to our customers faster than ever,” says Charles Liang, president and CEO of Supermicro. “A 64-node cluster enables 512 NVIDIA HGX H200 GPUs with 72 TB of HBM3e through a pair of our scalable cluster building blocks with 400 Gb/s NVIDIA Quantum-2 InfiniBand and Spectrum-X Ethernet networking. Supermicro’s SuperCluster solutions, combined with NVIDIA AI Enterprise software, are ideal for enterprise and cloud infrastructures training today’s LLMs with up to trillions of parameters. The interconnected GPUs, CPUs, memory, storage and networking deployed across multiple nodes in racks are the foundation of today’s AI. Supermicro’s SuperCluster solutions provide foundational building blocks for rapidly evolving generative AI and LLMs.”
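
The cluster-level figures in the quote follow directly from per-GPU math. A minimal sketch, assuming 8 GPUs per HGX node and 141 GB of HBM3e per H200 GPU (NVIDIA's published per-GPU capacity; not a detail stated in the release):

```python
# Back-of-envelope check of the 64-node cluster figures quoted above.
# Assumes 8 GPUs per NVIDIA HGX H200 node and 141 GB HBM3e per H200 GPU.
NODES = 64
GPUS_PER_NODE = 8          # HGX H200 8-GPU baseboard
HBM3E_PER_GPU_GB = 141     # published H200 HBM3e capacity

total_gpus = NODES * GPUS_PER_NODE
total_hbm_tb = total_gpus * HBM3E_PER_GPU_GB / 1000

print(f"{total_gpus} GPUs, {total_hbm_tb:.1f} TB HBM3e")
# -> 512 GPUs, 72.2 TB HBM3e, matching the quoted 512 GPUs / 72 TB
```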

“NVIDIA’s latest GPU, CPU, networking and software technologies enable system makers to accelerate a range of next-generation AI workloads for global markets,” said Kaustubh Sanghani, vice president of GPU Product Management at NVIDIA. “By leveraging the NVIDIA accelerated computing platform with Blackwell architecture-based products, Supermicro provides customers with the advanced server systems they need that can be easily deployed in data centers.”

Supermicro 4U NVIDIA HGX H100/H200 8-GPU systems double the density of the 8U air-cooled systems by using liquid cooling, reducing energy consumption and lowering the total cost of ownership of data centers. These systems are designed to support the next generation of GPUs based on the NVIDIA Blackwell architecture. The Supermicro cooling distribution unit (CDU) and cooling distribution manifold (CDM) are the main arteries for distributing chilled liquid to Supermicro’s custom direct-to-chip (D2C) cold plates, keeping GPUs and CPUs at optimal temperatures for maximum performance. This cooling technology can reduce electricity costs for the entire data center by up to 40% and requires less floor space. Learn more about Supermicro Liquid Cooling technology: https://www.supermicro.com/en/solutions/liquid-cooling

Systems with NVIDIA HGX H100/H200 8-GPU technology are ideal for training generative AI. Fast GPU interconnects via NVIDIA® NVLink®, together with the high bandwidth and capacity of the GPU memory, are essential for running LLMs cost-effectively. The Supermicro SuperCluster creates a massive pool of GPU resources that works as a single AI supercomputer.

Whether customizing a massive foundation model trained on a dataset with trillions of tokens or building a cloud-scale LLM inference infrastructure, the spine-leaf topology with non-blocking 400 Gb/s network fabrics makes it possible to scale seamlessly from 32 nodes to thousands of nodes. With fully integrated liquid cooling, operational effectiveness and efficiency are thoroughly validated in Supermicro’s testing processes before shipment.
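
As a rough illustration of that scaling claim, the sketch below sizes a two-tier non-blocking spine-leaf fabric. The 64-port switch radix (typical of 400G switch hardware such as NVIDIA's Quantum-2 class) and the 1:1 uplink ratio are assumptions for illustration, not specifications from the release:

```python
# Sketch: endpoint capacity of a two-tier, non-blocking spine-leaf fabric.
# Assumes 64-port 400G switches; each leaf splits its ports 1:1 between
# downlinks (to endpoints) and uplinks (to spines) to stay non-blocking.
PORTS = 64

downlinks_per_leaf = PORTS // 2      # 32 endpoint-facing ports per leaf
max_leaves = PORTS                   # each spine port reaches one leaf
max_endpoints = downlinks_per_leaf * max_leaves

print(f"two-tier ceiling: {max_endpoints} x 400G endpoints")  # -> 2048
# With one NIC per GPU (8 per HGX node), two tiers cover a few hundred
# nodes; growing toward thousands of nodes adds a third switching tier.
```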

Supermicro’s NVIDIA MGX system designs with the NVIDIA GH200 Grace Hopper Superchips create a blueprint for future AI clusters that addresses a critical bottleneck in generative AI: the bandwidth and capacity of GPU memory for running large language models (LLMs) with large inference batches to reduce operational costs. The 256-node cluster is a high-volume cloud inference powerhouse that is easily deployed and scaled.

SuperCluster with 4U liquid-cooled system in 5 racks or 8U air-cooled system in 9 racks

  • 256 NVIDIA H100/H200 Tensor Core GPUs in one scalable unit
  • Liquid cooling enables 512 GPUs and 64 nodes in the same space as the air-cooled solution with 256 GPUs and 32 nodes
  • 20 TB of HBM3 with NVIDIA H100 or 36 TB of HBM3e with NVIDIA H200 in one scalable unit (the arithmetic behind these totals is sketched after this list)
  • One-to-one networking delivers up to 400 Gb/s to each GPU, enabling NVIDIA GPUDirect RDMA and GPUDirect Storage for training large language models with up to trillions of parameters
  • 400G InfiniBand or 400GbE Ethernet switch network fabrics with a highly scalable spine-leaf topology, including the NVIDIA Quantum-2 InfiniBand and NVIDIA Spectrum-X Ethernet platforms
  • Customizable AI data pipeline storage fabric with industry-leading parallel file system options
  • NVIDIA AI Enterprise 5.0 software, which includes support for the new NVIDIA NIM inference microservices that accelerate the deployment of AI models at scale
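
The per-scalable-unit memory totals listed above follow from per-GPU HBM capacities. A minimal sketch, assuming NVIDIA's published figures of 80 GB HBM3 per H100 and 141 GB HBM3e per H200 (the per-GPU numbers are not stated in the release itself):

```python
# HBM totals for the 256-GPU scalable-unit configurations listed above.
# Assumes published per-GPU capacities: 80 GB HBM3 (H100), 141 GB HBM3e (H200).
GPUS_PER_UNIT = 256

for gpu, gb in (("H100 (HBM3)", 80), ("H200 (HBM3e)", 141)):
    print(f"{gpu}: {GPUS_PER_UNIT * gb / 1000:.1f} TB per scalable unit")
# -> H100: 20.5 TB (quoted as 20 TB); H200: 36.1 TB (quoted as 36 TB)
```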

SuperCluster with 1U air-cooled NVIDIA MGX system in 9 racks

  • 256 GH200 Grace Hopper Superchips in one scalable unit
  • Up to 144 GB of HBM3e + 480 GB of LPDDR5X unified memory per node, suited for cloud-scale, high-volume, low-latency, large-batch inference and able to fit a 70B+ parameter model in a single node (see the memory math sketched after this list)
  • 400G InfiniBand or 400GbE Ethernet switch network fabric with a highly scalable spine-leaf topology
  • Up to 8 built-in E1.S NVMe storage devices per node
  • Adaptable AI data pipeline storage fabric with NVIDIA BlueField-3 DPUs and industry-leading parallel file system options to deliver high-throughput, low-latency storage access to any GPU
  • NVIDIA AI Enterprise 5.0 software
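
The claim that a 70B+ parameter model fits in a single GH200 node can be checked with simple weight-memory arithmetic. A minimal sketch, assuming 16-bit (2-byte) weights; runtime overhead such as the KV cache is left out here, since the 480 GB of LPDDR5X can absorb it:

```python
# Weight-memory check for fitting a 70B-parameter model on one GH200 node.
# Assumes 2 bytes per parameter (FP16/BF16 weights); KV cache and other
# runtime overhead are ignored and can spill into the 480 GB LPDDR5X.
PARAMS = 70e9
BYTES_PER_PARAM = 2            # FP16/BF16 weights
HBM3E_GB = 144                 # per-node HBM3e ceiling quoted above

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"weights: {weights_gb:.0f} GB vs {HBM3E_GB} GB HBM3e")
# -> 140 GB of weights fits under the 144 GB HBM3e ceiling
```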

With the highest possible network performance for GPU-to-GPU connectivity, Supermicro’s SuperCluster solutions are optimized for high-volume, large-batch LLM training, deep learning, and inference. Supermicro’s L11 and L12 validation testing, combined with on-site deployment service, provides customers with a seamless experience. Customers receive plug-and-play scalable units for easy data center deployment and faster results.

SOURCE: PRNewsWire