NVIDIA HGX AI Supercomputer

The most powerful end-to-end AI supercomputing platform.

Purpose-Built for AI, Simulation, and Data Analytics

AI, complex simulations, and massive datasets require multiple GPUs with extremely fast interconnections and a fully accelerated software stack. The NVIDIA HGX™ AI supercomputing platform brings together the full power of NVIDIA GPUs, NVLink®, NVIDIA networking, and fully optimized AI and high-performance computing (HPC) software stacks to provide the highest application performance and drive the fastest time to insights. 

Unmatched End-to-End Accelerated Computing Platform

NVIDIA HGX H100 combines H100 Tensor Core GPUs with high-speed interconnects to form the world's most powerful servers. Configurations of up to eight GPUs deliver unprecedented acceleration, with up to 640 gigabytes (GB) of GPU memory and 24 terabytes per second (TB/s) of aggregate memory bandwidth. And a staggering 32 petaFLOPS of FP8 performance makes this the world's most powerful accelerated scale-up server platform for AI and HPC.
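
These headline figures are simple per-GPU aggregates. The sketch below, which is illustrative only, recomputes them from the per-GPU values implied by the totals above (80GB of HBM3, 3TB/s of memory bandwidth, and about 4 petaFLOPS of sparse FP8 per H100 SXM):

    # Back-of-the-envelope check: the platform numbers are per-GPU specs
    # multiplied by the GPU count. Per-GPU values are those implied by
    # the aggregates quoted above (H100 SXM, 80 GB HBM3).
    num_gpus = 8
    mem_per_gpu_gb = 80           # HBM3 capacity per H100 SXM
    bw_per_gpu_tbs = 3.0          # HBM3 bandwidth per GPU, TB/s
    fp8_per_gpu_pflops = 4.0      # FP8 Tensor Core peak (with sparsity), PFLOPS

    print(f"GPU memory:       {num_gpus * mem_per_gpu_gb} GB")          # 640 GB
    print(f"Memory bandwidth: {num_gpus * bw_per_gpu_tbs} TB/s")        # 24.0 TB/s
    print(f"FP8 compute:      {num_gpus * fp8_per_gpu_pflops} PFLOPS")  # 32.0 PFLOPS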

HGX H100 includes advanced networking options, at speeds of up to 400 gigabits per second (Gb/s), utilizing NVIDIA Quantum-2 InfiniBand and Spectrum-X™ Ethernet for the highest AI performance. HGX H100 also includes NVIDIA® BlueField®-3 data processing units (DPUs) to enable cloud networking, composable storage, zero-trust security, and GPU compute elasticity in hyperscale AI clouds.

HGX Stack

NVIDIA HGX A100 with 8x A100 GPUs

NVIDIA HGX A100 with 4x A100 GPUs

Deep Learning Training: Performance and Scalability

Up to 4X Higher AI Training on GPT-3

NVIDIA H100 GPUs feature the Transformer Engine with FP8 precision, which provides up to 4X faster training over the prior GPU generation for large language models. The combination of fourth-generation NVIDIA NVLink, which offers 900GB/s of GPU-to-GPU interconnect; NVLink Switch System, which accelerates collective communication by every GPU across nodes; PCIe Gen5; and Magnum IO™ software delivers efficient scalability from small enterprises to massive unified GPU clusters. These infrastructure advances, working in tandem with the NVIDIA AI Enterprise software suite, make HGX H100 the most powerful end-to-end AI and HPC data center platform.
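
To make the scaling story concrete, here is a minimal data-parallel training sketch (not NVIDIA sample code) using PyTorch's NCCL backend, which routes gradient all-reduces over NVLink and NVSwitch automatically when they are present; the model, batch size, and learning rate are placeholders:

    # Minimal multi-GPU data-parallel sketch (PyTorch + NCCL).
    # Launch with: torchrun --nproc_per_node=8 train.py
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group(backend="nccl")      # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda()   # placeholder model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):                       # placeholder training loop
        x = torch.randn(32, 4096, device="cuda")
        loss = model(x).square().mean()
        opt.zero_grad()
        loss.backward()                          # gradient all-reduce runs over NVLink here
        opt.step()

    dist.destroy_process_group()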

Deep Learning Inference: Performance and Versatility

Up to 30X Higher AI Inference Performance on the Largest Models

Megatron chatbot inference with 530 billion parameters.

Real-Time Deep Learning Inference

AI solves a wide array of business challenges using an equally wide array of neural networks. A great AI inference accelerator has to deliver not only the highest performance but also the versatility to accelerate these networks anywhere customers choose to deploy them, from data center to edge.

HGX H100 extends NVIDIA's market-leading inference performance, accelerating inference by up to 30X over the prior generation on Megatron chatbots with 530 billion parameters.

HPC Performance

HPC applications need to perform an enormous number of calculations per second. Increasing the compute density of each server node dramatically reduces the number of servers required, resulting in huge savings in cost, power, and space consumed in the data center. For simulations, high-dimension matrix multiplication requires a processor to fetch data from many neighbors for computation, making GPUs connected by NVIDIA NVLink ideal. HPC applications can also leverage TF32 in A100 to achieve up to 11X higher throughput for single-precision, dense matrix-multiply operations compared to GPUs of four years earlier.

An HGX powered by A100 80GB GPUs delivers a 2X throughput increase over A100 40GB GPUs on Quantum Espresso, a materials simulation application, speeding time to insight.

Up to 7X Higher Performance for HPC Applications

AI-fused HPC Applications

HGX H100 triples the floating-point operations per second (FLOPS) of double-precision Tensor Cores, delivering up to 535 teraFLOPS of FP64 computing in the 8-GPU configuration or 268 teraFLOPS in the 4-GPU configuration. AI-fused HPC applications can also leverage H100's TF32 precision to achieve nearly 8,000 teraFLOPS of throughput for single-precision matrix-multiply operations with zero code changes.
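
The "zero code changes" behavior comes from framework-level support for TF32. As a minimal sketch, assuming a PyTorch workload, FP32 matrix multiplies can be routed to TF32 Tensor Cores with two global flags; the observed speedup depends on how matmul-bound the application is:

    # TF32 Tensor Cores accelerate ordinary FP32 math transparently in PyTorch.
    import torch

    torch.backends.cuda.matmul.allow_tf32 = True   # matmuls may use TF32
    torch.backends.cudnn.allow_tf32 = True         # cuDNN convolutions too

    a = torch.randn(8192, 8192, device="cuda")     # plain FP32 tensors
    b = torch.randn(8192, 8192, device="cuda")
    c = a @ b                                      # executed on TF32 Tensor Cores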

H100 features DPX instructions that speed up dynamic programming algorithms—such as Smith-Waterman used in DNA sequence alignment and protein alignment for protein structure prediction—by 7X over NVIDIA Ampere architecture-based GPUs. By increasing the throughput of diagnostic functions like gene sequencing, H100 can enable every clinic to offer accurate, real-time disease diagnosis and precision medicine prescriptions.
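
For context, the sketch below is a plain-Python rendering of the Smith-Waterman recurrence; the per-cell add/max operations in its inner loop are exactly the kind of work DPX instructions accelerate (production genomics pipelines use tuned CUDA kernels, not Python):

    # Smith-Waterman local alignment: the dynamic-programming recurrence
    # whose per-cell add/max operations DPX instructions speed up on H100.
    def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
        """Return the best local-alignment score between strings a and b."""
        rows, cols = len(a) + 1, len(b) + 1
        H = [[0] * cols for _ in range(rows)]      # DP score matrix
        best = 0
        for i in range(1, rows):
            for j in range(1, cols):
                sub = match if a[i - 1] == b[j - 1] else mismatch
                H[i][j] = max(0,                       # start a new alignment
                              H[i - 1][j - 1] + sub,   # match/mismatch
                              H[i - 1][j] + gap,       # gap in b
                              H[i][j - 1] + gap)       # gap in a
                best = max(best, H[i][j])
        return best

    print(smith_waterman("GATTACA", "GCATGCU"))    # small toy example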

Up to 1.8X Higher Performance for HPC Applications

Quantum Espresso

Accelerating HGX With NVIDIA Networking

The data center is the new unit of computing, and networking plays an integral role in scaling application performance across it. Paired with NVIDIA Quantum InfiniBand, HGX delivers world-class performance and efficiency, which ensure the full utilization of computing resources.

For AI cloud data centers that deploy Ethernet, HGX is best used with the NVIDIA Spectrum-X networking platform, which powers the highest AI performance over 400Gb/s Ethernet. Featuring NVIDIA Spectrum™-4 switches and BlueField-3 DPUs, Spectrum-X delivers consistent, predictable outcomes for thousands of simultaneous AI jobs at every scale through optimal resource utilization and performance isolation. Spectrum-X also enables advanced cloud multi-tenancy and zero-trust security. As a reference design for Spectrum-X, NVIDIA has built Israel-1, a hyperscale generative AI supercomputer made of Dell PowerEdge XE9680 servers based on the NVIDIA HGX H100 eight-GPU platform, BlueField-3 DPUs, and Spectrum-4 switches.

Connecting HGX H100 with NVIDIA Networking

NVIDIA Quantum-2 InfiniBand platform:  Quantum-2 Switch, ConnectX-7 Adapter, BlueField-3 DPU
NVIDIA Spectrum-X platform:            Spectrum-4 Switch, BlueField-3 DPU, Spectrum-X License
NVIDIA Spectrum Ethernet platform:     Spectrum Switch, ConnectX Adapter, BlueField DPU

Workload               Quantum-2 InfiniBand  Spectrum-X  Spectrum Ethernet
DL training            Best                  Better      Good
Scientific simulation  Best                  Better      Good
Data analytics         Best                  Better      Good
DL inference           Best                  Better      Good

NVIDIA HGX Specifications

NVIDIA HGX is available on a single baseboard with four or eight H100 GPUs, or with four or eight A100 GPUs. These powerful combinations of hardware and software lay the foundation for unprecedented AI supercomputing performance.

HGX H100

                                              4-GPU                      8-GPU
GPUs                                          HGX H100 4-GPU             HGX H100 8-GPU
Form factor                                   4x NVIDIA H100 SXM         8x NVIDIA H100 SXM
HPC and AI compute (FP64/TF32/FP16/FP8/INT8)  268TF/4PF/8PF/16PF/16POPS  535TF/8PF/16PF/32PF/32POPS
Memory                                        Up to 320GB                Up to 640GB
NVLink                                        Fourth generation          Fourth generation
NVSwitch                                      N/A                        Third generation
NVLink Switch                                 N/A                        N/A
NVSwitch GPU-to-GPU bandwidth                 N/A                        900GB/s
Total aggregate bandwidth                     3.6TB/s                    7.2TB/s

HGX A100

                                          4-GPU                    8-GPU
GPUs                                      HGX A100 4-GPU           HGX A100 8-GPU
Form factor                               4x NVIDIA A100 SXM       8x NVIDIA A100 SXM
HPC and AI compute (FP64/TF32/FP16/INT8)  78TF/1.25PF/2.5PF/5POPS  156TF/2.5PF/5PF/10POPS
Memory                                    Up to 320GB              Up to 640GB
NVLink                                    Third generation         Third generation
NVSwitch                                  N/A                      Second generation
NVSwitch GPU-to-GPU bandwidth             N/A                      600GB/s
Total aggregate bandwidth                 2.4TB/s                  4.8TB/s

HGX-1 and HGX-2 Reference Architectures

Powered by NVIDIA GPUs and NVLink

NVIDIA HGX-1 and HGX-2 are reference architectures that standardize the design of data centers accelerating AI and HPC. Built with NVIDIA V100 SXM2 GPUs and NVIDIA NVLink and NVSwitch interconnect technologies, HGX reference architectures have a modular design that works seamlessly in hyperscale and hybrid data centers to deliver up to 2 petaFLOPS of compute power for a quick, simple path to AI and HPC.


Specifications

                               HGX-1 (8-GPU)       HGX-2 (16-GPU)
GPUs                           8x NVIDIA V100      16x NVIDIA V100
AI compute                     1 petaFLOPS (FP16)  2 petaFLOPS (FP16)
Memory                         256GB               512GB
NVLink                         Second generation   Second generation
NVSwitch                       N/A                 Yes
NVSwitch GPU-to-GPU bandwidth  N/A                 300GB/s
Total aggregate bandwidth      2.4TB/s             4.8TB/s

Find out more about the NVIDIA H100 GPU