NVIDIA HGX AI Supercomputer

The most powerful end-to-end AI supercomputing platform.

Purpose-Built for AI, Simulation, and Data Analytics

AI, complex simulations, and massive datasets require multiple GPUs with extremely fast interconnections and a fully accelerated software stack. The NVIDIA HGX™ AI supercomputing platform brings together the full power of NVIDIA GPUs, NVLink®, NVIDIA networking, and fully optimized AI and high-performance computing (HPC) software stacks to provide the highest application performance and drive the fastest time to insights. 

Unmatched End-to-End Accelerated Computing Platform

NVIDIA HGX H100 combines H100 Tensor Core GPUs with high-speed interconnects to form the world's most powerful servers. Configurations of up to eight GPUs deliver unprecedented acceleration, with up to 640 gigabytes (GB) of GPU memory and 24 terabytes per second (TB/s) of aggregate memory bandwidth. And a staggering 32 petaFLOPS of FP8 performance makes this the world's most powerful accelerated scale-up server platform for AI and HPC.
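
These headline figures are simple per-GPU aggregates. The sketch below, which is illustrative only, recomputes them from the per-GPU values implied by the totals above (80GB of HBM3, 3TB/s of memory bandwidth, and about 4 petaFLOPS of sparse FP8 per H100 SXM):

    # Back-of-the-envelope check: the platform numbers are per-GPU specs
    # multiplied by the GPU count. Per-GPU values are those implied by
    # the aggregates quoted above (H100 SXM, 80 GB HBM3).
    num_gpus = 8
    mem_per_gpu_gb = 80           # HBM3 capacity per H100 SXM
    bw_per_gpu_tbs = 3.0          # HBM3 bandwidth per GPU, TB/s
    fp8_per_gpu_pflops = 4.0      # FP8 Tensor Core peak (with sparsity), PFLOPS

    print(f"GPU memory:       {num_gpus * mem_per_gpu_gb} GB")          # 640 GB
    print(f"Memory bandwidth: {num_gpus * bw_per_gpu_tbs} TB/s")        # 24.0 TB/s
    print(f"FP8 compute:      {num_gpus * fp8_per_gpu_pflops} PFLOPS")  # 32.0 PFLOPS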

HGX H100 includes advanced networking options, at speeds of up to 400 gigabits per second (Gb/s), utilizing NVIDIA Quantum-2 InfiniBand and Spectrum-X™ Ethernet for the highest AI performance. HGX H100 also includes NVIDIA® BlueField®-3 data processing units (DPUs) to enable cloud networking, composable storage, zero-trust security, and GPU compute elasticity in hyperscale AI clouds.

HGX Stack

NVIDIA HGX A100 with 8x A100 GPUs

NVIDIA HGX A100 with 4x A100 GPUs

Deep Learning Training: Performance and Scalability

Up to 4X Higher AI Training on GPT-3

NVIDIA H100 GPUs feature the Transformer Engine with FP8 precision, which provides up to 4X faster training over the prior GPU generation for large language models. The combination of fourth-generation NVIDIA NVLink, which offers 900GB/s of GPU-to-GPU interconnect; NVLink Switch System, which accelerates collective communication by every GPU across nodes; PCIe Gen5; and Magnum IO™ software delivers efficient scalability from small enterprises to massive unified GPU clusters. These infrastructure advances, working in tandem with the NVIDIA AI Enterprise software suite, make HGX H100 the most powerful end-to-end AI and HPC data center platform.
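
To make the scaling story concrete, here is a minimal data-parallel training sketch (not NVIDIA sample code) using PyTorch's NCCL backend, which routes gradient all-reduces over NVLink and NVSwitch automatically when they are present; the model, batch size, and learning rate are placeholders:

    # Minimal multi-GPU data-parallel sketch (PyTorch + NCCL).
    # Launch with: torchrun --nproc_per_node=8 train.py
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group(backend="nccl")      # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda()   # placeholder model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):                       # placeholder training loop
        x = torch.randn(32, 4096, device="cuda")
        loss = model(x).square().mean()
        opt.zero_grad()
        loss.backward()                          # gradient all-reduce runs over NVLink here
        opt.step()

    dist.destroy_process_group()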

Deep Learning Inference: Performance and Versatility

Up to 30X Higher AI Inference Performance on the Largest Models

Megatron chatbot inference with 530 billion parameters.

Real-Time Deep Learning Inference

AI solves a wide array of business challenges using an equally wide array of neural networks. A great AI inference accelerator has to deliver not only the highest performance but also the versatility to accelerate these networks anywhere customers choose to deploy them, from data center to edge.

HGX H100 extends NVIDIA's market-leading inference performance, accelerating inference by up to 30X over the prior generation on Megatron chatbots with 530 billion parameters.

HPC Performance

HPC applications need to perform an enormous number of calculations per second. Increasing the compute density of each server node dramatically reduces the number of servers required, resulting in huge savings in cost, power, and space consumed in the data center. For simulations, high-dimension matrix multiplication requires a processor to fetch data from many neighbors for computation, making GPUs connected by NVIDIA NVLink ideal. HPC applications can also leverage TF32 in A100 to achieve up to 11X higher throughput for single-precision, dense matrix-multiply operations compared to GPUs of four years earlier.

An HGX powered by A100 80GB GPUs delivers a 2X throughput increase over A100 40GB GPUs on Quantum Espresso, a materials simulation application, speeding time to insight.

Up to 7X Higher Performance for HPC Applications

AI-fused HPC Applications

HGX H100 triples the floating-point operations per second (FLOPS) of double-precision Tensor Cores, delivering up to 535 teraFLOPS of FP64 computing in the 8-GPU configuration or 268 teraFLOPS in the 4-GPU configuration. AI-fused HPC applications can also leverage H100's TF32 precision to achieve nearly 8,000 teraFLOPS of throughput for single-precision matrix-multiply operations with zero code changes.
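
The "zero code changes" behavior comes from framework-level support for TF32. As a minimal sketch, assuming a PyTorch workload, FP32 matrix multiplies can be routed to TF32 Tensor Cores with two global flags; the observed speedup depends on how matmul-bound the application is:

    # TF32 Tensor Cores accelerate ordinary FP32 math transparently in PyTorch.
    import torch

    torch.backends.cuda.matmul.allow_tf32 = True   # matmuls may use TF32
    torch.backends.cudnn.allow_tf32 = True         # cuDNN convolutions too

    a = torch.randn(8192, 8192, device="cuda")     # plain FP32 tensors
    b = torch.randn(8192, 8192, device="cuda")
    c = a @ b                                      # executed on TF32 Tensor Cores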

H100 features DPX instructions that speed up dynamic programming algorithms—such as Smith-Waterman used in DNA sequence alignment and protein alignment for protein structure prediction—by 7X over NVIDIA Ampere architecture-based GPUs. By increasing the throughput of diagnostic functions like gene sequencing, H100 can enable every clinic to offer accurate, real-time disease diagnosis and precision medicine prescriptions.
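
For context, the sketch below is a plain-Python rendering of the Smith-Waterman recurrence; the per-cell add/max operations in its inner loop are exactly the kind of work DPX instructions accelerate (production genomics pipelines use tuned CUDA kernels, not Python):

    # Smith-Waterman local alignment: the dynamic-programming recurrence
    # whose per-cell add/max operations DPX instructions speed up on H100.
    def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
        """Return the best local-alignment score between strings a and b."""
        rows, cols = len(a) + 1, len(b) + 1
        H = [[0] * cols for _ in range(rows)]      # DP score matrix
        best = 0
        for i in range(1, rows):
            for j in range(1, cols):
                sub = match if a[i - 1] == b[j - 1] else mismatch
                H[i][j] = max(0,                       # start a new alignment
                              H[i - 1][j - 1] + sub,   # match/mismatch
                              H[i - 1][j] + gap,       # gap in b
                              H[i][j - 1] + gap)       # gap in a
                best = max(best, H[i][j])
        return best

    print(smith_waterman("GATTACA", "GCATGCU"))    # small toy example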

Up to 1.8X Higher Performance for HPC Applications

Quantum Espresso

Accelerating HGX With NVIDIA Networking

The data center is the new unit of computing, and networking plays an integral role in scaling application performance across it. Paired with NVIDIA Quantum InfiniBand, HGX delivers world-class performance and efficiency, which ensure the full utilization of computing resources.

For AI cloud data centers that deploy Ethernet, HGX is best used with the NVIDIA Spectrum-X networking platform, which powers the highest AI performance over 400Gb/s Ethernet. Featuring NVIDIA Spectrum™-4 switches and BlueField-3 DPUs, Spectrum-X delivers consistent, predictable outcomes for thousands of simultaneous AI jobs at every scale through optimal resource utilization and performance isolation. Spectrum-X also enables advanced cloud multi-tenancy and zero-trust security. As a reference design for Spectrum-X, NVIDIA has built Israel-1, a hyperscale generative AI supercomputer made of Dell PowerEdge XE9680 servers based on the NVIDIA HGX H100 eight-GPU platform, BlueField-3 DPUs, and Spectrum-4 switches.

Connecting HGX H100 with NVIDIA Networking

NVIDIA Quantum-2 InfiniBand platform:  Quantum-2 Switch, ConnectX-7 Adapter, BlueField-3 DPU
NVIDIA Spectrum-X platform:            Spectrum-4 Switch, BlueField-3 DPU, Spectrum-X License
NVIDIA Spectrum Ethernet platform:     Spectrum Switch, ConnectX Adapter, BlueField DPU

Workload               Quantum-2 InfiniBand  Spectrum-X  Spectrum Ethernet
DL training            Best                  Better      Good
Scientific simulation  Best                  Better      Good
Data analytics         Best                  Better      Good
DL inference           Best                  Better      Good

NVIDIA HGX Specifications

NVIDIA HGX is available on a single baseboard with four or eight H100 GPUs, or with four or eight A100 GPUs. These powerful combinations of hardware and software lay the foundation for unprecedented AI supercomputing performance.

HGX H100

                                              4-GPU                      8-GPU
GPUs                                          HGX H100 4-GPU             HGX H100 8-GPU
Form factor                                   4x NVIDIA H100 SXM         8x NVIDIA H100 SXM
HPC and AI compute (FP64/TF32/FP16/FP8/INT8)  268TF/4PF/8PF/16PF/16POPS  535TF/8PF/16PF/32PF/32POPS
Memory                                        Up to 320GB                Up to 640GB
NVLink                                        Fourth generation          Fourth generation
NVSwitch                                      N/A                        Third generation
NVLink Switch                                 N/A                        N/A
NVSwitch GPU-to-GPU bandwidth                 N/A                        900GB/s
Total aggregate bandwidth                     3.6TB/s                    7.2TB/s

HGX A100

                                          4-GPU                    8-GPU
GPUs                                      HGX A100 4-GPU           HGX A100 8-GPU
Form factor                               4x NVIDIA A100 SXM       8x NVIDIA A100 SXM
HPC and AI compute (FP64/TF32/FP16/INT8)  78TF/1.25PF/2.5PF/5POPS  156TF/2.5PF/5PF/10POPS
Memory                                    Up to 320GB              Up to 640GB
NVLink                                    Third generation         Third generation
NVSwitch                                  N/A                      Second generation
NVSwitch GPU-to-GPU bandwidth             N/A                      600GB/s
Total aggregate bandwidth                 2.4TB/s                  4.8TB/s

HGX-1 and HGX-2 Reference Architectures

Powered by NVIDIA GPUs and NVLink

NVIDIA HGX-1 and HGX-2 are reference architectures that standardize the design of data centers accelerating AI and HPC. Built with NVIDIA V100 SXM2 GPUs and NVIDIA NVLink and NVSwitch interconnect technologies, HGX reference architectures have a modular design that works seamlessly in hyperscale and hybrid data centers to deliver up to 2 petaFLOPS of compute power for a quick, simple path to AI and HPC.


Specifications

                               HGX-1 (8-GPU)       HGX-2 (16-GPU)
GPUs                           8x NVIDIA V100      16x NVIDIA V100
AI compute                     1 petaFLOPS (FP16)  2 petaFLOPS (FP16)
Memory                         256GB               512GB
NVLink                         Second generation   Second generation
NVSwitch                       N/A                 Yes
NVSwitch GPU-to-GPU bandwidth  N/A                 300GB/s
Total aggregate bandwidth      2.4TB/s             4.8TB/s

Find out more about the NVIDIA H100 GPU