NVIDIA GB200 architecture

NVIDIA GB200 NVL72: Blackwell at rack scale.

A technical guide for datacentre operators and infrastructure architects evaluating the GB200 NVL72 — the rack-level system, the NVLink Switch fabric, the liquid-cooling reality, and what changes from H100.

Rack anatomy

What is actually inside an NVL72.

The unit of design is no longer the server. It is the rack — and every mechanical, electrical and network decision flows from that.


72 GPUs as one NVLink domain

An NVL72 rack pairs 36 Grace CPUs with 72 Blackwell GPUs across 18 compute trays. All 72 GPUs share a single NVLink5 fabric — 1.8 TB/s per GPU, fully connected through nine NVLink Switch trays. From a software perspective the rack behaves like one very large accelerator with 13.5 TB of unified HBM3e.
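To make the 13.5 TB figure concrete, the sketch below divides it across the 72 GPUs and checks whether a model's weights fit in the rack's aggregate HBM. The 1.8-trillion-parameter model and the 50% headroom reserved for KV cache and activations are illustrative assumptions, not sizing guidance. The check also ignores optimizer state, so it describes inference rather than training.

```python
# Rough capacity check against an NVL72 rack's aggregate HBM3e (decimal units: 1 TB = 1000 GB).
RACK_HBM_TB = 13.5  # aggregate HBM3e quoted for one NVL72 rack
NUM_GPUS = 72


def hbm_per_gpu_gb(rack_tb: float = RACK_HBM_TB, gpus: int = NUM_GPUS) -> float:
    """Average HBM per GPU in GB."""
    return rack_tb * 1000 / gpus


def weights_tb(params_billion: float, bytes_per_param: float) -> float:
    """Weight footprint in TB for a given parameter count and precision."""
    return params_billion * 1e9 * bytes_per_param / 1e12


def fits_in_rack(params_billion: float, bytes_per_param: float, headroom: float = 0.5) -> bool:
    """True if the weights use at most `headroom` of the rack's HBM.

    The remaining fraction is assumed to be reserved for KV cache and
    activations (hypothetical split; optimizer state is not modelled).
    """
    return weights_tb(params_billion, bytes_per_param) <= headroom * RACK_HBM_TB


print(f"{hbm_per_gpu_gb():.1f} GB per GPU")  # 187.5 GB per GPU
# Hypothetical 1.8T-parameter model at FP8 (1 byte) and BF16 (2 bytes) per parameter.
for bytes_per_param in (1, 2):
    print(bytes_per_param, weights_tb(1800, bytes_per_param), fits_in_rack(1800, bytes_per_param))
```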


Compute and switch trays

Each 1U compute tray holds two GB200 superchips. Each superchip pairs one Grace CPU with two Blackwell GPUs over NVLink-C2C at 900 GB/s, so a tray carries two Grace CPUs and four GPUs. Nine 1U NVLink Switch trays sit in the middle of the rack and provide the all-to-all GPU fabric. The remaining U-space is power shelves, NIC trays and cable management.
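The tray arithmetic can be checked mechanically. This sketch rolls the stated building blocks (one Grace CPU and two Blackwell GPUs per GB200 superchip, two superchips per 1U compute tray, 18 compute trays and nine 1U switch trays) up to rack totals. The class and property names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RackLayout:
    compute_trays: int = 18
    switch_trays: int = 9
    superchips_per_tray: int = 2
    cpus_per_superchip: int = 1  # one Grace CPU per GB200 superchip
    gpus_per_superchip: int = 2  # two Blackwell GPUs per GB200 superchip

    @property
    def superchips(self) -> int:
        return self.compute_trays * self.superchips_per_tray

    @property
    def cpus(self) -> int:
        return self.superchips * self.cpus_per_superchip

    @property
    def gpus(self) -> int:
        return self.superchips * self.gpus_per_superchip

    @property
    def gpus_per_tray(self) -> int:
        return self.superchips_per_tray * self.gpus_per_superchip

    @property
    def tray_rack_units(self) -> int:
        # Compute and switch trays are 1U each; power shelves and other gear are extra.
        return self.compute_trays + self.switch_trays


nvl72 = RackLayout()
print(nvl72.superchips, nvl72.cpus, nvl72.gpus, nvl72.gpus_per_tray, nvl72.tray_rack_units)
# 36 36 72 4 27
```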


Cartridge-style cabling, not optics

Intra-rack NVLink runs over copper cable cartridges: more than 5,000 individual cables, roughly 2 miles of copper per rack. Copper at this density is only practical because every link is short enough to stay inside one rack, and it avoids the power and cost of thousands of optical transceivers. That is a large part of why NVL72 is a rack-scale product rather than a row-scale one.

NVLink Switch System

One NVLink domain, 72 GPUs, 130 TB/s.

The scale-up fabric is the headline architectural change. It is what lets a single rack train and serve frontier models without the scale-out tax.


NVLink Switch System

Five generations in, NVLink is no longer a point-to-point GPU interconnect. Each Switch tray contains two NVLink Switch ASICs of 7.2 TB/s each, giving 14.4 TB/s of switching per tray. Across nine trays the rack provides about 130 TB/s of all-to-all GPU bandwidth (9 × 14.4 TB/s = 129.6 TB/s, which matches 72 GPUs × 1.8 TB/s), roughly an order of magnitude more per GPU than a comparable InfiniBand fabric with the same GPU count.
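A quick consistency check of the fabric figures in this section (1.8 TB/s per GPU, 7.2 TB/s per switch ASIC, about 130 TB/s per rack). The per-link rate of 100 GB/s, 18 links per GPU and 72 ports per ASIC are assumptions chosen because they reproduce those totals. The last function shows where "roughly an order of magnitude" comes from: per-direction NVLink bandwidth divided by a 400 or 800 Gb/s NIC line rate.

```python
# Consistency check of the NVLink Switch figures.
LINK_GBPS = 100      # GB/s per NVLink5 link, both directions combined (assumed)
LINKS_PER_GPU = 18   # 18 x 100 GB/s = 1.8 TB/s per GPU (assumed)
PORTS_PER_ASIC = 72  # 72 x 100 GB/s = 7.2 TB/s per switch ASIC (assumed)
ASICS_PER_TRAY = 2
SWITCH_TRAYS = 9
GPUS = 72

gpu_side_links = GPUS * LINKS_PER_GPU
switch_side_ports = SWITCH_TRAYS * ASICS_PER_TRAY * PORTS_PER_ASIC
# In a fully connected NVLink domain every GPU link must land on a switch port.
assert gpu_side_links == switch_side_ports == 1296

per_gpu_tbps = LINKS_PER_GPU * LINK_GBPS / 1000                                  # 1.8
per_tray_tbps = ASICS_PER_TRAY * PORTS_PER_ASIC * LINK_GBPS / 1000               # 14.4
rack_tbps = SWITCH_TRAYS * ASICS_PER_TRAY * PORTS_PER_ASIC * LINK_GBPS / 1000    # 129.6


def nvlink_to_nic_ratio(nic_gbit_per_s: float) -> float:
    """Per-GPU NVLink bandwidth per direction divided by a NIC line rate (also per direction)."""
    nvlink_gbyte_per_dir = LINKS_PER_GPU * LINK_GBPS / 2  # 900 GB/s each way
    nic_gbyte = nic_gbit_per_s / 8                         # bits -> bytes
    return nvlink_gbyte_per_dir / nic_gbyte


print(per_gpu_tbps, per_tray_tbps, rack_tbps)
print(nvlink_to_nic_ratio(400), nvlink_to_nic_ratio(800))  # 18.0 9.0
```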


Why it matters for training

Tensor and expert parallelism are bandwidth-bound at scale. Keeping a 72-GPU model-parallel group inside one NVLink domain keeps its collective traffic off the slower scale-out fabric. For trillion-parameter models this can be the difference between near-linear and clearly sub-linear scaling past 8k GPUs.
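A back-of-envelope way to see the bandwidth argument is the bandwidth term of an idealized ring all-reduce, in which each GPU moves 2(N-1)/N of the payload. The sketch compares a 72-GPU group on NVLink5 (900 GB/s per direction per GPU) with the same group over 400 Gb/s links (50 GB/s). The 1 GB payload is hypothetical, and real collective libraries add latency terms and may pick other algorithms on NVLink Switch systems, so read the output as a ratio rather than a prediction.

```python
def ring_allreduce_seconds(payload_bytes: float, n_gpus: int, bw_bytes_per_s: float) -> float:
    """Bandwidth term of a ring all-reduce: 2 * (N - 1) / N * S / B.

    B is per-GPU bandwidth in one direction. Latency, contention and
    compute/communication overlap are ignored.
    """
    if n_gpus < 2:
        return 0.0
    return 2 * (n_gpus - 1) / n_gpus * payload_bytes / bw_bytes_per_s


PAYLOAD = 1e9        # 1 GB of activations or gradients (hypothetical)
N = 72
NVLINK5 = 900e9      # bytes/s per direction per GPU
IB_400G = 400e9 / 8  # 400 Gb/s line rate -> 50 GB/s

t_nvlink = ring_allreduce_seconds(PAYLOAD, N, NVLINK5)
t_ib = ring_allreduce_seconds(PAYLOAD, N, IB_400G)
print(f"NVLink5: {t_nvlink * 1e3:.2f} ms, 400G: {t_ib * 1e3:.2f} ms, ratio {t_ib / t_nvlink:.1f}x")
```

Because both cases share the same 2(N-1)/N factor, the ratio reduces to the bandwidth ratio, 18x here; the absolute times shrink or grow linearly with the payload.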


Why it matters for inference

Disaggregated inference (prefill on one GPU set, decode on another) needs fast KV-cache movement between phases. NVL72 lets a single rack host the entire serving pipeline of a frontier model with KV transfers staying on NVLink rather than crossing an Ethernet or InfiniBand boundary.
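How much KV cache moves between prefill and decode depends on the model. The sketch uses the usual per-token size for multi-head or grouped-query attention (2 × layers × KV heads × head dimension × bytes per element) with a hypothetical configuration (80 layers, 8 KV heads, head dimension 128, FP8 cache, 32k-token prompt). It then compares idealized transfer time over NVLink5 with a 400 Gb/s link.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int, head_dim: int, bytes_per_elem: int) -> int:
    """KV cache size for MHA/GQA attention; the factor 2 covers one K and one V tensor per layer."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens


def transfer_ms(nbytes: float, bw_bytes_per_s: float) -> float:
    """Idealized transfer time at a given bandwidth, in milliseconds."""
    return nbytes / bw_bytes_per_s * 1e3


# Hypothetical model and request; not tied to any specific published model.
kv = kv_cache_bytes(tokens=32_768, layers=80, kv_heads=8, head_dim=128, bytes_per_elem=1)
print(f"KV cache: {kv / 2**30:.1f} GiB")  # 5.0 GiB
print(f"NVLink5 (900 GB/s): {transfer_ms(kv, 900e9):.1f} ms")
print(f"400 Gb/s (50 GB/s): {transfer_ms(kv, 50e9):.1f} ms")
```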

Liquid cooling

120 kW per rack is not an air-cooled problem.

The cooling design is not optional polish — it is the gating constraint on whether a site can host Blackwell at all.


Direct-to-chip liquid is mandatory

A populated NVL72 rack draws ~120 kW, which air alone cannot remove. Every GPU, CPU, NVLink Switch chip and voltage regulator carries a cold plate. Manifolds at the back of the rack feed parallel loops to each tray, and warm-water return temperatures of 40–45°C make heat reuse credible for the first time at this scale.
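Sizing the liquid loop starts from the heat balance Q = ṁ · c_p · ΔT. This sketch converts a rack's heat load into the coolant flow it needs. It assumes plain water (c_p ≈ 4186 J/kg·K, 1 kg per litre) and treats the full ~120 kW as going to liquid. Real secondary loops often use glycol mixes with lower heat capacity, and the supply/return difference ΔT is a site design choice, not a figure from this guide.

```python
WATER_CP = 4186.0     # J/(kg*K), plain water (assumption)
WATER_KG_PER_L = 1.0  # rounded density (assumption)


def coolant_flow_lpm(heat_kw: float, delta_t_k: float,
                     cp: float = WATER_CP, kg_per_l: float = WATER_KG_PER_L) -> float:
    """Coolant flow in litres per minute to remove heat_kw with a delta_t_k temperature rise.

    From Q = m_dot * cp * dT  =>  m_dot = Q / (cp * dT).
    """
    mass_flow_kg_s = heat_kw * 1000 / (cp * delta_t_k)
    return mass_flow_kg_s / kg_per_l * 60


for dt in (5, 10, 15):
    print(f"120 kW at dT={dt} K: {coolant_flow_lpm(120, dt):.0f} L/min")
```

Halving ΔT doubles the required flow, which is why the return temperature feeds directly into pump and pipe sizing.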


Facility implications

Each rack needs a secondary fluid loop sized for 120 kW plus headroom, a CDU upstream (typically one MW-class CDU per 6–10 racks), and floor loading that accepts ~1.4 tonnes per rack. Most enterprise halls cannot accept NVL72 without retrofit; new builds spec 130–200 kW per rack as the baseline.
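The "one MW-class CDU per 6–10 racks" rule of thumb follows from dividing CDU capacity by provisioned rack load. The sketch makes that explicit. The 1,000 kW CDU rating, 20% per-rack headroom, N+1 redundancy and 32-rack hall are hypothetical planning inputs, not vendor figures; the floor-load line reuses the ~1.4 tonnes per rack quoted above.

```python
import math


def racks_per_cdu(cdu_kw: float, rack_kw: float, headroom: float = 0.2) -> int:
    """Racks one CDU can serve when each rack is provisioned at rack_kw * (1 + headroom)."""
    return math.floor(cdu_kw / (rack_kw * (1 + headroom)))


def cdus_needed(n_racks: int, cdu_kw: float, rack_kw: float,
                headroom: float = 0.2, redundant: int = 1) -> int:
    """CDUs for a hall, plus `redundant` spare units (N+1 by default)."""
    per_cdu = racks_per_cdu(cdu_kw, rack_kw, headroom)
    if per_cdu < 1:
        raise ValueError("a single rack exceeds one CDU's capacity")
    return math.ceil(n_racks / per_cdu) + redundant


print(racks_per_cdu(1000, 120))              # 6 racks with 20% headroom
print(racks_per_cdu(1000, 120, headroom=0))  # 8 racks with no headroom
print(cdus_needed(32, 1000, 120))            # 7 CDUs for 32 racks, N+1
print(f"floor load: {32 * 1.4:.1f} t")       # 44.8 t for 32 racks
```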


What can still go wrong

Quick-disconnect leaks, biological growth in warm water loops, and CDU controls integration are the recurring failure modes. Commissioning the fluid loop and proving leak-free operation under load is now a six-to-eight-week activity that belongs on the critical path, not at the end.

H100 → Blackwell

What changes when you move from HGX H100 to GB200 NVL72.

Side-by-side at the levels that drive facility, fabric and software decisions.

| Dimension | HGX H100 | GB200 NVL72 |
|---|---|---|
| GPUs per NVLink domain | 8 (HGX H100 board) | 72 (NVL72 rack) |
| Scale-up interconnect | NVLink4, 900 GB/s per GPU | NVLink5, 1.8 TB/s per GPU |
| HBM per NVLink domain | 640 GB HBM3 | 13.5 TB HBM3e |
| Per-GPU FP8 (dense) | ~2 PFLOPS | ~5 PFLOPS |
| Per-rack power | 30–40 kW typical | ~120 kW |
| Cooling | Air or hybrid liquid | Direct-to-chip liquid only |
| Scale-out NIC | ConnectX-7, 400 Gb/s | ConnectX-7, 400 Gb/s per GPU (ConnectX-8 at 800 Gb/s in newer configurations); BlueField-3 DPUs for north-south traffic |
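The table's ratios are easier to compare per GPU and per NVLink domain. This sketch encodes only the table's own figures (the FP8 values are the approximate dense numbers above) and derives domain-level FP8 throughput and HBM per GPU. The dictionary keys are illustrative.

```python
# Figures taken from the comparison table; NVLink bandwidth is per GPU.
H100_HGX = {"gpus_per_domain": 8, "nvlink_gb_s": 900, "hbm_gb": 640, "fp8_pflops": 2.0}
GB200_NVL72 = {"gpus_per_domain": 72, "nvlink_gb_s": 1800, "hbm_gb": 13_500, "fp8_pflops": 5.0}


def domain_fp8_pflops(cfg: dict) -> float:
    """Dense FP8 throughput of one NVLink domain."""
    return cfg["gpus_per_domain"] * cfg["fp8_pflops"]


def hbm_per_gpu_gb(cfg: dict) -> float:
    """Average HBM per GPU in the domain."""
    return cfg["hbm_gb"] / cfg["gpus_per_domain"]


for name, cfg in (("HGX H100", H100_HGX), ("GB200 NVL72", GB200_NVL72)):
    print(f"{name}: {domain_fp8_pflops(cfg):.0f} PFLOPS FP8 per domain, "
          f"{hbm_per_gpu_gb(cfg):.1f} GB HBM per GPU")

# GB200 NVL72 value divided by HGX H100 value, row by row.
ratios = {key: GB200_NVL72[key] / H100_HGX[key] for key in H100_HGX}
print(ratios)
```

The domain-level FP8 gap (360 vs 16 PFLOPS, 22.5x) is much larger than the per-GPU gap (2.5x) because the NVLink domain itself grows ninefold.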
FAQ

Questions we hear most about GB200.

What is GB200 NVL72?

A 72-GPU, 36-CPU rack-scale system from NVIDIA built on the Blackwell architecture. All 72 GPUs are connected through the NVLink Switch System into a single NVLink domain, presenting ~13.5 TB of unified HBM3e to software. It is sold as a complete rack with integrated power, NVLink switching and liquid cooling.

Planning a Blackwell-class deployment?

setloop.io helps operators and end-users design facility, power and fabric for GB200 NVL72 and successor systems — and evaluate vendor proposals before commitment.