NVIDIA GB200 NVL72: Blackwell at rack scale.
A technical guide for datacentre operators and infrastructure architects evaluating the GB200 NVL72 — the rack-level system, the NVLink Switch fabric, the liquid-cooling reality, and what changes from H100.
What is actually inside an NVL72.
The unit of design is no longer the server. It is the rack — and every mechanical, electrical and network decision flows from that.
72 GPUs as one NVLink domain
An NVL72 rack pairs 36 Grace CPUs with 72 Blackwell GPUs across 18 compute trays. All 72 GPUs share a single NVLink5 fabric, with 1.8 TB/s of bidirectional bandwidth per GPU, connected all-to-all through nine NVLink Switch trays. From a software perspective the rack behaves like one very large accelerator with 13.5 TB of HBM3e addressable over NVLink.
Compute and switch trays
Each 1U compute tray holds two GB200 superchips: two Grace CPUs and four Blackwell GPUs per tray, connected on-board via NVLink-C2C at 900 GB/s. Nine 1U NVLink Switch trays sit in the middle of the rack and provide the all-to-all GPU fabric. The remaining U-space is power shelves, NIC trays and cable management.
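The tray arithmetic above can be checked with a short sketch. The class and field names are illustrative and not part of any NVIDIA tooling; the split of one Grace CPU and two Blackwell GPUs per superchip is inferred from the per-tray counts in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RackLayout:
    """Illustrative model of an NVL72 rack, using the counts quoted in the text."""
    compute_trays: int = 18
    superchips_per_tray: int = 2
    cpus_per_superchip: int = 1   # inferred: two Grace CPUs per tray / two superchips
    gpus_per_superchip: int = 2   # inferred: four Blackwell GPUs per tray / two superchips
    switch_trays: int = 9

    @property
    def cpus(self) -> int:
        return self.compute_trays * self.superchips_per_tray * self.cpus_per_superchip

    @property
    def gpus(self) -> int:
        return self.compute_trays * self.superchips_per_tray * self.gpus_per_superchip

    @property
    def tray_rack_units(self) -> int:
        # Compute and switch trays are each 1U; power shelves and cabling are extra.
        return self.compute_trays + self.switch_trays


layout = RackLayout()
print(f"CPUs: {layout.cpus}, GPUs: {layout.gpus}, tray U-space: {layout.tray_rack_units}U")
```

Changing `compute_trays` or `gpus_per_superchip` shows how quickly the GPU count per NVLink domain moves with the tray design.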
Cartridge-style cabling, not optics
Intra-rack NVLink runs over copper cable cartridges — over 5,000 individual cables, ~2 miles of copper per rack. Copper at this density is only possible because everything is in one rack; that is the whole reason NVL72 is a rack-scale product rather than a row-scale one.
One NVLink domain, 72 GPUs, 130 TB/s.
The scale-up fabric is the headline architectural change. It is what lets a single rack train and serve frontier models without the scale-out tax.
NVLink Switch System
Five generations in, NVLink is no longer a point-to-point GPU interconnect. Each Switch tray contains two NVLink5 ASICs, each delivering 7.2 TB/s of switching (14.4 TB/s per tray). Across nine trays the rack provides roughly 130 TB/s of all-to-all GPU bandwidth, which matches 72 GPUs at 1.8 TB/s each. Per GPU, that is roughly an order of magnitude more than the 400 to 800 Gb/s a GPU gets from an InfiniBand port.
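The 130 TB/s headline can be cross-checked from both sides of the fabric. This is a minimal sketch that reads the 7.2 TB/s figure as per ASIC, which is an assumption, though it is the reading under which the GPU-side and switch-side totals agree. The constant names are illustrative.

```python
import math

GPUS = 72
NVLINK5_TBPS_PER_GPU = 1.8   # from the text
SWITCH_TRAYS = 9             # from the text
ASICS_PER_TRAY = 2           # from the text
ASIC_TBPS = 7.2              # assumption: the quoted figure is per ASIC, not per tray

# Bandwidth the GPUs can inject into the fabric.
gpu_side_tbps = GPUS * NVLINK5_TBPS_PER_GPU

# Bandwidth the switch ASICs can carry.
switch_side_tbps = SWITCH_TRAYS * ASICS_PER_TRAY * ASIC_TBPS

print(f"GPU side:    {gpu_side_tbps:.1f} TB/s")
print(f"Switch side: {switch_side_tbps:.1f} TB/s")
print("Balanced fabric:", math.isclose(gpu_side_tbps, switch_side_tbps))
```

If the 7.2 TB/s figure were per tray instead, the switch side would come to about half of the GPU side, which would not support the all-to-all claim.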
Why it matters for training
Tensor and expert parallelism are bandwidth-bound at scale. Keeping a model-parallel group of up to 72 GPUs inside one NVLink domain removes the need to spill its collective traffic onto the slower scale-out fabric. For trillion-parameter models this can be the difference between near-linear and clearly sub-linear scaling once a job grows past roughly 8,000 GPUs.
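A rough way to see the gap is a bandwidth-only estimate of one all-reduce on each fabric. The sketch below uses the textbook ring all-reduce cost as a simple model; it is not a claim about what the collective library actually runs on NVL72, and it ignores latency and congestion. The per-direction bandwidths and the message size are assumptions chosen for illustration.

```python
def ring_allreduce_seconds(message_bytes: float, n_gpus: int, bw_bytes_per_s: float) -> float:
    """Bandwidth-only ring all-reduce estimate.

    Each GPU sends (and receives) 2 * (n - 1) / n of the message over its link.
    """
    if n_gpus < 2:
        return 0.0
    return 2 * (n_gpus - 1) / n_gpus * message_bytes / bw_bytes_per_s


# Assumption: 1.8 TB/s bidirectional NVLink5 is 900 GB/s in each direction.
NVLINK_BYTES_PER_S = 900e9
# Assumption: one 400 Gb/s scale-out port per GPU, i.e. 50 GB/s in each direction.
SCALE_OUT_BYTES_PER_S = 50e9
# Hypothetical collective size: 1 GB of gradients or activations.
MESSAGE_BYTES = 1e9
GROUP_SIZE = 72

t_nvlink = ring_allreduce_seconds(MESSAGE_BYTES, GROUP_SIZE, NVLINK_BYTES_PER_S)
t_scale_out = ring_allreduce_seconds(MESSAGE_BYTES, GROUP_SIZE, SCALE_OUT_BYTES_PER_S)
slowdown = t_scale_out / t_nvlink

print(f"NVLink:    {t_nvlink * 1e3:.2f} ms")
print(f"Scale-out: {t_scale_out * 1e3:.2f} ms")
print(f"Slowdown:  {slowdown:.0f}x")
```

Under these assumptions the ratio is simply the bandwidth ratio, so the same collective costs 18 times more on the scale-out fabric. With an 800 Gb/s port the gap halves but does not close.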
Why it matters for inference
Disaggregated inference (prefill on one GPU set, decode on another) needs fast KV-cache movement between phases. NVL72 lets a single rack host the entire serving pipeline of a frontier model with KV transfers staying on NVLink rather than crossing an Ethernet or InfiniBand boundary.
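To put numbers on the KV-cache movement, the sketch below sizes the cache for one long request and estimates the transfer time on each fabric. The model shape is hypothetical (it does not describe any specific model), and the link bandwidths are assumptions: 900 GB/s per direction for NVLink5 and 50 GB/s for a 400 Gb/s network port.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   tokens: int, bytes_per_elem: int) -> int:
    """Size of the key and value tensors for one sequence across all layers."""
    # The factor of 2 covers keys plus values.
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem


# Hypothetical model shape, for illustration only.
LAYERS = 80
KV_HEADS = 8          # grouped-query attention
HEAD_DIM = 128
BYTES_PER_ELEM = 2    # FP16/BF16 cache
CONTEXT_TOKENS = 32_768

NVLINK_BYTES_PER_S = 900e9   # assumption: one direction of a 1.8 TB/s link
NETWORK_BYTES_PER_S = 50e9   # assumption: one 400 Gb/s port

size = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT_TOKENS, BYTES_PER_ELEM)
t_nvlink_ms = size / NVLINK_BYTES_PER_S * 1e3
t_network_ms = size / NETWORK_BYTES_PER_S * 1e3

print(f"KV cache: {size / 2**30:.1f} GiB")
print(f"NVLink transfer:  {t_nvlink_ms:.1f} ms")
print(f"Network transfer: {t_network_ms:.1f} ms")
```

The estimate is bandwidth-only. Real transfers add latency and protocol overhead, so treat the results as lower bounds on the handoff time between prefill and decode.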
120 kW per rack is not an air-cooled problem.
The cooling design is not optional polish — it is the gating constraint on whether a site can host Blackwell at all.
Direct-to-chip liquid is mandatory
A populated NVL72 rack draws ~120 kW. Air cooling is not an option — every GPU, CPU, NVSwitch and voltage regulator carries a cold plate. Manifolds at the back of the rack feed parallel loops to each tray; warm water return temperatures of 40–45°C make heat reuse credible for the first time at this scale.
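The coolant flow a rack needs follows from the heat balance Q = m * cp * dT. The sketch below is a first-order estimate that assumes plain water, all 120 kW rejected to the liquid loop, and a 10 K supply-to-return temperature rise. None of these values come from a vendor specification, and glycol mixtures have a lower specific heat, which raises the required flow.

```python
def coolant_mass_flow_kg_s(heat_w: float, cp_j_per_kg_k: float, delta_t_k: float) -> float:
    """Mass flow needed to carry heat_w at a given temperature rise (Q = m * cp * dT)."""
    return heat_w / (cp_j_per_kg_k * delta_t_k)


RACK_HEAT_W = 120_000        # ~120 kW per rack, from the text
WATER_CP = 4186.0            # J/(kg*K), approximate specific heat of water
DELTA_T_K = 10.0             # assumption: supply-to-return temperature rise
WATER_DENSITY_KG_PER_L = 1.0 # approximation; warm water is slightly less dense

mass_flow = coolant_mass_flow_kg_s(RACK_HEAT_W, WATER_CP, DELTA_T_K)
litres_per_min = mass_flow / WATER_DENSITY_KG_PER_L * 60

print(f"Mass flow:   {mass_flow:.2f} kg/s")
print(f"Volume flow: {litres_per_min:.0f} L/min per rack")
```

Halving the allowed temperature rise doubles the flow, which is why the loop delta-T is one of the first numbers to agree with the facility team.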
Facility implications
Each rack needs a secondary fluid loop sized for 120 kW plus headroom, a CDU upstream (typically one MW-class CDU per 6–10 racks), and floor loading that accepts ~1.4 tonnes per rack. Most enterprise halls cannot accept NVL72 without retrofit; new builds spec 130–200 kW per rack as the baseline.
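The ratio of 6 to 10 racks per CDU can be reproduced with a simple capacity calculation. This is a sizing sketch under stated assumptions: the 20% headroom and the example CDU capacities of 1,000 kW and 1,500 kW are illustrative and should be replaced with vendor and site figures.

```python
import math


def racks_per_cdu(cdu_capacity_kw: float, rack_kw: float = 120.0,
                  headroom: float = 0.20) -> int:
    """Whole racks one CDU can serve when each rack is sized with headroom."""
    design_kw_per_rack = rack_kw * (1 + headroom)
    return math.floor(cdu_capacity_kw / design_kw_per_rack)


def cdus_needed(racks: int, cdu_capacity_kw: float, redundant_units: int = 1) -> int:
    """CDU count for a hall, with extra units added for redundancy (an N+1 style assumption)."""
    per_cdu = racks_per_cdu(cdu_capacity_kw)
    if per_cdu < 1:
        raise ValueError("CDU capacity is below the design load of a single rack")
    return math.ceil(racks / per_cdu) + redundant_units


for capacity_kw in (1000, 1500):
    print(f"{capacity_kw} kW CDU serves {racks_per_cdu(capacity_kw)} racks")

print("CDUs for a 32-rack hall at 1000 kW each:", cdus_needed(32, 1000))
```

With these inputs a 1,000 kW unit serves 6 racks and a 1,500 kW unit serves 10, which brackets the range quoted above.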
What can still go wrong
Quick-disconnect leaks, biological growth in warm water loops, and CDU controls integration are the recurring failure modes. Commissioning the fluid loop and proving leak-free operation under load is now a six-to-eight-week activity that belongs on the critical path, not at the end.
What changes when you move from HGX H100 to GB200 NVL72.
Side-by-side at the levels that drive facility, fabric and software decisions.
| Dimension | HGX H100 | GB200 NVL72 |
|---|---|---|
| GPUs per NVLink domain | 8 (one HGX H100 node) | 72 (one NVL72 rack) |
| NVLink generation, bandwidth per GPU | NVLink4, 900 GB/s | NVLink5, 1.8 TB/s |
| HBM per NVLink domain | 640 GB HBM3 | 13.5 TB HBM3e |
| Per-GPU FP8 dense | ~2 PFLOPS | ~5 PFLOPS |
| Per-rack power | 30–40 kW typical | ~120 kW |
| Cooling | Air or hybrid liquid | Direct-to-chip liquid only |
| Scale-out networking | ConnectX-7, 400 Gb/s | ConnectX-8 SuperNIC, up to 800 Gb/s; BlueField-3 DPU, up to 400 Gb/s |
Planning a Blackwell-class deployment?
setloop.io helps operators and end-users design facility, power and fabric for GB200 NVL72 and successor systems — and evaluate vendor proposals before commitment.