NVIDIA DGX SuperPOD: Next Generation Scalable Infrastructure for AI Leadership Reference Architecture Featuring NVIDIA DGX GB200

Release: latest

Company: NVIDIA Corporation

Publication Date: June 16, 2025

1. Abstract

The NVIDIA DGX SuperPOD architecture is engineered to power next-generation AI factories, offering unparalleled performance, scalability, and innovation. It serves as a physical twin to NVIDIA's internal research and development systems, ensuring that infrastructure software, applications, and support are rigorously tested and validated on this architecture. This Reference Architecture (RA) is based on DGX GB200 systems, featuring Grace CPUs and Blackwell GPUs. The architecture is built around Scalable Units (SUs), with each SU containing 8 DGX GB200 systems, enabling flexible deployment sizes. The RA provides detailed specifications for SU design, InfiniBand and NVLink network topologies, Ethernet fabric configurations, storage system requirements, and rack layouts.

Key NVIDIA technologies powering DGX SuperPOD include:

NVIDIA DGX GB200 systems: High-performance computational building blocks for AI and HPC.
NVIDIA NDR (400 Gbps) InfiniBand: High-performance, low-latency, and scalable network interconnect.
NVIDIA Spectrum-4 (800 Gbps) Ethernet: High-performance, low-latency, and scalable Ethernet connectivity for storage.
NVIDIA fifth-generation NVLink® technology: A high-speed interconnect for CPU and GPU processors, delivering unprecedented performance for demanding communication patterns.
NVIDIA Mission Control: A unified operations and orchestration software stack for managing AI factories.

The NVIDIA DGX SuperPOD provides the world's most efficient AI infrastructure, featuring 288 Grace CPUs, 576 Blackwell GPUs, 240TB Fast Memory, delivering 11.5 ExaFLOPS FP4 performance, 30X inference acceleration, 4X training speed, and 25X energy savings.

2. Key Components of the DGX SuperPOD

The DGX SuperPOD architecture is designed to meet the high demands of AI factories for the era of reasoning. AI factories require specialized components like high-performance GPUs and CPUs, advanced networking, and cooling systems to support intensive computational needs. These factories excel at AI reasoning, enabling faster, more accurate decision-making across industries. NVIDIA's end-to-end accelerated computing platform optimizes for energy efficiency while accelerating AI inference performance, helping enterprises deploy secure, future-ready AI infrastructure.

DGX SuperPOD integrates key NVIDIA components and partner-certified storage solutions. By leveraging Scalable Units (SUs), DGX SuperPOD reduces AI factory deployment times from months to weeks, accelerating time-to-solution and time-to-market for next-generation models and applications.

DGX SuperPOD is deployed on-premises, meaning the customer owns and manages the hardware within their data center or a co-located facility. The customer is responsible for their cluster infrastructure and service provision.

The key components of DGX GB200 SuperPOD are:

NVIDIA DGX GB200 NVL72 rack system
NVIDIA InfiniBand
Mission Control software platform

2.1. NVIDIA DGX GB200 Rack System

The NVIDIA DGX GB200 system is an AI powerhouse enabling enterprises to expand business innovation and optimization. Each DGX GB200 delivers breakthrough AI performance in a rack-scale, 72 GPU configuration. The NVIDIA Blackwell GPU architecture provides the latest technologies, reducing computational effort from months to days and hours for large AI/ML workloads. The DGX GB200 rack system employs a sophisticated hybrid cooling solution, with liquid cooling for power-intensive components like GPUs and CPUs, and air cooling for others, to manage substantial heat generation and optimize datacenter space.

2.1.1. DGX GB200 Compute Tray

The compute nodes for the DGX GB200 rack system utilize a 72x1 NVLink topology, featuring 72 GPUs in a single NVLink domain. Each DGX GB200 rack system contains 18 compute trays. Each compute tray houses two GB200 Superchips, with each Superchip comprising two B200 GPUs and one Grace CPU. A coherent chip-to-chip interconnect, NVLink-C2C, links the two Superchips, allowing them to function as a single logical unit for a single OS instance.

Each compute tray integrates four ConnectX-7 (CX-7) NICs for InfiniBand NDR (400Gbps) connectivity across racks and two BlueField-3 (BF3) NICs for 2x200Gbps connectivity for In-band Management and Storage networks. All network ports are front-facing for cold aisle access.

Compute trays provide local storage with 4x 3.84 TB E1.S NVMe drives in RAID0 and a 1.92TB M.2 NVMe for the OS image.

The block diagram of the GB200 compute tray shows two GB200 Superchips, each combining two NVIDIA B200 Tensor Core GPUs and one NVIDIA Grace CPU, connected via a 900GB/s ultra-low-power NVLink-C2C interconnect.

2.1.2. NVLink Switch Tray

Each DGX GB200 NVL72 system rack includes nine NVLink switch trays. These trays provide full-mesh connectivity between all 72 GPUs within the same DGX GB200 rack via blind-mate connectors. Each switch tray features two COMe RJ45 ports for NVOS and one BMC module for out-of-band management with one RJ45 port.

2.1.3. DGX Power Shelves

The power shelf for DGX GB200 SuperPOD is equipped with six 5.5 kW PSUs in N+1 redundancy, delivering up to 33 kW of power. A single DGX GB200 NVL72 rack system has eight power shelves. RJ45 ports at the rear facilitate power braking and current sharing, with daisy-chaining capabilities. The front of the power shelf includes a BMC port.

2.2. NVIDIA InfiniBand Technology

InfiniBand is a high-performance, low-latency, RDMA-capable networking technology with a 20-year track record in demanding compute environments. The latest generation, NDR, offers a peak speed of 400 Gbps per direction with extremely low port-to-port latency and backward compatibility. Key features include adaptive routing (AR), collective communication with SHARP™, dynamic network healing with SHIELD™, and support for various network topologies like fat-tree, Dragonfly, and multi-dimensional Torus for building large network fabrics.

2.3. NVIDIA Mission Control

NVIDIA Mission Control is the software used to manage all DGX GB200 SuperPOD deployments, representing best practices for building high-performance AI factories. It is a sophisticated full-stack software solution that optimizes developer workload performance and resiliency, ensures continuous uptime with automated failure handling, and provides unified cluster-scale telemetry and manageability. Key features include full-stack resiliency, predictive maintenance, unified error reporting, data center optimizations, cluster health checks, and automated node management.

Mission Control incorporates the technology NVIDIA uses to manage thousands of systems, providing an immediate path to a TOP500 supercomputer for organizations requiring top-tier performance.

DGX SuperPOD is deployed on-premises, with the customer owning and managing the hardware, including responsibility for cluster infrastructure and building management system integration.

2.4. Components

Hardware Components:

Component	Technology	Description
8x Compute Racks	NVIDIA DGX GB200 NVL72	The world's premier rack-scale, purpose-built AI systems featuring NVIDIA Grace-Blackwell GB200 modules, GB200 Superchips, NVL72 scale-out GPU interconnect, and integrated NVLink switch trays.
NVLink Fabric	NVIDIA NVLink 5	NVLink Switches support fast, direct memory access between GPUs on the same compute rack.
Compute Fabric	NVIDIA Quantum QM9700 InfiniBand switches	Rail-optimized, non-blocking, full fat-tree network with eight NDR400 connections per system for cross-rack GPU communications.
Storage and Inband Management Fabric	NVIDIA Spectrum 4 SN5600 Ethernet switches	Fabric optimized to match peak performance of the configured storage array, built with 64-port 800 Gbps Ethernet switches providing high port density and performance.
InfiniBand management	NVIDIA Unified Fabric Manager Appliance, Enterprise Edition	NVIDIA UFM combines enhanced, real-time network telemetry with AI-powered cyber intelligence and analytics to manage scale-out InfiniBand data centers.
NVLink Management	NVIDIA Network Manager eXperience (NMX)	NVIDIA NMX manages and operates NVLink switches, providing real-time network telemetry for all NVLink infrastructure.
Out-of-band (OOB) management network	NVIDIA SN2201 switch	48x 1 Gbps Ethernet switch leveraging copper ports to minimize complexity.

Software Components:

Component	Description
Mission Control Software	NVIDIA Mission Control software provides a full-stack data center solution for NVIDIA DGX SuperPOD deployments. It integrates management and operational capabilities into a unified platform for simplified, large-scale control. It leverages NVIDIA Run:ai for seamless workload orchestration.
NVIDIA AI Enterprise	An end-to-end, cloud-native software platform that accelerates data science pipelines and streamlines the development and deployment of production-grade LLMs and other generative AI applications.
Magnum IO	Enables increased performance for AI and HPC.
NVIDIA NGC	The NGC catalog provides a collection of GPU-optimized containers for AI and HPC.
Slurm	A workload manager used to manage complex workloads in multinode, batch-style compute environments.

2.5. Design Requirements

DGX SuperPOD is designed to minimize system bottlenecks, ensuring optimal performance and application scalability. The architecture is modular, based on Scalable Units (SUs) of 8 DGX GB200 systems (NVL72 rack systems). A single SU is fully tested, and larger deployments can be built based on customer requirements. The rack-level integrated design facilitates rapid installation and deployment of liquid-cooled, high-density compute racks. Storage partner equipment is certified for DGX SuperPOD environments. Full system support, including compute, storage, network, and software, is provided by NVIDIA Enterprise Support (NVEX).

2.5.1. System Design

DGX SuperPOD is optimized for multi-node AI and HPC applications. Key design principles include:

Modular architecture based on Scalable Units (SUs) of 8 DGX GB200 systems (NVL72 rack systems).
Scalability to two SUs for fully tested configurations, with larger deployments possible based on customer needs.
Rack-level integrated design for rapid deployment of liquid-cooled, high-density compute racks.
Certified storage partner equipment for DGX SuperPOD environments.
Comprehensive system support from NVIDIA Enterprise Support (NVEX).

2.5.2. Compute Fabric

Rail-optimized to the top layer of the fabric.
Balanced, full-fat tree topology.
Designed with state-of-the-art, high-performance, low-latency network switches, supporting future networking hardware.
Managed NDR switches for enhanced fabric management.
Designed to support the latest SHARPv3 features.

2.5.3. Storage Fabric (High Speed Storage)

The storage fabric provides high bandwidth to shared storage with the following characteristics:

Independent of the compute fabric to maximize performance for both storage and applications.
Provides single-node line-rate of 2x 200 Gbps to each DGX GB200 compute tray.
Storage access is provided over RDMA over Converged Ethernet (RoCE) for maximum performance and minimal CPU overhead.
User-accessible management nodes provide access to shared storage.

2.5.4. In-Band Management Network

Ethernet-based fabric used for node provisioning, data movement, internet access, and other user-accessible services.
In-band management network connections for compute and management nodes operate at 200 Gbps and are bonded for resiliency.

2.5.5. Out-of-Band Management Network

The OOB management network connects all baseboard management controllers (BMC), BlueField BMCs, NVSwitch Management Interfaces (COMe), and other devices requiring physical isolation from system users.

2.5.6. Storage Requirements

DGX SuperPOD requires a high-performance, balanced storage system. It is designed to utilize two separate storage systems: High-Performance Storage (HPS) and User Storage, optimized for throughput, parallel I/O, IOPS, and metadata workloads.

2.5.6.1 High-Performance Storage

HPS is provided via RDMA over Converged Ethernet v2 (RoCEv2) connected storage from a DGX SuperPOD certified storage partner. It is engineered and tested with the following attributes:

High-performance, resilient, POSIX-style file system optimized for multi-threaded read/write operations across multiple nodes.
Certified for Grace-based systems.
Native RoCE support.
Local system RAM for transparent data caching.
Leverages local flash devices for transparent read/write caching.

The specific storage fabric topology, capacity, and components are determined by the DGX SuperPOD certified storage partner.

2.5.6.2 User Storage

User Storage exposes an NFS share on the in-band management fabric. It is typically used for home directories, administrative scratch space, shared storage for High Availability configurations (e.g., Mission Control), log file collection, and system configuration files.

User Storage requirements include:

High metadata performance, IOPS, and enterprise features like log collection and data capacity.
Communication over Ethernet using NFS.
100GbE DR1 Connectivity.

User storage is often satisfied by existing NFS servers, with a new export made accessible to the DGX SuperPOD's in-band management network.

3. DGX SuperPOD Architecture

The DGX SuperPOD architecture integrates DGX systems, InfiniBand and Ethernet networking, management nodes, and storage. A single SU (Scalable Unit) requires a Thermal Design Power (TDP) of 1.2 Megawatts (MW). Data centers should meet or exceed Uptime Institute Tier 3 standards, TIA-942-B Rated 3, or EN50600 Availability Class 3, ensuring concurrent maintainability and no single point of failure.

This reference architecture focuses on the design of a single SU (8 DGX GB200 rack systems). DGX SuperPOD can scale to configurations exceeding 128 racks with 9216 GPUs. Contact your NVIDIA representative for information on DGX SuperPOD solutions of four SUs or more.

4. Network Fabrics

Building systems by SU provides efficient designs. If different node counts are required, the fabric should be designed to support the full SU, including leaf switches and cables, leaving unused portions for future expansion. This ensures optimal traffic routing and consistent performance.

DGX SuperPOD configurations utilize five network fabrics:

NVLink 5
Compute Fabric
Storage Fabric
In-Band Management Network
Out-of-Band Management Network

These segments are carried by four physical fabrics:

Multi-node NVLink Fabric (MN-NVL)
Compute InfiniBand Fabric
Storage and In-band Ethernet Fabric
Out-of-Band network

4.1. Multi-node NVLink Fabric (NVL5)

Each DGX GB200 rack is built with 18 compute trays and 9 NVLink switch trays. Each NVLink switch tray contains 2 NVLink switch chips, providing full-mesh connectivity between all 72 GPUs within the same DGX GB200 rack. Each B200 GPU features 18 NVL5 links, with one dedicated link to each of the 18 switch chips, delivering 1.8 TB/s of low-latency bandwidth.

The compute fabric ports on the DGX B200 CPU tray use a two-port transceiver for access to all eight GPUs. Each pair of in-band management and storage ports provides parallel pathways into the system for increased performance. The OOB port is used for BMC access.

4.2. Compute Fabric

The compute fabric layout for a full GB200 SuperPOD Scalable Unit (8 DGX GB200 systems) ensures traffic per rail is one hop away from other compute trays in the same SU. Traffic between different compute racks or rails traverses the spine layer. For designs larger than one SU, a spine-leaf-group (SLG) based scalable design supports up to 16 SUs. Each SU contains 4 SLGs, with 8 leaf switches (one per compute rack) and 6 spine switches per SLG, forming a non-blocking fat-tree topology.

Table 4.1: Larger SuperPOD component counts

# GPUs	# SUs	# Core Group	Switch per Core Group	Core Switch	IB Leaf Switch		IB Spine Switch
					Per SU	Total	Per SU	Total
1152	2	6	3	18	32	64	24	48
2304	4	6	6	36	32	128	24	96
4608	8	6	12	72	32	256	24	192
9216	16	6	24	144	32	512	24	384

4.3. Storage and In-band Ethernet Fabric

DGX GB200 introduces a new generation of Ethernet-based fabric for storage and in-band networking, enhancing cost-efficiency while maintaining high performance. This fabric uses SN5600 and SN2201 switches. Each SU features two SN5600 switches as the aggregation layer, aggregating leaf switches connected to DGX nodes, storage, and out-of-band connections. DGX compute trays connect to SN5600 leaf switches at 4x 200 GbE via their BlueField 3 DPUs. Additional SN5600 switches serve as ingestion points for Storage Appliances and control plane nodes, while SN2201 switches connect legacy devices.

For scale-out designs up to 16 SUs, a third layer of switches, known as super spines, is added. The super spine is designed with 2 groups, with each spine expected to have 28x 800 GbE uplinks to maintain non-blocking characteristics for disaggregated storage.

Table 4.2: Spine and Super Spine Switch Requirements for Scale Out

	# GPUs	# SUs	# Super Spine Groups		# Super Spine Switches	Spine Switches		IB Spine Switch
			Per Group	Total		Total	Per SU	Total
2304	4	2	2	4	8	64	24	48
4608	8	2	4	8	16	128	24	96
9216	16	2	7	14	32	256	24	192
9216	16	6	24	144	32	512	24	384

4.4. Network Segmentation of the Ethernet Fabric

The Ethernet fabric is segmented into the following networks:

Storage Network
In-band Network
Out-of-Band Management Network

4.4.1. Storage Network

The storage network provides high-speed storage performance and high availability. Two of the four available ports on each BlueField-3 DPU are dedicated for storage access. The physical Ethernet fabric carries a dedicated VXLAN with termination points on the leaf switches for DGX nodes' storage NICs. A pair of SN5600 leaf switches in each SU provides connectivity to storage appliances. RoCE is a requirement for storage appliances, benefiting from advanced fabric management features like congestion control and adaptive routing.

Each scalable unit is designed to carry 16x 800 Gbps non-blocking bandwidth to storage appliances. On the DGX node side, each scalable unit carries a slightly blocking fabric with a blocking factor of 5:3.

4.4.2. In-Band Management Network

The in-band management network provides several key functions:

Connects all cluster management services.
Enables access to the lower-speed NFS tier of storage.
Provides uplink connectivity for in-cluster services (Mission Control, Base Command Manager, Slurm, Kubernetes) to external services (NGC registry, code repositories, data sources).
Provides end-user access to Slurm head nodes and Kubernetes services.

The in-band network is split into three segments:

A dedicated VTEP for uplink, with default handover to the customer edge via BGP peering.
Customer provisioned link to the Building Management System (BMS) network.
A dedicated VTEP for the out-of-band network, used by network management devices (NMX-Manager) and BCM for telemetry and management functions.
The in-band VTEP carries network for user access, home-directory storage via NFS, service delivery, and general control traffic.

4.4.3. Out-of-Band Management Network

The OOB Ethernet fabric connects management ports of all devices, including DGX GB200 compute trays, switch trays, management servers, storage, networking gear, rack PDUs, and other devices. These are separated onto their own network, secured via logical network separation. The OOB network carries all IPMI-related control traffic and serves as the network for fabric management of the compute InfiniBand and NVLink fabrics. The OOB management network uses SN2201 switches.

4.5. Customer Edge Connectivity

For connecting DGX SuperPOD to the customer edge for uplink and corporate network access, at least 2x 100 GbE links with DR1 single-mode connectivity are recommended. Connection to the Building Management System (BMS) is also a new requirement for DGX SuperPOD with DGX GB200 systems due to complex cooling and power requirements. The BMS manages the operational technology (OT) side of the data center infrastructure.

For route handover, eBGP protocol is used to peer with the customer's network, announcing routes to/from in-band, out-of-band, and building management systems. The example customer edge connectivity shows connections to the BMS and the internet.

5. Storage Architecture

Data is crucial for developing accurate deep learning (DL) models, and data volume continues to grow exponentially. Storage system performance must scale commensurately. The key I/O operation in DL training is re-read, requiring data to be reused iteratively. Pure read performance is important, as is write performance for saving checkpoints, which can be terabytes in size and block training progress.

Ideally, data is cached during the first read to avoid repeated retrieval across the network. Shared filesystems use RAM as the first layer of cache. DGX GB200 systems provide local NVMe storage that can also be used for caching or staging data.

DGX SuperPOD is designed to support all workloads, but storage performance requirements vary by model type and dataset. High-speed storage provides a shared view of data to all nodes, optimized for small, random I/O patterns and high peak node and aggregate filesystem performance. While most DL workloads are read-dominant, NLP and LLM cases require peak performance for creating and reading checkpoint files.

Table 5.1: Storage Performance Requirements

Level	Work Description	Data Set Size
Standard	Multiple concurrent LLM or fine-tuning training jobs and periodic checkpoints, where compute requirements dominate data I/O. Datasets fit within local compute systems' memory cache. Single modality, millions of parameters.	Most datasets fit within local memory cache.
Enhanced	Multiple concurrent multimodal training jobs and periodic checkpoints, where data I/O performance is critical for end-to-end training time. Datasets exceed local memory cache, requiring more I/O during training. Multiple modalities, billions of parameters.	Datasets too large for local memory cache.

Table 5.2: Guidelines for storage performance (GBps)

Performance Characteristic	Standard (GBps)	Enhanced (GBps)
Single SU aggregate system read	40	125
Single SU aggregate system write	20	62
4 SU aggregate system read	160	500
4 SU aggregate system write	80	250

6. DGX SuperPOD Software

NVIDIA Mission Control software provides a full-stack data center solution for enterprise infrastructure deployments like NVIDIA DGX SuperPOD. It integrates essential management and operational capabilities into a unified platform, offering seamless, large-scale control. The DGX GB200 SuperPOD software stack features a five-layer architecture, from system health diagnostics to cluster management. It leverages NVIDIA Base Command Manager (BCM) and NVIDIA Run:ai for scheduler access with SLURM and Kubernetes. The Telemetry and Observability layer uses proprietary diagnostic tools, while the Validation and Diagnostics layer supports an autonomous recovery engine for rapid failure recovery.

Mission Control handles deployment optimization and system health monitoring, cooperating with NVIDIA Base Command Manager (BCM) for cluster provisioning and operations. Network Management eXpert (NMX) monitors and controls NVLINK Switch trays.

Mission Control delivers innovations that improve efficiency, reduce downtime, and optimize resource utilization:

Automated Failure Detection & Self-Recovery: Minimizes downtime and ensures continuous AI training with faster recovery times, improving GPU utilization and reducing resource wastage.
Optimized Workload Migration & Resource Allocation: Continuously reassigns workloads to healthy nodes, preventing idle GPU time and increasing overall GPU efficiency for higher throughput and faster time-to-results.
Unified Diagnostics for Infrastructure & Applications: Reduces troubleshooting time by combining infrastructure and application-level diagnostics, decreasing operational burden on SREs and DevOps and speeding up time to production.
In-Memory Checkpointing for Seamless Job Recovery: Enables recovery of training workloads from the most recent valid checkpoint, eliminating data loss and rework, reducing retraining time, and increasing model iteration speed.

6.1. Run:ai

The Run:ai platform, included with NVIDIA Mission Control, employs a distributed architecture with a control plane (backend) and clusters. The control plane centrally manages and orchestrates multiple Run:ai clusters. For BCM-managed installations, the control plane is hosted on the control plane, while cluster components are deployed on the customer's Kubernetes infrastructure, allowing for centralized management and localized workload execution.

7. Summary

NVIDIA DGX SuperPOD with NVIDIA DGX GB200 systems represents the next generation of data center scale architecture, designed to meet the growing demands of AI training and inferencing. This reference architecture document details the system used by NVIDIA for its own AI model and HPC research and development. DGX SuperPOD builds upon its high-performance foundation to enable training of the largest NLP models, support expansive training needs for automotive applications, and scale recommender models for improved accuracy and faster turnaround times.

DGX SuperPOD is a complete system encompassing hardware and essential software to accelerate deployment, streamline management, and proactively identify system issues. This combination ensures reliable operation, maximum performance, and enables users to push the boundaries of state-of-the-art AI. The platform is designed to support current workloads and scale for future applications. Configurations are representative and must be finalized based on actual design requirements. Your NVIDIA representative can assist in finalizing the exact components list.

8. Notices

8.1. Notice

This document is provided for informational purposes only and does not constitute a warranty of any kind. NVIDIA Corporation ("NVIDIA") makes no representations or warranties regarding the accuracy or completeness of the information herein and assumes no responsibility for any errors. NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third parties. This document is not a commitment to develop, release, or deliver any material, code, or functionality. NVIDIA reserves the right to make changes to this document at any time without notice. Customers should obtain the latest relevant information before placing orders.

NVIDIA products are sold subject to NVIDIA's standard terms and conditions of sale, unless otherwise agreed in a separate sales agreement. NVIDIA expressly objects to the application of any customer general terms and conditions. No contractual obligations are formed by this document.

NVIDIA makes no warranty that products based on this document will be suitable for any specified use. It is the customer's sole responsibility to evaluate and determine the applicability of any information contained herein, ensure product suitability for their planned application, and perform necessary testing to avoid application or product default. Weaknesses in customer product designs may affect NVIDIA product quality and reliability. NVIDIA accepts no liability related to any default, damage, costs, or problems attributable to (i) the use of the NVIDIA product contrary to this document or (ii) customer product designs.

No license, expressed or implied, is granted under any NVIDIA patent right, copyright, or other intellectual property right. Information published by NVIDIA regarding third-party products or services does not constitute a license from NVIDIA to use such products or services or a warranty or endorsement thereof. Reproduction of information in this document is permissible only with NVIDIA's prior written approval, without alteration, in compliance with export laws and regulations, and with all associated conditions, limitations, and notices.

THIS DOCUMENT AND ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS ("MATERIALS") ARE PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT WILL NVIDIA BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING OUT OF ANY USE OF THIS DOCUMENT, EVEN IF NVIDIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. NOTWITHSTANDING ANY DAMAGES THAT CUSTOMER MIGHT INCUR FOR ANY REASON WHATSOEVER, NVIDIA'S AGGREGATE AND CUMULATIVE LIABILITY TOWARDS CUSTOMER FOR THE PRODUCTS DESCRIBED HEREIN SHALL BE LIMITED IN ACCORDANCE WITH THE TERMS OF SALE FOR THE PRODUCT.

8.2. Trademarks

NVIDIA, the NVIDIA logo, NVIDIA DGX, and NVIDIA DGX SuperPOD are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of their respective companies.

	NVIDIA DGX B300 Datasheet: AI Factory Performance Explore the NVIDIA DGX B300, a powerful AI infrastructure solution designed for AI factory performance, from training to inference. Learn about its key features, specifications, and how it enables enterprises to scale AI operations.
	NVIDIA DGX GB300 Datasheet: AI Infrastructure for the Era of Reasoning Explore the NVIDIA DGX GB300, a purpose-built AI factory infrastructure designed for generative AI and large language models. Discover its key features, including the Grace Blackwell Ultra Superchips, liquid-cooled design, and NVIDIA networking, for accelerating state-of-the-art AI models.
	The IT Leader's Guide to AI Inference and Performance A comprehensive guide for IT leaders on navigating AI infrastructure, focusing on performance measurement, optimization, and cost efficiency for AI inference workloads.
	NVIDIA GB200 NVL Partition User Guide This user guide provides comprehensive information on NVIDIA GB200 NVL systems, focusing on NVLink partitioning. It details how to create, manage, and maintain partitions for efficient GPU resource allocation and multi-tenancy. The guide covers platform location information, different partition types (UID-based, location-based, zero-GPU), administrative and tenant workflows, fault handling, and maintenance procedures.
	NVIDIA DGX SuperPOD Deployment Guide This document provides detailed instructions for deploying NVIDIA Base Command Manager on NVIDIA DGX SuperPOD configurations, covering initial cluster setup, head node configuration, and high availability setup.
	NVIDIA Mission Control Manual: Managing NVIDIA Base Command Manager 11 A comprehensive guide to NVIDIA Mission Control features within NVIDIA Base Command Manager 11, covering rack management, NVLink monitoring, power shelf integration, and autonomous hardware recovery for NVIDIA B200 and GB200 platforms.
	NVIDIA RTX PRO 2000 Blackwell GPU: Datasheet for AI & Professional Graphics Explore the NVIDIA RTX PRO 2000 Blackwell GPU, designed for advanced AI, generative design, and professional visualization. Features 16GB GDDR7 memory, 5th Gen Tensor Cores, and 4th Gen Ray Tracing Cores for unparalleled performance.
	NVIDIA DGX OS Server Release 4.9 Release Notes and Update Guide This document provides release notes and an update guide for NVIDIA DGX OS Server Release 4.9, detailing primary changes, delivery and update mechanisms, version history, known issues, and limitations.