NVIDIA H100 Tensor Core GPU
Unprecedented performance, scalability, and security for every data center.
Take an Order-of-Magnitude Leap in Accelerated Computing
The NVIDIA H100 Tensor Core GPU delivers unprecedented performance, scalability, and security for every workload. With the NVIDIA NVLink Switch System, up to 256 H100 GPUs can be connected to accelerate exascale workloads. The dedicated Transformer Engine supports trillion-parameter language models. H100 utilizes breakthrough innovations in the NVIDIA Hopper architecture to deliver industry-leading conversational AI, speeding up large language models by 30X over the previous generation.
Ready for Enterprise AI?
NVIDIA H100 Tensor Core GPUs for mainstream servers include a five-year subscription to the NVIDIA AI Enterprise software suite, with enterprise support, simplifying AI adoption with the highest performance. This ensures organizations have access to the AI frameworks and tools they need to build H100-accelerated AI workflows such as AI chatbots, recommendation engines, and vision AI. See the NVIDIA website for details on the NVIDIA AI Enterprise software subscription and related support benefits for the NVIDIA H100.
Securely Accelerate Workloads from Enterprise to Exascale
NVIDIA H100 GPUs feature fourth-generation Tensor Cores and the Transformer Engine with FP8 precision, extending NVIDIA's AI leadership with up to 9X faster training and an incredible 30X inference speedup on large language models. For high-performance computing (HPC) applications, H100 triples the FP64 floating-point operations per second (FLOPS) and adds dynamic programming (DPX) instructions to deliver up to 7X higher performance. With second-generation Multi-Instance GPU (MIG), built-in NVIDIA Confidential Computing, and the NVIDIA NVLink Switch System, H100 securely accelerates all workloads for every data center, from enterprise to exascale.
Specifications
Feature | H100 SXM | H100 PCIe |
---|---|---|
FP64 | 34 TFLOPS | 26 TFLOPS |
FP64 Tensor Core | 67 TFLOPS | 51 TFLOPS |
FP32 | 67 TFLOPS | 51 TFLOPS |
TF32 Tensor Core | 989 TFLOPS* | 756 TFLOPS* |
BFLOAT16 Tensor Core | 1,979 TFLOPS* | 1,513 TFLOPS* |
FP16 Tensor Core | 1,979 TFLOPS* | 1,513 TFLOPS* |
FP8 Tensor Core | 3,958 TFLOPS* | 3,026 TFLOPS* |
INT8 Tensor Core | 3,958 TOPS* | 3,026 TOPS* |
GPU memory | 80GB | 80GB |
GPU memory bandwidth | 3.35TB/s | 2TB/s |
Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG |
Max thermal design power (TDP) | Up to 700W (configurable) | 300-350W (configurable) |
Multi-Instance GPU (MIG) | Up to 7 MIGs @ 10GB each | Up to 7 MIGs @ 10GB each |
Form factor | SXM | PCIe, dual-slot, air-cooled |
Interconnect | NVLink: 900GB/s; PCIe Gen5: 128GB/s | NVLink: 600GB/s; PCIe Gen5: 128GB/s |
Server options | NVIDIA HGX™ H100 partner and NVIDIA-Certified Systems™ with 4 or 8 GPUs; NVIDIA DGX™ H100 with 8 GPUs | Partner and NVIDIA-Certified Systems with 1-8 GPUs |
NVIDIA AI Enterprise | Add-on | Included |
* With sparsity. Specifications are one-half lower without sparsity.
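As a quick check on the footnote, here is a minimal Python sketch deriving the dense (non-sparsity) rates from the H100 SXM figures in the table above; the values are copied from the table, and the halving follows directly from the footnote:

```python
# Sparsity-accelerated peak rates from the H100 SXM column (TFLOPS).
with_sparsity_tflops = {
    "TF32 Tensor Core": 989,
    "BF16/FP16 Tensor Core": 1979,
    "FP8 Tensor Core": 3958,
}

for precision, sparse_rate in with_sparsity_tflops.items():
    dense_rate = sparse_rate / 2  # footnote: one-half lower without sparsity
    print(f"{precision}: {sparse_rate} TFLOPS (sparse) -> {dense_rate:.0f} TFLOPS (dense)")
```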
Performance Highlights
Up to 9X Higher AI Training on Largest Models: Comparison of NVIDIA A100 and H100 GPUs training a Mixture of Experts model (395 billion parameters). The NVIDIA H100 delivers up to 9X faster training throughput (sequences per second) than the A100, cutting time to train from seven days to less than a day.
Up to 30X Higher AI Inference Performance on Largest Models: Comparison of NVIDIA A100 and H100 GPUs for chatbot inference on Megatron (530 billion parameters). The NVIDIA H100 achieves up to 30X higher inference throughput than the A100 at fixed chatbot response latencies in the one-to-two-second range.
Up to 7X Higher Performance for HPC Applications: Comparison of NVIDIA A100 and H100 GPUs for high-performance computing (HPC) applications. The H100 shows up to 7X higher performance than the A100 in tasks such as 3D fast Fourier transforms (FFT) and genome sequencing.
Explore the Technology Breakthroughs of NVIDIA Hopper
NVIDIA H100 Tensor Core GPU
Built with 80 billion transistors on a cutting-edge TSMC 4N process custom-tailored for NVIDIA's accelerated computing needs, H100 is the world's most advanced chip. It features major advances to accelerate AI, HPC, memory bandwidth, interconnect, and communication at data center scale.
Transformer Engine
The Transformer Engine uses software and Hopper Tensor Core technology designed to accelerate training for models built from the world's most important AI model building block, the transformer. Hopper Tensor Cores can apply mixed FP8 and FP16 precisions to dramatically accelerate AI calculations for transformers.
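As a concrete illustration, here is a minimal sketch of FP8 mixed-precision execution using NVIDIA's open-source Transformer Engine library for PyTorch; the layer sizes, batch size, and recipe settings are illustrative assumptions, not tuning guidance:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Illustrative FP8 recipe: "hybrid" uses E4M3 for forward tensors
# and E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

# A Transformer-style projection layer; 768 is an assumed hidden size.
layer = te.Linear(768, 768, bias=True).cuda()
x = torch.randn(32, 768, device="cuda")

# Inside fp8_autocast, supported ops run on Hopper's FP8 Tensor Cores.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
y.sum().backward()
```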
NVLink Switch System
The NVLink Switch System enables the scaling of multi-GPU input/output (IO) across multiple servers at 900 gigabytes per second (GB/s) bidirectional per GPU, over 7X the bandwidth of PCIe Gen5. The system supports clusters of up to 256 H100 GPUs and delivers 9X higher bandwidth than InfiniBand HDR on NVIDIA Ampere architecture systems.
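To give a sense of how that bandwidth is consumed in practice, here is a hedged sketch of a multi-GPU all-reduce with PyTorch and NCCL, which transparently routes traffic over NVLink/NVSwitch links when they are present; the tensor size and the torchrun launch method (one process per GPU) are assumptions for illustration:

```python
import os
import torch
import torch.distributed as dist

# Launch with: torchrun --nproc_per_node=<num_gpus> this_script.py
dist.init_process_group(backend="nccl")  # NCCL uses NVLink/NVSwitch paths when available
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# ~1 GiB of FP32 data, standing in for gradients (illustrative size).
grads = torch.ones(256 * 1024 * 1024, device="cuda")

torch.cuda.synchronize()
dist.all_reduce(grads)  # bandwidth-bound collective that benefits from NVLink
torch.cuda.synchronize()

dist.destroy_process_group()
```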
NVIDIA Confidential Computing
NVIDIA Confidential Computing is a built-in security feature of Hopper that makes NVIDIA H100 the world's first accelerator with confidential computing capabilities. Users can protect the confidentiality and integrity of their data and applications in use while accessing the unsurpassed acceleration of H100 GPUs.
Second-Generation Multi-Instance GPU (MIG)
The Hopper architecture's second-generation MIG supports multi-tenant, multi-user configurations in virtualized environments, securely partitioning the GPU into isolated, right-sized instances to maximize quality of service (QoS) for 7X more secured tenants.
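As an illustration of what partitioning looks like to software, here is a hedged sketch using the NVML Python bindings (the nvidia-ml-py package) to list MIG instances on GPU 0; it assumes MIG mode has already been enabled and instances created by an administrator:

```python
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

# MIG mode is reported as (current, pending); 1 means enabled.
current, pending = pynvml.nvmlDeviceGetMigMode(gpu)
print(f"MIG mode: current={current}, pending={pending}")

# Enumerate up to the maximum possible MIG devices on this GPU.
for i in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
    try:
        mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, i)
    except pynvml.NVMLError:
        continue  # this MIG slot is not populated
    mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
    print(f"MIG instance {i}: {mem.total / 2**30:.0f} GiB total memory")

pynvml.nvmlShutdown()
```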
DPX Instructions
Hopper's DPX instructions accelerate dynamic programming algorithms by 40X compared to CPUs and 7X compared to NVIDIA Ampere architecture GPUs. This leads to dramatically faster times in disease diagnosis, real-time routing optimizations, and graph analytics.
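For context on the algorithm class DPX targets, here is a plain-Python sketch of the Smith-Waterman dynamic-programming recurrence used in genome sequence alignment; the scoring values and inputs are illustrative, and a production implementation would run this inner max/add recurrence on the GPU, which is the pattern DPX instructions accelerate:

```python
def smith_waterman(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    """Local-alignment score via dynamic programming, O(len(a) * len(b))."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The fused max/add pattern below is the core DP operation.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("GATTACA", "GCATGCU"))  # small illustrative inputs
```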
The Convergence of GPU and SmartNIC
NVIDIA H100 CNX combines the power of the NVIDIA H100 with the advanced networking capabilities of the NVIDIA ConnectX-7 smart network interface card (SmartNIC) in a single, unique platform. This convergence delivers unparalleled performance for GPU-powered IO-intensive workloads, such as distributed AI training in the enterprise data center and 5G processing at the edge. Learn more about NVIDIA H100 CNX.
Accelerate Every Workload, Everywhere
The NVIDIA H100 is an integral part of the NVIDIA data center platform. Built for AI, HPC, and data analytics, the platform accelerates over 3,000 applications and is available everywhere from data center to edge, delivering both dramatic performance gains and cost-saving opportunities.
Deploy H100 with the NVIDIA AI Platform
NVIDIA AI is the end-to-end open platform for production AI built on NVIDIA H100 GPUs. It includes NVIDIA accelerated computing infrastructure, a software stack for infrastructure optimization and AI development and deployment, and application workflows to speed time to market. Experience NVIDIA AI and NVIDIA H100 on NVIDIA LaunchPad through free hands-on labs.
NVIDIA AI Platform Overview: This diagram illustrates the NVIDIA AI ecosystem. It shows various application domains such as Medical Imaging, Speech AI, Robotics, Autonomous Vehicles, and Cybersecurity feeding into the NVIDIA AI Enterprise software suite. The suite includes AI and Data Science Development Tools, Cloud Native Management, and Infrastructure Optimization, all powered by Accelerated Infrastructure (Cloud, Data Center, Edge, Embedded). NVIDIA LaunchPad offers hands-on labs for experience.
Ready to Get Started?
To learn more about the NVIDIA H100 Tensor Core GPU, visit: www.nvidia.com/h100