NVIDIA T4 Tensor Core GPU

Accelerating AI Training and Inference

Introduction

The NVIDIA T4 GPU is a powerful server accelerator designed to provide scalable performance for AI training and inference. Built in a compact, low-profile, 70-watt PCIe form factor optimized for server scalability, it offers revolutionary multi-precision inference performance to accelerate a wide range of popular applications.

Performance Benchmarks

Inference Performance:

A comparison of one NVIDIA T4 GPU against a server with dual Xeon Gold 6140 CPUs shows significant performance gains:

GNMT: 36X
ResNet-50: 27X
DeepSpeech2: 21X

Training Performance:

A comparison of two NVIDIA T4 GPUs against a server with dual Xeon Gold 6140 CPUs shows significant performance gains:

ResNet-50 (FP16/FP32): 9.3X
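The speedups above are relative to the same dual-socket CPU server, but the training figure uses two T4 GPUs while the inference figures use one. The sketch below shows the arithmetic for comparing them. It assumes each figure is a throughput ratio (equivalently, CPU run time divided by GPU run time for a fixed workload), which the datasheet does not state explicitly. The helper names `per_gpu_average` and `time_on_gpu` and the 60-minute job are hypothetical, for illustration only.

```python
# Speedups quoted in the datasheet, relative to a dual Xeon Gold 6140 server.
# Assumption: each figure is a throughput ratio (T4 throughput / CPU throughput).
INFERENCE_SPEEDUP_ONE_T4 = {"GNMT": 36.0, "ResNet-50": 27.0, "DeepSpeech2": 21.0}
TRAINING_SPEEDUP_TWO_T4 = {"ResNet-50 (FP16/FP32)": 9.3}


def per_gpu_average(total_speedup: float, num_gpus: int) -> float:
    """Average contribution of each GPU to an aggregate multi-GPU speedup.

    If multi-GPU scaling is sub-linear, one GPU on its own would be somewhat
    faster than this average, so treat it as a conservative per-GPU figure.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return total_speedup / num_gpus


def time_on_gpu(cpu_time: float, speedup: float) -> float:
    """Run time of a fixed workload on the GPU, given its run time on the CPU server."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return cpu_time / speedup


for model, s in TRAINING_SPEEDUP_TWO_T4.items():
    print(f"{model}: {per_gpu_average(s, 2):.2f}X per T4 (averaged over 2 GPUs)")

for model, s in INFERENCE_SPEEDUP_ONE_T4.items():
    # Hypothetical 60-minute CPU inference job, re-expressed for one T4.
    print(f"{model}: 60 min on CPU -> {time_on_gpu(60.0, s):.1f} min on one T4")
```

Under this reading, the 9.3X training result averages to 4.65X per T4, which is not directly comparable to the single-GPU inference speedups because training and inference are different workloads.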

Specifications

Specification	Details
GPU Architecture	NVIDIA Turing
NVIDIA Turing Tensor Cores	320
NVIDIA CUDA® Cores	2560
Single Precision (FP32)	8.1 TFLOPS
Mixed Precision (FP16/FP32)	65 TFLOPS
INT8	130 TOPS
INT4	260 TOPS
GPU Memory	16 GB GDDR6
Memory Bandwidth	300 GB/s
ECC Support	Yes
System Interface	x16 PCIe Gen3
Form Factor	PCIe Low Profile
Thermal Solution	Passive
Compute APIs	CUDA, NVIDIA TensorRT™, ONNX
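The peak throughput and memory bandwidth in the table can be combined in a simple roofline model to estimate when a kernel on T4 is limited by memory traffic rather than by compute. The sketch below computes the "ridge point" (peak operations per second divided by bytes per second) for each precision, plus a peak-throughput-per-watt ratio using the 70 W board power. These are back-of-envelope figures derived only from the datasheet peaks: sustained throughput on real workloads is lower, and reading 300 GB/s as 300 × 10^9 bytes/s is an assumption.

```python
# Peak figures copied from the specification table (FLOPS for FP, OPS for INT).
PEAK_OPS_PER_S = {
    "FP32": 8.1e12,
    "FP16/FP32 mixed": 65e12,
    "INT8": 130e12,
    "INT4": 260e12,
}
MEMORY_BANDWIDTH_BYTES_PER_S = 300e9  # assumption: decimal GB
BOARD_POWER_W = 70.0


def ridge_point(peak_ops_per_s: float, bandwidth_bytes_per_s: float) -> float:
    """Arithmetic intensity (ops per byte of DRAM traffic) at which the simple
    roofline model switches from memory-bound to compute-bound."""
    return peak_ops_per_s / bandwidth_bytes_per_s


def attainable_ops_per_s(intensity: float, peak: float, bandwidth: float) -> float:
    """Roofline estimate: the lower of peak compute and intensity * bandwidth."""
    return min(peak, intensity * bandwidth)


for name, peak in PEAK_OPS_PER_S.items():
    ridge = ridge_point(peak, MEMORY_BANDWIDTH_BYTES_PER_S)
    tera_ops_per_watt = peak / BOARD_POWER_W / 1e12  # peak ratio, not measured efficiency
    print(f"{name:16s} ridge ~ {ridge:6.1f} ops/byte, peak ~ {tera_ops_per_watt:4.2f} T-ops/W")
```

For example, an FP16 Tensor Core kernel needs roughly 217 operations per byte of DRAM traffic before the 65 TFLOPS peak, rather than the 300 GB/s bandwidth, becomes the limit; at FP32 the threshold is about 27.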

Key Features

Compact 70-Watt Design: The low-profile, 70-watt form factor optimizes T4 for scalable servers, offering 50X greater power efficiency than CPUs and significantly reducing operational costs.
Revolutionary Multi-Precision Performance: The Turing Tensor Core technology delivers breakthrough AI performance across FP32, FP16, INT8, and INT4 precisions.
Versatile Acceleration: The NVIDIA T4 GPU is ideal for accelerating deep learning, machine learning training and inference, video transcoding, and virtual desktops.
Broad Framework Support: T4 supports all AI frameworks and network types, providing robust performance and efficiency for large-scale deployments.
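The INT8 and INT4 modes listed above run networks on integer-quantized weights and activations. To illustrate what that involves, here is a minimal sketch of symmetric, per-tensor quantization with a max-absolute-value scale. The scheme and the function names are assumptions chosen for illustration; TensorRT provides its own calibration for choosing scales, and this sketch does not reproduce it.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], num_bits: int = 8) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: q = clamp(round(x / scale), -qmax, qmax).

    qmax is 127 for 8 bits and 7 for 4 bits; the scale maps the largest
    absolute input value onto qmax (a simple max-abs choice, assumed here).
    """
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2")
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate real values."""
    return [code * scale for code in q]


weights = [0.5, -1.27, 0.003, 1.0]
for bits in (8, 4):
    q, scale = quantize_symmetric(weights, bits)
    restored = dequantize(q, scale)
    # Rounding error is at most scale / 2 for values inside the clamp range.
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(f"INT{bits}: codes={q}, scale={scale:.5f}, max abs error={worst:.5f}")
```

With these sample weights the INT8 codes are [50, -127, 0, 100] with a scale of 0.01, while INT4 uses a much coarser scale of about 0.18. The specification table's INT4 peak is twice its INT8 peak; the trade-off is this larger rounding step, which is why lower precisions typically need more careful calibration.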

Learn More

For more details on the NVIDIA T4, visit www.nvidia.cn/T4.
