NVIDIA Collective Communication Library (NCCL)

Developer Guide

Introduction to NCCL

The NVIDIA Collective Communication Library (NCCL, pronounced "Nickel") is a high-performance library of multi-GPU collective communication primitives. It is topology-aware and easily integrated into applications, enabling efficient data aggregation and synchronization across multiple GPUs.

NCCL focuses on accelerating collective communication operations such as AllReduce, Broadcast, Reduce, AllGather, and ReduceScatter. It is crucial for applications requiring tight synchronization between communicating processors, particularly in areas like Deep Learning for neural network training, where efficient multi-GPU and multi-node communication is paramount.
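As an illustration of these collectives, the following is a minimal sketch of a single-process AllReduce across all visible GPUs. It assumes NCCL and the CUDA runtime are installed and that at least one GPU is present; error handling is deliberately minimal, and the buffer size is an arbitrary choice.

```c
/* Sketch: sum a buffer across every GPU in one process with ncclAllReduce. */
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <nccl.h>

#define CHECK_NCCL(cmd) do {                                        \
    ncclResult_t r = (cmd);                                         \
    if (r != ncclSuccess) {                                         \
        fprintf(stderr, "NCCL error: %s\n", ncclGetErrorString(r)); \
        exit(1);                                                    \
    }                                                               \
} while (0)

int main(void) {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);

    ncclComm_t  *comms   = malloc(ndev * sizeof(ncclComm_t));
    cudaStream_t *streams = malloc(ndev * sizeof(cudaStream_t));
    float **sendbuf = malloc(ndev * sizeof(float *));
    float **recvbuf = malloc(ndev * sizeof(float *));
    const size_t count = 1 << 20;  /* 1M floats per GPU (arbitrary) */

    /* One communicator per device, all owned by this process.
     * A NULL device list means devices 0..ndev-1. */
    CHECK_NCCL(ncclCommInitAll(comms, ndev, NULL));

    for (int i = 0; i < ndev; i++) {
        cudaSetDevice(i);
        cudaMalloc((void **)&sendbuf[i], count * sizeof(float));
        cudaMalloc((void **)&recvbuf[i], count * sizeof(float));
        cudaStreamCreate(&streams[i]);
    }

    /* Group the per-device calls so NCCL can launch them together;
     * the summed result lands in recvbuf on every device. */
    CHECK_NCCL(ncclGroupStart());
    for (int i = 0; i < ndev; i++)
        CHECK_NCCL(ncclAllReduce(sendbuf[i], recvbuf[i], count,
                                 ncclFloat, ncclSum, comms[i], streams[i]));
    CHECK_NCCL(ncclGroupEnd());

    for (int i = 0; i < ndev; i++) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        ncclCommDestroy(comms[i]);
    }
    return 0;
}
```

Replacing `ncclAllReduce` with `ncclBcast`, `ncclReduce`, `ncclAllGather`, or `ncclReduceScatter` follows the same pattern.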

The library offers a simple C API, closely following the Message Passing Interface (MPI) standard, making it familiar to developers. NCCL supports various interconnect technologies, including PCIe, NVLink, InfiniBand Verbs, and IP sockets, automatically optimizing communication strategies based on system topology.
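The MPI-like flavor of the API is easiest to see in multi-process initialization: rank 0 creates an opaque `ncclUniqueId` and distributes it out-of-band (here via `MPI_Bcast`), and every rank then joins the communicator with `ncclCommInitRank`. This is a sketch assuming one GPU per MPI rank per node; a real launcher would map ranks to devices more carefully.

```c
/* Sketch: MPI-style NCCL initialization, one rank per GPU. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <nccl.h>

int main(int argc, char *argv[]) {
    int rank, nranks;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Rank 0 generates the communicator ID; MPI carries it to the others. */
    ncclUniqueId id;
    if (rank == 0) ncclGetUniqueId(&id);
    MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);

    cudaSetDevice(rank);  /* assumption: one GPU per rank on each node */
    ncclComm_t comm;
    ncclCommInitRank(&comm, nranks, id, rank);

    /* ... issue collectives on `comm` ... */

    ncclCommDestroy(comm);
    MPI_Finalize();
    return 0;
}
```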

Further Information

For more details on NCCL, including examples, API usage, troubleshooting, and support, refer to the full documentation. To request feature enhancements or report issues, developers can engage through the NVIDIA Developer Program.

