Atlas 300T Training Card

Technical White Paper (Model 9000)

Issue: 09

Date: 2023-01-09

Manufacturer: HUAWEI TECHNOLOGIES CO., LTD.

About This Document

Purpose

This document describes the Atlas 300T training card (model 9000) in detail, including its appearance, performance parameters, configurations, and application scenarios.

Intended Audience

  • Presales engineers
  • Technical support engineers
  • Maintenance engineers

Disclaimer

The technical specifications described in this document include but are not limited to parameters and performance indicators and vary depending on the actual release. This technical white paper does not constitute a commitment or guarantee on technical specifications of related products. Huawei may update relevant information from time to time. Huawei reserves the right to update or correct the information about related products or solutions. Updates are described in detail in the latest release notes or introduction.

Symbol Conventions

Symbol | Description
[DANGER] | Indicates a hazard with a high level of risk which, if not avoided, will result in death or serious injury.
[WARNING] | Indicates a hazard with a medium level of risk which, if not avoided, could result in death or serious injury.
[CAUTION] | Indicates a hazard with a low level of risk which, if not avoided, could result in minor or moderate injury.
[NOTICE] | Indicates a potentially hazardous situation which, if not avoided, could result in equipment damage, data loss, performance deterioration, or unanticipated results. NOTICE is used to address practices not related to personal injury.
[NOTE] | Supplements the important information in the main text. NOTE is used to address information not related to personal injury, equipment damage, and environment deterioration.

Change History

Issue | Release Date | Description
09 | 2023-01-09 | This issue is the ninth official release.
  • Optimized 1.1 Overview and 2.1 Performance.
  • Added the description of the AI processor, network, heat dissipation mode, and virtual instance specifications in 3.1 Basic Specifications.
  • Added the pin description of the auxiliary power connector in 3.5 Power Management.
  • Added 4 Hardware Compatibility.
08 | 2022-07-15 | This issue is the eighth official release. Modified 3.1 Basic Specifications.
07 | 2022-02-22 | This issue is the seventh official release. Updated 5.2 Out-of-Band Management.
06 | 2021-12-24 | This issue is the sixth official release. Modified 1.1 Overview.
05 | 2021-04-19 | This issue is the fifth official release. Modified 3.1 Basic Specifications.
04 | 2020-12-10 | This issue is the fourth official release. Modified 1.1 Overview, 1.2 Front Panel, and 3.1 Basic Specifications.
03 | 2020-10-10 | This issue is the third official release. Modified 3.1 Basic Specifications.
02 | 2020-09-23 | This issue is the second official release. Modified 3.1 Basic Specifications.
01 | 2020-06-10 | This issue is the first official release.

1 Product Description

1.1 Overview

The Huawei Atlas 300T training card (model 9000) is an AI accelerator card that works with servers to provide powerful computing power for data centers. A single card provides up to 220 TFLOPS of FP16 computing power to accelerate deep learning training. With superior computing power, high integration, and high bandwidth, the card meets the AI training requirements of the Internet, carrier, and finance industries as well as the computing power requirements of high-performance computing.

1.2 Front Panel

Textual description of Figure 1-1 Appearance: A visual representation of the Huawei Atlas 300T training card, showcasing its sleek, metallic casing with red accents.

Figure 1-2 shows the front panel of an Atlas 300T training card (model 9000). Table 1-1 describes the indicators on the front panel.

Textual description of Figure 1-2 Front panel: Diagram of the Atlas 300T training card's front panel, featuring multiple network ports (indicated by '+') and two sets of indicator lights labeled 'LINK/ACT' and 'SPEED', numbered 1 and 2.

No. | Silkscreen | Meaning | Color | State Description
1 | LINK/ACT indicator | Network port status indicator | Green |
  • Off: The port is in the link-down state.
  • On: The port is in the link-up state.
  • Blinking: The port is in the link-up state and is transmitting data.
2 | SPEED indicator | | Green |
  • Off: The port is in the link-down state.
  • On: The port is in the link-up state at high speed.

[NOTE] Only indicators of group 1 on the left of the port are supported.

Figure 1-3 shows the port. Table 1-2 describes the port.

Textual description of Figure 1-3 Port: A detailed view of the network port area on the Atlas 300T training card, showing the QSFP-DD port and associated indicator lights.

Name | Type | Quantity | Description
QSFP-DD port | QSFP-DD | 1 | The current driver of a PCIe training card supports only one 100GE port. The capability of extending to two 100GE ports is reserved.

1.3 System Architecture

Textual description of Figure 1-4 System architecture: A block diagram illustrating the internal architecture of the Atlas 300T training card. It highlights the Ascend 910 AI Processor as the core component, connected to DDR (ECC) memory, a PCIe 4.0 x16 interface, a QSFP-DD network interface, and power management. An Intelligent Baseboard Management Controller (iBMC) and an MCU are also depicted.

  • As the core of the Atlas 300T training card (model 9000), the Ascend 910 AI Processor supports a 2-rank DDRC interface with a maximum rate of 2400 Mbit/s, and supports 64-bit DDR4 SDRAMs with a maximum capacity of 16 GB.
  • The Intelligent Baseboard Management Controller (iBMC) obtains the PCB, BOM version, board temperature, power consumption, and power voltage information from the MCU.
  • The Ascend 910 AI Processor is powered by a multi-phase power supply with a high energy efficiency ratio and Huawei-developed PSIP.
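
To put the DDR interface figures above in perspective, the following minimal sketch computes the theoretical peak bandwidth of a 64-bit interface. It assumes that the 2400 Mbit/s rate applies to each data pin (as in DDR4-2400); this document does not state that explicitly. The function name is illustrative only, and sustained bandwidth in a real system is lower than this theoretical peak.

```python
def peak_bandwidth_gbytes(rate_mbit_per_pin: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (decimal) of a parallel memory bus.

    Assumption: the quoted rate applies to every data pin of the bus.
    """
    bits_per_second = rate_mbit_per_pin * 1e6 * bus_width_bits
    return bits_per_second / 8 / 1e9


# 2400 Mbit/s per pin on a 64-bit DDR4 interface
print(peak_bandwidth_gbytes(2400, 64))  # 19.2
```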

2 Features

2.1 Performance

  • High integration: three-in-one integration of AI computing, general computing, and I/O capabilities, with thirty Huawei Da Vinci AI Cores, sixteen TaiShan cores, and one 100GE RoCE v2 NIC.
  • Supreme computing power: thirty built-in Da Vinci AI Cores deliver industry-leading FP16 computing power of up to 220 TFLOPS.
  • High-speed network bandwidth: PCIe 4.0 and one 100GE RoCE high-speed interface provide a total egress bandwidth of 56.5 Gbit/s, improving the efficiency of data training and gradient synchronization by 10% to 70% without the need for external NICs.

2.2 Maintainability

  • Supports in-band online upgrades to facilitate routine maintenance.
  • Allows users to obtain device status information such as the temperature, voltage, and power consumption in in-band or out-of-band mode.
  • Provides comprehensive command line management functions for users to perform routine device management by using various commands.
  • Supports in-band and out-of-band asset management and provides such information as serial numbers to facilitate asset management.

2.3 Typical Application Scenarios

The Atlas 300T training card (model 9000) is typically used in man-machine interactions in an AI training scenario, as shown in Figure 2-1.

Textual description of Figure 2-1 Typical single-node application scenario: A diagram illustrating a typical AI training workflow involving an Algorithm engineer who interacts with an AI server. Equipment production personnel and a System administrator are also shown, managing and monitoring the system, indicating roles in deployment and operation.

  • System administrator: uses the iBMC to manage devices in out-of-band mode, including OS installation, firmware upgrade, server system information query, and troubleshooting.
  • Equipment production personnel: use the equipment system to interact with the iBMC (out-of-band) and OS (in-band).
  • Algorithm engineers: use an AI framework such as TensorFlow to develop network models, debug training code, import training data sets, start training, observe the training process (including the loss trends of multiple iterations), and export trained models.

3 Specifications

3.1 Basic Specifications

Table 3-1 lists the basic specifications.

Item | Specifications
Form factor | FHFL dual-slot (10.5 inches)
AI processor | Ascend 910 AI Processor, with thirty Huawei Da Vinci AI Cores and sixteen TaiShan cores integrated
Memory |
  • 32 GB HBM
  • 16 GB DDR4
  • 2400 Mbit/s
  • ECC supported
AI computing power (a) |
  • Half precision (FP16): The maximum computing power is 220 TFLOPS.
  • Integer precision (INT8): The maximum computing power is 440 TOPS.
Encoding/Decoding capability | 16-channel 4K (or 64-channel 1080p) 60 FPS H.264/H.265
  • JPEG decoding: 1080p 2048 FPS or equivalent decoding capability, with a maximum resolution of 8192 x 4320
  • PNG decoding: 1080p 240 FPS or equivalent decoding capability, with a maximum resolution of 4096 x 2160
  • JPEG encoding: 1080p 256 FPS or equivalent encoding capability, with a maximum resolution of 8192 x 4320
Virtual instance specifications | One Ascend AI Processor can be divided into several virtual NPUs in virtualization mode. Each virtual NPU supports 2, 4, 8, or 16 AI Cores, and other hardware resources (such as memory) are divided proportionally.
PCIe port | PCIe x16 Gen4.0
PCI IDs |
  • Vendor ID: 0x19E5
  • Device ID: 0xD801
  • Subsystem vendor ID: 0x0200
  • Subsystem device ID: 0x0100
Network | 1 x 100GE QSFP-DD port, supporting RoCE
Power consumption | A maximum of 300 W
Heat dissipation mode | Passive air cooling
Dimensions (L x W x H) | 266.7 mm x 111.15 mm x 39.04 mm
Weight | 1.2 kg
OS | For details, see the Computing Product Compatibility Checker.

a: stable, maximum computing power.
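
The PCI IDs in Table 3-1 can be used to check whether a host has enumerated the card. The following is a minimal sketch, assuming a Linux host that exposes PCI devices under /sys/bus/pci/devices with `vendor` and `device` attribute files; the function name `find_cards` is hypothetical and is not part of any Huawei tool.

```python
from pathlib import Path

VENDOR_ID = 0x19E5  # Vendor ID from Table 3-1
DEVICE_ID = 0xD801  # Device ID from Table 3-1


def find_cards(sysfs_root: str = "/sys/bus/pci/devices") -> list:
    """Return the PCI addresses whose vendor and device IDs match Table 3-1."""
    matches = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return matches  # not a Linux host, or sysfs is not mounted
    for dev in sorted(root.iterdir()):
        try:
            # sysfs stores the IDs as text such as "0x19e5"
            vendor = int((dev / "vendor").read_text().strip(), 16)
            device = int((dev / "device").read_text().strip(), 16)
        except (OSError, ValueError):
            continue  # skip entries that cannot be read or parsed
        if vendor == VENDOR_ID and device == DEVICE_ID:
            matches.append(dev.name)
    return matches


print(find_cards())
```

An empty list means that no device with these IDs is visible to the OS; it does not by itself distinguish between a missing card and a card that failed to enumerate.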

3.2 Environmental Specifications

Table 3-2 lists the hardware application environment conditions.

Item | Specifications
Temperature |
  • Operating temperature: 5°C to 45°C (41°F to 113°F)
  • Storage temperature: -40°C to +70°C (-40°F to +158°F)
Relative humidity |
  • Operating humidity: 8% to 90% RH (non-condensing)
  • Storage humidity: 5% to 95% RH (non-condensing)
Maximum altitude | ≤ 3,050 m (10,006.56 ft)
[NOTE] ASHRAE 2015 compliant:
  • When a server complying with ASHRAE Class A1 or A2 is used at an altitude above 900 m (2952.76 ft), the highest operating temperature decreases by 1°C (1.8°F) for every increase of 300 m (984.25 ft).
  • When a server complying with ASHRAE Class A3 is used at an altitude above 900 m (2952.76 ft), the highest operating temperature decreases by 1°C (1.8°F) for every increase of 175 m (574.15 ft).
  • When a server complying with ASHRAE Class A4 is used at an altitude above 900 m (2952.76 ft), the highest operating temperature decreases by 1°C (1.8°F) for every increase of 125 m (410.10 ft).
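
The altitude derating rules in the note can be expressed as a short calculation. The sketch below assumes that the derating is applied pro rata (a fraction of 1°C for a partial step) rather than in whole steps, which this document does not specify. The base temperature is supplied by the caller, and the function and constant names are illustrative.

```python
# Metres of altitude gain per 1°C of derating, from the note under Table 3-2.
DERATING_STEP_M = {"A1": 300.0, "A2": 300.0, "A3": 175.0, "A4": 125.0}
DERATING_START_M = 900.0  # derating applies above this altitude
MAX_ALTITUDE_M = 3050.0   # maximum altitude from Table 3-2


def derated_max_temp_c(base_max_c: float, altitude_m: float, ashrae_class: str) -> float:
    """Highest operating temperature (°C) at the given altitude.

    Assumption: linear (pro rata) derating above 900 m.
    """
    if altitude_m > MAX_ALTITUDE_M:
        raise ValueError("altitude exceeds the 3,050 m limit in Table 3-2")
    excess_m = max(0.0, altitude_m - DERATING_START_M)
    return base_max_c - excess_m / DERATING_STEP_M[ashrae_class]


# Example: a 45°C limit in a Class A2 server at 2,100 m
print(derated_max_temp_c(45.0, 2100.0, "A2"))  # 41.0
```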

3.3 Clock Requirements

The Atlas 300T training card (model 9000) complies with PCI Express® Card Electromechanical Specification Revision 4.0. The entire card requires only the standard PCIe 4.0 clock, and the signal quality meets the PCIe specifications.

3.4 Hot Swap

The Atlas 300T training card (model 9000) supports neither orderly hot swap nor surprise hot swap.

3.5 Power Management

The Atlas 300T training card (model 9000) complies with PCI Express® Card Electromechanical Specification Revision 4.0. The maximum power consumption of the entire card is 300 W, which requires that the card slot provide a standard 5.5 A@12 V or 0.5 A@3.3 V power supply and that the auxiliary power connector provide an 18.75 A@12 V power supply.

The pin definition of the auxiliary power connector is as follows.

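No. | Signal Definition | Description
1 | GND | Grounded
2 | GND | Grounded
3 | GND | Grounded
4 | GND | Grounded
5 | 12 V | 12 V power cable
6 | 12 V | 12 V power cable
7 | 12 V | 12 V power cable
8 | 12 V | 12 V power cable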

3.6 Heat Dissipation Specifications

3.6.1 Requirements

The Atlas 300T training card (model 9000) is used in an active heat dissipation environment with fans. It supports bidirectional air intake and air exhaust. The air volume must meet the heat dissipation requirements listed in Table 3-3.

Mean Temperature at the Air Intake Vent (°C) | Minimum Air Volume Required at the Air Intake Vent (CFM) | Pressure Drop (Pa)
25 | 15 | 68
30 | 16 | 178
35 | 19 | 225
40 | 23 | 279
45 | 29 | 341

[NOTE]

  • The ambient temperature at the heat sink inlet refers to the mean temperature at the air intake vent.
  • The required air volume is a recommended value. The air volume and temperature provided by different systems for the Atlas 300T training card (model 9000) may be different. Determine the air volume and temperature based on the actual system.
  • When the Atlas 300T training card (model 9000) is powered on, the minimum air volume required for heat dissipation is 5.0 CFM.
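
For system designers, the air volume requirement in Table 3-3 can be turned into a simple lookup. The sketch below takes the temperature and air volume values as read from Table 3-3 and, as a conservative assumption, rounds an intermediate intake temperature up to the next listed row instead of interpolating. The function name is illustrative, and the values remain recommended values only.

```python
from bisect import bisect_left

# (mean air intake temperature in °C, minimum air volume in CFM), as read from Table 3-3
AIRFLOW_TABLE = [(25, 15), (30, 16), (35, 19), (40, 23), (45, 29)]


def required_airflow_cfm(intake_temp_c: float) -> int:
    """Minimum air volume (CFM) for the given mean air intake temperature.

    Assumption: temperatures between rows use the next higher row.
    """
    temps = [row[0] for row in AIRFLOW_TABLE]
    if intake_temp_c > temps[-1]:
        raise ValueError("intake temperature exceeds the 45°C limit")
    index = bisect_left(temps, intake_temp_c)  # first row at or above the temperature
    return AIRFLOW_TABLE[index][1]


print(required_airflow_cfm(32))  # 19
```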

3.6.2 Specifications

The air intake temperature supported by the Atlas 300T training card (model 9000) ranges from 5°C to 45°C. There is a temperature monitoring point inside the card. The Ascend 910 and storage chip can be monitored in real time in both in-band and out-of-band modes to ensure that the card temperature is lower than the specified threshold. See Table 3-4.

Specifications | Ascend 910 AI Core Temperature (°C) | Ascend 910 HBM Temperature (°C)
Power-off temperature | 115 | 105
Underclocking temperature | 105 | 95
Long-term operating temperature | ≤ 105 | ≤ 95
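
A monitoring script could map a temperature reading to the thresholds in Table 3-4 along the following lines. This is a minimal sketch: how the readings are obtained (in-band or out-of-band) is outside its scope, the names are illustrative, and treating a reading exactly at a threshold as having reached it is an assumption, since the table does not define the boundary behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    underclock_c: float
    power_off_c: float


# Values from Table 3-4.
AI_CORE = Thresholds(underclock_c=105, power_off_c=115)
HBM = Thresholds(underclock_c=95, power_off_c=105)


def thermal_state(temp_c: float, limits: Thresholds) -> str:
    """Classify a reading; thresholds are treated as inclusive (assumption)."""
    if temp_c >= limits.power_off_c:
        return "power-off"
    if temp_c >= limits.underclock_c:
        return "underclocking"
    return "normal"


print(thermal_state(90, AI_CORE))   # normal
print(thermal_state(97, HBM))       # underclocking
print(thermal_state(116, AI_CORE))  # power-off
```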

4 Hardware Compatibility

The Atlas 300T training card (model 9000) is compatible with the Atlas 800 inference server (model 3000) and the Atlas 800 inference server (model 3010).

5 Maintenance and Management

The Atlas 300T training card (model 9000) provides various maintenance and management functions, including in-band management command sets running in the OS and out-of-band management functions provided by the iBMC.

5.1 In-Band Management

  • Online upgrade: The firmware is upgraded to facilitate device maintenance.
  • Device management: allows users to obtain device status information such as the temperature, voltage, and power consumption.
  • Command line management: allows users to perform routine device management by using various commands.
  • Asset management: Information, such as serial numbers, is provided to facilitate asset management. For details about how to manage assets, see the Atlas 300T Training Card npu-smi Command Reference (Model 9000).

5.2 Out-of-Band Management

The Atlas 300T training card (model 9000) provides the SMBus interface to support the out-of-band management of servers. The iBMC provides the out-of-band management function and asset information, and monitors the temperature, voltage, real-time power consumption, and chip status of the Atlas 300T training card (model 9000). In addition, the iBMC can manage alarms of the Atlas 300T training card (model 9000).

  • For details about the out-of-band management functions of the Atlas 300T training card (model 9000), see the iBMC User Guide of the server you use.
  • For details about alarms of the Atlas 300T training card (model 9000), see the iBMC Alarm Handling of the server you use.

6 Certifications

No. | Country/Region | Certification | Standard
1 | Europe | CE | Safety:
  • EN 62368-1:2014+A11:2017
  • EN 60950-1:2006+A11:2009+A1:2010+A12:2011+A2:2013
EMC:
  • EN 55032:2015
  • EN 55032:2015/A11:2020
  • EN 55024:2010
  • EN 55024:2010+A1:2015
  • EN 55035:2017
  • EN 55035:2017/A11:2020
  • ETSI EN 300 386 V1.6.1:2012
  • ETSI EN 300 386 V2.1.1:2016
  • EN 61000-3-2:2014
  • EN IEC 61000-3-2:2019
  • EN IEC 61000-3-2:2019/A1:2021
  • EN 61000-3-3:2013
  • EN 61000-3-3:2013/A1:2019
RoHS:
  • EN IEC 63000:2018
2 | Australia/New Zealand | RCM EMC |
  • EN 55032:2015
  • EN 55032:2015/A11:2020
  • CISPR 32:2015
  • CISPR 32:2015/AMD1:2019
  • EN 55024:2010
  • EN 55024:2010+A1:2015
  • EN 55035:2017
  • EN 55035:2017/A11:2020
  • CISPR 35:2016
  • ETSI EN 300 386 V1.6.1:2012
  • ETSI EN 300 386 V2.1.1:2016
  • VCCI-CISPR 32:2016
  • AS/NZS CISPR 32:2015+A1:2020*
  • IEC 61000-3-2:2014
  • IEC 61000-3-2:2018
  • IEC 61000-3-2:2018+AMD1:2020
  • EN 61000-3-2:2014
  • EN IEC 61000-3-2:2019
  • EN IEC 61000-3-2:2019/A1:2021*
  • IEC 61000-3-3:2013
  • IEC 61000-3-3:2017
  • EN 61000-3-3:2013
  • EN 61000-3-3:2013/A1:2019
3 | USA | FCC EMC | FCC CFR47 Part 15 Subpart B
4 | Canada | ICES EMC |
  • ICES-003 Issue 7: 2020
  • ICES Gen Issue 1: 2018
5 | UK | UKCA | Safety:
  • BS EN 62368-1:2014+A11:2017
EMC:
  • EN 55032:2015
  • EN 55032:2015/A11:2020
  • EN 55024:2010
  • EN 55024:2010+A1:2015
  • EN 55035:2017
  • EN 55035:2017/A11:2020
  • ETSI EN 300 386 V1.6.1:2012
  • ETSI EN 300 386 V2.1.1:2016
  • EN 61000-3-2:2014
  • EN IEC 61000-3-2:2019
  • EN IEC 61000-3-2:2019/A1:2021
  • EN 61000-3-3:2013
  • EN 61000-3-3:2013/A1:2019
RoHS:
  • BS EN IEC 63000:2018
6 | Europe | RoHS | EN IEC 63000:2018 & BS EN IEC 63000:2018
7 | Europe | WEEE | 2012/19/EU
8 | | Commodity Inspection | Refer to the product certification certificate.

7 Warranty

For details, see Maintenance & Warranty.

A Acronyms and Abbreviations

AI | Artificial Intelligence
iBMC | Intelligent Baseboard Management Controller
CFM | Cubic Feet per Minute
ECC | Error Checking and Correcting
OS | Operating System
PCIe | Peripheral Component Interconnect Express
SMBus | System Management Bus
