Arista - Broadcom AI Networking Solution Brief
Solution Highlights
- Open Standards: Arista and Broadcom support open, standards-based Ethernet and IP technologies for maximum choice, flexibility, and performance.
- End-to-end Performance Optimized RoCE: Partnership ensures best-in-class RoCE performance with various congestion control mechanisms.
- Power Efficient NICs, Switches, and Interconnects: Low-power, high-performance components reduce infrastructure TCO and leave more of the power budget for accelerators.
- End-to-end Telemetry and Network Management: CloudVision integrates NIC and Switch management for automated configuration and management of RoCE and congestion control.
- Simple Configuration: RoCE deployment is simplified with an end-to-end performance optimized baseline configuration.
- Increased Reliability: Lower power draw and reduced thermal stress improve MTBF, which supports network reliability and availability for accelerators.
Overview
The rapid advancement of AI necessitates AI data centers that deliver optimal performance for AI workloads, requiring maximized network bandwidth and minimized latency. Broadcom and Arista have collaborated to address these requirements by providing high-performance network hardware and fine-tuning key parameters for end-to-end 400G and 800G AI network solutions. This collaboration focuses on optimizing network performance and TCO with efficient, highly reliable designs implementing robust congestion control, load balancing, and telemetry.
The diagram shows a tiered leaf-spine (plane-based) design built from Arista switches: four switches in the top tier and eight in the bottom tier, with numerous links between the two tiers, illustrating a high-density, interconnected network architecture.
Arista's portfolio of fixed and modular systems enables clusters of tens of thousands of accelerators, utilizing efficient 2-tier topologies to minimize complexity and costs while maximizing performance and reliability. Arista's commitment to open standards provides customers with maximum choice in accelerators, NICs, and storage, supporting common cluster deployment configurations like Clos topologies.
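To make the 2-tier scaling argument concrete, the following minimal sketch estimates how many endpoints a leaf-spine fabric can attach. It assumes identical switches, one link from every leaf to every spine, and one port per accelerator NIC. The function name, the `radix` values, and the `oversubscription` parameter are illustrative assumptions, not Arista sizing guidance.

```python
def two_tier_capacity(radix: int, oversubscription: float = 1.0) -> dict:
    """Estimate the size of a 2-tier leaf-spine fabric built from identical switches.

    radix            -- ports per switch (hypothetical value, not a product spec)
    oversubscription -- ratio of leaf downlink ports to leaf uplink ports
                        (1.0 means as much uplink as downlink capacity)
    """
    if radix < 2 or oversubscription <= 0:
        raise ValueError("radix must be >= 2 and oversubscription must be > 0")

    # Split each leaf's ports between uplinks (to spines) and downlinks (to NICs).
    uplinks = max(1, int(radix / (1 + oversubscription)))
    downlinks = radix - uplinks

    # One uplink per spine, so the uplink count fixes the number of spines.
    spines = uplinks
    # Every spine port attaches one leaf, so the spine radix caps the leaf count.
    leaves = radix

    return {
        "spines": spines,
        "leaves": leaves,
        "downlinks_per_leaf": downlinks,
        "endpoints": leaves * downlinks,
    }


for radix in (64, 128, 512):
    print(radix, two_tier_capacity(radix))
```

With equal uplink and downlink capacity the endpoint count grows as radix squared divided by two, which is why higher-radix systems are what let two tiers reach very large cluster sizes.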
Features
To meet the high-performance requirements of AI/ML cluster back-end networks, advanced features are essential:
- RoCEv2: The standard protocol for RDMA over Converged Ethernet, enabling direct memory access over a network for high-bandwidth and low-latency performance.
- Data Center Quantized Congestion Notification (DCQCN): An end-to-end congestion control mechanism for RoCE that enables NICs and switches to detect and respond to network congestion.
- Priority Flow Control (PFC): Utilized by RoCE to establish a lossless network by pausing specific traffic classes during congestion, preventing packet loss.
- Equal-Cost Multi-Path (ECMP): Spreads traffic across multiple equal-cost paths in a multi-hop network by hashing packet header fields, so that flows are distributed in a balanced manner and hot spots are less likely.
- Programmable UDP Source Port: Allows the UDP source port to be set at the Queue Pair level. Because ECMP hashing includes the source port, changing it steers a flow onto a different path, which helps avoid congestion and improves load balancing.
- Dynamic Load Balancing (DLB): Also known as Adaptive Routing, this feature dynamically alleviates congestion by rerouting traffic to paths with minimal load.
- Cluster Load Balancing (CLB): Provides an application-aware approach for large AI training workloads, monitoring demands to ensure optimal uplink utilization and prevent oversubscription.
- Programmable Congestion Control (PCC): Facilitates custom Congestion Control (CC) algorithms to meet specific performance optimization objectives, offering flexibility and tuning.
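The interaction between ECMP hashing and a programmable UDP source port can be sketched as follows. This is a simplified illustration only: real switches use vendor-specific hash functions and field selections, so CRC32 over the 5-tuple is a stand-in, and the function names and addresses are hypothetical. The one fixed value is UDP destination port 4791, which is the port assigned to RoCEv2.

```python
import ipaddress
import struct
import zlib

ROCEV2_UDP_DPORT = 4791  # UDP destination port assigned to RoCEv2
UDP_PROTO = 17           # IP protocol number for UDP


def ecmp_path(src_ip: str, dst_ip: str, sport: int, n_paths: int,
              dport: int = ROCEV2_UDP_DPORT) -> int:
    """Map a flow's 5-tuple to one of n_paths uplinks (CRC32 is a stand-in hash)."""
    key = (
        ipaddress.ip_address(src_ip).packed
        + ipaddress.ip_address(dst_ip).packed
        + struct.pack("!BHH", UDP_PROTO, sport, dport)
    )
    return zlib.crc32(key) % n_paths


def sport_for_path(src_ip: str, dst_ip: str, target_path: int, n_paths: int,
                   ports=range(49152, 65536)):
    """Search for a UDP source port that makes the flow hash onto target_path."""
    for sport in ports:
        if ecmp_path(src_ip, dst_ip, sport, n_paths) == target_path:
            return sport
    return None  # no port in the searched range maps to the requested path


# A Queue Pair keeps the same path for as long as its 5-tuple is unchanged.
current = ecmp_path("10.0.0.1", "10.0.1.1", 50000, n_paths=4)

# Picking a different source port moves the same Queue Pair to another path.
wanted = (current + 1) % 4
new_sport = sport_for_path("10.0.0.1", "10.0.1.1", wanted, n_paths=4)
print(f"path {current} -> path {wanted} using source port {new_sport}")
```

Since the destination port is fixed for RoCEv2 and a pair of hosts shares the same IP addresses across all of its Queue Pairs, the source port is the main field left to vary, which is why per-Queue-Pair control over it matters for load balancing.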
Arista EOS (Extensible Operating System) and Datacenter Switches
Arista EOS delivers a high-bandwidth, low-latency, lossless network scalable to support thousands of XPUs at speeds from 100G to 800G. It enables a premium lossless network through traffic management, adjustable buffer allocation, and support for PFC and DCQCN for RoCE deployments. Arista's DLB and CLB features maximize network forwarding efficiency by minimizing congestion. The Latency Analyzer (LANZ) feature monitors interface congestion and queuing latency, simplifying the configuration of PFC and ECN thresholds for optimal buffer utilization.
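As a rough illustration of what tuning ECN thresholds involves, the sketch below models the RED-style marking curve commonly described for DCQCN-capable switches: no marking below a minimum queue depth, a linear ramp up to a maximum probability, and marking of every packet beyond the maximum depth. All names and numeric values are hypothetical, the model collapses ingress and egress accounting into a single queue, and it does not represent EOS configuration syntax or defaults.

```python
def ecn_mark_probability(queue_kb: float, k_min_kb: float,
                         k_max_kb: float, p_max: float) -> float:
    """Probability that a packet is ECN-marked at a given queue depth."""
    if queue_kb <= k_min_kb:
        return 0.0  # shallow queue: never mark
    if queue_kb > k_max_kb:
        return 1.0  # deep queue: mark every packet
    # Linear ramp from 0 at k_min_kb up to p_max at k_max_kb.
    return p_max * (queue_kb - k_min_kb) / (k_max_kb - k_min_kb)


def thresholds_are_sane(k_min_kb: float, k_max_kb: float,
                        pfc_xoff_kb: float) -> bool:
    """Check that ECN starts marking before the PFC pause threshold is reached.

    Simplification: treats ECN and PFC thresholds as depths of the same queue,
    so that senders get a chance to slow down before PFC has to pause them.
    """
    return k_min_kb < k_max_kb and k_min_kb < pfc_xoff_kb


# Hypothetical values chosen only to exercise the functions.
for depth in (100, 600, 1000, 1500):
    print(depth, ecn_mark_probability(depth, k_min_kb=200, k_max_kb=1000, p_max=0.1))
print(thresholds_are_sane(k_min_kb=200, k_max_kb=1000, pfc_xoff_kb=1200))
```

Queue-depth measurements of the kind LANZ reports are the natural input for choosing such thresholds: values set too low throttle senders unnecessarily, while values set too high let PFC fire before congestion control has reacted.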
Arista Portfolio
Arista Portfolio | Product Description |
---|---|
7060X5 and 7060X6 | High Density 400G and 800G Fixed Switch Portfolio for AI and DC |
7280R and 7800R | High Performance 400G and 800G Dynamic Deep Buffer Platforms |
7700R4 | 800G Distributed Etherlink Switch for Accelerated Computing |
Broadcom Ethernet NIC Adapters
Broadcom offers a portfolio of Ethernet NIC Adapters with speeds from 1 Gbps to 400 Gbps, delivering high throughput, CPU efficiency, and low workload latency for Ethernet/IP and RoCE traffic. The latest AI-focused adapters, based on the BCM576xx (Thor2) ASIC, support 400GE, 200GE, 100GE, and 25GE in OCP and PCIe form factors. These NICs are optimized for AI applications and support key features like RoCEv2, DCQCN, PFC, and PCC. They are also the lowest-power 400G interfaces available, reducing power and cooling demands. Broadcom's AI NICs support diverse cabling options impacting network power, cost, and reliability.
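The sender-side behavior that DCQCN asks of a NIC can be sketched in a few lines. This follows the rate-decrease and fast-recovery rules from the published DCQCN description in a heavily simplified form (no timers, byte counters, or additive and hyper increase stages). It is not Broadcom's implementation, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DcqcnSender:
    """Simplified DCQCN reaction point for a single flow (rates in Gbps)."""
    line_rate_gbps: float
    g: float = 1 / 256            # gain used when updating alpha
    alpha: float = 1.0            # running estimate of congestion severity
    current: float = field(init=False)
    target: float = field(init=False)

    def __post_init__(self) -> None:
        # A new flow starts at line rate.
        self.current = self.line_rate_gbps
        self.target = self.line_rate_gbps

    def on_cnp(self) -> None:
        """A Congestion Notification Packet arrived: cut the sending rate."""
        self.target = self.current               # remember the rate to recover to
        self.current *= 1 - self.alpha / 2       # multiplicative decrease
        self.alpha = (1 - self.g) * self.alpha + self.g

    def on_quiet_period(self) -> None:
        """No CNP arrived during the alpha update window: decay alpha."""
        self.alpha *= 1 - self.g

    def on_fast_recovery(self) -> None:
        """Move halfway back toward the rate used before the last cut."""
        self.current = (self.target + self.current) / 2


flow = DcqcnSender(line_rate_gbps=400.0)
flow.on_cnp()
print("after CNP:", flow.current)
flow.on_fast_recovery()
print("after one recovery step:", flow.current)
```

Programmable Congestion Control matters in this context because it allows rules of this kind to be replaced or retuned for a specific workload rather than being fixed in the adapter.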
Broadcom NIC Adapters
OCP3.0 NIC Adapters
Part Number | ASIC | Ports | I/O |
---|---|---|---|
BCM957608-N1400GD | BCM57608 | 1x 400G | QSFP112-DD |
BCM957608-N2200G | BCM57608 | 2x 200G | QSFP112 |
PCIe NIC Adapters
Part Number | ASIC | Ports | I/O |
---|---|---|---|
BCM957608-P1400GD | BCM57608 | 1x 400G | QSFP112-DD |
BCM957608-P2200G | BCM57608 | 2x 200G | QSFP112 |
Cabling and Interconnects
Selecting the appropriate cabling is crucial for data center reliability, power usage, cooling, and cost. Broadcom and Arista offer pre-qualified cabling options for seamless integration:
Cable | Distance | Power | Reliability | Cost | MPN |
---|---|---|---|---|---|
Copper Cable (DAC) | 5 m | Low | High | Low | Amphenol: DJERGN-0003 |
Active Electrical Cable (AEC) | 7 m | Medium | Medium | Medium | Credo: CAC82X321A2N-CO-HW |
VSR Optical Transceiver | 50 m | High | Low | High | Switch: Arista OSFP-800G-2VSR4; NIC: Eoptolink EOLQ-854HG-01-M |
DR Optical Transceiver | 500 m | High | Low | High | Switch: Arista OSFP-800G-2XDR4; NIC: Hisense LMQ3621S-PC1 |
DR Linear Pluggable Optic (LPO) | 500 m | Medium | Medium | Medium | Switch: Arista LPO-800G-2DR4; NIC: Eoptolink EOLQ-134HG-5H-MSL |
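The trade-offs in the table can be turned into a simple selection rule. The sketch below encodes the distance and power columns and picks the lowest-power option that covers a given link length, breaking ties in favor of the shorter-reach option. It assumes the Distance column is a maximum reach, and the ranking scheme and function name are illustrative rather than a vendor recommendation.

```python
# (cable type, assumed maximum reach in metres, relative power) from the table above
CABLES = [
    ("Copper Cable (DAC)", 5, "Low"),
    ("Active Electrical Cable (AEC)", 7, "Medium"),
    ("VSR Optical Transceiver", 50, "High"),
    ("DR Optical Transceiver", 500, "High"),
    ("DR Linear Pluggable Optic (LPO)", 500, "Medium"),
]

POWER_RANK = {"Low": 0, "Medium": 1, "High": 2}


def pick_cable(link_length_m: float):
    """Return the lowest-power cable type that reaches link_length_m, or None."""
    candidates = [c for c in CABLES if c[1] >= link_length_m]
    if not candidates:
        return None  # longer than anything listed in the table
    # Prefer lower power first, then the shortest sufficient reach.
    name, _reach, _power = min(candidates, key=lambda c: (POWER_RANK[c[2]], c[1]))
    return name


for length in (3, 6, 30, 300, 600):
    print(f"{length} m -> {pick_cable(length)}")
```

In practice cost and reliability would enter the ranking as well. In this table they track power for every row, so the choice does not change here.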
Arista CloudVision
Arista CloudVision is a multi-domain management platform simplifying network operations using cloud networking principles. Built on Arista's Network Data Lake (NetDL) architecture, it aggregates enterprise data and uses AI/ML for analysis, insights, updates, and alerts, including predictive insights from Arista Autonomous Virtual Assist (Arista AVA).
Arista AI Agent
The AI Agent integrates NICs with Arista's EOS network operating system to manage and monitor switches and NIC connections and to debug server-level issues. Together, the AI Agent and CloudVision provide a unified view of network and server statistics, which makes troubleshooting more efficient by correlating network events with server-side issues. CloudVision provides real-time insights, updates, and alerts, and uses AI/ML to identify anomalies.
Summary
Arista and Broadcom are committed to meeting the evolving requirements of AI applications and future workloads. Their partnership delivers a robust, pre-configured solution for a highly scalable 400G or 800G end-to-end optimized network. The solution prioritizes power-efficient and reliable NICs, switches, and interconnects to maximize network availability and accelerator efficiency. This rigorously tested and validated solution ensures rapid deployment for AI workloads.
Contact Information
Headquarters
5453 Great America Parkway
Santa Clara, California 95054
408-547-5500
Support
support@arista.com
408-547-5502
866-476-0000
Sales
sales@arista.com
408-547-5501
866-497-0000