Lenovo Hybrid AI 285 with Cisco Networking

Product Guide

This document is meant to be used in tandem with the Hybrid AI 285 platform guide. It describes the hardware architecture changes required to leverage Cisco networking hardware and the Cisco Nexus Dashboard within the Hybrid AI 285 Platform.

Lenovo Hybrid AI 285 is a platform that enables enterprises of all sizes to quickly deploy hybrid AI factory infrastructure, supporting Enterprise AI use cases as either a new, greenfield environment or an extension of their existing IT infrastructure. This document frequently refers the reader to the base Hybrid AI 285 platform guide, as its main purpose is to describe the differences required to implement Cisco networking and the Cisco Nexus Dashboard.

The offering is based on the NVIDIA 2-8-5 PCIe-optimized configuration — 2x CPUs, 8x GPUs, and 5x network adapters — and is ideally suited for medium (per GPU) to large (per node) inference use cases, and small-to-large model training or fine-tuning, depending on the chosen scale. It combines market-leading Lenovo ThinkSystem GPU-rich servers featuring NVIDIA Hopper or Blackwell GPUs with Cisco networking, and enables the use of the NVIDIA AI Enterprise software stack with NVIDIA Blueprints.

Figure 1. Lenovo Hybrid AI 285 platform overview with Cisco. This diagram shows the components of the platform including operating system options (Ubuntu, Red Hat Enterprise Linux), AI software components (NVIDIA AI Enterprise, NVIDIA Omniverse Enterprise), orchestration and system management (Kubernetes, Red Hat OpenShift), compute nodes (Lenovo ThinkSystem SR635 V3, SR675 V3), networking switches (Cisco 9300 Series), storage (Lenovo DM & DG Series), and NVIDIA certified storage. It highlights AI scalable units and faster time to value.

Did you know?

The same team of HPC and AI experts that created the Lenovo EveryScale OVX solution, as deployed for NVIDIA Omniverse Cloud, brings the Lenovo Hybrid AI 285 with Cisco networking to market.

Following their excellent experience with Lenovo on Omniverse, NVIDIA has once again chosen Lenovo technology as the foundation for the development and test of their NVIDIA AI Enterprise Reference Architecture (ERA).

Overview

The Lenovo Hybrid AI 285 Platform with Cisco Networking scales from a Starter Kit environment with 4 to 32 PCIe GPUs to a Scalable Unit (SU) deployment with four servers and 32 GPUs per SU, up to a maximum of three Scalable Units with 12 servers and 96 GPUs. See the figure below for a sizing overview.

Figure 2. Lenovo Hybrid AI 285 with Cisco Networking scaling from Starter Kit to 96 GPUs. This diagram illustrates the scalability from AI Starter Kits (4-32 GPU) to Scalable Units (SU). It shows the components of a Starter Kit and the progression to First SU, Second SU, and Third SU, indicating networking switches, SR675 V3 AI Compute Nodes, SR635 V3 Control Nodes, and PCIe NVIDIA GPUs.
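The scaling rules above can be captured in a small sizing helper (an illustrative sketch only; the function and constants are ours, not part of any Lenovo tooling):

```python
# Sizing sketch for the Hybrid AI 285 with Cisco Networking scaling rules.
GPUS_PER_NODE = 8   # NVIDIA 2-8-5: 8 PCIe GPUs per SR675 V3
NODES_PER_SU = 4    # one Scalable Unit = 4 AI Compute Nodes
MAX_SUS = 3         # this platform variant scales to 3 SUs

def gpus_for(scalable_units: int) -> int:
    """Total GPU count for a deployment of the given number of SUs."""
    if not 1 <= scalable_units <= MAX_SUS:
        raise ValueError(f"supported range is 1 to {MAX_SUS} Scalable Units")
    return scalable_units * NODES_PER_SU * GPUS_PER_NODE

print([gpus_for(n) for n in range(1, MAX_SUS + 1)])  # [32, 64, 96]
```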

The figure below shows the networking architecture of the platform deployed with 96 GPUs.

Figure 3. Lenovo Hybrid AI 285 with Cisco Networking platform with 3 Scalable Units. This diagram depicts the networking architecture for a 3 SU deployment. It shows storage, enterprise connections, Cisco Nexus 9364D-GX2A switches, and SR675 V3 AI Compute Nodes across three Scalable Units, with connections indicating speeds like 400G, 200G, 100G, and 1G.

Components

The main hardware components of Lenovo Hybrid AI platforms are Compute nodes and the Networking infrastructure. As an integrated solution they can come together in either a Lenovo EveryScale Rack (Machine Type 1410) or Lenovo EveryScale Client Site Integration Kit (Machine Type 7X74).


AI Compute Node – SR675 V3

The AI Compute Node leverages the Lenovo ThinkSystem SR675 V3 GPU-rich server.

Figure 4. Lenovo ThinkSystem SR675 V3 in 8DW PCIe Setup. This is an image of the Lenovo ThinkSystem SR675 V3 server chassis, showing its front panel with drive bays and other components.

The SR675 V3 is a 2-socket 5th Gen AMD EPYC 9005 server supporting up to 8 PCIe DW GPUs with up to 5 network adapters in a 3U rack server chassis. This makes it the ideal choice for NVIDIA's 2-8-5 configuration requirement.

Figure 5. Lenovo ThinkSystem SR675 V3 in 8DW PCIe Setup. This is a top-down view of the internal components of the Lenovo ThinkSystem SR675 V3 server, highlighting the placement of CPUs, memory, and PCIe slots.
Figure 6. AI Compute Node Block Diagram. This diagram illustrates the internal architecture of the SR675 V3 configured as an AI compute node. It shows dual AMD EPYC processors, DDR5 memory, PCIe Gen5 switches, and connections to multiple NVIDIA GPUs.

The AI Compute node is configured with two AMD EPYC 9535 64-core 2.4 GHz processors with an all-core boost frequency of 3.5 GHz. Besides consistently providing more than 2 GHz of clock frequency, this ensures that with 7 Multi-Instance GPU (MIG) partitions on each of the 8 physical GPUs, there are 2 cores available per MIG instance, plus additional cores for the operating system and other operations.
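The core budget works out as a quick arithmetic check:

```python
# Core budget for the dual AMD EPYC 9535 configuration described above.
total_cores = 2 * 64                                # two 64-core processors
mig_instances = 8 * 7                               # 7 MIG partitions on each of 8 GPUs
reserved_cores = total_cores - mig_instances * 2    # 2 cores per MIG instance
print(total_cores, mig_instances, reserved_cores)   # 128 56 16
```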

With 12 memory channels per processor socket, the AMD-based server provides superior memory bandwidth versus competing Intel-based platforms, ensuring the highest performance. It leverages twenty-four 64GB 6400MHz memory DIMMs for a total of 1.5TB of main memory, providing 192GB of system memory per GPU, or roughly 1.4 times the 141GB memory of an H200 NVL GPU.
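The memory arithmetic can be verified the same way (the 141 GB H200 NVL capacity is taken from the configuration tables later in this guide):

```python
# Memory arithmetic for the AI Compute node described above.
dimms, dimm_gb = 24, 64        # 12 channels x 2 sockets, one 64 GB DIMM each
total_gb = dimms * dimm_gb     # 1536 GB, i.e. 1.5 TB
per_gpu_gb = total_gb / 8      # system memory per GPU
ratio = per_gpu_gb / 141       # vs. the 141 GB of an H200 NVL
print(total_gb, per_gpu_gb, round(ratio, 2))  # 1536 192.0 1.36
```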

The GPUs are connected to the CPUs via two PCIe Gen5 switches, each supporting up to four GPUs. With the NVIDIA H200 NVL PCIe GPU, the four GPUs are additionally interconnected through an NVLink bridge, creating a unified memory space. In an entry configuration with two GPUs per PCIe switch, the ThinkSystem SR675 V3 uniquely supports connecting all four GPUs with an NVLink bridge for maximized shared memory, thereby accommodating larger inference models, rather than limiting the bridge to two GPUs. With the RTX PRO 6000 Blackwell Server Edition, no NVLink bridge is applicable; the same applies to configurations with the L40S.
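The resulting shared-memory pool is easy to quantify (per the 141 GB H200 NVL capacity listed in the configuration tables):

```python
# Shared-memory arithmetic for a 4-way NVLink bridge of H200 NVL GPUs.
gpus_per_bridge, hbm_gb = 4, 141
pooled_gb = gpus_per_bridge * hbm_gb   # unified memory space across the bridge
print(pooled_gb)  # 564
```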

Key difference from the base platform: For the 2-8-5 architecture with Cisco networking, the AI Compute node leverages NVIDIA ConnectX-7 adapters for both the East-West and the North-South communication. This is a key difference in the AI Compute node configuration compared to the base 285 platform with NVIDIA networking. It is done primarily because, as of now, Cisco switching does not work with the dynamic load balancing technology within NVIDIA Spectrum-X, which is leveraged by the BlueField-3 cards. This will change in future updates of this document as that technology is onboarded by Cisco and Lenovo brings in the new Silicon One 800GbE switch.

The Ethernet adapters for the Compute (East-West) Network are directly connected to the GPUs via PCIe switches, minimizing latency and enabling NVIDIA GPUDirect and GPUDirect Storage operation. For pure inference workloads they are optional, but for training and fine-tuning operation they should provide at least 200Gb/s per GPU.
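As a rough check of that guideline, assuming four of the five dual-port CX-7 NDR200 adapters serve the Compute fabric (the adapter split is our assumption; the exact allocation is deployment-specific):

```python
# Back-of-envelope East-West bandwidth check for one AI Compute node.
gpus, per_gpu_gbps = 8, 200
required_gbps = gpus * per_gpu_gbps        # 1600 Gb/s per node
compute_ports, port_gbps = 4 * 2, 200      # 4 adapters x 2 ports, 200 Gb/s each
available_gbps = compute_ports * port_gbps
assert available_gbps >= required_gbps
print(required_gbps, available_gbps)  # 1600 1600
```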

Finally, the system is completed by local storage with two 960GB Read Intensive M.2 in RAID1 configuration for the operating system and four 3.84TB Read Intensive E3.S drives for local application data.

GPU selection

The Hybrid AI 285 platform is designed to handle any of NVIDIA's DW PCIe form factor GPUs including the new RTX PRO 6000 Blackwell Server Edition, the H200 NVL, L40S and the H100 NVL.

Configuration

AI Compute Node Configuration

The following table lists the configuration of the AI Compute Node with H200 NVL GPUs.

Table 1. AI Compute Node
Part Number | Product description | Quantity per system
7D9RCTOLWW | ThinkSystem SR675 V3 | 1
BR7F | ThinkSystem SR675 V3 8DW PCIe GPU Base | 1
C3EF | ThinkSystem SR675 V3 System Board v2 | 1
C2AL | ThinkSystem AMD EPYC 9535 64C 300W 2.4GHz Processor | 2
C0CK | ThinkSystem 64GB TruDDR5 6400MHz (2Rx4) RDIMM-A | 24
BR7S | ThinkSystem SR675 V3 Switched 4x16 PCIe DW GPU Direct RDMA Riser | 2
C3V3 | ThinkSystem NVIDIA H200 NVL 141GB PCIe GPU Gen5 Passive GPU | 8
C3V0 | ThinkSystem NVIDIA 4-way bridge for H200 NVL | 2
BR7H | ThinkSystem SR675 V3 2x16 PCIe Front IO Riser | 1
C2RK | ThinkSystem SR675 V3 2 x16 Switch Cabled PCIe Rear IO Riser | 2
BQBN | ThinkSystem NVIDIA ConnectX-7 NDR200/200GbE QSFP112 2-port PCIe Gen5 x16 Adapter | 5
BM8X | ThinkSystem M.2 SATA/x4 NVMe 2-Bay Adapter | 1
BT7P | ThinkSystem Raid 540-8i for M.2/7MM NVMe boot Enablement | 1
BXMH | ThinkSystem M.2 PM9A3 960GB Read Intensive NVMe PCIe 4.0 x4 NHS SSD | 2
BTMB | ThinkSystem 1x4 E3.S Backplane | 1
C1AB | ThinkSystem E3.S PM9D3a 3.84TB Read Intensive NVMe PCIe 5.0 x4 HS SSD | 2
BK1E | ThinkSystem SR670 V2/SR675 V3 OCP Enablement Kit | 1
C5WW | ThinkSystem SR675 V3 Dual Rotor System High Performance Fan | 5
BFD6 | ThinkSystem SR670 V2/SR675 V3 Power Mezzanine Board | 1
BE0D | N+1 Redundancy With Over-Subscription | 1
BKTJ | ThinkSystem 2600W 230V Titanium Hot-Swap Gen2 Power Supply | 4
6252 | 2.5m, 16A/100-250V, C19 to C20 Jumper Cord | 4
C3KA | ThinkSystem SR670 V2/SR675 V3 Heavy Systems Toolless Slide Rail Kit | 1
BFNU | ThinkSystem SR670 V2/SR675 V3 Intrusion Cable | 1
BR7U | ThinkSystem SR675 V3 Root of Trust Module | 1
BFTH | ThinkSystem SR670 V2/SR675 V3 Front Operator Panel ASM | 1
5PS7B09631 | 5Yr Premier NBD Resp + KYD SR675 V3 | 1

Service Nodes – SR635 V3

When deploying the Hybrid AI 285 platform in a sizing beyond 2 AI Compute nodes, additional service nodes are recommended to manage the overall AI cluster environment.

Key difference from the base platform: Because the architecture for this platform does not yet leverage Spectrum-X, there is no need for BlueField DPUs in the service nodes. For this reason, the customer can leverage the lower-cost SR635 V3 instead of the SR655 V3.

Two Management Nodes provide high availability for the system management and monitoring delivered through NVIDIA Base Command Manager (BCM), as described further in the AI Software Stack chapter.

For container operations, three Scheduling Nodes form the Kubernetes control plane, providing redundant operation and quorum capability.

Figure 7. Lenovo ThinkSystem SR635 V3. This is an image of the Lenovo ThinkSystem SR635 V3 server chassis, a 1U rack server.

The Lenovo ThinkSystem SR635 V3 is an optimal choice for a homogeneous host environment, featuring a single-socket AMD EPYC 9335 with 32 cores operating at a 3.0 GHz base frequency and an all-core boost frequency of 4.0GHz. The system is fully equipped with twelve 32GB 6400MHz memory DIMMs, two 960GB Read Intensive M.2 drives in RAID1 configuration for the operating system, and two 3.84TB Read Intensive U.2 drives for local data storage. Additionally, it includes an NVIDIA dual-port ConnectX-7 adapter to connect the Service Nodes to the Converged Network.

Configuration

The following table lists the configuration of the Service Nodes.

Table 2. Service Nodes
Part Number | Product description | Quantity per system
7D9GCTO1WW | Server : ThinkSystem SR635 V3 - 3yr Warranty | 1
BLK4 | ThinkSystem V3 1U 10x2.5" Chassis | 1
BVGL | Data Center Environment 30 Degree Celsius / 86 Degree Fahrenheit | 1
C2AQ | ThinkSystem AMD EPYC 9335 32C 210W 3.0GHz Processor | 1
BQ26 | ThinkSystem SR645 V3/SR635 V3 1U High Performance Heatsink | 1
C1PL | ThinkSystem 32GB TruDDR5 6400MHz (1Rx4) RDIMM-A | 12
BC4V | Non RAID NVMe | 1
C0ZU | ThinkSystem 2.5" U.2 VA 3.84TB Read Intensive NVMe PCIe 5.0 x4 HS SSD | 2
BPC9 | ThinkSystem 1U 4x 2.5" NVMe Gen 4 Backplane | 1
B5XJ | ThinkSystem M.2 SATA/NVMe 2-Bay Adapter | 1
BTTY | M.2 NVMe | 1
BKSR | ThinkSystem M.2 7450 PRO 960GB Read Intensive NVMe PCIe 4.0 x4 NHS SSD | 2
BQBN | ThinkSystem NVIDIA ConnectX-7 NDR200/200GbE QSFP112 2-port PCIe Gen5 x16 Adapter | 1
BLK7 | ThinkSystem SR635 V3/SR645 V3 x16 PCIe Gen5 Riser 1 | 1
BLK9 | ThinkSystem V3 1U MS LP+LP BF Riser Cage | 1
BNFG | ThinkSystem 750W 230V/115V Platinum Hot-Swap Gen2 Power Supply v3 | 2
BH9M | ThinkSystem V3 1U Performance Fan Option Kit v2 | 7
BLKD | ThinkSystem 1U V3 10x2.5" Media Bay w/ Ext. Diagnostics Port | 1
7Q01CTS2WW | 5Yr Premier NBD Resp + KYD SR635 V3 | 1

Cisco Networking

The default setup of the Lenovo Hybrid AI 285 platform leverages Cisco Networking with the Nexus 9364D-GX2A for the Converged and Compute Network and the Nexus 9300-FX3 for the Management Network.

Cisco Nexus 9300-GX2 Series Switches

The Cisco Nexus 9364D-GX2A is a 2-rack-unit (2RU) switch that supports 25.6 Tbps of bandwidth and 8.35 billion packets per second (bpps) across 64 fixed 400G QSFP-DD ports and 2 fixed 1/10G SFP+ ports. The QSFP-DD ports also support native 200G (QSFP56), 100G (QSFP28), and 40G (QSFP+). Each port can also support 4 x 10G, 4 x 25G, 4 x 50G, 4 x 100G, and 2 x 200G breakouts.

It supports flexible configurations, including 128 ports of 200GbE or 256 ports of 100/50/25/10GbE, accommodating diverse AI/ML cluster requirements.
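The port arithmetic behind these figures can be checked directly:

```python
# Port and bandwidth arithmetic for the Nexus 9364D-GX2A figures above.
qsfp_dd_ports = 64
assert qsfp_dd_ports * 400 == 25_600   # 64 x 400G = 25.6 Tbps aggregate
assert qsfp_dd_ports * 2 == 128        # 2 x 200G breakout -> 128 x 200GbE
assert qsfp_dd_ports * 4 == 256        # 4 x 100/50/25/10G breakout -> 256 ports
```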

Figure 8. Nexus 9364D-GX2A. This is an image of the Cisco Nexus 9364D-GX2A network switch, showing its front panel with numerous QSFP-DD ports.

The Converged (North-South) Network handles storage and in-band management, linking the Enterprise IT environment to the Agentic AI platform. Built on Ethernet with RDMA over Converged Ethernet (RoCE), it supports current and new cloud and storage services as outlined in the AI Compute node configuration.

In addition to providing access to the AI agents and functions of the AI platform, this connection is utilized for all data ingestion from the Enterprise IT data during indexing and embedding into the Retrieval-Augmented Generation (RAG) process. It is also used for data retrieval during AI operations.

The storage connectivity is exactly half that bandwidth and is described in the Storage Connectivity chapter.

The Compute (East-West) Network facilitates application communication between the GPUs across the Compute nodes of the AI platform. It is designed to achieve minimal latency and maximal performance using a rail-optimized, fully non-blocking fat tree topology with Cisco Nexus 9300 series switches.

Cisco Nexus 9000 Series data center switches deliver purpose-built networking solutions designed specifically to address the demands of AI workloads, providing the foundation for scalable, high-performance AI infrastructures that accelerate time-to-value while maintaining operational efficiency and security. Built on Cisco's custom Cloud Scale and Silicon One ASICs, these switches provide a comprehensive solution for AI-ready data centers.

Tip: In a pure Inference use case, the Compute Network is typically not necessary, but for training and fine-tuning operations it is a crucial component of the solution.

For configurations of up to five Scalable Units, the Compute and Converged Network are integrated utilizing the same switches. When deploying more than five units, it is necessary to separate the fabric.

The following table lists the configuration of the Cisco Nexus 9364D-GX2A.

Table 3. Cisco Nexus 9364D-GX2A configuration
Part Number | Description | Quantity
7DLKCTO1WW | Cisco Nexus 9300-GX2 Series Switch (N9K-C9364D-GX2A) | 2
C5P0 | Cisco Nexus 9300-GX2 Series Switch (N9K-C9364D-GX2A) | 2
C6FK | Mode selection between ACI and NXOS (MODE-NXOS) | 4
6252 | 2.5m, 16A/100-250V, C19 to C20 Jumper Cord | 2
C1P1 | TN9300XF2-5Y 5 Years (60 months) Cisco software Premier license | 2

Cisco Nexus 9300-FX3 Series Switch

The Cisco Nexus 93108TC-FX3P is a high-performance, fixed-port switch designed for modern data centers. It features 48 ports of 100M/1/2.5/5/10GBASE-T, providing flexible connectivity options for various network configurations. Additionally, it includes 6 uplink ports that support 40/100 Gigabit Ethernet QSFP28, ensuring high-speed data transfer and scalability.

Built on Cisco's CloudScale technology, the 93108TC-FX3P delivers exceptional performance with a bandwidth capacity of 2.16 Tbps and the ability to handle up to 1.2 billion packets per second (Bpps).

This switch also supports advanced features such as comprehensive security, telemetry, and automation capabilities, which are essential for efficient network management and troubleshooting.

Figure 9. Cisco Nexus 93108TC-FX3P. This is an image of the Cisco Nexus 93108TC-FX3P network switch, showing its front panel with numerous Ethernet and QSFP28 ports.

The Out-of-Band (Management) Network encompasses all AI Compute node baseboard management controllers (BMCs) as well as the network infrastructure management.

The following table lists the configuration of the Cisco Nexus 93108TC-FX3P.

Table 4. Cisco Nexus 93108TC-FX3P configuration
Part Number | Description | Quantity
7DL8CTO1WW | Cisco Nexus 9300-FX3 Series Switch (N9K-C93108TC-FX3) | 2
C5PB | Cisco Nexus 9300-GX2 Series Switch (N9K-C9364D-GX2A) | 2
C6FK | Mode selection between ACI and NXOS (MODE-NXOS) | 4
6252 | 2.5m, 16A/100-250V, C19 to C20 Jumper Cord | 2
C1P1 | TN9300XF-5Y 5 Years (60 months) Cisco software Premier license | 2

Cisco Nexus Dashboard

Cisco Nexus Dashboard, included with every Cisco Nexus 9000 switch tiered licensing purchase, serves as a centralized hub that unifies disparate network configurations and views from multiple switches and data centers. For AI/ML fabric operations, it acts as the command center, covering everything from the initial setup of AI/ML fabric automation to continuous fabric analytics within a few clicks.

Figure 10. AI/ML Fabric Workflow on Nexus Dashboard. This screenshot shows the Cisco Nexus Dashboard interface, illustrating the process of creating or onboarding an AI/ML fabric, with options for different fabric types like VXLAN and Routed.

Key capabilities such as congestion scoring, PFC/ECN statistics, and microburst detection empower organizations to proactively identify and address performance bottlenecks for their AI/ML backend infrastructure.

Figure 11. Congestion Score and Congestion Details on Nexus Dashboard. This screenshot displays network congestion metrics within the Cisco Nexus Dashboard, showing a congestion score over time and detailed metrics like WRED AFD Drops, PFC, and ECN.

Advanced features like anomaly detection, event correlation, and suggested remediation ensure networks are not only resilient but also self-healing, minimizing downtime and accelerating issue resolution.

Figure 12. Anomaly Detection on Nexus Dashboard. This screenshot shows the Cisco Nexus Dashboard's anomaly detection feature, highlighting a network congestion indication with details on anomaly level, category, and potential impact.

Purpose-built to handle the high demands of AI workloads, the Cisco Nexus Dashboard transforms network management into a seamless, data-driven experience, unlocking the full potential of AI/ML fabrics.

Lenovo EveryScale Solution

The Server and Networking components and Operating System can come together as a Lenovo EveryScale Solution. It is a framework for designing, manufacturing, integrating and delivering data center solutions, with a focus on High Performance Computing (HPC), Technical Computing, and Artificial Intelligence (AI) environments.

Lenovo EveryScale provides Best Recipe guides to warrant interoperability of hardware, software and firmware among a variety of Lenovo and third-party components.

Addressing specific needs in the data center, while also optimizing the solution design for application performance, requires a significant level of effort and expertise. Customers need to choose the right hardware and software components, solve interoperability challenges across multiple vendors, and determine optimal firmware levels across the entire solution to ensure operational excellence, maximize performance, and drive best total cost of ownership.

Lenovo EveryScale reduces this burden on the customer by pre-testing and validating a large selection of Lenovo and third-party components, to create a “Best Recipe” of components and firmware levels that work seamlessly together as a solution. From this testing, customers can be confident that such a best practice solution will run optimally for their workloads, tailored to the client's needs.

In addition to interoperability testing, Lenovo EveryScale hardware is pre-integrated, pre-cabled, pre-loaded with the best recipe and optionally an OS-image and tested at the rack level in manufacturing, to ensure a reliable delivery and minimize installation time in the customer data center.

Scalability and Deployment

Scalability

A fundamental principle of the solution design philosophy is its ability to support any scale necessary to achieve a particular objective.

In a typical Enterprise AI deployment, the AI environment is initially used for a single use case, for example an Enterprise RAG pipeline, which connects a Large Language Model (LLM) to Enterprise data for actionable insights grounded in relevant data.

In its simplest form, leveraging the NVIDIA Blueprint for Enterprise RAG pipeline involves three NVIDIA Inference Microservices: a Retriever, a Reranker, and the actual LLM. This setup requires a minimum of three GPUs.
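As a minimal illustration of the sizing implied above (the three microservices come from the text; the service identifiers and the one-GPU-per-service mapping are our own, not an NVIDIA deployment artifact):

```python
# Hypothetical service-to-GPU mapping for the minimal Enterprise RAG blueprint.
services = ["nemo-retriever-embedding", "nemo-retriever-reranking", "llm-nim"]
gpu_assignment = {svc: gpu for gpu, svc in enumerate(services)}
min_gpus = len(set(gpu_assignment.values()))
print(min_gpus)  # 3: one GPU per microservice, the stated minimum
```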

Figure 13. NVIDIA Blueprint Architecture Diagram. This diagram illustrates the architecture of an AI environment using NVIDIA Blueprint, showing components like User, LLM, Chain Server, Enterprise Data, NeMo Retriever Embedding, Vector Database (NVIDIA cuVS), and NeMo Retriever Reranking.

As the deployment of AI within the company continues to grow, the AI environment will be adapted to incorporate additional use cases, including Assistants or AI Agents. Additionally, it has the capacity to scale to support an increasing number of Microservices. Ultimately, most companies will maintain multiple AI environments operating simultaneously with their AI Agents working in unison.

The Lenovo Hybrid AI 285 with Cisco Networking platform has been designed to meet customers where they are in their AI journey and then scale seamlessly with them through their AI integration. This is achieved through the deployment options shown below:

Figure 14. Lenovo Hybrid AI 285 with Cisco Networking Scaling. This diagram visually represents the scaling options for the Lenovo Hybrid AI 285 platform, showing Starter Kits, First SU, Second SU, and Third SU configurations with their respective components.

Entry and AI Starter Kit Deployments

Entry deployment sizings are for customers who want to deploy their initial AI factory with 4-16x GPUs. Entry deployments have one or two SR675 V3 servers (AI Compute Nodes), with up to 8x GPUs per server. With two servers configured, the servers are connected directly via the installed NVIDIA ConnectX-7 adapters.

If additional networking or storage is required, use the AI Starter Kit deployment, which supports up to 4x servers and up to 32x GPUs. The AI Starter Kit uses Cisco networking switches and ThinkSystem DM or DG external storage.

The following sections describe these deployments:

Entry sizing

Entry sizing starts with a single AI Compute node, equipped with four GPUs. Such entry deployments are ideal for development, application trials, or small-scale use, reducing hardware costs, control plane overhead, and networking complexity. With all components on one node, management and maintenance are simplified.

Entry sizing can also support two AI Compute nodes, directly connected together without the need for external networking switches, scaling up to 16 GPUs if fully populated (2 nodes, 8x GPUs per node). The two nodes connect to the rest of the data center using your existing networking.

Figure 15. Entry Deployment Rack View. This diagram shows a rack layout for an 'Entry' deployment, indicating 1-2x SR675 V3 servers with 8x GPUs each.
Table 5. Entry sizing
Compute | 4-8x GPUs | 8-16x GPUs
Network adapters per server | Minimum ratio of 1 CX7 per 2 GPUs | Minimum ratio of 1 CX7 per 2 GPUs

AI Starter Kits

For customers who want storage and/or networking in the Entry sizing, Lenovo and Cisco worked together to develop AI Starter Kits with Cisco networking, which allow up to 32 GPUs across 4 nodes, slightly more than the base 285 Starter Kit sizing, which allows up to only 24 GPUs. This sizing is for customers who do not plan to scale above 32 GPUs in the near future but still need an end-to-end solution for compute and storage.

Networking between the nodes is implemented using the Cisco 9332D-GX2B 200GbE switches and NVIDIA ConnectX-7 dual-port 200Gb adapters in each server.
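A rough port-count check for the largest (32-GPU) Starter Kit configuration can be sketched as follows (the 2x200G breakout assumption is illustrative, not a validated cabling plan):

```python
# Port-count sketch for a fully populated 4-node AI Starter Kit.
nodes, adapters_per_node, ports_per_adapter = 4, 5, 2
needed_200g = nodes * adapters_per_node * ports_per_adapter   # 40 x 200G ports
switch_400g_ports = 32                                        # Nexus 9332D-GX2B
available_200g = switch_400g_ports * 2                        # with 2x200G breakout
assert needed_200g <= available_200g
print(needed_200g, available_200g)  # 40 64
```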

Storage is implemented using either ThinkSystem DM or ThinkSystem DG Storage Arrays.

The table and figure below show the hardware involved in various sizes of AI Starter Kit deployments.

Table 6. AI Starter Kit sizing with Cisco Networking
 | 4-8x GPUs | 8-16x GPUs | 32x GPUs
Compute | 1x SR675 V3 | 2x SR675 V3 | 4x SR675 V3
Storage | DG5200 | DM7200F | DM7200F
Network adapters per server | 5x CX-7 | 5x CX-7 | 5x CX-7
Networking | 9332D-GX2B, 93108TC-FX3P | 9332D-GX2B, 93108TC-FX3P | 9332D-GX2B, 93108TC-FX3P
Figure 16. AI Starter Kits. This diagram illustrates the components of AI Starter Kits, showing Networking, Storage, and AI Compute Nodes for different configurations (Non-HA and HA options).

Scalable Unit Deployment

For configurations beyond two nodes, it is advisable to deploy a full Scalable Unit along with the necessary network and service infrastructure, providing a foundation for further growth in enterprise use cases.

The first SU consists of up to four AI Compute nodes, a minimum of five service nodes, and networking switches. When additional AI Compute Nodes are required, further SUs of four AI Compute Nodes each can be added.

Figure 17. Scalable Unit Deployment. This diagram shows the layout of Scalable Units (First SU, Second-Third SU) within a data center rack, detailing Management Network, Spine and Leaf Switches, Service Nodes, Management Node, Scheduling Node, and AI Compute Nodes.

Networking is implemented using Cisco Nexus 9364D-GX2A switches for the Converged and Compute Networks and Cisco Nexus 93108TC-FX3P switches for the Management Network, with NVIDIA ConnectX-7 adapters in the AI Compute Nodes.

The networking decision depends on whether the platform is designed to support up to three Scalable Units in total, and whether it will handle exclusively inference workloads or also encompass future fine-tuning and re-training activities. Subsequently, the solution can be expanded seamlessly without downtime by incorporating additional Scalable Units, ultimately reaching a total of three as needed.

Custom Deployment

For high-end scenarios requiring more than three Scalable Units, the network can be custom designed to any required size. In that case, Lenovo will develop a fully bespoke solution tailored to the workflow and workload requirements.

Performance

This document is meant to be used in tandem with the Hybrid AI 285 platform guide; please see the corresponding section there.

Bill of Materials

3 Scalable Unit (3 SU)

This section provides an example Bill of Materials (BoM) of a three Scalable Unit (3 SU) deployment with Cisco networking.

This example BoM includes:
- 12x ThinkSystem SR675 V3 AI Compute Nodes
- 5x ThinkSystem SR635 V3 Service Nodes
- Cisco Nexus 9364D-GX2A and 93108TC-FX3P switches
- Power distribution units, rack cabinet, XClarity software, and cables and transceivers

Storage is optional and not included in this 3 SU BoM.


Table 7. ThinkSystem SR675 V3 BoM
Part Number | Product Description | Qty per System | Total Qty
7D9RCTOLWW | ThinkSystem SR675 V3 | 1 | 12
BR7F | ThinkSystem SR675 V3 8DW PCIe GPU Base | 1 | 12
C3EF | ThinkSystem SR675 V3 System Board v2 | 1 | 12
C2AL | ThinkSystem AMD EPYC 9535 64C 300W 2.4GHz Processor | 2 | 24
C0CK | ThinkSystem 64GB TruDDR5 6400MHz (2Rx4) RDIMM-A | 24 | 288
BR7S | ThinkSystem SR675 V3 Switched 4x16 PCIe DW GPU Direct RDMA Riser | 2 | 24
C3V3 | ThinkSystem NVIDIA H200 NVL 141GB PCIe GPU Gen5 Passive GPU | 8 | 96
C3V0 | ThinkSystem NVIDIA 4-way bridge for H200 NVL | 2 | 24
BR7H | ThinkSystem SR675 V3 2x16 PCIe Front IO Riser | 1 | 12
C2RK | ThinkSystem SR675 V3 2 x16 Switch Cabled PCIe Rear IO Riser | 2 | 24
BQBN | ThinkSystem NVIDIA ConnectX-7 NDR200/200GbE QSFP112 2-port PCIe Gen5 x16 Adapter | 5 | 60
BM8X | ThinkSystem M.2 SATA/x4 NVMe 2-Bay Adapter | 1 | 12
BT7P | ThinkSystem Raid 540-8i for M.2/7MM NVMe boot Enablement | 1 | 12
BXMH | ThinkSystem M.2 PM9A3 960GB Read Intensive NVMe PCIe 4.0 x4 NHS SSD | 2 | 24
BTMB | ThinkSystem 1x4 E3.S Backplane | 1 | 12
C1AB | ThinkSystem E3.S PM9D3a 3.84TB Read Intensive NVMe PCIe 5.0 x4 HS SSD | 2 | 24
BK1E | ThinkSystem SR670 V2/SR675 V3 OCP Enablement Kit | 1 | 12
C5WW | ThinkSystem SR675 V3 Dual Rotor System High Performance Fan | 5 | 60
BFD6 | ThinkSystem SR670 V2/SR675 V3 Power Mezzanine Board | 1 | 12
BE0D | N+1 Redundancy With Over-Subscription | 1 | 12
BKTJ | ThinkSystem 2600W 230V Titanium Hot-Swap Gen2 Power Supply | 4 | 48
6252 | 2.5m, 16A/100-250V, C19 to C20 Jumper Cord | 4 | 48
C3KA | ThinkSystem SR670 V2/SR675 V3 Heavy Systems Toolless Slide Rail Kit | 1 | 12
BFNU | ThinkSystem SR670 V2/SR675 V3 Intrusion Cable | 1 | 12
BR7U | ThinkSystem SR675 V3 Root of Trust Module | 1 | 12
BFTH | ThinkSystem SR670 V2/SR675 V3 Front Operator Panel ASM | 1 | 12
5PS7B09631 | 5Yr Premier NBD Resp + KYD SR675 V3 | 1 | 12
Table 8. ThinkSystem SR635 V3 BoM
Part Number | Product Description | Qty per System | Total Qty
7D9GCTO1WW | Server : ThinkSystem SR635 V3 - 3yr Warranty | 1 | 5
BLK4 | ThinkSystem V3 1U 10x2.5" Chassis | 1 | 5
BVGL | Data Center Environment 30 Degree Celsius / 86 Degree Fahrenheit | 1 | 5
C2AQ | ThinkSystem AMD EPYC 9335 32C 210W 3.0GHz Processor | 1 | 5
BQ26 | ThinkSystem SR645 V3/SR635 V3 1U High Performance Heatsink | 1 | 5
C1PL | ThinkSystem 32GB TruDDR5 6400MHz (1Rx4) RDIMM-A | 12 | 60
BC4V | Non RAID NVMe | 1 | 5
C0ZU | ThinkSystem 2.5" U.2 VA 3.84TB Read Intensive NVMe PCIe 5.0 x4 HS SSD | 2 | 10
BPC9 | ThinkSystem 1U 4x 2.5" NVMe Gen 4 Backplane | 1 | 5
B5XJ | ThinkSystem M.2 SATA/NVMe 2-Bay Adapter | 1 | 5
BTTY | M.2 NVMe | 1 | 5
BKSR | ThinkSystem M.2 7450 PRO 960GB Read Intensive NVMe PCIe 4.0 x4 NHS SSD | 2 | 10
BQBN | ThinkSystem NVIDIA ConnectX-7 NDR200/200GbE QSFP112 2-port PCIe Gen5 x16 Adapter | 1 | 5
BLK7 | ThinkSystem SR635 V3/SR645 V3 x16 PCIe Gen5 Riser 1 | 1 | 5
BLK9 | ThinkSystem V3 1U MS LP+LP BF Riser Cage | 1 | 5
BNFG | ThinkSystem 750W 230V/115V Platinum Hot-Swap Gen2 Power Supply v3 | 2 | 10
BH9M | ThinkSystem V3 1U Performance Fan Option Kit v2 | 7 | 35
BLKD | ThinkSystem 1U V3 10x2.5" Media Bay w/ Ext. Diagnostics Port | 1 | 5
7Q01CTS2WW | 5Yr Premier NBD Resp + KYD SR635 V3 | 1 | 5
Table 9. Cisco 9364D-GX2A Switch BoM
Part Number | Product Description | Total Qty
7DLKCTO1WW | Cisco Nexus 9300-GX2 Series Switch (N9K-C9364D-GX2A) | 2
C5P0 | Cisco Nexus 9300-GX2 Series Switch (N9K-C9364D-GX2A) | 2
C6FK | Mode selection between ACI and NXOS (MODE-NXOS) | 4
6252 | 2.5m, 16A/100-250V, C19 to C20 Jumper Cord | 2
C1P1 | TN9300XF2-5Y 5 Years (60 months) Cisco software Premier license | 2
Table 10. Cisco 93108TC-FX3P Switch BoM
Part Number | Description | Total Qty
7DL8CTO1WW | Cisco Nexus 9300-FX3 Series Switch (N9K-C93108TC-FX3) | 2
C5PB | Cisco Nexus 9300-GX2 Series Switch (N9K-C9364D-GX2A) | 2
C6FK | Mode selection between ACI and NXOS (MODE-NXOS) | 4
6252 | 2.5m, 16A/100-250V, C19 to C20 Jumper Cord | 2
C1P1 | TN9300XF-5Y 5 Years (60 months) Cisco software Premier license | 2
Table 11. Power Distribution Unit (PDU) BoM
Part Number | Product Description | Qty per System | Total Qty
7DGMCTO1WW | -SB- 0U 18 C13/C15 and 18 C13/C15/C19 Switched and Monitored 63A 3 Phase WYE PDU v2 | 2 | 2
Table 12. Rack Cabinet BoM
Part Number | Product Description | Qty per System | Total Qty
1410O42 | Lenovo EveryScale 42U Onyx Heavy Duty Rack Cabinet | 1 | 1
BHC4 | Lenovo EveryScale 42U Onyx Heavy Duty Rack Cabinet | 1 | 1
BJPD | 21U Front Cable Management Bracket | 2 | 2
BHC7 | ThinkSystem 42U Onyx Heavy Duty Rack Side Panel | 2 | 2
BJPA | ThinkSystem 42U Onyx Heavy Duty Rack Rear Door | 2 | 2
5AS7B07693 | Lenovo EveryScale Rack Setup Services | 1 | 1
Table 13. XClarity Software BoM
Part Number | Product Description | Qty per System | Total Qty
SBCV | Lenovo XClarity XCC2 Platinum Upgrade (FOD) | 3 | 3
00MT203 | Lenovo XClarity Pro, Per Managed Endpoint w/5 Yr SW S&S | 5 | 5
Table 14. Cables and Transceivers BoM
Part Number | Product Description | Total Qty
P01DQ3007-07-R (example from luxshare-tech.com) | 400G QSFP-DD to 2 x 200G QSFP56 Active Optical Breakout Cable 7M | 48
QSFP-200-CU3M (from Cisco) | 200G QSFP56 to QSFP56, Passive Copper Cable 3m | 10
QDD-400-CU2M (from Cisco) | 400 Gbps, QSFP-DD to QSFP-DD, DAC, 2M | 10
7Z57A03562 | Lenovo 3M Passive 100G QSFP28 DAC Cable | 12

Starter Kit

This section provides an example Bill of Materials (BoM) of a Starter Kit deployment with Cisco switches.

This example BoM includes:
- 4x ThinkSystem SR675 V3 AI Compute Nodes
- ThinkSystem DM7200F storage


Table 15. ThinkSystem SR675 V3 BoM
Part Number | Product Description | Qty per System | Total Qty
7D9RCTOLWW | ThinkSystem SR675 V3 | 1 | 4
BR7F | ThinkSystem SR675 V3 8DW PCIe GPU Base | 1 | 4
C3EF | ThinkSystem SR675 V3 System Board v2 | 1 | 4
C2AL | ThinkSystem AMD EPYC 9535 64C 300W 2.4GHz Processor | 2 | 8
C0CK | ThinkSystem 64GB TruDDR5 6400MHz (2Rx4) RDIMM-A | 24 | 96
BR7S | ThinkSystem SR675 V3 Switched 4x16 PCIe DW GPU Direct RDMA Riser | 2 | 8
C3V3 | ThinkSystem NVIDIA H200 NVL 141GB PCIe GPU Gen5 Passive GPU | 8 | 32
C3V0 | ThinkSystem NVIDIA 4-way bridge for H200 NVL | 2 | 8
BR7H | ThinkSystem SR675 V3 2x16 PCIe Front IO Riser | 1 | 4
C2RK | ThinkSystem SR675 V3 2x16 Switch Cabled PCIe Rear IO Riser | 2 | 4
BQBN | ThinkSystem NVIDIA ConnectX-7 NDR200/200GbE QSFP112 2-port PCIe Gen5 x16 Adapter | 5 | 8
BM8X | ThinkSystem M.2 SATA/x4 NVMe 2-Bay Adapter | 1 | 16
BT7P | ThinkSystem RAID 540-8i for M.2/7MM NVMe boot Enablement | 1 | 4
BXMH | ThinkSystem M.2 PM9A3 960GB Read Intensive NVMe PCIe 4.0 x4 NHS SSD | 2 | 4
BTMB | ThinkSystem 1x4 E3.S Backplane | 1 | 8
C1AB | ThinkSystem E3.S PM9D3a 3.84TB Read Intensive NVMe PCIe 5.0 x4 HS SSD | 2 | 4
BK1E | ThinkSystem SR670 V2/SR675 V3 OCP Enablement Kit | 1 | 8
C5WW | ThinkSystem SR675 V3 Dual Rotor System High Performance Fan | 5 | 4
BFD6 | ThinkSystem SR670 V2/SR675 V3 Power Mezzanine Board | 1 | 20
BE0D | N+1 Redundancy With Over-Subscription | 1 | 4
BKTJ | ThinkSystem 2600W 230V Titanium Hot-Swap Gen2 Power Supply | 4 | 4
6252 | 2.5m, 16A/100-250V, C19 to C20 Jumper Cord | 4 | 16
C3KA | ThinkSystem SR670 V2/SR675 V3 Heavy Systems Toolless Slide Rail Kit | 1 | 16
BFNU | ThinkSystem SR670 V2/SR675 V3 Intrusion Cable | 1 | 4
BR7U | ThinkSystem SR675 V3 Root of Trust Module | 1 | 4
BFTH | ThinkSystem SR670 V2/SR675 V3 Front Operator Panel ASM | 1 | 4
5PS7B09631 | 5Yr Premier NBD Resp + KYD SR675 V3 | 1 | 4
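For the line items that follow the 2-8-5 layout, the Total Qty column should equal the per-system quantity multiplied by the number of nodes in the kit. A minimal sketch of such a consistency check, using a small illustrative subset of Table 15 and assuming the quantity digits are read as (Qty per System, Total Qty) pairs for a four-node Starter Kit:

```python
# Sanity-check that per-system quantities in a BoM scale to the listed totals.
# The part numbers and quantities below are an illustrative subset of Table 15.
NODES = 4  # implied by the SR675 V3 server line: 1 per system, 4 total

bom = {
    # part number: (description, qty per system, total qty)
    "7D9RCTOLWW": ("ThinkSystem SR675 V3 server", 1, 4),
    "C2AL": ("AMD EPYC 9535 processor", 2, 8),   # 2-8-5 layout: 2 CPUs per node
    "C3V3": ("NVIDIA H200 NVL GPU", 8, 32),      # 2-8-5 layout: 8 GPUs per node
    "C0CK": ("64GB TruDDR5 RDIMM", 24, 96),
}

for part, (desc, per_system, total) in bom.items():
    # Each total must be the per-system quantity times the node count
    assert per_system * NODES == total, f"{part} ({desc}): {per_system} x {NODES} != {total}"

print(f"Checked {len(bom)} line items: totals consistent for {NODES} nodes")
```

A check like this is useful when scaling the example BoM up or down, since every per-node feature code must be re-quantified together.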
Table 16. ThinkSystem DM7200F Storage BoM
Part Number | Product Description | Total Qty
7DJ3CTO1WW | Controller: Lenovo ThinkSystem DM7200F All Flash Array | 1
BF3C | Lenovo ThinkSystem Storage 2U NVMe Chassis | 1
BWU8 | Storage Complete Bundle Offering | 1
C4A4 | Lenovo ThinkSystem DM7200 Series Controller, 128GB | 2
C3XK | Lenovo ThinkSystem 30.7TB (2x 15.36TB NVMe SED) Drive Pack | 9
C4AA | Lenovo ThinkSystem Storage 100Gb 2 port Ethernet, RoCE Adapter (Host/Cluster) | 2
C4AA | Lenovo ThinkSystem Storage 100Gb 2 port Ethernet, RoCE Adapter (Host/Cluster) | 4
C4AG | Lenovo ThinkSystem Storage ONTAP 9.16 Software Encryption - IPAv2 | 1
B0W1 | 3 Years | 1
C6S2 | Premier 24x7 4hr Response and KYD | 1
C48T | Configured with Lenovo ThinkSystem DM7200F 3Yr Warranty | 1
BWUE | Storage Encryption Bundle License Key - RoW | 2
BWUC | Storage Complete Bundle License Key | 2
C49B | Lenovo ThinkSystem DM/DG Series Jupiter All Flash Ship Kit - Multi-Language | 1
B6Y6 | Lenovo ThinkSystem NVMe Rail Kit, 4 post | 1
7S0SCTOMWW | ThinkSys DM7200F 7DJ3 SW License | 1
SDJE | Lenovo ThinkSystem DM7200F NVMe SSD Unified Complete SW License with 3 Years Support, Per 0.1TB | 2765
5641PX3 | XClarity Pro, Per Endpoint w/3 Yr SW S&S | 1
1340 | Lenovo XClarity Pro, Per Managed Endpoint w/3 Yr SW S&S | 1
3444 | Registration only | 1
5WS7C06619 | 3Yr Premier 24x7 4Hr Resp DM7200F+KYD | 1
5WS7C07259 | 3Yr Premier 24x7 4Hr Resp+KYD (0.1TB NVMe TLC) | 2765
Table 17. Cisco 9364D-GX2A Switch BoM
Part Number | Product Description | Total Qty
7DLKCTO1WW | Cisco Nexus 9300-GX2 Series Switch (N9K-C9364D-GX2A) | 2
C5P0 | Cisco Nexus 9300-GX2 Series Switch (N9K-C9364D-GX2A) | 2
C6FK | Mode selection between ACI and NXOS (MODE-NXOS) | 4
6252 | 2.5m, 16A/100-250V, C19 to C20 Jumper Cord | 2
C1P1 | TN9300XF2-5Y 5 Years (60 months) Cisco software Premier license | 2
Table 18. Cisco 93108TC-FX3P Switch BoM
Part Number | Description | Total Qty
7DL8CTO1WW | Cisco Nexus 9300-FX3 Series Switch (N9K-C93108TC-FX3) | 2
C5PB | Cisco Nexus 9300-FX3 Series Switch (N9K-C93108TC-FX3P) | 2
C6FK | Mode selection between ACI and NXOS (MODE-NXOS) | 4
6252 | 2.5m, 16A/100-250V, C19 to C20 Jumper Cord | 2
C1P1 | TN9300XF-5Y 5 Years (60 months) Cisco software Premier license | 2
Table 19. Cables and Transceivers BoM
Part Number | Product Description | Total Qty
QSFP-200CU3M (from Cisco) | 200G QSFP56 to QSFP56, Passive Copper Cable, 3m | 20
QDD-400CU2M (from Cisco) | 400 Gbps, QSFP-DD to QSFP-DD, DAC, 2M | 10
7Z57A03562 | Lenovo 3M Passive 100G QSFP28 DAC Cable | 12

Seller training courses

The following sales training courses are offered for employees and partners (login required). Courses are listed in date order.

  1. VTT AI: Introducing the Lenovo Hybrid AI 285 Platform April 2025
    2025-04-30 | 60 minutes | Employees Only

    The Lenovo Hybrid AI 285 Platform enables enterprises of all sizes to quickly deploy AI infrastructure, supporting use cases either as a new greenfield environment or as an extension of existing infrastructure. The platform enables the use of the NVIDIA AI Enterprise software stack and is the foundation for Lenovo Validated Designs.

    • Technical overview of the Hybrid AI 285 platform
    • Hybrid AI platforms as infrastructure frameworks for LVDs addressing data center-based AI solutions
    • Accelerate AI adoption and reduce deployment risks

    Tags: Artificial Intelligence (AI), Nvidia, Technical Sales, Lenovo Hybrid AI 285

    Start the training:

    Employee link: Grow@Lenovo

    Course code: DVAI215

Notices

Lenovo may not offer the products, services, or features discussed in this document in all countries. Consult your local Lenovo representative for information on the products and services currently available in your area. Any reference to a Lenovo product, program, or service is not intended to state or imply that only that Lenovo product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any Lenovo intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any other product, program, or service. Lenovo may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to:

Lenovo (United States), Inc.
8001 Development Drive
Morrisville, NC 27560
U.S.A.
Attention: Lenovo Director of Licensing

LENOVO PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some jurisdictions do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. Lenovo may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

The products described in this document are not intended for use in implantation or other life support applications where malfunction may result in injury or death to persons. The information contained in this document does not affect or change Lenovo product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of Lenovo or third parties. All information contained in this document was obtained in specific environments and is presented as an illustration. The result obtained in other operating environments may vary. Lenovo may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Any references in this publication to non-Lenovo Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this Lenovo product, and use of those Web sites is at your own risk. Any performance data contained herein was determined in a controlled environment. Therefore, the result obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

© Copyright Lenovo 2025. All rights reserved.

This document, LP2236, was created or updated on June 17, 2025.

This document is available online at https://lenovopress.lenovo.com/LP2236.

Trademarks

Lenovo and the Lenovo logo are trademarks or registered trademarks of Lenovo in the United States, other countries, or both. A current list of Lenovo trademarks is available on the Web at https://www.lenovo.com/us/en/legal/copytrade/.

The following terms are trademarks of Lenovo in the United States, other countries, or both:

The following terms are trademarks of other companies:

Other company, product, or service names may be trademarks or service marks of others.


Related Documents

Lenovo, Cisco, and NVIDIA Collaborate for Accelerated Computing Hybrid AI Infrastructure
Explore the collaboration between Lenovo, Cisco, and NVIDIA to deliver a new accelerated computing hybrid AI infrastructure. Learn how this partnership enhances AI workloads, optimizes network performance, and provides scalable solutions for data centers and enterprise applications.
Lenovo ThinkSystem SR675 V3 Server: Product Guide for AI & HPC
Explore the Lenovo ThinkSystem SR675 V3, a versatile 3U rack server designed for demanding AI, HPC, and graphical workloads. Featuring AMD EPYC processors, NVIDIA GPUs, NVLink, and advanced cooling, it offers high performance and flexibility for data-intensive computing.
Lenovo ThinkSystem SR635 V3 Server Product Guide
Discover the Lenovo ThinkSystem SR635 V3 Server, a powerful 1U rack server designed for demanding enterprise workloads. Featuring AMD EPYC 9004 "Genoa" processors, up to 96 cores, DDR5 memory, and PCIe 5.0 support, it excels in AI, HPC, and data-intensive applications. This guide details its key features, scalability, storage options, advanced manageability, and energy efficiency.
Lenovo ThinkSystem SR685a V3 Server Product Guide - AI & HPC Performance
Explore the Lenovo ThinkSystem SR685a V3 Server Product Guide for detailed insights into its advanced AI and HPC capabilities, featuring AMD EPYC processors, NVIDIA/AMD GPUs, PCIe 5.0, and robust manageability solutions.
Lenovo Announces New ThinkSystem V3 and ThinkAgile V3 Servers with 4th Gen AMD EPYC Processors
Lenovo introduces its latest ThinkSystem V3 and ThinkAgile V3 server lines, powered by 4th Gen AMD EPYC processors. This document details models like SR645 V3, SR665 V3, SR675 V3, SD665 V3, SD665-N V3, VX645 V3, and VX665 V3, highlighting their performance, advanced cooling, and suitability for demanding enterprise workloads.
Lenovo ThinkSystem SR680a V3 Server Product Guide
Explore the Lenovo ThinkSystem SR680a V3, a powerful 8U server engineered for demanding AI and HPC workloads. Featuring dual 5th Gen Intel Xeon Scalable processors, eight NVIDIA GPUs, and advanced connectivity, this guide details its specifications, features, and management capabilities for high-performance computing environments.
ThinkSystem NVIDIA HGX B200 180GB 1000W GPU Product Guide
Explore the ThinkSystem NVIDIA HGX B200 180GB 1000W GPU, a powerful platform for accelerating AI, data analytics, and HPC workloads. Learn about its features, specifications, and server compatibility.