AMD MI200 Instinct Accelerator Instruction Manual

Chapter 1 Introduction

This document provides step-by-step instructions for updating the Integrated Firmware Image (IFWI) and Remote Management Firmware (RMFW) using the AMD FW Flash tool (amdfwflash) on the AMD Instinct™ MI200 server platforms.
This user guide is for users who have the following AMD Instinct™ MI200 GPUs and want to upgrade the IFWI and/or RMFW:

  • AMD Instinct™ MI210
  • AMD Instinct™ MI250/MI250X

The AMD FW Flash tool v2.0 is delivered with four versions of IFWI and RMFW:

  • Maintenance Update#1 (mu1)
  • Maintenance Update#2 (mu2)
  • Maintenance Update#3 (mu3)
  • General Availability (GA)

By default, the tool updates to Maintenance Update#3, the most recent version.

The tool can also update or roll back your IFWI and/or RMFW to a desired level. For instance, it can update an MI200 platform from the GA version to the Maintenance Update#1 or Maintenance Update#2 version. The steps are outlined in this document.
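The four versions form an ordered sequence (GA, then Maintenance Update#1, #2 and #3), so whether a given target is an update or a rollback depends only on its position relative to the installed version. A minimal Python sketch of that decision follows; the labels GA/mu1/mu2/mu3 are hypothetical identifiers taken from the list above, not amdfwflash arguments.

```python
# Ordered from oldest to newest, following the release list in this chapter.
# These labels are hypothetical identifiers, not amdfwflash arguments.
FW_VERSIONS = ["GA", "mu1", "mu2", "mu3"]


def classify_transition(current: str, target: str = "mu3") -> str:
    """Return 'update', 'rollback', or 'no change' for moving current -> target.

    The default target mirrors the tool's default of Maintenance Update#3.
    """
    for label in (current, target):
        if label not in FW_VERSIONS:
            raise ValueError(f"unknown FW version: {label!r}")
    cur, tgt = FW_VERSIONS.index(current), FW_VERSIONS.index(target)
    if tgt > cur:
        return "update"
    if tgt < cur:
        return "rollback"
    return "no change"


print(classify_transition("GA"))          # update
print(classify_transition("mu3", "mu1"))  # rollback
```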
Note: The AMD FW Flash tool is not intended to be used in a Virtual Machine/Guest Operating System (OS) environment.
CAUTION: Using the AMD FW Flash tool in a Virtual Machine/Guest OS may result in undefined behavior and an unsupported configuration.
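To guard against running the tool inside a guest OS, a pre-flight script could look for the hypervisor CPU flag. A minimal Python sketch, assuming x86 Linux, where the kernel typically lists "hypervisor" among the CPU flags in /proc/cpuinfo when running as a guest. This is a heuristic only; it is not a check that amdfwflash itself is documented to perform.

```python
from pathlib import Path


def looks_like_vm(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists 'hypervisor'.

    Heuristic (assumption): on x86 Linux the kernel usually sets this flag for guests.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and "hypervisor" in value.split():
            return True
    return False


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists() and looks_like_vm(cpuinfo.read_text()):
        print("Hypervisor flag found: this looks like a guest OS; do not run amdfwflash here.")
    else:
        print("No hypervisor flag found.")
```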

Chapter 2 Getting Started

Prior to updating the FW, complete the following:
  • Install the dmidecode package on the system. This applies to all supported distributions (Ubuntu/CentOS/RHEL/SLES).
  • Identify the server with the AMD Instinct™ MI200 accelerator(s) requiring a FW update or GPU replacement.
  • Ensure that you have the appropriate login credentials for the server.

Note: To execute the firmware update tool, you must have sudo or root permissions on the server.

  • To access the system console, make sure you have access to the BMC/IPMI interface.
  • Ensure network access to the AMD FW Flash tool repository, “repo.radeon.com”.
  • Ensure that all applications are closed before launching the tool and that no Operating System (OS) updates are pending in the background. Notify server users of the planned firmware maintenance.
  • RMFW updates require the driver to be loaded.

Note: It is strongly recommended to run the firmware update tool from the system console rather than over a network connection. This prevents a network interruption or lost connection from disrupting the update.
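Part of the checklist above can be automated. A hypothetical Python pre-flight sketch, assuming a POSIX system and HTTPS (port 443) access to repo.radeon.com; the function name and port are illustrative assumptions, and the BMC/IPMI, credentials and pending-update items still need a manual check.

```python
import os
import shutil
import socket
from typing import Callable, List, Optional


def check_prerequisites(
    euid: Optional[int] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
    connect: Callable[..., socket.socket] = socket.create_connection,
    repo_host: str = "repo.radeon.com",
) -> List[str]:
    """Return human-readable problems; an empty list means these checks passed.

    euid/which/connect are injectable so the logic can be tested offline.
    """
    problems: List[str] = []
    if euid is None:
        euid = os.geteuid()  # POSIX only; 0 means root (for example, via sudo)
    if euid != 0:
        problems.append("run as root or via sudo")
    if which("dmidecode") is None:
        problems.append("install the dmidecode package")
    try:
        # Assumption: reachability over HTTPS is a sufficient proxy for repo access.
        conn = connect((repo_host, 443), timeout=5)
        conn.close()
    except OSError:  # covers DNS failures, refusals and timeouts
        problems.append(f"cannot reach {repo_host} on port 443")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("MISSING:", problem)
```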

Chapter 3 Commands

The AMD FW Flash utility supports multiple flags and options for updating the firmware.

  1. Help
    Flag/Option: --help/-h [switch]
    Description: Displays the help text for all switches along with a description of the tool. [switch] is optional.
    • When [switch] is specified, the help for the specified switch is displayed.
    • When [switch] is not specified, the complete help is displayed.
      Figure 3.1: sudo amdfwflash --help (Generic Options)

      Figure 3.2: sudo amdfwflash --help (Common Tool Options)
  2. List Devices
    Flag/Option: --list-devices/-l
    Description: This command performs the following functions:
    • Shows the available ASICs along with their SPIROM model and part numbers.
    • Indicates whether a firmware update is available.
    • When the tool is run without any switches, it lists the devices by default.
      The following figure shows the dGPU device information, including whether an appropriate firmware update is available.
      Figure 3.3: sudo amdfwflash --list-devices
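A minimal Python sketch that wraps the --list-devices switch described above and captures its output, for example to keep with the maintenance records. It assumes amdfwflash is on the PATH and that sudo is available; the helper names are hypothetical.

```python
import subprocess
from typing import List


def amdfwflash_cmd(*switches: str, use_sudo: bool = True) -> List[str]:
    """Build the argument vector for an amdfwflash invocation (hypothetical helper)."""
    cmd = ["amdfwflash", *switches]
    return ["sudo", *cmd] if use_sudo else cmd


def list_devices() -> str:
    """Run 'sudo amdfwflash --list-devices' and return its standard output.

    Raises subprocess.CalledProcessError if the tool exits with a non-zero status.
    """
    result = subprocess.run(
        amdfwflash_cmd("--list-devices"),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, print(list_devices()) shows the same device listing as running the command by hand; only the two switches documented in this chapter are used.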

Chapter 4 Instructions

To update the FW on AMD Instinct™ MI200 Accelerator(s), or when replacing AMD Instinct™ MI200 Accelerator(s) in a server, first configure the system for FW maintenance. Once the system is configured, execute the amdfwflash command to update or roll back the FW to a desired version.

Configuring the System for FW Maintenance or AMD Instinct™ MI200 Replacement

Installing the AMD FW Flash Tool

  1. The AMD FW Flash tool repository for Linux is located at repo.radeon.com/fwupdater/amdfwflash/latest.
  2. Log in to the server with the MI200 GPUs requiring a FW update.
  3. Set up the AMD FW Flash tool package repository.
    Set up the Ubuntu OS apt repository


    Set up the RHEL 8 or RHEL 9 yum repository

    Set up the SLES 15 SP3 or SP4 zypper repository
  4. Update the AMD FW Flash tool package repository
    Ubuntu OS

    To verify, search for the amdfwflash package:


    RHEL 8 or RHEL 9


    To verify, search for the amdfwflash package:

    SLES 15 SP3 or SP4


    To verify, search for the amdfwflash package:

  5. Install the AMD FW Flash tool package.
    Ubuntu OS

    RHEL 8 or RHEL 9

    SLES 15 SP3 or SP4
    Before installing, add iomem=relaxed to the kernel command line in the GRUB configuration and regenerate the GRUB configuration file.

  6. Verify the AMD FW Flash tool package installation.
    Ubuntu OS

    RHEL 8 or RHEL 9

    SLES 15 SP3 or SP4
  7. Reboot the server for the FW maintenance update, or power it off to replace the AMD Instinct™ MI200 Accelerator(s).

    Note: If an AMD Instinct™ MI200 Accelerator in the system is being replaced, power off the system.
    Refer to the section Updating and Rolling Back the AMD Instinct™ MI200 FW Version to update or roll back the AMD Instinct™ MI200 FW to a desired version.
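For the SLES prerequisite in step 5, the kernel command line of the running system shows whether iomem=relaxed is in effect. A change to the GRUB configuration only appears in /proc/cmdline after a reboot into the regenerated configuration. A minimal Python sketch, assuming Linux; the helper name is hypothetical.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str = "iomem=relaxed") -> bool:
    """Return True if param appears as a whole space-separated token on the command line."""
    return param in cmdline.split()


if __name__ == "__main__":
    cmdline = Path("/proc/cmdline")  # kernel parameters of the running kernel
    if cmdline.exists() and has_kernel_param(cmdline.read_text()):
        print("iomem=relaxed is active")
    else:
        print("iomem=relaxed is not active on the running kernel")
```

Matching whole tokens avoids a false positive from a different value such as iomem=strict.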

Updating and Rolling Back the AMD Instinct™ MI200 FW Version

Follow the steps below to update or roll back the AMD Instinct™ MI200 FW to a desired version.

  1. Log in to the BMC/IPMI interface of the server identified for the FW update.
  2. Launch the remote/virtual console on the server.
  3. Log in to the server.
  4. Run the amdfwflash utility to list the GPU devices.

    Note: The output should list all the GPU devices in the system. If it does not, contact AMD Customer Care (see Chapter 6 Customer Care).
  5. Execute the amdfwflash command to update the IFWI and/or RMFW of all GPUs in the system to the latest MI200 Maintenance Update#3 version.
  6. Alternatively, update the IFWI and/or RMFW of all GPUs in the system to the MI200 Maintenance Update#2 version.
  7. Alternatively, update the IFWI and/or RMFW of all GPUs in the system to the MI200 Maintenance Update#1 version.
  8. Save the system log and console output to a file.
  9. The amdfwflash tool saves a copy of the old IFWI and/or RMFW images under /tmp before updating. Archive these images from the /tmp folder for later reference before rebooting, since /tmp may be cleared at boot.
  10. Reboot the server (an AC power cycle is recommended) to make the FW update effective.
  11. Refer to the section Verifying the AMD Instinct™ MI200 FW Version to complete the FW update. After a successful verification of the FW update, the server may resume normal operation.
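Step 9 can be scripted. A minimal Python sketch that packs matching files from a directory into a timestamped tar.gz archive; the file-name pattern is a hypothetical placeholder, because this document does not state the names amdfwflash uses for the saved images.

```python
import tarfile
import time
from pathlib import Path
from typing import List


def archive_fw_images(src_dir: str, pattern: str, dest_dir: str) -> Path:
    """Pack regular files in src_dir matching pattern into dest_dir/fw-backup-<stamp>.tar.gz.

    pattern is a placeholder (assumption); set it to the names the tool reports.
    """
    files: List[Path] = sorted(p for p in Path(src_dir).glob(pattern) if p.is_file())
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {src_dir}")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f"fw-backup-{time.strftime('%Y%m%d-%H%M%S')}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for f in files:
            tar.add(f, arcname=f.name)  # store flat, without the source directory prefix
    return archive
```

For example, archive_fw_images("/tmp", "*.bin", "/root/fw-archive") returns the path of the new archive, with "*.bin" standing in for the real file names. Raising on an empty match makes a wrong pattern visible instead of producing an empty backup.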

Rolling Back to the MI200 GA FW Version

  1. Log in to the BMC/IPMI interface of the server identified for the FW update.
  2. Launch the remote/virtual console on the server.
  3. Log in to the server.
  4. Run the amdfwflash utility to list the GPU devices.

    Note: The output should list all the GPU devices in the system. If it does not, contact AMD Customer Care (see Chapter 6 Customer Care).
  5. Execute the amdfwflash command to roll back the IFWI and/or RMFW of all GPUs to the GA version.
  6. Alternatively, run amdfwflash to roll back the IFWI and/or RMFW of all GPUs from the Maintenance Update#3 version to the Maintenance Update#2 version.
  7. Alternatively, run amdfwflash to roll back the IFWI and/or RMFW of all GPUs from the Maintenance Update#2 version to the Maintenance Update#1 version.
  8. Save the system log and console output to a file.
  9. The amdfwflash tool saves a copy of the old IFWI and/or RMFW images under /tmp before flashing. Archive these images from the /tmp folder for later reference before rebooting, since /tmp may be cleared at boot.
  10. Reboot the server (an AC power cycle is recommended) to make the FW rollback effective.
  11. Refer to the section Verifying the AMD Instinct™ MI200 FW Version to complete the FW rollback. After a successful verification of the FW version, the server may resume normal operation.
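Step 8 (saving the console output) can be handled by a small wrapper. A minimal Python sketch, with a hypothetical helper name and log path, that runs a command, echoes its combined output, and appends it to a log file.

```python
import subprocess
import sys
from typing import Sequence


def run_and_log(cmd: Sequence[str], log_path: str) -> int:
    """Run cmd, echo its combined stdout/stderr, append it to log_path, return the exit code.

    Output is captured first and printed after the command exits (not streamed).
    """
    result = subprocess.run(
        list(cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same captured stream
        text=True,
    )
    sys.stdout.write(result.stdout)
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"$ {' '.join(cmd)}\n{result.stdout}")
    return result.returncode
```

For example, run_and_log(["sudo", "amdfwflash", "--list-devices"], "/root/amdfwflash.log") records the device listing, and the kernel log can be captured the same way with ["dmesg"]. Because the output is captured before it is printed, this wrapper is not suitable for commands that prompt for input.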

Verifying the AMD Instinct™ MI200 FW Version

  1. Log in to the system.
  2. If the AMD ROCm software is installed, run rocm-smi --showhw to display the firmware version under VBIOS. The output should list all the GPU devices in the system. If it does not, contact AMD Customer Care (see Chapter 6 Customer Care).
    Note: If your environment has blacklisted the amdgpu driver for normal operation, load the amdgpu driver before executing rocm-smi.
  3. Run the amdfwflash utility to list all the GPU devices.

    Note: Refer to the List Devices command in Chapter 3 Commands.
  4. Ensure that all MI200 GPUs have the same updated IFWI and RMFW versions.
    Note: If the console output reports an error, contact AMD Customer Care (see Chapter 6 Customer Care).
    After a successful verification of the FW update, the server may resume normal operation.
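Step 4 amounts to checking that every GPU reports the same IFWI/RMFW pair, which is also the requirement stated for GPU replacement below. A minimal Python sketch, assuming the versions have already been collected per GPU; the record layout is hypothetical and the sketch does not parse amdfwflash output.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class GpuFw:
    """Hypothetical per-GPU record: an identifier plus its IFWI and RMFW versions."""
    gpu: str
    ifwi: str
    rmfw: str


def find_mismatches(gpus: List[GpuFw]) -> Dict[Tuple[str, str], List[str]]:
    """Group GPUs by (IFWI, RMFW); return the groups only if more than one exists.

    An empty result means every GPU reports the same IFWI and RMFW versions.
    """
    groups: Dict[Tuple[str, str], List[str]] = {}
    for g in gpus:
        groups.setdefault((g.ifwi, g.rmfw), []).append(g.gpu)
    return groups if len(groups) > 1 else {}
```

A non-empty result shows which GPUs carry which version pair, which indicates the cards that still need to be flashed to match the others.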

Uninstalling the AMD FW Flash Tool

  1. Uninstall the AMD FW Flash tool (amdfwflash) package.
    Ubuntu OS

    RHEL 8 or RHEL 9

    SLES 15 SP3 or SP4

Replacing the AMD Instinct™ MI200 GPU (RMA)

The IFWI and RMFW versions of all AMD Instinct™ MI200 Accelerators within a system must be identical for the system to work properly.

  1. When replacing AMD Instinct™ MI200 Accelerator(s) in a system, the system must first be configured for the AMD Instinct™ MI200 replacement. Refer to the section Configuring the System for FW Maintenance or AMD Instinct™ MI200 Replacement for steps on how to configure the system.
  2. Once the system is configured for the AMD Instinct™ MI200 replacement, power off the system and replace the AMD Instinct™ MI200 Accelerator(s) according to the assembly instructions.
  3. After replacing the AMD Instinct™ MI200 Accelerator(s), power on the system and follow the steps in Updating and Rolling Back the AMD Instinct™ MI200 FW Version to update or roll back the IFWI and/or RMFW on all AMD Instinct™ MI200 Accelerator(s) to a desired version.

Chapter 5 References

For additional information, please refer to the following web sites:

Chapter 6 Customer Care

If you have any questions or need additional information, please contact your AMD representative. You may also submit a question at Online Service Request (https://www.amd.com/en/support/contact-email-form) using the keyword amdfwflash in the subject line.

Chapter 7 Frequently Asked Questions (FAQ)

1.

A: No. The message does not indicate an error.

2.

A:

4. Q: What is GA version?

A: GA version refers to the IFWI and RMFW shipped from the factory.

5. Q: What is Return Merchandise Authorization (RMA)?

A: RMA means adding a new card into a system that already contains existing cards. This may include field replacements or adding additional GPUs to a server.

Appendix A Notices

© Copyright 2024 Advanced Micro Devices, Inc.

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

THIS INFORMATION IS PROVIDED “AS IS.” AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

A.1 Trademarks 

AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc.

Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
