NVIDIA.

NVIDIA DGX OS Server Release 4.9

Release Notes and Update Guide

DA-08260-490_v01 | July 2021

Primary Changes in Release 4.9

The following are the primary new features of DGX OS Server Release 4.9 since Release 4.8:

Delivery and Update Mechanisms

Initial 4.9 Release

DGX OS Server Release 4.9, version 4.9.0, is provided as an ISO image, which is available from NVIDIA Enterprise Support in the event the server needs to be re-imaged. Version 4.9.0 is also provided as an "over-the-network" update, which requires an internet connection and the ability to access the NVIDIA public repositories.

Refer to the DGX-2 User Guide (https://docs.nvidia.com/dgx/dgx2-user-guide/index.html) and DGX-1 User Guide (https://docs.nvidia.com/dgx/dgx1-user-guide/index.html) for the following instructions:

Update Advisement

NVIDIA GPU Cloud Containers

In conjunction with DGX OS Server v4.9, customers should update their NVIDIA GPU Cloud containers to the latest container release.

Ubuntu Security Updates

Customers are responsible for keeping the DGX server up to date with the latest Ubuntu security updates using the 'apt full-upgrade' procedure. See the Ubuntu Wiki Upgrades web page for more information. Also, the Ubuntu Security Notice site lists known Common Vulnerabilities and Exposures (CVEs), including those that can be resolved by updating the DGX OS software.

Version History

This section lists the changes made in each released version of DGX OS Release 4.9. See DGX OS Server Software Content for the software component list and versions.

Version 4.9.0

See the NVIDIA Deep Learning Frameworks documentation website (http://docs.nvidia.com/deeplearning/dgx/index.htm) for information on the latest container releases as well as https://docs.nvidia.com/deeplearning/dgx/user-guide/index.html for instructions on how to access them.

DGX OS Server Software Content

The following tables provide version information for software included in the DGX OS Server ISO image as well as software installed on the system after getting subsequent updates.

Package Versions in Version 4.9.0

The following table shows the version information for software included in the DGX OS Server version 4.9.0.

| Component | Version (R418 package) | Version (R450 package) |
| --- | --- | --- |
| GPU Driver | 418.211.00 (includes CUDA update to 10.1, if previously installed separately) | 450.142.00 (includes CUDA update to 11.0.3, if previously installed separately) |
| Fabric Manager | N/A | 450.142.00 |
| NVIDIA System Health Monitor (NVSM) | 20.03.6 | 20.09.33 |
| Data Center GPU Management (DCGM) | 1.7.4 | 2.2.8 |
| NVIDIA Container Toolkit | nvidia-container-runtime 3.3.0 | nvidia-container-runtime 3.3.0 |
| Ubuntu | 18.04.4 LTS | 18.04.4 LTS |
| Ubuntu kernel | 4.15.0-1472 | 4.15.0-1472 |
| Docker Engine | 19.03.15 | 19.03.15 |
| Mellanox OFED | MLNX4.9-2.2.6.0 | MLNX4.9-2.2.6.0 |

| KVM Package Components (DGX-2 only) | Version |
| --- | --- |
| dgx-kvm-sw | 19.07.0 |
| dgx-kvm-host-utils | 21.01.0 |
| dgx-kvm-host-conf | 20.12.0 |
| dgx-kvm-image | dgx-kvm-image-4-9-0_4.9.0~210615-153549.0_amd64.deb |

If updating over-the-network, your kernel version may be a later version depending on when the update is performed.

DGX Server Firmware Version Reference

The Mellanox firmware is updated as part of the DGX OS update. The following are the updated versions for each product:

| Product | Network Card | Version |
| --- | --- | --- |
| NVIDIA DGX-1 | ConnectX-4 | 12.28.2006 |
| NVIDIA DGX-1 | ConnectX-5 | 16.28.2006 |
| NVIDIA DGX-2 | ConnectX-5 | 16.28.2006 |
| NVIDIA DGX-2 | ConnectX-6 | 20.28.2006 |

For other firmware, see the DGX-2 System Firmware Update Container Version 20.10.7.2 and DGX-1 System Firmware Update Container Version 20.10.2.1 release notes for the corresponding firmware versions available at the time of this DGX OS release.

Updating the Software

These instructions explain how to update the DGX OS server software through an internet connection to the NVIDIA public repository. The process updates a DGX system image to the latest versions of the entire DGX software stack, including the drivers. Perform the updates using commands on the DGX server console.

Preparing for Updating the Software

Connecting to the DGX Server Console

Connect to the DGX server console using either a direct connection or a remote connection through the BMC.

Note: SSH can be used to perform the update. However, if the Ethernet port is configured for DHCP, there is the potential that the IP address can change after the DGX server is rebooted during the update, resulting in loss of connection. If this happens, connect using either a direct connection or through the BMC to continue the update process.

Warning: Connect directly to the DGX server console if the DGX is connected to a 172.17.xx.xx subnet.

DGX OS Server software installs Docker CE which uses the 172.17.xx.xx subnet by default for Docker containers. If the DGX server is on the same subnet, you will not be able to establish a network connection to the DGX server.

Refer to the appropriate DGX-1 or DGX-2 User Guide for instructions on how to change the default Docker network settings after performing the update.
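Before choosing between SSH and a direct console connection, it can help to check whether the server's address falls in the range described in the warning. The following is a minimal sketch that treats 172.17.xx.xx as every address whose first two octets are 172.17 (an assumption about the subnet mask); the function name is hypothetical and not part of the DGX OS software.

```shell
# Sketch: report whether an IPv4 address lies in the 172.17.xx.xx range that
# the warning above gives as the Docker default.
# in_docker_default_subnet is a hypothetical helper name.
in_docker_default_subnet() {
    local ip="$1"
    case "$ip" in
        172.17.*.*) return 0 ;;   # first two octets are exactly 172.17
        *)          return 1 ;;
    esac
}
```

For example, `in_docker_default_subnet 172.17.4.20 && echo "use the console or BMC"` prints the reminder, while an address such as 10.0.0.5 produces no output.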

Direct Connection

  1. Connect a display to the VGA connector and a keyboard to any one of the USB ports.
  2. Power on the DGX server.

Remote Connection through the BMC

Refer to the appropriate user guide (DGX-1 or DGX-2) for instructions on establishing a remote connection to the BMC.

Verifying the DGX Server Connection to the Repositories

Before attempting to perform the update, verify that the DGX server network connection can access the public repositories and that the connection is not blocked by a firewall or proxy.

On DGX-1 Systems if Upgrading from Version 2.x.

Enter the following on the DGX-1 system.

$ wget -O f1-changelogs http://changelogs.ubuntu.com/meta-release-lts
$ wget -O f2-archive http://archive.ubuntu.com/ubuntu/dists/xenial/Release
$ wget -O f3-usarchive http://us.archive.ubuntu.com/ubuntu/dists/xenial/Release
$ wget -O f4-security http://security.ubuntu.com/ubuntu/dists/xenial/Release
$ wget -O f5-download https://download.docker.com/linux/ubuntu/dists/xenial/Release
$ wget -O f6-international http://international.download.nvidia.com/dgx/repos/dists/xenial/Release

All the wget commands should be successful and there should be six files in the directory with non-zero content.

On DGX-2 and DGX-1 Systems

Enter the following on the DGX system.

$ wget -O f1-changelogs http://changelogs.ubuntu.com/meta-release-lts
$ wget -O f2-archive http://archive.ubuntu.com/ubuntu/dists/bionic/Release
$ wget -O f3-usarchive http://us.archive.ubuntu.com/ubuntu/dists/bionic/Release
$ wget -O f4-security http://security.ubuntu.com/ubuntu/dists/bionic/Release
$ wget -O f5-international http://international.download.nvidia.com/dgx/repos/bionic/dists/bionic/Release
$ wget -O f6-international http://international.download.nvidia.com/dgx/repos/bionic/dists/bionic-r418+cuda10.1/Release
$ wget -O f7-international http://international.download.nvidia.com/dgx/repos/bionic/dists/bionic-r450+cuda11.0/Release

All the wget commands should be successful and there should be seven files in the directory with non-zero content.
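The check for non-empty files can be scripted rather than done by eye. The following is a minimal sketch that counts the f<N>-* files written by the wget commands above; the helper name is hypothetical, and the expected count (six or seven) is passed in by the caller.

```shell
# Sketch: count the non-empty f<N>-* files that the wget commands above write
# into a directory and compare the count against the expected number.
# check_release_files is a hypothetical helper name.
check_release_files() {
    local dir="$1" expected="$2" count=0 f
    for f in "$dir"/f[0-9]-*; do
        # -s is true only for an existing file with a size greater than zero.
        if [ -s "$f" ]; then
            count=$((count + 1))
        elif [ -e "$f" ]; then
            echo "empty file: $f" >&2
        fi
    done
    echo "$count of $expected files have content"
    [ "$count" -eq "$expected" ]
}
```

Running `check_release_files . 7` in the download directory prints the count and returns a non-zero status if any file is missing or empty.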

Performing the Updates

Update Path Instructions

Follow the instructions corresponding to your current DGX OS server software.

Updating from Release 4.1 and Later

See the section Connecting to the DGX Server Console for guidance on connecting to the console to perform the update.

Caution: These instructions update all software for which updates are available from your configured software sources, including applications that you installed yourself. If you want to prevent an application from being updated, you can instruct the Ubuntu package manager to keep the current version. For more information, see Introduction to Holding Packages on the Ubuntu Community Help Wiki.
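Holding a package is typically done with apt-mark, which is a standard apt command. The following is a minimal sketch of wrapping it so that the command can be previewed first; the helper names and the DRY_RUN variable are hypothetical conveniences for this example, and my-app stands in for whichever package you want to keep.

```shell
# Sketch: hold packages before the upgrade and release them afterwards.
# apt-mark hold and apt-mark unhold are standard apt commands; run,
# hold_packages, unhold_packages, and DRY_RUN are hypothetical names.
run() {
    # Print the command instead of executing it when DRY_RUN=1.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

hold_packages()   { run sudo apt-mark hold "$@"; }
unhold_packages() { run sudo apt-mark unhold "$@"; }
```

`DRY_RUN=1 hold_packages my-app` prints the command that would be run. After holding a package for real, `apt-mark showhold` lists the packages currently on hold.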

Update Instructions
  1. If you have not already done so, verify that your DGX system can access the public repositories as explained in Verifying the DGX Server Connection to the Repositories.
  2. (Optional) To move to the R450 package, issue the following commands. Skip this step to stay with the R418 package.
    $ sudo apt update
    $ sudo apt install -y dgx-bionic-r450+cuda11.0-repo
  3. Update the list of available packages and their versions.
    $ sudo apt update
  4. Review the packages that will be updated.
    $ sudo apt full-upgrade -s

    To prevent an application from being updated, instruct the Ubuntu package manager to keep the current version. See Introduction to Holding Packages.

  5. Upgrade to version 4.9.0.
    $ sudo apt full-upgrade

    Answer any questions that appear.

    • Most questions require a Yes or No response. When asked to select the grub configuration to use, select the current one on the system.
    • Other questions will depend on what other packages were installed before the update and how those packages interact with the update.
    • If a message appears indicating that nvidia-docker.service failed to start, you can disregard it and continue with the next step. The service will start normally at that time.
  6. Reboot the system.
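The simulated upgrade in the review step can produce a long listing. The following is a minimal sketch that reduces it to package names, relying on the fact that apt's simulation output marks each package to be installed or upgraded with a line starting with "Inst"; the helper name is hypothetical.

```shell
# Sketch: list the packages that a simulated upgrade would install or upgrade.
# Reads simulation output on standard input and prints one package per line.
# list_pending_upgrades is a hypothetical helper name.
list_pending_upgrades() {
    awk '$1 == "Inst" { print $2 }'
}
```

A possible invocation is `sudo apt full-upgrade -s 2>/dev/null | list_pending_upgrades`. The redirection hides the notice that apt prints when its output is piped.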

Recovering from an Interrupted or Failed Update

If the script is interrupted during the update, such as from a loss of power or loss of network connection, then restore power or restore the network connection, whichever caused the interruption.

Updating from 4.0.1 (or later)

For Release 4.0, only updates from versions 4.0.1 and later are supported with these instructions. To update from version 4.0.0, you must re-image the system.

See the section Connecting to the DGX Server Console for guidance on connecting to the console to perform the update.

Caution: These instructions update all software for which updates are available from your configured software sources, including applications that you installed yourself. If you want to prevent an application from being updated, you can instruct the Ubuntu package manager to keep the current version. For more information, see Introduction to Holding Packages on the Ubuntu Community Help Wiki.

Update Instructions
  1. If you have not already done so, verify that your DGX system can access the public repositories as explained in Verifying the DGX Server Connection to the Repositories.
  2. Update the list of available packages and their versions.
    $ sudo apt update
  3. Install the 4.1.0 components from the repository.
    $ sudo apt install -y dgx-bionic-r418+cuda10.1-repo
  4. (Optional) To move to the R450 package, issue the following command. Skip this step to stay with the R418 package.
    $ sudo apt install -y dgx-bionic-r450+cuda11.0-repo
  5. Update the new list of packages and their versions.
    $ sudo apt update
  6. Review the packages that will be updated.
    $ sudo apt full-upgrade -s

    To prevent an application from being updated, instruct the Ubuntu package manager to keep the current version. See Introduction to Holding Packages.

  7. Upgrade to version 4.9.0.
    $ sudo apt full-upgrade

    Answer any questions that appear.

    • Most questions require a Yes or No response. When asked to select the grub configuration to use, select the current one on the system.
    • Other questions will depend on what other packages were installed before the update and how those packages interact with the update.
    • If a message appears indicating that nvidia-docker.service failed to start, you can disregard it and continue with the next step. The service will start normally at that time.
  8. Reboot the system.
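After the reboot, it can be useful to confirm which version the system reports. The following is a minimal sketch under several assumptions that this document does not state: that the version is recorded in /etc/dgx-release as KEY="value" lines, that the installed image appears as DGX_SWBUILD_VERSION, and that over-the-network updates are appended as DGX_OTA_VERSION. Check the file on your own system before relying on it; the helper name is hypothetical.

```shell
# Sketch: print the most recent DGX OS version recorded in a release file.
# Assumptions: the file path, the KEY="value" layout, and the key names
# DGX_SWBUILD_VERSION and DGX_OTA_VERSION. dgx_os_version is hypothetical.
dgx_os_version() {
    local file="${1:-/etc/dgx-release}"
    # The last matching line wins, so an update record appended after the
    # image version takes precedence.
    sed -n -E 's/^DGX_(OTA|SWBUILD)_VERSION="?([^"]*)"?$/\2/p' "$file" | tail -n 1
}
```

Calling `dgx_os_version` with no argument reads the assumed default path; passing a path reads that file instead.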

Recovering from an Interrupted or Failed Update

If the script is interrupted during the update, such as from a loss of power or loss of network connection, then restore power or restore the network connection, whichever caused the interruption.

Updating from 3.1.x

See the section Connecting to the DGX Server Console for guidance on connecting to the console to perform the update.

Caution: These instructions update all software for which updates are available from your configured software sources, including applications that you installed yourself. If you want to prevent an application from being updated, you can instruct the Ubuntu package manager to keep the current version. For more information, see Introduction to Holding Packages on the Ubuntu Community Help Wiki.

Update Instructions
  1. If you have not already done so, verify that your DGX-1 system can access the public repositories as explained in Verifying the DGX Server Connection to the Repositories.
  2. Update the list of available packages and their versions.
    $ sudo apt update
  3. Install any updates.
    $ sudo apt -y full-upgrade
  4. Install dgx-release-upgrade.
    $ sudo apt install -y dgx-release-upgrade
  5. Begin the update process.
    $ sudo dgx-release-upgrade

    If you are using a proxy server, then add the -E option to sudo to keep your proxy environment variables.

    Example:

    $ sudo -E dgx-release-upgrade
  6. After starting the update process, respond to the presented options as follows:
    • Press y if you are logged in to the DGX server remotely through secure shell (SSH) and are asked if you want to continue running under SSH.

      Continue running under SSH?

      This session appears to be running under ssh. It is not recommended to perform a upgrade over ssh currently because in case of failure it is harder to recover.

      If you continue, an additional ssh daemon will be started at port '1022'.

      Do you want to continue?

      Continue [yN]

      An additional sshd daemon is started.

      Press Enter in response to the following message.

      Starting additional sshd

      To make recovery in case of failure easier, an additional sshd will be started on port '1022'. If anything goes wrong with the running ssh you can still connect to the additional one.

      If you run a firewall, you may need to temporarily open this port. As this is potentially dangerous it's not done automatically. You can open the port with e.g.:

      'iptables -I INPUT -p tcp --dport 1022 -j ACCEPT'

      To continue please press [ENTER]

    • Press Enter in response to the message warning you that third-party sources are disabled.

      Third party sources disabled

      Some third party entries in your sources.list were disabled. You can re-enable them after the upgrade with the 'software-properties' tool or your package manager.

      To continue please press [ENTER]

    • Press N if prompted about dgx.list configuration choices.

      Configuration file '/etc/apt/sources.list.d/dgx.list'

      ==> Modified (by you or by a script) since installation.

      ==> Package distributor has shipped an updated version.

      What would you like to do about it ? Your options are:

      • Y or I : install the package maintainer's version
      • N or O : keep your currently-installed version
      • D : show the differences between the versions
      • Z : start a shell to examine the situation

      The default action is to keep your current version.

      dgx.list (Y/I/N/O/D/Z) [default=N] ?

    • When prompted to resolve other configuration files, evaluate the changes before accepting the package maintainer's version, keeping the local version, or manually resolving the difference. You are also asked to confirm that you want to remove obsolete packages.
    • At the prompt to confirm starting the upgrade, press Y to begin.

      Do you want to start the upgrade?

      Installing the upgrade can take several hours. Once the download has finished, the process cannot be canceled.

      Continue [yN] Details [d]

    • Press Y to proceed with the final reboot.

      System upgrade is complete.

      Restart required

      To finish the upgrade, a restart is required.

      If you select 'y' the system will be restarted.

      Continue [yN]

      After this reboot, the update process will take several minutes to perform some final installation steps.

      Your system is now updated to the latest DGX OS 4 release.

    • (Optional) Follow the instructions at Updating from Release 4.1 and Later if you want to install the R450 driver package.
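Step 5 of this procedure uses sudo -E when a proxy is in use. The following is a minimal sketch that picks the form of the command based on whether a proxy variable is set in the current shell. The four variable names checked are the conventional ones and are an assumption, and the helper name is hypothetical. Note that sudo -E passes along only exported variables.

```shell
# Sketch: print the upgrade command with or without -E, depending on
# whether a proxy variable is set in the current shell.
# upgrade_command is a hypothetical helper name.
upgrade_command() {
    local var value
    for var in http_proxy https_proxy HTTP_PROXY HTTPS_PROXY; do
        # Look up the value of the variable whose name is stored in $var.
        eval "value=\${$var:-}"
        if [ -n "$value" ]; then
            echo "sudo -E dgx-release-upgrade"
            return 0
        fi
    done
    echo "sudo dgx-release-upgrade"
}
```

The function only prints the command, so the result can be reviewed before it is run.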

Known Issues

This chapter captures the issues related to the DGX OS software or DGX hardware at the time of the software release.

Known Software Issues

The following are known issues with the software.

DCGM Service Labelled as Deprecated

Issue

When querying the status of dcgm.service, it is reported as deprecated.

$ sudo systemctl status dcgm.service
dcgm.service  DEPRECATED. Please use nvidia-dcgm.service

Explanation

The message can be ignored. dcgm.service is, indeed, deprecated, but can still be used without issue. The name of the DCGM service is in the process of migrating from dcgm.service to nvidia-dcgm.service. During the transition, both are included in DCGM 2.2.8.

A later version of DGX OS 4 will enable nvidia-dcgm.service by default. You can enable nvidia-dcgm.service manually (even though there is no functional difference) as follows:

$ sudo systemctl stop dcgm.service
$ sudo systemctl disable dcgm.service
$ sudo systemctl start nvidia-dcgm.service
$ sudo systemctl enable nvidia-dcgm.service

NVSM May Raise 'md1 is corrupted' Alert

Issue

On a system where one OS drive is used for the EFI boot partition and one is used for the root file system (each configured as RAID 1), NVSM raises 'md1 is corrupted' alerts.

Explanation

The OS RAID 1 drives are running in a non-standard configuration, resulting in erroneous alert messages. If you alter the default configuration, you must let NVSM know so that the utility does not flag the configuration as an error, and so that NVSM can continue to monitor the health of the drives.

To configure NVSM to support a custom drive partitioning, perform the following.

  1. Stop NVSM services.
    $ sudo systemctl stop nvsm
  2. Edit /etc/nvsm/nvsm.config and set the "use_standard_config_storage" parameter to false.
    "use_standard_config_storage":false
  3. Remove the NVSM database.
    $ sudo rm /var/lib/nvsm/sqlite/nvsm.db
  4. Restart NVSM.
    $ sudo systemctl restart nvsm
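The edit in step 2 can also be made non-interactively. The following is a minimal sketch that assumes the key appears on a single line with a true or false value; the helper name is hypothetical, and a .bak copy of the original file is kept alongside it.

```shell
# Sketch: set "use_standard_config_storage" to false in an NVSM config file.
# Assumes the key and its true/false value sit on one line.
# disable_standard_storage_config is a hypothetical helper name.
disable_standard_storage_config() {
    local file="${1:-/etc/nvsm/nvsm.config}"
    # -i.bak edits in place and keeps the original as <file>.bak.
    sed -i.bak -E \
        's/("use_standard_config_storage"[[:space:]]*:[[:space:]]*)(true|false)/\1false/' \
        "$file"
}
```

On a real system the file is owned by root, so the sed command needs to run with sudo. Trying the function on a copy of the file first is a reasonable precaution.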

nvsm show health Reports Empty /proc/driver Folders

Issue

When issuing nvsm show health, the nvsmhealth_log.txt log file reports that the /proc/driver folders are empty.

Example from a DGX-1

2020-09-01 20:03:05,204 INFO: Found empty path glob "/proc/driver/nvidia/*/gpus/*/information"
2020-09-01 20:03:06,206 INFO: Found empty path glob "/proc/driver/nvidia/*/registry"
2020-09-01 20:03:09,742 INFO: Found empty path glob "/proc/driver/nvidia/*/params"
2020-09-01 20:03:10,743 INFO: Found empty path glob "/proc/driver/nvidia/*/registry"
2020-09-01 20:03:11,745 INFO: Found empty path glob "/proc/driver/nvidia/*/version"
2020-09-01 20:03:12,747 INFO: Found empty path glob "/proc/driver/nvidia/*/warnings/*"

Explanation

This is an erroneous message as the folder content is actually loaded during the software installation. The message can be ignored. This will be resolved in a future NVSM release.

NVSM Reports "Unknown" for Number of logical CPU cores on non-English system

Issue

On systems set up for a non-English locale, the nvsm show health command lists the number of logical CPU cores as Unknown.

Number of logical CPU cores [None] Unknown

Resolution

This issue will be resolved in a later version of the DGX OS software.

InfiniBand Bandwidth Drops for KVM Guest VMs

Issue

The InfiniBand bandwidth when running on multi-GPU guest VMs is lower than when running on bare metal.

Explanation

Currently, performance when using GPUDirect within a guest VM will be lower than when used on a bare-metal system.

Known DGX-2 System Issues

The following are known issues specific to the DGX-2 server.

DGX KVM: nvidia-vm health-check May Fail

Issue

When running nvidia-vm health-check to check the health of specific GPUs used by the DGX KVM guest VM, the command may fail.

Example:

$ sudo nvidia-vm health-check --gpu-count 1 --gpu-index 0 --fulltest run
ERROR: Unexpected response from blacklist "connection"
ERROR: Unexpected response from blacklist "to"
ERROR: Unexpected response from blacklist "the"
ERROR: Unexpected response from blacklist "host"
ERROR: Unexpected response from blacklist "engine"
ERROR: Unexpected response from blacklist "is"
ERROR: Unexpected response from blacklist "not"
ERROR: Unexpected response from blacklist "valid"
ERROR: Unexpected response from blacklist "any"
ERROR: Unexpected response from blacklist "longer"
ERROR: No healthy/unhealthy data returned from blacklist command

Explanation and Resolution

This occurs because the health-check VM is created from an image based on the DGX OS ISO, which uses the R418 driver package, but the host was updated to the R450 driver package. The two packages use different DCGM releases which cannot communicate with each other, resulting in the error.

Issue

If the GPU PCIe link is downgraded to Gen1, NVSM still reports the GPU health status as OK.

Explanation and Resolution

The NVSM software currently does not check for this condition. The check will be added in a future software release.

Known DGX-1 System Issues

The following are known issues specific to the DGX-1 server.

nvidia-nvswitch Version Mismatch Message Appears when Running DCGM

Issue

When starting the DCGM service, a version mismatch error message similar to the following will appear:

[78075.772392] nvidia-nvswitch: Version mismatch, kernel version 450.80.02 user version 450.51.06

Explanation

This occurs with GPU driver versions later than 450.51.06. The version check occurs on all DGX systems, but applies only to NVSwitch systems, so the message can be ignored on non-NVSwitch systems such as the DGX Station or DGX-1.

Forced Reboot Hangs the OS

Issue

When issuing reboot -f (forced reboot), I/O error messages appear on the console and then the system hangs.

The system reboots normally when issuing reboot.

Resolution

This issue will be resolved in a future version of the DGX OS server.

Known Issues Related to Ubuntu / Linux Kernel

The following are known issues related to the Ubuntu OS or the Linux kernel that affect the DGX server.

System May Slow Down When Using mpirun

Issue

Customers running Message Passing Interface (MPI) workloads may experience the OS becoming very slow to respond. When this occurs, a log message similar to the following appears in the kernel log:

kernel BUG at /build/linux-fQ94TU/linux-4.4.0/fs/ext4/inode.c:1899!

Explanation

Due to the current design of the Linux kernel, the condition may be triggered when get_user_pages is used on a file that is on persistent storage. For example, this can happen when cudaHostRegister is used on a file path that is stored in an ext4 filesystem. DGX systems implement /tmp on a persistent ext4 filesystem.

Workaround

Note: If you performed this workaround on a previous DGX OS software version, you do not need to do it again after updating to the latest DGX OS version.

In order to avoid using persistent storage, MPI can be configured to use shared memory at /dev/shm (this is a temporary filesystem).

If you are using Open MPI, then you can solve the issue by configuring the Modular Component Architecture (MCA) parameters so that mpirun uses the temporary file system in memory.

For details on how to accomplish this, see the Knowledge Base Article DGX System Slows Down When Using mpirun (requires login to the NVIDIA Enterprise Support portal).
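The Knowledge Base article is not reproduced here, so the following is only a sketch of the general idea. It assumes that orte_tmpdir_base, the MCA parameter that sets the session directory location in Open MPI 4.x and earlier, is the relevant setting; the article may name a different parameter. The helper name is hypothetical.

```shell
# Sketch: point Open MPI's session directory at a memory-backed filesystem.
# Assumption: orte_tmpdir_base (Open MPI 4.x and earlier) is the parameter
# to set. use_memory_tmpdir_for_openmpi is a hypothetical helper name.
use_memory_tmpdir_for_openmpi() {
    local dir="${1:-/dev/shm}"
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "not a writable directory: $dir" >&2
        return 1
    fi
    # Open MPI reads MCA parameters from OMPI_MCA_<name> environment variables.
    export OMPI_MCA_orte_tmpdir_base="$dir"
}
```

Under the same assumption, the setting can also be given per run on the mpirun command line as `--mca orte_tmpdir_base /dev/shm`.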

Known Limitations

This section lists known limitations and other issues that will not be fixed.

[DGX-2] srp_daemon Causes NVIDIA KVM Update Failure

Issue

When performing an over-the-network update on the NVIDIA KVM, the update fails with a “Package mlnx-ofed-all is not configured yet” message.

The issue does not occur if you have installed the DGX OS from the ISO.

Explanation

This issue is the result of the srp_daemon within the Mellanox driver. The daemon is used to discover and connect to InfiniBand SCSI RDMA Protocol (SRP) targets.

If you are not using RDMA, then disable the srp_daemon as follows.

$ sudo systemctl disable srp_daemon.service
$ sudo systemctl disable srptools.service

[DGX-2] Hot-plugging of Storage NVMe Drives is not Supported

Issue

Hot-plugging or hot-swapping one of the storage non-volatile memory express (NVMe) drives might result in system instability or incorrect device reporting.

Workaround and Resolution

Turn off the system before removing and replacing any of the storage NVMe drives.

[DGX-2] Serial Over LAN Does not Work After Cold Resetting the BMC

Issue

After performing a cold reset on the BMC (ipmitool mc reset cold) while serial over LAN (SOL) is active, you cannot restart a SOL session.

Workaround

To re-activate SOL, either:

c) Identify the Process ID of the SOL TTY process by running the following.

$ ps -ef | grep "/sbin/agetty -o -p \u --keep-baud 115200,38400,9600 ttyS0 vt220"

d) Kill the process.

$ kill <PID>

where <PID> is the Process ID returned by the previous command.

e) Either wait for the cron job to respawn the process or manually restart the process by running the following.

$ /sbin/agetty -o -p \u --keep-baud 115200,38400,9600 ttyS0 vt220

Issue

On the BMC dashboard, the following Quick Links appear by mistake and should not be used.

[DGX-2] Applications Cannot be Run Immediately Upon Powering on the DGX-2

Issue

When attempting to run an application that uses the GPUs immediately upon powering on the DGX-2 system, you may encounter the following error.

CUDA_ERROR_SYSTEM_NOT_READY

Explanation and Workaround

The DGX-2 uses a fabric manager service to manage communication between all the GPUs in the system. When the DGX-2 system is powered on, the fabric manager initializes all the GPUs. This can take approximately 45 seconds. Until the GPUs are initialized, applications that attempt to use them will fail.

If you encounter the error, wait and launch the application again.
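The wait-and-retry advice can be automated in a launch script. The following is a minimal sketch of a generic retry loop; the helper name is hypothetical, and the attempt count and delay in the usage note are illustrative values chosen to cover the approximately 45 seconds mentioned above, not NVIDIA recommendations.

```shell
# Sketch: retry a command until it succeeds, pausing between attempts.
# Usage: retry_until_ready <attempts> <delay_seconds> <command> [args...]
# retry_until_ready is a hypothetical helper name.
retry_until_ready() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Pause only when another attempt remains.
        if [ "$i" -lt "$attempts" ]; then
            echo "attempt $i of $attempts failed; retrying in ${delay}s" >&2
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For example, `retry_until_ready 6 10 ./my_app` (where ./my_app is a placeholder for your GPU application) tries up to six times with ten seconds between attempts. The loop retries on any failure, not only on CUDA_ERROR_SYSTEM_NOT_READY, so it suits applications that are otherwise expected to start cleanly.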

[DGX-2] PKCS Errors Appear When the System Boots

Issue

When the DGX system boots, "PKCS#7 signature not signed with a trusted key" messages appear on the console and in the system logs.

Explanation

DGX OS Server installs Ubuntu 18.04, which checks all kernel modules for signatures even though Secure Boot is not enabled. Since the NVIDIA drivers are not part of the Ubuntu kernel, the drivers will be flagged with the message when the system boots. This does not affect the system nor indicate a problem with system software.

[DGX-2 KVM] Logfile Setup Error When Creating a VM

Issue

The following error may appear while creating a VM:

..Error setting up logfile: No write access to directory /home/$USER/.cache/virt-manager

Workaround

To avoid the error, remove the /home/$USER/.cache/virt-manager directory after installing KVM packages or before running the first nvidia-vm command.

[DGX-2 KVM] nvidia-vm vmshow Command Does Not Work for Running VMs

Issue

When running nvidia-vm vmshow, the information for running guest VMs is reported as "Unknown".

[DGX-1] Script Cannot Recreate RAID Array After Re-inserting a Known Good SSD

Issue

When a good SSD is removed from the DGX-1 RAID 0 array and then re-inserted, the script to recreate the array fails.

Explanation and Workaround

After re-inserting the SSD into the system, the RAID controller sets the array to offline and marks the re-inserted SSD as Unconfigured_Bad (UBad). The script will fail when attempting to rebuild an array when one or more of the SSDs are marked UBad.

To recreate the array in this case, perform the following.

  1. Set the drive back to a good state.
    $ sudo /opt/MegaRAID/storcli/storcli /c0/e<enclosure_id>/s<drive_slot> set good
  2. Run the script to recreate the array.
    $ sudo /usr/bin/configure_raid_array.py -c -f

Appendix A. Third Party License Notice

This NVIDIA product contains third party software that is being made available to you under their respective open source software licenses. Some of those licenses also require specific legal information to be included in the product. This section provides such information.

msecli

The msecli utility (https://www.micron.com/products/solid-state-storage/storage-executive-software) is provided under the following terms:

Micron Technology, Inc. Software License Agreement

PLEASE READ THIS LICENSE AGREEMENT ("AGREEMENT") FROM MICRON TECHNOLOGY, INC. ("MTI") CAREFULLY: BY INSTALLING, COPYING OR OTHERWISE USING THIS SOFTWARE AND ANY RELATED PRINTED MATERIALS ("SOFTWARE"), YOU ARE ACCEPTING AND AGREEING TO THE TERMS OF THIS AGREEMENT. IF YOU DO NOT AGREE WITH THE TERMS OF THIS AGREEMENT, DO NOT INSTALL THE SOFTWARE.

LICENSE: MTI hereby grants to you the following rights: You may use and make one (1) backup copy the Software subject to the terms of this Agreement.

You must maintain all copyright notices on all copies of the Software.

You agree not to modify, adapt, decompile, reverse engineer, disassemble, or otherwise translate the Software. MTI may make changes to the Software at any time without notice to you.

In addition MTI is under no obligation whatsoever to update, maintain, or provide new versions or other support for the Software.

OWNERSHIP OF MATERIALS: You acknowledge and agree that the Software is proprietary property of MTI (and/or its licensors) and is protected by United States copyright law and international treaty provisions. Except as expressly provided herein, MTI does not grant any express or implied right to you under any patents, copyrights, trademarks, or trade secret information. You further acknowledge and agree that all right, title, and interest in and to the Software, including associated proprietary rights, are and shall remain with MTI (and/or its licensors). This Agreement does not convey to you an interest in or to the Software, but only a limited right to use and copy the Software in accordance with the terms of this Agreement. The Software is licensed to you and not sold.

DISCLAIMER OF WARRANTY: THE SOFTWARE IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. MTI EXPRESSLY DISCLAIMS ALL WARRANTIES EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, NONINFRINGEMENT OF THIRD PARTY RIGHTS, AND ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. MTI DOES NOT WARRANT THAT THE SOFTWARE WILL MEET YOUR REQUIREMENTS, OR THAT THE OPERATION OF THE SOFTWARE WILL BE UNINTERRUPTED OR ERROR-FREE. FURTHERMORE, MTI DOES NOT MAKE ANY REPRESENTATIONS REGARDING THE USE OR THE RESULTS OF THE USE OF THE SOFTWARE IN TERMS OF ITS CORRECTNESS, ACCURACY, RELIABILITY, OR OTHERWISE. THE ENTIRE RISK ARISING OUT OF USE OR PERFORMANCE OF THE SOFTWARE REMAINS WITH YOU. IN NO EVENT SHALL MTI, ITS AFFILIATED COMPANIES OR THEIR SUPPLIERS BE LIABLE FOR ANY DIRECT, INDIRECT, CONSEQUENTIAL, INCIDENTAL, OR SPECIAL DAMAGES (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF PROFITS, BUSINESS INTERRUPTION, OR LOSS OF INFORMATION) ARISING OUT OF YOUR USE OF OR INABILITY TO USE THE SOFTWARE, EVEN IF MTI HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Because some jurisdictions prohibit the exclusion or limitation of liability for consequential or incidental damages, the above limitation may not apply to you.

TERMINATION OF THIS LICENSE: MTI may terminate this license at any time if you are in breach of any of the terms of this Agreement. Upon termination, you will immediately destroy all copies the Software.

GENERAL: This Agreement constitutes the entire agreement between MTI and you regarding the subject matter hereof and supersedes all previous oral or written communications between the parties. This Agreement shall be governed by the laws of the State of Idaho without regard to its conflict of laws rules.

CONTACT: If you have any questions about the terms of this Agreement, please contact MTI's legal department at (208) 368-4500.

By proceeding with the installation of the Software, you agree to the terms of this Agreement. You must agree to the terms in order to install and use the Software.

Mellanox (OFED)

MLNX OFED (http://www.mellanox.com/) is provided under the following terms:

Copyright (c) 2006 Mellanox Technologies. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.

IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
