MT25408 ConnectX® Firmware fw-25408 Release Notes
Revision: 2.7.000
1 Overview
These are the release notes for fw-25408 Rev 2.7.000, the firmware for ConnectX® and ConnectX® EN adapters. This firmware supports the following protocols:
- InfiniBand
- Ethernet
- Fibre Channel over Ethernet (FCoE)
- Virtual Protocol Interconnect (VPI) – this capability enables ConnectX and ConnectX-2 devices to support the InfiniBand, Ethernet and DCE network standards, including auto-sensing of the network protocol to which each device port is connected.
This firmware supports the devices and protocols listed in Table 1. For the most up-to-date list of supported adapter cards, visit the firmware download pages at www.mellanox.com.
Note: After burning new firmware to an adapter card, reboot the machine so that the new firmware takes effect. If you do not reboot, the RUN_FW command returns an error.
Table 1: PCI Device ID
PCI Device ID (Decimal) | Device Part Number | Device Name | Supported Protocols |
---|---|---|---|
25408 | MT25408A0-FCC-SI | ConnectX, Dual Port 10Gb/s InfiniBand / 10GigE Adapter IC with PCIe 2.0 x8 2.5GT/s Interface | InfiniBand, Ethernet, FCoE, VPI |
25418 | MT25408A0-FCC-DI | ConnectX, Dual Port 20Gb/s InfiniBand / 10GigE Adapter IC with PCIe 2.0 x8 2.5GT/s Interface | InfiniBand, Ethernet, FCoE, VPI |
26418 | MT25408A0-FCC-GI | ConnectX, Dual Port 20Gb/s InfiniBand / 10GigE Adapter IC with PCIe 2.0 x8 5.0GT/s Interface | InfiniBand, Ethernet, FCoE, VPI |
26428 | MT25408A0-FCC-QI | ConnectX, Dual Port 40Gb/s InfiniBand / 10GigE Adapter IC with PCIe 2.0 x8 5.0GT/s Interface | InfiniBand, Ethernet, FCoE, VPI |
25448 | MT25448A0-FCC-SE | ConnectX EN, Dual Port 10GigE Adapter IC with PCIe 2.0 x8 2.5GT/s Interface | Ethernet |
26448 | MT26448A0-FCC-TE | ConnectX EN, Dual Port 10GigE Adapter IC with PCIe 2.0 x8 2.5GT/s Interface | Ethernet |
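Table 1 lists device IDs in decimal, while tools such as `lspci -n` print PCI IDs as hexadecimal `vendor:device` pairs. The minimal Python sketch below converts between the two so that an installed adapter can be matched against the table. It assumes the Mellanox PCI vendor ID 0x15b3; the helper names are illustrative only and are not part of any Mellanox tool.

```python
from typing import Optional

# Device IDs (decimal) and part numbers, copied from Table 1.
CONNECTX_DEVICE_IDS = {
    25408: "MT25408A0-FCC-SI",
    25418: "MT25408A0-FCC-DI",
    26418: "MT25408A0-FCC-GI",
    26428: "MT25408A0-FCC-QI",
    25448: "MT25448A0-FCC-SE",
    26448: "MT26448A0-FCC-TE",
}

# PCI vendor ID assigned to Mellanox Technologies.
MELLANOX_VENDOR_ID = 0x15B3


def pci_id_string(device_id: int) -> str:
    """Format a decimal device ID as the hex 'vendor:device' pair printed by lspci -n."""
    return f"{MELLANOX_VENDOR_ID:04x}:{device_id:04x}"


def part_number_for(hex_device_id: str) -> Optional[str]:
    """Map a hex device ID such as '6340' back to its Table 1 part number, if listed."""
    return CONNECTX_DEVICE_IDS.get(int(hex_device_id, 16))


if __name__ == "__main__":
    # Print every Table 1 entry in the hex form used by lspci -n.
    for dev_id, part in CONNECTX_DEVICE_IDS.items():
        print(pci_id_string(dev_id), part)
```

For example, `pci_id_string(25408)` returns `"15b3:6340"`, and `part_number_for("673c")` returns `"MT25408A0-FCC-QI"`.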
The document consists of the following sections:
- "Revision Compatibility"
- "Changes & Major New Features"
- "Bug Fixes"
- "Known Issues"
- "Creating a Device Configuration (.ini) File"
- "History of Fixed Issues"
2 Revision Compatibility
Firmware fw-25408 Rev 2.7.000 complies with the following programmer's reference manual:
- ConnectX Programmer's Reference Manual (PRM), Rev 0.39 or later, which has Command Interface Revision 0x3. The command interface revision is reported by the QUERY_FW command in the cmd_interface_rev field.
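A driver written against this PRM revision may want to stop early if QUERY_FW reports a different command interface revision. The sketch below assumes the caller has already issued QUERY_FW and extracted cmd_interface_rev; how that is done is driver-specific and not shown. The function and exception names are hypothetical, and the exact-match policy is an assumption (a driver could instead accept a range of revisions).

```python
# Command interface revision documented for fw-25408 Rev 2.7.000
# (ConnectX PRM Rev 0.39 or later).
EXPECTED_CMD_INTERFACE_REV = 0x3


class FirmwareInterfaceMismatch(RuntimeError):
    """Raised when the firmware reports an unexpected command interface revision."""


def check_cmd_interface_rev(cmd_interface_rev: int) -> None:
    """Validate the cmd_interface_rev field decoded from the QUERY_FW output.

    Hypothetical helper: issuing QUERY_FW and decoding its output
    is driver-specific and is not shown here.
    """
    if cmd_interface_rev != EXPECTED_CMD_INTERFACE_REV:
        raise FirmwareInterfaceMismatch(
            f"firmware reports cmd_interface_rev 0x{cmd_interface_rev:x}, "
            f"driver expects 0x{EXPECTED_CMD_INTERFACE_REV:x}"
        )
```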
3 Changes & Major New Features
3.1 Changes From Rev 2.6.900
- Support for extended port counters
- QUERY_DEV_CAP.local_ack_timeout is now taken from the .ini file
- Support for Ethernet loopback
- Support for ModStatCfg opcode modifiers 2 and 3 (query)
- Support for ModStatCfg.mac
- Bug fixes
3.2 Changes in Rev 2.6.900 From Rev 2.6.000
- Added the field QUERY_PORT.port_type
- Bug fixes
4 Bug Fixes
The following table lists issues from previous releases of ConnectX® firmware that were fixed in this firmware release.
Table 2: Bug Fixes
Issue | Description | Discovered in | Fixed in |
---|---|---|---|
1. InfiniBand auto-negotiation issues | Fixed | 2.6.900 | 2.7.000 |
5 Known Issues
The following table describes known issues in this firmware release and possible workarounds.
Table 3: Known Issues
Index | Issue | Description | Current Implemented Workaround in FW | Possible Workaround | Scheduled Release (fix) |
---|---|---|---|---|---|
1. | UAR BAR is too small for machines with 64KB pages | The small BAR causes driver loading to fail | N/A | Change the "log2_uar_bar_megabytes" .ini parameter under the [HCA] section as follows: log2_uar_bar_megabytes = 5 | N/A |
2. | Change of memory BARs on a disabled system | Changing memory BAR sizes or addresses between SYS_DIS and SYS_EN may cause the device to hang (ID: 24206) | N/A | N/A | N/A |
3. | BAR resizing on an enabled system | Changing BAR sizes while the system is enabled may cause the device to hang (ID: 24208) | N/A | N/A | N/A |
4. | Ethernet only: Must query all capabilities upon boot | If not all capabilities are queried upon boot, the query command may fail. See the QUERY_CAP command in the ConnectX EN Programmer's Reference Manual | N/A | Query all capabilities upon boot | N/A |
5. | Disrupting QDR negotiation may cause the port to come up as SDR | Disconnecting an IB cable (or closing the port) during QDR negotiation and then reconnecting it (or reopening the port) may cause the adapter to bring up the port at SDR | N/A | Disconnect the cable (or close the port) again and then reconnect it (or reopen the port). To avoid this scenario, wait for QDR negotiation to finish before disconnecting the cable (or closing the port) and reconnecting it (or reopening the port). Two possible ways to verify that QDR negotiation is complete: a. The physical (green) LED is on. b. A GetPortInfo MAD query reports LinkPhyState as LinkUp. | N/A |
6. | Ethernet link issues with back-to-back setting on 5m Twinax cables | | | | |
7. | InfiniBand static rate is not supported | | | | |
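Known issue 1 in Table 3 can be reasoned about with simple arithmetic. The sketch below rests on two assumptions that this document does not state: that, as the parameter name suggests, the UAR BAR size is 2^log2_uar_bar_megabytes MB, and that the driver uses one UAR per system page. Under those assumptions, a BAR of a given size yields 16 times fewer UARs on a 64KB-page machine than on a 4KB-page machine. The helper names are hypothetical.

```python
MIB = 1 << 20


def uar_bar_bytes(log2_uar_bar_megabytes: int) -> int:
    """BAR size in bytes, assuming the parameter is log2 of the size in MB."""
    return (1 << log2_uar_bar_megabytes) * MIB


def uar_pages(log2_uar_bar_megabytes: int, page_size: int) -> int:
    """Number of UARs that fit, assuming one UAR per system page."""
    return uar_bar_bytes(log2_uar_bar_megabytes) // page_size


def ini_fragment(log2_uar_bar_megabytes: int) -> str:
    """The .ini lines for the workaround described in known issue 1."""
    return f"[HCA]\nlog2_uar_bar_megabytes = {log2_uar_bar_megabytes}\n"


if __name__ == "__main__":
    # Compare 4KB and 64KB system pages for the recommended setting of 5.
    for page_size in (4 * 1024, 64 * 1024):
        print(f"page size {page_size // 1024} KB: {uar_pages(5, page_size)} UARs")
    print(ini_fragment(5), end="")
```

With log2_uar_bar_megabytes = 5 the BAR is 32 MB, which under these assumptions holds 512 UARs with 64KB pages versus 8192 with 4KB pages.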
6 Creating a Device Configuration (.ini) File
Mellanox firmware burning tools allow configuration variables to be set or changed through an optional configuration (.ini) file. This is needed when the default values of some variables do not suit the requirements of a specific system. This section describes how to create this configuration file.
The .ini file is a text file composed of one or more configuration sections (see Section 6.1 for the format and an example). It is recommended to include, under the appropriate sections, only those variables that need to be changed.
A firmware release includes a reference file called fw-25408-defaults.ref. This file lists all variables that can be configured by a configuration (.ini) file. For each variable, the reference file includes a short explanation, the [section] under which the variable belongs, and its default value.
To create the .ini file, copy the lines with the variables you wish to set, paste them under their appropriate [section] headings in the .ini file, and change their values as needed.
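The copy-and-paste procedure above can also be scripted. The minimal Python sketch below keeps only the variables you want to change and groups them under their sections. It assumes that every variable in the reference file is preceded by a comment of the form `;;;;; Under [SECTION] section`, as in the fw-25408-defaults.ref excerpt in Section 6.1; reference files laid out differently are not handled. `build_ini` is a hypothetical helper, not part of the Mellanox tools.

```python
import re
from typing import Dict, List, Optional

# Assumed .ref layout (based on the excerpt in Section 6.1): each variable is
# preceded by a comment line such as ";;;;; Under [ADAPTER] section".
SECTION_HINT = re.compile(r"^;+\s*Under \[(\w+)\] section", re.IGNORECASE)
ASSIGNMENT = re.compile(r"^(\w+)\s*=\s*(.*)$")


def build_ini(ref_text: str, overrides: Dict[str, str]) -> str:
    """Return .ini text containing only the overridden variables, grouped by section."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in ref_text.splitlines():
        line = raw.strip()
        hint = SECTION_HINT.match(line)
        if hint:
            current = hint.group(1)
            continue
        assignment = ASSIGNMENT.match(line)
        if assignment and assignment.group(1) in overrides:
            name = assignment.group(1)
            if current is None:
                raise ValueError(f"no section hint found before {name}")
            sections.setdefault(current, []).append(f"{name} = {overrides[name]}")
    lines: List[str] = []
    for section, settings in sections.items():
        lines.append(f"[{section}]")
        lines.extend(settings)
    return "\n".join(lines) + "\n" if lines else ""


REF_EXCERPT = """\
;;;;; VPD support can be Disabled/Enabled
;;;;; Under [ADAPTER] section
;;;;; Boolean parameter. Possible values: true, false
vpd_enable = true
"""

if __name__ == "__main__":
    print(build_ini(REF_EXCERPT, {"vpd_enable": "false"}), end="")
```

Variables that are not overridden are left out, which follows the recommendation to include only the variables that need to be changed.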
6.1 Configuration (.ini) File Format
The .ini file is composed of one or more sections with variable settings. Each section in the file starts with its name between square brackets, e.g. [ADAPTER], [HCA], [IB], etc. The section name is followed by one or more lines of configuration settings and comments, as in the .ini file example shown below. Note that comment lines start with a semicolon.
Excerpt from fw-25408-defaults.ref:
;;;;; VPD support can be Disabled/Enabled
;;;;; Under [ADAPTER] section
;;;;; Boolean parameter. Possible values: true, false
vpd_enable = true
Example of a .ini file:
;Begin of .ini file
[ADAPTER]
vpd_enable = false
;This is a comment line
;End of .ini file
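Before burning, it can help to check that a hand-written .ini file parses as intended. Python's standard `configparser` module accepts this layout (section names in square brackets, `name = value` lines, and full-line comments starting with `;`), so the sketch below uses it as a stand-in. It is not the parser used by the Mellanox burning tools, which may be stricter or more lenient.

```python
import configparser

# The example .ini file from this section.
EXAMPLE_INI = """\
;Begin of .ini file
[ADAPTER]
vpd_enable = false
;This is a comment line
;End of .ini file
"""


def read_fw_ini(text: str) -> configparser.ConfigParser:
    """Parse .ini text; full-line ';' and '#' comments are skipped by default."""
    # interpolation=None keeps characters such as '%' literal in values.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


if __name__ == "__main__":
    cfg = read_fw_ini(EXAMPLE_INI)
    print(cfg.sections())
    print(cfg.getboolean("ADAPTER", "vpd_enable"))
```

Note that `configparser` lowercases option names by default, which is harmless for lowercase variables such as vpd_enable.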
7 History of Fixed Issues
Table 4: History of Bug Fixes
Issue | Description | Discovered in | Fixed in |
---|---|---|---|
1. MOD_STAT_CFG.port_en command does not apply to Ethernet | Fixed | 2.6.000 | 2.6.900 |
2. Mellanox auto-negotiation is not optimized for environments with significant crosstalk | Fixed | 2.6.000 | 2.6.900 |
3. CQ/EQ doorbell is lost in Configuration cycles | Fixed | 2.6.000 | 2.6.900 |
4. Congestion issues | Fixed | 2.6.000 | 2.6.900 |
5. SW2HW_MPT command may fail when MPT is in FREE state (used for FRWR) | Fixed | 2.6.000 | 2.6.900 |
6. Adapter may get stuck upon multicast traffic stress | Fixed | 2.6.000 | 2.6.900 |
7. IB Spec Release 1.2 link data-rate issues | Fixed | 2.6.000 | 2.6.900 |
8. Adapter may get stuck if QPC extended auxiliary table base address is not aligned to 1MB | Fixed | 2.6.000 | 2.6.900 |
9. Port mod_stat_cfg.disable does not take effect if the device is configured as an Ethernet device | Fixed | 2.6.000 | 2.6.300 |
10. HCA might hang when entering PCIe L1 mode | Fixed | 2.6.000 | 2.6.200 |
11. Link may come up as SDR instead of DDR on some systems | Fixed | 2.5.000 | 2.6.000 |
12. Possible livelock in QP upon retransmission stress | Fixed (ID: 49870, 52066) | 2.5.000 | 2.6.000 |
13. Wrong link state reported during link speed negotiation | Fixed (ID: 49951) | 2.5.000 | 2.6.000 |
14. CQEs may be generated after a CQ overrun error | Fixed (ID: 49982) | 2.5.000 | 2.6.000 |
15. Slow handling of configuration cycles | Fixed (ID: 49807) | 2.5.000 | 2.6.000 |
16. Wrong fields in CQE-w-Error on XRC QP | Fixed (ID: 49742) | 2.5.000 | 2.6.000 |
17. Wrong handling of Remote Invalidate Error | Fixed | 2.5.000 | 2.6.000 |
18. Multiple RNR Nack may cause slowdown | Fixed (ID: 49559) | 2.5.000 | 2.6.000 |
19. QUERY_DEV_CAP.apm bit was fixed at 0 even though APM was active | Fixed (ID: 49548) | 2.5.000 | 2.6.000 |
20. PCIe physical errors upon entering L1 state | Fixed (ID: 52025) | 2.5.000 | 2.6.000 |
21. PCI_CFG.interrupt_disable has no impact | Fixed (ID: 53350) | 2.5.000 | 2.6.000 |
22. Non-default setting of VLCap via .ini does not take effect | Fixed | 2.5.000 | 2.6.000 |
23. SET_PORT may lead to non-optimal RX buffer reallocation if opvl is less than vlcap | Fixed | 2.5.000 | 2.6.000 |
24. Modified PLL parameter settings in .ini | Some PLL parameter settings were changed to allow a longer period for PLL stabilization | 2.5.000 | 2.6.000 |
25. Adapter may generate PCIe transactions with wrong function ID | Fixed | 2.5.000 | 2.6.000 |
26. Adapter may generate PCIe ERR_NON_FATAL in case of an unsupported request | Fixed | 2.5.000 | 2.6.000 |
27. Ethernet only: Different VLAN priorities for WQE and QP may cause wrong SchedQ allocation | Fixed | 2.5.000 | 2.6.000 |
28. Wrong VLAN priority in PPP mode (ID: 49533) | The ConnectX device may send pause frames for the wrong priority | 2.5.000 | 2.5.900 |
29. Wrong aliasing in address steering mode | This issue applies only to the VMware® Infrastructure 3 v3.5 operating system in netq mode | 2.5.000 | 2.5.900 |
30. Possible packet drops even though the pause policy is set | Fixed | 2.5.000 | 2.5.900 |
31. Wrong PCI Class Code for Ethernet Network Controller | This issue applies to Ethernet devices only. Fixed. | 2.5.000 | 2.5.900 |
32. Bringing up/down of an adapter port may lead to PHY errors on the second adapter port | Applies only to dual-port adapter cards. Fixed (ID: 51356) | 2.5.000 | 2.5.900 |
33. PPP does not work for an MTU of 9600 | Fixed | 2.5.000 | 2.5.900 |
34. PCI Express compliance issues | • Fixed L1 and L0s power state compliance issues • Fixed PCIE-CV test completion_timeout failure • Fixed interoperability issue with all available PCIe Gen. 2.0 servers (Ref. ID: 43852) | 2.3.000 | 2.5.000 |
35. INTA may be lost under stress | Fixed. (Ref. ID: 44473) | 2.3.000 | 2.5.000 |
36. Modifying SRQ number using RTS2RTS | Modifying SRQ number using RTS2RTS does not guarantee that no new CQEs will be generated using the old SRQ number. Fixed. | 2.3.000 | 2.5.000 |
37. QP may get stuck upon Responder Gather Error | Fixed. | 2.3.000 | 2.5.000 |
38. Wrong handling of SL mismatch between WQE and MLX QP | An SL mismatch between WQE and MLX QP may cause the QP to get stuck. Fixed. | 2.3.000 | 2.5.000 |
39. UC QP CQE with Error causes corruption | Fixed. | 2.3.000 | 2.5.000 |
40. Query_CQ/Query_EQ commands may return the old consumer_index | Fixed. | 2.3.000 | 2.5.000 |
41. CQ error may cause corruption | A CQ error such as an overrun may cause CQ corruption, leading to a wrong CQ number in the CQ error event or to an internal FW error. Fixed. | 2.3.000 | 2.5.000 |
42. Possible FW internal error on a very noisy link | Fixed. (Ref. ID: 41526) | 2.3.000 | 2.5.000 |
43. QueryDebugMSG command returns wrong status | Fixed. (Ref. ID: 44744) | 2.3.000 | 2.5.000 |
44. Dropping a ReadResponse packet may lead to 'retry exceeded' | Fixed. | 2.3.000 | 2.5.000 |
45. CQ moderation parameters are wrongly configured | Fixed. (Ref. ID: 45570) | 2.3.000 | 2.5.000 |
46. False generation of CQE with error (vendor code 0x6f) upon large stress | Fixed. (Ref. ID: 45317) | 2.3.000 | 2.5.000 |
47. Bandwidth degradation if SetPort command is not called | Fixed. | 2.3.000 | 2.5.000 |
48. SQERR2RTS command followed by an error leaves the QP non-functional | Fixed. (Ref. ID: 45828, 45848) | 2.3.000 | 2.5.000 |
49. QUERY_FW fails after RUN_FW | The command QUERY_FW fails after running the RUN_FW command | 2.2.000 | 2.3.000 |
50. HCA stall | The HCA might stall in any of the following scenarios: • If running the command SET_DEBUG_MESSAGE (ID: 42128) • Under large stress (ID: 43385, 43378) • Upon closing a large number of QPs (ID: 43697) • If the WQE SL is different from the QP Context SL in a UD QP (ID: 41423) • Upon multiple retransmissions | 2.2.000 | 2.3.000 |
51. QUERY_QP errors | The QUERY_QP command behaves incorrectly in the following cases: • Returns wrong values (ID: 42078, 40707) • Enters the error state erroneously (ID: 43110) | 2.2.000 | 2.3.000 |
52. IB & PCI Express links quality | General improvements | 2.2.000 | 2.3.000 |
53. Incomplete support for PCI Express 2.0 configuration header | Fixed | 2.2.000 | 2.3.000 |
54. Wrong trap generation rate | The HCA might exceed the maximum trap generation rate upon processing different trap types | 2.2.000 | 2.3.000 |
55. Client Reregister event not generated | The HCA might fail to generate a Client Reregister event under large stress. (ID: 42232) | 2.2.000 | 2.3.000 |
56. Possible ICM corruption | Possible ICM (Interconnect Context Memory) corruption upon large stress (ID: 42529) | 2.2.000 | 2.3.000 |
57. Performance | HCA performance improvements for the following cases: • Upon receiving multiple ACK packets • Upon multiple QPs in error state (ID: 43377) • Upon multiple RNR NACKs | 2.2.000 | 2.3.000 |
58. Wrong wqe_index in Receive CQE with Error | This can occur when running IPoIB CM stress tests. (ID: 43076) | 2.2.000 | 2.3.000 |
59. Possible multicast corruption | Fixed (ID: 43301) | 2.2.000 | 2.3.000 |
60. Wrong limit on number of supported EQ UARs | The HCA now supports the requested number of EQ UARs specified in INIT_HCA | 2.2.000 | 2.3.000 |
61. SchedQueue corruption | Fixed (ID: 43289) | 2.2.000 | 2.3.000 |
62. Wrong SL2VL mapping upon set_sl2vl | Fixed | 2.2.000 | 2.3.000 |
63. False MAD packet drops | The HCA might drop MAD packets erroneously under large stress | 2.2.000 | 2.3.000 |
64. PCI Express 2.0 x1 link fails to rise | Fixed | 2.2.000 | 2.3.000 |
65. Command timeouts | The HCA times out commands while closing multiple QPs | 2.2.000 | 2.3.000 |
66. False internal error generation | Fixed | 2.2.000 | 2.3.000 |
67. Transport timeouts | Multiple RNR NACKs may lead to transport timeouts (ID: 44160) | 2.2.000 | 2.3.000 |
68. Opcode/Input Modifier verification | Command Opcode/Input Modifier values are now checked for correctness. If a wrong value is provided, the command status indicates the error. | 2.2.000 | 2.3.000 |
69. Wrong sl and/or port number returned | The QUERY_QP command may return a wrong sl value and/or a wrong port number (ID: 40707) | 2.1.000 | 2.2.000 |
70. HCA stall | The HCA might stall upon stress involving RNR Nacks and RDMA reads (ID: 41918) | 2.1.000 | 2.2.000 |
71. QP corruption | QP corruption may occur following a CQ_overrun | 2.1.000 | 2.2.000 |
72. Sched Queue corruption | Sched Queue corruption may occur upon multiple re-transmissions | 2.1.000 | 2.2.000 |
73. False SRQ WQE limit event | A false SRQ WQE limit event is generated due to a race condition | 2.0.164 | 2.1.000 |
74. Wrong Dt value returned | The QUERY_FW command may return a wrong Dt value | 2.0.164 | 2.1.000 |
75. HCA hangs | The device hangs in one of the following cases: • upon retry - due to local_ack_timeout • upon retry - due to RNR Nack • upon ringing a CQ doorbell for an invalid QP • upon stress conditions (IDs: 41543,732/6,755,778) | 2.0.164 | 2.1.000 |
76. High ACK latency | Delays in ACK may cause multiple local ACK timeouts | 2.0.164 | 2.1.000 |
77. HCA performance | HCA performance may be impacted in the following conditions: • QPs in error state • Slow QP context handling | 2.0.164 | 2.1.000 |
78. IB link stability issues | | 2.0.164 | 2.1.000 |
79. High QP closing duration | Closing QPs with outstanding posted WQs may take a long time due to slow CQE with error generation | 2.0.164 | 2.1.000 |