
intel Native Loopback Accelerator Functional Unit (AFU)


About this Document

Conventions
Table 1. Document Conventions

Convention | Description
# | Precedes a command to indicate that the command is to be entered as root.
$ | Indicates that a command is to be entered as a user.
This font | Filenames, commands, and keywords are printed in this font. Long command lines are printed in this font. Although long command lines may wrap to the next line, the return is not part of the command; do not press Enter.
<variable_name> | Indicates that the placeholder text between the angle brackets must be replaced with an appropriate value. Do not enter the angle brackets.

Acronyms
Table 2. Acronyms

Acronym | Expansion | Description
AF | Accelerator Function | Compiled hardware accelerator image implemented in FPGA logic that accelerates an application.
AFU | Accelerator Functional Unit | Hardware accelerator implemented in FPGA logic that offloads a computational operation for an application from the CPU to improve performance.
API | Application Programming Interface | A set of subroutine definitions, protocols, and tools for building software applications.
ASE | AFU Simulation Environment | Co-simulation environment that allows you to use the same host application and AF in a simulation environment. ASE is part of the Intel® Acceleration Stack for FPGAs.
CCI-P | Core Cache Interface | CCI-P is the standard interface AFUs use to communicate with the host.
CL | Cache Line | 64-byte cache line.
DFH | Device Feature Header | Creates a linked list of feature headers to provide an extensible way of adding features.
FIM | FPGA Interface Manager | The FPGA hardware containing the FPGA Interface Unit (FIU) and external interfaces for memory, networking, etc. The Accelerator Function (AF) interfaces with the FIM at run time.
FIU | FPGA Interface Unit | FIU is a platform interface layer that acts as a bridge between platform interfaces such as PCIe* and UPI and AFU-side interfaces such as CCI-P.

Intel Corporation. All rights reserved. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Intel warrants performance of its FPGA and semiconductor products to current specifications in accordance with Intel’s standard warranty, but reserves the right to make changes to any products and services at any time without notice. Intel assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Intel. Intel customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services. *Other names and brands may be claimed as the property of others.

Acronym | Expansion | Description
MPF | Memory Properties Factory | The MPF is a Basic Building Block (BBB) that AFUs can use to provide CCI-P traffic shaping operations for transactions with the FIU.
Msg | Message | A control notification.
NLB | Native Loopback | The NLB performs reads and writes to the CCI-P link to test connectivity and throughput.
RdLine_I | Read Line Invalid | Memory read request with the FPGA cache hint set to Invalid. The line is not cached in the FPGA, but the request may cause FPGA cache pollution.

Note: The cache tag tracks the request status for all outstanding requests on Intel Ultra Path Interconnect (Intel UPI). Therefore, even though RdLine_I is marked invalid upon completion, it temporarily consumes a cache tag to track the request status over UPI. This may evict a cache line, resulting in cache pollution. The advantage of RdLine_I is that the CPU directory does not track it, which prevents snooping from the CPU.

RdLine_S | Read Line Shared | Memory read request with the FPGA cache hint set to Shared. The FIU attempts to keep the line in the FPGA cache in a shared state.
WrLine_I | Write Line Invalid | Memory write request with the FPGA cache hint set to Invalid. The FIU writes the data with no intention of keeping the data in the FPGA cache.
WrLine_M | Write Line Modified | Memory write request with the FPGA cache hint set to Modified. The FIU writes the data and leaves it in the FPGA cache in a modified state.

Acceleration Glossary
Table 3. Acceleration Stack for Intel Xeon® CPU with FPGAs Glossary

Term | Abbreviation | Description
Intel Acceleration Stack for Intel Xeon® CPU with FPGAs | Acceleration Stack | A collection of software, firmware, and tools that provides performance-optimized connectivity between an Intel FPGA and an Intel Xeon processor.
Intel FPGA Programmable Acceleration Card (Intel FPGA PAC) | Intel FPGA PAC | PCIe FPGA accelerator card. Contains an FPGA Interface Manager (FIM) that pairs with an Intel Xeon processor over the PCIe bus.

The Native Loopback Accelerator Functional Unit (AFU)

Native Loopback (NLB) AFU Overview

  • The NLB sample AFUs comprise a set of Verilog and SystemVerilog files to test memory reads and writes, bandwidth, and latency.
  • This package includes three AFUs that you can build from the same RTL source. Each AFU is created by a different configuration of the RTL source code.

The NLB Sample Accelerator Function (AF)
The $OPAE_PLATFORM_ROOT/hw/samples directory stores source code for the following NLB sample AFUs:

  • nlb_mode_0
  • nlb_mode_0_stp
  • nlb_mode_3

Note: The $DCP_LOC/hw/samples directory stores the NLB sample AFUs source code for the 1.0 release package.

To understand the NLB sample AFU source code structure and how to build it, refer to one of the following Quick Start Guides (depending on which Intel FPGA PAC you are using):

  • If you are using Intel PAC with Intel Arria® 10 GX FPGA, refer to the Intel Acceleration Stack Quick Start Guide for Intel Programmable Acceleration Card with Intel Arria 10 GX FPGA.
  • If you are using Intel FPGA PAC D5005, refer to the Intel Acceleration Stack Quick Start Guide for Intel FPGA Programmable Acceleration Card D5005.

The release package provides the following three sample AFs:

  • NLB mode 0 AF: requires the hello_fpga or fpgadiag utility to perform the lpbk1 test.
  • NLB mode 3 AF: requires the fpgadiag utility to perform the trput, read, and write tests.
  • NLB mode 0 stp AF: requires the hello_fpga or fpgadiag utility to perform the lpbk1 test.
    Note: nlb_mode_0_stp is the same AFU as nlb_mode_0, but with the Signal Tap debug feature enabled.

The fpgadiag and hello_fpga utilities work with the appropriate AF to diagnose, test, and report on the FPGA hardware.


Figure 1. Native Loopback (nlb_lpbk.sv) Top Level Wrapper


Table 4. NLB Files

File Name | Description
nlb_lpbk.sv | Top-level wrapper for NLB that instantiates the requestor and arbiter.
arbiter.sv | Instantiates the test AF.
requestor.sv | Accepts requests from the arbiter and formats the requests according to the CCI-P specification. Also implements flow control.
nlb_csr.sv | Implements the 64-bit read/write Control and Status Registers (CSRs). The registers support both 32-bit and 64-bit reads and writes.
nlb_gram_sdp.sv | Implements a generic dual-port RAM with one write port and one read port.

NLB is a reference implementation of an AFU compatible with the Intel Acceleration Stack for Intel Xeon CPU with FPGAs Core Cache Interface (CCI-P) Reference Manual. NLB’s primary function is to validate host connectivity using different memory access patterns. NLB also measures bandwidth and read/write latency. The bandwidth test has the following options:

  • 100% read
  • 100% write
  • 50% read and 50% write

Related Information

  • Intel Acceleration Stack Quick Start Guide for Intel Programmable Acceleration Card with Intel Arria 10 GX FPGA
  • Acceleration Stack for Intel Xeon CPU with FPGAs Core Cache Interface (CCI-P) Reference Manual
  • Intel Acceleration Stack Quick Start Guide for Intel FPGA Programmable Acceleration Card D5005

Native Loopback Control and Status Register Descriptions
Table 5. CSR Names, Addresses and Descriptions

Byte Address (OPAE) | Word Address (CCI-P) | Access | Name | Width | Description
0x0000 | 0x0000 | RO | DFH | 64 | AF Device Feature Header.
0x0008 | 0x0002 | RO | AFU_ID_L | 64 | AF ID low.
0x0010 | 0x0004 | RO | AFU_ID_H | 64 | AF ID high.
0x0018 | 0x0006 | Rsvd | CSR_DFH_RSVD0 | 64 | Mandatory Reserved 0.
0x0020 | 0x0008 | RO | CSR_DFH_RSVD1 | 64 | Mandatory Reserved 1.
0x0100 | 0x0040 | RW | CSR_SCRATCHPAD0 | 64 | Scratchpad register 0.
0x0108 | 0x0042 | RW | CSR_SCRATCHPAD1 | 64 | Scratchpad register 1.
0x0110 | 0x0044 | RW | CSR_AFU_DSM_BASEL | 32 | Lower 32 bits of the AF DSM base address. The lower 6 bits are zero because the address is aligned to the 64-byte cache line size.
0x0114 | 0x0045 | RW | CSR_AFU_DSM_BASEH | 32 | Upper 32 bits of the AF DSM base address.
0x0120 | 0x0048 | RW | CSR_SRC_ADDR | 64 | Start physical address for the source buffer. All read requests target this region.
0x0128 | 0x004A | RW | CSR_DST_ADDR | 64 | Start physical address for the destination buffer. All write requests target this region.
0x0130 | 0x004C | RW | CSR_NUM_LINES | 32 | Number of cache lines.
0x0138 | 0x004E | RW | CSR_CTL | 32 | Controls test flow: start, stop, and force completion.
0x0140 | 0x0050 | RW | CSR_CFG | 32 | Configures test parameters.
0x0148 | 0x0052 | RW | CSR_INACT_THRESH | 32 | Inactivity threshold limit.
0x0150 | 0x0054 | RW | CSR_INTERRUPT0 | 32 | Software allocates the interrupt APIC ID and vector for the device.
DSM Offset Map
0x0040 | 0x0010 | RO | DSM_STATUS | 32 | Test status and error register.
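Every row of Table 5 is consistent with a CCI-P word address equal to the OPAE byte address divided by 4. The sketch below, a minimal illustration rather than part of OPAE, encodes that relationship and splits a 64-byte-aligned DSM base address into the two 32-bit values written to CSR_AFU_DSM_BASEL and CSR_AFU_DSM_BASEH. The names byte_to_word_addr, split_dsm_base, and NLB_CSR_BYTE_ADDR are hypothetical.

```python
# Hypothetical helpers; the offsets are copied from Table 5.
CACHE_LINE_BYTES = 64

NLB_CSR_BYTE_ADDR = {        # CSR name -> OPAE byte address (subset of Table 5)
    "DFH": 0x0000,
    "AFU_ID_L": 0x0008,
    "AFU_ID_H": 0x0010,
    "CSR_SCRATCHPAD0": 0x0100,
    "CSR_SCRATCHPAD1": 0x0108,
    "CSR_AFU_DSM_BASEL": 0x0110,
    "CSR_AFU_DSM_BASEH": 0x0114,
    "CSR_SRC_ADDR": 0x0120,
    "CSR_DST_ADDR": 0x0128,
    "CSR_NUM_LINES": 0x0130,
    "CSR_CTL": 0x0138,
    "CSR_CFG": 0x0140,
    "CSR_INACT_THRESH": 0x0148,
    "CSR_INTERRUPT0": 0x0150,
}


def byte_to_word_addr(byte_addr: int) -> int:
    """Convert an OPAE byte offset to a CCI-P word (4-byte) address."""
    if byte_addr % 4:
        raise ValueError("CSR byte offsets must be 4-byte aligned")
    return byte_addr // 4


def split_dsm_base(phys_addr: int) -> tuple[int, int]:
    """Split a 64-byte-aligned DSM address into (low 32 bits, high 32 bits)."""
    if phys_addr % CACHE_LINE_BYTES:
        raise ValueError("DSM base must be aligned to a 64-byte cache line")
    return phys_addr & 0xFFFF_FFFF, (phys_addr >> 32) & 0xFFFF_FFFF


if __name__ == "__main__":
    for name, addr in NLB_CSR_BYTE_ADDR.items():
        print(f"{name:18s} byte 0x{addr:04X} -> word 0x{byte_to_word_addr(addr):04X}")
```

The low half goes to CSR_AFU_DSM_BASEL and the high half to CSR_AFU_DSM_BASEH.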

Table 6. CSR Bit Fields with Examples
This table lists the CSR bit fields that depend on <N>, the width of the CSR_NUM_LINES count field. In the example below, <N> = 14.

Name | Bit Field | Access | Description
CSR_SRC_ADDR | [63:<N>] | RW | 2^(<N>+6)-byte-aligned address that points to the start of the read buffer.
  | [<N>-1:0] | RW | 0x0.
CSR_DST_ADDR | [63:<N>] | RW | 2^(<N>+6)-byte-aligned address that points to the start of the write buffer.
  | [<N>-1:0] | RW | 0x0.
CSR_NUM_LINES | [31:<N>] | RW | 0x0.
  | [<N>-1:0] | RW | Number of cache lines to read or write. This threshold may be different for each test AF.

Note: Ensure that the source and destination buffers are large enough to hold CSR_NUM_LINES cache lines.

CSR_NUM_LINES must fit in bits [<N>-1:0]; that is, it must be less than 2^<N>.

For the following values, assume <N> = 14. Then CSR_SRC_ADDR and CSR_DST_ADDR accept 2^20-byte (0x100000, 1 MB) aligned addresses.
CSR_SRC_ADDR | [31:14] | RW | 1 MB aligned address.
  | [13:0] | RW | 0x0.
CSR_DST_ADDR | [31:14] | RW | 1 MB aligned address.
  | [13:0] | RW | 0x0.
CSR_NUM_LINES | [31:14] | RW | 0x0.
  | [13:0] | RW | Number of cache lines to read or write. This threshold may be different for each test AF.

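The alignment and size rules in Table 6 can be checked on the host before the CSRs are programmed. This sketch assumes, consistent with the 2^(<N>+6)-byte alignment stated in the table, that CSR_SRC_ADDR and CSR_DST_ADDR hold a cache-line address (the physical address shifted right by 6 bits). The helper names csr_buffer_addr and check_num_lines are hypothetical.

```python
LOG2_CACHE_LINE = 6  # 64-byte cache lines


def csr_buffer_addr(phys_addr: int, n: int) -> int:
    """Return the CSR_SRC_ADDR/CSR_DST_ADDR value for a physical buffer address.

    Assumes the CSR holds a cache-line address (phys_addr >> 6), so requiring
    register bits [n-1:0] to be zero means the buffer is 2^(n+6)-byte aligned.
    """
    alignment = 1 << (n + LOG2_CACHE_LINE)
    if phys_addr % alignment:
        raise ValueError(f"buffer must be {alignment}-byte aligned")
    return phys_addr >> LOG2_CACHE_LINE


def check_num_lines(num_lines: int, n: int, buffer_bytes: int) -> None:
    """Raise ValueError unless num_lines fits in bits [n-1:0] and in the buffer."""
    if num_lines < 0 or num_lines >> n:
        raise ValueError(f"CSR_NUM_LINES must be below 2^{n}")
    if num_lines * (1 << LOG2_CACHE_LINE) > buffer_bytes:
        raise ValueError("buffer too small for CSR_NUM_LINES cache lines")
```

For example, with <N> = 14, csr_buffer_addr(0x100000, 14) returns 0x4000, and check_num_lines(1 << 14, 14, 1 << 20) raises because 2^14 does not fit in bits [13:0].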

Table 7. Additional CSR Bit Fields

Name | Bit Field | Access | Description
CSR_CTL | [31:3] | RW | Reserved.
  | [2] | RW | Force test completion. Writes the test completion flag and other performance counters to csr_stat. After a forced test completion, the hardware state is identical to that after a non-forced test completion.
  | [1] | RW | Starts test execution.
  | [0] | RW | Active-low test reset. When low, all configuration parameters change to their default values.
CSR_CFG | [29] | RW | cr_interrupt_testmode tests interrupts. Generates an interrupt at the end of each test.
  | [28] | RW | cr_interrupt_on_error sends an interrupt upon error detection.
  | [27:20] | RW | cr_test_cfg configures the behavior of each test mode.
  | [13:12] | RW | cr_chsel selects the virtual channel.
  | [10:9] | RW | cr_rdsel configures the read request type. The valid encodings are:
      •    2’b00: RdLine_S
      •    2’b01: RdLine_I
      •    2’b11: Mixed mode
  | [8] | RW | cr_delay_en enables random delay insertion between requests.
  | [6:5] | RW | cr_multiCL_len configures the number of cache lines per request (multi-cache-line length). Valid values are 0, 1, and 3.
  | [4:2] | RW | cr_mode configures the test mode. The valid values are:
      •    3’b000: LPBK1
      •    3’b001: Read
      •    3’b010: Write
      •    3’b011: TRPUT
    For more information about the test modes, refer to the Test Modes topic below.
  | [1] | RW | cr_cont selects test rollover or test termination.
      •    When 1’b0, the test terminates and updates the status CSR when the CSR_NUM_LINES count is reached.
      •    When 1’b1, the test rolls over to the start address after it reaches the CSR_NUM_LINES count. In rollover mode, the test terminates only upon error.
  | [0] | RW | cr_wrthru_en selects between the WrLine_M and WrLine_I request types.
      •    1’b0: WrLine_M
      •    1’b1: WrLine_I
CSR_INACT_THRESH | [31:0] | RW | Inactivity threshold limit. Detects the duration of stalls during a test run by counting consecutive idle cycles in which no requests are sent and no responses are received. If the inactivity count exceeds CSR_INACT_THRESH, the inact_timeout signal is set. Writing 1 to CSR_CTL[1] activates this counter.
CSR_INTERRUPT0 | [23:16] | RW | Interrupt vector number for the device.
  | [15:0] | RW | apic_id is the APIC ID for the device.
DSM_STATUS | [511:256] | RO | Error dump from the test mode.
  | [255:224] | RO | End overhead.
  | [223:192] | RO | Start overhead.
  | [191:160] | RO | Number of writes.
  | [159:128] | RO | Number of reads.
  | [127:64] | RO | Number of clocks.
  | [63:32] | RO | Test error register.
  | [31:16] | RO | Compare-and-exchange success counter.
  | [15:1] | RO | Unique ID for each DSM status write.
  | [0] | RO | Test completion flag.
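DSM_STATUS spans bits [511:0], that is, one 64-byte cache line in host memory. The following sketch decodes the fields listed in Table 7 and turns the clock and request counters into an approximate bandwidth figure. It assumes the DSM line is read as a little-endian integer (as on an x86 host), that each counted read or write moves one 64-byte cache line, and that you supply the AFU clock frequency; it ignores the start and end overhead counters and is not meant to reproduce the figures fpgadiag reports. All names are hypothetical.

```python
from dataclasses import dataclass


def _bits(value: int, hi: int, lo: int) -> int:
    """Extract value[hi:lo] (inclusive), Verilog-style."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


@dataclass
class DsmStatus:
    test_complete: bool
    unique_id: int
    cmp_xchg_success: int
    test_error: int
    num_clocks: int
    num_reads: int
    num_writes: int
    start_overhead: int
    end_overhead: int


def decode_dsm_status(dsm_line: bytes) -> DsmStatus:
    """Decode the DSM_STATUS fields (Table 7) from one 64-byte cache line."""
    if len(dsm_line) != 64:
        raise ValueError("DSM_STATUS spans one 64-byte cache line")
    v = int.from_bytes(dsm_line, "little")  # assumption: little-endian host
    return DsmStatus(
        test_complete=bool(_bits(v, 0, 0)),
        unique_id=_bits(v, 15, 1),
        cmp_xchg_success=_bits(v, 31, 16),
        test_error=_bits(v, 63, 32),
        num_clocks=_bits(v, 127, 64),
        num_reads=_bits(v, 159, 128),
        num_writes=_bits(v, 191, 160),
        start_overhead=_bits(v, 223, 192),
        end_overhead=_bits(v, 255, 224),
    )


def bandwidth_gbps(lines: int, num_clocks: int, clk_hz: float) -> float:
    """Bytes per second in GB/s (1 GB = 1e9 bytes), assuming 64 bytes per line."""
    seconds = num_clocks / clk_hz
    return lines * 64 / seconds / 1e9
```

For example, 200 reads over 1000 clocks at an assumed 400 MHz clock gives bandwidth_gbps(200, 1000, 400e6) of about 5.12 GB/s; 400 MHz is an arbitrary example value, not a documented NLB clock.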

Test Modes
CSR_CFG[4:2] configures the test mode. The following four tests are available:

  • LPBK1: This is a memory copy test. The AF copies CSR_NUM_LINES cache lines from the source buffer to the destination buffer. Upon test completion, the software compares the source and destination buffers.
  • Read: This test stresses the read path and measures read bandwidth or latency. The AF reads CSR_NUM_LINES cache lines starting from CSR_SRC_ADDR. This is only a bandwidth or latency test; it does not verify the data read.
  • Write: This test stresses the write path and measures write bandwidth or latency. The AF writes CSR_NUM_LINES cache lines starting at CSR_DST_ADDR. This is only a bandwidth or latency test; it does not verify the data written.
  • TRPUT: This test combines reads and writes. It reads CSR_NUM_LINES cache lines starting from CSR_SRC_ADDR and writes CSR_NUM_LINES cache lines starting at CSR_DST_ADDR. It also measures read and write bandwidth. This test does not check the data, and the reads and writes have no dependencies on each other.
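The CSR_CFG fields from Table 7, including the cr_mode value that selects among these four tests, can be packed into a single 32-bit value as sketched below. Bit positions and encodings follow Table 7; the names Mode, RdSel, and csr_cfg are hypothetical and are not part of OPAE or the NLB RTL. Bits not listed in Table 7 are left at zero.

```python
from enum import IntEnum


class Mode(IntEnum):   # CSR_CFG[4:2], cr_mode
    LPBK1 = 0b000
    READ = 0b001
    WRITE = 0b010
    TRPUT = 0b011


class RdSel(IntEnum):  # CSR_CFG[10:9], cr_rdsel
    RDLINE_S = 0b00
    RDLINE_I = 0b01
    MIXED = 0b11


def csr_cfg(mode: Mode, rdsel: RdSel = RdSel.RDLINE_S, *,
            wrthru_en: bool = False, cont: bool = False,
            multi_cl_len: int = 0, delay_en: bool = False,
            chsel: int = 0, test_cfg: int = 0,
            interrupt_on_error: bool = False,
            interrupt_testmode: bool = False) -> int:
    """Pack the Table 7 CSR_CFG fields into a register value."""
    if multi_cl_len not in (0, 1, 3):
        raise ValueError("cr_multiCL_len must be 0, 1, or 3")
    if not 0 <= chsel <= 0b11 or not 0 <= test_cfg <= 0xFF:
        raise ValueError("field value out of range")
    return (int(wrthru_en) << 0              # [0]     cr_wrthru_en
            | int(cont) << 1                 # [1]     cr_cont
            | int(mode) << 2                 # [4:2]   cr_mode
            | multi_cl_len << 5              # [6:5]   cr_multiCL_len
            | int(delay_en) << 8             # [8]     cr_delay_en
            | int(rdsel) << 9                # [10:9]  cr_rdsel
            | chsel << 12                    # [13:12] cr_chsel
            | test_cfg << 20                 # [27:20] cr_test_cfg
            | int(interrupt_on_error) << 28  # [28]    cr_interrupt_on_error
            | int(interrupt_testmode) << 29) # [29]    cr_interrupt_testmode
```

With the remaining fields at their defaults, csr_cfg(Mode.TRPUT) returns 0xC and csr_cfg(Mode.READ, RdSel.RDLINE_I) returns 0x204. These values are derived only from the bit positions in Table 7.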

The following table shows the CSR_CFG encodings for the four tests. This table assumes CSR_NUM_LINES, <N> = 14. You can change the number of cache lines by updating the CSR_NUM_LINES register.

Table 8. Test Modes

FPGA Diagnostics: fpgadiag
The fpgadiag utility includes several tests to diagnose, test, and report on the FPGA hardware. Use the fpgadiag utility to run all the test modes. For more information about using the fpgadiag utility, refer to the fpgadiag section in the Open Programmable Acceleration Engine (OPAE) Tools Guide.

NLB Mode0 Hello_FPGA Test Flow

  1. Software initializes Device Status Memory (DSM) to zero.
  2. Software writes the DSM base address to the AFU: a CSR write to DSM_BASE_H, then a CSR write to DSM_BASE_L.
  3. Software prepares the source and destination memory buffers. This preparation is test specific.
  4. Software writes CSR_CTL[2:0]= 0x1. This write brings the test out of reset and into configuration mode. Configuration can proceed only when CSR_CTL[0]=1 & CSR_CTL[1]=1.
  5. Software configures the test parameters, such as the source and destination addresses, CSR_CFG, and the number of cache lines.
  6. Software writes CSR_CTL[2:0] = 0x3. The AF begins test execution.
  7. Test completion:
    • The hardware completes the test when it finishes or detects an error. Upon completion, the hardware AF updates DSM_STATUS. Software polls for DSM_STATUS[31:0] == 1 to detect test completion.
    • Software can force test completion by writing CSR_CTL[2:0] = 0x7. The hardware AF then updates DSM_STATUS.
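The steps above can be sketched in Python against a hypothetical MMIO interface. The callables mmio_write32, mmio_write64, and read_dsm_status32 are placeholders for whatever host API you use; they are not OPAE functions. CSR offsets and widths come from Table 5, the polling timeout is an assumption, and zeroing the DSM (step 1) and preparing the buffers (step 3) are left to the caller.

```python
import time
from typing import Callable

# OPAE byte addresses from Table 5
CSR_AFU_DSM_BASEL = 0x0110
CSR_AFU_DSM_BASEH = 0x0114
CSR_SRC_ADDR = 0x0120
CSR_DST_ADDR = 0x0128
CSR_NUM_LINES = 0x0130
CSR_CTL = 0x0138
CSR_CFG = 0x0140


def run_nlb_test(mmio_write32: Callable[[int, int], None],
                 mmio_write64: Callable[[int, int], None],
                 read_dsm_status32: Callable[[], int],
                 dsm_phys: int, src_csr: int, dst_csr: int,
                 num_lines: int, cfg: int, timeout_s: float = 1.0) -> bool:
    """Return True if the AF reported completion, False if it was forced."""
    # Step 2: DSM base address, high word first as in the flow above.
    mmio_write32(CSR_AFU_DSM_BASEH, (dsm_phys >> 32) & 0xFFFF_FFFF)
    mmio_write32(CSR_AFU_DSM_BASEL, dsm_phys & 0xFFFF_FFFF)
    mmio_write32(CSR_CTL, 0x1)             # step 4: out of reset
    mmio_write64(CSR_SRC_ADDR, src_csr)    # step 5: test parameters
    mmio_write64(CSR_DST_ADDR, dst_csr)
    mmio_write32(CSR_NUM_LINES, num_lines)
    mmio_write32(CSR_CFG, cfg)
    mmio_write32(CSR_CTL, 0x3)             # step 6: start the test
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:     # step 7: poll DSM_STATUS[31:0] == 1
        if read_dsm_status32() == 1:
            return True
    mmio_write32(CSR_CTL, 0x7)             # step 7: force test completion
    return False
```

Keeping the MMIO access behind callables lets you exercise the write ordering with a fake device in a unit test before connecting a real driver.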

Document Revision History for the Native Loopback Accelerator Functional Unit (AFU) User Guide

Document Version | Intel Acceleration Stack Version | Changes
2019.08.05 | 2.0 (supported with Intel Quartus Prime Pro Edition 18.1.2) and 1.2 (supported with Intel Quartus Prime Pro Edition 17.1.1) | Added support for the Intel FPGA PAC D5005 platform in the current release.
2018.12.04 | 1.2 (supported with Intel Quartus® Prime Pro Edition 17.1.1) | Maintenance release.
2018.08.06 | 1.1 (supported with Intel Quartus Prime Pro Edition 17.1.1) and 1.0 (supported with Intel Quartus Prime Pro Edition 17.0.0) | Updated the location of the source code for the NLB sample AFU in The NLB Sample Accelerator Function (AF) section.
2018.04.11 | 1.0 (supported with Intel Quartus Prime Pro Edition 17.0.0) | Initial release.
