AN5212
Application note
How to use STM32 cache to optimize performance and power efficiency for STM32 MCUs

Introduction
This application note describes the instruction cache (ICACHE) and the data cache (DCACHE), the first caches developed by STMicroelectronics.
The ICACHE and DCACHE, introduced on the AHB bus of the Arm® Cortex®-M33 processor, are embedded in the STM32 microcontrollers (MCUs) listed in the table below. These caches improve application performance and reduce power consumption when fetching instructions and data from both internal and external memories, or for data traffic to and from external memories.
This document gives typical examples to highlight the ICACHE and DCACHE features and facilitate their configuration.

Table 1. Applicable products
Type | Product series
Microcontrollers | STM32H5 series, STM32L5 series, STM32U5 series

AN5212 - Rev 5 - March 2024 For further information contact your local STMicroelectronics sales office.

www.st.com

1 General information
This application note applies to the STM32 series microcontrollers that are Arm® Cortex® core-based devices. Arm is a registered trademark of Arm Limited (or its subsidiaries) in the US and/or elsewhere.

2 ICACHE and DCACHE overview

This section provides an overview of the ICACHE and DCACHE embedded in the STM32 Arm® Cortex® core-based microcontrollers, and details their block diagrams and their integration in the system architecture.

2.1 STM32L5 series smart architecture

This architecture is based on a bus matrix allowing multiple masters (Cortex-M33, ICACHE, DMA1/2, and SDMMC1) to access multiple slaves (such as flash memory, SRAM1/2, OCTOSPI1, or FSMC).

The figure below describes the STM32L5 series smart architecture.

Figure 1. STM32L5 series smart architecture (Cortex-M33 with Arm TrustZone® and FPU, 8-Kbyte ICACHE on the C-bus, S-bus, DMA1, DMA2 and SDMMC1 masters, BusMatrix-S, fast bus to flash memory, SRAM1 and SRAM2, slow bus to OCTOSPI1 with OTFDEC and FSMC, MPCBB1/2 block-based and MPCWM1/2/3 watermarked memory protection controllers)

The 8-Kbyte ICACHE placed on the Cortex-M33 C-AHB bus improves performance when fetching code or data from the internal memories (flash memory, SRAM1 or SRAM2) through the fast bus, and from the external memories (OCTOSPI1 or FSMC) through the slow bus.

2.2 STM32U5 series smart architecture

This architecture is based on a bus matrix allowing multiple masters (Cortex-M33, ICACHE, DCACHE, GPDMA, DMA2D and SDMMCs, OTG_HS, LTDC, GPU2D, GFXMMU) to access multiple slaves (such as flash memory, SRAMs, BKPSRAM, HSPI/OCTOSPI, or FSMC).


The figure below describes the STM32U5 series smart architecture.

Figure 2. STM32U5 series smart architecture (Cortex-M33 with TrustZone mainline and FPU; ICACHE (8/32 Kbytes) on the C-bus with fast-bus and slow-bus master ports; DCACHE1 (4/16 Kbytes) on the S-bus; DCACHE2 (16 Kbytes) on the GPU2D M0 port; GPDMA1, DMA2D, SDMMC1, SDMMC2, OTG_HS, LTDC, GPU2D and GFXMMU masters; 32-bit bus matrix with 128-bit cache refill; flash memory (512 Kbytes/2/4 Mbytes), SRAM1 to SRAM6, BKPSRAM, OCTOSPI1/2 with OTFDEC1/2, HSPI1, FSMC, AHB1 to AHB3, APB1 and APB2 peripherals; MPCBBx block-based and MPCWMx watermark-based memory protection controllers; some peripherals are not present on all STM32U5 products)

Both the Cortex-M33 and the GPU2D benefit from the caches:
· The ICACHE improves the performance of the Cortex-M33 when fetching code or data from the internal memories (flash memory, SRAMs) through the fast bus, and from the external memories (OCTOSPI1/2 and HSPI1, or FSMC) through the slow bus.
· The DCACHE1 improves the performance of the Cortex-M33 when fetching data from internal or external memories through the S-bus (GFXMMU, OCTOSPI1/2 and HSPI1, or FSMC).
· The DCACHE2 improves the performance of the GPU2D when fetching data from internal and external memories (GFXMMU, flash memory, SRAMs, OCTOSPI1/2 and HSPI1, or FSMC) through the M0 port bus.

2.3 STM32H5 series smart architecture
STM32H523/H533, STM32H563/H573 and STM32H562 smart architecture
This architecture is based on a bus matrix allowing multiple masters (Cortex-M33, ICACHE, DCACHE, GPDMAs, Ethernet and SDMMCs) to access multiple slaves (such as flash memory, SRAMs, BKPSRAM, OCTOSPI and FMC). The figure below describes this smart architecture.


Figure 3. STM32H563/H573 and STM32H562 series smart architecture (Cortex-M33 with TrustZone mainline and FPU, 8-Kbyte ICACHE on the C-bus, 4-Kbyte DCACHE on the S-bus, GPDMA1, GPDMA2, Ethernet MAC(1), SDMMC1 and SDMMC2(1) masters, 32-bit bus matrix with fast bus and slow bus, flash memory, SRAM1, SRAM2, SRAM3, BKPSRAM, OCTOSPI with OTFDEC, FMC, AHB1 to AHB4 peripherals, MPCBB1/2/3 and MPCWM1/2/4 memory protection controllers)
1. Not available on STM32H523/H533.

The Cortex-M33 benefits from the caches:
· The ICACHE improves the performance of the Cortex-M33 when fetching code or data from the internal memories (flash memory, SRAMs) through the fast bus, and from the external memories (OCTOSPI and FMC) through the slow bus.
· The DCACHE improves the performance when fetching data from the external memories (OCTOSPI and FMC) through the slow bus.

STM32H503 smart architecture
This architecture is based on a bus matrix allowing multiple masters (Cortex-M33, ICACHE and GPDMAs) to access multiple slaves (such as flash memory, SRAMs and BKPSRAM). The figure below describes the STM32H503 smart architecture.

Figure 4. STM32H503 series smart architecture (Cortex-M33 with FPU, 8-Kbyte ICACHE on the C-bus, S-bus, GPDMA1 and GPDMA2 masters, 32-bit bus matrix, fast bus with 128-bit cache refill, 128-Kbyte flash memory, SRAM1, SRAM2, BKPSRAM, AHB1 to AHB3, APB1 and APB2 peripherals, MPCBB1/2 block-based and MPCWM watermark-based memory protection controllers)


The Cortex-M33 benefits from the ICACHE:
· The ICACHE improves the performance of the Cortex-M33 when fetching code or data from the internal memories (flash memory, SRAMs) through the fast bus.

2.4 ICACHE block diagram

The ICACHE block diagram is given in the figure below.

Figure 5. ICACHE block diagram (execution port on the Cortex-M33 C-bus; configuration slave port for ICACHE register access; cache control logic with cache FSM, pLRU-t replacement and REMAP; 2-way cache TAG and data memories; AHB master1 port on the fast bus and AHB master2 port on the slow bus; ICACHE interrupt)

The ICACHE memory includes:
· the TAG memory, with:
  - the address tags, which indicate which data are contained in the cache data memory
  - the validity bits
· the data memory, which contains the cached data

2.5 DCACHE block diagram

The DCACHE block diagram is given in the figure below.

Figure 6. DCACHE block diagram (AHB input slave port from the Cortex-M33 S-AHB bus or the GPU2D M0 port; AHB configuration slave port with read/write hit and miss monitors, command range start and end addresses, control and status registers; cache control logic with cache FSM, pLRU-t replacement and maintenance operations; n-way cache TAG and data memories; main AHB master port; dcache_it interrupt)

The DCACHE memory includes:
· the TAG memory, with:
  - the address tags, which indicate which data are contained in the cache data memory
  - the validity bits
  - the privilege bits
  - the dirty bits
· the data memory, which contains the cached data

3 ICACHE and DCACHE features

3.1 ICACHE features

3.1.1 Dual masters

The ICACHE accesses the AHB bus matrix through either:
· one AHB master port: master1 (fast bus), or
· two AHB master ports: master1 (fast bus) and master2 (slow bus).

This feature allows the traffic to be decoupled when accessing different memory regions (such as internal flash memory, internal SRAM and external memories), in order to reduce the CPU stalls on cache misses.

The following table summarizes memory regions and their addresses.

Table 2. Memory regions and their addresses
Type | Peripheral name | Product name: region size | Cacheable access: bus name | Cacheable access: nonsecure region starting address | Cacheable access: secure, nonsecure callable region starting address | Not cacheable access: bus name | Not cacheable access: nonsecure region starting address | Not cacheable access: secure, nonsecure callable region starting address
Internal | FLASH | STM32H503: 128 KB; STM32L5 series, STM32U535/545, STM32H523/533: 512 KB; STM32U575/585, STM32H563/573/562: 2 MB; STM32U59x/5Ax/5Fx/5Gx: 4 MB | ICACHE fast bus | 0x0800 0000 | 0x0C00 0000 | N/A | N/A | N/A
Internal | SRAM1 | STM32H503: 16 KB; STM32L5 series, STM32U535/545/575/585: 192 KB; STM32H523/533: 128 KB; STM32H563/573/562: 256 KB; STM32U59x/5Ax/5Fx/5Gx: 768 KB | ICACHE fast bus | 0x0A00 0000 | 0x0E00 0000 | S-bus | 0x2000 0000 | 0x3000 0000
Internal | SRAM2 | STM32H503: 16 KB | ICACHE fast bus | 0x0A00 4000 | N/A | S-bus | 0x2000 4000 | N/A
Internal | SRAM2 | STM32L5 series, STM32U535/545/575/585: 64 KB | ICACHE fast bus | 0x0A03 0000 | 0x0E03 0000 | S-bus | 0x2003 0000 | 0x3003 0000
Internal | SRAM2 | STM32H523/533: 64 KB; STM32H563/573/562: 80 KB | ICACHE fast bus | 0x0A04 0000 | 0x0E04 0000 | S-bus | 0x2004 0000 | 0x3004 0000
Internal | SRAM2 | STM32U59x/5Ax/5Fx/5Gx: 64 KB | ICACHE fast bus | 0x0A0C 0000 | 0x0E0C 0000 | S-bus | 0x200C 0000 | 0x300C 0000
Internal | SRAM3 | STM32U575/585: 512 KB | ICACHE fast bus | 0x0A04 0000 | 0x0E04 0000 | S-bus | 0x2004 0000 | 0x3004 0000
Internal | SRAM3 | STM32H523/533: 64 KB; STM32H563/573/562: 320 KB | ICACHE fast bus | 0x0A05 0000 | 0x0E05 0000 | S-bus | 0x2005 0000 | 0x3005 0000
Internal | SRAM3 | STM32U59x/5Ax/5Fx/5Gx: 832 KB | ICACHE fast bus | 0x0A0D 0000 | 0x0E0D 0000 | S-bus | 0x200D 0000 | 0x300D 0000
Internal | SRAM5 | STM32U59x/5Ax/5Fx/5Gx: 832 KB | ICACHE fast bus | 0x0A1A 0000 | 0x0E1A 0000 | S-bus | 0x201A 0000 | 0x301A 0000
Internal | SRAM6 | STM32U5Fx/5Gx: 512 KB | ICACHE fast bus | 0x0A27 0000 | 0x0E27 0000 | S-bus | 0x2027 0000 | 0x3027 0000
External | HSPI1 | STM32U59x/5Ax/5Fx/5Gx: 256 MB | ICACHE slow bus(1) | Remap alias | N/A | S-bus | 0xA000 0000 | N/A
External | FMC SDRAM | STM32H563/573/562: 256 MB | ICACHE slow bus(1) | Remap alias | N/A | S-bus | 0xC000 0000 | N/A
External | OCTOSPI1 bank, nonsecure | STM32L5/U5 series, STM32H563/573/562: 256 MB | ICACHE slow bus(1) | Remap alias | N/A | S-bus | 0x9000 0000 | N/A
External | FMC bank 3, nonsecure | STM32L5/U5 series, STM32H563/573/562: 256 MB | ICACHE slow bus(1) | Remap alias | N/A | S-bus | 0x8000 0000 | N/A
External | OCTOSPI2 bank, nonsecure | STM32U575/585/59x/5Ax/5Fx/5Gx: 256 MB | ICACHE slow bus(1) | Remap alias | N/A | S-bus | 0x7000 0000 | N/A
External | FMC bank 1, nonsecure | STM32L5/U5 series, STM32H563/573/562: 256 MB | ICACHE slow bus(1) | Remap alias | N/A | S-bus | 0x6000 0000 | N/A
Remap alias: alias address in the range [0x0000 0000 to 0x07FF FFFF] or [0x1000 0000 to 0x1FFF FFFF], defined by means of the remapping feature.
1. To be selected when remapping such regions.
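Table 2 shows that each internal SRAM is reachable at two addresses with the same offset: a not cacheable S-bus address (0x2000 0000 nonsecure, 0x3000 0000 secure) and a cacheable alias fetched through the ICACHE fast bus (0x0A00 0000 nonsecure, 0x0E00 0000 secure). For example, SRAM2 on the STM32L5 series starts at 0x2003 0000 and at 0x0A03 0000. The C sketch below converts one address into the other. The helper name and the 32-Mbyte window used for range checking are assumptions for illustration, not ST definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Start addresses read from the SRAM rows of Table 2. */
#define SBUS_SRAM_NS_BASE  0x20000000u /* S-bus, nonsecure, not cacheable  */
#define SBUS_SRAM_S_BASE   0x30000000u /* S-bus, secure/NSC, not cacheable */
#define CBUS_SRAM_NS_BASE  0x0A000000u /* ICACHE fast bus, nonsecure       */
#define CBUS_SRAM_S_BASE   0x0E000000u /* ICACHE fast bus, secure/NSC      */
/* ASSUMPTION: width of each SRAM alias window (keeps the nonsecure alias
 * below the secure flash alias at 0x0C00 0000). */
#define SRAM_ALIAS_WINDOW  0x02000000u

/* Hypothetical helper: convert an S-bus SRAM address into the alias that is
 * fetched through the ICACHE fast bus. Returns false outside both windows.
 * It does not check that memory is implemented at that offset. */
bool sram_sbus_to_cbus(uint32_t sbus_addr, uint32_t *cbus_addr)
{
    assert(cbus_addr != NULL);
    /* Unsigned subtraction: addresses below the base wrap to large values. */
    if (sbus_addr - SBUS_SRAM_NS_BASE < SRAM_ALIAS_WINDOW) {
        *cbus_addr = CBUS_SRAM_NS_BASE + (sbus_addr - SBUS_SRAM_NS_BASE);
        return true;
    }
    if (sbus_addr - SBUS_SRAM_S_BASE < SRAM_ALIAS_WINDOW) {
        *cbus_addr = CBUS_SRAM_S_BASE + (sbus_addr - SBUS_SRAM_S_BASE);
        return true;
    }
    return false;
}
```

For example, `sram_sbus_to_cbus(0x20030000u, &a)` sets `a` to 0x0A030000, and the external OCTOSPI1 address 0x9000 0000 is rejected because external memories reach the ICACHE only through the remapping feature.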

3.1.2 1-way versus 2-way ICACHE

By default, the ICACHE operates in 2-way set-associative mode (two ways enabled), but it can be configured in direct-mapped mode (one way enabled) for applications requiring very low power consumption. The ICACHE associativity is selected with the WAYSEL bit in ICACHE_CR:
· WAYSEL = 0: direct-mapped operating mode (1-way)
· WAYSEL = 1 (default): 2-way set-associative operating mode

Table 3. 1-way versus 2-way ICACHE
Parameter | 1-way ICACHE | 2-way ICACHE
Cache size (Kbytes) | 8(1)/32(2) | 8(1)/32(2)
Cache number of ways | 1 | 2
Cache line size | 128 bits (16 bytes) | 128 bits (16 bytes)
Number of cache lines | 512(1)/2048(2) | 256(1)/1024(2) per way
1. For STM32L5 series, STM32H5 series and STM32U535/545/575/585.
2. For STM32U59x/5Ax/5Fx/5Gx.
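The figures in Table 3 follow from the usual set-associative arithmetic: number of lines = cache size / line size, and lines per way = number of lines / number of ways. The sketch below reproduces them and also computes the set index of an address, assuming conventional modulo indexing. That indexing is an assumption: the ICACHE index function is not described in this document. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a set-associative cache geometry (not an ST API). */
typedef struct {
    uint32_t size_bytes;  /* total cache size, e.g. 8 * 1024           */
    uint32_t line_bytes;  /* line size: 16 bytes (128 bits) for ICACHE */
    uint32_t ways;        /* 1 (WAYSEL = 0) or 2 (WAYSEL = 1)          */
} cache_geometry_t;

/* Total number of cache lines = cache size / line size. */
uint32_t cache_lines(const cache_geometry_t *g)
{
    assert(g->line_bytes != 0u);
    return g->size_bytes / g->line_bytes;
}

/* Number of sets (lines per way) = total lines / number of ways. */
uint32_t cache_sets(const cache_geometry_t *g)
{
    assert(g->ways != 0u);
    return cache_lines(g) / g->ways;
}

/* ASSUMED modulo indexing: set = (address / line size) mod number of sets. */
uint32_t cache_set_index(const cache_geometry_t *g, uint32_t addr)
{
    return (addr / g->line_bytes) % cache_sets(g);
}
```

With this model, 0x0800 0000 and 0x0800 2000 (8 Kbytes apart) map to the same line of an 8-Kbyte 1-way cache and evict each other; in 2-way mode they map to the same set but can both stay cached, one in each way. This illustrates why the 2-way configuration performs better for code that does not fit entirely in the cache.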

3.1.3 Burst type

Some Octo-SPI memories support WRAP bursts, which bring the performance benefit of the critical-word-first feature. The burst type of the ICACHE AHB memory transactions for remapped regions is configurable: the ICACHE implements either incremental or WRAP bursts, selected with the HBURST bit in the ICACHE_CRRx register.

The differences between WRAP and incremental bursts are given below (see also the figure):
· WRAP burst:
  - cache line size = 128 bits
  - burst starting address = word address of the first data requested by the CPU
· Incremental burst:
  - cache line size = 128 bits
  - burst starting address = address aligned on the boundary of the cache line containing the requested word

Figure 7. Incremental versus WRAP burst (byte offsets 0x0 to 0xF within a 128-bit (16-byte) group alignment boundary, shown for an incremental burst and for a WRAP burst)
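The two burst types differ only in where the refill starts. The C sketch below lists the word addresses of a 16-byte line refill in request order for a given miss address. It assumes 32-bit beats (four beats per 128-bit line); the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 16u                       /* 128-bit cache line       */
#define WORD_BYTES 4u                        /* ASSUMED 32-bit AHB beats */
#define BEATS      (LINE_BYTES / WORD_BYTES) /* 4 beats per line         */

typedef enum { BURST_INCR, BURST_WRAP } burst_t;

/* Fill out[] with the word addresses of the line refill triggered by a
 * miss at miss_addr, in the order in which they are requested. */
void refill_order(burst_t type, uint32_t miss_addr, uint32_t out[BEATS])
{
    assert(out != NULL);
    uint32_t line_base  = miss_addr & ~(LINE_BYTES - 1u); /* aligned line start */
    uint32_t first_word = miss_addr & ~(WORD_BYTES - 1u); /* critical word      */
    uint32_t start = (type == BURST_WRAP) ? first_word : line_base;

    for (size_t i = 0; i < BEATS; i++) {
        /* The offset inside the line wraps modulo the line size. */
        uint32_t offset = (start - line_base + (uint32_t)i * WORD_BYTES) % LINE_BYTES;
        out[i] = line_base + offset;
    }
}
```

For a miss at 0x9000 000A, an incremental burst requests 0x9000 0000, 0x9000 0004, 0x9000 0008 and 0x9000 000C, whereas a WRAP burst starts with the critical word: 0x9000 0008, 0x9000 000C, 0x9000 0000, 0x9000 0004.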

3.1.4 Cacheable regions and remapping feature
The ICACHE is connected to the Cortex-M33 through the C-AHB bus, and caches the code region from addresses [0x0000 0000 to 0x1FFF FFFF].
Since the external memories are mapped at an address in the range [0x6000 0000 to 0xAFFF FFFF], the ICACHE supports a remap feature that allows any external memory region to be remapped at an address in the range of [0x0000 0000 to 0x07FF FFFF] or [0x1000 0000 to 0x1FFF FFFF], and to become accessible through the C-AHB bus.
Up to four external memory regions can be remapped with this feature.
Once a region is remapped, the remap operation occurs even if the ICACHE is disabled or if the transaction is not cacheable.
The cacheable memory regions can be defined and programmed by the user in the memory protection unit (MPU). The table below summarizes the configurations of the STM32L5 and STM32U5 series memories.

Table 4. Configuration of STM32L5 and STM32U5 series memories
Product memory | Flash memory | SRAM | External memories (HSPI/OCTOSPI or FSMC)
Cacheable (MPU programming) | Yes or No | Not recommended | Yes or No
Remapped in ICACHE (ICACHE_CRRx programming) | Not required | Not required | Required if the user wants external code fetching on the C-AHB bus (otherwise on the S-AHB bus)
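The remap feature behaves as a fixed-offset address translation: an access to an alias inside a remapped region reaches the same offset in the external memory. The C sketch below checks that a region lies in one of the two code-region windows where remapped regions can be placed, and translates an alias address. The structure and function names are hypothetical, and any hardware alignment or size constraint on a region is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one remapped region (not an ST API). */
typedef struct {
    uint32_t alias_base;   /* base (remap) address in the code region     */
    uint32_t size_bytes;   /* region size in bytes                        */
    uint32_t target_base;  /* external address, e.g. OCTOSPI1 0x9000 0000 */
} remap_region_t;

/* True if [base, base + size) lies entirely in [0x0000 0000, 0x07FF FFFF]
 * or in [0x1000 0000, 0x1FFF FFFF]. */
bool remap_alias_is_valid(uint32_t base, uint32_t size)
{
    uint64_t end = (uint64_t)base + size;                        /* exclusive end */
    bool in_low  = end <= 0x08000000u;
    bool in_high = base >= 0x10000000u && end <= 0x20000000u;
    return size != 0u && (in_low || in_high);
}

/* Translate a C-bus alias address into the external address it reaches.
 * Returns false if the alias is outside the region. */
bool remap_translate(const remap_region_t *r, uint32_t alias, uint32_t *target)
{
    assert(r != NULL && target != NULL);
    uint32_t offset = alias - r->alias_base;   /* wraps if alias < base */
    if (offset >= r->size_bytes) {
        return false;
    }
    *target = r->target_base + offset;
    return true;
}
```

With the 8-Mbyte OCTOSPI1 example of the next section, the alias 0x1000 1234 translates to 0x9000 1234, while 0x1080 0000 falls just outside the region.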

3.1.5 Benefit of ICACHE external memory remapping
The example in the figure below shows how to benefit from the ICACHE enhanced performance during code execution or data reads when accessing an 8-Mbyte external Octo-SPI memory (such as an external flash memory or RAM).
Figure 8. Octo-SPI memory remap example (the not cacheable 8-Mbyte external memory, code or data, in the OCTOSPI1 memory-mapped region [0x9000 0000 to 0x907F FFFF] is remapped to the cacheable alias [0x1000 0000 to 0x107F FFFF] in the nonsecure code region)

The following steps are needed to remap this external memory:
1. OCTOSPI configuration for the external memory
Configure the OCTOSPI interface to access the external memory in memory-mapped mode (the external memory is seen as an internal memory mapped in the [0x9000 0000 to 0x9FFF FFFF] region). Since the external memory size is 8 Mbytes, it is seen in the [0x9000 0000 to 0x907F FFFF] region. The external memory in this region is accessed through the S-bus and is not cacheable. The next step shows the ICACHE configuration to remap this region.
For the OCTOSPI configuration in memory-mapped mode, refer to the application note OctoSPI interface on STM32 microcontrollers (AN5050).

2. ICACHE configuration to remap the external memory-mapped region
The 8 Mbytes placed in the [0x9000 0000 to 0x907F FFFF] region are remapped to the [0x1000 0000 to 0x107F FFFF] region. They can then be accessed through the slow bus (ICACHE master2 port).
- ICACHE_CR register configuration
  a. Disable the ICACHE with EN = 0.
  b. Select 1-way or 2-way (depending on the application needs) with WAYSEL = 0 or 1, respectively.
- ICACHE_CRRx register configuration (up to four regions, x = 0 to 3)
  a. Select the 0x1000 0000 base address (remap address) with BASEADDR[28:21] = 0x80.
  b. Select the 8-Mbyte region size to remap with RSIZE[2:0] = 0x3.
  c. Select the 0x9000 0000 remapped address with REMAPADDR[31:21] = 0x480.
  d. Select the ICACHE AHB master2 port for external memories with MSTSEL = 1.
  e. Select the WRAP burst type with HBURST = 0.
  f. Enable the remapping for region x with REN = 1.
The following figure shows how the memory regions are seen in IAR after the remap is enabled.
Figure 9. Memory regions remapping example


The 8-Mbyte external memory is now remapped and can be accessed over the [0x1000 0000 to 0x107F FFFF] region.
3. ICACHE enable
- ICACHE_CR register configuration: enable the ICACHE with EN = 1.
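Step 2 packs several fields into one ICACHE_CRRx word. The C sketch below builds that word and replays steps 2 and 3 in order on a simulated register block. The bit positions are assumptions to be checked against the product reference manual or the CMSIS device header, and `icache_sim_t` is a hypothetical stand-in for the ICACHE peripheral registers, not the device layout.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED ICACHE_CRRx field positions: verify before use. */
#define CRR_BASEADDR_POS   0u  /* BASEADDR = address bits [28:21]      */
#define CRR_RSIZE_POS      9u  /* RSIZE[2:0]                           */
#define CRR_REN_POS       15u  /* region enable                        */
#define CRR_REMAPADDR_POS 16u  /* REMAPADDR = address bits [31:21]     */
#define CRR_MSTSEL_POS    28u  /* 0: master1 (fast), 1: master2 (slow) */
#define CRR_HBURST_POS    31u  /* 0: WRAP, 1: incremental              */

/* ASSUMED ICACHE_CR bit positions. */
#define CR_EN     (1u << 0)
#define CR_WAYSEL (1u << 2)

/* Build an enabled ICACHE_CRRx value from the step 2 field values. */
uint32_t icache_crr_value(uint32_t alias_base, uint32_t rsize,
                          uint32_t remap_addr, uint32_t mstsel, uint32_t hburst)
{
    assert(rsize <= 7u && mstsel <= 1u && hburst <= 1u);
    uint32_t baseaddr  = (alias_base >> 21) & 0xFFu;  /* address bits 28:21 */
    uint32_t remapaddr = (remap_addr >> 21) & 0x7FFu; /* address bits 31:21 */
    return (baseaddr << CRR_BASEADDR_POS) | (rsize << CRR_RSIZE_POS)
         | (1u << CRR_REN_POS)            | (remapaddr << CRR_REMAPADDR_POS)
         | (mstsel << CRR_MSTSEL_POS)     | (hburst << CRR_HBURST_POS);
}

/* Hypothetical register block used only to show the write order. */
typedef struct {
    volatile uint32_t CR;
    volatile uint32_t CRR[4];
} icache_sim_t;

/* Steps 2 and 3 for the 8-Mbyte OCTOSPI1 example, using region x. */
void icache_remap_octospi_example(icache_sim_t *ic, unsigned x, int two_way)
{
    assert(x < 4u);
    ic->CR &= ~CR_EN;                 /* step 2: EN = 0 */
    if (two_way) {
        ic->CR |= CR_WAYSEL;          /* WAYSEL = 1 (2-way) */
    } else {
        ic->CR &= ~CR_WAYSEL;         /* WAYSEL = 0 (1-way) */
    }
    /* 0x1000 0000 alias, RSIZE = 3 (8 Mbytes), 0x9000 0000, master2, WRAP */
    ic->CRR[x] = icache_crr_value(0x10000000u, 3u, 0x90000000u, 1u, 0u);
    ic->CR |= CR_EN;                  /* step 3: EN = 1 */
}
```

Under these assumed positions, the example fields (BASEADDR = 0x80, RSIZE = 0x3, REMAPADDR = 0x480, MSTSEL = 1, HBURST = 0, REN = 1) encode to ICACHE_CRRx = 0x1480 8680.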

3.1.6 Hit and miss monitors

ICACHE provides two monitors for performance analysis: a 32-bit hit monitor and a 16-bit miss monitor.
· The hit monitor counts the cacheable AHB transactions on the slave cache port that hit the ICACHE content (fetched data already available in the cache). The hit monitor counter is available in the ICACHE_HMONR register.
· The miss monitor counts the cacheable AHB transactions on the slave cache port that miss the ICACHE content (fetched data not yet available in the cache). The miss monitor counter is available in the ICACHE_MMONR register.

These two monitors do not wrap around when they reach their maximum values.

These monitors are managed from the following bits in the ICACHE_CR register:

·

HITMEN bit (respectively MISSMEN bit) to enable/stop the hit (respectively miss) monitor

·

HITMRST bit (respectively MISSMRST bit) to reset the hit (respectively miss) monitor

By default, these monitors are disabled to reduce power consumption.
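A typical use of the monitors is to reset them, enable them, run the code under study, stop them and derive a hit ratio hits / (hits + misses). The C sketch below computes that ratio from ICACHE_HMONR and ICACHE_MMONR snapshots and treats a saturated counter as an invalid measurement, since the monitors stop at their maximum instead of wrapping. The function name and the assumption that the miss count sits in bits 15:0 of ICACHE_MMONR are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HMON_MAX 0xFFFFFFFFu  /* 32-bit hit monitor saturates here  */
#define MMON_MAX 0xFFFFu      /* 16-bit miss monitor saturates here */

/* Hit ratio = hits / (hits + misses). Returns false when no traffic was
 * measured or when a counter has saturated. */
bool cache_hit_ratio(uint32_t hmonr, uint32_t mmonr, double *ratio)
{
    assert(ratio != NULL);
    uint32_t misses = mmonr & MMON_MAX;   /* ASSUMED: count in bits 15:0 */
    if (hmonr == HMON_MAX || misses == MMON_MAX) {
        return false;                     /* window too long: result biased */
    }
    uint64_t total = (uint64_t)hmonr + misses;
    if (total == 0u) {
        return false;                     /* no cacheable traffic measured */
    }
    *ratio = (double)hmonr / (double)total;
    return true;
}
```

The same helper applies to each DCACHE read or write pair (for example DCACHE_RHMONR with DCACHE_RMMONR), since those monitors also pair a 32-bit hit counter with a 16-bit miss counter.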

3.1.7 ICACHE maintenance
The software can invalidate the ICACHE by setting the CACHEINV bit in the ICACHE_CR register. This action invalidates the whole cache, making it empty. If some remapped regions are enabled, the remap feature remains active, even when the ICACHE is disabled.
Since the ICACHE only manages read transactions and does not manage write transactions, it does not ensure coherency in case of writes. Consequently, the software must invalidate the ICACHE after programming a cached memory region (for example, after updating code in flash memory).
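A minimal sketch of a full ICACHE invalidation: set CACHEINV, wait for the BSYENDF flag described in the event management section, then clear it. The bit positions and the register block `icache_inv_regs_t` are assumptions to be checked against the reference manual; on a device, the peripheral registers from the device header would be used instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED bit positions: verify against the reference manual. */
#define ICACHE_CR_CACHEINV  (1u << 1)
#define ICACHE_SR_BSYENDF   (1u << 1)
#define ICACHE_FCR_CBSYENDF (1u << 1)

/* Hypothetical register block holding only the registers used here. */
typedef struct {
    volatile uint32_t CR;
    volatile uint32_t SR;
    volatile uint32_t FCR;
} icache_inv_regs_t;

/* Invalidate the whole ICACHE and wait until the operation has ended,
 * e.g. after new code has been programmed into a cached region. */
void icache_invalidate_blocking(icache_inv_regs_t *ic)
{
    assert(ic != NULL);
    ic->FCR = ICACHE_FCR_CBSYENDF;              /* clear a stale BSYENDF   */
    ic->CR |= ICACHE_CR_CACHEINV;               /* start full invalidation */
    while ((ic->SR & ICACHE_SR_BSYENDF) == 0u) {
        /* busy: cache invalidation in progress */
    }
    ic->FCR = ICACHE_FCR_CBSYENDF;              /* acknowledge the end     */
}
```

In a host test, presetting BSYENDF in the simulated status register stands in for the hardware completing the operation.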

3.1.8 ICACHE security
The ICACHE is a securable peripheral that can be configured as secure through the GTZC TZSC secure configuration register. When it is configured as secure, only secure accesses to the ICACHE registers are allowed. The ICACHE can also be configured as privileged through the GTZC TZSC privilege configuration register. When the ICACHE is configured as privileged, only privileged accesses to the ICACHE registers are allowed. By default, the ICACHE is nonsecure and non-privileged through the GTZC TZSC.

3.1.9 Event and interrupt management
The ICACHE reports functional errors, when detected, by setting the ERRF flag in ICACHE_SR. An interrupt can also be generated if the ERRIE bit is set in ICACHE_IER. When an ICACHE invalidation completes (end of the cache busy state), the BSYENDF flag is set in ICACHE_SR. An interrupt can also be generated if the BSYENDIE bit is set in ICACHE_IER. The table below lists the ICACHE interrupt and event flags.

Table 5. ICACHE interrupt and event management bits
Register | Bit name | Bit description | Bit access type
ICACHE_SR | BUSYF | Cache executing a full invalidate operation | Read-only
ICACHE_SR | BSYENDF | Cache invalidation operation finished | Read-only
ICACHE_SR | ERRF | An error occurred during caching operation | Read-only
ICACHE_IER | ERRIE | Enable interrupt for cache error | Read/write
ICACHE_IER | BSYENDIE | Enable interrupt in case of invalidation operation finished | Read/write
ICACHE_FCR | CERRF | Clears ERRF in ICACHE_SR | Write-only
ICACHE_FCR | CBSYENDF | Clears BSYENDF in ICACHE_SR | Write-only
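The flags of Table 5 are typically handled in the ICACHE interrupt handler: read ICACHE_SR once, act on each set flag, and clear the handled flags through ICACHE_FCR. The sketch below assumes write-one-to-clear semantics for ICACHE_FCR and the bit positions shown; the register block and the event counters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED bit positions: verify against the reference manual. */
#define ICACHE_SR_BSYENDF   (1u << 1)
#define ICACHE_SR_ERRF      (1u << 2)
#define ICACHE_FCR_CBSYENDF (1u << 1)
#define ICACHE_FCR_CERRF    (1u << 2)

/* Hypothetical register block (not the device header layout). */
typedef struct {
    volatile uint32_t SR;
    volatile uint32_t IER;
    volatile uint32_t FCR;
} icache_irq_regs_t;

/* Hypothetical application-side event counters. */
typedef struct {
    unsigned errors;
    unsigned invalidations_done;
} icache_events_t;

/* Body of an ICACHE interrupt handler: test each flag and clear it by
 * writing the matching bit to ICACHE_FCR. */
void icache_irq_service(icache_irq_regs_t *ic, icache_events_t *ev)
{
    assert(ic != NULL && ev != NULL);
    uint32_t sr = ic->SR;                 /* read the status once */
    uint32_t clear = 0u;
    if ((sr & ICACHE_SR_ERRF) != 0u) {
        ev->errors++;
        clear |= ICACHE_FCR_CERRF;
    }
    if ((sr & ICACHE_SR_BSYENDF) != 0u) {
        ev->invalidations_done++;
        clear |= ICACHE_FCR_CBSYENDF;
    }
    if (clear != 0u) {
        ic->FCR = clear;                  /* clear only the handled flags */
    }
}
```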

3.2 DCACHE features
The purpose of the data cache is to cache external memory data loads and data stores coming from the processor or from another bus master peripheral. Unlike the ICACHE, the DCACHE manages both read and write transactions.

3.2.1 DCACHE cacheability traffic
The DCACHE caches the external memories accessed through its AHB master port interface. Whether an incoming memory request is cacheable depends on the lookup attribute of its AHB transaction. The DCACHE write policy (write-through or write-back) depends on the memory attribute configured in the MPU. When a region is configured as non-cacheable, the DCACHE is bypassed.

Table 6. DCACHE cacheability for AHB transaction
AHB lookup attribute | AHB bufferable attribute | Cacheability
0 | X | Read and write: non-cacheable
1 | 0 | Read: cacheable; write: cacheable, write-through
1 | 1 | Read: cacheable; write: cacheable, write-back
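Table 6 reduces to a small decision function, sketched below in C. The enum and function names are hypothetical, and how the MPU attributes map onto the AHB lookup and bufferable signals is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    DC_NON_CACHEABLE,   /* read and write bypass the DCACHE           */
    DC_WRITE_THROUGH,   /* reads cached, writes cached, write-through */
    DC_WRITE_BACK       /* reads cached, writes cached, write-back    */
} dcache_policy_t;

/* The AHB lookup attribute decides whether the DCACHE is used at all,
 * and the bufferable attribute then selects the write policy. */
dcache_policy_t dcache_policy(bool lookup, bool bufferable)
{
    if (!lookup) {
        return DC_NON_CACHEABLE;   /* bufferable is "don't care" (X) */
    }
    return bufferable ? DC_WRITE_BACK : DC_WRITE_THROUGH;
}
```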

3.2.2 DCACHE cacheable regions
For the STM32U5 series, the DCACHE1 slave interface is connected to the Cortex-M33 through the S-AHB bus, and caches the GFXMMU, FMC and HSPI/OCTOSPI regions. The DCACHE2 slave interface is connected to the GPU2D through the M0 port bus, and caches all the internal and external memories (except SRAM4 and BKPSRAM).
For the STM32H5 series, the DCACHE slave interface is connected to the Cortex-M33 through the S-AHB bus, and caches the external memories through the FMC and OCTOSPI.


Table 7. DCACHE cacheable regions and interfaces
Cacheable memory address region | DCACHE1 cacheable interfaces | DCACHE2 cacheable interfaces
GFXMMU | X | X
SRAM1 | N/A | X
SRAM2 | N/A | X
SRAM3 | N/A | X
SRAM5 | N/A | X
SRAM6 | N/A | X
HSPI1 | X | X
OCTOSPI1 | X | X
FMC banks | X | X
OCTOSPI2 | X | X

Note: Some interfaces are not supported in certain products. Refer to Figure 2 or to the specific product reference manual.

3.2.3 Burst type
Like the ICACHE, the DCACHE supports incremental and WRAP bursts (see Section 3.1.3). For the DCACHE, the burst type is configured through the HBURST bit in DCACHE_CR.

3.2.4 DCACHE configuration
At boot, the DCACHE is disabled by default, so the memory requests on its slave port are forwarded directly to its master port. To enable the DCACHE, set the EN bit in the DCACHE_CR register.

3.2.5 Hit and miss monitors

The DCACHE implements four monitors for cache performance analysis:
· Two 32-bit hit monitors (read and write): they count the number of times the CPU reads or writes data in the cache memory without generating a transaction on the DCACHE master port (data already available in the cache). The read and write hit monitor counters are available in the DCACHE_RHMONR and DCACHE_WHMONR registers, respectively.
· Two 16-bit miss monitors (read and write): they count the number of times the CPU reads or writes data that is not available in the cache, which generates a transaction on the DCACHE master port to load the data from the memory region. The read and write miss monitor counters are available in the DCACHE_RMMONR and DCACHE_WMMONR registers, respectively.

These four monitors do not wrap over when reaching their maximum values. These monitors are managed from the following bits in the DCACHE_CR register:

·

WHITMEN bit (respectively WMISSMEN bit) to enable/stop the write hit (respectively miss) monitor

·

RHITMEN bit (respectively RMISSMEN bit) to enable/stop the read hit (respectively miss) monitor

·

WHITMRST bit (respectively WMISSMRST bit) to reset the write hit (respectively miss) monitor

·

RHITMRST bit (respectively RMISSMRST bit) to reset the read hit (respectively miss) monitor

By default, these monitors are disabled to reduce power consumption.

3.2.6 DCACHE maintenance
The DCACHE offers several maintenance operations, selected through CACHECMD[2:0] in DCACHE_CR:
· 000: no operation (default)
· 001: clean range. Clean a given range in the cache.
· 010: invalidate range. Invalidate a given range in the cache.
· 011: clean and invalidate range. Clean and invalidate a given range in the cache.


The selected range is configured through:

·

CMDSTARTADDR register: command starting address

·

CMDENDADDR register: command ending address

Note: These registers must be set before CACHECMD is written.

The maintenance command starts when the STARTCMD bit is set in the DCACHE_CR register. The DCACHE also supports a full cache invalidation, triggered by setting the CACHEINV bit in the DCACHE_CR register.
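The range maintenance sequence described above (set the range, select CACHECMD, set STARTCMD, then wait for the end of the command) can be sketched as follows. This is a sketch under stated assumptions: the bit positions, the 16-byte line size used to widen the range, the line-aligned inclusive end address and the register block `dcache_regs_t` must all be checked against the product reference manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED field positions: verify against the reference manual. */
#define DCACHE_CR_CACHECMD_POS 8u
#define DCACHE_CR_CACHECMD_MSK (7u << DCACHE_CR_CACHECMD_POS)
#define DCACHE_CR_STARTCMD     (1u << 11)
#define DCACHE_SR_CMDENDF      (1u << 4)
#define DCACHE_FCR_CCMDENDF    (1u << 4)
#define DCACHE_LINE_BYTES      16u   /* ASSUMED 128-bit line */

/* CACHECMD encodings from Section 3.2.6. */
#define CMD_CLEAN            1u
#define CMD_INVALIDATE       2u
#define CMD_CLEAN_INVALIDATE 3u

/* Hypothetical register block (not the device header layout). */
typedef struct {
    volatile uint32_t CR, SR, FCR, CMDSTARTADDR, CMDENDADDR;
} dcache_regs_t;

/* Run one range maintenance command over [addr, addr + len) and wait. */
void dcache_range_cmd(dcache_regs_t *dc, uint32_t cmd, uint32_t addr, uint32_t len)
{
    assert(dc != NULL && len != 0u);
    assert(cmd >= CMD_CLEAN && cmd <= CMD_CLEAN_INVALIDATE);
    /* Widen the range to whole cache lines (ASSUMED alignment rule). */
    dc->CMDSTARTADDR = addr & ~(DCACHE_LINE_BYTES - 1u);
    dc->CMDENDADDR   = (addr + len - 1u) & ~(DCACHE_LINE_BYTES - 1u);
    /* Select the command once the range registers are set, then start it. */
    dc->CR = (dc->CR & ~DCACHE_CR_CACHECMD_MSK) | (cmd << DCACHE_CR_CACHECMD_POS);
    dc->CR |= DCACHE_CR_STARTCMD;
    while ((dc->SR & DCACHE_SR_CMDENDF) == 0u) {
        /* wait for the end of the range command */
    }
    dc->FCR = DCACHE_FCR_CCMDENDF;   /* clear CMDENDF */
}
```

Typically, a write-back range is cleaned before another bus master reads it, and invalidated after another bus master has written it, so that the CPU does not read stale cached data.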

3.2.7 DCACHE security
The DCACHE is a securable peripheral that can be configured as secure through the GTZC TZSC secure configuration register. When it is configured as secure, only secure accesses to the DCACHE registers are allowed.
The DCACHE can also be configured as privileged through the GTZC TZSC privilege configuration register. When the DCACHE is configured as privileged, only privileged accesses to the DCACHE registers are allowed.
By default, the DCACHE is nonsecure and non-privileged through the GTZC TZSC.

3.2.8 Event and interrupt management
The DCACHE reports functional errors, when detected, by setting the ERRF flag in DCACHE_SR. An interrupt can also be generated if the ERRIE bit is set in DCACHE_IER. When a full DCACHE invalidation completes (end of the cache busy state), the BSYENDF flag is set in DCACHE_SR.
An interrupt can also be generated if the BSYENDIE bit is set in DCACHE_IER. The status of a range maintenance command can be checked through the CMDENDF and BUSYCMDF flags in DCACHE_SR.
An interrupt can also be generated at the end of a range command if the CMDENDIE bit is set in DCACHE_IER. The table below lists the DCACHE interrupt and event flags.

Table 8. DCACHE interrupt and event management bits
Register | Bit name | Bit description | Bit access type
DCACHE_SR | BUSYF | Cache executing a full invalidate operation | Read-only
DCACHE_SR | BSYENDF | Cache full invalidate operation ended | Read-only
DCACHE_SR | BUSYCMDF | Cache executing a range command | Read-only
DCACHE_SR | CMDENDF | Range command ended | Read-only
DCACHE_SR | ERRF | An error occurred during caching operation | Read-only
DCACHE_IER | ERRIE | Enable interrupt for cache error | Read/write
DCACHE_IER | CMDENDIE | Enable interrupt on range command end | Read/write
DCACHE_IER | BSYENDIE | Enable interrupt on full invalidate operation end | Read/write
DCACHE_FCR | CERRF | Clears ERRF in DCACHE_SR | Write-only
DCACHE_FCR | CCMDENDF | Clears CMDENDF in DCACHE_SR | Write-only
DCACHE_FCR | CBSYENDF | Clears BSYENDF in DCACHE_SR | Write-only

4 ICACHE and DCACHE performance and power consumption

Using the ICACHE and DCACHE improves application performance when accessing external memories. The following table shows the impact of the ICACHE and DCACHE on CoreMark® execution when accessing external memories.

Table 9. ICACHE and DCACHE performance on CoreMark execution with external memories(1)
CoreMark code | CoreMark data | ICACHE configuration | DCACHE configuration | CoreMark score/MHz
Internal flash memory | Internal SRAM | Enabled (2-way) | Disabled | 3.89
Internal flash memory | External Octo-SPI PSRAM (S-bus) | Enabled (2-way) | Enabled | 3.89
Internal flash memory | External Octo-SPI PSRAM (S-bus) | Enabled (2-way) | Disabled | 0.48
External Octo-SPI flash memory (C-bus) | Internal SRAM | Enabled (2-way) | Disabled | 3.86
External Octo-SPI flash memory (C-bus) | Internal SRAM | Disabled | Disabled | 0.24
Internal flash memory | Internal SRAM | Disabled | Disabled | 2.69
1. Test conditions:
· Applicable product: STM32U575/585
· System frequency: 160 MHz
· External Octo-SPI PSRAM memory: 80 MHz (DTR mode)
· External Octo-SPI flash memory: 80 MHz (STR mode)
· Compiler: IAR V8.50.4
· Internal flash memory PREFETCH: ON

Using the ICACHE and DCACHE also reduces power consumption when accessing internal and external memories. The following table shows the impact of the ICACHE on power consumption during CoreMark execution.

Table 10. CoreMark execution ICACHE impact on power consumption(1)
ICACHE configuration | MCU power consumption (mA)
Enabled (2-way) | 7.60
Enabled (1-way) | 7.13
Disabled | 8.89
1. Test conditions:
· Applicable product: STM32U575/585
· CoreMark code: internal flash memory
· CoreMark data: internal SRAM
· Internal flash memory PREFETCH: ON
· System frequency: 160 MHz
· Compiler: IAR V8.32.2
· Voltage range: 1
· SMPS: ON

The 2-way set-associative configuration performs better than the 1-way configuration for code that cannot be fully loaded in the cache. Conversely, the 1-way (direct-mapped) configuration is almost always more power efficient than the 2-way configuration. Each code has to be evaluated in both associativity configurations to select the best trade-off between performance and power consumption, depending on the user priority.
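Reading Tables 9 and 10 as ratios makes the gains explicit. With CoreMark data in external PSRAM, enabling the DCACHE raises the score from 0.48 to 3.89 CoreMark/MHz (about 8.1 times). When executing from external Octo-SPI flash memory, enabling the ICACHE raises it from 0.24 to 3.86 (about 16.1 times). Compared with a disabled ICACHE, the 2-way and 1-way configurations reduce the MCU current by about 14.5 % and 19.8 %, respectively. The small C helpers below (hypothetical names) compute these ratios from the table values.

```c
#include <assert.h>

/* Speed-up of a configuration over a reference (CoreMark/MHz scores). */
double coremark_speedup(double score, double reference_score)
{
    assert(reference_score > 0.0);
    return score / reference_score;
}

/* Relative current saving of a configuration versus a reference (mA). */
double current_saving(double current_ma, double reference_ma)
{
    assert(reference_ma > 0.0);
    return (reference_ma - current_ma) / reference_ma;
}
```

For example, `current_saving(7.13, 8.89)` evaluates to about 0.198 for the 1-way ICACHE. Note that Tables 9 and 10 were measured with different compiler versions, so ratios are meaningful only within each table.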

5 Conclusion

The first caches developed by STMicroelectronics, the ICACHE and DCACHE, can cache internal and external memories, enhancing performance for both instruction fetches and data traffic. This document presents the features supported by the ICACHE and DCACHE; their simple and flexible configuration allows lower development cost and faster time to market.


Revision history

Table 11. Document revision history
Date | Version | Changes
10-Oct-2019 | 1 | Initial release.
27-Feb-2020 | 2 | Updated: Table 2. Memory regions and their addresses; Section 2.1.7 ICACHE maintenance; Section 2.1.8 ICACHE security.
7-Dec-2021 | 3 | Updated: Document title; Introduction; Section 1 ICACHE and DCACHE overview; Section 4 Conclusion. Added: Section 2 ICACHE and DCACHE features; Section 3 ICACHE and DCACHE performance and power consumption.
15-Feb-2023 | 4 | Updated: Section 2.2: STM32U5 series smart architecture; Section 2.5: DCACHE block diagram; Section 3.1.1: Dual masters; Section 3.1.2: 1-way versus 2-way ICACHE; Section 3.1.4: Cacheable regions and remapping feature; Section 3.2: DCACHE features; Section 3.2.2: DCACHE cacheable regions; Section 4: ICACHE and DCACHE performance and power consumption. Added: Section 1: General information.
11-Mar-2024 | 5 | Updated: Section 2.3: STM32H5 series smart architecture; Section 3.1.1: Dual masters.


AN5212
Contents
Contents
1 General information
2 ICACHE and DCACHE overview
  2.1 STM32L5 series smart architecture
  2.2 STM32U5 series smart architecture
  2.3 STM32H5 series smart architecture
  2.4 ICACHE block diagram
  2.5 DCACHE block diagram
3 ICACHE and DCACHE features
  3.1 ICACHE features
    3.1.1 Dual masters
    3.1.2 1-way versus 2-way ICACHE
    3.1.3 Burst type
    3.1.4 Cacheable regions and remapping feature
    3.1.5 Benefit of ICACHE external memory remapping
    3.1.6 Hit and miss monitors
    3.1.7 ICACHE maintenance
    3.1.8 ICACHE security
    3.1.9 Event and interrupt management
  3.2 DCACHE features
    3.2.1 DCACHE cacheability traffic
    3.2.2 DCACHE cacheable regions
    3.2.3 Burst type
    3.2.4 DCACHE configuration
    3.2.5 Hit and miss monitors
    3.2.6 DCACHE maintenance
    3.2.7 DCACHE security
    3.2.8 Event and interrupt management
4 ICACHE and DCACHE performance and power consumption
5 Conclusion
Revision history
List of tables
List of figures

List of tables

Table 1. Applicable products
Table 2. Memory regions and their addresses
Table 3. 1-way versus 2-way ICACHE
Table 4. Configuration of STM32L5 and STM32U5 series memories
Table 5. ICACHE interrupt and event management bits
Table 6. DCACHE cacheability for AHB transaction
Table 7. DCACHE cacheable regions and interfaces
Table 8. DCACHE Interrupt and events management bits
Table 9. ICACHE and DCACHE performance on CoreMark execution with external memories
Table 10. CoreMark execution ICACHE impact on power consumption
Table 11. Document revision history

List of figures

Figure 1. STM32L5 series smart architecture
Figure 2. STM32U5 series smart architecture
Figure 3. STM32H563/H573 and STM32H562 series smart architecture
Figure 4. STM32H503 series smart architecture
Figure 5. ICACHE block diagram
Figure 6. DCACHE block diagram
Figure 7. Incremental versus WRAP burst
Figure 8. Octo-SPI memory remap example
Figure 9. Memory regions remapping example

IMPORTANT NOTICE - READ CAREFULLY
STMicroelectronics NV and its subsidiaries ("ST") reserve the right to make changes, corrections, enhancements, modifications, and improvements to ST products and/or to this document at any time without notice. Purchasers should obtain the latest relevant information on ST products before placing orders. ST products are sold pursuant to ST's terms and conditions of sale in place at the time of order acknowledgment. Purchasers are solely responsible for the choice, selection, and use of ST products and ST assumes no liability for application assistance or the design of purchasers' products. No license, express or implied, to any intellectual property right is granted by ST herein. Resale of ST products with provisions different from the information set forth herein shall void any warranty granted by ST for such product. ST and the ST logo are trademarks of ST. For additional information about ST trademarks, refer to www.st.com/trademarks. All other product or service names are the property of their respective owners. Information in this document supersedes and replaces information previously supplied in any prior versions of this document.
© 2024 STMicroelectronics - All rights reserved
