
STM32U0-System-ARM Cortex M0 (Core) rev2


1 Aug 2024


STM32U0 - ARM® Core
ARM Cortex®-M0+ Core
Hello, and welcome to this presentation of the ARM® Cortex®-M0+ core, which is embedded in all products of the STM32U0 microcontroller family.

Cortex-M0+ processor overview

· ARMv6-M architecture
· Von Neumann architecture, 2-stage pipeline
· Single-issue architecture
· Single-cycle multiply
· Memory Protection Unit (MPU)
· Single-cycle I/O port

[Figure: Arm Cortex®-M0+ block diagram: ARMv6-M CPU, nested vectored interrupt controller, Memory Protection Unit, AHB-Lite interface, fast I/O port, data watchpoint, breakpoint unit, Serial Wire Debug]

· Ultra-low-power design: low power consumption and high energy efficiency
· Very compact code: except for a few control instructions and branch and link, all instructions are 16 bits long


The Cortex®-M0+ core is part of the ARM Cortex-M group of 32-bit RISC cores. It implements the ARMv6-M architecture and features a 2-stage pipeline.
The Cortex®-M0+ has a single AHB-Lite master port, but supports concurrent instruction fetch and data access when the data access targets the Fast I/O port address range.

Cortex-M processors compatibility
· Seamless architecture across all applications

[Figure: the Cortex-M family, binary and tool compatible. Cortex-M0/M0+ (ultra-low power, small footprint); Cortex-M3 and Cortex-M4 (exceptional 32-bit performance with low power consumption; control and performance for mixed-signal devices); Cortex-M7 (highest-performance Cortex-M processor); Cortex-M23, Cortex-M33/M35P, Cortex-M55 and Cortex-M85 (with TrustZone)]
STM32U0 microcontrollers integrate an ARM® Cortex®-M0+ core to benefit from its excellent performance-per-milliwatt ratio. All Cortex®-M CPUs have a 32-bit architecture. The Cortex®-M3 was the first Cortex®-M CPU released by ARM. ARM then split the family into two product lines, high performance and low power, while maintaining compatibility between them. The Cortex®-M0+ belongs to the low-power line: it is designed for battery-powered devices that are very sensitive to power consumption.

Core architecture overview

[Figure: ARM® Cortex®-M0+ core architecture: processor core (2-stage pipeline), bus interface unit, MPU, NVIC (32 interrupt requests) and debug; the AHB-Lite port connects to the AHB-Lite bus matrix, which serves the internal memories and internal peripherals]

Thanks to its 2-stage instruction pipeline, the Cortex®-M0+ core delivers more performance than the Cortex®-M0 core. Let's start our description of the CPU with the processor core, which is in charge of fetching and executing instructions.

ARM Cortex-M0+ 2-stage pipeline
Stages: fetch and pre-decode; decode and execute
Clock 1: fetch InstrN and InstrN+1 (two 16-bit instructions); pre-decode InstrN
Clock 2: no instruction fetch (possible data access); pre-decode InstrN+1; decode and execute InstrN
Clock 3: fetch InstrN+2 and InstrN+3 (two 16-bit instructions); pre-decode InstrN+2; decode and execute InstrN+1
Most ARMv6-M instructions are 16 bits long. There are only six 32-bit instructions, and most of them are rarely used control instructions. However, the branch and link instruction (BL), which is used to call a subroutine, is also 32 bits long so that it can encode a large offset between the instruction and the target label. Ideally, one 32-bit access loads two 16-bit instructions, which results in fewer fetches per instruction. During clock 2, no instruction fetch occurs, so the AHB-Lite port is available for a data access when InstrN is a load/store instruction.


Branch performance

· Cortex®-M0+ core
· Maximum two 16-bit branch shadow instructions

        Inst0
        B       Label   ; Branch to Label
        Inst1           ; Branch shadow instruction
        Inst2           ; Branch shadow instruction
        ...
Label:  InstN
        InstN+1

Clock 1: fetch and pre-decode Inst0, B Label
Clock 2: decode and execute Inst0
Clock 3: fetch and pre-decode Inst1, Inst2; decode and execute B Label
Clock 4: fetch and pre-decode InstN, InstN+1
Thanks to the 2-stage pipeline, fewer prefetched instructions are wasted on a taken branch. In clock 1, the processor fetches Inst0 and an unconditional branch instruction. In clock 2, it executes Inst0. In clock 3, it executes the branch instruction while fetching the next two sequential instructions, Inst1 and Inst2, called branch shadow instructions. In clock 4, the processor discards Inst1 and Inst2 and fetches InstN and InstN+1. Cortex-M0, M3 and M4 implement a 3-stage pipeline (fetch, decode and execute), so their number of branch shadow instructions is larger: up to four 16-bit instructions.


Core architecture overview

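[Figure: ARM® Cortex®-M0+ core architecture with the single-cycle I/O port: processor core (2-stage pipeline), bus interface unit, MPU, NVIC (32 interrupt requests) and debug; the AHB-Lite port connects to the AHB-Lite bus matrix, internal memories and internal peripherals, while the single-cycle I/O port connects to the GPIO ports]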
The Cortex®-M0+ has neither an embedded cache nor internal RAM. Consequently, every instruction fetch is steered to the AHB-Lite interface, and every data access is steered either to the AHB-Lite interface or to the single-cycle I/O port. Note that the STM32U0 implements a SoC-level instruction cache, external to the CPU, in the embedded flash controller.
The AHB-Lite master port is connected to a bus matrix, enabling the CPU to access memories and peripherals. Since transactions are pipelined on AHB-Lite, the best throughput is 32 bits of data or instructions per clock, with a minimum 2-clock latency. The Cortex®-M0+ also features a single-cycle I/O port, which lets the CPU access data with a 1-clock latency. External decoding logic determines the address range for which data accesses are steered to this port.

In the STM32U0, the single-cycle I/O port is not used to access GPIO port registers. Instead, the GPIO ports are mapped on AHB, which allows them to be accessed by DMA.

Memory protection unit
· MPU attribute settings define access permissions
· 8 independent memory regions
· Can execute code?
· Can write data?
· Unprivileged mode access?
The MPU in STM32U0 microcontrollers supports eight independent memory regions, each with independently configurable attributes for:
- access permission: whether read and write accesses are allowed in privileged and unprivileged modes,
- execution permission: whether the region is executable or prohibited for instruction fetch.

References
· For more details, please refer to the following documentation:
· STM32G0 Series Cortex®-M0+ processor programming manual (PM0223)
· Managing memory protection unit (MPU) in STM32 MCUs (AN4838)
· ARM website: http://www.arm.com/products/processors/cortex-m/cortex-m0+-processor.php
For more details, please refer to these application notes and the Cortex®-M0+ programming manual, available on the www.st.com website. Also visit the ARM website, where you will find more information about the Cortex®-M0+ core.

Thank you

© STMicroelectronics - All rights reserved.

ST logo is a trademark or a registered trademark of STMicroelectronics International NV or its affiliates in the EU and/or other countries.

For additional information about ST trademarks, please refer to www.st.com/trademarks.

All other product or service names are the property of their respective owners.


Thanks for attending this presentation!



