ARM Cortex-M3

The Cortex-M3 is a processor core (ARMv7-M architecture) from the ARM Cortex-M family of microprocessors from ARM. It was introduced in 2004 and can be regarded as the replacement for the ARM7 in the field of microcontrollers.


ARM architectures have become more and more complex over time. The first popular ARM processor was (and still is) the ARM7, followed by the ARM9 and ARM11 in various designs.

The applications for these three families can be divided roughly as follows:

ARM7: This became a runaway success in the field of microcontrollers. Although not originally intended for use in microcontrollers, the processor has established itself there. ARM7-based microcontrollers are used where more computing power is required than, for example, an 8-bit processor can provide.

ARM9: Application environments that require far more computing power than, e.g., the ARM7. Mechanisms such as an MMU or MPU are used here. Applications include real-time systems.

ARM11: High-end embedded applications that require a lot of computing power.

ARM divided its new architecture into three branches in order to create successors for all three areas. This division is intended to allow even wider use of the new architecture(s) in the world of embedded systems.

Cortex-A: Application (OS-based applications)

Cortex-R: Real-time (real-time applications)

Cortex-M: Microcontroller (cores for microcontrollers)

The Cortex-M3 architecture can thus be viewed as the successor to the ARM7: it provides more computing power than an ARM7 with a less complex programming model and a smaller chip area. Other members of the family exist as well; the M1, for example, is available for implementation in an FPGA.

Basic structure

The Cortex-M3 is a new architecture from ARM, designed to provide a very powerful processor with a less complex programming model, one that can be used in current and future microcontrollers in the (classical) 8- and 16-bit range. Like all ARM architectures, the M3 is (internally) a 32-bit architecture, but it works only with the new Thumb-2 instruction set. (Other ARM cores can switch between the ARM and Thumb instruction sets.)

The heart of the Cortex-M3 processor is the Cortex-M3 core with a three-stage pipeline, based on the Harvard architecture. ARM processors are available in both architectures: Von Neumann (where a single data and address bus is used to access both instructions and data) and Harvard. The Harvard architecture is characterized by two separate bus systems (and two separate memories) for loading data and instructions, so the processor can fetch an instruction and read data (or write data back to memory) at the same time. From the outside (programming model), however, the Cortex-M3 presents a Von Neumann model: its entire (shared) address space can be addressed linearly. This avoids costly accesses to program memory when constants are stored there.

For a microcontroller, the address space of a 32-bit processor, with 2^32 addresses, is colossal. There are therefore plenty of addresses available to place both memories in the shared address space. For the programmer of the system this is a great advantage, since data stored in flash (program memory), such as constants and strings, can be addressed directly and linearly and need not be loaded with special instructions.
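
The effect of the shared address space can be illustrated with a minimal C sketch. It assumes a typical toolchain setup in which const objects are placed in flash; that placement depends on the linker script and is not stated in the text above. All names (demo_table, demo_sum, and so on) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: on a typical Cortex-M3 toolchain these const objects are
   placed in flash (.rodata) by the linker script. */
static const uint8_t demo_table[4] = { 1, 2, 3, 4 };
static const char demo_text[] = "M3";

/* Sums bytes through an ordinary pointer. The same function works for data
   in RAM and for constants in flash, because both live in one linear
   address space and are read with normal load instructions. */
uint32_t demo_sum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}

/* Small wrappers that apply demo_sum to the constants above. */
uint32_t demo_sum_table(void)
{
    return demo_sum(demo_table, sizeof demo_table);
}

uint32_t demo_sum_text(void)
{
    /* sizeof includes the terminating '\0', which contributes 0. */
    return demo_sum((const uint8_t *)demo_text, sizeof demo_text);
}
```

No special keyword or access function is needed to read the constants, which is the point the paragraph above makes.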

Several new features have been integrated into the core of the Cortex-M3. These include a true Nested Vectored Interrupt Controller (NVIC), a kind of branch prediction, and single-cycle multiplication.

Branch prediction (brief explanation)

Branch prediction in the Cortex-M3 is solved in an interesting way: only 16 bits are needed to load an instruction, but the memory interface is 32 bits wide, so two instructions are always loaded (fetched) at once. One instruction is buffered in each case. For a branch, it is only known in the execute stage (see pipelines) whether the branch is taken or not. If not, execution simply continues as before. If so, the buffered instruction is loaded into the pipeline and execution continues with it. Thus only one clock cycle is lost in the three-stage pipeline (instead of two).


The processor core works exclusively with the new Thumb-2 instruction set. This makes it somewhat more efficient (in terms of execution time) than older ARM processors in Thumb mode, and its code is about 30% more compact than that of older ARM processors in ARM mode. The Thumb-2 instruction set includes both 16- and 32-bit instructions, which are designed to work as efficiently as possible with compilers, i.e. to implement C/C++ code, but can of course also be used in assembler.

Almost all of the classic ARM instructions are (as in Thumb) only 16 bits long. This allows two instructions to be loaded in a single clock cycle.

Other major innovations in Thumb-2 include native bit-field manipulation, hardware division, and if-then instructions. The latter allow conditional execution of code (without having to branch).
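
As a rough illustration, the following C functions express the kind of operations these instructions target. Whether a compiler actually emits bit-field instructions (such as UBFX or BFI), a hardware divide (UDIV) or an if-then (IT) block for them depends on the compiler and its optimization settings; that mapping is an assumption here, and all function names are hypothetical.

```c
#include <stdint.h>

/* Extracts `width` bits starting at bit `lsb` (valid for 0 < width < 32).
   A Thumb-2 compiler may implement this with a single bit-field extract. */
uint32_t demo_bitfield_extract(uint32_t value, unsigned lsb, unsigned width)
{
    uint32_t mask = (1u << width) - 1u;
    return (value >> lsb) & mask;
}

/* Replaces `width` bits of `value` starting at `lsb` with the low bits of
   `field` (valid for 0 < width < 32). Candidate for a bit-field insert. */
uint32_t demo_bitfield_insert(uint32_t value, uint32_t field,
                              unsigned lsb, unsigned width)
{
    uint32_t mask = ((1u << width) - 1u) << lsb;
    return (value & ~mask) | ((field << lsb) & mask);
}

/* Plain integer division; with hardware division no library call is needed.
   The caller must ensure that d is not zero. */
uint32_t demo_divide(uint32_t n, uint32_t d)
{
    return n / d;
}

/* A short conditional like this can be executed without a branch when the
   compiler uses an if-then block. */
int32_t demo_clamp_non_negative(int32_t x)
{
    return (x < 0) ? 0 : x;
}
```

For example, demo_bitfield_extract(0xABCD1234, 8, 8) isolates the byte 0x12 from the word.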

Simple programming model

Another great advantage of the Cortex-M3 architecture is that, for relatively simple tasks, the programmer needs neither precise knowledge of the internal structure of the core nor any assembler knowledge.

A hardware-based interrupt scheme makes it very easy to write interrupt handlers, without complicated start-up code first having to be written in assembler and registers having to be saved by hand.


The Cortex-M3 already brings some peripherals with it. For example, a true Nested Vectored Interrupt Controller (NVIC), memory protection, timers, and debug and trace facilities are integrated into the processor.

Additional peripherals such as UARTs, further timers, PWM, I2C, SPI, etc. are developed by the chip manufacturers or can be bought from ARM as IP (intellectual property). As usual, they are accessed via registers.

Vectored Interrupt Controller

Classically, the vector table of ARM-based controllers consists of real instructions, usually branch instructions to an address (at which another branch or the interrupt service routine (ISR) is located). The entry for the FIQ (Fast Interrupt reQuest) can be an exception: since it is the last entry in the table, the code for this interrupt routine can start right there and be executed very quickly.

With the NVIC of the Cortex-M3, this table is now a true vector table: it contains jump addresses (vectors) rather than executable instructions. Thanks to the NVIC's direct interface to the core, these vectors (addresses) can be loaded and jumped to very quickly.
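
The idea of a table of addresses can be sketched in C as an array of function pointers. This is a simplified host-side model, not a real start-up file: an actual Cortex-M3 table also begins with the initial stack pointer value and the reset vector, and the lookup is done by hardware rather than by a function like demo_dispatch. All names and the three example entries are hypothetical.

```c
#include <stdint.h>

/* A vector is just the address of a handler; handlers are ordinary C
   functions with no special prologue or epilogue. */
typedef void (*demo_handler_t)(void);

static volatile uint32_t demo_tick_count;
static volatile uint32_t demo_uart_count;

static void demo_default_handler(void) { /* nothing to do */ }
static void demo_tick_handler(void)    { demo_tick_count++; }
static void demo_uart_handler(void)    { demo_uart_count++; }

/* The table holds addresses only, no branch instructions. */
static const demo_handler_t demo_vectors[] = {
    demo_default_handler,   /* hypothetical interrupt 0 */
    demo_tick_handler,      /* hypothetical interrupt 1 */
    demo_uart_handler       /* hypothetical interrupt 2 */
};

#define DEMO_VECTOR_COUNT (sizeof demo_vectors / sizeof demo_vectors[0])

/* Software stand-in for what the hardware does on an interrupt: read the
   address from the table and jump to it. Returns 0 on success, -1 if the
   interrupt number is out of range. */
int demo_dispatch(unsigned irq)
{
    if (irq >= DEMO_VECTOR_COUNT) {
        return -1;
    }
    demo_vectors[irq]();
    return 0;
}
```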

The interrupt controller is very closely coupled to the Cortex-M3 core. The context is saved automatically by the processor. Other special features such as "late-arriving interrupts" and "tail chaining", in which a POP/PUSH sequence is saved (cf. ARM7), result in a very efficient interrupt system. In general, each interrupt can be assigned a priority, i.e. the priority is not fixed by the position in the vector table. The priorities can be organized into groups. Interrupts in different groups can interrupt each other; within a group, a higher-priority interrupt does not interrupt a lower-priority one. This is important, for example, to avoid deadlocks when hardware is shared.
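
The grouping rule can be modelled with a few lines of C. The sketch assumes a full 8-bit priority value (real devices often implement fewer bits), the convention that a numerically lower value means a higher priority, and a split point `prigroup` in the range 0 to 7 such that bits [7:prigroup+1] form the group priority and the remaining low bits the priority within the group. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper part of the priority value: decides whether preemption happens. */
uint8_t demo_group_priority(uint8_t priority, unsigned prigroup)
{
    return (uint8_t)(priority >> (prigroup + 1u));
}

/* Lower part of the priority value: only orders pending interrupts that
   belong to the same group. */
uint8_t demo_sub_priority(uint8_t priority, unsigned prigroup)
{
    uint32_t mask = (1u << (prigroup + 1u)) - 1u;
    return (uint8_t)(priority & mask);
}

/* A pending interrupt preempts the active one only if its group priority
   is strictly higher (numerically lower). Inside one group there is no
   preemption, whatever the sub-priorities are. */
bool demo_can_preempt(uint8_t pending, uint8_t active, unsigned prigroup)
{
    return demo_group_priority(pending, prigroup)
         < demo_group_priority(active, prigroup);
}
```

With prigroup set to 3, the priorities 0x21 and 0x2F fall into the same group (2), so neither can interrupt the other, while 0x10 (group 1) can interrupt both.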

Register set

Classic ARM cores have a set of 37 32-bit registers, divided among different modes. Depending on the mode, different registers are visible. This avoids having to save registers, e.g. for the FIQ (the shadow registers R8 to R14 hide the corresponding user-mode registers).

The Cortex-M3 has only one set of registers (R0 to R12, SP, LR, PC and the program status register xPSR) plus an additional stack pointer (SP). On entry to an ISR, the core automatically saves eight registers: PC, xPSR, LR, R0 to R3 and R12.
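
The eight automatically saved registers form a fixed block on the stack. A minimal C description of that block, in ascending address order, might look as follows; the type and function names are hypothetical, and the program status register is called xPSR on the Cortex-M3.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout of the registers the core pushes on exception entry, starting at
   the lowest address (where the stack pointer points after stacking). */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* link register of the interrupted code */
    uint32_t pc;    /* address at which execution resumes */
    uint32_t xpsr;  /* program status register */
} demo_exception_frame_t;

/* Returns the address at which the interrupted code will continue,
   e.g. for use in a fault handler that reports where a fault occurred. */
uint32_t demo_return_address(const demo_exception_frame_t *frame)
{
    return frame->pc;
}
```

Because these are exactly the registers a C function is allowed to overwrite, an interrupt handler can be written as a plain C function.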


ARM is a fabless semiconductor company, that is, it does not manufacture chips itself. ARM develops processor architectures (and peripherals) and makes them available to other companies as IP (intellectual property) blocks.

The first implementation of this processor in a microcontroller was made by the start-up company Luminary Micro. Luminary Micro is also a fabless company and the maker of the "Stellaris" microcontroller family (LM3S811, LM3S828, and others). A variety of peripherals has been implemented on the chip; in addition to the usual elements such as UARTs, timers, etc., there is, for example, a complex 3-phase motor control unit.

Another Cortex-M3-based microcontroller family has been available from STMicroelectronics for some time (see links). The STM32 family is likewise based on the Cortex-M3 core from ARM and includes a number of very interesting peripherals, for example 3-phase motor control, a Hall sensor/incremental encoder interface, DMA, a 12-bit ADC, and many standard peripherals. Notable points of this implementation include a debug module that is separate from the core and preserves access to the core even if the clock tree has been "accidentally" misprogrammed, and a clock security circuit that detects a failure of the external clock (e.g. a quartz crystal) and can switch to an internal oscillator without the core "crashing". A clean programming model for the peripherals and a system of header files for the register mapping, which allows easy programming of the cores without a driver library, round off this microcontroller.

ARM Cortex Microcontroller Software Interface Standard (CMSIS)

To provide a common basis for programming Cortex-M-based microcontrollers, ARM published a standard HAL on 12 November 2008. It was created in collaboration with industry partners. Its Layer 1 abstracts the registers mapped into the memory layout and thus offers a common, easy-to-use programming base. Until then, the many implementations each had their own approach to the programming model, ranging from #defines for individual registers to complex structures. The HAL uses the concept of complex structures: for each peripheral there is a structure that fully reflects the registers of that peripheral. The structures are mapped to the addresses of the peripherals, which allows accesses of the form Peripheral->Register = value or value = Peripheral->Register.
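
The structure-based access style can be sketched as follows. The peripheral DEMO_TIMER, its three registers and the enable bit are invented for illustration and do not correspond to a real device. So that the example can run on a PC, the structure is mapped onto a static variable; on real hardware the macro would instead cast the peripheral's fixed base address to a pointer to the structure.

```c
#include <stdint.h>

/* One structure per peripheral; each member is one 32-bit register, in the
   order in which the registers appear in memory. */
typedef struct {
    volatile uint32_t CTRL;  /* control register */
    volatile uint32_t LOAD;  /* reload value */
    volatile uint32_t VAL;   /* current counter value */
} DEMO_TIMER_Type;

/* Simulation only: backing storage standing in for the hardware registers.
   On a real device this would be something like
   #define DEMO_TIMER ((DEMO_TIMER_Type *)DEMO_TIMER_BASE)
   with DEMO_TIMER_BASE taken from the device's memory map. */
static DEMO_TIMER_Type demo_timer_storage;
#define DEMO_TIMER (&demo_timer_storage)

#define DEMO_TIMER_CTRL_ENABLE (1u << 0)

/* Peripheral->Register = value */
void demo_timer_start(uint32_t reload)
{
    DEMO_TIMER->LOAD = reload;
    DEMO_TIMER->VAL = 0u;
    DEMO_TIMER->CTRL |= DEMO_TIMER_CTRL_ENABLE;
}

/* value = Peripheral->Register */
uint32_t demo_timer_reload(void)
{
    return DEMO_TIMER->LOAD;
}
```

Compared with one #define per register, the structure keeps all registers of a peripheral together and lets the compiler compute the offsets.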

Layer 2, a very efficient layer of inline functions, provides additional helper functions for simple handling of the NVIC and for setting up a system tick timer that is tailored to the requirements of real-time operating systems.
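
A helper of this kind could look roughly like the sketch below, which is loosely modelled on the system tick setup function of CMSIS but is not the library's code. The names are hypothetical, and the register block is simulated by a static variable so that the example runs without hardware. The sketch assumes a 24-bit down-counter with enable, interrupt-enable and clock-source bits in the control register.

```c
#include <stdint.h>

/* Simulated system tick register block (a real one sits at a fixed address). */
typedef struct {
    volatile uint32_t CTRL;  /* control and status */
    volatile uint32_t LOAD;  /* reload value, 24 bits */
    volatile uint32_t VAL;   /* current value */
} DEMO_SYSTICK_Type;

static DEMO_SYSTICK_Type demo_systick_storage;
#define DEMO_SYSTICK (&demo_systick_storage)

#define DEMO_SYSTICK_MAX_RELOAD     0x00FFFFFFu
#define DEMO_SYSTICK_CTRL_ENABLE    (1u << 0)  /* start the counter */
#define DEMO_SYSTICK_CTRL_TICKINT   (1u << 1)  /* interrupt on reaching 0 */
#define DEMO_SYSTICK_CTRL_CLKSOURCE (1u << 2)  /* use the processor clock */

/* Configures a periodic tick every `ticks` clock cycles.
   Returns 0 on success, 1 if `ticks` does not fit the 24-bit counter. */
static inline uint32_t demo_systick_config(uint32_t ticks)
{
    if (ticks == 0u || (ticks - 1u) > DEMO_SYSTICK_MAX_RELOAD) {
        return 1u;
    }
    DEMO_SYSTICK->LOAD = ticks - 1u;  /* counter runs from ticks-1 down to 0 */
    DEMO_SYSTICK->VAL = 0u;           /* restart counting from the reload value */
    DEMO_SYSTICK->CTRL = DEMO_SYSTICK_CTRL_CLKSOURCE
                       | DEMO_SYSTICK_CTRL_TICKINT
                       | DEMO_SYSTICK_CTRL_ENABLE;
    return 0u;
}
```

With a 72 MHz clock, for instance, demo_systick_config(72000) would request a tick every millisecond, the kind of time base a real-time operating system needs.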