A Guide to Using ARM Stack Limit Registers
Stack overflows have notoriously plagued the development processes. They often can go undetected and can present themselves in obscure ways. We have implemented software mechanisms to protect against them, but these have limitations and still don’t protect against all conditions.
With the maturity of the ARM architecture, wouldn’t it be better to have a fool-proof mechanism for detecting overflows?
We will explore using the MSP Limit and the PSP Limit Registers on the ARM Cortex-M33 architecture to detect stack overflows. We will walk through an implementation on the Renesas DA1469x and look at practical examples of detecting stack overflows. Additionally, we will look at supplementary options for scenarios that the MSPLIM and PSPLIM features fall short.
Table of Contents
Basic Terminology
The ARM Cortex-M33 introduced two new stack limit registers, PSPLIM and MSPLIM 1. ARM has included this in its ARMv8 specification, so any processors before this will not have support.
How does it work?
The best part about this new feature is how easy it is to use and how it takes the guesswork out of debugging stack overflows.
We need to set the PSPLIM, and the MSPLIM registers to the boundary of the stack. If the MSP == MSPLIM register or the PSP == PSPLIM register, a UsageFault is generated. The UsageFault Status Register 2 contains a sticky bit in position four to indicate that a stack overflow occurred.
Having hardware protection for the PSP and MSP allows flexibility within an OS. For example, we can protect the MSP during exceptions and interrupts. We can also switch out the PSPLIM value on a context switch to safeguard each task’s stack. If you need a refresher on context switching, check a previous post here.
If no RTOS is present, we can still monitor the MSP, as this will protect your whole application.
Implementing the Limit Registers
For an implementation example on the DA1469x, you can access this at Renesas’s Github 3.
Initializing the MSPLIM Register
We use the MSR instruction to write to these registers, which requires us to be in privileged mode. In this case, we will set the MSP Limit in the Reset_Handler:
Reset_Handler:
ldr r0, =__StackLimit
add r0, r0, #dg_configMSP_PADDING
msr MSPLIM, r0
Specifically to the DA1469x, we also need to place the same initialization in the Wakeup_Reset_Handler:
Wakeup_Reset_Handler:
ldr r0, =__StackLimit
add r0, r0, #dg_configMSP_PADDING
msr MSPLIM, r0
The DA1469x sleep architecture doesn’t pertain to all Cortex-M33 architectures. After initialization, the DA1469x replaces the Reset_Handler with the Wakeup_Reset_Handler to handle the restoration of Cortex-M33 registers.
There are two definitions provided elsewhere in the project. __StackLimit is defined in vector_table_da1469x.S:
.section .stack
.align 3
.globl __StackTop
.globl __StackLimit
__StackLimit:
.space Stack_Size
.size __StackLimit, . - __StackLimit
__StackTop:
.size __StackTop, . - __StackTop
This definition helps us find the stack limit for setting the MSP.
We also added padding to this value. You will find this value in a configuration file:
#define dg_configMSP_PADDING (16)
When the MSPLIM is equal to the MSP, the UsageFault exception is triggered. The padding is required to enable pushing items to the stack on Exception entry. If we don’t make space for the fault handler, nested exceptions can occur as the MSPLIM register would be continuously exceeded, usually resulting in a LOCKUP.
The alternative would be to use a naked function4. However, I prefer to add padding as it provides more flexibility in the fault handler and allows for Memfault hooks!
Initializing the PSPLIM Register
On the DA1469x SDK, it makes use of FreeRTOS. The psp is used for each task’s stack so we can set up the PSPLIM register to protect against a task overflow. This implementation is superior to FreeRTOS’s stack overflow check5 for the following reasons:
-
FreeRTOS only checks the watermark on a context switch. Therefore, if a thread overflows the stack and isn’t yielding, it can corrupt memory, access null pointers, etc.
-
FreeRTOS does not recommend using this feature in production environments because of the context switch overhead6.
Implementing this is more straightforward. First, we must adjust the PSPLIM during a context switch in FreeRTOS.
In tasks.c we create the following above vTaskSwitchContext:
void vTaskSwitchStackGuard(void)
{
volatile uint32_t end_of_stack_val = (uint32_t)pxCurrentTCB->pxStack;
__set_PSPLIM( end_of_stack_val);
}
Next, we add the call immediately after the context switch:
void xPortPendSVHandler( void ){
...
" bl vTaskSwitchContext \n"
" bl vTaskSwitchStackGuard \n"
...
}
That’s all we need to do for the PSP.
Setting up the UsageFault_Handler
All that’s left is doing some work in the UsageFault_Handler. We will declare the UsageFault_Handler in exceptions_handler.s and call a separate handler afterward.
First, we declare an application handler:
__RETAINED_CODE void UsageFault_HandlerC(uint8_t stack_pointer_mask)
{
volatile uint16_t usage_fault_status_reg __UNUSED;
usage_fault_status_reg = (SCB->CFSR & SCB_CFSR_USGFAULTSR_Msk) >> SCB_CFSR_USGFAULTSR_Pos;
hw_watchdog_freeze();
if(usage_fault_status_reg & (SCB_CFSR_STKOF_Msk >> SCB_CFSR_USGFAULTSR_Pos))
{
while(1){}
}
while (1) {}
}
Next, let’s add our UsageFault_Handler into the exceptions_handler.S:
#if (dg_configCODE_LOCATION == NON_VOLATILE_IS_FLASH)
.section text_retained
#endif
.align 2
.thumb
.thumb_func
.globl UsageFault_Handler
.type UsageFault_Handler, %function
UsageFault_Handler:
ldr r2,=UsageFault_HandlerC
mrs r1, msp
mrs r0, MSPLIM
cmp r0, r1
beq UsageFault_with_MSP_Overflow
mrs r1, psp
mrs r0, PSPLIM
cmp r0, r1
beq UsageFault_with_PSP_Overflow
mov r0, #0
bx r2
UsageFault_with_PSP_Overflow:
mov r0, #2
bx r2
UsageFault_with_MSP_Overflow:
ldr r1, =__StackLimit
msr MSPLIM, r1
mov r0, #1
bx r2
Since the USFR does not indicate if the psp or the msp caused the fault, I decided to add some detection in assembly. I prefer doing this in assembly to ensure no stack pushes before the application handler call.
- 0 - General UsageFault
- 1 - MSP Overflow
- 2 - PSP Overflow (Task Overflow)
In this function, we are checking the MSP and the PSP registers against the
limit registers. If the MSP matches the MSPLIM register, we restore the MSPLIM
to __StackLimit
(Removing the padding we placed initially) and then call our
application fault handler.
Testing our Implementation
We need a small piece of code to test the implementation. In our example, there is a macro provided for causing an overflow for the MSP or the PSP:
#define TOGGLE_MSP_OVERFLOW (0) //0 Creates an application overflow in FreeRTOS task, 1 creates it on the MSP
When the button is pressed, depending on this macro setting, it calls a recursive function either in interrupt context or in our main task.
static void test_overflow_func(void)
{
test_overflow_func();
}
static void _wkup_key_cb(void)
{
BaseType_t need_switch;
/* Clear the WKUP interrupt flag!!! */
hw_wkup_reset_interrupt();
#if TOGGLE_MSP_OVERFLOW > 0
test_overflow_func();
#endif
xTaskNotifyFromISR(overFlow_handle, BUTTON_PRESS_NOTIF, eSetBits, &need_switch);
portEND_SWITCHING_ISR(need_switch);
}
...
void prvTestOverFlowTask( void *pvParameters )
{
_wkup_init();
overFlow_handle = xTaskGetCurrentTaskHandle();
for ( ;; ) {
uint32_t notif;
/*
* Wait on any of the notification bits, then clear them all
*/
xTaskNotifyWait(0, 0xFFFFFFFF, ¬if, portMAX_DELAY);
/* Notified from BLE manager? */
if (notif & BUTTON_PRESS_NOTIF) {
test_overflow_func();
}
}
}
After pressing the button, we should see the UsageFault_HandlerC get called in our application code.
Limitations and Further Improvements
The MSPLIM and PSPLIM registers will help against most stack overflows. Unfortunately, they do not protect us from local buffers corrupting the stack. We will look at the most common; buffer overflow. A buffer overflow occurs when a fixed buffer is allocated on the stack, and the program starts writing to memory addresses outside this boundary. This results in corrupted data and can even change the return address of a function, causing undesired execution of application code.
There are different ways to handle this condition on other architectures. For example, Zephyr uses the MPU to guard the PSP on each thread. Here, we will discuss stack canaries.
Stack Canary
Stack Canaries are widely implemented as a means of code hardening. A function will place a value (canary) on the end of a stack frame and will check the value is intact before it returns. This mechanism protects against buffer overflow attacks, where malicious source code could overflow the buffer to redirect the return address to its function.
This same idea can also be used to guard against buffer overflows in our application.
FreeRTOS Buffer Overflow protection
FreeRTOS implements a means for overflow detection, as discussed in Initializing the PSPLIM Register. This uses the concept of a canary, which will periodically check the value during a context switch.
FreeRTOS has two different configurations that follow this concept:
#if( ( configCHECK_FOR_STACK_OVERFLOW > 1 ) && ( portSTACK_GROWTH < 0 ) )
#define taskCHECK_FOR_STACK_OVERFLOW() \
{ \
const uint32_t * const pulStack = ( uint32_t * ) pxCurrentTCB->pxStack; \
const uint32_t ulCheckValue = ( uint32_t ) 0xa5a5a5a5; \
\
if( ( pulStack[ 0 ] != ulCheckValue ) || \
( pulStack[ 1 ] != ulCheckValue ) || \
( pulStack[ 2 ] != ulCheckValue ) || \
( pulStack[ 3 ] != ulCheckValue ) ) \
{ \
vApplicationStackOverflowHook( ( TaskHandle_t ) pxCurrentTCB, pxCurrentTCB->pcTaskName ); \
} \
}
#endif /* #if( configCHECK_FOR_STACK_OVERFLOW > 1 ) */
/*-----------------------------------------------------------*/
#if( ( configCHECK_FOR_STACK_OVERFLOW > 1 ) && ( portSTACK_GROWTH > 0 ) )
#define taskCHECK_FOR_STACK_OVERFLOW() \
{ \
int8_t *pcEndOfStack = ( int8_t * ) pxCurrentTCB->pxEndOfStack; \
static const uint8_t ucExpectedStackBytes[] = { tskSTACK_FILL_BYTE, tskSTACK_FILL_BYTE, tskSTACK_FILL_BYTE, tskSTACK_FILL_BYTE, \
tskSTACK_FILL_BYTE, tskSTACK_FILL_BYTE, tskSTACK_FILL_BYTE, tskSTACK_FILL_BYTE, \
tskSTACK_FILL_BYTE, tskSTACK_FILL_BYTE, tskSTACK_FILL_BYTE, tskSTACK_FILL_BYTE, \
tskSTACK_FILL_BYTE, tskSTACK_FILL_BYTE, tskSTACK_FILL_BYTE, tskSTACK_FILL_BYTE, \
tskSTACK_FILL_BYTE, tskSTACK_FILL_BYTE, tskSTACK_FILL_BYTE, tskSTACK_FILL_BYTE }; \
\
\
pcEndOfStack -= sizeof( ucExpectedStackBytes ); \
\
/* Has the extremity of the task stack ever been written over? */ \
if( memcmp( ( void * ) pcEndOfStack, ( void * ) ucExpectedStackBytes, sizeof( ucExpectedStackBytes ) ) != 0 ) \
{ \
vApplicationStackOverflowHook( ( TaskHandle_t ) pxCurrentTCB, pxCurrentTCB->pcTaskName ); \
} \
}
#endif /* #if( configCHECK_FOR_STACK_OVERFLOW > 1 ) */
Both methods check for an expected value at the end of the stack. If this value is overwritten, then vApplicationStackOverflowHook is called, and the application should record and reset. Unfortunately, the periodicity is non-deterministic, as it relies on a context switch. Periodic checks lead to a race condition when a task doesn’t yield in time. You can test this from the previous example by setting the following:
#define dg_configARMV8_USE_STACK_GUARDS (0)
#define #define TOGGLE_MSP_OVERFLOW (0)
In this example, prvTestOverFlowTask will not yield, so FreeRTOS does not catch this condition.
Compiler Enabled Overflow Detection
Compilers have started enabling SSP (Stack Smashing Protection) libraries. The library options will allow the compiler to use canaries within function calls. We’re going to look at GCC’s implementation7 specifically. GCC provides the following compiler flags:
-
-fstack-protector: This includes functions that call alloca and functions with buffers larger than or equal to 8 bytes. The guards are initialized when a function is entered and then checked when the function exits.
-
-fstack-protector-strong - Like -fstack-protector but includes additional functions to be protected — those that have local array definitions, or have references to local frame addresses.
-
-fstack-protector-all: all functions are protected.
-
-fstack-protector-explicit: protects those functions which have the stack_protect attribute
GCC SSP Example
Let’s take a look at using the ssp library in GCC. First, let’s add the compiler flag from the previous example: -fstack-protector. The ssp library has two externs that we define as follows.
uint32_t__stack_chk_guard = 0xDEADBEEF;
void __stack_chk_fail(void) { /* will be called if guard/canary gets corrupted */
__ASM volatile ("cpsid i" : : : "memory");
hw_watchdog_freeze(); // Stop WDOG
while(1){}
}
Let’s also add the vulnerability in our application and add it to the prvTestOverFlowTask:
__attribute__((optimize("O0"))) static uint8_t stack_buffer_test(uint16_t iters)
{
uint8_t buffer[16];
uint16_t i;
for(i = 0; i < iters; i++)
{
buffer[i] = 0xaa;
}
return buffer[8];
}
NOTE: __stack_chk_guard should be randomized on startup if using ssp for security reasons.
This function has a fixed buffer to pass a value larger than the local buffer. Let’s add a stack_buffer_test(17) to our task and look at the assembly.
(gdb) disassemble stack_buffer_test
Dump of assembler code for function stack_buffer_test:
0x0000ccc8 <+0>: push {r7, lr}
0x0000ccca <+2>: sub sp, #32
0x0000cccc <+4>: add r7, sp, #0
0x0000ccce <+6>: mov r3, r0
0x0000ccd0 <+8>: strh r3, [r7, #6]
0x0000ccd2 <+10>: ldr r3, [pc, #68] ; (0xcd18 <stack_buffer_test+80>)
0x0000ccd4 <+12>: ldr r3, [r3, #0]
0x0000ccd6 <+14>: str r3, [r7, #28]
0x0000ccd8 <+16>: mov.w r3, #0
0x0000ccdc <+20>: movs r3, #0
0x0000ccde <+22>: strh r3, [r7, #10]
0x0000cce0 <+24>: b.n 0xccf4 <stack_buffer_test+44>
0x0000cce2 <+26>: ldrh r3, [r7, #10]
0x0000cce4 <+28>: adds r3, #32
0x0000cce6 <+30>: add r3, r7
0x0000cce8 <+32>: movs r2, #170 ; 0xaa
0x0000ccea <+34>: strb.w r2, [r3, #-20]
0x0000ccee <+38>: ldrh r3, [r7, #10]
0x0000ccf0 <+40>: adds r3, #1
0x0000ccf2 <+42>: strh r3, [r7, #10]
0x0000ccf4 <+44>: ldrh r2, [r7, #10]
0x0000ccf6 <+46>: ldrh r3, [r7, #6]
0x0000ccf8 <+48>: cmp r2, r3
0x0000ccfa <+50>: bcc.n 0xcce2 <stack_buffer_test+26>
0x0000ccfc <+52>: ldrb r3, [r7, #20]
0x0000ccfe <+54>: ldr r2, [pc, #24] ; (0xcd18 <stack_buffer_test+80>)
0x0000cd00 <+56>: ldr r1, [r2, #0]
0x0000cd02 <+58>: ldr r2, [r7, #28]
0x0000cd04 <+60>: eors r1, r2
0x0000cd06 <+62>: mov.w r2, #0
0x0000cd0a <+66>: beq.n 0xcd10 <stack_buffer_test+72>
0x0000cd0c <+68>: bl 0xcdb0 <__stack_chk_fail>
0x0000cd10 <+72>: mov r0, r3
0x0000cd12 <+74>: adds r7, #32
0x0000cd14 <+76>: mov sp, r7
0x0000cd16 <+78>: pop {r7, pc}
0x0000cd18 <+80>: strh r4, [r6, #44] ; 0x2c
0x0000cd1a <+82>: movs r0, #0
Here we can see the compiler loading the canary at the end of the stack frame:
0x0000ccd2 <+10>: ldr r3, [pc, #68] ; (0xcd18 <stack_buffer_test+80>)
0x0000ccd4 <+12>: ldr r3, [r3, #0]
0x0000ccd6 <+14>: str r3, [r7, #28]
(gdb) x /1a 0xcd18
0xcd18 <stack_buffer_test+80>: 0x200085b4 <__stack_chk_guard>
(gdb) x /1a 0x200085b4
0x200085b4 <__stack_chk_guard>: 0xdeadbeef
Before return, we can see the function checking the canary at the end of the stack frame and calling __stack_chk_fail if the value is corrupted:
0x0000cd0a <+66>: beq.n 0xcd10 <stack_buffer_test+72>
0x0000cd0c <+68>: bl 0xcdb0 <__stack_chk_fail>
Running the rest of the example should confirm the call of __stack_chk_fail.
Practical implementations for GCC stack Canaries
Implementing the ssp library does provide additional overhead in execution time and code space. A function will add 7 additional instructions to make use of this feature. The developer should weigh these factors when choosing which setting to use in GCC.
My preference would be to develop and test with a stricter setting and more coverage and move to a more relaxed setting when getting closer to production. For example, you could start your development process with -fstack-protector-all, and later relax this to -fstack-protector-strong or -fstack-protector as the code matures.
Closing
The PSPLIM and the MSPLIM registers are great new features from ARM and a much-needed addition to the architecture. These can also be supplemented with other techniques to fortify your application. We hope you found this helpful, and will be inspired to make use of it in your application. Implementing these features should prevent many development headaches and safeguard your application in the field!