I am currently porting a proprietary RTOS, originally written for the Coldfire CPU, to the Zynq, which contains an Armv7-A Cortex-A9 CPU. I have been struggling for a while to make the context switch function work, and it feels like the original design only works on the Coldfire architecture. I have tried some workarounds that got me nowhere, and I am now considering tearing out the whole scheduler implementation and rewriting it from scratch for the Zynq.
I will, of course, lay out the details of the implementation, and I would love it if anyone could suggest a way to make it work or a workaround, or else conclude that it is not possible and that it is better to start over.
ColdFire Implementation
Let me start by showing the assembly routines for the critical sections and the context switch.
_syst_CS:
move.w sr,d0
move.w #0x2700,sr
rts
nop
_syst_CSEnd:
move.w 6(a7),d0
move.w d0,sr
rts
These functions implement the critical sections: on the Coldfire, the SR register allows disabling interrupts (among other things).
ColdFire Status Register (SR) Layout
ASCII Bit Layout
    Bit:  15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
        +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
        | T | 0 | S | M | 0 |     I     |              CCR              |
        +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Bit Field Descriptions
| Bits | Name | Description |
|---|---|---|
| 15 | T | Trace Enable - When set, processor performs trace exception after every instruction |
| 14 | - | Reserved - Must be cleared |
| 13 | S | Supervisor/User State - 0: User mode, 1: Supervisor mode |
| 12 | M | Master/Interrupt State - Cleared by interrupt exception, can be set by RTE or move to SR |
| 11 | - | Reserved - Must be cleared |
| 10-8 | I | Interrupt Level Mask - Defines current interrupt level (3 bits = levels 0-7) |
| 7-0 | CCR | Condition Code Register - Contains condition codes (8 bits) |
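As a sanity check on the two SR constants used in the assembly (0x2700 and 0x2000), the fields from the table can be decoded with a few masks. This is a host-side C sketch, not part of the RTOS; the macro and function names are made up for illustration.

```c
#include <stdint.h>

/* Field masks taken from the SR table above; the names are illustrative. */
#define SR_T        0x8000u  /* bit 15: trace enable */
#define SR_S        0x2000u  /* bit 13: supervisor state */
#define SR_M        0x1000u  /* bit 12: master/interrupt state */
#define SR_I_MASK   0x0700u  /* bits 10-8: interrupt level mask */
#define SR_I_SHIFT  8
#define SR_CCR_MASK 0x00FFu  /* bits 7-0: condition codes */

/* Extract the interrupt level mask (0-7) from an SR value. */
static unsigned sr_irq_level(uint16_t sr)
{
    return (sr & SR_I_MASK) >> SR_I_SHIFT;
}

/* Non-zero when the S bit is set (supervisor mode). */
static int sr_is_supervisor(uint16_t sr)
{
    return (sr & SR_S) != 0;
}
```

With these helpers, 0x2700 (written by syst_CS and by the IRQ handler) decodes to supervisor mode with the interrupt mask at level 7, and 0x2000 (written by syst_McuCtxStart) decodes to supervisor mode with the mask at level 0.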
Here is the code for the context start and the context switch, respectively.
syst_McuCtxStart(uint32_t *old_sp, uint32_t new_stack,
void (*new_pc)(void *), void *new_context);
_syst_McuCtxStart:
; save current task
link a6,#-40
movem.l d2/d3/d4/d5/d6/d7/a2/a3/a4/a5,(a7)
move.w sr, d0 ; for irq level
move.l d0, -(a7)
move.l 8(a6), a0 ; Store old StackPointer
move.l a7, (a0)
; start other task
move.l 12(a6), a7
add.l 16(a6), a7 ; Init sp
move.l 20(a6), a0 ; First pc
move.l 24(a6), d0 ; context arg
move.l d0, -(a7)
move.w #0x2000, sr ; Init sr
jsr (a0) ; call body
loop:
bra loop
syst_McuCtxSw(uint32_t *current_context, uint32_t next_context);
_syst_McuCtxSw:
; save current task
link a6,#-40
movem.l d2/d3/d4/d5/d6/d7/a2/a3/a4/a5,(a7)
move.w sr, d0 ; for irq level
move.l d0, -(a7)
move.l 8(a6), a0
move.l a7, (a0)
; restore other task
move.l 12(a6), a7
move.l (a7)+, d0
move.w d0, sr
movem.l (a7),d2/d3/d4/d5/d6/d7/a2/a3/a4/a5
lea 40(a7),a6
unlk a6
rts
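For reference, the frame that _syst_McuCtxSw leaves on the task stack can be written down as a C struct. This is only a reading aid derived from the link/movem/move.l sequence above, assuming 32-bit slots throughout; the struct and member names are hypothetical and do not exist in the RTOS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saved ColdFire context, lowest address first.
 * The SP value stored through current_context points at `sr`. */
struct cf_saved_frame {
    uint32_t sr;         /* pushed by move.l d0,-(a7); SR is in the low 16 bits */
    uint32_t d2_d7[6];   /* d2..d7, written by movem.l */
    uint32_t a2_a5[4];   /* a2..a5, written by movem.l */
    uint32_t old_a6;     /* frame pointer pushed by link */
    uint32_t return_pc;  /* return address consumed by rts */
};

/* link reserves 40 bytes for the ten movem registers. */
static_assert(offsetof(struct cf_saved_frame, old_a6) == 4 + 40,
              "old a6 must sit right above the movem area");
static_assert(sizeof(struct cf_saved_frame) == 52,
              "frame is 13 words with no padding");
```

The restore half of the routine walks the same layout in the opposite direction: it pops `sr`, reloads the ten registers, points a6 at `old_a6` with `lea 40(a7),a6`, and lets `unlk`/`rts` consume the last two slots.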
And finally, here is the interrupt handler for the timer interrupt. This timer fires every time a waiting task has to wake up. It is backed by a queue of timer requests, each of which programs the timer's counter in order of arrival.
_syst_McuCtxIrq:
move.w #0x2700, sr ; no other interrupt can insert a timer Req
link a6,#0
lea -16(a7),a7
movem.l d0/d1/a0/a1,(a7)
jsr _timer_ReqRaise
movem.l (a7),d0/d1/a0/a1
unlk a6
rte
The function called here, timer_ReqRaise, simply stops the timer, suspends the current task by putting it into the wait-list, and eventually (after changing some metadata to specify the new task) calls syst_McuCtxSw. Note that tasks don't have to be changed only by IRQs—this can also happen from the normal code flow (to Suspend, Abort, or other actions...).
Arm Architecture
What immediately concerned me is that the Coldfire has no notion of multiple modes, banked registers between those modes, or a separate stack for each of them. So in my first implementation I tried to avoid scheduling directly from the IRQ: the handler simply set global variables to signal a scheduling request, and these were processed in IDLE. However, as you can imagine, this made the code far less deterministic and more fragile, so I had to re-evaluate my approach. Scheduling from the IRQ was no better at first: as soon as I allowed contexts to be saved on the IRQ stack, I got undefined behavior and a tangle of interrupt problems, so I decided to lay out a different approach.
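The deferred-scheduling idea from that first attempt can be sketched in portable C as follows. This is a simplified model under stated assumptions, not the actual RTOS code: `resched_pending`, `irq_request_resched`, `idle_poll` and `switch_count` are all hypothetical names, and the counter stands in for the real call to syst_McuCtxSw.

```c
#include <stdbool.h>

/* Hypothetical flag: set from interrupt context, consumed by the idle loop. */
static volatile bool resched_pending = false;

/* Stand-in for the real context switch, so the sketch can run on a host. */
static unsigned switch_count = 0;

/* Would be called from the timer IRQ: only records the request. */
static void irq_request_resched(void)
{
    resched_pending = true;
}

/* One pass of the idle loop: performs the switch only if one was requested.
 * Returns true when a switch was carried out. */
static bool idle_poll(void)
{
    if (!resched_pending)
        return false;
    resched_pending = false;
    switch_count++;  /* placeholder for syst_McuCtxSw(...) */
    return true;
}
```

In this model the switch only happens the next time the idle loop runs, so the delay between the interrupt and the switch depends on whatever code is executing in between, which is consistent with the loss of determinism described above.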
My Horrible Attempt
Please note that since this RTOS is being ported to a Xilinx device, I tried to make my life easier (at least that's what I thought) by using their standard library and, in general, the Vitis workflow.
Critical Sections
For the critical sections it was a fairly direct conversion, although I am not sure that this is the correct method:
__attribute__((naked)) unsigned int syst_CS(void)
{
__asm__ volatile (
"mrs r0, cpsr \n\t" // Read current CPSR into r0 (return value)
"orr r1, r0, %0 \n\t" // OR with IRQ mask to disable IRQs
"msr cpsr_c, r1 \n\t" // Write back to CPSR control field
"bx lr \n\t" // Return (r0 contains old CPSR)
:
: "I" (XIL_EXCEPTION_IRQ) // Immediate operand constraint
: "r1" // Clobbered register
);
}
__attribute__((naked)) void syst_CSEnd(unsigned int prev_level)
{
__asm__ volatile (
"mrs r1, cpsr \n\t" // Read current CPSR into r1
"bic r1, r1, %0 \n\t" // Clear IRQ bit in current CPSR
"and r0, r0, %0 \n\t" // Mask prev_level to only IRQ bit
"orr r1, r1, r0 \n\t" // Combine current flags with prev IRQ state
"msr cpsr_c, r1 \n\t" // Write back to CPSR control field
"bx lr \n\t" // Return
:
: "I" (XIL_EXCEPTION_IRQ) // Immediate operand constraint
: "r1" // Clobbered register
);
}
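The bit manipulation performed by the two routines above can be modelled on a host in plain C, which makes it easy to check that nested critical sections restore the right state. This is a sketch only: it assumes XIL_EXCEPTION_IRQ is 0x80 (the CPSR I bit, the same bit the DEBUG_RTOS check in the context switch tests), and all names are invented for illustration. The mode constants are the ones the context switch code compares against.

```c
#include <stdint.h>

#define CPSR_I_BIT     0x80u  /* IRQ disable bit (bit 7); assumed equal to XIL_EXCEPTION_IRQ */
#define CPSR_MODE_MASK 0x1Fu  /* mode field, bits 4-0 */
#define CPSR_MODE_IRQ  0x12u  /* IRQ mode */
#define CPSR_MODE_SYS  0x1Fu  /* System mode (called MACHINE in this post) */

/* Value syst_CS writes to the CPSR; the caller keeps the old value as prev_level. */
static uint32_t cs_enter(uint32_t cpsr)
{
    return cpsr | CPSR_I_BIT;
}

/* Value syst_CSEnd writes: current CPSR with only the I bit taken from prev_level. */
static uint32_t cs_exit(uint32_t cpsr, uint32_t prev_level)
{
    return (cpsr & ~CPSR_I_BIT) | (prev_level & CPSR_I_BIT);
}

/* Mode test used by the context switch to pick the IRQ or MACHINE path. */
static int cpsr_in_irq_mode(uint32_t cpsr)
{
    return (cpsr & CPSR_MODE_MASK) == CPSR_MODE_IRQ;
}
```

Starting from 0x1F (System mode, IRQs enabled), two nested enters followed by two exits leave the I bit set after the inner exit and clear it only after the outer one, which is the behaviour the Coldfire version gets by restoring the whole SR.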
Now for the context switch. Firstly let me explain my logic:
- There are multiple stacks unlike on the Coldfire
- There are multiple modes unlike on the Coldfire
- We must avoid saving contexts when in the IRQ stack
- Only two CPU modes will ever be used: System mode (mode bits 0x1F, which I call MACHINE below) and IRQ mode (mode bits 0x12)
- I do not want to change any logic of the upper level code, only assembly
With that in mind, I decided that the best approach would be to save R4-R12 and LR in a global variable, which we will call saved_regs, when entering the IRQHandler (in the exception vector table). That way, when I eventually get to the call to timer_ReqRaise, which in turn calls syst_McuCtxSw, I would be able to:
- Save the CPSR and temporarily change to user mode, making sure the interrupts are still disabled
- Push the saved registers onto the stack of the interrupted task
- Also push the usr_LR (which can be different from LR in the case of a leaf function)
- Finally push the SPSR (the state of the USER/MACHINE mode CPSR when the interrupt occurred)
- Optionally push a magic word to make the saved contexts easier to spot
- Save the SP where the function expects it to be (&old_sp)
- Now we are ready for the switch
- From the new stack, pop the registers (in the same layout as they were saved)
- Change the usr_LR and LR if necessary
- Do not branch directly to the LR; let the whole call chain unwind
- When back in the IRQHandler routine, let it branch to the correct LR and restore MACHINE mode.
Of course, as mentioned before, we are not only switching because of timer_ReqRaise, so the MACHINE mode switch just pushes and, respectively, pops the LR register twice (to account for the usr_LR). You will also see portions guarded by the DEBUG_RTOS macro; these insert a magic word or verify some properties of the registers pushed to or popped from the context.
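The frame this plan aims for can be summarised as a C struct. The layout is read off the MACHINE-mode push sequence in syst_McuCtxSw (stmfd of r4-r12 and lr, then lr again, then the CPSR, then the optional magic word); the struct, its member names and the two macros are hypothetical and only serve as a reading aid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CTX_MAGIC 0xAA55AA55u  /* value built by the mov/movt pair in the DEBUG_RTOS paths */

/* Saved ARM context, lowest address first.
 * The SP value stored through current_context points at the first member. */
struct arm_saved_frame {
#ifdef DEBUG_RTOS
    uint32_t magic;      /* CTX_MAGIC, only present in debug builds */
#endif
    uint32_t cpsr;       /* CPSR (MACHINE path) or SPSR (IRQ path) of the task */
    uint32_t usr_lr;     /* LR of the USER/MACHINE register bank */
    uint32_t r4_r12[9];  /* r4..r12 as pushed by stmfd */
    uint32_t lr;         /* address at which the task resumes */
};

#ifdef DEBUG_RTOS
#define ARM_FRAME_WORDS 13u
#else
#define ARM_FRAME_WORDS 12u
#endif

static_assert(sizeof(struct arm_saved_frame) == ARM_FRAME_WORDS * sizeof(uint32_t),
              "frame must be tightly packed");
```

Whatever the IRQ path stores by hand has to match this layout word for word, because a task saved from IRQ mode may later be restored from MACHINE mode and the other way round.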
Here is the IRQHandler:
IRQHandler: /* IRQ vector handler */
stmdb sp!,{r0-r3} /* state save from compiled code*/
movw r1, #:lower16:saved_regs
movt r1, #:upper16:saved_regs
stmia r1,{r4-r12,lr}
bl IRQInterrupt
movw r1, #:lower16:saved_regs
movt r1, #:upper16:saved_regs
ldmia r1,{r4-r12,lr}
ldmia sp!,{r0-r3} /* state restore from compiled code */
subs pc, lr, #4 /* adjust return */
Now for the start context:
void __attribute__((naked)) syst_McuCtxStart(uint32_t *old_sp, uint32_t new_stack,
void (*new_pc)(void *), void *new_context)
{
__asm__ __volatile__(
"stmfd sp!, {r4-r12,lr} \n\t" // Save registers
"mrs r4, cpsr \n\t" // Save current CPSR
"stmfd sp!, {lr} \n\t" // USR_LR
"stmfd sp!, {r4} \n\t" // Store CPSR on stack
#ifdef DEBUG_RTOS
"mov r4, #0xAA55 \n\t" // Store Magic low
"movt r4, #0xAA55 \n\t" // Store Magic high
"stmfd sp!, {r4} \n\t" // Store Magic on the stack
#endif
// Before changing the SP put the original r4 back
// The offset is 8 or 4 depending on if the DEBUG magic value was pushed
#ifdef DEBUG_RTOS
"ldr r4, [sp, #8] \n\t"
#else
"ldr r4, [sp, #4] \n\t"
#endif
"str sp, [%[old_sp]] \n\t" // Save SP to old context
// Start the new task
"mov sp, %[new_stack] \n\t" // Set SP to (new_stack + stack_size)
"mov r0, %[new_context] \n\t" // Set up argument in r0
// Switch to user mode and start task
"msr cpsr_c, #0x1F \n\t" // Switch to machine mode
"mov pc, %[new_pc] \n\t" // Jump to entry point
// Infinite loop (should never return here)
"1: b 1b \n\t"
:
: [old_sp] "r"(old_sp), [new_stack] "r"(new_stack), [new_pc] "r"(new_pc), [new_context] "r"(new_context)
: "sp", "memory");
}
And finally, the big one: the context switch.
void __attribute__((naked))
syst_McuCtxSw(uint32_t *current_context, uint32_t next_context)
{
// r0 and r1 used as scratch
register uint32_t *ctx_current __asm__("r2") = current_context;
register uint32_t ctx_next __asm__("r3") = next_context;
__asm__ __volatile__(
// Make sure that the interrupts are disabled!
// Otherwise, blackhole!
// We are not supposed to be context switching when interrupts are enabled
#ifdef DEBUG_RTOS
"mrs r0, cpsr \n\t"
"and r0, r0, #0x80 \n\t"
"cmp r0, #0x00 \n\t"
"beq 3f \n\t"
#endif
// The stack used to save must be one from the User/System mode not IRQ!
// Also, if we are coming from the IRQ then the current r4-r12 are what the IRQ routines have put there
// Not the contents that were there before the IRQ occurred!
"mrs r1, cpsr \n\t"
"and r1, r1, #0x1F \n\t"
"cmp r1, #0x12 \n\t"
"bne is_machine \n\t"
// In IRQ case we have to POP from the USR SP because that's where the stack was when we got interrupted
// Also because of the architecture of the xilinx library I had to overwrite (wrap) the IRQHandler.
// Specifically The registers we have to save are currently copied into saved_regs
"is_irq: \n\t"
// We need another register to use as scratch, lets use r4
"push {r4} \n\t"
// We have to get the SP from the Machine/User space
// For that change back to USR/Machine
"movw r1, #:lower16:saved_regs \n\t"
"movt r1, #:upper16:saved_regs \n\t"
"add r1, r1, #36 \n\t" // Place at the end of the list
"mrs r0, cpsr \n\t" // Save CPSR of IRQ mode
"msr cpsr_fsxc, 0x9f \n\t" // Machine mode with interrupts disabled!
// At this point R1 contains the address of the registers (r4-r12, lr) saved by the IRQHandler
// Since we are now In User/Machine mode we have directly access to the SP, lets fill it up
// Load the registers into the User/Machine stack
// Stack all the registers
"ldr r4, [r1], #-4 \n\t" // Stacking LR
"sub r4, r4, #4 \n\t"
"str r4, [sp], #-4 \n\t"
"ldr r4, [r1], #-4 \n\t" // Stacking R12
"str r4, [sp], #-4 \n\t"
"ldr r4, [r1], #-4 \n\t" // Stacking R11
"str r4, [sp], #-4 \n\t"
"ldr r4, [r1], #-4 \n\t" // Stacking R10
"str r4, [sp], #-4 \n\t"
"ldr r4, [r1], #-4 \n\t" // Stacking R9
"str r4, [sp], #-4 \n\t"
"ldr r4, [r1], #-4 \n\t" // Stacking R8
"str r4, [sp], #-4 \n\t"
"ldr r4, [r1], #-4 \n\t" // Stacking R7
"str r4, [sp], #-4 \n\t"
"ldr r4, [r1], #-4 \n\t" // Stacking R6
"str r4, [sp], #-4 \n\t"
"ldr r4, [r1], #-4 \n\t" // Stacking R5
"str r4, [sp], #-4 \n\t"
"ldr r4, [r1], #-4 \n\t" // Stacking R4
#ifdef DEBUG_RTOS
"str r4, [sp], #-8 \n\t" // -12 here because we would still have to store the CPSR/SPSR and the Debug Magic!
#else
"str r4, [sp], #-4 \n\t" // -8 here because we would still have to store the CPSR/SPSR
#endif
// Also since we came from an IRQ handler and we might have been interrupted in a leaf function, we must also save the usr_lr
"str lr, [sp], #-4 \n\t" // Stacking USR_LR
// Store the SP into current_context (r2)
"str sp, [r2] \n\t" // current_context (r2)
"msr cpsr_fsxc, r0 \n\t" // Back to IRQ Mode
// The only thing left to store on the User/Machine stack is the SPSR
// Load the SP back to R0
"ldr r0, [r2] \n\t"
// Add the correct offset to reposition at CPSRs place in the stack frame
#ifdef DEBUG_RTOS
"add r0, r0, #4 \n\t"
#endif
// Read SPSR
"mrs r1, spsr \n\t"
// Store SPSR
"str r1, [r0] \n\t"
#ifdef DEBUG_RTOS
// If the Debug Magic word is also part of the frame then
"mov r1, #0xAA55 \n\t"
"movt r1, #0xAA55 \n\t"
"sub r0, r0, #4 \n\t"
"str r1, [r0] \n\t"
#endif
"b next_context_switch \n\t" // Done saving the context switch to the sub routine to load the next one
// Save the current task context
"is_machine: \n\t"
"stmfd sp!, {r4-r12, lr} \n\t" // Push registers to the USER/SYSTEM (task) stack
"stmfd sp!, {lr} \n\t" // USR_LR
"mrs r0, cpsr \n\t" // Save CPSR
"stmfd sp!, {r0} \n\t"
#ifdef DEBUG_RTOS
"mov r0, #0xAA55 \n\t"
"movt r0, #0xAA55 \n\t"
"stmfd sp!, {r0} \n\t"
#endif
"str sp, [r2] \n\t" // current_context (r2)
// Whether from IRQ or from Machine mode, this is the part where the new context gets restored
"next_context_switch: \n\t"
#ifdef DEBUG_RTOS
// Place suspected faulting switch address here
"mov r0, #0x1000 \n\t"
"movt r0, #0x0000 \n\t"
"mov r1, #0x1000 \n\t" // Set your desired range value
"sub r2, r0, r1 \n\t" // r2 = r0 - range (lower bound)
"add r1, r0, r1 \n\t" // r1 = r0 + range (upper bound)
"cmp r3, r2 \n\t" // Compare r3 with lower bound
"blt outrangef \n\t" // Branch if r3 < (r0 - range)
"cmp r3, r1 \n\t" // Compare r3 with upper bound
"bgt outrangef \n\t" // Branch if r3 > (r0 + range)
"b 4f \n\t" // Branch if within range
"outrangef: \n\t" // Label for out of range
#endif
#ifdef DEBUG_RTOS
// The new stack is in next_context (r3)
// Check the Magic word!
"ldmfd r3!, {r0} \n\t"
"mov r1, #0xAA55 \n\t"
"movt r1, #0xAA55 \n\t"
"cmp r1, r0 \n\t"
"bne 1f \n\t"
#endif
// If we are coming from the IRQ we have to patch r4-r12,lr and put the tasks CPSR into SPSR
"mrs r1, cpsr \n\t"
"and r1, r1, #0x1F \n\t"
"cmp r1, #0x12 \n\t"
"bne is_machine_mode \n\t"
// In IRQ case
"is_irq_mode: \n\t"
// CPSR restoring
"ldmfd r3!, {r0} \n\t"
// Store to SPSR
"msr spsr_cxsf, r0 \n\t"
// Restore USR_LR into R4
"ldmfd r3!, {r4} \n\t"
// Store registers
"movw r1, #:lower16:saved_regs \n\t"
"movt r1, #:upper16:saved_regs \n\t"
"ldmfd r3!, {r0} \n\t"
"str r0, [r1], #4 \n\t" // Storing R4
"ldmfd r3!, {r0} \n\t"
"str r0, [r1], #4 \n\t" // Storing R5
"ldmfd r3!, {r0} \n\t"
"str r0, [r1], #4 \n\t" // Storing R6
"ldmfd r3!, {r0} \n\t"
"str r0, [r1], #4 \n\t" // Storing R7
"ldmfd r3!, {r0} \n\t"
"str r0, [r1], #4 \n\t" // Storing R8
"ldmfd r3!, {r0} \n\t"
"str r0, [r1], #4 \n\t" // Storing R9
"ldmfd r3!, {r0} \n\t"
"str r0, [r1], #4 \n\t" // Storing R10
"ldmfd r3!, {r0} \n\t"
"str r0, [r1], #4 \n\t" // Storing R11
"ldmfd r3!, {r0} \n\t"
"str r0, [r1], #4 \n\t" // Storing R12
"ldmfd r3!, {r0} \n\t"
"add r0, r0, #4 \n\t"
"str r0, [r1], #4 \n\t" // Storing LR
// Verify LR
#ifdef DEBUG_RTOS
"movw r2, #:lower16:_text_start \n\t"
"movt r2, #:upper16:_text_start \n\t"
"cmp r0, r2 \n\t"
"blt 2f \n\t"
"movw r2, #:lower16:_text_end \n\t"
"movt r2, #:upper16:_text_end \n\t"
"cmp r0, r2 \n\t"
"bge 2f \n\t"
#endif
// Change SP and USR_LR
"mrs r0, cpsr \n\t" // Save CPSR of IRQ mode
"msr cpsr_fsxc, 0x9f \n\t" // Machine mode with interrupts disabled!
"mov sp, r3 \n\t"
"mov lr, r4 \n\t"
"msr cpsr_fsxc, r0 \n\t" // Back to IRQ Mode
"pop {r4} \n\t"
"bx lr \n\t"
// Restore the stack context of the next task
"is_machine_mode: \n\t"
"ldmfd r3!, {r0} \n\t" // Restore CPSR into r0
"ldmfd r3!, {r2} \n\t" // Restore USR_LR into r2
"ldmfd r3!, {r4-r12, lr} \n\t"
// Perform DEBUG checks on the LR
#ifdef DEBUG_RTOS
"movw r1, #:lower16:_text_start \n\t"
"movt r1, #:upper16:_text_start \n\t"
"cmp lr, r1 \n\t"
"blt 2f \n\t"
"movw r1, #:lower16:_text_end \n\t"
"movt r1, #:upper16:_text_end \n\t"
"cmp lr, r1 \n\t"
"bge 2f \n\t"
#endif
// Put the USR_LR and IRQ_LR at the correct spots
"mov r1, lr \n\t"
"mov lr, r2 \n\t"
"mov sp, r3 \n\t"
"msr cpsr_cxsf, r0 \n\t"
"bx r1 \n\t"
// Different blackholes for each error
#ifdef DEBUG_RTOS
"1: b 1b \n\t"
"2: b 2b \n\t"
"3: b 3b \n\t"
"4: b 4b \n\t"
#endif
:
: "r"(ctx_current), "r"(ctx_next)
: "memory"
);
}
As you can see, it follows the functionality I have described above.
When I saw the sheer size of this function and compared it to the context switch of another RTOS (FreeRTOS, as it happens), I realised that something is definitely wrong with my approach.
I would really love some guidance, and I can provide any additional information needed. This problem has been driving me crazy for the last few weeks.