I am currently porting a proprietary RTOS, originally written for the ColdFire CPU, to the Zynq, which has a Cortex-A9 (Armv7-A) CPU inside. I've been struggling for a while to make the context switch function work, and I am starting to feel that its design only suits the ColdFire architecture. I have tried some workarounds that got me nowhere, and I am considering scrapping the whole scheduler implementation and re-implementing it from scratch for the Zynq.

I will, of course, lay out the details of the implementation, and I would love it if anyone could suggest a way to make it work or a workaround, or conclude that it is not possible and that it's better to start over.

ColdFire Implementation

Let me start by showing the assembly routines for the critical sections and the context switch.

_syst_CS:
        move.w  sr,d0
        move.w  #0x2700,sr
        rts
        nop
_syst_CSEnd:    
        move.w  6(a7),d0
        move.w  d0,sr
        rts

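These functions implement the critical sections: on the ColdFire, the SR register holds the interrupt level mask (among other things), so writing 0x2700 raises the mask to level 7 and writing the saved value back restores the previous level.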
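To make the intended save/restore semantics explicit, here is a minimal host-side model in plain C. It is only a sketch: `sim_sr`, `model_syst_CS`, `model_syst_CSEnd` and `model_nested_cs` are hypothetical names, and `sim_sr` stands in for the real SR register, which cannot be a C variable on hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the ColdFire SR (assumption: supervisor mode, mask level 0). */
uint16_t sim_sr = 0x2000;

/* Model of _syst_CS: return the old SR, then raise the mask to level 7. */
uint16_t model_syst_CS(void)
{
    uint16_t old = sim_sr;
    sim_sr = 0x2700;
    return old;
}

/* Model of _syst_CSEnd: write back the SR value saved by the matching enter. */
void model_syst_CSEnd(uint16_t prev)
{
    sim_sr = prev;
}

/* Nested use: leaving the inner section must not unmask interrupts early.
 * Returns the SR value observed between the two exits. */
uint16_t model_nested_cs(void)
{
    uint16_t outer = model_syst_CS();
    uint16_t inner = model_syst_CS();  /* already masked */
    assert(inner == 0x2700);
    model_syst_CSEnd(inner);           /* still masked here */
    uint16_t between = sim_sr;
    model_syst_CSEnd(outer);           /* original level restored */
    return between;
}
```

Because the exit call restores whatever the matching enter call returned, the pairs nest without any counter.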

ColdFire Status Register (SR) Layout

ASCII Bit Layout

Bit:  15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
     +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
     | T | 0 | S | M | 0 |     I     |            CCR                |
     +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
      15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0

Bit Field Descriptions

Bits   Name  Description
15     T     Trace Enable: when set, the processor takes a trace exception after every instruction
14     -     Reserved: must be cleared
13     S     Supervisor/User State: 0 = user mode, 1 = supervisor mode
12     M     Master/Interrupt State: cleared by an interrupt exception, can be set by RTE or a move to SR
11     -     Reserved: must be cleared
10-8   I     Interrupt Level Mask: defines the current interrupt level (3 bits, levels 0-7)
7-0    CCR   Condition Code Register: contains the condition codes (8 bits)
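As a quick illustration of the layout above, the two constants used in the routines (0x2700 and 0x2000) can be decoded with a few shifts and masks. This is just a sketch; `sr_fields_t`, `sr_decode` and `sr_selfcheck` are hypothetical names that are not part of the RTOS.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded view of the 16-bit ColdFire SR, following the table above. */
typedef struct {
    unsigned trace;      /* bit 15 */
    unsigned supervisor; /* bit 13 */
    unsigned master;     /* bit 12 */
    unsigned irq_mask;   /* bits 10-8 */
    unsigned ccr;        /* bits 7-0 */
} sr_fields_t;

sr_fields_t sr_decode(uint16_t sr)
{
    sr_fields_t f;
    f.trace      = (sr >> 15) & 0x1u;
    f.supervisor = (sr >> 13) & 0x1u;
    f.master     = (sr >> 12) & 0x1u;
    f.irq_mask   = (sr >> 8)  & 0x7u;
    f.ccr        = sr & 0xFFu;
    return f;
}

/* 0x2700: supervisor mode, mask level 7 (used to enter a critical section).
 * 0x2000: supervisor mode, mask level 0 (used when starting a task). */
void sr_selfcheck(void)
{
    sr_fields_t masked = sr_decode(0x2700);
    sr_fields_t open   = sr_decode(0x2000);
    assert(masked.supervisor == 1u && masked.irq_mask == 7u);
    assert(open.supervisor == 1u && open.irq_mask == 0u);
}
```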

Here is the code for the context start and the context switch, respectively.

syst_McuCtxStart(uint32_t *old_sp, uint32_t new_stack,
                                             void (*new_pc)(void *), void *new_context);
_syst_McuCtxStart:
        ; save current task
        link    a6,#-40
        movem.l d2/d3/d4/d5/d6/d7/a2/a3/a4/a5,(a7)

        move.w  sr, d0      ; for irq level
        move.l  d0, -(a7)
        move.l  8(a6), a0   ; Store old StackPointer
        move.l  a7, (a0)
        
        ; start other task
        move.l  12(a6), a7
        add.l   16(a6), a7  ; Init sp
        move.l  20(a6), a0  ; First pc
        move.l  24(a6), d0  ; context arg
        move.l  d0, -(a7)
        move.w  #0x2000, sr ; Init sr
        jsr (a0)        ; call body
loop:
        bra loop
syst_McuCtxSw(uint32_t *current_context, uint32_t next_context);
_syst_McuCtxSw:
        ; save current task
        link    a6,#-40
        movem.l d2/d3/d4/d5/d6/d7/a2/a3/a4/a5,(a7)
        move.w  sr, d0      ; for irq level
        move.l  d0, -(a7)
        move.l  8(a6), a0
        move.l  a7, (a0)
        
        ; restore other task
        move.l  12(a6), a7
        move.l  (a7)+, d0
        move.w  d0, sr
        movem.l (a7),d2/d3/d4/d5/d6/d7/a2/a3/a4/a5
        lea 40(a7),a6
        unlk    a6
        rts

And finally, here is the handler for the timer interrupt. This timer fires every time a waiting task has to wake up. It is backed by a queue of timer requests that program the timer counter in their order of arrival.

_syst_McuCtxIrq:
        move.w  #0x2700, sr ; no other interrupt can insert a timer Req
        link    a6,#0
        lea -16(a7),a7
        movem.l d0/d1/a0/a1,(a7)
        jsr _timer_ReqRaise
        movem.l (a7),d0/d1/a0/a1
        unlk    a6
        rte

The function called here, timer_ReqRaise, simply stops the timer, suspends the current task by putting it into the wait-list, and eventually (after changing some metadata to specify the new task) calls syst_McuCtxSw. Note that task switches are not triggered only by IRQs; they can also happen from the normal code flow (Suspend, Abort, or other actions).

Arm Architecture

What immediately concerned me is that on ColdFire there is no notion of multiple modes, banked registers between those modes, or separate stacks for each of them. So in my first implementation I tried to avoid scheduling directly from the IRQ: I simply set global variables to signal a scheduling request and processed them in IDLE. However, as you can imagine, this made the code far less deterministic and more fragile, so I had to re-evaluate my approach. I quickly saw the mess I would create if I allowed contexts to be saved on the IRQ stack: I immediately got undefined behavior and a tangle of interrupt-related bugs, so I decided to lay out a different approach.

My Horrible Attempt

Please note that since this RTOS is being ported to a Xilinx part, I made my life easier (or so I thought) by using their standard library and, in general, the Vitis workflow.

Critical Sections

For the critical sections the conversion was fairly direct, although I'm not even sure that this is the correct method:

__attribute__((naked)) unsigned int syst_CS(void)
{
    __asm__ volatile (
        "mrs r0, cpsr           \n\t"   // Read current CPSR into r0 (return value)
        "orr r1, r0, %0         \n\t"   // OR with IRQ mask to disable IRQs
        "msr cpsr_c, r1         \n\t"   // Write back to CPSR control field
        "bx lr                  \n\t"   // Return (r0 contains old CPSR)
        :
        : "I" (XIL_EXCEPTION_IRQ)       // Immediate operand constraint
        : "r1"                          // Clobbered register
    );
}
__attribute__((naked)) void syst_CSEnd(unsigned int prev_level)
{
    __asm__ volatile (
        "mrs r1, cpsr           \n\t"   // Read current CPSR into r1
        "bic r1, r1, %0         \n\t"   // Clear IRQ bit in current CPSR
        "and r0, r0, %0         \n\t"   // Mask prev_level to only IRQ bit
        "orr r1, r1, r0         \n\t"   // Combine current flags with prev IRQ state
        "msr cpsr_c, r1         \n\t"   // Write back to CPSR control field
        "bx lr                  \n\t"   // Return
        :
        : "I" (XIL_EXCEPTION_IRQ)       // Immediate operand constraint
        : "r1"                          // Clobbered register
    );
}
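The bit manipulation in these two routines can be checked on a host with a small C model. This is only a sketch under two assumptions: that XIL_EXCEPTION_IRQ is the CPSR I bit (0x80), and that a plain variable, here the hypothetical `sim_cpsr`, is an acceptable stand-in for the real CPSR. The names `arm_model_cs_enter`, `arm_model_cs_exit` and `arm_model_selfcheck` are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define CPSR_I_BIT 0x80u /* IRQ disable bit, bit 7 (assumed == XIL_EXCEPTION_IRQ) */
#define MODE_SYS   0x1Fu /* System mode, called MACHINE in this post */

/* Stand-in for the CPSR: System mode, IRQs enabled. */
uint32_t sim_cpsr = MODE_SYS;

/* Mirrors syst_CS: return the old CPSR, then set the I bit. */
uint32_t arm_model_cs_enter(void)
{
    uint32_t old = sim_cpsr;
    sim_cpsr = old | CPSR_I_BIT;
    return old;
}

/* Mirrors syst_CSEnd: keep the current CPSR but take the I bit from prev. */
void arm_model_cs_exit(uint32_t prev)
{
    sim_cpsr = (sim_cpsr & ~CPSR_I_BIT) | (prev & CPSR_I_BIT);
}

/* Nested sections: only the outermost exit re-enables IRQs. */
void arm_model_selfcheck(void)
{
    uint32_t outer = arm_model_cs_enter();
    uint32_t inner = arm_model_cs_enter();
    assert((inner & CPSR_I_BIT) != 0u);    /* already disabled */
    arm_model_cs_exit(inner);
    assert((sim_cpsr & CPSR_I_BIT) != 0u); /* still disabled */
    arm_model_cs_exit(outer);
    assert(sim_cpsr == outer);             /* back to the entry state */
}
```

Unlike the ColdFire version, which restores the whole SR, this variant restores only the I bit, so any mode change made inside the section is left untouched.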

Now for the context switch. First, let me explain my reasoning:

  • There are multiple stacks, unlike on the ColdFire
  • There are multiple modes, unlike on the ColdFire
  • We must avoid saving contexts while on the IRQ stack
  • We will only ever allow 2 CPU modes (MACHINE/IRQ, where MACHINE is what Arm calls System mode, 0x1F)
  • I do not want to change any logic of the upper-level code, only the assembly

With that in mind, I decided that the best approach would be to save r4-r12 and LR in a global variable, which we will call saved_regs, when entering the IRQHandler (in the exception vector table). That way, when I eventually reach the call to timer_ReqRaise, which in turn calls syst_McuCtxSw, I would be able to:

  • Save the CPSR and temporarily change to user mode, making sure the interrupts are still disabled
  • Push the saved registers to the stack of the interrupted task
  • Also push the usr_LR (which can differ from LR if a leaf function was interrupted)
  • Finally, push the SPSR (the USER/MACHINE mode CPSR at the time the interrupt occurred)
  • Optionally push a magic word to make the saved contexts easier to spot
  • Save the SP where the function expects it to be (&old_sp)
  • Now we are ready for the switch
  • From the new stack, pop the registers (in the same layout as they were saved)
  • Change the usr_LR and LR if necessary
  • Do not branch directly to the LR; let the whole call chain unwind
  • When back in the IRQHandler routine, let it branch to the correct LR and return to MACHINE mode

Of course, as mentioned before, switches are not triggered only by timer_ReqRaise, so the MACHINE mode path simply pushes (and later pops) the LR register twice, to account for the usr_LR slot. You will also see portions guarded by the DEBUG_RTOS macro; these insert a magic word or verify some properties of the registers pushed to or popped from the context.
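Put together, the steps above imply the following frame on the task stack, from the saved SP upward. The struct below is only a sketch of that layout as I read it from the push order in the code that follows; `ctx_frame_t`, `CTX_MAGIC` and `ctx_frame_selfcheck` are hypothetical names, and DEBUG_RTOS is defined here just for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEBUG_RTOS 1              /* assumption: debug build with the magic word */
#define CTX_MAGIC  0xAA55AA55u    /* value built by the mov/movt pair */

/* Saved context, lowest address first (the saved SP points at the first field). */
typedef struct {
#ifdef DEBUG_RTOS
    uint32_t magic;       /* debug marker */
#endif
    uint32_t cpsr;        /* CPSR, or the SPSR when saved from the IRQ path */
    uint32_t usr_lr;      /* LR of the task mode */
    uint32_t r4_r12[9];   /* callee-saved registers r4..r12 */
    uint32_t lr;          /* resume address */
} ctx_frame_t;

/* The assembly relies on these offsets; check them at run time. */
void ctx_frame_selfcheck(void)
{
    assert(sizeof(ctx_frame_t) == 13u * sizeof(uint32_t));
    assert(offsetof(ctx_frame_t, magic)  == 0u);
    assert(offsetof(ctx_frame_t, cpsr)   == 4u);
    assert(offsetof(ctx_frame_t, usr_lr) == 8u);
    assert(offsetof(ctx_frame_t, r4_r12) == 12u);
    assert(offsetof(ctx_frame_t, lr)     == 48u);
}
```

Without DEBUG_RTOS the magic slot disappears and every offset shifts down by 4 bytes, which is why the assembly has two variants of several offsets.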

Here is the IRQHandler:

IRQHandler:                 /* IRQ vector handler */

    stmdb   sp!,{r0-r3}     /* state save from compiled code*/
    movw    r1, #:lower16:saved_regs
    movt    r1, #:upper16:saved_regs
    stmia   r1,{r4-r12,lr}
    bl      IRQInterrupt
    movw    r1, #:lower16:saved_regs
    movt    r1, #:upper16:saved_regs
    ldmia   r1,{r4-r12,lr}
    ldmia   sp!,{r0-r3}     /* state restore from compiled code */
    subs    pc, lr, #4          /* adjust return */

Now for the start context:

void __attribute__((naked)) syst_McuCtxStart(uint32_t *old_sp, uint32_t new_stack,
                                             void (*new_pc)(void *), void *new_context)

{
    __asm__ __volatile__(

        "stmfd sp!, {r4-r12,lr}        \n\t" // Save registers
        "mrs   r4, cpsr                 \n\t" // Save current CPSR
        "stmfd sp!, {lr}                \n\t" // USR_LR
        "stmfd sp!, {r4}                \n\t" // Store CPSR on stack
#ifdef DEBUG_RTOS
        "mov   r4, #0xAA55              \n\t" // Store Magic low
        "movt   r4, #0xAA55              \n\t" // Store Magic high
        "stmfd sp!, {r4}                \n\t" // Store Magic on the stack
#endif
        // Before changing the SP put the original r4 back
        // The offset is 8 or 4 depending on if the DEBUG magic value was pushed
#ifdef DEBUG_RTOS
        "ldr r4, [sp, #8]                       \n\t"
#else
        "ldr r4, [sp, #4]                       \n\t"
#endif
        "str   sp, [%[old_sp]]                 \n\t" // Save SP to old context

        // Start the new task
        "mov   sp, %[new_stack]                   \n\t" // Set SP to (new_stack + stack_size)
        "mov   r0, %[new_context]                   \n\t" // Set up argument in r0


        // Switch to MACHINE mode and start the task
        "msr   cpsr_c, #0x1F            \n\t" // Switch to machine mode
        "mov   pc, %[new_pc]            \n\t" // Jump to entry point

        // Infinite loop (should never return here)
        "1:    b     1b                 \n\t"
        :
        : [old_sp] "r"(old_sp), [new_stack] "r"(new_stack), [new_pc] "r"(new_pc), [new_context] "r"(new_context)
        : "sp", "memory");
}

And finally, the big one: the context switch.

void __attribute__((naked))
syst_McuCtxSw(uint32_t *current_context, uint32_t next_context)
{
    // r0 and r1 used as scratch
    register uint32_t *ctx_current __asm__("r2") = current_context;
    register uint32_t ctx_next     __asm__("r3") = next_context;

    __asm__ __volatile__(
        // Make sure that the interrupts are disabled!
        // Otherwise, blackhole!
        // We are not supposed to be context switching when interrupts are enabled
#ifdef DEBUG_RTOS
        "mrs   r0, cpsr                 \n\t"
        "and   r0, r0, #0x80            \n\t"
        "cmp   r0, #0x00                \n\t"
        "beq   3f                       \n\t"
#endif
        // The stack used to save must be one from the User/System mode not IRQ!
        // Also, if we are coming from the IRQ then the current r4-r12 are what the IRQ routines have put there
        // Not the contents that were there before the IRQ occurred!
        "mrs   r1, cpsr                 \n\t"
        "and   r1, r1, #0x1F            \n\t"
        "cmp   r1, #0x12                \n\t"
        "bne   is_machine               \n\t"


        // In the IRQ case we have to POP from the USR SP because that's where the stack was when we got interrupted
        // Also, because of the architecture of the Xilinx library, I had to overwrite (wrap) the IRQHandler.
        // Specifically The registers we have to save are currently copied into saved_regs
        "is_irq:                        \n\t"
        // We need another register to use as scratch, lets use r4
        "push {r4}                      \n\t"
        // We have to get the SP from the Machine/User space
        // For that change back to USR/Machine
        "movw       r1, #:lower16:saved_regs \n\t"
        "movt       r1, #:upper16:saved_regs \n\t"
        "add        r1, r1, #36              \n\t" // Place at the end of the list
        "mrs        r0, cpsr                 \n\t" // Save CPSR of IRQ mode
        "msr        cpsr_fsxc, 0x9f          \n\t" // Machine mode with interrupts disabled!
        // At this point R1 contains the address of the registers (r4-r12, lr) saved by the IRQHandler
        // Since we are now In User/Machine mode we have directly access to the SP, lets fill it up
        // Load the registers into the User/Machine stack

        // Stack all the registers
        "ldr   r4, [r1], #-4             \n\t" // Stacking LR
        "sub   r4, r4, #4                \n\t"
        "str   r4, [sp], #-4             \n\t"
        "ldr   r4, [r1], #-4             \n\t" // Stacking R12
        "str   r4, [sp], #-4             \n\t"
        "ldr   r4, [r1], #-4             \n\t" // Stacking R11
        "str   r4, [sp], #-4             \n\t"
        "ldr   r4, [r1], #-4             \n\t" // Stacking R10
        "str   r4, [sp], #-4             \n\t"
        "ldr   r4, [r1], #-4             \n\t" // Stacking R9
        "str   r4, [sp], #-4             \n\t"
        "ldr   r4, [r1], #-4             \n\t" // Stacking R8
        "str   r4, [sp], #-4             \n\t"
        "ldr   r4, [r1], #-4             \n\t" // Stacking R7
        "str   r4, [sp], #-4             \n\t"
        "ldr   r4, [r1], #-4             \n\t" // Stacking R6
        "str   r4, [sp], #-4             \n\t"
        "ldr   r4, [r1], #-4             \n\t" // Stacking R5
        "str   r4, [sp], #-4             \n\t"
        "ldr   r4, [r1], #-4             \n\t" // Stacking R4

#ifdef DEBUG_RTOS
        "str   r4, [sp], #-8            \n\t" // -12 here because we would still have to store the CPSR/SPSR and the Debug Magic!
#else
        "str   r4, [sp], #-4             \n\t" // -8 here because we would still have to store the CPSR/SPSR
#endif
        // Also since we came from an IRQ handler and we might have been interrupted in a leaf function, we must also save the usr_lr
        "str   lr, [sp], #-4             \n\t" // Stacking USR_LR
        // Store the SP into current_context (r2)
        "str   sp, [r2]                 \n\t" // current_context (r2)
        "msr   cpsr_fsxc, r0            \n\t" // Back to IRQ Mode
        // The only thing left to store on the User/Machine stack is the SPSR
        // Load the SP back to R0
        "ldr   r0, [r2]                 \n\t"
        // Add the correct offset to reposition at CPSRs place in the stack frame
#ifdef DEBUG_RTOS
        "add   r0, r0, #4              \n\t"
#endif
        // Read SPSR
        "mrs   r1, spsr                \n\t"
        // Store SPSR
        "str   r1, [r0]                 \n\t"
#ifdef DEBUG_RTOS
        // If the Debug Magic word is also part of the frame then
        "mov   r1, #0xAA55              \n\t"
        "movt  r1, #0xAA55              \n\t"
        "sub   r0, r0, #4               \n\t"
        "str   r1, [r0]                 \n\t"
#endif
        "b     next_context_switch      \n\t" // Done saving the context switch to the sub routine to load the next one


        // Save the current task context
        "is_machine:                    \n\t"
        "stmfd sp!, {r4-r12, lr}        \n\t" // Push registers to the USER/SYSTEM (task) stack
        "stmfd sp!, {lr}                \n\t" // USR_LR
        "mrs   r0, cpsr                 \n\t" // Save CPSR
        "stmfd sp!, {r0}                \n\t"
#ifdef DEBUG_RTOS
        "mov   r0, #0xAA55              \n\t"
        "movt  r0, #0xAA55              \n\t"
        "stmfd sp!, {r0}                \n\t"
#endif
        "str   sp, [r2]                 \n\t" // current_context (r2)


        // Whether from IRQ or from Machine mode, this is the part where the new context gets restored
        "next_context_switch:           \n\t"
#ifdef DEBUG_RTOS
        // Place suspected faulting switch address here
        "mov   r0, #0x1000              \n\t"
        "movt  r0, #0x0000              \n\t"
        "mov   r1, #0x1000              \n\t"  // Set your desired range value
        "sub   r2, r0, r1               \n\t"  // r2 = r0 - range (lower bound)
        "add   r1, r0, r1               \n\t"  // r1 = r0 + range (upper bound)
        "cmp   r3, r2                   \n\t"  // Compare r3 with lower bound
        "blt   outrangef                \n\t"  // Branch if r3 < (r0 - range)
        "cmp   r3, r1                   \n\t"  // Compare r3 with upper bound
        "bgt   outrangef                \n\t"  // Branch if r3 > (r0 + range)
        "b     4f                       \n\t"  // Branch if within range
        "outrangef:                     \n\t"  // Label for out of range
#endif
#ifdef DEBUG_RTOS
// The new stack is in next_context (r3)
// Check the Magic word!
        "ldmfd r3!, {r0}                \n\t"
        "mov   r1, #0xAA55              \n\t"
        "movt  r1, #0xAA55              \n\t"
        "cmp   r1, r0                   \n\t"
        "bne   1f                       \n\t"
#endif
        // If we are coming from the IRQ we have to patch r4-r12,lr and put the task's CPSR into SPSR
        "mrs   r1, cpsr                 \n\t"
        "and   r1, r1, #0x1F            \n\t"
        "cmp   r1, #0x12                \n\t"
        "bne   is_machine_mode          \n\t"
        // In IRQ case
        "is_irq_mode:                   \n\t"
        // CPSR restoring
        "ldmfd r3!, {r0}                \n\t"
        // Store to SPSR
        "msr   spsr_cxsf, r0            \n\t"
        // Restore USR_LR into R4
        "ldmfd r3!, {r4}                \n\t"
        // Store registers
        "movw       r1, #:lower16:saved_regs \n\t"
        "movt       r1, #:upper16:saved_regs \n\t"
        "ldmfd r3!, {r0}               \n\t"
        "str   r0, [r1], #4            \n\t" // Storing R4
        "ldmfd r3!, {r0}               \n\t"
        "str   r0, [r1], #4            \n\t" // Storing R5
        "ldmfd r3!, {r0}               \n\t"
        "str   r0, [r1], #4            \n\t" // Storing R6
        "ldmfd r3!, {r0}               \n\t"
        "str   r0, [r1], #4            \n\t" // Storing R7
        "ldmfd r3!, {r0}               \n\t"
        "str   r0, [r1], #4            \n\t" // Storing R8
        "ldmfd r3!, {r0}               \n\t"
        "str   r0, [r1], #4            \n\t" // Storing R9
        "ldmfd r3!, {r0}               \n\t"
        "str   r0, [r1], #4            \n\t" // Storing R10
        "ldmfd r3!, {r0}               \n\t"
        "str   r0, [r1], #4            \n\t" // Storing R11
        "ldmfd r3!, {r0}               \n\t"
        "str   r0, [r1], #4            \n\t" // Storing R12
        "ldmfd r3!, {r0}               \n\t"
        "add   r0, r0, #4              \n\t"
        "str   r0, [r1], #4            \n\t" // Storing LR
        // Verify LR
#ifdef DEBUG_RTOS
        "movw  r2, #:lower16:_text_start \n\t"
        "movt  r2, #:upper16:_text_start \n\t"
        "cmp   r0, r2                   \n\t"
        "blt   2f                       \n\t"
        "movw  r2, #:lower16:_text_end  \n\t"
        "movt  r2, #:upper16:_text_end  \n\t"
        "cmp   r0, r2                   \n\t"
        "bge   2f                       \n\t"
#endif
        // Change SP and USR_LR
        "mrs   r0, cpsr                 \n\t" // Save CPSR of IRQ mode
        "msr   cpsr_fsxc, 0x9f          \n\t" // Machine mode with interrupts disabled!
        "mov   sp, r3                   \n\t"
        "mov   lr, r4                   \n\t"
        "msr   cpsr_fsxc, r0            \n\t" // Back to IRQ Mode
        "pop {r4}                       \n\t"
        "bx lr                          \n\t"
        // Restore the stack context of the next task
        "is_machine_mode:               \n\t"
        "ldmfd r3!, {r0}                \n\t" // Restore CPSR into r0
        "ldmfd r3!, {r2}                \n\t" // Restore USR_LR into r2
        "ldmfd r3!, {r4-r12, lr}        \n\t"
        // Perform DEBUG checks on the LR
#ifdef DEBUG_RTOS
        "movw  r1, #:lower16:_text_start \n\t"
        "movt  r1, #:upper16:_text_start \n\t"
        "cmp   lr, r1                   \n\t"
        "blt   2f                       \n\t"
        "movw  r1, #:lower16:_text_end  \n\t"
        "movt  r1, #:upper16:_text_end  \n\t"
        "cmp   lr, r1                   \n\t"
        "bge   2f                       \n\t"
#endif
        // Put the USR_LR and IRQ_LR at the correct spots
        "mov   r1, lr                   \n\t"
        "mov   lr, r2                   \n\t"
        "mov   sp, r3                   \n\t"
        "msr   cpsr_cxsf, r0            \n\t"
        "bx    r1                       \n\t"
        // Different blackholes for each error
#ifdef DEBUG_RTOS
        "1:    b     1b                 \n\t"
        "2:    b     2b                 \n\t"
        "3:    b     3b                 \n\t"
        "4:    b     4b                 \n\t"
#endif
        :
        : "r"(ctx_current), "r"(ctx_next)
        : "memory"
    );
}

As you can see it follows the functionality I have described above.
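One property that is easy to get wrong in a routine of this size is the symmetry between the save and restore orders. The following host-side sketch models the frame order of the is_machine path with a simulated full-descending stack and checks that a frame survives a round trip. Everything in it (`cpu_regs_t`, `push_word`, `ctx_save`, `ctx_restore`, `ctx_roundtrip_ok`) is hypothetical and only mimics the ordering; it does not touch real registers or modes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CTX_MAGIC 0xAA55AA55u

typedef struct {
    uint32_t r4_r12[9];
    uint32_t lr;
    uint32_t usr_lr;
    uint32_t cpsr;
} cpu_regs_t;

/* Push one word, full-descending, like "stmfd sp!, {rX}". */
static uint32_t *push_word(uint32_t *sp, uint32_t v)
{
    *--sp = v;
    return sp;
}

/* Same order as the is_machine path: {r4-r12, lr}, usr_lr, cpsr, magic. */
uint32_t *ctx_save(uint32_t *sp, const cpu_regs_t *r)
{
    sp = push_word(sp, r->lr);
    for (int i = 8; i >= 0; --i)
        sp = push_word(sp, r->r4_r12[i]);
    sp = push_word(sp, r->usr_lr);
    sp = push_word(sp, r->cpsr);
    return push_word(sp, CTX_MAGIC);
}

/* Pops in the reverse order; returns the new SP, or NULL on a bad magic word. */
uint32_t *ctx_restore(uint32_t *sp, cpu_regs_t *r)
{
    if (*sp++ != CTX_MAGIC)
        return NULL;
    r->cpsr = *sp++;
    r->usr_lr = *sp++;
    for (int i = 0; i < 9; ++i)
        r->r4_r12[i] = *sp++;
    r->lr = *sp++;
    return sp;
}

/* Returns 1 when a saved frame restores unchanged and the SP is balanced. */
int ctx_roundtrip_ok(void)
{
    uint32_t stack[32];
    uint32_t *top = stack + 32;
    cpu_regs_t in, out;

    for (int i = 0; i < 9; ++i)
        in.r4_r12[i] = 0x100u + (uint32_t)i;
    in.lr = 0x8000u;
    in.usr_lr = 0x8004u;
    in.cpsr = 0x9Fu;
    memset(&out, 0, sizeof out);

    uint32_t *sp = ctx_save(top, &in);
    uint32_t *end = ctx_restore(sp, &out);
    return end == top && (top - sp) == 13 && memcmp(&in, &out, sizeof in) == 0;
}
```

A model like this says nothing about banked registers or the IRQ path, but it gives a reference for the frame that both paths have to produce and consume.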

When I saw the sheer size of this function and compared it to the context switch of another RTOS (FreeRTOS, as it happens), I realised that something is definitely wrong with my approach.

I would really love some guidance and can provide any additional information needed; this problem has been driving me crazy for the last few weeks.

  • The problem is that breaking the question up would make it more confusing, because the OS relies heavily on the architecture; that is why I posted everything in one batch. I have looked up the ABIs and most of the “new” concepts (I am more familiar with Arm than ColdFire); it’s just very confusing how to integrate this on Arm and do it correctly. I did find a not very pretty workaround that does something similar to scheduling from IDLE, but this time by disabling exceptions except under certain conditions. This is more of a cry for help, so I tried to give as much context as I could. Commented Jun 13 at 18:46
  • How much asm vs C code are you porting? If it's mostly C and just a bit of asm (e.g. mostly reg save/restore), you can just recode in arm from scratch. Or, there should already be equiv. funcs in FreeRTOS or whatever Xilinx provides in its SDK. You can pull those funcs in as wholesale replacements with some tweaking. This is probably better than doing a line-by-line translation of the m68k asm. If the entire m68k RTOS is mostly asm, I'd just toss it and start with FreeRTOS, rtems, etc. Commented Jun 15 at 14:00
  • No, it’s just these routines. The thing is, I want to make sure that my way of thinking works, and that’s why I am showing how I arrived at that particular code. Commented Jun 15 at 20:14
  • Linux switches to a common stack. See: ARM-Linux stack init and ARM-Linux entry points. Linux originally supported x86, but then 68k. These don't have banked registers nor multiple stacks afair. This structure may benefit you. Commented Jun 17 at 15:17
  • Both m68k and (most) ColdFire CPUs have a separate stack in Supervisor mode. Commented Jun 20 at 10:07
