
I am using the GCC toolchain below to compile a codebase for an Arm Cortex-M33 with -Os optimization.

arm-none-eabi-gcc.exe (Arm GNU Toolchain 14.3.Rel1 (Build arm-14.174)) 14.3.1 20250623
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

In the prologue of some functions it also pushes the argument registers (r0, r1, r2, r3) onto the stack; this is not done for all functions.

    23d8:   e92d 41ff   stmdb   sp!, {r0, r1, r2, r3, r4, r5, r6, r7, r8, lr}
    23dc:   460c        mov r4, r1
    23de:   4605        mov r5, r0

When it wants that function's argument to be passed to another function, it tries to pop it from the stack; when it does, it pops from the wrong location due to incorrect sp manipulation in the generated code:

    24f2:   b004        add sp, #16
    24f4:   e8bd 81f0   ldmia.w sp!, {r4, r5, r6, r7, r8, pc}

This causes the pointer to refer to an address that does not exist on the device, which triggers a HardFault.

Tarmac from ARM Core:

Push:
245840 clk ES  (000023d8:e92d41ff) T thrd:           PUSH     {r0-r8,lr}
                    ST 200046e0  00004a23 00000010 00000009 2000018c     00000000200046e0  NM NSH
                       200046d0  20001910 00000000 000023d9 a0000000     00000000200046d0  NM NSH
                       200046c0  20002350 20001910 ........ ........     00000000200046c0  NM NSH
                    R MSP  200046c8

Pop:
246163 clk ES  (000024f4:e8bd81f0) T thrd:           POP      {r4-r8,pc}
                    LD 200046e0  00004a23 00000010 00000009 2000018c     00000000200046e0  NM NSH
                       200046d0  20001910 00000100 ........ ........     00000000200046d0  NM NSH
                    R R4   00000100
                    R R5   20001910
                    R R6   2000018c
                    R R7   00000009
                    R R8   00000010
                    R MSP  200046f0
                    R XPSR  69000000
                    BR (00004a22) T

R0 On Push is 20001910 and on pop is 00000100, why? [Here, R4 after the pop is used in an MLA with another register to compute the actual pointer passed to the function, expecting R4 to hold 20001910.]

When I compile all the functions that push r0, r1, r2 onto the stack and retrieve them from it with -O2 instead, I no longer see the HardFault. I tried to explain this by a large stack frame or a deep call stack, but neither seems to be a common theme among the functions where this happens.

Any suggestions on how to mitigate this problem while still keeping a global -Os flag? Note that there are no naked functions in the codebase and no manual manipulation of MSP. There is some inline assembly and there are some Arm GCC intrinsics from CMSIS.

On -O2:

000023f0 <func>:
23f0:   e92d 43f0   stmdb   sp!, {r4, r5, r6, r7, r8, r9, lr}
23f4:   460d        mov r5, r1
23f6:   4604        mov r4, r0
23f8:   b085        sub sp, #20
23fa:   2202        movs    r2, #2
23fc:   2100        movs    r1, #0
23fe:   f205 201b   addw    r0, r5, #539    @ 0x21b
2402:   f02a f8e3   bl  2c5cc <memset>
2406:   f894 3068   ldrb.w  r3, [r4, #104]  @ 0x68
240a:   f894 01c8   ldrb.w  r0, [r4, #456]  @ 0x1c8
240e:   f3c3 0202   ubfx    r2, r3, #0, #3
2412:   f003 0307   and.w   r3, r3, #7
2416:   f88d 2008   strb.w  r2, [sp, #8]
241a:   2b02        cmp r3, #2
241c:   f3c0 0202   ubfx    r2, r0, #0, #3
2420:   f88d 2009   strb.w  r2, [sp, #9]
2424:   d103        bne.n   242e <func+0x3e>
2426:   f000 0007   and.w   r0, r0, #7
242a:   2801        cmp r0, #1
242c:   d000        beq.n   2430 <func+0x40>
242e:   2000        movs    r0, #0
2430:   f025 fe18   bl  28064 <func3>
2434:   2204        movs    r2, #4
2436:   2100        movs    r1, #0
2438:   a803        add r0, sp, #12
243a:   f02a f8c7   bl  2c5cc <memset>
243e:   2202        movs    r2, #2
2440:   2100        movs    r1, #0
2442:   a802        add r0, sp, #8
2444:   f02a f8c2   bl  2c5cc <memset>
2448:   2201        movs    r2, #1
244a:   f04f 0800   mov.w   r8, #0
244e:   f894 3068   ldrb.w  r3, [r4, #104]  @ 0x68
2452:   f894 11c8   ldrb.w  r1, [r4, #456]  @ 0x1c8
2456:   f013 0307   ands.w  r3, r3, #7
245a:   bf1c        itt ne
245c:   f103 33ff   addne.w r3, r3, #4294967295
2460:   b2db        uxtbne  r3, r3
2462:   eb0d 0403   add.w   r4, sp, r3
2466:   18e8        adds    r0, r5, r3
2468:   eb0d 0343   add.w   r3, sp, r3, lsl #1
246c:   731a        strb    r2, [r3, #12]
246e:   7a23        ldrb    r3, [r4, #8]
2470:   f011 0107   ands.w  r1, r1, #7
2474:   4413        add r3, r2
2476:   7223        strb    r3, [r4, #8]
2478:   f890 321b   ldrb.w  r3, [r0, #539]  @ 0x21b
247c:   f04f 0901   mov.w   r9, #1
2480:   ea43 0302   orr.w   r3, r3, r2
2484:   f880 321b   strb.w  r3, [r0, #539]  @ 0x21b
2488:   bf08        it  eq
248a:   4613        moveq   r3, r2
248c:   4647        mov r7, r8
248e:   bf1c        itt ne
2490:   f101 31ff   addne.w r1, r1, #4294967295
2494:   b2cb        uxtbne  r3, r1
2496:   441d        add r5, r3
2498:   eb0d 0243   add.w   r2, sp, r3, lsl #1
249c:   446b        add r3, sp
249e:   f882 900d   strb.w  r9, [r2, #13]
24a2:   7a1a        ldrb    r2, [r3, #8]
24a4:   ae02        add r6, sp, #8
24a6:   444a        add r2, r9
24a8:   721a        strb    r2, [r3, #8]
24aa:   f895 321b   ldrb.w  r3, [r5, #539]  @ 0x21b
24ae:   ac03        add r4, sp, #12
24b0:   f043 0302   orr.w   r3, r3, #2
24b4:   f885 321b   strb.w  r3, [r5, #539]  @ 0x21b
24b8:   f816 3b01   ldrb.w  r3, [r6], #1
24bc:   2b02        cmp r3, #2
24be:   d812        bhi.n   24e6 <func+0xf6>
24c0:   d02c        beq.n   251c <func+0x12c>
24c2:   2300        movs    r3, #0
24c4:   4638        mov r0, r7
24c6:   461a        mov r2, r3
24c8:   4619        mov r1, r3
24ca:   f8cd 9000   str.w   r9, [sp]
24ce:   f024 fdc1   bl  27054 <func4>
24d2:   1c7b        adds    r3, r7, #1
24d4:   2b02        cmp r3, #2
24d6:   f04f 0701   mov.w   r7, #1
24da:   f104 0402   add.w   r4, r4, #2
24de:   d1eb        bne.n   24b8 <func+0xc8>
24e0:   b005        add sp, #20
24e2:   e8bd 83f0   ldmia.w sp!, {r4, r5, r6, r7, r8, r9, pc}
24e6:   7823        ldrb    r3, [r4, #0]
24e8:   b1b3        cbz r3, 2518 <func+0x128>
24ea:   2100        movs    r1, #0
24ec:   4640        mov r0, r8
24ee:   f025 fad9   bl  27aa4 <func2>
24f2:   7863        ldrb    r3, [r4, #1]
24f4:   f108 0001   add.w   r0, r8, #1
24f8:   b2c0        uxtb    r0, r0
24fa:   b1e3        cbz r3, 2536 <func+0x146>
24fc:   2101        movs    r1, #1
24fe:   f025 fad1   bl  27aa4 <func2>
2502:   2301        movs    r3, #1
2504:   2103        movs    r1, #3
2506:   9300        str r3, [sp, #0]
2508:   4638        mov r0, r7
250a:   2300        movs    r3, #0
250c:   f108 0802   add.w   r8, r8, #2
2510:   fa5f f888   uxtb.w  r8, r8
2514:   461a        mov r2, r3
2516:   e7da        b.n 24ce <func+0xde>
2518:   2107        movs    r1, #7
251a:   e7e7        b.n 24ec <func+0xfc>
251c:   7823        ldrb    r3, [r4, #0]
251e:   b9db        cbnz    r3, 2558 <func+0x168>
2520:   7863        ldrb    r3, [r4, #1]
2522:   b983        cbnz    r3, 2546 <func+0x156>
2524:   f1b8 0f02   cmp.w   r8, #2
2528:   d007        beq.n   253a <func+0x14a>
252a:   2301        movs    r3, #1
252c:   2102        movs    r1, #2
252e:   9300        str r3, [sp, #0]
2530:   4638        mov r0, r7
2532:   2300        movs    r3, #0
2534:   e7ee        b.n 2514 <func+0x124>
2536:   2107        movs    r1, #7
2538:   e7e1        b.n 24fe <func+0x10e>
253a:   2101        movs    r1, #1
253c:   2300        movs    r3, #0
253e:   4638        mov r0, r7
2540:   461a        mov r2, r3
2542:   9100        str r1, [sp, #0]
2544:   e7c3        b.n 24ce <func+0xde>
2546:   4640        mov r0, r8
2548:   2101        movs    r1, #1
254a:   f108 0801   add.w   r8, r8, #1
254e:   f025 faa9   bl  27aa4 <func2>
2552:   fa5f f888   uxtb.w  r8, r8
2556:   e7e5        b.n 2524 <func+0x134>
2558:   4640        mov r0, r8
255a:   2100        movs    r1, #0
255c:   f108 0801   add.w   r8, r8, #1
2560:   f025 faa0   bl  27aa4 <func2>
2564:   fa5f f888   uxtb.w  r8, r8
2568:   e7da        b.n 2520 <func+0x130>

On -Os:

000023f0 <func>:
23f0:   e92d 41ff   stmdb   sp!, {r0, r1, r2, r3, r4, r5, r6, r7, r8, lr}
23f4:   460c        mov r4, r1
23f6:   4605        mov r5, r0
23f8:   2202        movs    r2, #2
23fa:   2100        movs    r1, #0
23fc:   f204 201b   addw    r0, r4, #539    @ 0x21b
2400:   f02a f8dc   bl  2c5bc <memset>
2404:   f895 3068   ldrb.w  r3, [r5, #104]  @ 0x68
2408:   f895 01c8   ldrb.w  r0, [r5, #456]  @ 0x1c8
240c:   f3c3 0202   ubfx    r2, r3, #0, #3
2410:   f003 0307   and.w   r3, r3, #7
2414:   f88d 2008   strb.w  r2, [sp, #8]
2418:   2b02        cmp r3, #2
241a:   f3c0 0202   ubfx    r2, r0, #0, #3
241e:   f88d 2009   strb.w  r2, [sp, #9]
2422:   f000 0007   and.w   r0, r0, #7
2426:   d173        bne.n   2510 <func+0x120>
2428:   2801        cmp r0, #1
242a:   d171        bne.n   2510 <func+0x120>
242c:   f025 fe12   bl  28054 <func3>
2430:   2204        movs    r2, #4
2432:   2100        movs    r1, #0
2434:   a803        add r0, sp, #12
2436:   f02a f8c1   bl  2c5bc <memset>
243a:   2202        movs    r2, #2
243c:   2100        movs    r1, #0
243e:   a802        add r0, sp, #8
2440:   f02a f8bc   bl  2c5bc <memset>
2444:   f895 3068   ldrb.w  r3, [r5, #104]  @ 0x68
2448:   aa04        add r2, sp, #16
244a:   f013 0307   ands.w  r3, r3, #7
244e:   bf1c        itt ne
2450:   f103 33ff   addne.w r3, r3, #4294967295
2454:   b2db        uxtbne  r3, r3
2456:   eb02 0143   add.w   r1, r2, r3, lsl #1
245a:   2201        movs    r2, #1
245c:   f801 2c04   strb.w  r2, [r1, #-4]
2460:   f103 0110   add.w   r1, r3, #16
2464:   eb0d 0001   add.w   r0, sp, r1
2468:   f810 1c08   ldrb.w  r1, [r0, #-8]
246c:   4423        add r3, r4
246e:   4411        add r1, r2
2470:   f800 1c08   strb.w  r1, [r0, #-8]
2474:   f893 121b   ldrb.w  r1, [r3, #539]  @ 0x21b
2478:   f04f 0801   mov.w   r8, #1
247c:   4311        orrs    r1, r2
247e:   f883 121b   strb.w  r1, [r3, #539]  @ 0x21b
2482:   f895 31c8   ldrb.w  r3, [r5, #456]  @ 0x1c8
2486:   af02        add r7, sp, #8
2488:   f013 0307   ands.w  r3, r3, #7
248c:   bf0e        itee    eq
248e:   4613        moveq   r3, r2
2490:   f103 33ff   addne.w r3, r3, #4294967295
2494:   b2db        uxtbne  r3, r3
2496:   aa04        add r2, sp, #16
2498:   eb02 0243   add.w   r2, r2, r3, lsl #1
249c:   441c        add r4, r3
249e:   f802 8c03   strb.w  r8, [r2, #-3]
24a2:   f103 0210   add.w   r2, r3, #16
24a6:   f894 321b   ldrb.w  r3, [r4, #539]  @ 0x21b
24aa:   eb0d 0102   add.w   r1, sp, r2
24ae:   f043 0302   orr.w   r3, r3, #2
24b2:   f884 321b   strb.w  r3, [r4, #539]  @ 0x21b
24b6:   2400        movs    r4, #0
24b8:   4626        mov r6, r4
24ba:   f811 2c08   ldrb.w  r2, [r1, #-8]
24be:   ad03        add r5, sp, #12
24c0:   4442        add r2, r8
24c2:   f801 2c08   strb.w  r2, [r1, #-8]
24c6:   f817 3b01   ldrb.w  r3, [r7], #1
24ca:   2b02        cmp r3, #2
24cc:   d926        bls.n   251c <func+0x12c>
24ce:   782b        ldrb    r3, [r5, #0]
24d0:   b303        cbz r3, 2514 <func+0x124>
24d2:   2100        movs    r1, #0
24d4:   4620        mov r0, r4
24d6:   f025 fadd   bl  27a94 <func2>
24da:   786b        ldrb    r3, [r5, #1]
24dc:   1c60        adds    r0, r4, #1
24de:   b2c0        uxtb    r0, r0
24e0:   b1d3        cbz r3, 2518 <func+0x128>
24e2:   2101        movs    r1, #1
24e4:   f025 fad6   bl  27a94 <func2>
24e8:   2301        movs    r3, #1
24ea:   9300        str r3, [sp, #0]
24ec:   2300        movs    r3, #0
24ee:   2103        movs    r1, #3
24f0:   461a        mov r2, r3
24f2:   3402        adds    r4, #2
24f4:   b2e4        uxtb    r4, r4
24f6:   4630        mov r0, r6
24f8:   f024 fda4   bl  27044 <func4>
24fc:   1c73        adds    r3, r6, #1
24fe:   2b02        cmp r3, #2
2500:   f04f 0601   mov.w   r6, #1
2504:   f105 0502   add.w   r5, r5, #2
2508:   d1dd        bne.n   24c6 <func+0xd6>
250a:   b004        add sp, #16
250c:   e8bd 81f0   ldmia.w sp!, {r4, r5, r6, r7, r8, pc}
2510:   2000        movs    r0, #0
2512:   e78b        b.n 242c <func+0x3c>
2514:   2107        movs    r1, #7
2516:   e7dd        b.n 24d4 <func+0xe4>
2518:   2107        movs    r1, #7
251a:   e7e3        b.n 24e4 <func+0xf4>
251c:   d117        bne.n   254e <func+0x15e>
251e:   782b        ldrb    r3, [r5, #0]
2520:   b12b        cbz r3, 252e <func+0x13e>
2522:   4620        mov r0, r4
2524:   2100        movs    r1, #0
2526:   f025 fab5   bl  27a94 <func2>
252a:   3401        adds    r4, #1
252c:   b2e4        uxtb    r4, r4
252e:   786b        ldrb    r3, [r5, #1]
2530:   b12b        cbz r3, 253e <func+0x14e>
2532:   4620        mov r0, r4
2534:   2101        movs    r1, #1
2536:   f025 faad   bl  27a94 <func2>
253a:   3401        adds    r4, #1
253c:   b2e4        uxtb    r4, r4
253e:   2101        movs    r1, #1
2540:   2300        movs    r3, #0
2542:   2c02        cmp r4, #2
2544:   9100        str r1, [sp, #0]
2546:   461a        mov r2, r3
2548:   bf18        it  ne
254a:   2102        movne   r1, #2
254c:   e7d3        b.n 24f6 <func+0x106>
254e:   2300        movs    r3, #0
2550:   f8cd 8000   str.w   r8, [sp]
2554:   461a        mov r2, r3
2556:   4619        mov r1, r3
2558:   e7cd        b.n 24f6 <func+0x106>

Trace for -Os:

246161 clk ES  (000024f0:d1dd)     T thrd:    CCFAIL BNE      0x24ae
 246162 clk ES  (000024f2:b004)     T thrd:           ADD      sp,sp,#0x10
                R MSP  200046d8
 246163 clk ES  (000024f4:e8bd81f0) T thrd:           POP      {r4-r8,pc}
                LD 200046e0  00004a23 00000010 00000009 2000018c     00000000200046e0  NM NSH
                   200046d0  20001910 00000100 ........ ........     00000000200046d0  NM NSH
                R R4   00000100
                R R5   20001910
                R R6   2000018c
                R R7   00000009
                R R8   00000010
                R MSP  200046f0
                R XPSR  69000000
                BR (00004a22) T
 246173 clk ES  (00004a22:464b)     T thrd:           MOV      r3,r9
                R R3   20002350
 246174 clk ES  (00004a24:462a)     T thrd:           MOV      r2,r5
                R R2   20001910
 246175 clk ES  (00004a26:4620)     T thrd:           MOV      r0,r4
                R R0   00000100
 246176 clk ES  (00004a28:f8d67770) T thrd:           LDR      r7,[r6,#0x770]
                LD 200008f0  000039fd ........ ........ ........     00000000200008f0  NM NSH
                R R7   000039fd
 246178 clk ES  (00004a2c:2102)     T thrd:           MOVS     r1,#2
                R R1   00000002
                R XPSR  29000000
 246179 clk ES  (00004a2e:47b8)     T thrd:           BLX      r7
                R LR   00004a31
                R XPSR  29000000
                BR (000039fc) T
 246181 clk ES  (000039fc:e92d4ff0) T thrd:           PUSH     {r4-r11,lr}
                ST 200046e0  00004a31 000000b5 000000b5 20002350     00000000200046e0  NM NSH
                   200046d0  00000010 000039fd 2000018c 20001910     00000000200046d0  NM NSH
                   200046c0  00000100 ........ ........ ........     00000000200046c0  NM NSH
                R MSP  200046cc
 246191 clk ES  (00003a00:4698)     T thrd:           MOV      r8,r3
                R R8   20002350
 246192 clk ES  (00003a02:2300)     T thrd:           MOVS     r3,#0
                R R3   00000000
                R XPSR  69000000
 246193 clk ES  (00003a04:b08d)     T thrd:           SUB      sp,sp,#0x34
                R MSP  20004698
 246194 clk ES  (00003a06:4605)     T thrd:           MOV      r5,r0
                R R5   00000100
 246195 clk ES  (00003a08:4617)     T thrd:           MOV      r7,r2
                R R7   20001910
 246196 clk ES  (00003a0a:9109)     T thrd:           STR      r1,[sp,#0x24]
                ST 200046b0  00000002 ........ ........ ........     00000000200046b0  NM NSH
 246198 clk ES  (00003a0c:930b)     T thrd:           STR      r3,[sp,#0x2c]
                ST 200046c0  ........ ........ 00000000 ........     00000000200046c0  NM NSH
 246199 clk ES  (00003a0e:f88d302b) T thrd:           STRB     r3,[sp,#0x2b]
                ST 200046c0  ........ ........ ........ 00......     00000000200046c0  NM NSH
 246200 clk ES  (00003a12:2906)     T thrd:           CMP      r1,#6
                R XPSR  89000000
 246201 clk ES  (00003a14:f2008105) T thrd:    CCFAIL BHI      0x3c22
 246202 clk ES  (00003a18:e8dff011) T thrd:           TBH      [pc,r1,LSL #1]
                LD 00003a20  ........ ........ ........ ....008b     0000000000003a20  NM NSH
                BR (00003b32) T
 246207 clk ES  (00003b32:f44f73b0) T thrd:           MOV.W    r3,#0x160
                R R3   00000160
 246208 clk ES  (00003b36:f04f0b0f) T thrd:           MOV.W    r11,#0xf
                R R11  0000000f
 246209 clk ES  (00003b3a:f04f0a0c) T thrd:           MOV.W    r10,#0xc
                R R10  0000000c
 246210 clk ES  (00003b3e:fb032300) T thrd:           MLA      r3,r3,r0,r2
                R R3   20017910
 246212 clk ES  (00003b42:f8932058) T thrd:           LDRB     r2,[r3,#0x58]
                R CFSR  00008200
                R HFSR  40000000
 246226 clk ES  EXC [3] HardFault
                ST 20004640  89000000 00003b42 00004a31 200046d8     0000000020004640  NM NSH
                   20004630  20017910 20001910 00000002 00000100     0000000020004630  NM NSH
                R MSP  20004630
                R LR   ffffffa8
                R XPSR  89000003
                R CONTROL  00000000
                R FPCCR  c0000019
                R FPCCR  00000000
                BR (0001ea00) T

R4 is 0x100; it is moved to R0, and the MLA that derives the next pointer produces a wrong address.

Working case for -O2:

     239512 clk ES  (000024e0:b005)     T thrd:           ADD      sp,sp,#0x14
                R MSP  200046d4
 239513 clk ES  (000024e2:e8bd83f0) T thrd:           POP      {r4-r9,pc}
                LD 200046e0  00004a5f 20002350 00000010 00000009     00000000200046e0  NM NSH
                   200046d0  2000018c 20001910 00000000 ........     00000000200046d0  NM NSH
                R R4   00000000
                R R5   20001910
                R R6   2000018c
                R R7   00000009
                R R8   00000010
                R R9   20002350
                R MSP  200046f0
                R XPSR  69000000
                BR (00004a5e) T
 239524 clk ES  (00004a5e:464b)     T thrd:           MOV      r3,r9
                R R3   20002350
 239525 clk ES  (00004a60:462a)     T thrd:           MOV      r2,r5
                R R2   20001910
 239526 clk ES  (00004a62:4620)     T thrd:           MOV      r0,r4
                R R0   00000000
 239527 clk ES  (00004a64:f8d67770) T thrd:           LDR      r7,[r6,#0x770]
                LD 200008f0  00003a39 ........ ........ ........     00000000200008f0  NM NSH
                R R7   00003a39
 239529 clk ES  (00004a68:2102)     T thrd:           MOVS     r1,#2
                R R1   00000002
                R XPSR  29000000
 239530 clk ES  (00004a6a:47b8)     T thrd:           BLX      r7
                R LR   00004a6d
                R XPSR  29000000
                BR (00003a38) T
 239532 clk ES  (00003a38:e92d4ff0) T thrd:           PUSH     {r4-r11,lr}
                ST 200046e0  00004a6d 000001b5 000001b5 20002350     00000000200046e0  NM NSH
                   200046d0  00000010 00003a39 2000018c 20001910     00000000200046d0  NM NSH
                   200046c0  00000000 ........ ........ ........     00000000200046c0  NM NSH
                R MSP  200046cc
 239542 clk ES  (00003a3c:4698)     T thrd:           MOV      r8,r3
                R R8   20002350
 239543 clk ES  (00003a3e:2300)     T thrd:           MOVS     r3,#0
                R R3   00000000
                R XPSR  69000000
 239544 clk ES  (00003a40:b08d)     T thrd:           SUB      sp,sp,#0x34
                R MSP  20004698
 239545 clk ES  (00003a42:4605)     T thrd:           MOV      r5,r0
                R R5   00000000
 239546 clk ES  (00003a44:4617)     T thrd:           MOV      r7,r2
                R R7   20001910
 239547 clk ES  (00003a46:9109)     T thrd:           STR      r1,[sp,#0x24]
                ST 200046b0  00000002 ........ ........ ........     00000000200046b0  NM NSH
 239549 clk ES  (00003a48:930b)     T thrd:           STR      r3,[sp,#0x2c]
                ST 200046c0  ........ ........ 00000000 ........     00000000200046c0  NM NSH
 239550 clk ES  (00003a4a:f88d302b) T thrd:           STRB     r3,[sp,#0x2b]
                ST 200046c0  ........ ........ ........ 00......     00000000200046c0  NM NSH
 239551 clk ES  (00003a4e:2906)     T thrd:           CMP      r1,#6
                R XPSR  89000000
 239552 clk ES  (00003a50:f2008105) T thrd:    CCFAIL BHI      0x3c5e
 239553 clk ES  (00003a54:e8dff011) T thrd:           TBH      [pc,r1,LSL #1]
                LD 00003a50  ....008b ........ ........ ........     0000000000003a50  NM NSH
                BR (00003b6e) T
 239558 clk ES  (00003b6e:f44f73b0) T thrd:           MOV.W    r3,#0x160
                R R3   00000160
 239559 clk ES  (00003b72:f04f0b0f) T thrd:           MOV.W    r11,#0xf
                R R11  0000000f
 239560 clk ES  (00003b76:f04f0a0c) T thrd:           MOV.W    r10,#0xc
                R R10  0000000c
 239561 clk ES  (00003b7a:fb032300) T thrd:           MLA      r3,r3,r0,r2
                R R3   20001910
 239563 clk ES  (00003b7e:f8932058) T thrd:           LDRB     r2,[r3,#0x58]
                LD 20001960  ........ ......86 ........ ........     0000000020001960  NM NSH
                R R2   00000086
  • Please post a minimal reproducible example. Without seeing some code, this will be kind of hard to debug. Commented Aug 5 at 9:22
  • Do you have a minimal reproducible example of C source that compiles to asm like this? Looks correct to me, though; it's maybe pushing 4 extra regs (R0-R3) as a way to reserve 16 extra bytes of stack space without a sub-immediate. But in the epilogue it doesn't want to overwrite the R0 return value, so it does add sp, #16 and only pops the call-preserved registers. Extra "dummy" pushes aren't fast, especially on low-end cores, so I'm a bit surprised if that's what's going on at -Os, not just -Oz, but anyway I don't see a correctness problem in the trimmed-down code block you're showing. Commented Aug 5 at 9:51
  • When it wants that functions argument to be passed to another function - Huh? The ldmia instruction you're showing pops into PC, so it's a return from the current function. It's not part of the setup for another function call. I haven't looked at the full asm in any detail, since there's no C source to match it up against. Also, R0 On Push is 20001910 and on pop is 00000100, why? - in the code block of output you're showing, the first popped register is R4. R0 isn't popped, so your output doesn't show its value. (If I'm reading it right, R4=0 on push, 0x100 on pop?) Commented Aug 5 at 9:58
  • Added a trace for the -Os build. R4 is incorrect; it should be 0 for the pointer address computation to be right. This doesn't happen at -O2. " 246163 clk ES (000024f4:e8bd81f0) T thrd: POP {r4-r8,pc} LD 200046e0 00004a23 00000010 00000009 2000018c 00000000200046e0 NM NSH 200046d0 20001910 00000100 ........ ........ 00000000200046d0 NM NSH R R4 00000100" Commented Aug 5 at 10:06
  • Probably some code in your function is writing past the end of an array, over the saved R4. At -O2, the stack layout is different so it's probably only overwriting padding (note sub sp, #20 instead of the 16 bytes from four dummy pushes). Have you tried compiling with -fsanitize=undefined or other bug-detection tools? Commented Aug 5 at 10:09
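If running a sanitizer on the target is not practical, one cheap way to test the overwrite theory is to route the suspicious indexed stores through a checked helper. The sketch below is only an illustration: `checked_store` and `overflow_count` are hypothetical names, not anything from the question's codebase.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter: number of rejected out-of-range stores. */
unsigned overflow_count;

/* Store value at buf[idx] only if idx lies inside the array; otherwise
 * record the attempt instead of corrupting whatever follows buf. */
int checked_store(uint8_t *buf, size_t len, size_t idx, uint8_t value)
{
    if (idx >= len) {
        overflow_count++;
        return -1;
    }
    buf[idx] = value;
    return 0;
}
```

A non-zero `overflow_count` after the failing call path runs (or a breakpoint on the `return -1` branch) would point at the store that walks off the end of a local array.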

1 Answer

It is tempting to focus on the apparent mismatch between the values pushed and popped in the disassembly, but the output you show is actually a valid prologue/epilogue pattern for the Arm A32/Thumb-2 ABI when GCC is optimizing for code size. The Procedure Call Standard for the Arm Architecture divides the general-purpose registers into two groups: r0-r3 are used to pass parameters and return values and may be freely clobbered by callees, whereas r4-r11 must be saved and restored by any function that uses them. In your -O2 build the compiler creates a conventional stack frame: it saves r4-r9 and the link register, then subtracts a constant from the stack pointer (sub sp, #20) to make room for locals.

In the -Os build it uses a different pattern where it pushes r0-r3 together with the callee-saved registers. As Raymond Chen explained, this "dummy" push of up to four argument registers is an optimization that implicitly reserves 16 bytes of local stack space.

The compiler can then recover that space with a single add sp, #16 in the epilogue and pop only the preserved registers.

This explains the behaviour you see. In the prologue the function executes stmdb sp!, {r0,r1,r2,r3,r4,r5,r6,r7,r8,lr}. The incoming arguments in r1 and r0 are then copied into r4 and r5.

Towards the end of the function the compiler needs to free the 16 bytes it reserved with the dummy push. It cannot use pop {r0–r3,…} because r0 is used to return the function's result and r1 may hold a second result; instead it performs add sp, #16 to discard the reserved space and pops only the callee-saved registers ({r4–r8,pc}), which includes the saved link register.

This pattern is exactly what the ABI prescribes: the callee must restore the original values of r4-r8 and return via pc. It does not restore the original values of r0-r3 because those registers are caller-saved; any values the function copied into r4 or r5 for its own use are discarded when the caller's values are restored on return.
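To look at the dummy-push pattern on a small case, you can compile something shaped like the sketch below with arm-none-eabi-gcc -Os -mcpu=cortex-m33 -S and inspect the prologue. The function name and body are hypothetical stand-ins with a few bytes of local arrays, similar in size to the ones your listing clears with memset; whether GCC actually picks the extra push for it depends on the compiler version and flags, so treat it as an experiment rather than a guaranteed reproduction.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the question's func: two small local byte
 * arrays (2 + 4 bytes), which fit in the 16 bytes a push of r0-r3 reserves. */
uint8_t count_flags(const uint8_t *in, size_t n)
{
    uint8_t counts[2];
    uint8_t flags[2][2];

    memset(flags, 0, sizeof flags);
    memset(counts, 0, sizeof counts);

    for (size_t i = 0; i < n; i++) {
        uint8_t row = in[i] & 1u;   /* always 0 or 1, so it stays in bounds */
        flags[row][i & 1u] = 1;
        counts[row]++;
    }
    return (uint8_t)(counts[0] + counts[1] + flags[0][0] + flags[1][1]);
}
```

If the prologue does push r0-r3, the matching epilogue should be an add sp, #16 followed by a pop of only the callee-saved registers and pc, as in your listing.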

The hard-fault you are seeing comes from interpreting the restored value of r4 as if it still contained the argument. In your case, r4 is saved on entry, overwritten with the argument for internal use, and restored on return.

When the epilogue pops r4 from the stack, it restores the caller's original r4, not the argument. Reading r4 after the epilogue therefore yields whatever the caller had in that register, and using it as a pointer causes a crash.

With -O2 the compiler reserves stack space with sub sp,#n and keeps the argument in r5/r4 until a later stage, so your code happens to work. With -Os the optimization exposes the bug in your assumption about register lifetimes.

If you cannot change the code to avoid using callee-saved registers for argument propagation, you can compile the affected function with a different optimization level, for example with __attribute__((optimize("O2"))), so that GCC does not use the dummy-push pattern; you mentioned that this fixes the fault. You can also upgrade to a newer release of the Arm GNU toolchain; the 14.3 release contains over 200 bug fixes compared with 14.2, and later versions (GCC 15) include additional improvements in prologue generation. It can also help to disable certain size-driven optimizations by adding -mno-save-restore or -fno-shrink-wrap to the compiler flags, which forces GCC to use the standard prologue/epilogue sequence.
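A per-function override could look like the sketch below. The macro name OPT_O2 and the function sum_bytes are hypothetical; only the optimize attribute itself is a GCC feature. The guard is there because Clang does not implement this attribute.

```c
#include <stdint.h>

/* GCC-specific: request -O2 for one function while the rest of the
 * translation unit keeps the global -Os. Expands to nothing elsewhere. */
#if defined(__GNUC__) && !defined(__clang__)
#define OPT_O2 __attribute__((optimize("O2")))
#else
#define OPT_O2
#endif

/* Hypothetical function standing in for one of the affected routines. */
OPT_O2 uint32_t sum_bytes(const uint8_t *p, uint32_t n)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < n; i++)
        total += p[i];
    return total;
}
```

Note that the GCC manual describes the optimize attribute as intended for debugging rather than production code, and that changing the optimization level only changes the stack layout; it does not remove an out-of-bounds write if one exists.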

The safest long-term solution is to follow the ABI strictly: treat r0-r3 as volatile across calls, use r4-r11 only for values that must survive nested calls, and store any persistent arguments in memory rather than relying on the register contents.

3 Comments

-mno-save-restore is for RISC-V GCC only. So is the conclusion to manually switch to -O2 to avoid the stack spill? -fno-shrink-wrap doesn't remove the stack spill (the r0-r2 push).
@Hari: You can't stop GCC from saving/restoring call-preserved registers, or at least you wouldn't want to. Instead, fix your buggy code. (If the caller is hand-written in asm and making wrong assumptions about R4 being updated like this answer is suggesting... don't do that. If not, then like I already commented, the saved R4 is probably being overwritten, perhaps by one of the memset calls; buffer overflows are very easy bugs to have.)
@Peter: I rechecked the code. It was caused by a test environment issue: the code does overwrite the stack. I fixed it and it now works with -Os as the global flag. This bug is not exposed at -O2, however.
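Why the overwrite stays hidden at -O2 can be read off the two listings. In the -Os frame the pushed r0-r3 slots occupy sp+0 to sp+15, so the saved r4 sits at sp+16. In the -O2 frame sub sp, #20 puts the saved r4 at sp+20. Both builds contain a byte store to sp + 13 + 2*idx (strb.w r8, [r2, #-3] with r2 = sp + 16 + 2*idx at -Os, strb.w r9, [r2, #13] with r2 = sp + 2*idx at -O2). The sketch below simulates that store on a plain byte array; the function name is hypothetical, and idx = 2 is an assumption inferred from the 0x100 seen in the trace, not a value shown in the question.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_BYTES 48u

/* Apply the listing's byte store at sp + 13 + 2*idx to a simulated,
 * zero-filled frame, then return the little-endian word found at
 * saved_r4_off (16 for the -Os layout, 20 for the -O2 layout). */
uint32_t saved_r4_after_store(uint32_t saved_r4_off, uint32_t idx)
{
    uint8_t frame[FRAME_BYTES];
    memset(frame, 0, sizeof frame);      /* caller's r4 was 0 in the trace */

    uint32_t off = 13u + 2u * idx;
    if (off < FRAME_BYTES)
        frame[off] = 1u;                 /* the strb of the constant 1 */

    if (saved_r4_off + 4u > FRAME_BYTES)
        return 0u;
    return (uint32_t)frame[saved_r4_off]
         | ((uint32_t)frame[saved_r4_off + 1u] << 8)
         | ((uint32_t)frame[saved_r4_off + 2u] << 16)
         | ((uint32_t)frame[saved_r4_off + 3u] << 24);
}
```

With these assumptions, saved_r4_after_store(16, 2) yields 0x100, the value the -Os trace shows in R4 after the pop, while saved_r4_after_store(20, 2) yields 0, as in the -O2 trace: the same stray byte lands in padding instead of in the saved register.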
