Context

I'm trying to write a piece of code in inline assembly, which processes all elements of a "small" array (say ~10 elements) as a fully-unrolled loop. I want to avoid falling into the usual trap of declaring it as asm volatile with a generic "memory" clobber.

Right now the code in question is for the ARMv7-M architecture, targeting the Cortex-M4 core, but I often need to do something similar for AArch64.

I have already read the following Stack Overflow questions and answers:

How can I indicate that the memory pointed to by an inline ASM argument may be used?

Looping over arrays with inline assembly

Why inline assembly

I know inline assembly is not always the best solution to a problem. For the cases where I envision using it, I believe it may be the best choice:

  • It is extremely performance critical;
  • The compiler is doing an awful job of register allocation;
  • The execution time of these assembly-language routines is on the order of ~100 cycles, so the cost of saving and restoring registers to adhere to the calling convention (if written directly in assembly, in a separate .s file) represents a significant portion of the execution time;
  • The code is used in a handful of places in the rest of my project, so I'm willing to pay the cost in code size to inline it in every one of these places;
  • I don't have the resources to rewrite the functions that call into this code in assembly (if I did, I could just directly insert the assembly code and avoid paying for the cost of saving and restoring registers.) I also frankly think that, if I did, it would be bad engineering practice to do so. They're much more readable and easier to maintain as C functions.

The problem

These are the requirements I have, and so I'm looking for a way to set up my inline assembly constraints so as to meet all of them.

  • My code is very near the limit of using all 14 available registers in the Cortex-M4 (while ARMv7-M has 16 registers, the SP and PC are of course reserved). I cannot afford to reserve registers that won't be actually used.
  • My code accesses each element of the array using a base + offset addressing mode, e.g. instructions like ldr r0, [r1, #16].

What I've tried

For the sake of example, the code will be used to implement a function with the following prototype:

void f(int out[10], const int in[10]);

Currently my code is written as such:

asm volatile(
    "my assembly code block"
    : [out] "=r" (out), /* possibly other output constraints */
    : [in] "r" (in), /* possibly other input constraints */
    : "memory"
);

Thus, I'm using the dreaded asm volatile + "memory" clobber.

Following the suggestion in this answer, which is to inform gcc of the actual addresses being accessed so that the volatile qualifier and the "memory" clobber can be dropped, I've tried rewriting it as such:

asm volatile(
    "my assembly code block"
    : [out] "=r" (out), "=m" (*(int(*)[10])out) /* possibly other output constraints */
    : [in] "r" (in), "m" (*(const int(*)[10])in) /* possibly other input constraints */
);

However, gcc complains with errors such as this:

/.../file1:282:1: error: unable to find a register to spill
  282 | }
      | ^
/.../file1:282:1: error: this is the insn:
(insn 6985 6986 6983 5 (set (reg:SI 1979 [ t ])
        (reg:SI 5177 [orig:1979 t ] [1979])) "/.../file2.c":83:5 759 {*thumb2_movsi_vfp}
     (expr_list:REG_DEAD (reg:SI 5177 [orig:1979 t ] [1979])
        (nil)))

This appears to be happening while trying to inline f (written in file2.c) into another function in file1.c.

I've also seen in some cases a message such as "impossible constraint in ‘asm’", which amazingly is solved by just bringing back the volatile keyword to the asm statement, but of course this isn't ideal.

My theory is that the "m" constraints may require a register to materialize, and since I'm already working at the limit of available registers, this is the straw that broke the camel's back.

If I take out one (say the input) memory ("m") constraint and bring back the "memory" clobber, this now works.

The following, which I'm not even sure makes sense, also generates the same error:

asm volatile(
    "my assembly code block"
    : [out] "=r" (out), "=m" (*(int(*)[10])out) /* possibly other output constraints */
    : [in] "rm" (*(const int(*)[10])in) /* possibly other input constraints */
);

Something which I also tried, and which made me suspect that the above code doesn't even make sense, is:

asm volatile(
    "my assembly code block"
    : [out] "=rm" (*(int(*)[10])out) /* possibly other output constraints */
    : [in] "rm" (*(const int(*)[10])in) /* possibly other input constraints */
);

Now I get a truckload of errors like the following:

/var/folders/bg/8_8vh7ks6vq1t3mq4l3fswcc0000gn/T//ccTvGYHb.s:719: Error: ARM register expected -- `ldr.w fp,[[sp,#40],#28]'

This looks like the compiler is trying to use the address of out on the stack as the memory constraint. My problem is that I need to materialize it in a register so I can do base + offset addressing.

EDIT: I have found the "Q" constraint in the list of constraints for particular machines for ARM, which is described as such: "A memory reference where the exact address is in a single register ("m" is preferable for asm statements)". My asm statement looks like this with this constraint:

asm volatile(
    "my assembly code block"
    : [out] "=Q" (*(int(*)[10])out) /* possibly other output constraints */
    : [in] "Q" (*(const int(*)[10])in) /* possibly other input constraints */
);

This results in errors such as this:

/var/folders/bg/8_8vh7ks6vq1t3mq4l3fswcc0000gn/T//ccgymksW.s:40: Error: garbage following instruction -- `ldrd r3,r8,[[r1],#8]'

I feel like I'm getting close to my answer. The pointer I need has been materialized in a register, and apparently it's all down to essentially a formatting issue: since "Q" is still a kind of memory constraint, the register comes wrapped in brackets, i.e. [r1] rather than just r1. All I need is to strip these brackets, and I should be done.
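
The assembler errors above arise because a memory operand ("m" or "Q") is printed as a complete address expression, brackets included, so it cannot be nested inside another [..., #offset]. As a sketch of one alternative (not necessarily a fit for the register budget described earlier), each element can be passed as its own memory operand and used as the whole address, leaving the offset to the compiler. The function name load_elem4 is hypothetical, and the non-ARM branches exist only so the example builds on other hosts.

```c
/* Sketch: read element 4 of `in` through a per-element "m" operand.
   load_elem4 is a hypothetical name used only for illustration. */
int load_elem4(const int in[10])
{
    int t;
#if defined(__aarch64__)
    /* %w selects the 32-bit view of the register; %[src] is the full address. */
    __asm__("ldr %w[t], %[src]" : [t] "=r"(t) : [src] "m"(in[4]));
#elif defined(__arm__)
    /* %[src] is printed as a whole address expression, so no extra brackets. */
    __asm__("ldr %[t], %[src]" : [t] "=r"(t) : [src] "m"(in[4]));
#elif defined(__x86_64__) || defined(__i386__)
    /* AT&T syntax: source memory operand first, destination register second. */
    __asm__("movl %[src], %[t]" : [t] "=r"(t) : [src] "m"(in[4]));
#else
    t = in[4]; /* plain C fallback for other targets */
#endif
    return t;
}
```

Because the statement only declares that it reads in[4], it needs neither volatile nor a "memory" clobber. The exact address text the compiler substitutes for the operand is its own choice.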

The question

It appears that what I really need is some kind of constraint that materializes a pointer in a register while also informing gcc that I'm using the specific region of memory pointed to by that array, so that I don't need to use asm volatile + the "memory" clobber. In other words, I'm hoping there's some constraint X which I can substitute into the code below that works for my case:

asm volatile(
    "my assembly code block"
    : [out] "=X" (*(int(*)[10])out) /* possibly other output constraints */
    : [in] "X" (*(const int(*)[10])in) /* possibly other input constraints */
);

What is the proper constraint to substitute for "X" here?

I won't rule out the possibility that this is an XY problem after all -- maybe what I need is not a magical constraint, but a completely different style of writing the constraints so that I don't run into this problem. Either way, I am open to any suggestions.

  • You can use two separate constraints, mem and pointer in reg and (with optimization enabled) the compiler will realize that it only needs the pointer in one register, at least if no register output is early-clobber. See How can I indicate that the memory *pointed* to by an inline ASM argument may be used? for an example of how to describe x86 repne scasb (a loop over an array) to the compiler that way. Semi-related: Looping over arrays with inline assembly Commented Aug 18, 2024 at 3:20
  • I can't reproduce your GCC error with the asm statement inside a simple function, or calling that function twice from another function that can inline it. I used GCC 14.1 on godbolt.org/z/EoE5jjW5z . You appear to be doing it the right way, so you may be hitting a compiler bug. Create a minimal reproducible example. Then edit your question with it and/or report it on gcc.gnu.org/bugzilla Commented Aug 18, 2024 at 3:29
  • If the compiler is running out of registers, "Q" might help it realize that it can use the same register for the "r"(pointer) and "Q"( *(array type)pointer) constraints, if it wasn't seeing that with "m". There might be a modifier to print just the bare register name instead of the [reg] addressing mode, but gcc.gnu.org/onlinedocs/gcc/… doesn't show one and I wouldn't be surprised if there isn't one. Commented Aug 18, 2024 at 3:33
  • @PeterCordes so I guess what you're saying is that it should work, without allocating an extra register, and that if I'm getting errors, maybe it's actually a compiler bug? Anyway I will try to make an MRE of the bug. Commented Aug 18, 2024 at 3:53
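
The two-operand pattern described in the first comment can be sketched as below: the pointer goes in an "r" operand for base + offset addressing, and an unnamed dummy "m" operand of array type tells the compiler which bytes are read or written, so neither volatile nor a "memory" clobber is needed. This is a reduced illustration with two elements instead of ten. swap2 and SWAP2_ASM are hypothetical names, and the x86 and plain-C branches are there only so the sketch builds off-target. Whether it avoids the register-allocation error from the question is untested.

```c
/* Per-architecture instruction text; the constraints below are shared. */
#if defined(__x86_64__) || defined(__i386__)
#define SWAP2_ASM "movl 4(%[in]), %[t0]\n\t" "movl (%[in]), %[t1]\n\t" \
                  "movl %[t0], (%[out])\n\t" "movl %[t1], 4(%[out])"
#elif defined(__aarch64__)
#define SWAP2_ASM "ldr %w[t0], [%[in], #4]\n\t" "ldr %w[t1], [%[in]]\n\t" \
                  "str %w[t0], [%[out]]\n\t"    "str %w[t1], [%[out], #4]"
#elif defined(__arm__)
#define SWAP2_ASM "ldr %[t0], [%[in], #4]\n\t" "ldr %[t1], [%[in]]\n\t" \
                  "str %[t0], [%[out]]\n\t"    "str %[t1], [%[out], #4]"
#endif

/* Hypothetical example: out[0] = in[1], out[1] = in[0]. */
void swap2(int out[2], const int in[2])
{
    int t0, t1; /* scratch values; the compiler picks the registers */
#ifdef SWAP2_ASM
    __asm__(SWAP2_ASM
            : [t0] "=&r"(t0), [t1] "=&r"(t1), /* early-clobber: written before inputs are done */
              "=m"(*(int (*)[2])out)          /* dummy operand: all of out[] is written */
            : [out] "r"(out), [in] "r"(in),   /* pointers are only read, so plain inputs */
              "m"(*(const int (*)[2])in));    /* dummy operand: all of in[] is read */
#else
    t0 = in[1]; /* plain C fallback for other targets */
    t1 = in[0];
    out[0] = t0;
    out[1] = t1;
#endif
}
```

The "=m" dummy is appropriate here only because every element of out is written. If the asm left some elements untouched, "+m" would be the safer choice so the compiler does not treat their old contents as dead.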
