There may be a very simple solution to this problem, but it has been bothering me for a while, so I have to ask.

In our embedded projects, it is common to have simple get/set functions for many variables in separate C files. Those functions are then called from many other C files. When I look at the assembly listing, the function calls are never replaced with move instructions. A faster way would be to declare the monitored variables as global variables and avoid the unnecessary function calls.

Let's say you have a file.c with variables that need to be monitored in another C file, main.c: for example, debugging variables, hardware registers, ADC values, etc. Is there a compiler optimization that replaces simple get/set functions with assembly move instructions, thus avoiding the overhead of function calls?
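For comparison, the global-variable alternative mentioned above might look like the following minimal sketch. The names monitored_value, sum_samples, and demo_direct_access are hypothetical and not part of the original project. The three files are shown in one listing only to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* file.h (sketch): expose the variable itself instead of accessors.
 * monitored_value is a hypothetical name used for illustration. */
extern volatile int32_t monitored_value;

/* file.c (sketch): the single definition of the variable. */
volatile int32_t monitored_value = 0;

/* main.c (sketch): the variable is read directly, so there is no
 * function call to optimize away. volatile forces one load per read. */
int32_t sum_samples(int count)
{
    int32_t total = 0;
    for (int i = 0; i < count; i++) {
        total += monitored_value;   /* direct memory read, no call */
    }
    return total;
}

/* Small self-check showing a direct write followed by direct reads. */
void demo_direct_access(void)
{
    monitored_value = 3;              /* direct memory write */
    assert(sum_samples(10) == 30);    /* ten reads of the same value */
}
```

The trade-off is that every file including the header can now write the variable freely, which is what the get/set functions were meant to prevent.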

file.h

#ifndef FILE_H
#define FILE_H

#include <stdint.h>

int32_t get_signal(void);
void set_signal(int32_t x);

#endif

file.c

#include "file.h"
#include <stdint.h>

static volatile int32_t *signal = SOME_HARDWARE_ADDRESS;

int32_t get_signal(void)
{
  return *signal;
}

void set_signal(int32_t x)
{
   *signal = x;
}

main.c

#include "file.h"
#include <stdio.h>

int main(int argc, char *args[])
{
   // Do something with the variable
   for (int i = 0; i < 10; i++)
   {
     printf("signal = %d\n", get_signal());
   }
   
   return 0;
}

If I compile the above code with gcc -Wall -save-temps main.c file.c -o main.exe, I get the following assembly listing for main.c. The call get_signal instruction is always there, even when compiling with the -O3 flag, which seems silly as we are only reading a memory address. Why bother calling such a simple function?

The same applies to the simple set function. It is always called, even though the function only writes to one memory location and does nothing else.

main.s

main:
    pushq   %rbp
    .seh_pushreg    %rbp
    movq    %rsp, %rbp
    .seh_setframe   %rbp, 0
    subq    $48, %rsp
    .seh_stackalloc 48
    .seh_endprologue
    movl    %ecx, 16(%rbp)
    movq    %rdx, 24(%rbp)
    call    __main
    movl    $0, -4(%rbp)
    jmp .L4
.L5:
    call    get_signal
    movl    %eax, %edx
    leaq    .LC0(%rip), %rcx
    call    printf
    addl    $1, -4(%rbp)
.L4:
    cmpl    $9, -4(%rbp)
    jle .L5
    movl    $0, %eax
    addq    $48, %rsp
    popq    %rbp
    ret

UPDATED 2023-02-13

The question was closed with several links to answers about inlining and link-time optimization. I don't think the same question has been answered before, or at least the solution is not obvious for my get function. What is there to inline if a function just returns a value and does nothing else?

Anyway, as suggested, one solution is to add the compiler flags -O2 -flto, which correctly replaces the call get_signal instruction with a move instruction. Partial output:

main:
    subq    $40, %rsp
    .seh_stackalloc 40
    .seh_endprologue
    call    __main
    movl    tmp.0(%rip), %edx
    movl    $10, %eax
    .p2align 4,,10
    .p2align 3
.L4:
    movl    signal(%rip), %ecx
    addl    %ecx, %edx
    subl    $1, %eax
    jne .L4
    leaq    .LC0(%rip), %rcx
    movl    %edx, tmp.0(%rip)
    call    printf.constprop.0
    xorl    %eax, %eax
    addq    $40, %rsp
    ret
    .seh_endproc

Thank you.

  • You must use LTO because the functions are in separate compilation units. The compiler has no way of knowing what is in other compilation units, so it cannot optimize across them. Commented Feb 13, 2023 at 1:56
  • Another way might be to remove static from signal in the .c, move the function definitions to the .h (replacing the prototypes), and add (e.g.) static inline __attribute__((always_inline)) to each function. The .h would need (e.g.) extern volatile int32_t *signal; so that each call can be inlined. Commented Feb 13, 2023 at 2:14
  • Is there a caller which, after calling get_signal(), perhaps calls main_screen(TURN_ON)? :P Sorry, yes, you either need LTO (gcc -O2 -flto when you compile and when you link), or you need to make the full definitions visible to callers at compile time, e.g. in the header with static inline or plain inline. In the latter case you need a stand-alone definition in exactly one .c in case the compiler chooses not to inline at every call site. Commented Feb 13, 2023 at 2:57
  • Also, if you care about the asm not sucking, enable optimization as well, like -Og at least. If you need it to inline even at -O0, use __attribute__((always_inline)) (as well as making the definition visible so that's possible). Commented Feb 13, 2023 at 3:02
  • What is there to inline if a function just returns a value and does nothing else? - The fact that it's that simple doesn't open up any new possibilities for getting it to inline. That's why I closed it as a duplicate. The compiler can't inline it if it can't see the definition, but if you give it a way to inline it will. Commented Feb 13, 2023 at 5:09
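
The header-based approach suggested in the comments can be sketched roughly as follows. This is a minimal illustration under assumptions, not the project's actual code: signal_reg is a hypothetical ordinary variable standing in for the hardware register (the name signal is avoided here because it would clash with the C library's signal function once it has external linkage), and write_then_read and demo_inline_accessors are hypothetical callers. The three files are shown in one listing only to keep the sketch self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* ---- file.h (sketch) ---- */
/* signal_reg is a hypothetical stand-in for the hardware register. */
extern volatile int32_t signal_reg;

/* static inline: every .c file that includes this header sees the full
 * definition, so the compiler can inline the access without LTO when
 * optimization is enabled. With GCC, __attribute__((always_inline))
 * could be added to request inlining even at -O0. */
static inline int32_t get_signal(void)
{
    return signal_reg;
}

static inline void set_signal(int32_t x)
{
    signal_reg = x;
}

/* ---- file.c (sketch) ---- */
volatile int32_t signal_reg = 0;

/* ---- any caller, e.g. main.c (sketch) ---- */
int32_t write_then_read(int32_t x)
{
    set_signal(x);        /* candidate for inlining to a store */
    return get_signal();  /* candidate for inlining to a load  */
}

/* Small self-check of the accessors. */
void demo_inline_accessors(void)
{
    assert(write_then_read(42) == 42);
}
```

With plain inline instead of static inline, one .c file would also have to provide the stand-alone definition, as the comment above notes.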
