I am trying to build a mapping between the dynamic symbols in ELF files (from glibc) and the actual kernel syscalls they invoke.
My environment is x86_64 Ubuntu 22.04.
What I've Tried
Parsing man 2 pages: My first attempt was to parse the man 2 text. This was effective for extracting argument types and names, but it failed to reliably map the wrapper function (e.g., open) to the actual kernel syscall (e.g., openat) due to the limitations of the manuals.
AI recommendation (AST): I was advised by an AI that using an Abstract Syntax Tree (AST), for example with libclang, would be a viable approach. I'm a computer science student, but my university doesn't offer a compiler course, so I lack a deep understanding of ASTs and am seeking expert advice here.
My Core Problem & Example
My main challenge is that glibc is extremely complex, full of preprocessor directives and symbol aliases.
For example, if I compile a C program that calls open(), readelf shows the dynamic symbol [email protected].
I've traced this to the glibc source file open64.c. On my x86_64 system, the __OFF_T_MATCHES_OFF64_T preprocessor macro is defined, which leads to this block:
https://git.launchpad.net/ubuntu/+source/glibc/tree/sysdeps/unix/sysv/linux/open64.c?h=ubuntu/jammy
#ifdef __OFF_T_MATCHES_OFF64_T
strong_alias (__libc_open64, __libc_open)
strong_alias (__libc_open64, __open)
libc_hidden_weak (__open)
weak_alias (__libc_open64, open)
#endif
This weak_alias makes open a weak alias of __libc_open64. The __libc_open64 function then internally calls SYSCALL_CANCEL (openat, ...). This macro (which eventually expands to inline assembly) is the lowest-level call I'm trying to find.
My goal is to find this entire chain for all syscalls: [email protected] → weak_alias (__libc_open64, open) → __libc_open64 → SYSCALL_CANCEL (openat, ...)
...and ultimately build the mapping: open → openat, openat → openat.
(key:value)
My Questions
Is it technically feasible to use an AST-based approach (like libclang) to reliably parse the entire glibc source and resolve all these preprocessor directives and aliases (strong_alias, weak_alias)?
My ultimate goal is to create an N:1 mapping from the various user-space aliases to the kernel syscalls they call (those found near SYS_ify(name)). Does a public mapping of this information already exist? I would be overjoyed if I could just use an existing resource.
"My goal is to find this entire chain for all syscalls"
Why? What for?
"Is it technically feasible to use an AST-based approach"
You will most probably have to write your own "AST-ish" parser on top of libclang to handle all cases.
"Does a public mapping of this information already exist?"
Not that I am aware of.
exec kernel syscall, and all the exec* wrappers call it after massaging the arguments.