Strange behaviour when linking binary blob with GCC

Question

I'm linking binary data in my C program for ARM Cortex-M with GCC, like this:

arm-none-eabi-ld.exe -r -b binary -o html.o index.html

To work with the data I have these external variables:

extern const unsigned char _binary_index_html_start;
extern const unsigned char _binary_index_html_end;
extern const uint32_t _binary_index_html_size;

static const char* html = &_binary_index_html_start;
static const size_t html_len = &_binary_index_html_size;

What I don't understand is why I need to take the address of the _binary_index_html_size variable to get the size value.

That would mean that the memory address (pointer) of the _binary_index_html_size variable represents the size of the blob in bytes. When I debug this it seems to be correct, but it strikes me as a very strange way to do it.

Edit:
I guess the reason for this may be: because the size of the blob can never be bigger than the native address range (in my case 2^32), instead of wasting space by storing the size, the toolchain just creates a symbol whose memory address represents the size of the blob. So the value stored at that address is arbitrary and depends on other code (I tested this). This seems clever, because the size does not occupy any space and the address is resolved at link time. So if one does not need the size, no space is wasted.
I think I will instead use (&_binary_index_html_end) - (&_binary_index_html_start); this seems better and is supported by all compilers.

static const size_t html_len = &_binary_index_html_size; looks like a bug to me. Who says you need it?
– n. m. could be an AI, Commented Jul 11, 2016 at 9:54
The variable _binary_index_html_size contains some random data, can change on recompile. But the address of this variable is exactly the size of the blob.
– Jan Hieber, Commented Jul 11, 2016 at 10:36
This is a clever method to save space but it is not compatible with standard C. Perhaps if the blob was partially written in assembly it could employ this method.
– n. m. could be an AI, Commented Jul 11, 2016 at 10:43
@n.m. I think this is correct, thanks. If you make it an answer I'll accept it.
– Jan Hieber, Commented Jul 11, 2016 at 10:59

mw215 · Accepted Answer · 2016-07-12 10:37:52Z

All of the symbols you're dealing with are linker-defined symbols, and they are accessed exactly the way you did, just like symbols defined in a linker script. The explanation for this is given very clearly in the ld documentation.

When a symbol is declared in a high level language such as C, two things happen. The first is that the compiler reserves enough space in the program's memory to hold the value of the symbol. The second is that the compiler creates an entry in the program's symbol table which holds the symbol's address. ie the symbol table contains the address of the block of memory holding the symbol's value.

And then, a little later in the document, we can find what follows.

Linker scripts symbol declarations, by contrast, create an entry in the symbol table but do not assign any memory to them. Thus they are an address without a value.

This means that the address of a linker-defined symbol is its actual value, and that is why you have to take the symbol's address in order to read the value associated with it.
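As a rough illustration of the point above, here is a minimal sketch in C. The helper names (blob_length, symbol_value, html_length, html_length_from_size_symbol) and the HAVE_EMBEDDED_BLOB macro are hypothetical and only serve the example; the three _binary_index_html_* names are the ones from the question. Declaring the symbols as incomplete arrays is one common convention, assumed here, that avoids writing & at every use and makes it harder to read the meaningless contents of the size symbol by accident.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length of a blob computed from its two boundary addresses.
 * Both pointers must refer to the same blob. */
static inline size_t blob_length(const unsigned char *start,
                                 const unsigned char *end)
{
    assert(end >= start);
    return (size_t)(end - start);
}

/* The "value" of a linker-defined symbol is its address, so reading it
 * means converting that address to an integer. Nothing is dereferenced. */
static inline size_t symbol_value(const void *symbol_address)
{
    return (size_t)(uintptr_t)symbol_address;
}

/* HAVE_EMBEDDED_BLOB is a hypothetical macro for this sketch: define it
 * only when html.o (produced by ld -r -b binary) is actually linked in. */
#ifdef HAVE_EMBEDDED_BLOB
extern const unsigned char _binary_index_html_start[];
extern const unsigned char _binary_index_html_end[];
extern const unsigned char _binary_index_html_size[];

/* Size from the start/end pair, as proposed in the question's edit. */
static inline size_t html_length(void)
{
    return blob_length(_binary_index_html_start, _binary_index_html_end);
}

/* Size from the size symbol: the address itself is the byte count. */
static inline size_t html_length_from_size_symbol(void)
{
    return symbol_value(_binary_index_html_size);
}
#endif
```

On a bare-metal target such as the Cortex-M in the question, both functions would be expected to return the same number. The start/end variant only relies on ordinary pointer subtraction, which is probably why it tends to be the less surprising choice.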
