I'm linking binary data in my C program for ARM Cortex-M with GCC, like this:

arm-none-eabi-ld.exe -r -b binary -o html.o index.html

To work with the data I have these external variables:

extern const unsigned char _binary_index_html_start;
extern const unsigned char _binary_index_html_end;
extern const uint32_t _binary_index_html_size;

static const char* html = &_binary_index_html_start;
static const size_t html_len = &_binary_index_html_size;

What I don't understand is why I need to take the address of the _binary_index_html_size variable to get the size value.

That would mean that the memory address (pointer) of the _binary_index_html_size variable is itself the size of the blob in bytes. When I debug this it appears to be correct, but it seems like a very strange way to pass a size around.

Edit:
I guess the reason may be this: the size of the blob can never exceed the address space (2^32 bytes in my case), so instead of spending memory on storing the size, the toolchain just creates a symbol whose address is the size of the blob. Whatever is stored at that address is therefore meaningless and depends on other code (I tested this). This seems clever, because the size occupies no space and the address is resolved at link time, so if one does not need the size, nothing is wasted.
I think I will use (&_binary_index_html_end) - (&_binary_index_html_start) instead; this seems cleaner and does not depend on reading a size out of an address.
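
For illustration, the end-minus-start approach could look roughly like the sketch below. The names USE_LINKED_BLOB, stand_in_blob, blob_length and html_length are hypothetical and not part of the toolchain; the stand-in array is there only so the sketch builds without html.o. It assumes the _end symbol marks the address just past the last byte of the blob.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_LINKED_BLOB
/* Real symbols from html.o; declared as arrays so no & is needed. */
extern const unsigned char _binary_index_html_start[];
extern const unsigned char _binary_index_html_end[];
#define BLOB_START _binary_index_html_start
#define BLOB_END   _binary_index_html_end
#else
/* Hypothetical stand-in blob so the sketch builds without html.o. */
static const unsigned char stand_in_blob[] = {
    '<', 'p', '>', 'h', 'i', '<', '/', 'p', '>'
};
#define BLOB_START stand_in_blob
#define BLOB_END   (stand_in_blob + sizeof stand_in_blob)
#endif

/* Length in bytes of the half-open range [start, end). */
size_t blob_length(const unsigned char *start, const unsigned char *end)
{
    assert(end >= start);
    return (size_t)(end - start);
}

/* Length of the embedded page (or of the stand-in blob). */
size_t html_length(void)
{
    return blob_length(BLOB_START, BLOB_END);
}
```

Building with -DUSE_LINKED_BLOB and linking html.o would make html_length() report the size of index.html instead of the stand-in.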

  • static const size_t html_len = &_binary_index_html_size; looks like a bug to me. Who says you need it? Commented Jul 11, 2016 at 9:54
  • The variable _binary_index_html_size contains some random data, can change on recompile. But the address of this variable is exactly the size of the blob. Commented Jul 11, 2016 at 10:36
  • This is a clever method to save space but it is not compatible with standard C. Perhaps if the blob was partially written in assembly it could employ this method. Commented Jul 11, 2016 at 10:43
  • @n.m. I think this is correct, thanks. If you make it an answer I'll accept it. Commented Jul 11, 2016 at 10:59

1 Answer

All of the symbols you're dealing with are defined by the linker rather than by the compiler, and they are meant to be accessed exactly the way you did. The ld documentation explains this clearly:

When a symbol is declared in a high level language such as C, two things happen. The first is that the compiler reserves enough space in the program's memory to hold the value of the symbol. The second is that the compiler creates an entry in the program's symbol table which holds the symbol's address. ie the symbol table contains the address of the block of memory holding the symbol's value.

And then, a little later in the document, we can find what follows.

Linker scripts symbol declarations, by contrast, create an entry in the symbol table but do not assign any memory to them. Thus they are an address without a value.

This means that the address of a linker-defined symbol is its actual value, which is why you have to take the address in order to read the value associated with the symbol.
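
As a rough sketch of what that looks like in C, the conversion can be spelled out with an explicit cast instead of relying on an implicit pointer-to-integer conversion. The helper name linker_symbol_value, the wrapper html_size and the USE_LINKED_BLOB switch are hypothetical, made up for this example; only _binary_index_html_size comes from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the value carried by a linker-defined symbol, i.e. its address. */
size_t linker_symbol_value(const void *symbol_address)
{
    uintptr_t raw = (uintptr_t)symbol_address;

    /* Guard for targets where size_t is narrower than a pointer. */
    assert(raw <= SIZE_MAX);
    return (size_t)raw;
}

#ifdef USE_LINKED_BLOB
/* Declared as an array so the name decays to the symbol's address;
 * the bytes "stored" there are never read. */
extern const unsigned char _binary_index_html_size[];

size_t html_size(void)
{
    return linker_symbol_value(_binary_index_html_size);
}
#endif
```

The part under USE_LINKED_BLOB only links when html.o is present; the helper on its own just turns an address into an integer, which is all that reading such a symbol amounts to.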
