100

I'm looking for a way to easily embed any external binary data in a C/C++ application compiled by GCC.

A good example of what I'd like to do is handling shader code - I can just keep it in source files like const char* shader = "source here"; but that's extremely impractical.

I'd like the compiler to do it for me: upon compilation (linking stage), read file "foo.bar" and link its content to my program, so that I'd be able to access the contents as binary data from the code.

Could be useful for small applications which I'd like to distribute as a single .exe file.

Does GCC support something like this?


7 Answers

99

There are a couple possibilities:


Update: Here's a more complete example of how to use data bound into the executable using ld -r -b binary:

#include <stdio.h>

// a file named foo.bar with some example text is 'imported' into 
// an object file using the following command:
//
//      ld -r -b binary -o foo.bar.o foo.bar
//
// That creates an object file named "foo.bar.o" with the following 
// symbols:
//
//      _binary_foo_bar_start
//      _binary_foo_bar_end
//      _binary_foo_bar_size
//
// Note that the symbols are addresses (so for example, to get the 
// size value, you have to get the address of the _binary_foo_bar_size
// symbol).
//
// In my example, foo.bar is a simple text file, and this program will
// dump the contents of that file which has been linked in by specifying
// foo.bar.o as an object file input to the linker when the program is built

extern char _binary_foo_bar_start[];
extern char _binary_foo_bar_end[];

int main(void)
{
    // %p expects a void*, so cast the (decayed) array pointers explicitly
    printf( "address of start: %p\n", (void*)_binary_foo_bar_start);
    printf( "address of end: %p\n", (void*)_binary_foo_bar_end);

    for (char* p = _binary_foo_bar_start; p != _binary_foo_bar_end; ++p) {
        putchar( *p);
    }

    return 0;
}

Update 2 - Getting the resource size: I could not read _binary_foo_bar_size correctly. At runtime, gdb shows the right size of the text resource with display (unsigned int)&_binary_foo_bar_size, but assigning that expression to a variable always gave a wrong value. I solved the issue by subtracting the start address from the end address. With the array declarations used above, the symbol names already decay to char pointers, so no & is needed:

unsigned int iSize = (unsigned int)(_binary_foo_bar_end - _binary_foo_bar_start);

It is a workaround, but it works well and is not too ugly.
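
If the blob is to be treated as text, the same end-minus-start computation can feed a small copy routine that adds the missing terminator. The following is only a sketch: blob_to_cstring is a hypothetical helper name, not something ld or GCC provides, and it assumes the caller passes the two boundary symbols.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy the bytes in [start, end) into a freshly
 * allocated, null-terminated string. ld -b binary does not append a
 * '\0', so the length is derived from the two boundary pointers.
 * Returns NULL on allocation failure; the caller frees the result. */
char *blob_to_cstring(const char *start, const char *end)
{
    size_t size = (size_t)(end - start);
    char *copy = (char *)malloc(size + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, start, size);
    copy[size] = '\0';
    return copy;
}
```

With the linker-generated symbols from the example above, the call would be blob_to_cstring(_binary_foo_bar_start, _binary_foo_bar_end).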


21 Comments

@VJo: then treat the blob as text. You may have to do a bit of work to make sure there's a '\0' at the end of the text if you need it terminated like that. Some experimenting might be in order.
@VJo: text is binary. Everything on a computer is binary.
@MSalters re: "text is binary". Yes, but, ... in text the EOL may be treated differently on different systems. Explicitly calling it binary prevents such foibles.
@atlaste: What you describe is the distinction between writeable ("data") and executable ("code"). Read-only data needs neither method.
Can you tell ld which symbol name to generate for the data?
49

In addition to the suggestions already mentioned, on Linux you can use the hex dump tool xxd, which has an option to generate a C header file:

xxd -i mybinary > myheader.h
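
For reference, xxd -i emits an unsigned char array plus an unsigned int length, both named after the input file. The sketch below inlines what such a header might contain for a hypothetical six-byte file named mybinary holding the text "hello\n" (the contents are assumed purely for illustration), together with a small hypothetical function, write_embedded, that consumes it.

```c
#include <stddef.h>
#include <stdio.h>

/* What myheader.h might contain for a six-byte input file.
 * The byte values are illustrative, not from a real file. */
unsigned char mybinary[] = {
  0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x0a
};
unsigned int mybinary_len = 6;

/* Hypothetical consumer: write the embedded bytes to a stream and
 * return how many bytes were written. */
size_t write_embedded(FILE *out)
{
    return fwrite(mybinary, 1, mybinary_len, out);
}
```

In a real project the two definitions would come from #include "myheader.h" rather than being typed by hand.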

3 Comments

I think this solution is the best. It is also cross-platform and works with any compiler.
This is true, but it does have one drawback - the resulting header files are much larger than the original binary file. This has no impact on the final compiled result, but it can be undesirable as part of the build process.
this problem can be solved by using precompiled header.
27

For C23, there now exists the preprocessor directive #embed, which achieves exactly what you are looking for without using external tools. See 6.10.3.1 of the C23 standard (here is a link to the most recent working draft). Here's a good blog post about the history of #embed by one of the committee members behind this new feature.

Here is a snippet from the draft standard demonstrating its use:

#include <stddef.h>
void have_you_any_wool(const unsigned char*, size_t);

int main (int, char*[]) {
    static const unsigned char baa_baa[] = {
#embed "black_sheep.ico"
    };
    
    have_you_any_wool(baa_baa, sizeof(baa_baa));
    return 0;
}

An equivalent directive for C++ does not exist at this time.
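
Because compiler support is still recent, it may help to probe for the feature before relying on it. The sketch below uses __has_embed, the C23 companion to #embed, and embeds the source file itself via __FILE__ so that no separate resource file is needed. The names HAVE_SELF_EMBED, resource, resource_data and resource_size are hypothetical, and the placeholder bytes in the fallback branch are an assumption for compilers that lack the directive.

```c
#include <stddef.h>

/* __has_embed is the C23 companion to #embed. The nested #if keeps older
 * preprocessors from ever parsing the __has_embed(...) expression. */
#ifdef __has_embed
#  if __has_embed(__FILE__)
#    define HAVE_SELF_EMBED 1
#  endif
#endif

static const unsigned char resource[] = {
#ifdef HAVE_SELF_EMBED
#  embed __FILE__
#else
    /* Assumed fallback for compilers without #embed: placeholder bytes. */
    'n', 'o', ' ', 'e', 'm', 'b', 'e', 'd'
#endif
};

/* Hypothetical accessors for the embedded bytes and their count. */
const unsigned char *resource_data(void) { return resource; }
size_t resource_size(void) { return sizeof resource; }
```

In real code the __FILE__ argument would be replaced by the path of the resource, such as "black_sheep.ico" in the snippet above.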

2 Comments

Is it possible to enable in GCC or Clang yet?
@circl, yes since GCC 15 and Clang 19: en.cppreference.com/w/c/compiler_support
26

The .incbin GAS directive can be used for this task. Here is a freely licensed library that wraps it:

https://github.com/graphitemaster/incbin

To recap, the .incbin method works like this: you write a thing.s assembly file and compile it with gcc -c thing.s

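    .section .rodata
    .global thing
    .type   thing, @object
    .align  4
thing:
    .incbin "meh.bin"
    .global thing_end
thing_end:
    .global thing_size
    .type   thing_size, @object
    .align  4
thing_size:
    .int    thing_end - thing

In your C or C++ code you can reference it with the declarations below. Note that thing_end is a label marking an address, not a pointer variable, so it is declared as an array, and it must be exported with .global to be visible to the linker:

extern const char thing[];
extern const char thing_end[];
extern const int thing_size;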

Then you link the resulting .o with the rest of the compilation units. Credit is due to @John Ripley for his answer here: C/C++ with GCC: Statically add resource files to executable/library

But the above method is not as convenient as what incbin can give you. To accomplish the above with incbin you don't need to write any assembler. Just the following will do:

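#include <stdio.h>
#include "incbin.h"

// Defines gThingData, gThingEnd and gThingSize (default prefix and style)
INCBIN(thing, "meh.bin");

int main(int argc, char* argv[])
{
    // Now use thing
    printf("thing=%p\n", (const void*)gThingData);
    printf("thing len=%u\n", gThingSize);   // gThingSize is an unsigned int
    return 0;
}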

2 Comments

I like this method because it allows controlling the symbol name.
The issue with this solution, with C++ at least, is that the resulting std::span you would construct representing your embedded data cannot be constexpr since it depends on extern symbols. That said, I do use your solution extensively.
2

If I wanted to embed static data into an executable in a portable way, I would package it either into a .lib/.a file or into a header file as an array of unsigned chars. I have created a command line tool that does both, here. All you have to do is list the files and pick option -l64 to output a 64-bit library file along with a header that declares a pointer to each piece of data.

You can explore more options as well. For example, this command:

>BinPack image.png -j -hx

will output the data of image.png into a header file, as hexadecimal and lines will be justified per -j option.

const unsigned char BP_icon[] = { 
0x89,0x50,0x4e,0x47,0x0d,0x0a,0x1a,0x0a,0x00,0x00,0x00,0x0d,0x49,0x48,0x44,0x52,
0x00,0x00,0x01,0xed,0x00,0x00,0x01,0xed,0x08,0x06,0x00,0x00,0x00,0x34,0xb4,0x26,
0xfb,0x00,0x00,0x02,0xf1,0x7a,0x54,0x58,0x74,0x52,0x61,0x77,0x20,0x70,0x72,0x6f,
0x66,0x69,0x6c,0x65,0x20,0x74,0x79,0x70,0x65,0x20,0x65,0x78,0x69,0x66,0x00,0x00,
0x78,0xda,0xed,0x96,0x5d,0x92,0xe3,0x2a,0x0c,0x85,0xdf,0x59,0xc5,0x2c,0x01,0x49,
0x08,0x89,0xe5,0x60,0x7e,0xaa,0xee,0x0e,0xee,0xf2,0xef,0x01,0x3b,0x9e,0x4e,0xba,
0xbb,0x6a,0xa6,0x66,0x5e,0x6e,0x55,0x4c,0x8c,0x88,0x0c,0x07,0xd0,0x27,0x93,0x84,
0xf1,0xef,0x3f,0x33,0xfc,0xc0,0x45,0xc5,0x52,0x48,0x6a,0x9e,0x4b,0xce,0x11,0x57,
0x2a,0xa9,0x70,0x45,0xc3,0xe3,0x79,0xd5,0x5d,0x53,0x4c,0xbb,0xde,0xd7,0xe8,0x57,
0x8b,0x9e,0xfd,0xe1,0x7e,0xc0,0xb0,0x02,0x2b,0xe7,0x03,0xcf,0xa7,0xa5,0x87,0xff,
0x1a,0xf0,0xb0,0x54,0xd1,0xd2,0x0f,0x42,0xde,0xae,0x07,0xc7,0xf3,0x83,0x92,0x4e,
0xcb,0xfe,0x22,0xc4,0xa7,0x91,0xb5,0xa2,0xd5,0xee,0x97,0x50,0xb9,0x84,0x84,0xcf,
0x07,0x74,0x09,0xd4,0x73,0x5b,0x31,0x17,0xb7,0x8f,0x5b,0x38,0xc6,0x69,0xaf}
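
A header in this style can also be produced with a few lines of C. The following is a minimal sketch of such a generator, not BinPack's actual implementation; format_c_array and its parameters are hypothetical names, and it formats into a caller-supplied buffer to keep the example self-contained.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format `len` bytes from `data` as the definition of
 * a C array called `name`, writing into `out` (capacity `cap`).
 * Returns the number of characters written, or -1 if `out` is too small. */
int format_c_array(char *out, size_t cap, const char *name,
                   const unsigned char *data, size_t len)
{
    size_t used = 0;
    int n = snprintf(out, cap, "const unsigned char %s[] = {", name);
    if (n < 0 || (size_t)n >= cap) return -1;
    used = (size_t)n;
    for (size_t i = 0; i < len; ++i) {
        /* Start a new line every 16 bytes, matching the layout above. */
        const char *sep = (i % 16 == 0) ? "\n" : "";
        n = snprintf(out + used, cap - used, "%s0x%02x,", sep,
                     (unsigned)data[i]);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }
    n = snprintf(out + used, cap - used, "\n};\n");
    if (n < 0 || (size_t)n >= cap - used) return -1;
    used += (size_t)n;
    return (int)used;
}
```

A real tool would read the bytes from the input file and write the text to a header file; the trailing comma this sketch leaves before the closing brace is legal in a C initializer.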


0

I'd like to share a full example for a C++ program; I hope it helps someone.

Makefile

data_base := $(wildcard *.dat)
all: main
main: main.o $(data_base:.dat=.o)
	$(CXX) -z noexecstack $(LDFLAGS) -o $@ $^ $(LDLIBS)
%.o: %.dat
	ld -r -b binary -o $@ $<
clean:
	rm -f *.o

main.cpp

#include <iostream>
#include <sstream>
#include <string>

// Symbols generated by: ld -r -b binary -o table.o table.dat
extern char _binary_table_dat_start[];
extern char _binary_table_dat_end[];

int main (int argc, char* argv[]) {
    std::cout << "Address of start: " << static_cast<const void*>(_binary_table_dat_start) << std::endl;
    std::cout << "Address of end: " << static_cast<const void*>(_binary_table_dat_end) << std::endl;

    // Copy the embedded bytes into a string. ld does not append a null
    // terminator, so the length comes from the two boundary symbols.
    std::istringstream ss(std::string(_binary_table_dat_start, _binary_table_dat_end));
    std::size_t i = 0;
    std::string s;
    // Testing getline itself avoids counting a spurious extra line at EOF
    while (std::getline(ss, s)) {
        std::cout << "Line " << ++i << ": " << s << std::endl;
    }
    std::cout << "Total number of lines " << i << std::endl;
    return 0;
}

table.dat

random text....


-4

You could do this in a header file:

#ifndef SHADER_SRC_HPP
#define SHADER_SRC_HPP
const char* shader= "

//source

";
#endif

and just include that.

The other way is to read the shader file at runtime.

7 Comments

I think Kos wants to be able to maintain the shader source without having to worry about escaping special characters (among other possible issues).
@VJo: nope - never used a shader. I was approaching the question as embedding arbitrary data residing in external files into the program. I can certainly accept that this might be a much better solution for shaders in particular.
A file which defines (as opposed to declares) a global variable should not be a header file but a source module. And your type is extremely inefficient. Make it const char shader[] = "source"; instead.
Also, I believe C++ doesn't allow multi-line string literals other than by opening and closing the "" quotes on each line separately or by putting a backslash at the end of every line. Not to mention the other benefits of having the shader available as a standalone file during development (syntax coloring, at the very least?).
Since C++11 you can use a "raw string literal", it looks like R"*( ... multiline text ... )*". You can use another delimiter instead of *.