I have created a simple hello.c which contains just the definition of an array:

unsigned char arr[4] = {1,2,3,77};

I have then compiled it with gcc -r in order to produce a relocatable object file. Finally, I'm using objcopy with the option -O binary in order to produce a memory dump of the object file created before.

gcc -r hello.c -o hello.o
objcopy -S -X -O binary hello.o hello.bin

On Ubuntu, I get the following contents of hello.bin:

[image: contents of hello.bin on Linux]

The bin file contains the contents of the array, 1,2,3,77, as expected.

However, on Windows, if I'm using mingw gcc 15.2 (https://winlibs.com/), I get the following contents in the bin file:

[image: contents of hello.bin on Windows]

The bin file on Windows contains just the compiler identification info and that's it.

Any idea what's going on? I have used the exact same commands on both Linux and Windows but didn't expect such a difference.

3 Answers

I have then compiled it with gcc -r in order to produce a relocatable object file. Finally, I'm using objcopy with the option -O binary in order to produce a memory dump of the object file created before.

Your observations do not reveal a bug in objcopy on Windows or Linux or both. objcopy is behaving consistently on Windows and on Linux and in conformity with the meaning of its documentation - although the relevant bit of documentation is less than crystal clear. The apparent inconsistency between Windows and Linux is due to:-

  • The differences between your Windows PE object file and your Linux ELF object file.
  • The false expectation that objcopy with the -O binary option will copy eligible sections of an input object file to discrete output regions, in the order they have in the input file. This is what you assume a memory dump of the object file will look like.

By eligible sections I mean those that objcopy will select to copy, and they are the ones that occupy more than 0 bytes both in the file and in a loaded image. In the terminology of objdump output, they are the sections of size > 0 with attributes CONTENTS and ALLOC [1].
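As a rough illustration of that selection rule, the sketch below filters objdump -h output with awk. It is not how objcopy itself decides; it only applies the two criteria just stated to a saved excerpt of the Linux listing shown further down. The file name sections.txt and the variable name eligible are made up for the example.

```shell
# Saved excerpt of `objdump -h hello.o` (Linux case), so no binutils are needed here.
workdir=$(mktemp -d)
cat > "$workdir/sections.txt" <<'EOF'
  0 .note.gnu.property 00000020  0000000000000000  0000000000000000  00000040  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .text         00000000  0000000000000000  0000000000000000  00000060  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .data         00000004  0000000000000000  0000000000000000  00000060  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          00000000  0000000000000000  0000000000000000  00000064  2**0
                  ALLOC
  4 .comment      0000002c  0000000000000000  0000000000000000  00000064  2**0
                  CONTENTS, READONLY
  5 .note.GNU-stack 00000000  0000000000000000  0000000000000000  00000090  2**0
                  CONTENTS, READONLY
EOF

# Keep a section only if its size is non-zero and its flags row
# contains both CONTENTS and ALLOC.
eligible=$(awk '
    /^ *[0-9]+ +[^ ]/ { name = $2; size = $3; next }   # section header row
    name != "" {                                        # the flags row that follows it
        if (size !~ /^0+$/ && /CONTENTS/ && /ALLOC/) print name, size
        name = ""
    }
' "$workdir/sections.txt")
echo "$eligible"
```

For this excerpt the filter leaves .note.gnu.property and .data, the same two sections identified by hand below.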

That false expectation is inspired by the objcopy man page, DESCRIPTION section:

objcopy can be used to generate a raw binary file by using an output target of binary (e.g., use -O binary). When objcopy generates a raw binary file, it will essentially produce a memory dump of the contents of the input object file. All symbols and relocation information will be discarded. The memory dump will start at the load address of the lowest section copied into the output file. [Emphasis added]

The emphasised clause is the locus of the unclarity I alluded to. What it means is that eligible sections will be copied as if from the memory address space of a loaded image, their respective load addresses becoming their output file offsets.

This meaning harbours the unstated point that the input file must be a loadable file, as created by a linker, if the output file is to resemble a memory dump. But your input file both on Windows and Linux is an unlinked GCC object file, so the load addresses of all sections are 0 ( = undefined). Hence all eligible input sections for objcopy -O binary are output at file offset 0. They are overlaid at the start of the output file, in order of their appearance in the input file.
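The overlaying can be imitated without objcopy at all. The following is only a simulation under the assumption just described (every section written at output offset 0, in input order); the byte values are taken from the Linux objdump listings further down, and the file names note.bin, data.bin and out.bin are invented for the sketch.

```shell
workdir=$(mktemp -d)
note="$workdir/note.bin"
data="$workdir/data.bin"
out="$workdir/out.bin"

# .note.gnu.property: 32 bytes, written with octal escapes.
printf '\004\000\000\000\020\000\000\000\005\000\000\000GNU\000' >  "$note"
printf '\002\000\000\300\004\000\000\000\003\000\000\000\000\000\000\000' >> "$note"

# .data: the 4 array bytes 01 02 03 4d.
printf '\001\002\003\115' > "$data"

# Write each section at offset 0 (its "load address") without truncating
# what is already there, in the order the sections appear in the input.
: > "$out"
for section in "$note" "$data"; do
    dd if="$section" of="$out" bs=1 seek=0 conv=notrunc 2>/dev/null
done

size=$(wc -c < "$out" | tr -d ' ')
first4=$(od -An -tx1 -N4 "$out" | tr -d ' \n')
echo "size=$size first4=$first4"
```

The result is a 32-byte file whose first 4 bytes are 0102034d: the longest section fixes the length, and the section written last wins wherever sections overlap.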

We can verify this in both cases.

In the Linux case:-

With:

$ gcc --version | head -1
gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0

$ objcopy --version | head -1
GNU objcopy (GNU Binutils for Ubuntu) 2.42

$ objdump --version | head -1
GNU objdump (GNU Binutils) 2.42

$ cat hello.c
unsigned char arr[4] = {1,2,3,77};

$ gcc -r hello.c -o hello.o

I get:

$ objdump -h hello.o

hello.o:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .note.gnu.property 00000020  0000000000000000  0000000000000000  00000040  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .text         00000000  0000000000000000  0000000000000000  00000060  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .data         00000004  0000000000000000  0000000000000000  00000060  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          00000000  0000000000000000  0000000000000000  00000064  2**0
                  ALLOC
  4 .comment      0000002c  0000000000000000  0000000000000000  00000064  2**0
                  CONTENTS, READONLY
  5 .note.GNU-stack 00000000  0000000000000000  0000000000000000  00000090  2**0
                  CONTENTS, READONLY

Here, the eligible sections are:

  • .note.gnu.property, size 0x20 = 32 bytes,
  • .data, size 0x4 = 4 bytes.

with VMA = LMA = 0. Accordingly,

$ objcopy -S -X -O binary hello.o hello.bin

should and does make hello.bin a 32-byte file:

$ stat -c "%s" hello.bin
32

and it should be composed of the section .note.gnu.property:

$ objdump -s -j.note.gnu.property hello.o 

hello.o:     file format elf64-x86-64

Contents of section .note.gnu.property:
 0000 04000000 10000000 05000000 474e5500  ............GNU.
 0010 020000c0 04000000 03000000 00000000  ................

with the section .data:

$ objdump -s -j.data hello.o 

hello.o:     file format elf64-x86-64

Contents of section .data:
 0000 0102034d                             ...M
 

overlaid on the first 4 bytes. That would be:

04000000 10000000 05000000 474e5500 020000c0 04000000 03000000 00000000
^^^^^^^^
0102034d - overlaid

i.e.

0102034d 10000000 05000000 474e5500 020000c0 04000000 03000000 00000000

And so it is:

$ objdump -b binary -s hello.bin

hello.bin:     file format binary

Contents of section .data:
 0000 0102034d 10000000 05000000 474e5500  ...M........GNU.
 0010 020000c0 04000000 03000000 00000000  ................

In the Windows case:-

In MSYS2, with:

$ gcc --version | head -1
gcc.exe (Rev5, Built by MSYS2 project) 13.2.0

$ objcopy --version | head -1
GNU objcopy (GNU Binutils) 2.42

$ objdump --version | head -1
GNU objdump (GNU Binutils) 2.42

$ cat hello.c
unsigned char arr[4] = {1,2,3,77};

$ gcc -r hello.c -o hello.o

I get:

$ objdump -h hello.o

hello.o:     file format pe-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000000  0000000000000000  0000000000000000  00000000  2**4
                  ALLOC, LOAD, READONLY, CODE
  1 .data         00000010  0000000000000000  0000000000000000  000000dc  2**4
                  CONTENTS, ALLOC, LOAD, DATA
  2 .rdata        00000000  0000000000000000  0000000000000000  00000000  2**4
                  ALLOC, LOAD, READONLY, DATA
  3 .rdata$zzz    00000030  0000000000000000  0000000000000000  000000ec  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .bss          00000000  0000000000000000  0000000000000000  00000000  2**4
                  ALLOC
                  

This time, the eligible sections are:

  • .data, size 0x10 = 16 bytes.
  • .rdata$zzz, size 0x30 = 48 bytes

with VMA = LMA = 0. Accordingly,

$ objcopy -S -X -O binary hello.o hello.bin

should and does make hello.bin a 48-byte file:

$ stat -c "%s" hello.bin
48

and it should be composed of the section .data:

$ objdump -s -j.data hello.o

hello.o:     file format pe-x86-64

Contents of section .data:
 0000 0102034d 00000000 00000000 00000000  ...M............
 

with the section .rdata$zzz:

$ objdump -s '-j.rdata$zzz' hello.o

hello.o:     file format pe-x86-64

Contents of section .rdata$zzz:
 0000 4743433a 20285265 76352c20 4275696c  GCC: (Rev5, Buil
 0010 74206279 204d5359 53322070 726f6a65  t by MSYS2 proje
 0020 63742920 31332e32 2e300000 00000000  ct) 13.2.0......
 

overlaid on the whole 16 bytes of the input .data section, and extending the output a further 32 bytes. That would simply be a copy of section .rdata$zzz. And once again, so it is:

$ objdump -b binary -s hello.bin

hello.bin:     file format binary

Contents of section .data:
 0000 4743433a 20285265 76352c20 4275696c  GCC: (Rev5, Buil
 0010 74206279 204d5359 53322070 726f6a65  t by MSYS2 proje
 0020 63742920 31332e32 2e300000 00000000  ct) 13.2.0......
 

(It is just a coincidence that the section name .data occurs in that objdump output: it gives that section name to the contents of a binary blob by default.)

Reconciling the differences

You see that the uniform behaviour of objcopy preserves the initial data bytes 0102034d in the Linux binary blob just because that is the contents of the input section, .data, that is copied last to file offset 0 in the blob. The remaining 28 bytes of the blob are garbage = the last 28 bytes of input section .note.gnu.property, which was copied just before .data and is the longest eligible section. The contents of the Windows blob are = the input section .rdata$zzz just because it was copied to file offset 0 last and is also the longest eligible section.

Clearly what you've done is not what you want.

Attempting to make a memory dump of a file that is not a loadable file - such as a GCC object file - does not make sense. So objcopy -O binary is a mistaken choice.

Furthermore the GCC option -r, as in your:

$ gcc -r hello.c -o hello.o

is not actually a compiler option to produce a relocatable object file, notwithstanding what the GCC manual, 3.16 Options for Linking might make you think:

-r

    Produce a relocatable object as output. This is also known as partial linking.

This option is in fact passed through to the linkage step, with the effect documented in the linker manual:

-r
--relocatable
    Generate relocatable output---i.e., generate an output file that can in 
    turn serve as input to ld. This is often called partial linking. As a 
    side effect, in environments that support standard Unix magic numbers, 
    this option also sets the output file's magic number to "OMAGIC". If 
    this option is not specified, an absolute file is produced....

Its use case is to make the linker merge the input object files into a single output object file, rather than output a program ( = "absolute file") as it would by default. In this context relocatable means able to serve as input to ld and does not mean what I suspect you think: Position Independent Code, able to be loaded at an arbitrary address.

To compile a position independent object file you do, e.g.

$ gcc -c -fPIC hello.c -o hello.o

which does not invoke the linker at all. On the other hand,

$ gcc -r hello.c -o hello.o

compiles hello.c to a temporary object file that is not assured to be PIC and then redundantly "merges" that solitary object file into the output object file hello.o - which will still be non-PIC if the temporary is non-PIC. Here is a Linux illustration:

$ cat foo.c
unsigned char arr[4] = {1,2,3,77};

unsigned foo(void)
{
    return arr[0];
}

$ gcc -r foo.c -o foo.o

$ gcc -shared -o libfoo.so foo.o
/usr/bin/ld: foo.o: warning: relocation against `arr' in read-only section `.text'
/usr/bin/ld: foo.o: relocation R_X86_64_PC32 against symbol `arr' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status

We attempted to link a shared library from foo.o. A shared library must contain only PI code, on pain of relocation errors. We failed because foo.o was not PIC. But now it is:

$ gcc -c -fPIC foo.c -o foo.o
$ gcc -shared -o libfoo.so foo.o && echo Done
Done

Summing up

If you want to compile C source to an object file that can be loaded anywhere in memory, use GCC option -fPIC, not -r.

It is unclear what you mean or want to achieve by:

produce a memory dump of the object file

because the memory map of a regular GCC object file is undetermined. Memory layout is determined in loadable files (programs, dynamic libraries), produced by the linker. You may wish to formulate what you want to achieve and ask another question about it.


1. I will make use of objdump to examine binaries although it is an imperfect parser due to the stretched abstraction it employs over diverse file formats. It is serviceable in this setting because it can be the same parser for the Linux and Windows cases and inaccuracies do not bite. readelf and dumpbin are parsers specialised for ELF and PE/COFF binaries respectively.

I found that the winlibs objcopy doesn't concatenate the object file's sections but writes them over one another in the same buffer.

For example, the source code is:

unsigned char arr[4] = {1,2,3,77};

unsigned char func(int i) {
    return arr[i];
}

Then the objdump -s output is:

Contents of section .text:
 0000 5589e58b 45080500 0000000f b6005dc3  U...E.........].
Contents of section .data:
 0000 0102034d                             ...M            
Contents of section .rdata$zzz:
 0000 4743433a 20284d69 6e47572d 57363420  GCC: (MinGW-W64 
 0010 69363836 2d756372 742d706f 7369782d  i686-ucrt-posix-
 0020 64776172 662c2062 75696c74 20627920  dwarf, built by 
 0030 42726563 68742053 616e6465 72732c20  Brecht Sanders, 
 0040 72322920 31352e32 2e300000           r2) 15.2.0..    
Contents of section .eh_frame:
 0000 14000000 00000000 017a5200 017c0801  .........zR..|..
 0010 1b0c0404 88010000 1c000000 1c000000  ................
 0020 04000000 10000000 00410e08 8502420d  .........A....B.
 0030 054cc50c 04040000                    .L......        

And the hello.bin dump is:

 0000 14000000 00000000 017a5200 017c0801  .........zR..|..
 0010 1b0c0404 88010000 1c000000 1c000000  ................
 0020 04000000 10000000 00410e08 8502420d  .........A....B.
 0030 054cc50c 04040000 616e6465 72732c20  ........anders, 
 0040 72322920 31352e32 2e300000           r2) 15.2.0..

The beginning is the last section .eh_frame. And the ending is the part of .rdata$zzz from offset 0x38. All previous sections are totally overwritten.

1 Comment

Thanks! Didn't realize this behaviour but it's not just on Windows, now I can fully explain what happens. I'll make a separate comment.

@Stas Simonov's response contains a key finding: that objcopy -O binary doesn't correctly handle multiple sections inside the object file.

However, this is not just on Windows but also on Linux. It happens even for my example but it's a bit hidden.

So, if I do

gcc -c

then the sections are in the order .data, .comment and .note.gnu.property.

If I do

gcc -r

then the sections are in the order .note.gnu.property, .data and .comment.

When .data is first, it's written but then overwritten by .note.gnu.property.

When .note.gnu.property is first, it's only partially overwritten by .data because .data is smaller. That's why I see 0102034d at the start of the file when I use -r.

A possible solution to this issue is to use the -j flag so we can select the .data section i.e.

objcopy -j .data -O binary ...

This way, just the .data section is copied to the binary file.

Another gcc compiler flag disables the identification string entirely i.e.

gcc -fno-ident

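so you can get away without the -j. This way you can come up with a cross-platform solution, as you don't need to specify the names of the sections (on Windows it's .data, on Linux it's .rodata, etc.). Note, though, that -fno-ident only removes the identification string; on Linux the .note.gnu.property section is still emitted, so the -j approach may still be needed there.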

I'm honestly not sure if this is a bug, a limitation or simply a counterintuitive intended behaviour of objcopy.
