I have then compiled it with gcc -r in order to produce a relocatable object file. Finally, I'm using objcopy with the option -O binary in order to produce a memory dump of the object file created before.
Your observations do not reveal a bug in objcopy on Windows or Linux or both. objcopy is behaving consistently on Windows and on Linux and in conformity with the meaning of its documentation - although the relevant bit of documentation is less than crystal clear. The apparent inconsistency between Windows and Linux is due to:-
- The differences between your Windows PE object file and your Linux ELF object file.
- The false expectation that objcopy with the -O binary option will copy eligible sections of an input object file to discrete output regions, in the order they have in the input file. This is what you assume a memory dump of the object file will look like.
By eligible sections I mean those that objcopy will select to copy: the ones that occupy more than 0 bytes both in the file and in a loaded image. In
the terminology of objdump output, they are the sections of size > 0 with the attributes CONTENTS and
ALLOC.¹
That false expectation is inspired by the objcopy man page, DESCRIPTION section:
objcopy can be used to generate a raw binary file by using an
output target of binary (e.g., use -O binary). When objcopy
generates a raw binary file, it will essentially produce a memory
dump of the contents of the input object file. All symbols and
relocation information will be discarded. The memory dump will
start at the load address of the lowest section copied into the
output file. [Emphasis added]
The emphasised clause is the locus of the unclarity I alluded to. What it means is that eligible sections will be
copied as if from the memory address space of a loaded image, their respective load addresses (relative to the lowest one) becoming their output file offsets.
This meaning harbours the unstated point that the input file must be a loadable file, as created by a linker, if the output
file is to resemble a memory dump. But your input file, on both Windows and Linux, is an unlinked GCC object file,
so the load addresses of all sections are 0 (= undefined). Hence all eligible input sections for objcopy -O binary are output at file offset 0: they are overlaid at the start of the output file, in the order of their appearance in
the input file.
We can verify this in both cases.
In the Linux case:-
With:
$ gcc --version | head -1
gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
$ objcopy --version | head -1
GNU objcopy (GNU Binutils for Ubuntu) 2.42
$ objdump --version | head -1
GNU objdump (GNU Binutils) 2.42
$ cat hello.c
unsigned char arr[4] = {1,2,3,77};
$ gcc -r hello.c -o hello.o
I get:
$ objdump -h hello.o
hello.o: file format elf64-x86-64
Sections:
Idx Name Size VMA LMA File off Algn
0 .note.gnu.property 00000020 0000000000000000 0000000000000000 00000040 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .text 00000000 0000000000000000 0000000000000000 00000060 2**0
CONTENTS, ALLOC, LOAD, READONLY, CODE
2 .data 00000004 0000000000000000 0000000000000000 00000060 2**0
CONTENTS, ALLOC, LOAD, DATA
3 .bss 00000000 0000000000000000 0000000000000000 00000064 2**0
ALLOC
4 .comment 0000002c 0000000000000000 0000000000000000 00000064 2**0
CONTENTS, READONLY
5 .note.GNU-stack 00000000 0000000000000000 0000000000000000 00000090 2**0
CONTENTS, READONLY
Here, the eligible sections are:
.note.gnu.property, size 0x20 = 32 bytes,
.data, size 0x4 = 4 bytes,
with VMA = LMA = 0. Accordingly,
$ objcopy -S -X -O binary hello.o hello.bin
should and does make hello.bin a 32-byte file:
$ stat -c "%s" hello.bin
32
and it should be composed of the section .note.gnu.property:
$ objdump -s -j.note.gnu.property hello.o
hello.o: file format elf64-x86-64
Contents of section .note.gnu.property:
0000 04000000 10000000 05000000 474e5500 ............GNU.
0010 020000c0 04000000 03000000 00000000 ................
with the section .data:
$ objdump -s -j.data hello.o
hello.o: file format elf64-x86-64
Contents of section .data:
0000 0102034d ...M
overlaid on the first 4 bytes. That would be:
04000000 10000000 05000000 474e5500 020000c0 04000000 03000000 00000000
^^^^^^^^
0102034d - overlaid
i.e.
0102034d 10000000 05000000 474e5500 020000c0 04000000 03000000 00000000
And so it is:
$ objdump -b binary -s hello.bin
hello.bin: file format binary
Contents of section .data:
0000 0102034d 10000000 05000000 474e5500 ...M........GNU.
0010 020000c0 04000000 03000000 00000000 ................
In the Windows case:-
In MSYS2, with:
$ gcc --version | head -1
gcc.exe (Rev5, Built by MSYS2 project) 13.2.0
$ objcopy --version | head -1
GNU objcopy (GNU Binutils) 2.42
$ objdump --version | head -1
GNU objdump (GNU Binutils) 2.42
$ cat hello.c
unsigned char arr[4] = {1,2,3,77};
$ gcc -r hello.c -o hello.o
I get:
$ objdump -h hello.o
hello.o: file format pe-x86-64
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000000 0000000000000000 0000000000000000 00000000 2**4
ALLOC, LOAD, READONLY, CODE
1 .data 00000010 0000000000000000 0000000000000000 000000dc 2**4
CONTENTS, ALLOC, LOAD, DATA
2 .rdata 00000000 0000000000000000 0000000000000000 00000000 2**4
ALLOC, LOAD, READONLY, DATA
3 .rdata$zzz 00000030 0000000000000000 0000000000000000 000000ec 2**4
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .bss 00000000 0000000000000000 0000000000000000 00000000 2**4
ALLOC
This time, the eligible sections are:
.data, size 0x10 = 16 bytes,
.rdata$zzz, size 0x30 = 48 bytes,
with VMA = LMA = 0. Accordingly,
$ objcopy -S -X -O binary hello.o hello.bin
should and does make hello.bin a 48-byte file:
$ stat -c "%s" hello.bin
48
and it should be composed of the section .data:
$ objdump -s -j.data hello.o
hello.o: file format pe-x86-64
Contents of section .data:
0000 0102034d 00000000 00000000 00000000 ...M............
with the section .rdata$zzz:
$ objdump -s '-j.rdata$zzz' hello.o
hello.o: file format pe-x86-64
Contents of section .rdata$zzz:
0000 4743433a 20285265 76352c20 4275696c GCC: (Rev5, Buil
0010 74206279 204d5359 53322070 726f6a65 t by MSYS2 proje
0020 63742920 31332e32 2e300000 00000000 ct) 13.2.0......
overlaid on all 16 bytes of the input .data section and extending the output by a further 32 bytes.
That would simply be a copy of section .rdata$zzz. And once again, so it is:
$ objdump -b binary -s hello.bin
hello.bin: file format binary
Contents of section .data:
0000 4743433a 20285265 76352c20 4275696c GCC: (Rev5, Buil
0010 74206279 204d5359 53322070 726f6a65 t by MSYS2 proje
0020 63742920 31332e32 2e300000 00000000 ct) 13.2.0......
(It is just a coincidence that the section name .data occurs in that objdump output: objdump gives that section name
to the contents of a binary blob by default.)
Reconciling the differences
You see that the uniform behaviour of objcopy preserves the initial data bytes 0102034d in the Linux binary blob just
because they are the contents of the input section .data, which is copied last to file offset 0 in the blob. The remaining
28 bytes of the blob are garbage: the last 28 bytes of input section .note.gnu.property, which was copied just before .data and
is the longest eligible section. The contents of the Windows blob are identical to the input section .rdata$zzz just because it was copied to file offset 0 last and is
also the longest eligible section.
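Both outcomes follow from one rule, which can be written as a formula. The notation is mine, for illustration: S is the list of eligible sections in input order, L_s a section's load address (LMA), z_s its size, and L_min the lowest L_s in S. It summarises the behaviour observed in both cases rather than quoting any objcopy documentation.

```latex
\begin{aligned}
\text{size} &= \max_{s \in S}\,(L_s + z_s) - L_{\min} \\
\text{byte}_k &= \text{byte } (L_{\min} + k - L_s) \text{ of the last } s \in S
  \text{ with } L_s \le L_{\min} + k < L_s + z_s \\[4pt]
\text{Linux:}\quad \text{size} &= \max(0 + 32,\; 0 + 4) - 0 = 32,
  \quad k \in [0,4) \text{ from .data},\; k \in [4,32) \text{ from .note.gnu.property} \\
\text{Windows:}\quad \text{size} &= \max(0 + 16,\; 0 + 48) - 0 = 48,
  \quad k \in [0,48) \text{ from .rdata\$zzz}
\end{aligned}
```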
Clearly what you've done is not what you want.
Attempting to make a memory dump of a file that is not a loadable file - such as a GCC object file - does not make sense. So objcopy -O binary is
a mistaken choice.
Furthermore, the GCC option -r, as in your:
$ gcc -r hello.c -o hello.o
is not actually a compiler option to produce a relocatable object file,
notwithstanding what the GCC manual, section 3.16 "Options for Linking",
might make you think:
-r
Produce a relocatable object as output. This is also known as partial linking.
This option is in fact passed through to the linkage step, with the effect
documented in the linker manual:
-r
--relocatable
Generate relocatable output---i.e., generate an output file that can in
turn serve as input to ld. This is often called partial linking. As a
side effect, in environments that support standard Unix magic numbers,
this option also sets the output file's magic number to "OMAGIC". If
this option is not specified, an absolute file is produced....
Its use case is to make the linker merge the input object files into a single output object
file, rather than output a program ( = "absolute file") as it would by default. In this
context relocatable means able to serve as input to ld and does not mean what I suspect
you think: Position Independent Code, able to be loaded at an arbitrary address.
To compile a position independent object file you do, e.g.
$ gcc -c -fPIC hello.c -o hello.o
which does not invoke the linker at all. On the other hand,
$ gcc -r hello.c -o hello.o
compiles hello.c to a temporary object file that is not guaranteed to be PIC and then redundantly "merges"
that solitary object file into the output object file hello.o, which will still be non-PIC if the
temporary is non-PIC. Here is a Linux illustration:
$ cat foo.c
unsigned char arr[4] = {1,2,3,77};
unsigned foo(void)
{
return arr[0];
}
$ gcc -r foo.c -o foo.o
$ gcc -shared -o libfoo.so foo.o
/usr/bin/ld: foo.o: warning: relocation against `arr' in read-only section `.text'
/usr/bin/ld: foo.o: relocation R_X86_64_PC32 against symbol `arr' can not be used when making a shared object; recompile with -fPIC
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status
We attempted to link a shared library from foo.o. A shared library must contain
only PI code, on pain of relocation errors. We failed because foo.o was not PIC.
But now it is:
$ gcc -c -fPIC foo.c -o foo.o
$ gcc -shared -o libfoo.so foo.o && echo Done
Done
In sum
If you want to compile C source to an object file that can be loaded anywhere
in memory, use GCC option -fPIC, not -r.
It is unclear what you mean or want to achieve by:
produce a memory dump of the object file
because the memory map of a regular GCC object file is undetermined. Memory layout
is determined in loadable files (programs, dynamic libraries), produced by the
linker. You may wish to formulate what you want to achieve and ask another
question about it.
1. I use `objdump` to examine the binaries, although it is an imperfect
parser because of the stretched abstraction it applies across diverse file formats. It is
serviceable in this setting because it can be *the same* parser for the Linux
and Windows cases, and its inaccuracies do not bite here. `readelf` and `dumpbin` are parsers specialised for ELF and PE/COFF binaries respectively.