Interesting files in v4.0:

- `include/uapi/linux/elf.h`: contains most of the format, which is natural since the format is needed from userland to implement compilers, and most of it is not arch dependent
- `include/linux/elf.h`
- `arch/x86/include/asm/elf.h`: x86 specifics described in the AMD64 ABI extension, e.g. the `R_X86_64_64` type
- `fs/binfmt_elf.c`: this is where the real action happens. `do_execve` in `fs/exec.c`, reached from the system call, calls `load_elf_binary`, which prepares everything and culminates in a call to `start_thread`

In 5.1, definitions are under various `config/` files.
[System V ABI AMD64][] says that `main` is the entry point of a C program. It is not where execution starts, however: `main` is called from `_start`, which is the ELF entry point as mentioned in the [System V ABI AMD64][]. `main` is not special in pure assembly.
The initial state of the process stack, i.e. when _start is called
http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html describes the call sequence that actually happens on Linux.
TODO where is it mentioned in the arch agnostic standards?
The [System V ABI AMD64][] links to the [Itanium C++ ABI][].
http://stackoverflow.com/questions/12122446/how-does-c-linking-work-in-practice/30507725#30507725
To test this one out, try:

    a: .long s
    b: .long s + 0x12345678
    s:

then:

    as --32 -o main.o main.S
    objdump -dzr main.o

on Binutils 2.24 gives:

    00000000 <a>:
       0: 08 00          or     %al,(%eax)
          0: R_386_32 .text
       2: 00 00          add    %al,(%eax)

    00000004 <b>:
       4: 80 56 34 12    adcb   $0x12,0x34(%esi)
          4: R_386_32 .text
This makes it really clear how:

- `a` gets 8 added, which is the length of `a` + `b` (to reach `s`)
- `b` gets `0x12345680 == 0x12345678 + 8`, where `8` is once again the position of `s`
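The arithmetic above can be replayed in a few lines of Python. This is only a minimal sketch of the `S + A` computation, not what `ld` actually does: the helper name `apply_r_386_32` and the address `0x08048000` chosen for `.text` are assumptions made for illustration. With `R_386_32` the addend `A` is read from the 4 bytes the assembler already stored at the relocation offset, and `S + A` is written back in place.

```python
import struct

# Bytes of .text as assembled above: the 4-byte fields of a and b.
text = bytearray.fromhex("08000000" "80563412")

def apply_r_386_32(section, offset, s):
    """Hypothetical helper: patch one R_386_32 relocation in place (S + A)."""
    # The addend A is implicit: it is whatever the assembler left in the field.
    (a,) = struct.unpack_from("<I", section, offset)
    struct.pack_into("<I", section, offset, (s + a) & 0xFFFFFFFF)
    return a

# Assumption: the linker decides to place .text at this address.
text_address = 0x08048000

addend_a = apply_r_386_32(text, 0, text_address)
addend_b = apply_r_386_32(text, 4, text_address)

# After relocation both fields hold absolute addresses.
final_a, final_b = struct.unpack_from("<II", text, 0)
print(hex(addend_a), hex(addend_b), hex(final_a), hex(final_b))
```

Both relocations point at the `.text` section symbol, so the position of `s` inside the section (8) has to travel in the addend.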
This is another common relocation method that does:

    S + A - P

on 4 bytes, where `P` is the address of the 4 bytes being relocated. The result is relative to the Program Counter, thus the `PC` in the name. On x86-64 the program counter is the `RIP` register.

This method is common in x86-64 because `RIP` relative addressing is very popular, and for it to work, the relocation must subtract the position `P`.
Note that `%RIP` points to the next instruction, so it is common to use `A = -4` to remove the offset of the 4 byte address which is at the end of the instruction encoding:

    X Y A A A A
    Next instruction <-- RIP
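A worked example may make the role of `A = -4` clearer. The following Python sketch uses made-up addresses (an instruction at `0x401000` and a symbol at `0x402010`, both assumptions for illustration) and checks that the displacement computed as `S + A - P` lands on the symbol once the CPU adds it to `RIP`.

```python
import struct

def r_x86_64_pc32(s, a, p):
    """Value stored by the relocation: S + A - P, truncated to 32 bits."""
    return (s + a - p) & 0xFFFFFFFF

# Hypothetical layout: a 6-byte instruction X Y A A A A at 0x401000
# that refers to a symbol at 0x402010.
insn_address = 0x401000
symbol = 0x402010

p = insn_address + 2  # the 4-byte field starts right after X Y
field = r_x86_64_pc32(symbol, -4, p)

# At run time RIP points to the next instruction, i.e. 4 bytes past P.
rip = p + 4
# The CPU interprets the field as a signed 32-bit displacement.
(displacement,) = struct.unpack("<i", struct.pack("<I", field))
resolved = rip + displacement

print(hex(field), hex(resolved))
```

Without the `-4` addend the resolved address would overshoot the symbol by 4 bytes, since `RIP` is already past the field when the displacement is applied.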
Does the value of the address to be overwritten before linking matter at all?
GNU adds 8 and 16 bit relocations as an extension to the ELF standard, with names like `R_X86_64_16`.
TODO: confirm: each program header represents either a segment, or a segment section, which is a subdivision of segments.
For instance, a C hello world contains the lines (order changed):

    LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x0006fc 0x0006fc R E 0x200000
    PHDR           0x000040 0x0000000000400040 0x0000000000400040 0x0001f8 0x0001f8 R E 0x8
    INTERP         0x000238 0x0000000000400238 0x0000000000400238 0x00001c 0x00001c R   0x1
    NOTE           0x000254 0x0000000000400254 0x0000000000400254 0x000044 0x000044 R   0x4
    GNU_EH_FRAME   0x0005d0 0x00000000004005d0 0x00000000004005d0 0x000034 0x000034 R   0x4

so that the `LOAD` segment contains multiple segment sections `PHDR`, `INTERP`, etc.
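The containment claim can be checked from the offsets and sizes alone. This is a small Python sketch with the file offset and file size columns copied by hand from the lines above. The `contains` helper is a hypothetical name, and comparing file ranges only (ignoring virtual addresses) is a simplifying assumption.

```python
# (type, file offset, file size) copied from the program header lines above.
headers = [
    ("LOAD", 0x000000, 0x0006FC),
    ("PHDR", 0x000040, 0x0001F8),
    ("INTERP", 0x000238, 0x00001C),
    ("NOTE", 0x000254, 0x000044),
    ("GNU_EH_FRAME", 0x0005D0, 0x000034),
]

def contains(outer, inner):
    """Hypothetical helper: is inner's file range fully inside outer's?"""
    _, outer_off, outer_size = outer
    _, inner_off, inner_size = inner
    return outer_off <= inner_off and inner_off + inner_size <= outer_off + outer_size

load = headers[0]
inside = [h[0] for h in headers[1:] if contains(load, h)]
print(inside)
```

Every non-`LOAD` entry in this excerpt ends at or before offset `0x6fc`, the end of the `LOAD` range.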
TODO what are the alignment constraints of global symbols? I observe that 4 byte relocations align at 4 bits... what is going on?