Skip to content

Latest commit

 

History

History
108 lines (63 loc) · 3.71 KB

elf.md

File metadata and controls

108 lines (63 loc) · 3.71 KB

ELF

Linux

Interesting files in v4.0:

  • include/uapi/linux/elf.h: contains most of the format, which is natural since the format is needed from userland to implement compilers, and the most part if not arch dependent
  • include/linux/elf.h
  • arch/x86/include/asm/elf.h: x86 specifics described in the AMD64 ABI extension, e.g. the R_X86_64_64 type
  • fs/binfmt_elf.c: this is where the real action happens. do_execve in src/fs/exec.c from the system call calls load_elf_binary, which prepares everything and culminates in a call to start_thread

GCC

In 5.1, definitions are under various config/ files.

C ABI

_start

Entry point

[System V ABI AMD64][] says that main is the entry point of a C program, and it calls _start which is the ELF entry point as mentioned in the [System V ABI AMD64][]. main is not special in pure assembly

The initial state of the process stack, i.e. when _start is called

http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html describes the call sequence that actually happens on Linux.

TODO where is it mentioned in the arch agnostic standards?

C++ ABI

The [System V ABI AMD64][] links to the [Itanium C++ ABI][].

Relocation

http://stackoverflow.com/questions/12122446/how-does-c-linking-work-in-practice/30507725#30507725

R_386_32

To test this one out, try use:

a: .long s
b: .long s + 0x12345678
s:

then:

as --32 -o main.o main.S
objdump -dzr main.o

on Binutils 2.24 gives:

00000000 <a>:
0:  08 00                   or     %al,(%eax)
            0: R_386_32 .text
2:  00 00                   add    %al,(%eax)

00000004 <b>:
4:  80 56 34 12             adcb   $0x12,0x34(%esi)
            4: R_386_32 .text

This makes it really clear how:

  • a gets 8 added, which is the length of a + b (to reach s)
  • b gets 0x12345608 == 0x12345678 + 8, where 8 is once again the position of s

R_X86_64_PC32

This is another common relocation method that does:

S + A - P

on 4 bytes, where P is the current position of the Program Counter on the text segment, thus the PC on the name, A.K.A. the RIP register.

This method is common in x86-64 because RIP relative addressing is very popular, and for it to work, the relocation must subtract the position P.

Note that %RIP points to the next instruction: so it is common to use A = -4 to remove the offset of the 4 byte address which is at the end of the instruction encoding:

X Y A A A A
Next insruction <-- RIP

Value of the relocated memory before relocation

Does the value of the address to be overwritten before linking matter at all?

R_X86_64_16

8 and 16 bit

GNU adds 8 and 16 bit relocations as an extension to the ELF standard, which it calls with names like R_X86_64_16.

Segment vs segment section

TODO: confirm: each program header represents either a segment, or a segment section, which is a subdivision of segments.

For instance, a C hello world contains the lines (order changed):

LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x0006fc 0x0006fc R E 0x200000
PHDR           0x000040 0x0000000000400040 0x0000000000400040 0x0001f8 0x0001f8 R E 0x8
INTERP         0x000238 0x0000000000400238 0x0000000000400238 0x00001c 0x00001c R   0x1
NOTE           0x000254 0x0000000000400254 0x0000000000400254 0x000044 0x000044 R   0x4
GNU_EH_FRAME   0x0005d0 0x00000000004005d0 0x00000000004005d0 0x000034 0x000034 R   0x4

so that the LOAD segment contains multiple segment sections PHDR, INTERP, etc.

Alignment of global symbols

TODO what are the alignment constraints of global symbols? I observe that 4 byte relocations align at 4 bits... what is going on?