ABI notes, Jim Kupsch, 2021-09-15

Terminology

  • Using ABI - dependent on system or kernel ABI
  • imported ABI contamination - place where a library's ABI is incorporated in another library or executable

Notes

  1. programming language

    1. names
      1. allowed names & character encoding determined by language
      2. scope determines accessibility of name
        1. global
        2. class
        3. function/method
        4. block
      3. may be nested
        1. classes
        2. namespaces
      4. visibility
        1. within compilation unit only
        2. visible to other compilation units via declarations
    2. types
      1. can be named
      2. data
        1. fundamental types
          1. integers
            1. encoding
            2. signed/unsigned/other
            3. size
          2. floating point
            1. encoding
            2. size
        2. derived types
          1. processor specific vector types
          2. arrays
          3. class/struct/record
            1. collection of named data objects & methods
            2. derived class
              1. extends another class with data and methods
            3. polymorphic (virtual) class
              1. derived class
              2. virtual methods determined at runtime
          4. union
            1. collection of named data objects & methods
            2. each shares the same memory
          5. pointers
          6. enums
            1. stored as some int type
            2. has enumerators that map names to constants
          7. bit fields
          8. complex number
        3. type qualifiers
          1. const
          2. volatile
          3. atomic
          4. thread local
      3. function/method
        1. return type
        2. optional type qualifiers
        3. methods have an implicit parameter pointing to the object
        4. parameters
          1. typed parameters
          2. optional variadic parameters
            1. type and number may be fixed (e.g. open)
            2. type and number determined at runtime (e.g. printf)
    3. language runtime support library
      1. dynamic memory management
        1. malloc, std::allocator
      2. runtime type information
      3. exceptions
      4. atexit functionality
    4. object
      1. has a type and size
      2. located in memory or register(s)
      3. represents a
        1. named variable
        2. temporary object
        3. parameter
        4. dynamically allocated
    5. function
      1. has a type
      2. statements define the function's data transformations and calls to other functions
    6. declaration
      1. maps a name to a type
    7. definition
      1. also a declaration
      2. of a type
      3. of a variable
        1. is a named object
      4. of a function
    8. macros
      1. has a name
      2. has substitution text
      3. optionally parameterized
      4. name in source code is textually replaced by substitution text before compilation
  2. source code

    1. written in a programming language
    2. textual description of a program
    3. names defined
      1. types
      2. functions
      3. variables
      4. macros
    4. names declared
      1. types
      2. functions
      3. variables
      4. macros
  3. architecture/processor

    1. the processor that the software will run on
    2. has an instruction set architecture (ISA) that is the encoding and semantics of instructions
    3. may have levels where new instructions are added or removed
    4. directs mapping of types and properties
      1. based on efficient hardware support for size and encoding
      2. alignment requirements based on hardware requirements and efficiency
      3. use of emulation software if not available
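
The mapping of types to sizes and alignments can be inspected on a given machine. The following is a minimal sketch using Python's `ctypes` module, which reports the values of the platform C ABI that the running interpreter was built for. The table `FUNDAMENTAL_TYPES` and the helper `describe_types` are illustrative names, not part of these notes.

```python
import ctypes

# Fundamental C types as exposed by ctypes; their size and alignment come
# from the platform ABI of the running interpreter.
FUNDAMENTAL_TYPES = {
    "char": ctypes.c_char,
    "short": ctypes.c_short,
    "int": ctypes.c_int,
    "long": ctypes.c_long,
    "long long": ctypes.c_longlong,
    "float": ctypes.c_float,
    "double": ctypes.c_double,
    "void *": ctypes.c_void_p,
}


def describe_types():
    """Return {C type name: (size in bytes, alignment in bytes)}."""
    return {
        name: (ctypes.sizeof(ctype), ctypes.alignment(ctype))
        for name, ctype in FUNDAMENTAL_TYPES.items()
    }


for name, (size, align) in describe_types().items():
    print(f"{name:10} size={size} align={align}")
```

The printed values differ between architectures and operating systems (for example, the size of `long`), which is why compiled code depends on this part of the ABI.
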
  4. API

    1. software interface that allows a program to use libraries
    2. declarations present in a library for use by external code
      1. functions
      2. types
      3. data objects
    3. public API is a stable interface to a library's functions and data, allowing the implementation to be enhanced and bugs to be fixed
    4. no modification required to user code if
      1. types in API are changed to compatible types
      2. additional functions and objects are added
    5. semantics of functions and data described in documentation
  5. ABI

    1. conventions that allow for the interoperability and compatibility of binary code objects to form an executable that can be run on an OS and architecture.
      1. allows interoperability between
        1. compiler vendors and versions
        2. kernel versions
        3. library implementations and versions
    2. categories
      1. kernel/OS ABI
        1. interface used by a user space process to make OS system calls
        2. typically different from the function call ABI in the system ABI
        3. user code will rarely directly use this ABI, instead accessing system calls via a library such as libc
        4. for each OS syscall
          1. syscall number
          2. semantics of functionality
          3. number of parameters & return value
            1. each has a data type
            2. may be input or output
            3. parameter and return type semantics
        5. syscall ABI calling convention
          1. how privileged call to OS is made
          2. how syscall number is passed
          3. how parameters are passed
          4. how results are returned
          5. modification to process state the syscall is allowed to make
            1. stack
            2. registers
            3. flags
      2. system ABI
        1. conventions to call functions, use libraries, begin/end execution
        2. library/executable format
        3. mapping of programming language types to architecture types
        4. function call interface
          1. how parameters are passed
            1. registers
            2. stack (memory)
          2. how results are returned
          3. requirements of the caller
            1. stack alignment
            2. flags/mode requirements
          4. requirements of the callee
            1. flag/mode requirements
            2. register/memory modifications
          5. modification allowed by the callee to CPU state & stack
          6. mechanism to transfer control to function
            1. stack usage
            2. find address of function's entry point
              1. direct call if in same object and not exported
              2. if to external object or exported
                1. GOT, PLT
        5. register usage
        6. loading
          1. how kernel loads executable into memory to begin execution
          2. initial state setup
          3. optionally loads loader
          4. transfers control to object or loader to complete initialization
        7. initial process state
          1. CPU flags
          2. registers
            1. meanings
          3. stack
            1. meaning of data
          4. data from user space and OS
            1. argv array
            2. environment variables
            3. auxiliary vector
        8. rules for signal handlers
        9. mapping of hardware exceptions to signals
        10. threading
          1. thread local data
        11. exception handling
          1. throwing, catching
          2. stack unwinding
        12. CPU, stack, memory conventions
          1. special registers
            1. GOT
            2. frame/base pointer
            3. linkage
          2. stack
            1. red zone
            2. signal
      3. library ABI
        1. created by linking together object files into a library that were created by compiling source code
        2. manifestation of library API as realized in the object code using the system ABI
          1. exported typed functions
          2. exported typed data
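
As a rough illustration of a library ABI being used through the system ABI, the sketch below calls the C library's `strlen` from Python via `ctypes`. It assumes a POSIX-like system where `ctypes.CDLL(None)` exposes the symbols already loaded into the process; the wrapper name `c_strlen` is illustrative. The exported symbol carries only a name, so the parameter and return types have to be supplied by the caller from the API declaration.

```python
import ctypes

# Assumption: a POSIX-like system, where CDLL(None) gives access to the
# symbols already loaded into this process, including the C library.
libc = ctypes.CDLL(None)

# API declaration being mirrored: size_t strlen(const char *s);
# The symbol in the library has no type information, so it is stated here.
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Call the C library's strlen using the platform calling convention."""
    return strlen(data)


print(c_strlen(b"abi"))
```

If `argtypes` or `restype` were declared incorrectly, the call would still be made, but arguments or results could be misinterpreted, because nothing in the binary checks the types.
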
  6. compilation

    1. transforms source file(s) to object file
    2. generally source files are compiled separately (a compilation unit)
    3. generates code
      1. use target processor's ISA
      2. using ABI conventions for external function calls/definitions
        1. imported ABI contamination of symbol use if from another library
      3. accessing data using ABI conventions for the type
        1. imported ABI contamination if type is from another library
      4. functions internal to this object file can be optimized by not following system ABI
      5. for imported-inlined functions
        1. imported ABI contamination if from another library
      6. non-static local variable
        1. generated by code for the function at runtime
        2. imported ABI contamination if type from another library
      7. inlined runtime support library functions
        1. imported ABI contamination from language runtime
    4. generates data
      1. using ABI conventions for
        1. types
        2. size
        3. layout
        4. alignment
      2. imported ABI contamination if definition from another library
      3. global variable data
        1. visibility: outside object
        2. per executable or per thread
      4. static function data
        1. visibility: local to this object
        2. per executable or per thread
    5. maps source code names to object file symbols using ABI
    6. maps source code types to code using ABI
      1. imported ABI contamination if from another library
    7. linkage information
      1. external symbols referenced
      2. how to patch object to allow symbol to be used
    8. synthesizes function/code needed at runtime
      1. implicitly generated functions such as constructors/destructors
      2. template instantiation
      3. exception handlers
      4. initialization/finalization
      5. multiple versions of functions or blocks
        1. parallelization
        2. architecture feature enabled
        3. resolver functions to select "best" implementation at runtime
      6. partial inlining
    9. synthesizes runtime necessary data structures
      1. RTTI
      2. exceptions
    10. debug data if needed
    11. data to support link time optimization if needed
    12. macros
      1. textual replacement of a word in the source file
        1. happens before processing by compiler
        2. possibly parameterized
        3. value can be conditionally determined at compile time
      2. names are not present in object file, just artifacts of expansion
      3. imported ABI contamination if macro from another library is expanded
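
Imported ABI contamination through a type can be sketched as follows. The example models two versions of a hypothetical library struct with `ctypes` (`PointV1` and `PointV2` are invented for illustration) and shows what happens when code compiled against the old layout reads an object produced by the new one.

```python
import ctypes


class PointV1(ctypes.Structure):
    """Layout a client was compiled against (hypothetical library v1)."""
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]


class PointV2(ctypes.Structure):
    """Layout after the library inserted a field (hypothetical library v2)."""
    _fields_ = [
        ("x", ctypes.c_int32),
        ("flags", ctypes.c_int32),
        ("y", ctypes.c_int32),
    ]


def read_y_with_stale_layout(point_v2):
    """Read 'y' the way code compiled against the v1 layout would."""
    raw = bytes(point_v2)
    stale_view = PointV1.from_buffer_copy(raw[:ctypes.sizeof(PointV1)])
    return stale_view.y


new_object = PointV2(x=1, flags=255, y=2)
print(PointV1.y.offset, PointV2.y.offset)    # offsets of y: 4 and 8
print(read_y_with_stale_layout(new_object))  # 255: the value of flags, not y
```

The field offset was fixed in the client's code at compile time, so the client silently reads `flags` in place of `y` until it is recompiled against the new declaration.
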
  7. object file

    1. compiled source code
    2. linking information
      1. symbols defined
        1. have coarse type: function or object (data)
        2. have scope: local or global (exported)
        3. may be versioned
      2. symbols required from other objects/libraries
        1. where in memory to patch in the symbol's address
    3. no need for types other than symbol names
    4. debug information
    5. link time optimization (LTO) data
  8. linker

    1. creates executable or library from object files and external libraries
    2. relocates object references to function and data
    3. combines from each object file
      1. code
      2. data
      3. debug information
      4. LTO data
      5. external symbols
    4. determines list of symbols to export
    5. symbols can be versioned based on object file symbols and map file
    6. external versioned symbols based on external library passed to linker
  9. executable

    1. file that the OS kernel can load into memory to form a process
    2. may also be a library (rare)
    3. contains
      1. required symbols
      2. external libraries using SONAME to load to resolve needed symbols
  10. library

    1. dynamic library or executable
      1. single object
      2. has an SONAME (library name and ABI version)
    2. static library
      1. collection of object files
    3. contains
      1. list of exported symbols
      2. required symbols
      3. external libraries using SONAME to load to resolve needed symbols
  11. loader

    1. used if dynamic linking is required
    2. loads required libraries recursively based on needed SONAMEs
    3. initializes dynamic linking data structures
    4. patches data using resolved symbols and linkage data
    5. computes runtime determined symbol values
    6. transfers control to executable's entry point
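
The recursive loading step can be modelled with a toy dependency table. This is only a sketch: the `NEEDED` table and the `libfoo`/`libbar` names are hypothetical, and the breadth-first order used here is an assumption about one plausible loader policy, not a statement about any particular loader.

```python
# Toy model: each object lists the SONAMEs it needs (hypothetical data).
NEEDED = {
    "app": ["libfoo.so.1", "libc.so.6"],
    "libfoo.so.1": ["libbar.so.2", "libc.so.6"],
    "libbar.so.2": ["libc.so.6"],
    "libc.so.6": [],
}


def load_order(root, needed=NEEDED):
    """Return the order in which objects are loaded, each loaded only once.

    Assumption: dependencies are visited breadth-first from the executable.
    """
    order = []
    queue = [root]
    while queue:
        name = queue.pop(0)
        if name in order:
            continue  # already loaded via another dependency path
        order.append(name)
        queue.extend(needed[name])
    return order


print(load_order("app"))
```
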
  12. problems

    1. if ABI doesn't cover feature of programming language or compiler
    2. if ABI doesn't cover architecture feature
    3. if tool chain does not follow system or kernel ABI correctly
  13. issues

    1. information to validate library ABI is in neither the libraries nor the executable, as it is discarded by the compiler and linker
    2. needed symbols do not include the SONAME of the library that they should be provided by; symbols may be inadvertently provided by the incorrect library
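
The second issue can be illustrated with a toy symbol lookup. The sketch assumes a simple first-match search over the loaded libraries and ignores symbol versioning; the library names and the `EXPORTS` table are hypothetical.

```python
# Toy model: symbols exported by each loaded library (hypothetical data).
EXPORTS = {
    "libcompress.so.1": {"compress", "decompress"},
    "libother.so.3": {"compress"},
}


def resolve(symbol, search_order, exports=EXPORTS):
    """Return the first library in search order that exports the symbol.

    The request carries only the symbol name, not the SONAME of the
    library it was meant to come from.
    """
    for soname in search_order:
        if symbol in exports[soname]:
            return soname
    return None


# The intended provider is libcompress.so.1, but another library that
# happens to export the same name is searched first.
print(resolve("compress", ["libother.so.3", "libcompress.so.1"]))
```

With this search order the lookup returns `libother.so.3`, so the caller binds to an unrelated function that merely shares the name.
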