WALDEMAR KOZACZUK edited this page Apr 11, 2023 · 21 revisions

Low-level Layer

At boot time on x86_64, arch_setup_free_memory() discovers how much physical memory is available by reading the e820 entries, then linearly maps the identified memory ranges and passes them to memory::free_initial_memory_range(). On aarch64, the available physical memory information is retrieved from the DTB, as coded in dtb_setup() in arch/aarch64/arch-dtb.cc. Once the memory is discovered, all corresponding memory ranges are registered in memory::free_page_ranges, an instance of page_range_allocator that tracks all used and free physical memory and implements the lowest-level memory allocation logic. The key fields of page_range_allocator are _free_huge and _free. The former is an intrusive multiset of page ranges of 256 MB or larger; the latter is an array of 16 intrusive lists, each holding page ranges of the corresponding logarithmic size. At this level, memory is tracked, allocated and freed in 4K chunks (pages) aligned on 4K boundaries (addresses ending in 0x000), which means that an individual page range is a contiguous area of physical memory N pages long.
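To make the split between _free and _free_huge concrete, here is a minimal sketch of how a free range could be filed, assuming the list index is floor(log2(number of pages)) and that ranges of 2^16 pages (256 MB with 4K pages) or more go to the huge multiset. The names bucket_for(), floor_log2() and the constants are hypothetical and not taken from OSv's sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of page_range_allocator's bucketing (not OSv's code).
constexpr std::size_t kPageSize = 4096;
constexpr unsigned kNumLists = 16;  // number of intrusive lists in _free

// floor(log2(n)) for n >= 1
constexpr unsigned floor_log2(std::uint64_t n) {
    unsigned k = 0;
    while (n >>= 1) ++k;
    return k;
}

// Returns the _free list index for a range of `bytes` (a multiple of 4K),
// or kNumLists to mean "goes into the _free_huge multiset".
constexpr unsigned bucket_for(std::uint64_t bytes) {
    unsigned order = floor_log2(bytes / kPageSize);
    return order < kNumLists ? order : kNumLists;
}
```

Under this model, a single page lands in list 0, and a 256 MB range (65536 pages) is the smallest one that goes to _free_huge.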

For example, given 100MB from the host on QEMU on x86_64, OSv finds 3 memory ranges: a small one of ~640KB in lower memory, a 1MB one located in the 2nd MB, and the largest one starting where loader.elf ends (an offset of roughly 9.3MB) and ending just below 100MB. With OSv running with 100MB of RAM and gdb paused right after arch_setup_free_memory(), free_page_ranges looks like this (each line shows the start address of a free range, as seen through the 0x400000000000 linear mapping, followed by its size in bytes):

(gdb) osv heap
0x0000400000001000 0x000000000009e000 // Lower RAM < 640KB 
0x0000400000100000 0x0000000000100000 // 2nd MB - ends right below the kernel
0x000040000094d000 0x0000000005a90000 // Starts right above the kernel
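The `osv heap` addresses can be turned back into physical addresses by subtracting the base of the linear "main" area (0x400000000000 in the x86_64 mapping table further down). The sketch below does that arithmetic for the three ranges above; FreeRange and kMainBase are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>

// Base of the linear "main" area: physical P is mapped at kMainBase + P.
constexpr std::uint64_t kMainBase = 0x400000000000ull;

struct FreeRange {
    std::uint64_t vaddr;  // first column of `osv heap`
    std::uint64_t size;   // second column, in bytes
    constexpr std::uint64_t phys_start() const { return vaddr - kMainBase; }
    constexpr std::uint64_t phys_end() const { return phys_start() + size; }
};

// The three free ranges printed above (100MB guest on QEMU).
constexpr FreeRange kRanges[] = {
    {0x0000400000001000ull, 0x000000000009e000ull},  // lower RAM
    {0x0000400000100000ull, 0x0000000000100000ull},  // 2nd MB
    {0x000040000094d000ull, 0x0000000005a90000ull},  // above the kernel
};
```

The second range ends at physical 0x200000, which is exactly where the x86_64 table below shows the kernel mapped; the third starts at 0x94d000 (~9.3MB) and ends at 0x63dd000 (~99.9MB).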

For more details on how memory is managed and set up at the lowest level, please read Managing Memory Pages.

High-level Layer

From this point on, OSv is ready to serve the malloc()/free() family and memory::alloc_page()/free_page() calls by drawing memory from and releasing it to free_page_ranges in the form of page_range objects (see page_range_allocator::alloc(), alloc_aligned() and free()) and mapping it to virtual address ranges. However, until much later, when SMP is enabled (all vCPUs are fully activated), allocations are handled at a different granularity than after SMP is on. In addition, in the first (pre-SMP) phase, allocations draw pages directly from the free_page_ranges object, whereas after SMP is enabled they draw memory from the L1/L2 pools.

There are as many L1 pools as vCPUs (a per-CPU construct) and a single global L2 pool, global_l2. The L1 pools draw pages in the form of page_batch objects from the global L2 pool, which in turn draws page ranges from free_page_ranges. Both L1 and L2 pools operate at page granularity and implement a low/high watermark algorithm (for example, the L1 pools keep at least 128 pages of memory available). The high-level memory allocation functions (like malloc()) draw memory from and return it to the L1 pool using untracked_alloc_page() and untracked_free_page().
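The watermark idea can be illustrated with a small counting model. This is a hypothetical sketch, not OSv's implementation: pages are only counted, the per-CPU pool refills from L2 in batches up to the 128-page low watermark when it runs empty, and drains batches back to L2 down to a high watermark when it is full. The capacity (512), high watermark (384) and batch size (32) are assumptions; only the 128-page figure comes from the text above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical global pool: just a count of free pages.
class GlobalL2 {
public:
    explicit GlobalL2(std::size_t pages) : free_(pages) {}
    // Hands out up to n pages and returns how many were actually given.
    std::size_t take(std::size_t n) {
        std::size_t got = std::min(n, free_);
        free_ -= got;
        return got;
    }
    void give(std::size_t n) { free_ += n; }
    std::size_t free_pages() const { return free_; }
private:
    std::size_t free_;
};

// Hypothetical per-CPU pool with low/high watermarks.
class PerCpuL1 {
public:
    static constexpr std::size_t kMax = 512;    // assumed capacity
    static constexpr std::size_t kLow = 128;    // low watermark from the text
    static constexpr std::size_t kHigh = 384;   // assumed high watermark
    static constexpr std::size_t kBatch = 32;   // assumed batch size

    explicit PerCpuL1(GlobalL2& l2) : l2_(l2) {}

    // Takes one page, refilling from L2 first if the pool is empty.
    bool alloc_page() {
        if (nr_ == 0) refill();
        if (nr_ == 0) return false;  // both pools exhausted
        --nr_;
        return true;
    }
    // Returns one page, draining batches to L2 first if the pool is full.
    void free_page() {
        if (nr_ == kMax) drain();
        ++nr_;
    }
    std::size_t pages() const { return nr_; }

private:
    void refill() {
        while (nr_ < kLow) {
            std::size_t got = l2_.take(kBatch);
            if (got == 0) break;
            nr_ += got;
        }
    }
    void drain() {
        while (nr_ > kHigh) {
            std::size_t excess = nr_ - kHigh;
            std::size_t n = excess < kBatch ? excess : kBatch;
            l2_.give(n);
            nr_ -= n;
        }
    }
    GlobalL2& l2_;
    std::size_t nr_ = 0;
};
```

In this model, the first alloc_page() on a fresh per-CPU pool pulls four 32-page batches from L2 and hands out one page, so the pool is left with 127 pages; the batching keeps most allocations from touching the shared L2 pool.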

TODO: Describe L1 and L2 in more detail

It is also worth noting that most malloc functions (except for malloc_large()) end up calling std_malloc(), which allocates virtual memory in different ways depending on whether SMP has been enabled yet and on the size of the memory request. The size ranges are:

  • x <= 1024 (page size/4)
  • 1024 < x <= 4096
  • x > 4096

If SMP is enabled and the requested size is less than or equal to 1024 bytes, the allocation is delegated to the malloc pools. Malloc pools are set up per CPU, and each one is dedicated to a specific size range (2^(k-1) < x <= 2^k, where k <= 10). The way std_malloc() handles allocations of up to 4K directly affects how efficiently the underlying physical memory is used. For example, any request above 1024 bytes (and up to 4096) uses a whole page and, in the worst case (a 1025-byte request), wastes about 3K of physical memory. Similarly, a malloc pool allocation may in the worst case waste almost half of its 2^k-byte segment (a 513-byte request served from the 1024-byte pool wastes 511 bytes).
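The post-SMP size dispatch and the worst-case waste figures above can be checked with a simplified sketch. It only models the three size ranges listed earlier; alignment, the pre-SMP path and any minimum object size are ignored, and the enum and function names are hypothetical rather than OSv's. Requests above 4096 bytes are labelled Large and their waste is not modelled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical labels for the three post-SMP paths described above.
enum class Path { MallocPool, WholePage, Large };

constexpr std::size_t kPageSize = 4096;
constexpr std::size_t kMaxPoolObject = kPageSize / 4;  // 1024 bytes

// Smallest k with 2^k >= size, i.e. the pool serving 2^(k-1) < size <= 2^k.
constexpr unsigned pool_order(std::size_t size) {
    unsigned k = 0;
    while ((std::size_t{1} << k) < size) ++k;
    return k;
}

constexpr Path path_for(std::size_t size) {
    return size <= kMaxPoolObject ? Path::MallocPool
         : size <= kPageSize      ? Path::WholePage
                                  : Path::Large;
}

// Bytes left unused by a single request (Large is not modelled: returns 0).
constexpr std::size_t waste(std::size_t size) {
    switch (path_for(size)) {
    case Path::MallocPool: return (std::size_t{1} << pool_order(size)) - size;
    case Path::WholePage:  return kPageSize - size;
    default:               return 0;
    }
}
```

For example, waste(1025) is 3071 bytes and waste(513) is 511 bytes, matching the worst cases described above.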

TODO: Describe how exactly pre-SMP and post-SMP memory allocation differs.

The malloc_large()/free_large() functions draw memory from and release it to free_page_ranges directly, in both the pre- and post-SMP phases.

Virtual Memory Mapping

Linear

x86_64

       vaddr            paddr     size perm memattr name
    40200000           200000   67c434 rwxp  normal kernel
400000000000                0 40000000 rwxp  normal main
4000000f0000            f0000    10000 rwxp  normal dmi
4000000f5a10            f5a10      17c rwxp  normal smbios
400040000000         40000000 3ffdd000 rwxp  normal main
40007fe00000         7fe00000   200000 rwxp  normal acpi
4000feb91000         feb91000     1000 rwxp  normal pci_bar
4000feb92000         feb92000     1000 rwxp  normal pci_bar
4000fec00000         fec00000     1000 rwxp  normal ioapic
500000000000                0 40000000 rwxp  normal page
500040000000         40000000 3ffdd000 rwxp  normal page
600000000000                0 40000000 rwxp  normal mempool
600040000000         40000000 3ffdd000 rwxp  normal mempool

aarch64

       vaddr            paddr     size perm memattr name
     8000000          8000000    10000 rwxp     dev gic_dist
     8010000          8010000    10000 rwxp     dev gic_cpu
     9000000          9000000     1000 rwxp     dev pl011
     9010000          9010000     1000 rwxp     dev pl031
    10000000         10000000 2eff0000 rwxp     dev pci_mem
    3eff0000         3eff0000    10000 rwxp     dev pci_io
   fc0000000         40000000   7de000 rwxp  normal kernel
  4010000000       4010000000 10000000 rwxp     dev pci_cfg
40000a000000          a000000      200 rwxp  normal virtio_mmio_cfg
40000a000200          a000200      200 rwxp  normal virtio_mmio_cfg
40000a000400          a000400      200 rwxp  normal virtio_mmio_cfg
40000a000600          a000600      200 rwxp  normal virtio_mmio_cfg
40000a000800          a000800      200 rwxp  normal virtio_mmio_cfg
40000a000a00          a000a00      200 rwxp  normal virtio_mmio_cfg
40000a000c00          a000c00      200 rwxp  normal virtio_mmio_cfg
40000a000e00          a000e00      200 rwxp  normal virtio_mmio_cfg
4000407de000         407de000 7f822000 rwxp  normal main
5000407de000         407de000 7f822000 rwxp  normal page
6000407de000         407de000 7f822000 rwxp  normal mempool
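Both tables show that the "main", "page" and "mempool" areas map physical memory linearly at fixed bases (0x400000000000, 0x500000000000 and 0x600000000000), on x86_64 and aarch64 alike. The sketch below captures that offset arithmetic; the Area enum and helper names are hypothetical, and the kernel mapping, which uses a different per-architecture offset, is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Enumerator values are chosen so that value << 44 gives the area base:
// 4 -> 0x400000000000, 5 -> 0x500000000000, 6 -> 0x600000000000.
enum class Area : unsigned { Main = 4, Page = 5, Mempool = 6 };

constexpr std::uint64_t area_base(Area a) {
    return static_cast<std::uint64_t>(a) << 44;
}

constexpr std::uint64_t phys_to_virt(Area a, std::uint64_t paddr) {
    return area_base(a) + paddr;
}

// The same physical memory viewed through another linear area:
// only the base changes, the offset stays the same.
constexpr std::uint64_t translate(Area from, Area to, std::uint64_t vaddr) {
    return vaddr - area_base(from) + area_base(to);
}
```

For instance, the aarch64 "main" entry at 0x4000407de000 corresponds to the "mempool" entry at 0x6000407de000, both backed by physical 0x407de000.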

Non-Linear (example)

All non-linear mappings fall within the range from 0x000000000000 to 0x400000000000, excluding any ranges that collide with the linear mappings of device memory.

(gdb) osv mmap
0x0000000000000000 0x0000000000000000 [0.0 kB]         flags=none     perm=none
0x0000100000000000 0x0000100000009000 [36.0 kB]        flags=fpmF     perm=r    offset=0x00000000 path=/usr/lib/fs/libsolaris.so
0x0000100000009000 0x000010000009c000 [588.0 kB]       flags=fpmF     perm=rx   offset=0x00009000 path=/usr/lib/fs/libsolaris.so
0x000010000009c000 0x00001000000c4000 [160.0 kB]       flags=fpmF     perm=r    offset=0x0009c000 path=/usr/lib/fs/libsolaris.so
0x00001000000c4000 0x00001000000c6000 [8.0 kB]         flags=fpmF     perm=r    offset=0x000c3000 path=/usr/lib/fs/libsolaris.so
0x00001000000c6000 0x00001000000c9000 [12.0 kB]        flags=fpmF     perm=rw   offset=0x000c5000 path=/usr/lib/fs/libsolaris.so
0x00001000000c9000 0x00001000000e2000 [100.0 kB]       flags=fp       perm=rw  
0x00001000000e2000 0x00001000000e3000 [4.0 kB]         flags=fmF      perm=r    offset=0x00000000 path=/libvdso.so
0x00001000000e3000 0x00001000000e4000 [4.0 kB]         flags=fmF      perm=rx   offset=0x00001000 path=/libvdso.so
0x00001000000e4000 0x00001000000e5000 [4.0 kB]         flags=fmF      perm=r    offset=0x00002000 path=/libvdso.so
0x00001000000e5000 0x00001000000e6000 [4.0 kB]         flags=fmF      perm=r    offset=0x00002000 path=/libvdso.so
0x00001000000e6000 0x00001000000e7000 [4.0 kB]         flags=fmF      perm=rw   offset=0x00003000 path=/libvdso.so
0x00001000000e7000 0x00001000000e8000 [4.0 kB]         flags=fmF      perm=r    offset=0x00000000 path=/hello
0x00001000000e8000 0x00001000000e9000 [4.0 kB]         flags=fmF      perm=rx   offset=0x00001000 path=/hello
0x00001000000e9000 0x00001000000ea000 [4.0 kB]         flags=fmF      perm=r    offset=0x00002000 path=/hello
0x00001000000ea000 0x00001000000eb000 [4.0 kB]         flags=fmF      perm=r    offset=0x00002000 path=/hello
0x00001000000eb000 0x00001000000ec000 [4.0 kB]         flags=fmF      perm=rw   offset=0x00003000 path=/hello
0x0000200000000000 0x0000200000001000 [4.0 kB]         flags=p        perm=none
0x0000200000001000 0x0000200000002000 [4.0 kB]         flags=p        perm=none
0x0000200000002000 0x0000200000101000 [1020.0 kB]      flags=p        perm=rw   // Most likely stack
0x0000200000101000 0x0000200000102000 [4.0 kB]         flags=p        perm=none
0x0000200000102000 0x0000200000201000 [1020.0 kB]      flags=p        perm=rw   // Most likely stack
0x0000400000000000 0x0000400000000000 [0.0 kB]         flags=none     perm=none
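Each `osv mmap` line gives a half-open [start, end) range, and the bracketed size is simply (end - start) in kB. The hypothetical Vma helper below reproduces that and checks that a range lies below 0x400000000000; it does not account for collisions with device memory mappings.

```cpp
#include <cassert>
#include <cstdint>

// Non-linear mappings live below the start of the linear "main" area.
constexpr std::uint64_t kLinearStart = 0x400000000000ull;

struct Vma {
    std::uint64_t start, end;  // [start, end) as printed by `osv mmap`
    constexpr std::uint64_t size_kb() const { return (end - start) / 1024; }
    constexpr bool in_non_linear_window() const { return end <= kLinearStart; }
};
```

For example, the executable segment of libsolaris.so (0x100000009000 to 0x10000009c000) comes out at 588 kB, and the 1020 kB read-write region at 0x200000002000 sits well inside the non-linear window.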