Chapter 7: Non-contiguous Memory Allocation

  • When dealing with large amounts of memory, it is better to use physically contiguous pages where possible, both for caching and for memory access latency.

  • Unfortunately this is not always possible due to external fragmentation problems with the binary buddy allocator (LS - I thought internal fragmentation was more the problem for the binary buddy system? Also is this the only cause of non-contiguous pages?)

  • Linux provides a means for non-contiguous physical memory to be used in contiguous virtual memory via vmalloc().

  • When vmalloc() is used, an area is reserved in the virtual address space between VMALLOC_START and VMALLOC_END - the location of VMALLOC_START varies depending on the amount of available physical memory, but the region will be at least VMALLOC_RESERVE (aliased to __VMALLOC_RESERVE) in size, which on i386 is 128MiB - the exact size of the region was discussed in more detail in section 4.1

  • The page tables in this region are adjusted as needed to point to physical pages which are allocated with the normal physical page allocator - this means that each allocation has to be a multiple of the hardware page size.

  • Because allocating this way requires modifying the kernel page tables, and only the virtual address space between VMALLOC_START and VMALLOC_END is available for this, there is a limit on how much memory can be mapped with vmalloc().

  • Because of the limited nature of vmalloc(), it is used sparingly in the core kernel - in fact in 2.4.22 it is only used for storing swap map information (see chapter 11) and for loading kernel modules into memory.
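The page-size rounding mentioned above can be sketched in plain userspace C. This is only an illustration: the 4KiB PAGE_SIZE matches i386, and pages_needed() is a hypothetical helper that does not exist in the kernel.

```c
#include <assert.h>

/* Assumed for illustration: 4KiB pages, as on i386. */
#define PAGE_SIZE  4096UL
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Round a byte count up to the next multiple of PAGE_SIZE. */
#define PAGE_ALIGN(size) (((size) + PAGE_SIZE - 1) & PAGE_MASK)

/* Hypothetical helper: how many whole pages back a request of 'size' bytes. */
static unsigned long pages_needed(unsigned long size)
{
        /* The rounding below must not wrap around. */
        assert(size <= ~0UL - (PAGE_SIZE - 1));
        return PAGE_ALIGN(size) / PAGE_SIZE;
}
```

Under these assumptions a request for 10000 bytes is rounded up to 12288 bytes, so it is backed by three pages.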

7.1 Describing Virtual Memory Areas

  • The vmalloc address space is managed with a 'resource map allocator'. struct vm_struct is used for storing the base, size pairs for this allocator and is defined as follows:
struct vm_struct {
        unsigned long flags;
        void * addr;
        unsigned long size;
        struct vm_struct * next;
};
  • In theory a fully-fledged VMA could have been used, but a VMA contains extra information that doesn't apply to vmalloc areas, so doing that would be wasteful.

  • Looking at each field:

  1. flags - Either set to VM_ALLOC if used with vmalloc() or VM_IOREMAP when ioremap() is used to map high memory into the kernel virtual address space.

  2. addr - Starting address of the memory block.

  3. size - Size in bytes.

  4. next - Pointer to the next vm_struct. These are ordered by address and the list is protected by the vmlist_lock lock.

  • Each area is separated by at least one page to protect against overruns:
------------------------------------------------------/\/\
|  vmalloc   | Page |  vmalloc   | Page |  vmalloc   |    |
| Allocation | Gap  | Allocation | Gap  | Allocation |    |
------------------------------------------------------/\/\
^                                                         ^
| VMALLOC_START                               VMALLOC_END |
  • When the kernel wants to allocate a new area, the vm_struct list is searched linearly via get_vm_area().

  • Space for the struct is allocated with kmalloc().

  • When the virtual area is used for remapping an area for I/O (known as 'ioremapping'), get_vm_area() will be called directly to map the requested area.
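The linear search and the one-page gap described in this section can be sketched as a userspace simulation. It is not the kernel's code: struct sim_area, sim_get_vm_area() and the SIM_ constants are made-up names, addresses are plain integers, malloc() stands in for kmalloc(), and no locking is shown.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE          4096UL
#define SIM_VMALLOC_START  0x10000000UL  /* made-up bounds: a 16-page region */
#define SIM_VMALLOC_END    0x10010000UL
#define SIM_VM_ALLOC       0x2UL         /* illustrative flag value */

/* Mirrors the fields of struct vm_struct, with addr kept as an integer. */
struct sim_area {
        unsigned long flags;
        unsigned long addr;
        unsigned long size;              /* includes the trailing gap page */
        struct sim_area *next;
};

static struct sim_area *sim_vmlist;      /* kept sorted by address */

/* First-fit search for 'size' bytes plus one gap page. NULL on failure. */
static struct sim_area *sim_get_vm_area(unsigned long size, unsigned long flags)
{
        struct sim_area **p, *tmp, *area;
        unsigned long addr = SIM_VMALLOC_START;

        assert((size & (PAGE_SIZE - 1)) == 0);   /* caller page-aligns */
        size += PAGE_SIZE;                       /* reserve the gap page */
        if (size < PAGE_SIZE || size > SIM_VMALLOC_END - SIM_VMALLOC_START)
                return NULL;

        for (p = &sim_vmlist; (tmp = *p) != NULL; p = &tmp->next) {
                if (addr + size <= tmp->addr)
                        break;                   /* hole before tmp is big enough */
                if (tmp->addr + tmp->size > addr)
                        addr = tmp->addr + tmp->size;
        }
        if (addr > SIM_VMALLOC_END - size)
                return NULL;                     /* ran out of address space */

        area = malloc(sizeof(*area));
        if (!area)
                return NULL;
        area->flags = flags;
        area->addr = addr;
        area->size = size;
        area->next = *p;                         /* insert in address order */
        *p = area;
        return area;
}
```

Because each recorded size includes the extra page, two consecutive areas never touch: the second one starts one page after the usable end of the first.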

7.2 Allocating a Non-contiguous Area

  • Let's take a look at the vmalloc allocation API:
  1. vmalloc() - Allocate a number of pages in vmalloc space that satisfy the required size.

  2. vmalloc_dma() - Allocate a number of pages in vmalloc space from ZONE_DMA.

  3. vmalloc_32() - Allocates memory that is suitable for 32-bit addressing. This ensures that physical page frames are in ZONE_NORMAL which is required by 32-bit devices.

  • Each of these functions calls get_vm_area() to find a region large enough to store the requested amount of memory, searching through a linear linked-list of vm_structs and returning a new struct describing the allocated region.

  • Next they allocate the necessary PGD entries with vmalloc_area_pages() (and subsequently __vmalloc_area_pages()), PMD entries with alloc_area_pmd() and PTE entries with alloc_area_pte() before finally allocating the page with alloc_page().

  • The page table updated by vmalloc() is not that of the current process, but rather the reference page table stored at init_mm->pgd. As a result, a process accessing the vmalloc area will cause a page fault exception (its own page tables aren't pointing there) and special handling has to be performed. Diagrammatically:

-------------------                          -------------------
| Process A Calls |                          |    Process B    |
|    vmalloc()    |                          |   page faults   |
-------------------                          | do_page_fault() |
         |                                   -------------------     Process B Address
         v                                             |             Space managed by
-------------------   Reference Page Table             |            Process Page Tables
|  Reserve space  |   --------------------             |            -------------------
|   in Reference  |-\ |                  |             v            |                 |
|   Page Table    | | |                  |    -------------------   |                 |
------------------- | |                  |    |   Fault is in   |   |                 |
         |          | |                  |    |  vmalloc region |   |                 |
         |          | |                  |    -------------------   |                 |
         |          \>|------------------|             |            |                 |
         v            |                  |             |            |                 |
-------------------   |  Inserted Pages  |             |            |                 |
| After reserving |   |                  |             v            |                 |
| space, allocate |   |------------------|   --------------------   |-----------------|
|      pages      |   |                  |   |      Copy in     |   |                 |
-------------------   |   In Reference   |-->|  necessary page  |-->|   Copied Entry  |
   |                  |                  |   |    table entry   |   |                 |
   |                  |------------------|   |  from reference  |   |-----------------|
   |                  |                  |   --------------------   |                 |
   |                  |    Page Table    |                          |                 |
   |                  |                  |                          |                 |
   |     /----------->|------------------|                          |                 |
   |     |            |                  |                          |                 |
   |     |            |------------------|<--    VMALLOC_START   -->|                 |
   |     |            |                  |                          |                 |
   |     |            |                  |                          |                 |
   |     |            |                  |                          |                 |
   v     |            |                  |                          |                 |
-------------------   |                  |                          |                 |
| Buddy Allocator |   |------------------|<--     PAGE_OFFSET    -->|-----------------|
|  alloc_page()   |   |                  |                          |                 |
-------------------   |                  |                          |                 |
   ^      ^      ^    |    Userspace     |                          |    Userspace    |
   |      |      |    |     Portion      |                          |     Portion     |
------ ------ ------  |                  |                          |                 |
|page| |page| |page|  |                  |                          |                 |
------ ------ ------  --------------------<-- init_mm->pgd          -------------------
      Physically            Virtually
 Non-contiguous Pages   Contiguous Pages
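The fault path in the diagram can be sketched as a small userspace simulation. Everything here is an assumption made for illustration: the sim_ names, the region bounds, modelling a page table as a flat array of top-level entries, and copying at the top level only. The kernel's real handler works on real page tables from inside do_page_fault().

```c
#include <assert.h>
#include <stddef.h>

#define SIM_PTRS_PER_PGD   1024          /* i386 without PAE */
#define SIM_PGDIR_SHIFT    22            /* each top-level entry covers 4MiB */
#define SIM_VMALLOC_START  0xC8000000UL  /* made-up bounds for the sketch */
#define SIM_VMALLOC_END    0xFF000000UL

typedef unsigned long sim_pgd_t;         /* 0 means "not present" */

/* Stands in for the reference page table at init_mm->pgd. */
static sim_pgd_t sim_reference_pgd[SIM_PTRS_PER_PGD];

static unsigned int sim_pgd_index(unsigned long addr)
{
        return (unsigned int)((addr >> SIM_PGDIR_SHIFT) & (SIM_PTRS_PER_PGD - 1));
}

/*
 * Called when a process faults on 'addr'. If the address lies in the
 * vmalloc region and the reference table has an entry for it, copy that
 * entry into the faulting process's table and return 0. Return -1 if the
 * fault cannot be resolved this way.
 */
static int sim_vmalloc_fault(sim_pgd_t *process_pgd, unsigned long addr)
{
        unsigned int idx;

        assert(process_pgd != NULL);
        if (addr < SIM_VMALLOC_START || addr >= SIM_VMALLOC_END)
                return -1;               /* not a vmalloc address */

        idx = sim_pgd_index(addr);
        if (sim_reference_pgd[idx] == 0)
                return -1;               /* not mapped in the reference either */

        process_pgd[idx] = sim_reference_pgd[idx];
        return 0;
}
```

The point of the design is that vmalloc() only ever has to update one page table. Each process picks up the mapping lazily, the first time it touches the region.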

7.3 Freeing a Non-contiguous Area

  • The function vfree() is responsible for freeing a virtual area. It linearly scans the list of vm_structs looking for the appropriate region and then calls vmfree_area_pages() on the region to be freed.

  • vmfree_area_pages() is the exact opposite of vmalloc_area_pages() - it walks the page table and frees up PTEs and associated pages for the region rather than allocating them.
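The scan-and-unlink step of vfree() can be sketched in userspace as follows. The sim_ names are hypothetical, free() stands in for kfree(), locking is omitted, and sim_vmfree_area_pages() only counts the page-sized slots it would walk rather than touching any page tables.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE 4096UL

/* Mirrors the fields of struct vm_struct, with addr kept as an integer. */
struct sim_area {
        unsigned long flags;
        unsigned long addr;
        unsigned long size;
        struct sim_area *next;
};

static struct sim_area *sim_vmlist;      /* sorted by address */
static unsigned long sim_pages_walked;   /* bookkeeping for the sketch only */

/* Stand-in for vmfree_area_pages(): just records how much was walked. */
static void sim_vmfree_area_pages(unsigned long addr, unsigned long size)
{
        (void)addr;
        assert(size % PAGE_SIZE == 0);
        sim_pages_walked += size / PAGE_SIZE;
}

/* Returns 0 if an area starting at 'addr' was found and freed, else -1. */
static int sim_vfree(unsigned long addr)
{
        struct sim_area **p, *tmp;

        if (addr == 0 || (addr & (PAGE_SIZE - 1)))
                return -1;               /* areas always start on a page boundary */

        for (p = &sim_vmlist; (tmp = *p) != NULL; p = &tmp->next) {
                if (tmp->addr == addr) {
                        *p = tmp->next;  /* unlink from the list */
                        sim_vmfree_area_pages(tmp->addr, tmp->size);
                        free(tmp);
                        return 0;
                }
        }
        return -1;                       /* no area starts at this address */
}
```

Walking the list through a pointer-to-pointer lets the matching node be unlinked without tracking a separate "previous" pointer.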