Map kernel higher in virtual memory #1043

wkozaczuk · 2019-06-04T12:37:50Z

Currently the OSv kernel (loader.elf) is loaded at 0x200000 (2MiB) in physical memory and mapped 1:1 into virtual memory. This prevents OSv from executing non-PIE Linux executables, because the kernel code collides in virtual memory with those executables, which are typically linked to load at 0x400000 (4MiB). The main motivation is to support executing unmodified Linux apps like java, node and python, which come with tiny non-PIE bootstrap executables.

So to avoid the collision with non-PIE executables, we need to map the kernel as high as possible. Before going into possible solutions, here is what the memory layout in OSv looks like now:

 ---------  0x 0000 0000 0000 0000
 |       | 
 ---------  0x 0000 0000 0020 0000  elf_start     --\ 
 |       |                                          |- Kernel (Core ELF) - < 8MB 
 ---------  0x 0000 0000 00a0 0000  elf_start + elf_size 
 |       | 
 |--------  0x 0000 1000 0000 0000  program_base  - 16 T --\  
 |       |                                          |- s_program - 8G 
 |-------|  0x 0000 1002 0000 0000  --\           --X 
 |       |                             |- Program   | 
 |-------|  0x 0000 1004 0000 0000  --/             | 
 |       |                                          |- ELF Namespaces(max: 32) - 256G 
 |       |  ......................                  | 
 |       |                                          | 
 |-------|  0x 0000 1042 0000 0000                --/ 
 |       | 
 ---------  0x ffff 8000 0000 0000  phys_mem  --\ 
 |       |                                      |- Main Area - 16T 
 ---------  0x ffff 9000 0000 0000            --X 
 |       |                                      |- Page Area - 16T 
 ---------  0x ffff a000 0000 0000            --X 
 |       |                                      |- Mempool Area - 16T 
 ---------  0x ffff b000 0000 0000            --X 
 |       |                                      |- Debug Area - 80T 
 ---------  0x ffff ffff ffff ffff            --/
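
To make the arithmetic behind the diagram easy to check, here is a small sketch in Python. It only restates the boundary addresses from the diagram and confirms the region sizes written next to them. The `overlaps` helper and the 1 MiB size of the example executable are assumptions made for illustration; only the 4MiB link address comes from the description above.

```python
# Sizes in bytes; variable names mirror the labels in the diagram.
MiB = 1 << 20
GiB = 1 << 30
TiB = 1 << 40

elf_start = 0x0000_0000_0020_0000
elf_end = 0x0000_0000_00a0_0000      # elf_start + elf_size (upper bound in the diagram)
program_base = 0x0000_1000_0000_0000
phys_mem = 0xffff_8000_0000_0000

# Region sizes implied by the boundary addresses.
assert elf_start == 2 * MiB
assert elf_end - elf_start == 8 * MiB                         # Kernel (Core ELF)
assert program_base == 16 * TiB
assert 0x0000_1002_0000_0000 - program_base == 8 * GiB         # s_program
assert 0x0000_1042_0000_0000 - 0x0000_1002_0000_0000 == 32 * 8 * GiB  # ELF namespaces, 256G
assert 0xffff_9000_0000_0000 - phys_mem == 16 * TiB            # Main Area
assert (0xffff_ffff_ffff_ffff + 1) - 0xffff_b000_0000_0000 == 80 * TiB  # Debug Area


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open ranges [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


# Hypothetical non-PIE executable: linked at 4 MiB; the 1 MiB size is an assumption.
app_start = 4 * MiB
app_end = app_start + 1 * MiB
kernel_collides = overlaps(elf_start, elf_end, app_start, app_end)
print(kernel_collides)  # True
```

Since the kernel is mapped 1:1, its virtual range is the same 2MiB to 10MiB window, so any executable linked at 4MiB lands inside it.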

There are at least 3 ways to map the kernel higher:

1. Build the kernel ELF with -mcmodel=large, which allows placing the ELF anywhere high in the 64-bit virtual space. Currently the kernel is built with -mcmodel=small, the default model, which limits it to the 1st 2G of virtual space. Even though the large model is very attractive and seems to be the perfect solution, it comes with its own downsides: larger and less efficient code (for more information about the memory models please read this). Also, there is assembly code in OSv that does not seem to work with this model (for details please see the issues reported in this mailing list).
2. Build the kernel as a PIE (Position Independent Executable), which should also allow us to place it anywhere in virtual space. I am not sure how difficult this is or what it would entail.
3. Place the kernel ELF as high as possible in the 1st 2G of virtual space. That should leave enough room for most application non-PIEs. Obviously the kernel would no longer be mapped 1:1 as it is now. It would also be desirable to map it high as early as possible in the boot process. The biggest challenge seems to be figuring out the mapping scheme: where exactly in the 1st 2G would we place the kernel? The last 10MB? What about debug mode, where loader.elf is around 70-80M, and fs=ramfs images, which could be even bigger (perhaps map the ramfs part of loader.elf independently, even higher, above 2GB)?
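
To illustrate why the mapping scheme in the third option is not obvious, here is a rough sketch of how the highest possible base address depends on the kernel size. The function name `highest_kernel_base` and the 2 MiB alignment are assumptions made for the example, not something OSv defines; the 2G limit and the 10MB / 80M sizes are taken from the text above.

```python
MiB = 1 << 20
GiB = 1 << 30


def highest_kernel_base(kernel_size, limit=2 * GiB, align=2 * MiB):
    """Highest `align`-aligned base such that [base, base + kernel_size) fits below `limit`.

    The 2 MiB default alignment is an assumption chosen for illustration.
    """
    if kernel_size <= 0 or kernel_size > limit:
        raise ValueError("kernel does not fit below the limit")
    # Round down so the end of the kernel never crosses the limit.
    return (limit - kernel_size) // align * align


release_base = highest_kernel_base(10 * MiB)   # "last 10MB" case
debug_base = highest_kernel_base(80 * MiB)     # debug build of loader.elf
print(hex(release_base))  # 0x7f600000
print(hex(debug_base))    # 0x7b000000
```

The base moves with the image size, so a "last N MB" scheme would need either a link address chosen per build or a fixed base that leaves headroom for the largest expected image.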

Solution 3 seems to be the easiest and least risky one. Solution 2 is probably the most desirable, but also more difficult.

Please note this is related to the "umbrella" issue #190. There is also some relevant discussion on the mailing list: https://groups.google.com/forum/#!topic/osv-dev/hYOt5WIhTrM.


wkozaczuk · 2019-06-05T15:23:01Z

The key, not-so-trivial issue is removing the 1:1 mapping of loader.elf between physical and virtual memory. If we went with the 3rd option, we could even map the kernel at the beginning of the 2nd G, which I think should accommodate 99% of non-PIEs: by default most non-PIEs link at the 4th M and should not exceed 1G in size. If we simply pick 1G as the new start of the kernel, we still need to figure out when to set up the mapping properly. Also, currently we only map the 1st G early on, so we would have to enhance that to map the first 2G as well.
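
As a rough sketch of what dropping the 1:1 mapping means, the kernel's virtual and physical addresses would differ by a constant shift. The names `KERNEL_SHIFT`, `kernel_virt_to_phys`, `kernel_phys_to_virt` and `covered_by_early_mapping` below are hypothetical and only illustrate the arithmetic. The 2MiB load address and the 1G candidate base come from this thread, and the 8 MiB kernel size is the upper bound from the layout diagram.

```python
MiB = 1 << 20
GiB = 1 << 30

KERNEL_PHYS_BASE = 2 * MiB   # physical load address of loader.elf
KERNEL_VIRT_BASE = 1 * GiB   # candidate virtual start proposed above
KERNEL_SHIFT = KERNEL_VIRT_BASE - KERNEL_PHYS_BASE  # hypothetical constant


def kernel_virt_to_phys(vaddr):
    """Translate a kernel virtual address to physical under a constant shift."""
    return vaddr - KERNEL_SHIFT


def kernel_phys_to_virt(paddr):
    """Translate a kernel physical address to virtual under a constant shift."""
    return paddr + KERNEL_SHIFT


def covered_by_early_mapping(vaddr, size, mapped_limit):
    """True if [vaddr, vaddr + size) lies entirely below `mapped_limit`."""
    return vaddr + size <= mapped_limit


kernel_size = 8 * MiB  # upper bound taken from the layout diagram
print(hex(kernel_phys_to_virt(KERNEL_PHYS_BASE)))                        # 0x40000000
print(covered_by_early_mapping(KERNEL_VIRT_BASE, kernel_size, 1 * GiB))  # False
print(covered_by_early_mapping(KERNEL_VIRT_BASE, kernel_size, 2 * GiB))  # True
```

With the kernel starting at 1G, an early mapping that stops at the 1st G does not reach it, which is why the early mapping would have to be extended to the first 2G.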

wkozaczuk mentioned this issue Jun 5, 2019

Allow running a single unmodified regular (non-PIE) Linux executable #190

Closed

nyh closed this as completed in 2a1795d Jun 23, 2019
