Allow running a single unmodified regular (non-PIE) Linux executable #190

Closed
nyh opened this issue Feb 3, 2014 · 19 comments


@nyh
Contributor

nyh commented Feb 3, 2014

OSv currently can only run position-independent shared objects.

This means we usually can't run a random Linux application without recompiling it (with -fPIC -shared), unless we're lucky and the application is already mostly a .so (as is the case with the JVM).

I want to allow running a single position-dependent executable, while moving the rest of OSv to wherever this executable leaves room for it.

This will probably come with some strings attached: Obviously, only one such executable can be run at a time. Also, we may need to know at boot time about this executable. But I think this should be doable, and will make it really easy to create OSv images out of existing Linux software (e.g., Fedora RPMs) without any recompilation of anything.

@tzach
Member

tzach commented Feb 13, 2014

@nyh
Would it be possible to scan the executable for unimplemented system calls (e.g. fork)?
If it can be done as part of image building, it can fail gracefully and early.

@nyh
Contributor Author

nyh commented Feb 13, 2014

It is possible (and very easy) to know if a dynamically-linked executable
might fork, e.g. by using 'nm'. The problem, though, is that it might use fork in
an uninteresting code path (e.g. one taken only if some option is enabled, and you don't plan
to enable it). So you can have false positives.
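A minimal sketch of the `nm`-based check, assuming the usual `nm -D` layout in which undefined (imported) symbols have no address column and print as `U name` or `U name@VERSION`. The helper name `unsupported_imports` and the deny list are hypothetical; as noted above, a hit only means the symbol is referenced, not that the code path is ever taken.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: scan the text printed by `nm -D <binary>` and return the
// undefined (imported) symbols that appear in `unsupported`.
// Assumed nm layout: defined symbols are "<address> <type> <name>", while
// undefined ones have no address, i.e. "<type> <name>", and the name may carry
// a symbol-version suffix such as "@GLIBC_2.2.5".
std::vector<std::string> unsupported_imports(const std::string& nm_output,
                                             const std::set<std::string>& unsupported)
{
    std::vector<std::string> hits;
    std::istringstream lines(nm_output);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::vector<std::string> tok;
        for (std::string t; fields >> t;) {
            tok.push_back(t);
        }
        // Keep only "U" (undefined) and "w" (weak undefined) entries.
        if (tok.size() != 2 || (tok[0] != "U" && tok[0] != "w")) {
            continue;
        }
        std::string name = tok[1].substr(0, tok[1].find('@'));  // drop "@VERSION"
        if (unsupported.count(name) != 0) {
            hits.push_back(name);
        }
    }
    return hits;
}
```

Fed the output of `nm -D` for a binary that imports `fork`, the function returns `{"fork"}`; defined symbols such as `T main` are skipped because they carry an address column.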

@tzach
Member

tzach commented Feb 13, 2014

Sounds like we need a script which, given the binary code of a program,
decides whether the program finishes running or continues to run forever.
Shouldn't be too hard!

@glommer
Contributor

glommer commented Feb 13, 2014

IMHO, the way to do it is to just let the program run and abort on fork().
An external tool like capstan could have an optional check to see
if a binary is fully compliant.
If it is, fine. If it isn't, use it at your own risk.


@nyh
Contributor Author

nyh commented Feb 13, 2014

Good idea. I'll do it right after I figure out a way to square a circle :-)

@copumpkin

It is possible (and very easy) to know if a dynamically-linked executable
might fork, eg by using 'nm'. The problem though is it might use fork in
a non-interesting code path (eg if an option is enabled but you don't plan
to enable it). So you can have false positives.

Doesn't really affect the overall point, but you can also have false negatives, if someone forks directly via syscall without using the libc call.

@gebi

gebi commented Aug 17, 2015

which would be the case for all golang executables (they use the libc only for the resolver which is configurable, and AFAIK for getpwuid_r)

(Sorry for the noise; I've no idea why I got notified by GitHub for this thread, but since I already got the message I thought I could also reply.)

@elazarl
Contributor

elazarl commented Aug 17, 2015

@copumpkin you can still find that out by scanning the assembly instructions. This heuristic should work fine for "normal" programs without JIT or self-modifying code.

@copumpkin

@elazarl determining actual executable code to disassemble in a compiled binary is pretty hard to approximate, and generally impossible to get right :) either way, this discussion is entering the realm of impracticality

@elazarl
Contributor

elazarl commented Aug 17, 2015

objdump -d ELF|grep syscall is not always correct, but would work for many
"normal" programs.
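A rough byte-level variant of the `objdump -d ELF|grep syscall` idea, as a minimal sketch. Unlike objdump it does not disassemble at all: the hypothetical `find_syscall_opcodes` just looks for the two-byte x86-64 `syscall` encoding (0F 05) in a buffer, such as the bytes of a `.text` section, so copumpkin's caveat applies even more strongly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Crude heuristic: report every offset in a code buffer where the x86-64
// `syscall` opcode bytes 0F 05 appear. Because this is a byte search rather
// than a disassembly, a match may straddle two real instructions or sit inside
// an immediate (false positive), and code generated at run time, e.g. by a
// JIT, is never seen (false negative).
std::vector<std::size_t> find_syscall_opcodes(const std::vector<std::uint8_t>& code)
{
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i + 1 < code.size(); ++i) {
        if (code[i] == 0x0f && code[i + 1] == 0x05) {
            offsets.push_back(i);
        }
    }
    return offsets;
}
```

Knowing that a `syscall` instruction exists says nothing about which system call it makes; that depends on the value in `eax` at run time, which is why this only narrows the search rather than answering the fork question.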


@copumpkin

@elazarl I don't think we're disagreeing 😄

wkozaczuk changed the title from "Allow running a single unmodified Linux executable" to "Allow running a single unmodified regular (non-PIE) Linux executable" on Apr 14, 2019
@wkozaczuk
Collaborator

It looks like most of the discussion is about fork. I understand that fork is not supported on OSv and we have a stub that warns and returns -1 if called. Why is it important to detect whether a non-PIE uses fork? What am I missing?

I think the most important issue is that a non-PIE object would collide with the OSv kernel in memory. If that is true, wouldn't a solution be to map the kernel somewhere very high? More specifically, we would still load it at 0x200000 in physical memory, but instead of linearly mapping the first 1GB 1:1, we would map the kernel part (OSV_KERNEL_BASE) to the last 10 MB of virtual memory. Would that work in most cases, assuming most executables are typically loaded much lower?

Or am I missing some even more fundamental problem?

@nyh
Contributor Author

nyh commented Apr 14, 2019

@wkozaczuk you're not missing anything - this issue was hijacked ;-) It was supposed to be about being able to run a regular (non-PIE) executable, but then people started to wonder how we'd ever know whether this executable can be run, because maybe it uses fork() or other things we never implemented. But that's unrelated to the original issue...
I don't remember now the Linux layout and where normal Linux executables are mapped, but we need to figure this out (find some Linux documentation...) and put the OSv kernel (and possibly any other shared objects we later load) in a place that doesn't bother the executable. An unrelated problem we may have is TLS, perhaps the same problems PIE already has. I'm hoping we won't have additional problems, but am not sure.

@wkozaczuk
Collaborator

I spent some time researching the 64-bit part of the Linux kernel boot process, and here is what I found:

  • the entry point of the 64-bit vmlinux ELF is phys_startup_64, whose address is 0x1000000 (16MB); this is also where the first of the 4 LOAD segments is loaded in physical memory, per this:
readelf -s hello-vmlinux.bin | grep phys_startup
 52764: 0000000001000000     0 NOTYPE  GLOBAL DEFAULT  ABS phys_startup_64
readelf -l hello-vmlinux.bin

Elf file type is EXEC (Executable file)
Entry point 0x1000000
There are 5 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000200000 0xffffffff81000000 0x0000000001000000
                 0x0000000000b6e000 0x0000000000b6e000  R E    0x200000
  LOAD           0x0000000000e00000 0xffffffff81c00000 0x0000000001c00000
                 0x00000000000aa000 0x00000000000aa000  RW     0x200000
  LOAD           0x0000000001000000 0x0000000000000000 0x0000000001caa000
                 0x000000000001f6d8 0x000000000001f6d8  RW     0x200000
  LOAD           0x00000000010ca000 0xffffffff81cca000 0x0000000001cca000
                 0x0000000000125000 0x000000000040c000  RWE    0x200000
  NOTE           0x0000000000a031d4 0xffffffff818031d4 0x00000000018031d4
                 0x0000000000000024 0x0000000000000024         0x4

 Section to Segment mapping:
  Segment Sections...
   00     .text .notes __ex_table .rodata .pci_fixup __ksymtab __ksymtab_gpl __kcrctab __kcrctab_gpl __ksymtab_strings __param __modver
   01     .data __bug_table .vvar
   02     .data..percpu
   03     .init.text .altinstr_aux .init.data .x86_cpu_dev.init .parainstructions .altinstructions .altinstr_replacement .iommu_table .apicdrivers .exit.text .smp_locks .data_nosave .bss .brk
   04     .notes
  • phys_startup_64 is really just an alias for the startup_64 routine, whose address is 0xffffffff81000000 and which jumps to another routine, __startup_64. Among other things, __startup_64 fixes up the paging tables (at this point Linux really is in 64-bit mode, with the first 1GB mapped like in OSv), I think by pointing some part of virtual memory (possibly just kernel code) at 0xffffffff81000000. So it looks like the Linux kernel remaps itself while it is running:
ffffffff81000000     0 NOTYPE  GLOBAL DEFAULT    1 startup_64
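The offset implied by the readelf listing can be checked with plain arithmetic. This sketch copies the (VirtAddr, PhysAddr) pairs of three LOAD segments; `map_offset` is a hypothetical helper. The `.data..percpu` segment (VirtAddr 0) is left out because per-CPU data is linked at address 0 and presumably accessed as offsets, so it does not follow the same rule.

```cpp
#include <cassert>
#include <cstdint>

// (VirtAddr, PhysAddr) pairs copied from the `readelf -l hello-vmlinux.bin`
// LOAD segments; the zero-based .data..percpu segment is left out.
struct load_segment {
    std::uint64_t vaddr;
    std::uint64_t paddr;
};

// How far above its physical load address a segment is mapped.
constexpr std::uint64_t map_offset(load_segment s) { return s.vaddr - s.paddr; }

constexpr load_segment text_seg{0xffffffff81000000ULL, 0x0000000001000000ULL};  // .text ...
constexpr load_segment data_seg{0xffffffff81c00000ULL, 0x0000000001c00000ULL};  // .data ...
constexpr load_segment init_seg{0xffffffff81cca000ULL, 0x0000000001cca000ULL};  // .init.text ... .brk

// All three segments share one offset, so the image is mapped at
// physical + 0xffffffff80000000, i.e. into the top 2 GiB of the address space.
static_assert(map_offset(text_seg) == 0xffffffff80000000ULL, "text offset");
static_assert(map_offset(data_seg) == map_offset(text_seg), "data offset");
static_assert(map_offset(init_seg) == map_offset(text_seg), "init offset");
static_assert(std::uint64_t{0} - map_offset(text_seg) == 0x80000000ULL, "top 2 GiB");
```

In other words, the kernel image stays at 16MB physically, while its code runs 0xffffffff80000000 higher in virtual memory.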

So the question is: could we do something similar, where our loader.elf would still be loaded at 0x200000 in physical RAM, but start32 (and the new vmlinux_entry64) would load a slightly different version of the ident_pt_l4 paging tables, in which the kernel code portion (elf_start-elf_end) maps to the very high part of virtual memory (higher than ffffffff81000000)? I think the higher half of virtual memory in Linux is reserved for the kernel, so no Linux executable would ever collide with the OSv kernel (if that is the main problem). Also, OSV_KERNEL_BASE would need to match the new base (possibly ffffffff80200000 = ffffffff80000000 + 200000, or higher) so that gcc builds loader.elf with the higher addresses.

Is it as simple as that? Or would there be bigger ripple effects of changing OSV_KERNEL_BASE to a much higher address (for example, needing -mcmodel=large, as described here)?
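On the -mcmodel question: GCC documents x86-64 code models in which `small` requires code and data to be linked in the lower 2 GiB, `kernel` requires them to be in the negative (top) 2 GiB, which is the model Linux itself builds with, and `large` makes no assumptions. A minimal sketch (hypothetical `required_code_model`; `medium` is omitted because it constrains code and data separately) suggests that a kernel linked at ffffffff80200000 would fall in `-mcmodel=kernel` territory rather than needing `-mcmodel=large`. Whether OSv's assembly and linker script would build cleanly that way is not something this checks.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Sketch based on GCC's documented x86-64 code models:
//   small  - code and data must be linked in the lower 2 GiB
//   kernel - code and data must be linked in the top ("negative") 2 GiB
//   large  - no assumptions about addresses
// Returns the most restrictive of these that still covers [start, end).
std::string required_code_model(std::uint64_t start, std::uint64_t end)
{
    assert(start < end);
    constexpr std::uint64_t two_gib = 0x80000000ULL;
    constexpr std::uint64_t top_two_gib = 0xffffffff80000000ULL;  // -2 GiB
    if (end <= two_gib) {
        return "small";
    }
    if (start >= top_two_gib) {
        return "kernel";
    }
    return "large";
}
```

For example, a hypothetical kernel image spanning ffffffff80200000 to ffffffff81000000 yields "kernel", while one at the current 0x200000 yields "small".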

@wkozaczuk
Collaborator

I have experimented a bit to see whether my theory about moving the kernel away is right (what I found is not necessarily proof, just a good indication), and I managed to run a simple non-PIE executable:

#include <stdio.h>

int main(int argc, char* argv[]) {
   printf("Hello!\n");
}

built like so:

gcc hello.c -no-pie -o hello
readelf --segments hello

Elf file type is EXEC (Executable file)
Entry point 0x401040
There are 11 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x0000000000000268 0x0000000000000268  R      0x8
  INTERP         0x00000000000002a8 0x00000000004002a8 0x00000000004002a8
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x0000000000000438 0x0000000000000438  R      0x1000
  LOAD           0x0000000000001000 0x0000000000401000 0x0000000000401000
                 0x00000000000001bd 0x00000000000001bd  R E    0x1000
  LOAD           0x0000000000002000 0x0000000000402000 0x0000000000402000
                 0x0000000000000148 0x0000000000000148  R      0x1000
  LOAD           0x0000000000002e10 0x0000000000403e10 0x0000000000403e10
                 0x0000000000000220 0x0000000000000228  RW     0x1000
  DYNAMIC        0x0000000000002e20 0x0000000000403e20 0x0000000000403e20
                 0x00000000000001d0 0x00000000000001d0  RW     0x8
  NOTE           0x00000000000002c4 0x00000000004002c4 0x00000000004002c4
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_EH_FRAME   0x000000000000200c 0x000000000040200c 0x000000000040200c
                 0x000000000000003c 0x000000000000003c  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000002e10 0x0000000000403e10 0x0000000000403e10
                 0x00000000000001f0 0x00000000000001f0  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .interp 
   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt 
   03     .init .plt .text .fini 
   04     .rodata .eh_frame_hdr .eh_frame 
   05     .init_array .fini_array .dynamic .got .got.plt .data .bss 
   06     .dynamic 
   07     .note.ABI-tag .note.gnu.build-id 
   08     .eh_frame_hdr 
   09     
   10     .init_array .fini_array .dynamic .got 

with following changes to OSv:

diff --git a/Makefile b/Makefile
index a433074d..9b65b1cf 100644
--- a/Makefile
+++ b/Makefile
@@ -419,7 +419,7 @@ ifeq ($(arch),x64)
 
 # kernel_base is where the kernel will be loaded after uncompression.
 # lzkernel_base is where the compressed kernel is loaded from disk.
-kernel_base := 0x200000
+kernel_base := 0x2000000
 lzkernel_base := 0x100000
 
 $(out)/arch/x64/boot16.o: $(out)/lzloader.elf
diff --git a/core/elf.cc b/core/elf.cc
index ca9226f6..8be53b43 100644
--- a/core/elf.cc
+++ b/core/elf.cc
@@ -259,10 +259,10 @@ void file::load_elf_header()
     // ET_EXEC (ordinary, position-dependent executables) but it will require
     // loading them at their specified address and moving the kernel out of
     // their way.
-    if (_ehdr.e_type != ET_DYN) {
-        throw osv::invalid_elf_error(
-                "bad executable type (only shared-object or PIE supported)");
-    }
+    //if (_ehdr.e_type != ET_DYN) {
+    //    throw osv::invalid_elf_error(
+    //            "bad executable type (only shared-object or PIE supported)");
+    // }
 }
 
 void file::read(Elf64_Off offset, void* data, size_t size)
@@ -286,6 +286,13 @@ void* align(void* addr, ulong align, ulong offset)
 
 void object::set_base(void* base)
 {
+    printf("--> set_base called with base: %p\n", base);
+    if (_ehdr.e_type != ET_DYN) {
+        printf("--> Not a DYN -> ignoring passed in address and setting _base to 0x0\n");
+        _base = 0x0;
+        return;
+    }
+
     auto p = std::min_element(_phdrs.begin(), _phdrs.end(),
                               [](Elf64_Phdr a, Elf64_Phdr b)
                                   { return a.p_type == PT_LOAD
@@ -1186,6 +1193,7 @@ program::load_object(std::string name, std::vector<std::string> extra_path,
         trace_elf_load(name.c_str());
         auto ef = std::shared_ptr<object>(new file(*this, f, name),
                 [=](object *obj) { remove_object(obj); });
+        printf("Setting base for ELF %s\n", name.c_str());
         ef->set_base(_next_alloc);
         ef->setprivate(true);
         // We need to push the object at the end of the list (so that the main
diff --git a/include/osv/elf.hh b/include/osv/elf.hh
index 0f1792c7..dbd7b422 100644
--- a/include/osv/elf.hh
+++ b/include/osv/elf.hh
@@ -363,6 +363,7 @@ public:
     size_t initial_tls_size() { return _initial_tls_size; }
     void* initial_tls() { return _initial_tls.get(); }
     std::vector<ptrdiff_t>& initial_tls_offsets() { return _initial_tls_offsets; }
+    bool is_executable() { return _ehdr.e_type != ET_DYN; }
 protected:
     virtual void load_segment(const Elf64_Phdr& segment) = 0;
     virtual void unload_segment(const Elf64_Phdr& segment) = 0;

and yielded this output:

OSv v0.53.0-5-g053e0a40
--> set_base called with base: 0x2000000
--> Not a DYN -> ignoring passed in address and setting _base to 0x0
eth0: 192.168.122.15
Setting base for ELF /libvdso.so
--> set_base called with base: 0x100000000000
Setting base for ELF /hello
--> set_base called with base: 0x100000004030
--> Not a DYN -> ignoring passed in address and setting _base to 0x0
Hello!
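For reference, the distinction the patch keys on is the `e_type` field of the ELF header: `gcc -no-pie` produces `ET_EXEC` (readelf's "EXEC (Executable file)"), while PIEs and shared libraries are both `ET_DYN`. A minimal sketch using glibc's `<elf.h>` definitions; `elf_kind` is a hypothetical name, and it only handles 64-bit little-endian images, as on x86-64.

```cpp
#include <cassert>
#include <cstring>
#include <elf.h>
#include <string>
#include <vector>

// Classify an ELF64 image by e_type. Returns "EXEC" for a position-dependent
// executable, "DYN" for a PIE or shared object, "other" for other types, and
// "unsupported" for anything that is not a 64-bit little-endian ELF file.
// e_type is read in host byte order, so this assumes a little-endian host.
std::string elf_kind(const std::vector<unsigned char>& image)
{
    Elf64_Ehdr ehdr;
    if (image.size() < sizeof ehdr) {
        return "unsupported";
    }
    std::memcpy(&ehdr, image.data(), sizeof ehdr);  // avoids unaligned access
    if (std::memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        ehdr.e_ident[EI_CLASS] != ELFCLASS64 ||
        ehdr.e_ident[EI_DATA] != ELFDATA2LSB) {
        return "unsupported";
    }
    switch (ehdr.e_type) {
    case ET_EXEC: return "EXEC";
    case ET_DYN:  return "DYN";
    default:      return "other";
    }
}
```

Telling a PIE apart from an ordinary shared library needs more than the header (for example, looking at the dynamic section), which is why the original check in load_elf_header() accepts both.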

Obviously my patch assumes the ELF executable loads somewhere below 32MB (the updated kernel_base of 0x2000000), and I have a sense it has some other holes.
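A minimal sketch of the collision check this assumption implies, using the four PT_LOAD segments of `hello` from the readelf output above. The thread does not give the kernel's end address, so the 8 MiB kernel size used below is purely hypothetical, as are the names `segment` and `collides_with_kernel`; 4 KiB pages are assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One PT_LOAD program header: virtual address and size in memory.
struct segment {
    std::uint64_t vaddr;
    std::uint64_t memsz;
};

constexpr std::uint64_t page_size = 0x1000;  // assumes 4 KiB pages

// True if any PT_LOAD segment of a position-dependent executable, widened to
// whole pages as a loader would map it, overlaps [kernel_start, kernel_end).
bool collides_with_kernel(const std::vector<segment>& loads,
                          std::uint64_t kernel_start, std::uint64_t kernel_end)
{
    assert(kernel_start < kernel_end);
    return std::any_of(loads.begin(), loads.end(), [&](const segment& s) {
        std::uint64_t start = s.vaddr & ~(page_size - 1);
        std::uint64_t end = (s.vaddr + s.memsz + page_size - 1) & ~(page_size - 1);
        // Half-open ranges overlap iff each one starts before the other ends.
        return start < kernel_end && kernel_start < end;
    });
}
```

With kernel_base at 0x2000000, the hello segments (which end at 0x404038) stay clear; with the default 0x200000 and the hypothetical 8 MiB kernel, they overlap.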

I even tried a more complex example, python3x, where I replaced the python executable with the python3.6 one from my host (Ubuntu), which happens to be a regular (non-PIE) ELF executable (unlike 2.6, which is a PIE). This time the app actually hangs:

OSv v0.53.0-5-g053e0a40
--> set_base called with base: 0x2000000
--> Not a DYN -> ignoring passed in address and setting _base to 0x0
eth0: 192.168.122.15
Setting base for ELF /libvdso.so
--> set_base called with base: 0x100000000000
Setting base for ELF /python3
--> set_base called with base: 0x100000004030
--> Not a DYN -> ignoring passed in address and setting _base to 0x0
Setting base for ELF /usr/lib/libexpat.so.1
--> set_base called with base: 0

probably because of a hole in my patch that sets the _base of /usr/lib/libexpat.so.1 to 0x0 (which does not seem right).

What is interesting is that, per my patch, the _base of the kernel ELF itself is set to 0 as well, which I thought was wrong, so I tried another patch that I thought was more correct (elf.cc changes only):

diff --git a/core/elf.cc b/core/elf.cc
index ca9226f6..420a956a 100644
--- a/core/elf.cc
+++ b/core/elf.cc
@@ -259,10 +259,10 @@ void file::load_elf_header()
     // ET_EXEC (ordinary, position-dependent executables) but it will require
     // loading them at their specified address and moving the kernel out of
     // their way.
-    if (_ehdr.e_type != ET_DYN) {
-        throw osv::invalid_elf_error(
-                "bad executable type (only shared-object or PIE supported)");
-    }
+    //if (_ehdr.e_type != ET_DYN) {
+    //    throw osv::invalid_elf_error(
+    //            "bad executable type (only shared-object or PIE supported)");
+    // }
 }
 
 void file::read(Elf64_Off offset, void* data, size_t size)
@@ -286,6 +286,7 @@ void* align(void* addr, ulong align, ulong offset)
 
 void object::set_base(void* base)
 {
+    printf("--> set_base called with base: %p\n", base);
     auto p = std::min_element(_phdrs.begin(), _phdrs.end(),
                               [](Elf64_Phdr a, Elf64_Phdr b)
                                   { return a.p_type == PT_LOAD
@@ -1186,7 +1187,14 @@ program::load_object(std::string name, std::vector<std::string> extra_path,
         trace_elf_load(name.c_str());
         auto ef = std::shared_ptr<object>(new file(*this, f, name),
                 [=](object *obj) { remove_object(obj); });
-        ef->set_base(_next_alloc);
+        printf("Setting base for ELF %s\n", name.c_str());
+        if (ef->is_executable()) {
+            printf("--> Not a DYN -> ignoring passed in address and setting _base to 0x0\n");
+            ef->set_base(0x0);
+        }
+        else {
+            ef->set_base(_next_alloc);
+        }
         ef->setprivate(true);
         // We need to push the object at the end of the list (so that the main
         // shared object gets searched before the shared libraries it uses),

and the same hello EXEC that worked with the previous patch hung this time:

OSv v0.53.0-5-g053e0a40
--> set_base called with base: 0x2000000
eth0: 192.168.122.15
Setting base for ELF /libvdso.so
--> set_base called with base: 0x100000000000
Setting base for ELF /hello
--> Not a DYN -> ignoring passed in address and setting _base to 0x0
--> set_base called with base: 0

I wonder if there is some bug with setting the _base for the kernel ELF. I remember we had an issue related to this and had to create libvdso.so.

@wkozaczuk
Collaborator

With a slightly better patch:

diff --git a/core/elf.cc b/core/elf.cc
index ca9226f6..169a5519 100644
--- a/core/elf.cc
+++ b/core/elf.cc
@@ -259,10 +259,10 @@ void file::load_elf_header()
     // ET_EXEC (ordinary, position-dependent executables) but it will require
     // loading them at their specified address and moving the kernel out of
     // their way.
-    if (_ehdr.e_type != ET_DYN) {
-        throw osv::invalid_elf_error(
-                "bad executable type (only shared-object or PIE supported)");
-    }
+    //if (_ehdr.e_type != ET_DYN) {
+    //    throw osv::invalid_elf_error(
+    //            "bad executable type (only shared-object or PIE supported)");
+    // }
 }
 
 void file::read(Elf64_Off offset, void* data, size_t size)
@@ -286,6 +286,13 @@ void* align(void* addr, ulong align, ulong offset)
 
 void object::set_base(void* base)
 {
+    printf("--> set_base called with base: %p\n", base);
+    if (_ehdr.e_type != ET_DYN) {
+        printf("--> Not a DYN -> ignoring passed in address and setting _base to 0x0\n");
+        _base = 0x0;
+        return;
+    }
+
     auto p = std::min_element(_phdrs.begin(), _phdrs.end(),
                               [](Elf64_Phdr a, Elf64_Phdr b)
                                   { return a.p_type == PT_LOAD
@@ -1186,6 +1193,7 @@ program::load_object(std::string name, std::vector<std::string> extra_path,
         trace_elf_load(name.c_str());
         auto ef = std::shared_ptr<object>(new file(*this, f, name),
                 [=](object *obj) { remove_object(obj); });
+        printf("Setting base for ELF %s\n", name.c_str());
         ef->set_base(_next_alloc);
         ef->setprivate(true);
         // We need to push the object at the end of the list (so that the main
@@ -1201,7 +1209,8 @@ program::load_object(std::string name, std::vector<std::string> extra_path,
         _modules_rcu.assign(new_modules.release());
         osv::rcu_dispose(old_modules);
         ef->load_segments();
-        _next_alloc = ef->end();
+        if (!ef->is_executable())
+           _next_alloc = ef->end();
         add_debugger_obj(ef.get());
         loaded_objects.push_back(ef);
         ef->load_needed(loaded_objects);

I got python3 running as a non-PIE executable:

OSv v0.53.0-5-g053e0a40
--> set_base called with base: 0x2000000
--> Not a DYN -> ignoring passed in address and setting _base to 0x0
eth0: 192.168.122.15
Setting base for ELF /libvdso.so
--> set_base called with base: 0x100000000000
Setting base for ELF /python3
--> set_base called with base: 0x100000004030
--> Not a DYN -> ignoring passed in address and setting _base to 0x0
Setting base for ELF /usr/lib/libexpat.so.1
--> set_base called with base: 0x100000004030
Setting base for ELF /usr/lib/libz.so.1
--> set_base called with base: 0x1000000410a0
Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting base for ELF /lib/python3.6/lib-dynload/readline.cpython-36m-x86_64-linux-gnu.so
--> set_base called with base: 0x10000041c0b0
Setting base for ELF /usr/lib/libreadline.so.7
--> set_base called with base: 0x100000427490
Setting base for ELF /usr/lib/libtinfo.so.6
--> set_base called with base: 0x100000848c68
>>> print('Works!')
Works!
>>> 
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/lib/python3.6/site.py", line 441, in write_history
    readline.write_history_file(history)
OSError: [Errno 30] Read-only file system

@wkozaczuk
Collaborator

Java 11 worked as well:

OSv v0.53.0-5-g053e0a40
--> set_base called with base: 0x2000000
--> Not a DYN -> ignoring passed in address and setting _base to 0x0
eth0: 192.168.122.15
Setting base for ELF /libvdso.so
--> set_base called with base: 0x100000000000
Setting base for ELF /usr/lib/jvm/java/bin/java
--> set_base called with base: 0x100000004030
--> Not a DYN -> ignoring passed in address and setting _base to 0x0
Setting base for ELF /usr/lib/libz.so.1
--> set_base called with base: 0x100000004030
Setting base for ELF /usr/lib/jvm/java/lib/jli/libjli.so
--> set_base called with base: 0x100000415208
Setting base for ELF /usr/lib/jvm/java/lib/server/libjvm.so
--> set_base called with base: 0x10000080f6d8
Setting base for ELF /usr/lib/jvm/java/lib/libverify.so
--> set_base called with base: 0x100001ea7278
Setting base for ELF /usr/lib/jvm/java/lib/libjava.so
--> set_base called with base: 0x10000220dd48
Setting base for ELF /usr/lib/jvm/java/lib/libzip.so
--> set_base called with base: 0x10000262ab48
Setting base for ELF /usr/lib/jvm/java/lib/libjimage.so
--> set_base called with base: 0x100002a07430
Setting base for ELF /usr/lib/jvm/java/lib/libnio.so
--> set_base called with base: 0x100002e2cc78
Setting base for ELF /usr/lib/jvm/java/lib/libnet.so
--> set_base called with base: 0x1000032107f0
Hello, World!

with the same problem of hanging at the end as with a PIE, which I reported on the mailing list.

@wkozaczuk
Collaborator

@nyh So it seems there is no fundamental problem in supporting non-PIE executables, other than making sure they do not collide with the kernel.

What if we make the changes to the dynamic loader to accommodate a 0-based offset, as my research shows, and then, during the build process, verify that there is no collision between the executable's segment addresses in memory and the OSv kernel? The build could calculate a KERNEL_BASE value that avoids the collision and tell the user what to change KERNEL_BASE to in the Makefile. We should probably do the same collision check at runtime.
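The "calculate KERNEL_BASE" step could look roughly like this minimal sketch: take the highest end address of the executable's PT_LOAD segments and round it up. The helper names are hypothetical, and rounding to 2 MiB is an assumption made only because the default kernel_base (0x200000) is 2 MiB aligned. It also ignores anything else the executable might need above its segments.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct segment {
    std::uint64_t vaddr;  // PT_LOAD p_vaddr
    std::uint64_t memsz;  // PT_LOAD p_memsz
};

// Round x up to a multiple of a, where a is a power of two.
std::uint64_t align_up(std::uint64_t x, std::uint64_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

// Hypothetical build-time helper: the lowest kernel_base that places the
// kernel entirely above the executable's PT_LOAD segments, rounded up to
// 2 MiB (assumed here only because the default kernel_base is 2 MiB aligned).
std::uint64_t min_kernel_base(const std::vector<segment>& loads)
{
    assert(!loads.empty());
    std::uint64_t top = 0;
    for (const segment& s : loads) {
        top = std::max(top, s.vaddr + s.memsz);
    }
    return align_up(top, 0x200000);
}
```

For the hello binary above, the last segment ends at 0x404038, so this suggests a kernel_base of 0x600000.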

Also, I think in most cases the default offset of an executable is between 0x400000 and 0x500000, so users would not waste much memory. A bonus would be to finally fix arch_memory_setup to add the memory below the kernel's elf_start, to avoid any such waste.

In the future, if we manage to somehow map the kernel very high in virtual memory, this collision would never happen.

@wkozaczuk
Copy link
Collaborator

With the latest 2 patches I have recently sent:

  1. Allow running non-PIE executables that do not collide with kernel
  2. Start using memory below kernel

OSv fundamentally supports running single non-PIE executable as long as it does not collide with kernel.

The 1st patch enhances the OSv dynamic loader, and the 2nd one makes OSv use the physical memory below the kernel (typically 2M). The 2nd one seems unrelated; however, until we address #1043, it allows us to move kernel_base to an address high enough for at least some executables (tiny java or Node.js) not to collide with the kernel, without losing any memory. We could even enhance the makefile/build script to calculate the correct kernel_base for a given executable and relink the kernel accordingly. Obviously, moving the kernel higher in virtual memory is the most desirable solution to this collision problem.

wkozaczuk added a commit to wkozaczuk/osv that referenced this issue Jun 13, 2019
This patch provides necessary changes to OSv dynamic linker
to allow running non-PIEs (= Position Dependent Executables)
as long as they do not collide with kernel.

Please note this patch does not fully address issue cloudius-systems#190
though it provides necessary groundwork to fully address it in future.
To truly support running unmodified arbitrary non-PIEs
(that typically load at 0x400000 = 4th MB) we need
to modify OSv to load the kernel way higher than 0x200000.

There are 2 ways to run non-PIEs with OSv with this patch:

1) Link it by forcing text segment address so it does
   not collide with kernel using -Wl,-Ttext-segment,0x?????? option.

2) Change kernel_base in OSv makefile to a higher address that
   makes it not collide with the executable (for example 0xa00000);
   most non-PIEs by default load at 0x400000

Signed-off-by: Waldemar Kozaczuk <[email protected]>
nyh pushed a commit that referenced this issue Jun 23, 2019
This patch provides all necessary changes to move OSv kernel by 1 GiB higher
in virtual memory space to start at 0x40200000. Most changes involve adding
or subtracting 0x40000000 (OSV_KERNEL_VM_SHIFT) in all relevant places. Please
note that the kernel is still loaded at 2MiB in physical memory.

The motivation for this patch is to make as much space as possible (or just enough)
in virtual memory to allow running unmodified Linux non-PIE executables (issue #190).
Even though, due to the advancement of ASLR, more and more applications are PIEs (Position
Independent Executables), which are pretty well supported by OSv, there are still many
non-PIEs (Position Dependent Executables) out there. The most prominent one is
actually the JVM, most distributions of which come with a tiny (~20K) bootstrap java non-PIE
executable. There are many other examples where a small non-PIE executable loads
other shared libraries.

As issue #1043 explains, there are at least 3 possible solutions, and
this patch implements the 3rd (last) one described there. Please note that in the future,
with little effort, we could provide a slightly better scheme for OSV_KERNEL_VM_SHIFT
that would allow us to place the kernel even higher, at the end of the 2GiB limit
(small memory model), and thus support virtually any non-PIE built using the small memory model.

Due to its impact, this patch has been tested on the following hypervisors:
- QEMU without KVM
- QEMU with KVM
- Firecracker
- VirtualBox 6
- VMware Player
- XEN on EC2
- XEN locally in HVM mode

Fixes #1043

Signed-off-by: Waldemar Kozaczuk <[email protected]>
Message-Id: <[email protected]>
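The address arithmetic in the merged commit, written out as compile-time checks. This is a sketch: the constant values come from the commit message, while `kernel_phys_base`, `kernel_virt` and `kernel_phys` are hypothetical names, and OSv's real definitions may differ. It also shows how much room this leaves for a non-PIE linked at the usual 0x400000: 0x40200000 - 0x400000 = 0x3fe00000, just under 1 GiB.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative constants from the commit message; the real OSv code may
// define and name these differently.
constexpr std::uint64_t OSV_KERNEL_VM_SHIFT = 0x40000000;  // 1 GiB
constexpr std::uint64_t kernel_phys_base = 0x200000;       // still loaded at 2 MiB

// Hypothetical helpers translating between the kernel's physical load
// address and its (shifted) virtual address.
constexpr std::uint64_t kernel_virt(std::uint64_t phys) { return phys + OSV_KERNEL_VM_SHIFT; }
constexpr std::uint64_t kernel_phys(std::uint64_t virt) { return virt - OSV_KERNEL_VM_SHIFT; }

static_assert(kernel_virt(kernel_phys_base) == 0x40200000, "kernel now starts at 0x40200000");
// The start address is still below 2 GiB, the limit the small code model imposes.
static_assert(kernel_virt(kernel_phys_base) < 0x80000000, "within the lower 2 GiB");
```

A later scheme placing the kernel near the end of the 2 GiB limit would, in this model, simply mean a larger OSV_KERNEL_VM_SHIFT.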
nyh closed this as completed in 1e22a86 on Jun 23, 2019