support Windows Subsystem for Linux #1986

Closed
derekbruening opened this issue Aug 14, 2016 · 11 comments

Comments

@derekbruening
Contributor

The Windows 10 Anniversary Update introduced the new Windows Subsystem for Linux (WSL), in which a process runs under a subsystem that supports 64-bit Linux ELF binaries and Linux system calls.

Information on WSL:

https://blogs.msdn.microsoft.com/wsl/2016/06/08/wsl-system-calls/

  • Syscall interface is completely in the kernel
  • Only supports ELF64 -- no 32-bit support.
  • Uses "pico processes" which have no PEB or TEB

Looks like there is at least some procfs support:
https://blogs.msdn.microsoft.com/wsl/2016/06/15/wsl-file-system-support/

https://blogs.msdn.microsoft.com/wsl/2016/05/23/pico-process-overview/
Added: better fork support, 4KB-boundary mem mgmt, case-sensitive file names

To install: it's under "Turn Windows features on or off" and you then launch Bash.

Initial tests: DR crashes due to TLS problems (this one is trying late injection -- early or late makes no difference):

Program received signal SIGSEGV, Segmentation fault.
get_tls_thread_id () at /dynamorio_package/core/unix/os.c:2292
2292    /dynamorio_package/core/unix/os.c: No such file or directory.
(gdb) bt
#0  get_tls_thread_id () at /dynamorio_package/core/unix/os.c:2292
#1  0x00000000712d414f in get_thread_id () at /dynamorio_package/core/unix/os.c:2279
#2  0x000000007111974f in deadlock_avoidance_lock (lock=0x713d6180 <options_lock>, acquired=true, ownable=true)
    at /dynamorio_package/core/utils.c:574
#3  0x000000007111a1be in mutex_lock (lock=0x713d6180 <options_lock>) at /dynamorio_package/core/utils.c:885
#4  0x000000007111aa6e in write_lock (rw=0x713d6180 <options_lock>) at /dynamorio_package/core/utils.c:1217
#5  0x0000000071081130 in options_init () at /dynamorio_package/core/options.c:2002
#6  0x000000007109067d in dynamorio_app_init () at /dynamorio_package/core/dynamo.c:412
#7  0x00007f18a773082b in _init (argc=1, argv=0x7ffffec044f8, envp=0x7ffffec04508)
    at /dynamorio_package/core/unix/preload.c:175

(gdb) x/10i $pc
=> 0x712d417f <get_tls_thread_id+27>:   mov    %gs:0x70,%rax
   0x712d4188 <get_tls_thread_id+36>:   mov    %rax,-0x8(%rbp)
   0x712d418c <get_tls_thread_id+40>:   mov    -0x8(%rbp),%rax
   0x712d4190 <get_tls_thread_id+44>:   leaveq
   0x712d4191 <get_tls_thread_id+45>:   retq

read_thread_register returns non-zero:

(gdb) stepi
123     in /dynamorio_package/core/unix/tls.h
1: x/i $pc
=> 0x712d0968 <read_thread_register+35>:        mov    %gs,%eax
(gdb) stepi
0x00000000712d096a      123     in /dynamorio_package/core/unix/tls.h
1: x/i $pc
=> 0x712d096a <read_thread_register+37>:        mov    %eax,-0x4(%rbp)
(gdb) p /x $eax
$1 = 0x2b
(gdb) p /x $gs
$2 = 0x0
(gdb) info reg
rax            0x2b     43
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0

gdb thinks gs is 0, yet when I read it myself I get 0x2b.

os_tls_init for thread 6437
tls_get_fs_gs_segment_base selector 53 index 10 ldt 0
arch_prctl fs => 0x00007fffd14922a8
tls_get_fs_gs_segment_base selector 2b index 5 ldt 0
arch_prctl gs => 0x00007fffd14922a8
privload_tls_init: app TLS segment base is 0x00007fffd14922a8
privload_tls_init: allocated 8192 at 0x000000004eb3d000
privload_tls_init: adjust thread pointer to 0x000000004eb3e700
thread 6437 app lib tls base: 0x00007fffd14922a8, alt tls base: 0x00007fffd14922a8
thread 6437 priv lib tls base: 0x000000004eb3e700, alt tls base: 0x000000004eb39000, DR's tls base: 0x000000004eb39000
os_tls_init: cur gs base is 0x00007fffd1492370
os_tls_init: arch_prctl successful for base 0x000000004eb39000
SYSLOG_ERROR: Application /bin/ls (6437).  Internal Error: DynamoRIO debug check failure: /work/dr/git/src/core/unix/os.c:1804 is_thread_tls_initialized()

It looks like the ARCH_SET_GS returns 0 for success but does not actually do anything, as a subsequent ARCH_GET_GS gets the same value as before the set.

Same with FS:

arch_prctl get fs => 0x00007fffe9cc8570 res=0x0000000000000000
arch_prctl set fs 0x000000004d091000 => res=0x0000000000000000
arch_prctl get fs => 0x00007fffe9cc8570 res=0x0000000000000000

But how does native work then?
"strace ls" shows:

arch_prctl(ARCH_SET_FS, 0x7f995ac00840) = 0

@derekbruening
Contributor Author

It looks like ARCH_SET_GS does work, but ARCH_GET_GS (and _FS) is just plain broken and always returns the same, incorrect value. Combined with the selector not being set by the kernel, this is going to be a pain as we cannot query these segment bases. If we do not support attach and only support early injection we can use the cached value we observe ld.so passing to ARCH_SET_FS, making sure all uses of get_segment_base() use that. So it should be doable if we can figure out a sequence to determine whether our TLS is initialized. We'll have to see whether a new thread has any weirdness.

@derekbruening
Contributor Author

To identify uninitialized TLS, I was hoping to clear the selector and use
0x2b vs 0 as the signal, but it looks like the kernel sets it back to 0x2b
on a transition out of the kernel or something similar:

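     /* Read the gs selector, clear it, then read it back twice. */
     print_file(STDERR, "gs selector was "PFX"\n", read_thread_register(SEG_GS));
     byte *new_sel = 0;
     WRITE_DR_SEG(new_sel);
     print_file(STDERR, "gs selector is now "PFX"\n", read_thread_register(SEG_GS));
     /* The print above presumably entered the kernel to write its output, so
      * this final read happens after a kernel transition. */
     print_file(STDERR, "gs selector is "PFX"\n", read_thread_register(SEG_GS));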
=>
gs selector was 0x000000000000002b
gs selector is now 0x0000000000000000
gs selector is 0x000000000000002b

@derekbruening
Contributor Author

Xref two other platforms with problematic TLS: #1931, #1936. If we come up with a general solution we may want to consider applying it to all of them.

@derekbruening
Contributor Author

For now I put in detection of WSL and a message that it is unsupported.

@derekbruening
Contributor Author

I put in a hack that works for the main thread to see what other problems we hit, and it's just a few asserts (maps layout, and app state at the end that's probably related to the segment).

For subsequent threads: since we can't rely on the selector value or a kernel query, should we do a safe_read of gs:TLS_SELF_OFFSET (presumably we can't for the main thread, but it should be OK later)? We have no dcontext for a try/except and no linear address for safe_read, so we'd have to write our own asm safe_read and make sure master_signal_handler doesn't freak out with no dcontext for the safe read.

@derekbruening
Contributor Author

#2089 does solve some of the TLS issues here.

We need further work though to implement get_segment_base(). The plan is to add a safe read along the lines of #2089's to read DR's self and the app or priv lib self.

@derekbruening
Contributor Author

I tested the memtrace_x86_text client and it works running "ls" now (with maps query curiosities).

Dr. Memory, however, hangs, I think while reading the maps file; gdb does not operate very well here, so it is hard to be sure. So there is more work to do on maps queries.

@derekbruening
Contributor Author

An example maps issue:

<CURIOSITY : (iter.vm_end - iter.vm_start == ((((ptr_uint_t)size) + (((4*1024))-1)) & (~((ptr_uint_t)((4*1024))-1)))) in file /work/dr/git/src/core/unix/os.c line 7209
version 6.1.17013, custom build
-no_dynamic_options -code_api -stack_size 56K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct
0x0000000054402b30 0x00000000712e21e6
0x0000000054402c50 0x00000000712e27b0
0x0000000054402cb0 0x00000000712e33a9
0x0000000054402e30 0x000000007110dce2
0x0000000054402e60 0x0000000071105252
0x0000000054402f20 0x000000007110157e
0x0000000054402ff0 0x00000000543a6ec5
0x00007ffff8147060 0x00007fccd6798dd0>
<get_memory_info mismatch! (can happen if os combines entries in /proc/pid/maps)
        os says: 0x00007fccd6571000-0x00007fccd6773000 prot=0x00000000
        cache says: 0x00007fccd676f000-0x00007fccd6773000 prot=0x00000005
>

@derekbruening
Contributor Author

The Dr. Memory hang seems to have disappeared when I use the latest DR there: perhaps the hang was #2557.

Another problem, though, with the latest DR is this assert: core/unix/signal_linux_x86.c:530 sc->fpstate != NULL. Even when I add fp instructions to the mmx ones I can't get fpstate to be non-NULL. For now I'm downgrading the assert, but once we start testing signals on WSL we'll likely have to do more.

@derekbruening derekbruening self-assigned this Aug 24, 2017
@derekbruening
Contributor Author

Commits related to this issue:

  • 2016-08-28 322330e i#1986 WSL: add detection and unsupported message
  • 2016-12-06 826aae0 i#1986 WSL: use safe reads for get_segment_base() on WSL
  • 2017-08-24 b022ca9 i#1986 WSL: relax fpstate assert and unsupported syslog

@derekbruening
Contributor Author

The basics seem to work, so I'm closing this issue. Please open a new issue for any new specific problem.
