ERROR: IOError: stat: permission denied (EACCES) on udocker #34918

Closed
dr-br opened this issue Feb 28, 2020 · 73 comments

@dr-br

dr-br commented Feb 28, 2020

Intro+Relevance

udocker is a basic user tool for executing simple Docker containers in user space without requiring root privileges.
It is the only means of deploying containerized Jupyter+X on our supercomputers.
podman, docker, and singularity cannot be used on the server nodes (subuid/subgid issues…).

Problem

Current Julia versions (including the nightly build) fail to run within a udocker container. The container OSes tested are Ubuntu 18.04 and centos:latest.
Older Julia versions such as 1.0.5 work perfectly.
Tests with podman or docker are successful for all versions of Julia.

When I execute julia inside a udocker container I get the following error message (the same under ubuntu and centos):

./julia-1.3.1/bin/julia 
ERROR: IOError: stat: permission denied (EACCES) for file "/root/julia-1.3.1/bin/../etc/julia/startup.jl"
Stacktrace:
 [1] stat(::String) at ./stat.jl:69
 [2] isfile at ./stat.jl:311 [inlined]
 [3] load_julia_startup() at ./client.jl:314
 [4] exec_options(::Base.JLOptions) at ./client.jl:258
 [5] _start() at ./client.jl:460

Steps to reproduce

Install udocker

curl https://raw.githubusercontent.com/indigo-dc/udocker/devel/udocker.py > udocker
chmod u+rx ./udocker
./udocker install

Start ubuntu container

export PROOT_NO_SECCOMP=1
udocker pull ubuntu
udocker create --name=ubuntu ubuntu
udocker run --user=root --env="HOME=/root" --workdir="/root" ubuntu

Download and run Julia

Within the ubuntu container run:

apt update && apt install wget
wget https://julialang-s3.julialang.org/bin/linux/x64/1.3/julia-1.3.1-linux-x86_64.tar.gz
tar xvzf julia-1.3.1-linux-x86_64.tar.gz
./julia-1.3.1/bin/julia
@DilumAluthge
Member

DilumAluthge commented Feb 28, 2020

What happens if you start Julia with the flag --startup-file=no?

For example: (inside the container)

./julia-1.3.1/bin/julia --startup-file=no

@dr-br
Author

dr-br commented Feb 28, 2020

./julia-1.3.1/bin/julia --startup-file=no
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.3.1 (2019-12-30)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

ERROR: IOError: stat: permission denied (EACCES) for file "/root/.julia/logs"
Stacktrace:
 [1] stat(::String) at ./stat.jl:69
 [2] isdir at ./stat.jl:311 [inlined]
 [3] #mkpath#8(::UInt16, ::typeof(mkpath), ::String) at ./file.jl:217
 [4] mkpath at ./file.jl:215 [inlined]
 [5] setup_interface(::REPL.LineEditREPL, ::Bool, ::Any) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:859
 [6] #setup_interface#45(::Bool, ::Any, ::typeof(REPL.setup_interface), ::REPL.LineEditREPL) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:769
 [7] setup_interface at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:769 [inlined]
 [8] (::Pkg.var"#1#2")(::REPL.LineEditREPL) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/Pkg/src/Pkg.jl:432
 [9] __atreplinit(::REPL.LineEditREPL) at ./client.jl:338
 [10] #invokelatest#1 at ./essentials.jl:709 [inlined]
 [11] invokelatest at ./essentials.jl:708 [inlined]
 [12] _atreplinit at ./client.jl:345 [inlined]
 [13] (::Base.var"#770#772"{Bool,Bool,Bool,Bool})(::Module) at ./client.jl:381
 [14] #invokelatest#1 at ./essentials.jl:709 [inlined]
 [15] invokelatest at ./essentials.jl:708 [inlined]
 [16] run_main_repl(::Bool, ::Bool, ::Bool, ::Bool, ::Bool) at ./client.jl:366
 [17] exec_options(::Base.JLOptions) at ./client.jl:304
 [18] _start() at ./client.jl:460

[ Info: Disabling history file for this session

@DilumAluthge
Member

DilumAluthge commented Feb 28, 2020

Alright let’s try this: (inside the container)

./julia-1.3.1/bin/julia --startup-file=no --history-file=no

@Keno
Member

Keno commented Feb 28, 2020

It's a bit odd for stat to throw. Usually that only happens when the permissions are really odd. Can you show the permissions, as seen inside the container for the /root directory, the /root/.julia directory and the /root/julia-1.3.1/bin/../etc/julia directory?

@dr-br
Author

dr-br commented Feb 29, 2020

./julia-1.3.1/bin/julia --startup-file=no --history-file=no works without errors, but nothing can be installed:

julia> using Pkg
ERROR: IOError: stat: permission denied (EACCES) for file "/root/.julia/environments/v1.3"
Stacktrace:
 [1] stat(::String) at ./stat.jl:69
 [2] isdir at ./stat.jl:311 [inlined]
 [3] load_path_expand(::String) at ./initdefs.jl:241
 [4] load_path() at ./initdefs.jl:288
 [5] identify_package(::String) at ./loading.jl:219
 [6] identify_package(::Base.PkgId, ::String) at ./loading.jl:206
 [7] identify_package at ./loading.jl:200 [inlined]
 [8] require(::Module, ::Symbol) at ./loading.jl:882

No /root/.julia directory is created.

ls -la
total 180352
drwxr-xr-x  5 root root     4096 Feb 29 16:51 .
drwxr-xr-x 21 root root     4096 Feb 29 16:43 ..
-rw-------  1 root root     1460 Feb 29 16:52 .bash_history
-rw-r--r--  1 root root     3106 Apr  9  2018 .bashrc
drwxr-xr-x  3 root root     4096 Feb 29 16:50 .local
-rw-r--r--  1 root root      148 Aug 17  2015 .profile
-rw-r--r--  1 root root      248 Feb 29 16:50 .wget-hsts
drwxr-xr-x  7 root root     4096 Sep  9 21:08 julia-1.0.5
-rw-r--r--  1 root root 88706549 Sep 11 21:22 julia-1.0.5-linux-x86_64.tar.gz
drwxr-xr-x  8 root root     4096 Dec 30 22:12 julia-1.3.1
-rw-r--r--  1 root root 95929584 Dec 31 00:20 julia-1.3.1-linux-x86_64.tar.gz
ls -la /root/julia-1.3.1/bin/../etc/julia
total 12
drwxr-xr-x 2 root root 4096 Dec 30 22:12 .
drwxr-xr-x 3 root root 4096 Dec 30 22:12 ..
-rw-r--r-- 1 root root  162 Dec 30 22:12 startup.jl

With ./julia-1.0.5/bin/julia everything works as expected.

@DilumAluthge
Member

DilumAluthge commented Feb 29, 2020

First, try this: (inside the container)

export JULIA_DEPOT_PATH="$HOME/.julia:"
./julia-1.3.1/bin/julia --startup-file=no --history-file=no

If that does not work, then try this instead: (inside the container)

export JULIA_DEPOT_PATH="$HOME/.julia"
./julia-1.3.1/bin/julia --startup-file=no --history-file=no

@DilumAluthge
Member

DilumAluthge commented Feb 29, 2020

I have a few additional questions:

  1. Can you show the permissions (as seen inside the container) for the / directory?
  2. What do you get when you run whoami, groups, and id inside the container?
  3. Inside the container, start Julia with ./julia-1.3.1/bin/julia --startup-file=no --history-file=no, and then run the following commands in the Julia REPL and post the results:
    • julia> @show homedir()
    • julia> @show Base.DEPOT_PATH
    • julia> versioninfo(stdout; verbose = true)
  4. What happens if you delete your ~/.udocker directory (on the host machine) and then try again to reproduce?

@dr-br
Author

dr-br commented Mar 1, 2020

export JULIA_DEPOT_PATH="$HOME/.julia:"
./julia-1.3.1/bin/julia --startup-file=no --history-file=no

and

export JULIA_DEPOT_PATH="$HOME/.julia"
./julia-1.3.1/bin/julia --startup-file=no --history-file=no

both lead to ERROR: IOError: stat: permission denied (EACCES) for file "/root/.julia/environments/v1.3" when running using Pkg.

Permissions of /

ls -la /
total 72
drwxr-xr-x  21 root root 4096 Mar  1 08:15 .
drwxr-xr-x  21 root root 4096 Mar  1 08:15 ..
drwxr-xr-x   2 root root 4096 Feb 19 01:17 bin
drwxr-xr-x   2 root root 4096 Apr 24  2018 boot
drwxr-xr-x  19 root root 4400 Mar  1 07:52 dev
drwxr-xr-x  31 root root 4096 Mar  1 08:16 etc
drwxr-xr-x   2 root root 4096 Apr 24  2018 home
drwxr-xr-x   9 root root 4096 Mar  1 08:15 lib
drwxr-xr-x   2 root root 4096 Feb 19 01:15 lib64
drwxr-xr-x   2 root root 4096 Feb 19 01:14 media
drwxr-xr-x   2 root root 4096 Feb 19 01:14 mnt
drwxr-xr-x   2 root root 4096 Feb 19 01:14 opt
dr-xr-xr-x 298 root root    0 Mar  1 07:50 proc
drwx------   3 root root 4096 Mar  1 08:16 root
drwxr-xr-x   5 root root 4096 Feb 21 22:20 run
drwxr-xr-x   2 root root 4096 Feb 21 22:20 sbin
drwxr-xr-x   2 root root 4096 Feb 19 01:14 srv
dr-xr-xr-x  13 root root    0 Mar  1 07:50 sys
drwxr-xr-x   2 root root 4096 Mar  1 08:16 tmp
drwxr-xr-x  10 root root 4096 Feb 19 01:14 usr
drwxr-xr-x  11 root root 4096 Feb 19 01:17 var
root@desktop:~# whoami 
root
root@desktop:~# groups
root adm cdrom sudo audio dip video plugdev G111 G114 G121 G127 G133 G134 G1000 G1002
root@desktop:~# id
uid=0(root) gid=0(root) groups=0(root),4(adm),24(cdrom),27(sudo),29(audio),30(dip),44(video),46(plugdev),111(G111),114(G114),121(G121),127(G127),133(G133),134(G134),1000(G1000),1002(G1002)

and finally:

julia> @show homedir()
homedir() = "/root"
"/root"

julia> @show Base.DEPOT_PATH
Base.DEPOT_PATH = ["/root/.julia", "/root/julia-1.3.1/local/share/julia", "/root/julia-1.3.1/share/julia"]
3-element Array{String,1}:
 "/root/.julia"                       
 "/root/julia-1.3.1/local/share/julia"
 "/root/julia-1.3.1/share/julia"      

julia> versioninfo(stdout; verbose = true)
Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  uname: Linux 5.4.0-16-generic #19-Ubuntu SMP Wed Feb 26 18:35:11 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz: 
              speed         user         nice          sys         idle          irq
       #1  3800 MHz       5963 s         62 s       1828 s     220891 s          0 s
       #2  3800 MHz       6187 s         56 s       2048 s     215920 s          0 s
       #3  3801 MHz       6060 s        214 s       1964 s     219314 s          0 s
       #4  3800 MHz       6131 s        154 s       2024 s     219160 s          0 s
       #5  3808 MHz       6243 s         84 s       1872 s     220415 s          0 s
       #6  3808 MHz       5916 s          3 s       1852 s     221188 s          0 s
       #7  3800 MHz       5134 s         32 s       1829 s     221457 s          0 s
       #8  3800 MHz       5726 s         19 s       1886 s     220930 s          0 s
       
  Memory: 15.52651596069336 GB (10002.5703125 MB free)
  Uptime: 2303.0 sec
  Load Avg:  0.3193359375  0.29736328125  0.34814453125
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_DEPOT_PATH = /root/.julia:
  JULIA_DEPOT_PATH = /root/.julia:
  HOME = /root
  TERM = xterm-256color
  PATH = /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

@DilumAluthge
Member

Can you run println(mktempdir()) in the Julia REPL and post the results?

@dr-br
Author

dr-br commented Mar 1, 2020

./julia-1.3.1/bin/julia --startup-file=no --history-file=no

julia> println(mktempdir())
/tmp/jl_5ejX2C

@DilumAluthge
Member

DilumAluthge commented Mar 1, 2020

In the Julia REPL, try the following:

julia> rm("/root/.julia"; force = true, recursive = true)
julia> mkpath("/root/.julia")
julia> open("/root/.julia/foo.txt", "w") do io
              println(io, "hello world")
           end
julia> println(read("/root/.julia/foo.txt", String))
julia> println(realpath("/root/.julia/foo.txt"))

@DilumAluthge
Member

DilumAluthge commented Mar 1, 2020

Also:

  1. What happens if you delete your ~/.udocker directory (on the host machine) and then try again to reproduce?

Oh, and also:

julia> Base.stat("/")
julia> Base.stat("/root")
julia> Base.stat("/tmp")

Also, is there any chance that your host system is running SELinux?

@dr-br
Author

dr-br commented Mar 1, 2020

julia> rm("/root/.julia"; force = true, recursive = true)

There is no .julia dir, so I can't delete it


julia> mkpath("/root/.julia")
ERROR: IOError: stat: permission denied (EACCES) for file "/root/.julia"
Stacktrace:
 [1] stat(::String) at ./stat.jl:69
 [2] isdir at ./stat.jl:311 [inlined]
 [3] #mkpath#8(::UInt16, ::typeof(mkpath), ::String) at ./file.jl:217
 [4] mkpath(::String) at ./file.jl:215
 [5] top-level scope at REPL[2]:1

no ~/.julia created


When I manually create the directory with mkdir ~/.julia, file creation from Julia works!

julia> open("/root/.julia/foo.txt", "w") do io
                     println(io, "hello world")
                  end

julia> println(read("/root/.julia/foo.txt", String))
hello world


julia> println(realpath("/root/.julia/foo.txt"))
/root/.julia/foo.txt

julia> Base.stat("/")
StatStruct(mode=0o040755, size=4096)

julia> Base.stat("/root")
StatStruct(mode=0o040700, size=4096)

julia> Base.stat("/tmp")
StatStruct(mode=0o041777, size=2957312)

I always start with rm -rf ~/.udocker ;)


SELinux…I don't know. Tested so far on vanilla Ubuntu 19.10 and 20.04 hosts.

@dr-br
Author

dr-br commented Mar 1, 2020

Update:
On Red Hat Enterprise Linux Server release 7.7 host everything works as expected with ubuntu container.

@DilumAluthge
Member

DilumAluthge commented Mar 1, 2020

There is no .julia dir, so I can't delete it

You can run that command in Julia whether or not the file/directory exists. In Julia, if you run the rm function with force = true, it won’t throw an error if the file/directory doesn’t exist.

SELinux…I don't know.

You can run the following commands in bash to see if SELinux is enabled:

getenforce
sestatus

@DilumAluthge
Member

DilumAluthge commented Mar 1, 2020

This seems like a bug in one of the following:

  • Base.Filesystem.mkpath in Julia
  • Base.Filesystem.isdir in Julia
  • Base.Filesystem.stat in Julia
  • jl_stat in Julia
  • stat in libuv

@DilumAluthge
Member

What happens if you go into a brand-new container, enter a Julia REPL, and run this:

julia> Base.stat("/root/.julia")

@DilumAluthge
Member

Also, on one of the hosts that does have this error, can you download and extract Julia on the host (not inside udocker, not inside a container), open a Julia REPL, and try to reproduce this error?

That will help us figure out if the problem is with your host machine or with udocker.

@dr-br
Author

dr-br commented Mar 1, 2020

Inside a brand new container:

julia> Base.stat("/root/.julia")
ERROR: IOError: stat: permission denied (EACCES) for file "/root/.julia"
Stacktrace:
 [1] stat(::String) at ./stat.jl:69
 [2] top-level scope at REPL[1]:1

On the host (all machines in reach), I do not get any errors. Everything works as expected.
SELinux is disabled on my Ubuntu machines and on the cluster RHEL 7.7.

Have there been substantial changes between julia 1.0.5 and the following versions with respect to directory creation/stat/security/rights management?

@DilumAluthge
Member

DilumAluthge commented Mar 1, 2020

What about this: (inside a brand new container)

julia> run(`stat /root/.julia`)

I expect this to throw an error. The question is: will it throw an error because of stat: No such file or directory, or will it throw an error because of stat: Permission denied?

@DilumAluthge
Member

DilumAluthge commented Mar 1, 2020

Have there been substantial changes between julia 1.0.5 and the following versions with respect to directory creation/stat/security/rights management?

I'm not sure.

Is Julia 1.0.5 the most recent version that works for you? Do any of the Julia 1.1.x versions work?

In particular, if you can confirm that Julia 1.0.5 works for you, but Julia 1.1.0 gives this error, then we can try to look at the diff between those two versions.

@DilumAluthge
Member

Here's another thing to try. Inside a fresh container:

julia> Base.stat("/root/.foo")

julia> Base.stat("/root/foo")

@dr-br
Author

dr-br commented Mar 1, 2020

julia> run(`stat /root/.julia`)
stat: cannot stat '/root/.julia': No such file or directory
ERROR: failed process: Process(`stat /root/.julia`, ProcessExited(1)) [1]

Stacktrace:
 [1] pipeline_error at ./process.jl:525 [inlined]
 [2] #run#565(::Bool, ::typeof(run), ::Cmd) at ./process.jl:440
 [3] run(::Cmd) at ./process.jl:438
 [4] top-level scope at REPL[5]:1
julia> Base.stat("/root/.foo")
ERROR: IOError: stat: permission denied (EACCES) for file "/root/.foo"
Stacktrace:
 [1] stat(::String) at ./stat.jl:69
 [2] top-level scope at REPL[6]:1

julia> Base.stat("/root/foo")
ERROR: IOError: stat: permission denied (EACCES) for file "/root/foo"
Stacktrace:
 [1] stat(::String) at ./stat.jl:69
 [2] top-level scope at REPL[7]:1

@dr-br
Author

dr-br commented Mar 1, 2020

I tried different versions:

  • julia-1.0.5 OK
  • julia-1.1.1 OK
  • julia-1.2.0 OK
  • julia-1.3.1 ERROR
  • julia-ea067fb221 ERROR

@DilumAluthge
Member

Can you try julia-1.3.0 also?

@DilumAluthge
Member

DilumAluthge commented Mar 1, 2020

@Keno does the fact that

run(`stat /root/.julia`)

gives "no such file or directory" (the correct answer), but

Base.stat("/root/.julia")

gives "permission denied (EACCES)" suggest that this is a Julia bug or a libuv bug (rather than a problem with the system)?

@dr-br
Author

dr-br commented Mar 1, 2020

  • julia-1.3.0 ERROR

@DilumAluthge
Member

So, if you are able to build Julia from source inside the container, the best option will be to git bisect between 1.2.0 and 1.3.0 to figure out which commit introduces this bug.

@DilumAluthge
Member

The script for the git bisect can probably be as simple as:

make
rm -rf /root/.julia
./julia -e 'Base.stat("/root/.julia")'

@DilumAluthge
Member

Pass that script to git bisect run with the tags v1.2.0 and v1.3.0 as the known good and bad commits, respectively. Then the bisect should automatically find which commit introduced this bug.

@Keno
Member

Keno commented Mar 2, 2020

To add the reference: as I suspected, there is no documented difference between stat and statx with respect to EACCES errors: http://man7.org/linux/man-pages/man2/statx.2.html. @dr-br could you show the output of mount inside the container, so we know what kind of car we're dealing with?

@dr-br
Author

dr-br commented Mar 2, 2020

mount inside container:

root@myMachine:~# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=5988288k,nr_inodes=1497072,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=1206384k,mode=755)
/dev/mapper/vgubuntu-root on / type ext4 (rw,relatime,errors=remount-ro)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=44,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=20854)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
/var/lib/snapd/snaps/telegram-desktop_1234.snap on /snap/telegram-desktop/1234 type squashfs (ro,nodev,relatime)
/var/lib/snapd/snaps/core18_1668.snap on /snap/core18/1668 type squashfs (ro,nodev,relatime)
/var/lib/snapd/snaps/core_8689.snap on /snap/core/8689 type squashfs (ro,nodev,relatime)
/var/lib/snapd/snaps/gnome-logs_81.snap on /snap/gnome-logs/81 type squashfs (ro,nodev,relatime)
/var/lib/snapd/snaps/gnome-3-28-1804_116.snap on /snap/gnome-3-28-1804/116 type squashfs (ro,nodev,relatime)
/var/lib/snapd/snaps/gtk-common-themes_1440.snap on /snap/gtk-common-themes/1440 type squashfs (ro,nodev,relatime)
/var/lib/snapd/snaps/gnome-calculator_544.snap on /snap/gnome-calculator/544 type squashfs (ro,nodev,relatime)
/var/lib/snapd/snaps/blender_36.snap on /snap/blender/36 type squashfs (ro,nodev,relatime)
/var/lib/snapd/snaps/skype_112.snap on /snap/skype/112 type squashfs (ro,nodev,relatime)
/var/lib/snapd/snaps/telegram-desktop_1244.snap on /snap/telegram-desktop/1244 type squashfs (ro,nodev,relatime)
/var/lib/snapd/snaps/chromium_1028.snap on /snap/chromium/1028 type squashfs (ro,nodev,relatime)
/var/lib/snapd/snaps/code-insiders_375.snap on /snap/code-insiders/375 type squashfs (ro,nodev,relatime)
/var/lib/snapd/snaps/gnome-characters_399.snap on /snap/gnome-characters/399 type squashfs (ro,nodev,relatime)
/var/lib/snapd/snaps/signal-desktop_299.snap on /snap/signal-desktop/299 type squashfs (ro,nodev,relatime)
/dev/sda2 on /boot type ext4 (rw,relatime)
/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
me@otherMachine: on /mnt/lsdf type fuse.sshfs (rw,relatime,user_id=0,group_id=0,allow_other)
/var/lib/snapd/snaps/code-insiders_376.snap on /snap/code-insiders/376 type squashfs (ro,nodev,relatime)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=1206380k,mode=700,uid=1000,gid=1000)
gvfsd-fuse on /run/user/1000/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)

@Keno
Member

Keno commented Mar 2, 2020

That's inside the container? Looks more like a host system. I wonder if it's picking up the outside world instead.

@dr-br
Author

dr-br commented Mar 2, 2020

To recall this:

Update:
On Red Hat Enterprise Linux Server release 7.7 host everything works as expected with ubuntu container.

The problem seems to be the combination of an Ubuntu host and udocker.
So I'm not sure whether further digging into this problem is worth it. However, it is strange that Julia 1.2.0 and older work.

@Keno: Yes, udocker is not really what you expect, if you know docker or podman.

@DilumAluthge
Member

However, it is strange that julia 1.2.0 and older work.

In Julia 1.2.0 and older, the version of libuv bundled with Julia did not use statx. They used stat and/or lstat instead.

Starting with Julia 1.3.0, the version of libuv bundled with Julia uses statx.

@DilumAluthge
Member

DilumAluthge commented Mar 2, 2020

@dr-br So CentOS host and Ubuntu container is fine. But Ubuntu host and Ubuntu container has the problem.

I wonder: could you try Ubuntu host and CentOS container, and CentOS host and CentOS container?

I am curious: does this bug happen whenever the host Linux distro is the same as the container Linux distro? Or does it only happen for the very specific Ubuntu-Ubuntu combination?

@dr-br
Author

dr-br commented Mar 2, 2020

So I just built 1.5.0-DEV on our RHEL 7.7 Xeon Gold 6230 cluster inside an Ubuntu 18.04 udocker container. Works as expected, no errors.
Also the binary packages (1.3.1, ...) work.

@DilumAluthge: I already tried Ubuntu host and CentOS container, it gave the same errors.

@Keno
Member

Keno commented Mar 2, 2020

Might be a kernel bug in the Ubuntu host kernel

@dr-br
Author

dr-br commented Mar 2, 2020

@Keno: I would not dare to blame the Ubuntu host kernel. I mean, udocker does so many magic tricks. I would rather suspect that udocker is not tested well enough on "consumer OSes".

@Keno
Member

Keno commented Mar 2, 2020

Meh, we find kernel bugs about once or twice a month around these parts ;), but yes at this point this might be something to take up with the udocker developers.

@DilumAluthge
Member

It would be nice to have a MWE. Maybe a simple C program that makes both of the syscalls (stat versus statx).
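
A minimal sketch of what such a program could look like, assuming Linux with glibc 2.28 or newer (for struct statx in <sys/stat.h>) and kernel headers that define SYS_statx. The helper names errno_from_stat, errno_from_statx, and report are hypothetical, and this is not the pair of example programs tested later in this thread. It issues the raw statx syscall rather than the glibc wrapper, on the assumption that the wrapper may emulate statx with other calls and so hide what the kernel or sandbox returned.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>     /* stat(), struct statx, STATX_BASIC_STATS */
#include <sys/syscall.h>  /* SYS_statx */
#include <unistd.h>       /* syscall() */

/* Returns 0 if stat(2) succeeds on path, otherwise the errno it set. */
int errno_from_stat(const char *path)
{
    struct stat st;
    assert(path != NULL);
    return stat(path, &st) == 0 ? 0 : errno;
}

/* Returns 0 if the raw statx(2) syscall succeeds on path,
 * otherwise the errno it set. */
int errno_from_statx(const char *path)
{
    struct statx stx;
    long rc;
    assert(path != NULL);
    rc = syscall(SYS_statx, AT_FDCWD, path, 0, STATX_BASIC_STATS, &stx);
    return rc == 0 ? 0 : errno;
}

/* Prints the outcome of both calls for one path, so the results can be
 * compared on the host and inside the container. */
void report(const char *path)
{
    int e_stat = errno_from_stat(path);
    int e_statx = errno_from_statx(path);
    printf("stat(%s):  %s\n", path, e_stat ? strerror(e_stat) : "ok");
    printf("statx(%s): %s\n", path, e_statx ? strerror(e_statx) : "ok");
}
```

Calling report("/root/.julia") from a small main, once on the host and once inside the container, would show whether the two syscalls disagree for a path that does not exist. Both should report "No such file or directory" on a system where statx is fully available.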

@DilumAluthge
Member

@dr-br I notice that you have this in your workflow:

export PROOT_NO_SECCOMP=1

I wonder if that is relevant.

@DilumAluthge
Member

I also wonder if this issue is relevant: https://bugs.launchpad.net/ubuntu/+source/docker.io/+bug/1755250

@dr-br
Author

dr-br commented Mar 2, 2020

@dr-br I notice that you have this in your workflow:

export PROOT_NO_SECCOMP=1

I wonder if that is relevant.

Very likely. On the RHEL machines this is not necessary, only on the Ubuntu hosts.

@DilumAluthge
Member

What is the latest version of Ubuntu that you have access to?

@dr-br
Author

dr-br commented Mar 2, 2020

It seems to be statx inside container.
I compiled 2 example programs, one uses stat, the other statx.
On the host, they run fine; inside the container, only stat succeeds. Even on the RHEL system, statx fails:

./statx-example . 
statx(.) = -1
.: Function not implemented

The Ubuntu versions on which I ran all the above tests are 19.10 and 20.04. They behave the same.

@DilumAluthge
Member

DilumAluthge commented Mar 2, 2020

@Keno Is there any chance that in the Julia-specific fork of libuv, we could stop using statx?

Alternatively, we could ask upstream libuv to stop using statx, i.e. revert libuv/libuv#2184

@DilumAluthge
Member

DilumAluthge commented Mar 2, 2020

This is not unique to udocker. For example, this issue: docker/for-linux#208

statx syscalls are only allowed in privileged containers

@dr-br
Author

dr-br commented Mar 2, 2020

Funny thing: I don't even get the statx example compiled on the RHEL host, as there is currently a 3.10 kernel running ;)

@DilumAluthge
Member

I think that JuliaLang/libuv#7 will fix this.

@dr-br
Author

dr-br commented Mar 2, 2020

I think that JuliaLang/libuv#7 will fix this.

After a discussion with my colleague: Is it possible that libuv already has a fallback if statx is not available? How else could Julia run on a RHEL 7 host with a 3.10 kernel?
Could it be that on an Ubuntu host, Julia/libuv inside the container detects that statx is available, but udocker does not support it?

@DilumAluthge
Member

DilumAluthge commented Mar 3, 2020

Is it possible that libuv already has a fallback if statx is not available?

If your kernel is old and does not have statx, then it will return ENOSYS. If libuv detects that statx returned the ENOSYS errno, it will fall back to stat. You can see this code here:

As you can see in the code, the fallback only applies when the errno returned by statx is the ENOSYS errno.

If seccomp blocks your call to statx, then it will return some errno. The specific value of that errno is user-defined. If the errno is ENOSYS, then libuv will fall back to stat, as I described above. However, if the errno is not ENOSYS, then libuv will return that errno, i.e. there is no fallback to stat in that case.
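
The decision described above can be sketched roughly as follows. This is a simplified illustration, not libuv's actual source: the function stat_with_enosys_fallback, the statx_attempt_fn callback type, and the two simulate_* helpers are all hypothetical names introduced here so that the two cases (an old kernel versus a sandbox that rejects statx) can be exercised without a real container. EACCES is used for the sandbox case only because it is the error Julia reported in this issue.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical stand-in for "attempt statx": returns 0 on success,
 * otherwise a positive errno value. */
typedef int (*statx_attempt_fn)(const char *path);

/* Simulates an old kernel that does not implement statx at all. */
int simulate_statx_enosys(const char *path) { (void)path; return ENOSYS; }

/* Simulates a sandbox that rejects statx with EACCES. */
int simulate_statx_eacces(const char *path) { (void)path; return EACCES; }

/* Tries statx first and falls back to stat(2) only when statx reported
 * ENOSYS. Any other result, including success, is returned unchanged.
 * *used_fallback is set to 1 if stat(2) was called, otherwise 0. */
int stat_with_enosys_fallback(const char *path,
                              statx_attempt_fn try_statx,
                              int *used_fallback)
{
    struct stat st;
    int err;

    assert(path != NULL && try_statx != NULL && used_fallback != NULL);

    *used_fallback = 0;
    err = try_statx(path);
    if (err != ENOSYS)
        return err;      /* success, or an error that is passed through */

    *used_fallback = 1;  /* only ENOSYS reaches the plain stat() path */
    return stat(path, &st) == 0 ? 0 : errno;
}
```

With simulate_statx_enosys the call on "/" succeeds through the fallback, while with simulate_statx_eacces the caller receives EACCES even though a plain stat on the same path would have worked. That mirrors the symptom reported in this issue.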

@Keno
Member

Keno commented Mar 3, 2020

The errno returned by seccomp is user-defined. If it is not ENOSYS or something sensible, that is a bug in udocker or one of its dependencies.

@DilumAluthge
Member

The errno returned by seccomp is user-defined. If it is not ENOSYS or something sensible, that is a bug in udocker or one of its dependencies.

I’ve corrected my answer.

@DilumAluthge
Member

It seems to be statx inside container.
I compiled 2 example programs, one uses stat, the other statx.
On the host, they run fine; inside the container, only stat succeeds. Even on the RHEL system, statx fails:

./statx-example . 
statx(.) = -1
.: Function not implemented

The Ubuntu versions on which I ran all the above tests are 19.10 and 20.04. They behave the same.

@dr-br Can you post an issue on the udocker repo (https://github.com/indigo-dc/udocker) and include the code for your example programs? And cc me in the issue? Hopefully that will help us get things moving on the udocker end.

@Keno
Member

Keno commented Mar 3, 2020

Since this turned out not to be a julia issue, I'm gonna go ahead and close this. Discussion can continue here of course.
