Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout waiting for log entry "containerrootrc=ready" #196

Closed
phil294 opened this issue Nov 20, 2019 · 18 comments
Closed

Timeout waiting for log entry "containerrootrc=ready" #196

phil294 opened this issue Nov 20, 2019 · 18 comments
Labels

Comments

@phil294
Copy link

phil294 commented Nov 20, 2019

Hi! Not really sure whats happening here. Happens with either of xephyr, xpra, nxagent and hostidsplay. As if the docker image doesnt even get started. Here is a verbose log output upload: https://waritschlager.de/share/922650f43f77a6fe.txt

Seems to have broken with a x11docker update of last few weeks/months, as I did not have any such problems beforehand. I can cycle through recent commits to find the culprit tomorrow.

until then, LG

@mviereck mviereck added the bug label Nov 20, 2019
@mviereck
Copy link
Owner

Thank you for the bug report!

I found this error message:

/tmp/containerrootrc: 11: /tmp/containerrootrc: cannot create /x11docker/container.log: Permission denied

It is this line in containerrootrc:

exec 1>>/x11docker/container.log 2>&1

It only serves to redirect some log output. I'll think about a possible alternative.
The error message is confusing because the file exists for sure.

@phil294
Copy link
Author

phil294 commented Nov 20, 2019

I got around to check the commits already:
bcb791c is when the issue arose. The log says x11docker[14:18:12,517]: Waiting since 2s for /x11docker/containerrootrc.ready to have content, will wait up to 32000 seconds.. Do you need any more verbose logs from around this commit?

Part of the problem (seeing that this issue hasnt been opened yet) may be that I set the permissions on ~/.local and most other folders in $HOME to drwx------ 700. I think it makes the most sense as the contents of this folder arent any other user's business, be it system or human. This is what had had me open another issue once already (#131), where you fixed it by leveraging a /tmp file.

I also found that running with --cap-default fixes it. If you decide this is a wont-fix, I could personally live with using this flag, but I assume a proper error message would be great, at least.

In the end, I am not sure what exactly is causing this but I hope this will help you. Cheers

@phil294
Copy link
Author

phil294 commented Nov 20, 2019

It also did not work after a chmod -R o+rX ~/.cache, however one warning came up during this: chmod: changing permissions of '.cache/x11docker/x11docker-xfce-xfce4-terminal-55668200901/share/tini': Operation not permitted.

This is what the respective share folder looks like:

-rw-r--r-- 1 phi  phi   3054 Nov 20 14:14 container.CMD.sh
-rw-r--r-- 1 phi  phi   4412 Nov 20 14:14 containerrootrc
-rw-r--r-- 1 phi  phi      4 Nov 20 14:14 container.user
-rw-rw-rw- 1 phi  phi    293 Nov 20 14:14 environment
-rw-r--r-- 1 phi  phi      2 Nov 20 14:14 exitcode
-rw-r--r-- 1 phi  phi      0 Nov 20 14:14 journalctl.log
prw-rw-rw- 1 phi  phi      0 Nov 20 14:14 message.fifo
-rw-rw-rw- 1 phi  phi    704 Nov 20 14:14 stderr
-rw-rw-rw- 1 phi  phi      0 Nov 20 14:14 stdout
-rw-r--r-- 1 phi  phi     51 Nov 20 14:14 timetosaygoodbye
prw-rw-r-- 1 phi  phi      0 Nov 20 14:14 timetosaygoodbye.fifo
-rwxr-xr-x 1 root root     0 Nov 20 14:14 tini
-rw-r--r-- 1 phi  phi  69315 Nov 20 14:14 x11docker.log
-rw-r--r-- 1 phi  phi    104 Nov 20 14:14 Xclientcookie

which looks fine I think (?)

mviereck added a commit that referenced this issue Nov 20, 2019
@mviereck
Copy link
Owner

mviereck commented Nov 20, 2019

Thanks for your investigation!

Part of the problem (seeing that this issue hasnt been opened yet) may be that I set the permissions on ~/.local and most other folders in $HOME to drwx------ 700. I think it makes the most sense as the contents of this folder arent any other user's business, be it system or human. This is what had had me open another issue once already (#131), where you fixed it by leveraging a /tmp file.

Ah, yes! I already thought that this bug looks somehow familar. So I reintroduced it.

As a first fix that maybe helps I've changed the file permission of x11docker.log to 666. I doubt that this is enough, but it would be nice if you try it out.

I also found that running with --cap-default fixes it.

The point is that x11docker runs the container with docker option --cap-drop=ALL. That disallows a lot of root privileges. But containerrootrc tries to access x11docker.log as root in container and cannot supersede your 700 settings on host.

Adding the capability to supersede access permissions should solve the issue (compare man capabilities):

x11docker -- --cap-add DAC_OVERRIDE -- x11docker/xfce xfce4-terminal

I am currently not sure what would be the best solution.

  • Always adding capability DAC_OVERRIDE regardless of the setup. Easy, but a security impact on systems that would not need it.
  • Somehow checking for 700 and only adding DAC_OVERRIDE in that case. Adds some complexity in the code.
  • Avoiding container root access on host files. Some work, but probably the cleanest solution.

chmod -R o+rX ~/.cache

Sorry for asking, but what exactly does that do?

@phil294
Copy link
Author

phil294 commented Nov 20, 2019

I tried using the latest commit, still no change.

Nope, DAC_OVERRIDE on its own does not cut it, only cap-default.

I am not very sure on to the details of this. But option 1 and 2 do not sound quite elegant. And it would make sense to prevent root files in the host environment. Those are always a bummer when it comes to archives, searches, deletions etc. Like when you want to backup your home folder but get errors somewhere deep nested inside .local/share/x11docker, as it contains root files. Option 3 makes the most sense to me, but this is only intuition.

chmod -R o+rX ~/.cache
Sorry for asking, but what exactly does that do?

No worries – it means recursively giving read access (+4) to "other" users (not owner or group) to everthing inside .cache. X stands for folder execution rights (+1), so they will be able to list folder contents, but it does not apply to files (so no scripts can be run etc). + means "adding permissions", so if any were present beforehand, those are combined. To set them, use =.

@phil294
Copy link
Author

phil294 commented Nov 21, 2019

Ok wait, the recent commit might have solved this (but everything is kinda laggy now). Just so you dont spend too much unnecessary digging here. I'll get back to you later.

@mviereck
Copy link
Owner

Ok wait, the recent commit might have solved this (but everything is kinda laggy now).

Good! Maybe there is another timeout now. Please give me a fresh logfile and I'll look through it.

And it would make sense to prevent root files in the host environment. Those are always a bummer when it comes to archives, searches, deletions etc.

x11docker already avoids root files in HOME. root only appends lines to existing files. That does not change the file ownership. But this seems to be an issue with 700 for ~/.cache


I am surprised why you have a root-owned tini in the share folder:

-rwxr-xr-x 1 root root     0 Nov 20 14:14 tini

It should not be there. It is shared with the container with:

  --volume '/usr/bin/docker-init':'/usr/local/bin/tini':ro \

In my own x11docker cache tini does not appear.

@phil294
Copy link
Author

phil294 commented Nov 21, 2019

With latest commit:

  • Started with--cap-default from command line: Works fine*
  • Started normally: Does not start (described timeout)
  • Started from a noninteractive shell, with no special options (no cap-default), using a xfce hotkey shortcut on the host system: Works fine*. I did not expect that. It is the opposite behavior of a past issue Fails when run via xfce shortcuts #177 where it was the shortcut action that failed. - I checked the user from such a shortcut job, it is also the normal one, "phi". Maybe a tty issue again?
  • Started as above via xfce hotkey, but with --cap-default: Same behavior

With previous commit, before your latest fix (at 25b8034), everything seems the same as above. So nothing changed.

Here's a new verbose log file as requested, from latest commit's ./x11docker -v --xephyr x11docker/xfce xfce4-terminal, run from terminal: https://waritschlager.de/share/f43f639000f6f3c6.txt
Nothing seems to have changed much

* but is slow, like really laggy, vscode editor is pretty much unusable. But somehow now also older versions of x11docker are as slow, when they were smooth a few days ago(?!) I will investigate this further, you may consider those cases working perfectly for now

@phil294
Copy link
Author

phil294 commented Nov 21, 2019

I am surprised why you have a root-owned tini in the share folder

I thought I'd try to reproduce it by deleting the respective cache folder. But now when I run it (with cap-default), no folder is recreated inside .cache/x11docker anymore, so I cannot give you any more details.

@phil294
Copy link
Author

phil294 commented Nov 21, 2019

Okay, regarding the lagginess: This might not be related to this issue at all, but I'll post the info here anyway:

The lags got introduced with 5a35b81. Note that this is an earlier one than the one that broke everything described above. I only encounter those lags with Xephyr. The application I tested this with is xfce, with vscode as the application running inside. I dont know what is so special about it, but thunar for instance was not lagging. But maybe it's just not as resource intensive. Please tell me if you want more info here.

While digging, I also realized that nxagent and hostdisplay seem to behave more fluent than xephyr and xpra. I dont know how nxagent works, but with hostdisplay it makes sense as no key presses are proxied.

mviereck added a commit that referenced this issue Nov 21, 2019
@mviereck
Copy link
Owner

mviereck commented Nov 21, 2019

Thank you for your detailed investigation!

The lags got introduced with 5a35b81. Note that this is an earlier one than the one that broke everything described above. I only encounter those lags with Xephyr. The application I tested this with is xfce, with vscode as the application running inside. I dont know what is so special about it, but thunar for instance was not lagging.

Finally an issue that I can fix easily! In the commit you found I've enabled Xephyr option -glamor. From Xephyr -help:

-glamor              Enable 2D acceleration using glamor

glamor should help to speed up some things, but appearently it can be problematic. I've disabled it yet. --xephyr should not be laggy anymore.

While digging, I also realized that nxagent and hostdisplay seem to behave more fluent than xephyr and xpra

--hostdisplay is the fastest option because no additional X server is involved. Unfortunately it costs some container isolation. A malicious application could access your host system.
--xpra is the slowest option. But it is a preferred default of x11docker because it provides some nice features, e.g. graphical clipboard support.
Furthermore it is the only seamless solution for --gpu beside the insecure --hostdisplay.
However, if security/container isolation is not a concern, --hostdisplay --gpu is the fastest setup with the lowest overhead.

I thought I'd try to reproduce it by deleting the respective cache folder. But now when I run it (with cap-default), no folder is recreated inside .cache/x11docker anymore, so I cannot give you any more details.

The cache folder only exists while the container is running. If you don't have a cache folder while x11docker is running, something very basical goes wrong.
Your log shows valid pathes:

 /home/phi/.cache/x11docker/x11docker-xfce-xfce4-terminal-44151347946

I just tried to reproduce the 700 issue with chmod -R 700 ~/.cache/x11docker.
Surprisingly I have no issues at all and cannot reproduce your issue. x11docker just starts up well. I'll look closer how to reproduce it.


Started from a noninteractive shell, with no special options (no cap-default), using a xfce hotkey shortcut on the host system: Works fine*. I did not expect that. It is the opposite behavior of a past issue #177 where it was the shortcut action that failed. - I checked the user from such a shortcut job, it is also the normal one, "phi". Maybe a tty issue again?

That's odd. I doubt that it is a tty issue again. I would see that in the log.
Maybe you have two x11docker on your system, e.g. one in a cloned git folder and one in /usr/bin?


Here's a new verbose log file as requested, from latest commit's ./x11docker -v --xephyr x11docker/xfce xfce4-terminal, run from terminal: https://waritschlager.de/share/f43f639000f6f3c6.txt
Nothing seems to have changed much

The log shows version 6.4.0, but the master version with recent commits should show:

$ x11docker --version
6.4.1-beta

@phil294
Copy link
Author

phil294 commented Nov 23, 2019

Huh, sorry - I dont know how the version mismatch happened. I redid it with the latest commit from yesterday, for sure this time: https://waritschlager.de/share/32492a8420106529.txt

That's odd. I doubt that it is a tty issue again. I would see that in the log.
Maybe you have two x11docker on your system, e.g. one in a cloned git folder and one in /usr/bin?

No, this is not the case. xfce shortcut and interactive bash definitely behave differently (one working, the other not). I removed all the times and pids from it with some wild regex and skipped display numbers etc. and below are the notable log output differences when run as xfce shortcut. As expected, the only real difference seems to be that the container.log permission error is gone.

8a9
> DEBUGNOTE: check_host(): Command tty failed. Guess if running on console: no
45a47
> DEBUGNOTE: check_host(): Command tty failed. Guess if running on console: no
124c126
<   Running in a terminal:                          yes
---
>   Running in a terminal:                          no
371,377d372
< grep -x -q 'x11docker/xfce' < /home/phi/.cache/x11docker/docker.imagelist || grep -x -q 'x11docker/xfce:latest' < /home/phi/.cache/x11docker/docker.imagelist || {
<   docker inspect x11docker/xfce >>/home/phi/.cache/x11docker/x11docker-xfce-/share/container.log 2>&1 || {
<     echo 'Image x11docker/xfce not found locally.' >&2
<     echo 'Do you want to pull it from docker hub?' >&2
<     askyesno && Dockerpull=yes || error "Image 'x11docker/xfce' not available locally and not pulled from docker hub."
<   }
< }
382a378
>   env DISPLAY=':0.0' DBUS_SESSION_BUS_ADDRESS='unix:path=/run/user/1000/bus' bash           -c "notify-send 'x11docker: Pulling image x11docker/xfce from docker hub'" 2>/dev/null &
1475c1471,1495
< /tmp/containerrootrc: 11: /tmp/containerrootrc: cannot create /x11docker/container.log: Permission denied
---
> mkdir: created directory '/var/run/dbus'
> mkdir: created directory '/tmp/.ICE-unix'
> mkdir: created directory '/tmp/.X11-unix'
> mkdir: created directory '/tmp/.font-unix'
> srwxrwxrwx 1 1000 1001 0 Nov 23 08:10 /X113
> 
> ==> /home/phi/.cache/x11docker/x11docker-xfce-/message.log <==
> DEBUGNOTE: Running containerrootrc: Setup as root in container
> 
> ==> /home/phi/.cache/x11docker/x11docker-xfce-/share/container.log <==
> lrwxrwxrwx 1 root root 5 Nov 23 08:10 /tmp -> /X113
> mkdir: created directory '/fakehome'
> 
> ==> /home/phi/.cache/x11docker/x11docker-xfce-/message.log <==
> DEBUGNOTE: containerrootrc: Container libc: glibc
> 
> ==> /home/phi/.cache/x11docker/x11docker-xfce-/share/container.log <==
> removed '/etc/shadow'
> 
> ==> /home/phi/.cache/x11docker/x11docker-xfce-/message.log <==
> x11docker: Container system ID: debian
> 
> 
> ==> /home/phi/.cache/x11docker/x11docker-xfce-/share/container.log <==
> chown: changing ownership of '/tmp/chowntestfile': Operation not permitted
1551d1572
< DEBUGNOTE: waitforlogentry(): tailstderr: Found log entry "x11docker=ready" in store.info.
1553,1592c1574,1586
< DEBUGNOTE: waitforlogentry(): containerrc: Waiting since 11s for log entry "containerrootrc=ready" in store.info
< DEBUGNOTE: waitforlogentry(): containerrc: Waiting since 12s for log entry "containerrootrc=ready" in store.info
...
---
> DEBUGNOTE: waitforlogentry(): tailstderr: Found log entry "x11docker=ready" in store.info.
> DEBUGNOTE: waitforlogentry(): containerrc: Found log entry "containerrootrc=ready" in store.info.

So it fails here: https://github.com/mviereck/x11docker/blob/master/x11docker#L4282. $(convertpath share $Containerlogfile), which resolves to /x11docker/container.log isnt accessible, because /x11docker itself cannot be traversed into. I put an ls -l / at that position and here is the output:

total 16
srwxrwxrwx   1 1000 1001    0 Nov 23 08:57 X120
drwxr-xr-x   2 root root 4096 Jul 14 08:49 bin
drwxr-xr-x   2 root root    6 May 13  2019 boot
drwxr-xr-x   5 root root  360 Nov 23 08:57 dev
drwxr-xr-x  40 root root 4096 Nov 23 08:57 etc
drwxr-xr-x   2 root root    6 May 13  2019 home
drwxr-xr-x   8 root root  107 Jul 14 08:49 lib
drwxr-xr-x   2 root root   34 Jul  8 03:30 lib64
drwxr-xr-x   2 root root    6 Jul  8 03:30 media
drwxr-xr-x   2 root root    6 Jul  8 03:30 mnt
drwxr-xr-x   2 root root    6 Jul  8 03:30 opt
dr-xr-xr-x 374 root root    0 Nov 23 08:57 proc
drwx------   2 root root   37 Jul  8 03:30 root
drwxr-xr-x   3 root root   60 Nov 23 08:57 run
drwxr-xr-x   2 root root 4096 Jul  8 03:30 sbin
drwxr-xr-x   2 root root    6 Jul  8 03:30 srv
dr-xr-xr-x  13 root root    0 Nov 23 08:57 sys
drwxrwxrwt   2 root root   29 Nov 23 08:57 tmp
drwxr-xr-x  10 root root  105 Jul  8 03:30 usr
drwxr-xr-x  11 root root  139 Jul  8 03:30 var
drwxrwx---   2 1000 1001 4096 Nov 23 08:57 x11docker
/tmp
ls: cannot access '/x11docker/container.log': Permission denied

and $PWD is /tmp and $USER is phi and $UID is empty and id says uid=0(root) gid=0(root) groups=0(root) and cat /etc/passwd also does not contain phi. So the user $USER doesnt exist..?!

On my host system, the the $UID is (as on most systems) 1000.

Thanks for removing the glamor option! Everything is smooth again.

The cache folder only exists while the container is running.

Oh, huh. The cache folders I described above were present without any container running. So I guess they were leftovers from failed run attempts. Should not matter, this doesnt happen anymore right now.

@mviereck
Copy link
Owner

No, this is not the case. xfce shortcut and interactive bash definitely behave differently (one working, the other not).

That's really odd. I have no good idea why there is a difference. That also indicates that 700 is not the core issue. As been said, I cannot reproduce the issue if I set my own cache to 700.
The only idea I have yet is some sort of >>redirection issue. But I would not know why it only happens in terminal, but not with a shortcut.

  echo "exec 1>>$(convertpath share $Containerlogfile) 2>&1"

Could you try to just disable this line? It only serves to redirect some output into the logfile, so it would not hurt essentially. Though, ls fails, too:

ls: cannot access '/x11docker/container.log': Permission denied

$PWD is /tmp and $USER is phi and $UID is empty and id says uid=0(root) gid=0(root) groups=0(root) and cat /etc/passwd also does not contain phi. So the user $USER doesnt exist..?!

The entries in /etc/passwd and etc/group are done shortly after that in containerrootrc.

The cache folders I described above were present without any container running. So I guess they were leftovers from failed run attempts. Should not matter, this doesnt happen anymore right now.

I also get those leftover folders. It seems x11docker does not get enough time to clean up if I shut down the system while x11docker is running. You can use sudo x11docker --cleanup to remove all leftovers (and currently running x11docker sessions).

@mviereck
Copy link
Owner

I did a test in an old Manjarao VM.
I've set ~/.cache/x11docker to 700. It works well.

I could not update Manjaro due to some package conflicts. But I assume that would not change anything.

So i cannot reproduce your issue and have no idea why x11docker fails on your system in terminal only.

Would it be ok for you to just use --cap-default and close the ticket?
Though, if you have an idea and want to investigate further, I am happy to look at, too.

@phil294
Copy link
Author

phil294 commented Nov 26, 2019

Sure. I'll get back to you when I come accross a meaningful hint. Thank you for the help!

@mviereck
Copy link
Owner

mviereck commented Dec 9, 2019

The latest commit runs containerrootrc with flag --privileged. You should not need --cap-default anymore.
x11docker's root setup in container now has no restrictions. This allow to drop --cap-default and the container command will run without privileges at all.
It would be nice if you try this out.

@phil294
Copy link
Author

phil294 commented Dec 9, 2019

yup, works now out of the box :-)
good job

@mviereck
Copy link
Owner

mviereck commented Dec 9, 2019

Great! :-)

Finally solved although we did not find the very special point where it previously failed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants