binfmts_misc: Too many symlinks #142
Comments
This appears to be a problem specific to Ubuntu distributions that results in either binfmt_misc not being mounted properly, or else the mount breaking somehow when systemd starts under Ubuntu. Unfortunately, I don't have much more than that to go on, as there doesn't appear to be anything obviously wrong with the startup sequence. I do, however, have a workaround. Simply remounting binfmt_misc over the top of the existing mount with:
restores access. If this works for you, you can automate it by adding:
to your /etc/rc.local file, creating it if it does not exist. |
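A minimal sketch of this workaround, assuming the mount invocation quoted later in this thread is equivalent to the one meant here. The function name `remount_binfmt_misc` and the `MOUNT` override variable are hypothetical, for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: remount binfmt_misc over the existing (broken) mount.
# MOUNT can be overridden (e.g. MOUNT=echo) to preview the command instead
# of running it.
BINFMT_DIR=/proc/sys/fs/binfmt_misc

remount_binfmt_misc() {
    # Same invocation as quoted later in the thread; needs root to succeed.
    ${MOUNT:-mount} -t binfmt_misc none "$BINFMT_DIR"
}
```

Called as root (for example from /etc/rc.local, if your system runs it at boot), `remount_binfmt_misc` applies the remount; `MOUNT=echo remount_binfmt_misc` only prints the arguments.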
I am using ArchWSL and ran into the same problem. This workaround worked for me. To make
And replace the ExecStart part with:
|
@jitingcn I like your solution, as it uses the preferred method of editing systemd configs (overriding defaults with drop-in files) and it does not require enabling rc.local on boot (which IS possible on Arch using the rc-local package from the AUR, except it is considered deprecated…). Thanks. However, I feel it would be a little more elegant to put the additional mount command in a separate ExecStartPre line of the unit rather than in ExecStart, which can be done by adding the line below to the [Service] section: |
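A sketch of what such a drop-in might look like. The target unit (`systemd-binfmt.service`), the `/usr/bin/mount` path, and the helper name `print_binfmt_dropin` are assumptions, not taken from the comments above. The leading `-` tells systemd to ignore a failure of that command.

```shell
#!/bin/sh
# Hypothetical helper: print a drop-in that remounts binfmt_misc before the
# unit's main command runs. Unit name and mount path are assumptions.
print_binfmt_dropin() {
    cat <<'EOF'
[Service]
ExecStartPre=-/usr/bin/mount -t binfmt_misc none /proc/sys/fs/binfmt_misc
EOF
}
```

The printed text could be pasted into the editor opened by `sudo systemctl edit systemd-binfmt.service`, if that is indeed the unit you want to override.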
I have investigated the problem more deeply. Basically, WSL seems to mount and use /proc/sys/fs/binfmt_misc to provide Windows interop. It even registers a WSLInterop rule for handling Windows executables with /init. This seems to be a built-in WSL mechanism that runs at some unspecified point during WSL initialization. But from inside genie (assuming we have masked all binfmt-related systemd units and do not touch the mountpoint at all) something is wrong. The mountpoint is listed in the output of commands like … This is the reason for the systemd units' failures.
proc-sys-fs-binfmt_misc.mount, the unit responsible for mounting binfmt_misc, reports that it ran successfully, but in fact it does nothing, because it detects binfmt_misc as already mounted. Then systemd-binfmt.service performs some magic over the "successfully mounted" binfmt_misc. I don't know the details, but it looks like some kind of overlay built from the current binfmt_misc contents plus some additions, mounted in the same location. The overlay created on top of an empty directory is corrupted, and any attempt to access its contents results in a 'too many levels of symbolic links' error.
So the root cause of the failure is that, from inside genie, the default WSL binfmt_misc is reported as mounted but behaves as if it were not when accessed. In fact, all that has to be done to fix it is to unmount the binfmt_misc mounted by WSL itself (or prevent WSL from mounting it at all, if possible) before initializing genie (from outside of it). This should be enough for the binfmt-related systemd units to start successfully with no manual intervention and work as expected.
Please note that the binfmt_misc set up by systemd is not WSL-specific, so it is not preconfigured to support Windows executables like the one we unmounted. But we can easily bring Windows interoperability back by adding the config manually, e.g. by creating /etc/binfmt.d/99-WSLInterop.conf with the contents below:
At last I have qemu-user-static, Windows interoperability, and full systemd all working under WSL at once, without errors and without manual edits to systemd units. |
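A hedged sketch of how such a rule could be built. It relies on the kernel's binfmt_misc registration format (`:name:type:offset:magic:mask:interpreter:flags`) and on the thread's description that files are handed to /init; the flags field is left empty as an assumption, and the helper names `binfmt_rule` and `wsl_interop_rule` are hypothetical.

```shell
#!/bin/sh
# Build a binfmt_misc registration string of type M (match by magic bytes):
#   :name:type:offset:magic:mask:interpreter:flags
# Offset and mask are left empty (kernel defaults).
binfmt_rule() {
    # $1 = rule name, $2 = magic bytes, $3 = interpreter, $4 = flags (optional)
    printf ':%s:M::%s::%s:%s\n' "$1" "$2" "$3" "${4:-}"
}

# Hypothetical rule following the thread's description: files starting with
# "MZ" (the DOS/PE executable header) are passed to /init.
wsl_interop_rule() {
    binfmt_rule WSLInterop MZ /init ""
}
```

For example, `wsl_interop_rule | sudo tee /etc/binfmt.d/99-WSLInterop.conf` would write the rule to the file named above, to be picked up the next time systemd-binfmt runs.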
@esgie Thanks muchly for your work on solving this one, too. I've added your fix to genie 1.43. |
Please take note of the findings below. There is some additional stuff that needs to be handled to avoid losing flexibility. But what I have discovered is that after mounting binfmt_misc again using 'sudo mount -t binfmt_misc none /proc/sys/fs/binfmt_misc' from outside the container, it is not only mounted successfully, but also gets populated with the format definitions registered in the systemd session inside the container. Meanwhile, it does not seem to break or interfere with anything already running inside the container.
I was also worried about shutting down the systemd container and how it would affect outside sessions. Thankfully, closing the container or stopping the units responsible for binfmt does not break binfmt_misc for sessions running outside genie. The only effect I've noticed is that all registered formats disappear, including WSLInterop (which is understandable, as systemd-binfmt unregisters everything on stop). Therefore, we have to fix things again by echoing the … And in case we want to init systemd again, we unmount binfmt_misc again, do the init, mount again, etc.
To sum up, if we do not want to break things for sessions run outside of genie, the algorithm for initializing genie would be something like the one below. I assume we are starting WSL, so we are outside genie and it is not running; I also assume that WSLInterop.conf has been installed to /etc/binfmt.d or another supported location (I guess one in /usr/local/share or similar would be more suitable if you plan to include it in the package), so that interop will work under systemd. The steps for initializing genie are:
The algorithm of closing the container:
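Going by the prose above, the two sequences might be sketched as below. This is a rough sketch under assumptions: `genie -i` and `genie -u` are used to initialize and shut down the bottle, the WSLInterop rule string has an empty flags field, and `run`, `genie_start`, and `genie_stop` are hypothetical helper names. Everything is meant to be run from outside the bottle.

```shell
#!/bin/sh
BINFMT_DIR=/proc/sys/fs/binfmt_misc

# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Initializing: drop the WSL-provided mount, start the bottle, then remount
# binfmt_misc so that sessions outside the bottle keep working.
genie_start() {
    run sudo umount "$BINFMT_DIR"
    run genie -i
    run sudo mount -t binfmt_misc none "$BINFMT_DIR"
}

# Closing: stop the bottle, then re-register the interop rule that
# systemd-binfmt removed on stop (rule string is an assumption).
genie_stop() {
    run genie -u
    run sudo sh -c "echo ':WSLInterop:M::MZ::/init:' > $BINFMT_DIR/register"
}
```

Setting `DRY_RUN=1` before calling either function prints the commands without executing them, which is a safer way to check the order first.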
That is, I think Microsoft's binfmt_misc implementation is somewhat non-standard and, I guess, involves some interaction with the underlying system distro, which results in some strange behavior when combined with yet another layer of containerization. I have to say I do not completely understand the relations between all the elements, and they sometimes behave quite strangely. For example, the original WSLInterop rule refers to an interpreter location named /tools/init, and by default, when running mount, you can see that /init is a mountpoint created from some odd instance named 'tools'. I guess it is Microsoft's custom init, which has a built-in feature for handling Windows exes. However, after we start genie, the /init available from inside the container is totally different. In fact, it must be different: it must be a systemd-enabled /init running as pid 1, as this is a fundamental requirement for systemd to run... The custom Microsoft /init does not seem to be available inside the container, as it is overridden by the container's own. So why does Windows interoperability work fine inside the container after I register /init as an interpreter for Windows exes, even though the /init that was able to handle them has been overridden by one that shouldn't be able to? I don't know.
However, I found that the above steps work nicely, providing as stable and flexible an environment for using binfmt_misc as possible without breaking things for longer than needed. But there are still things to understand. I am sorry for not providing you with any code, but I am a rather poor programmer ;(
Please also note that the same problem, i.e. systemd breaking things for outside sessions and vice versa, applies to the systemd-resolved issue as well: each type of session requires a different resolv.conf for DNS to work, so creating the systemd one results in non-working DNS outside, and reverting to the default one breaks it for systemd. |
Hi, |
Interesting. I can't repro this behavior; I have a WSL/genie/systemd session with four days of uptime on it right now and it still interops just fine. But the weird thing, so far as I'm concerned, is the apparent two-init issue you're seeing. Running systemd as init (pid 1) inside the container shouldn't do anything to the /init file; it's invoked directly as /usr/lib/systemd/systemd, as you can see if you look at /proc/(systemdpid)/exe from outside the bottle. /init shouldn't be touched by anything genie does, and so far as I know, neither systemd nor any other init feels the need to rewrite itself. And indeed when I check in various ways for differences between inside and outside the bottle, the mount is the same:
tools on /init type 9p (ro,relatime,dirsync,aname=tools;fmask=022,loose,access=client,msize=65536,trans=fd,rfd=7,wfd=7)
the inode number (ls -i) is the same
8162774325450668 init
and when I use md5sum on them, they also come out with the identical checksum.
❯ md5sum /init
0258dfe90ae79a649a5c0d6aac80bbf9 /init
...so my next step, here, has to be to ask you if you can run those same steps on your system when/if interop stops working, and let me have the results. And if those show that /init actually has become a different file, I guess we'll need to find out how and when? |
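One way to collect the same facts in a single line, so the output from inside and outside the bottle can be compared directly. This is only a sketch; the helper name `file_fingerprint` is made up, and it reports the inode and checksum but not the mount entry.

```shell
#!/bin/sh
# Hypothetical helper: print a file's path, inode number, and MD5 checksum.
file_fingerprint() {
    # $1 = path to inspect (e.g. /init)
    inode=$(ls -i "$1" | awk '{print $1}')   # first field of `ls -i` is the inode
    sum=$(md5sum "$1" | awk '{print $1}')    # first field of `md5sum` is the hash
    printf '%s %s %s\n' "$1" "$inode" "$sum"
}
```

Running `file_fingerprint /init` once in a shell outside the bottle and once inside it should print identical lines if /init is the same file in both places.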
I'll do more testing then, but by the way, I am still facing the issue, although now it happens in an "all or nothing" manner, i.e. Windows executables either work reliably or do not work at all right from the start. I have created my own custom systemd unit which directly runs Windows exes (netsh.exe via cmd.exe) on start/stop in order to open/close the Windows firewall and forward ports on the host machine on demand, directly via a Linux service. So it is easy for me to observe the failure, as it causes genie to time out on start and report failures; otherwise it starts smoothly. ... Or maybe it is my service that falls into some kind of race condition with systemd initialization? It's configured (in the unit itself) to run After systemd-binfmt is started, but maybe there's a conflict elsewhere? I must check that.
I am using Arch as my main distro, and what I also forgot to mention is that I am using an Arch-based WSL kernel, not the Microsoft one (the Arch one comes with headers and includes module support; please note that the systemd-modules-load and modprobe@ units, which are generally recommended to be masked under WSL, are working FINE, and DKMS stuff compiles fine as well under genie!). The kernel has some issues with shutting down genie (it shuts down fine, but then I receive CLR errors on any action taken under WSL and am forced to shut it down), and now I am wondering whether my kernel is the cause.
I'll test on a more stock config, like Ubuntu with the MS kernel, and also try to check it from the inode point of view, and be back.
|
I'm using a custom kernel myself (although compiled using a reconfiguration of the Microsoft config, with modules & DKMS - those units are recommended-disabled just for people running the stock kernel), so FWIW, it seems unlikely to be that. Although, that said, I've never seen any of those other errors with my custom kernel, so. |
Windows version (build number):
10.0.21354.1
Linux distribution:
Ubuntu-20.04 from Store
Genie version:
1.36
Describe the bug
Attempting to access binfmts_misc gives me the error "Too many symlinks"; this puts a damper on building foreign chroots while inside the bottle. I also attempted to access it after disabling proc-sys-fs-binfmt_misc.mount and proc-sys-fs-binfmt_misc.automount, which allows me to mount binfmt_misc properly but only under root. If I attempt to mount it using sudo, the directory stays empty.
If the bug involves systemctl or a service running under systemd, confirm that you are running inside the bottle: inside
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I should see the binfmt_misc directory structure and be able to write to the register object inside it.
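The expected behavior can be turned into a quick check. The sketch below is an illustration only; the helper name `binfmt_check` and its messages are made up, and it tests nothing beyond whether the directory can be listed and whether a writable `register` entry exists.

```shell
#!/bin/sh
# Hypothetical helper: report whether a binfmt_misc mount point looks usable.
binfmt_check() {
    dir=${1:-/proc/sys/fs/binfmt_misc}
    # Listing fails when the mount is broken, as with the error in this issue.
    if ! ls "$dir" >/dev/null 2>&1; then
        echo "inaccessible"
        return 1
    fi
    # New formats are registered by writing to the `register` entry.
    if [ -w "$dir/register" ]; then
        echo "ok"
    else
        echo "no writable register"
        return 2
    fi
}
```

Run as the user who needs to register formats (usually root), `binfmt_check` prints `ok` only when both conditions hold.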