MacOS DNS regression in 1.5.0 and 1.5.1 #2811

Open
ryfow opened this issue Aug 23, 2022 · 14 comments
Labels: area/dns, kind/bug (Something isn't working), platform/macos, regression (Functionality was working in a previous release and is now broken), triage/need-to-repro (Needs to be reproduced by dev team)

Comments

@ryfow

ryfow commented Aug 23, 2022

Actual Behavior

With Rancher Desktop 1.5.{0,1} on aarch64 MacOS, I'm seeing qemu-system-aarch64 hang for several minutes at a time in my development environment. The problem appears to be triggered by a process making bursty DNS requests for host.docker.internal. The same development environment works fine on Rancher Desktop 1.4.1.

Steps to Reproduce

This isn't how I found the problem, but I think it reproduces the same underlying issue.

  1. Install Rancher Desktop 1.5.1 and configure in Docker/moby mode.
  2. Run docker run --rm --name crashy-crashy -ti ubuntu:20.04 bash -c 'apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y dnsutils psmisc && while true ; do dig host.docker.internal ; done'
  3. Wait for the crashy-crashy container to start logging dig output
  4. Run the following command in another host terminal: docker exec -ti crashy-crashy bash -c "while true ; do killall dig ; sleep .1 ; done"
  5. Wait a bit. The dig output should eventually stop, qemu-system-aarch64 will be running at 100% CPU on your MacOS host, and docker commands will no longer work.
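
The symptom in step 5 can be checked from a host terminal. What follows is a minimal sketch, not part of the original report: `is_pegged` and `qemu_cpu` are hypothetical helper names, the 90% threshold is an arbitrary assumption, and `ps -o %cpu=` is only a rough measure of current load.

```shell
#!/bin/sh
# is_pegged CPU THRESHOLD
# Succeeds when CPU (a percentage such as "99.8") is at or above THRESHOLD.
# Hypothetical helper; awk is used because sh cannot compare decimals.
is_pegged() {
    awk -v cpu="$1" -v limit="$2" 'BEGIN { exit !(cpu + 0 >= limit + 0) }'
}

# Print the CPU percentage of the first qemu-system-aarch64 process, if any.
# The process name is the one reported in this issue.
qemu_cpu() {
    pid=$(pgrep qemu-system-aarch64 | head -n 1)
    [ -n "$pid" ] && ps -o %cpu= -p "$pid" | tr -d ' '
}
```

After sourcing the file, `is_pegged "$(qemu_cpu)" 90 && echo "qemu looks stuck"` prints a message when the process is at or above the assumed threshold.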

Result

I'm seeing the Rancher Desktop qemu VM become unresponsive until I kill the qemu-system-aarch64 process and restart Rancher Desktop.

Expected Behavior

The Rancher Desktop VM should not hang.

Additional Information

No response

Rancher Desktop Version

1.5.1

Rancher Desktop K8s Version

N/A

Which container engine are you using?

moby (docker cli)

What operating system are you using?

macOS

Operating System / Build Version

MacOS Monterey 12.5.1

What CPU architecture are you using?

arm64 (Apple Silicon)

Linux only: what package format did you use to install Rancher Desktop?

N/A

Windows User Only

No response

@ryfow ryfow added the kind/bug Something isn't working label Aug 23, 2022
@gaktive gaktive added the triage/need-to-repro Needs to be reproduced by dev team label Aug 23, 2022
@ryfow
Author

ryfow commented Aug 25, 2022

FWIW, I ran my reproduction steps on a second Macbook, and I see the same behavior.

@jandubois jandubois added this to the Next milestone Aug 26, 2022
@gaktive gaktive added regression Functionality was working in a previous release and is now broken priority/1 Work should be fixed for next release and removed priority/1 Work should be fixed for next release labels Sep 13, 2022
@gaktive gaktive modified the milestones: Next, Later Sep 19, 2022
@Nino-K
Member

Nino-K commented Nov 15, 2022

FWIW, I ran my reproduction steps on a second Macbook, and I see the same behavior.

@ryfow is the second Macbook also an M1, or is it x86?

@jandubois do you think you can reproduce this on your M1 machine?

@matsukaz

matsukaz commented Dec 19, 2022

Hi, I'm facing a similar issue in 1.7.0.
It seems like Lima is stuck on the file descriptor limit, but I haven't found a way to solve it yet.
This issue also occurs on x86 macOS.

Steps to Reproduce

  1. Log in to Lima and keep running nslookup.
$ rdctl shell
lima-rancher-desktop:/Users/xxx$ while true; do nslookup www.google.co.jp; done
  2. On the host OS, list the UDP sockets that qemu-system-aarch64 has open.
$ lsof -p $(pgrep qemu-system-aarch64) | grep "UDP"
qemu-syst 6788 xxxx  119u  IPv4 0x2c6ecf140850ff5f         0t0                 UDP *:63544
qemu-syst 6788 xxxx  120u  IPv4 0x2c6ecf140851762f         0t0                 UDP *:63398
  3. The number of open UDP sockets keeps increasing, and once the file descriptor number reaches 1024u, Lima gets stuck.
$ lsof -p $(pgrep qemu-system-aarch64) | grep "UDP"
...
qemu-syst 6788 xxxx  1023u  IPv4 0x2c6ecf14085191bf         0t0                 UDP *:54486
qemu-syst 6788 xxxx  1024u  IPv4 0x2c6ecf140852088f         0t0                 UDP *:62934
  4. If you wait exactly 4 minutes, all of the open UDP sockets are released and Lima starts running again.
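
The growth described in the steps above can be watched with a small loop instead of rerunning lsof by hand. This is a minimal sketch under a few assumptions: `count_udp` and `watch_qemu_udp` are hypothetical helper names, only one qemu-system-aarch64 process is running, and a one-second polling interval is good enough to see the count climb.

```shell
#!/bin/sh
# Count the lines of lsof output (read from stdin) that describe UDP sockets.
# Hypothetical helper: it matches lines where a whole field equals "UDP".
count_udp() {
    awk '{ for (i = 1; i <= NF; i++) if ($i == "UDP") { n++; break } }
         END { print n + 0 }'
}

# Print the UDP socket count of the qemu process once per second.
# Assumes the process name used in this thread; stop it with Ctrl-C.
watch_qemu_udp() {
    while true; do
        pid=$(pgrep qemu-system-aarch64 | head -n 1)
        [ -n "$pid" ] || { echo "qemu-system-aarch64 is not running" >&2; return 1; }
        printf '%s UDP sockets: %s\n' "$(date +%T)" \
            "$(lsof -p "$pid" 2>/dev/null | count_udp)"
        sleep 1
    done
}
```

Running `watch_qemu_udp` on the host while the nslookup loop runs in the VM should show the count rising toward the limit and then dropping when the sockets are released.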

Rancher Desktop Version

1.4.1, 1.6.2, 1.7.0

Rancher Desktop K8s Version

N/A

Which container engine are you using?

moby (docker cli)

Operating System / Build Version / CPU

MacOS Monterey 12.6 (M1 2020)
MacOS Ventura 13.0.1 (Intel Core i5, 2019)

@matsukaz

This issue may be a problem with Alpine Linux.
I tried it with Lima and got the same problem. It also happens with Debian, but not with Ubuntu.

I used the following images.

@Nino-K
Member

Nino-K commented Jan 11, 2023

@ryfow the issue has been addressed in lima-vm/lima#1285, so the fix should be included in our upcoming release. Thank you again for reporting this.

@ryfow
Author

ryfow commented Jan 11, 2023

Awesome! Looking forward to upgrading from 1.4.1 :)

@Nino-K
Member

Nino-K commented Jan 12, 2023

I'm going to close this since all the changes are in place now. @ryfow and @matsukaz, please keep an eye on our next release and give it a try. Feel free to re-open if you encounter anything additional. Thanks.

@ryfow
Author

ryfow commented Mar 22, 2023

@Nino-K This appears to still be a problem with Rancher Desktop 1.8. I don't know for sure if the same thing is making my dev environment hang, but I think it's the most likely suspect.

Edit: I can't figure out how to reopen.

@jandubois jandubois reopened this Mar 22, 2023
@matsukaz

@Nino-K
At least in my environment, this issue was resolved with Rancher Desktop 1.8!
I have not seen this issue since I upgraded to 1.8, even with the reproduction procedure I posted earlier.

@ryfow
I'm not sure, but it may be a different problem.

@ryfow
Author

ryfow commented Mar 24, 2023

I tried my original reproduction steps with 1.8.1 on a work M1 Macbook and a personal M1 Macbook. It hangs on both and puts qemu at 100% CPU usage.

@Nino-K
Member

Nino-K commented Apr 3, 2023

@ryfow could your issue possibly be related to this one? lima-vm/lima#1333

@ryfow
Author

ryfow commented Apr 14, 2023

@Nino-K I don't think it's lima-vm/lima#1333. That bug appears to be talking about Virtualization.Framework. Looks like Rancher Desktop uses qemu.

I tried to follow my reproduction steps with lima 0.15 and qemu 7.2.1, using limactl start --name docker template:///docker. I couldn't reproduce it; the hang did not happen.

The qemu version is different, so I tried copying my system version of qemu-system-aarch64 (7.2.1) into the "Rancher Desktop.app" but that did not help. I still see the hang on Rancher Desktop with the new qemu.

@ryfow
Author

ryfow commented Apr 14, 2023

It's got to be a problem with https://github.com/lima-vm/alpine-lima. When I start a VM with limactl start --name alpine template://alpine the problem reproduces.

@ryfow
Author

ryfow commented Jan 16, 2024

As an FYI to anyone else running into this, I've had good results with using the VZ Virtual Machine Type. Things seem way more stable.


6 participants