Random "Is the docker daemon running?" with Docker-in-Docker Feature #192

Open
metaskills opened this issue Jan 22, 2023 · 15 comments

Comments

@metaskills

When using the Docker-in-Docker feature, Docker fails to start about 20% of the time. I created a demo repo to showcase this. It uses concurrent jobs to highlight the issue, but the problem is not limited to this workflow style. I am seeing this random failure behavior across certain projects. HELP PLEASE!

failure

If this issue is within the CLI, then I have created an issue there in that project to track it as well:

@metaskills
Author

@Chuxel Any thoughts on this?

@Chuxel
Member

Chuxel commented Jan 23, 2023

Hmmm. If you add cat /tmp/dockerd.log to your exec, it will output the startup logs. Since Docker is started in the background, my bet is that things sometimes move fast enough that the exec happens before the daemon is fully up. Otherwise, there would be errors in that file that could point to the underlying issue.

Adding a sleep statement in the exec might also verify whether this is a race condition.
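
To check the race-condition theory without guessing at a sleep duration, one option is to poll the daemon and dump the startup log if it never answers. This is a minimal sketch, not part of the feature: the function name `wait_for_docker`, the `DOCKER_PROBE` and `DOCKERD_LOG` overrides, and the default attempt count and delay are assumptions made for illustration. Only `docker info` and `/tmp/dockerd.log` come from this thread.

```shell
# Poll until the Docker daemon answers, or give up and print the startup log.
# Hypothetical knobs for this sketch: DOCKER_PROBE (command used to test the
# daemon) and DOCKERD_LOG (path of the startup log).
wait_for_docker() {
  local attempts="${1:-30}"   # assumed default: 30 tries
  local delay="${2:-1}"       # assumed default: 1 second between tries
  local probe="${DOCKER_PROBE:-docker info}"
  local log="${DOCKERD_LOG:-/tmp/dockerd.log}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # The probe succeeds only when the daemon is reachable.
    if $probe >/dev/null 2>&1; then
      echo "docker daemon ready after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "docker daemon not ready after ${attempts} attempt(s)" >&2
  # Show the startup log if it exists; its absence is itself a useful signal.
  if [ -f "$log" ]; then
    cat "$log" >&2
  else
    echo "no startup log at ${log}" >&2
  fi
  return 1
}
```

Calling `wait_for_docker 60 1` as the first line of the exec would wait up to roughly a minute before reporting failure.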

@metaskills
Author

It seems Docker does not start at all. Also, when this happens, no amount of waiting in the devcontainer helps. Docker just will not work, even after I waited for several minutes.

cat: /tmp/dockerd.log: No such file or directory

@samruddhikhandale
Member

Looking at https://github.com/customink/dnd-demo/actions/runs/3990966349/jobs/6845376920#step:3:689, this issue sounds quite similar to devcontainers/features#372

It looks like this issue mostly occurs on Actions runners and not in a Codespace.

@samruddhikhandale
Member

@metaskills Even the other issue I pointed to uses the runs-on: ubuntu-latest image in its workflow. Can we change the image and see if that helps?

@metaskills
Author

Sure. I'll change it to a few other things and even see if the version of the CI helps. Will report back shortly.

@metaskills
Author

So I tested ubuntu-20.04 and after about 50 runs I've had no failures. So that is good news and gives me something to work with while we sort this out. I'll read that other issue too.

@Chuxel
Member

Chuxel commented Jan 24, 2023

Very strange. I'd also be curious whether running the /usr/local/share/docker-init.sh script again during your exec fixes it. We could layer in some retry if it does. But it's super odd that it's not consistent... almost like it's an issue with certain Actions hosts. @samruddhikhandale - Might be worth reaching out to the Actions folks to see if anything has been happening?
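
The retry idea could look roughly like the sketch below: re-run the init script, probe the daemon, and repeat a few times. The function name `start_docker_with_retry`, the `DOCKER_INIT` and `DOCKER_PROBE` overrides, and the default attempt count and pause are hypothetical and exist only for this illustration. Whether re-running the init script actually recovers the daemon is the open question, so treat this as a diagnostic rather than a fix.

```shell
# Re-run the Docker init script and probe the daemon, up to N times.
# Hypothetical knobs for this sketch: DOCKER_INIT (script that starts dockerd)
# and DOCKER_PROBE (command used to test the daemon).
start_docker_with_retry() {
  local attempts="${1:-3}"    # assumed default: 3 tries
  local delay="${2:-5}"       # assumed default: 5 seconds between tries
  local init="${DOCKER_INIT:-/usr/local/share/docker-init.sh}"
  local probe="${DOCKER_PROBE:-docker info}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # Ignore the init script's exit status and let the probe decide.
    $init >/dev/null 2>&1 || true
    if $probe >/dev/null 2>&1; then
      echo "docker daemon up on attempt ${i}"
      return 0
    fi
    echo "attempt ${i}/${attempts}: daemon not responding" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  echo "docker daemon still down after ${attempts} attempt(s)" >&2
  return 1
}
```

If the daemon never comes up even after several attempts, that points away from a simple startup race and toward something on the host.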

@metaskills
Author

> if running the /usr/local/share/docker-init.sh script again during your exec

Do you mean in my runCmd?

@joshaber

We've had internal reports of this as well with Debian 11.

@Chuxel
Member

Chuxel commented Jan 24, 2023

> > if running the /usr/local/share/docker-init.sh script again during your exec
>
> Do you mean in my runCmd?

Yes, sorry. (Under the hood it's devcontainer exec.)

@samruddhikhandale
Member

> Very strange. I'd also be curious whether running the /usr/local/share/docker-init.sh script again during your exec fixes it. We could layer in some retry if it does. But it's super odd that it's not consistent... almost like it's an issue with certain Actions hosts. @samruddhikhandale - Might be worth reaching out to the Actions folks to see if anything has been happening?

Created actions/runner-images#6980

@metaskills
Author

> if running the /usr/local/share/docker-init.sh script again during your exec

@Chuxel I tried that and it did not help. The error message is still the same when I do this:

          runCmd: |
            /usr/local/share/docker-init.sh
            docker info

@samruddhikhandale
Member

The user "runner" used on runner-images is a member of a "docker" group, so you shouldn't expect such problems. 
However to understand the nature of the problem, could you please run the docker-in-docker task without using "devcontainers/[email protected]" action?
We would like to make sure that the root cause is not the action itself.

Originally posted by @Alexey-Ayupov in actions/runner-images#6980 (comment)

@metaskills Would you be interested to test this hypothesis? Thanks!
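
One way to approximate this test is to drive the Dev Containers CLI directly from a plain workflow step, since the action uses devcontainer exec under the hood. The sketch below assumes the `devcontainer` CLI is already installed and on PATH, and that the repository has a devcontainer configuration in the workspace folder. The helper name `check_without_action` is hypothetical.

```shell
# Bring the dev container up with the CLI and run the Docker check inside it,
# bypassing the CI action. Assumes `devcontainer` is on PATH.
check_without_action() {
  local workspace="${1:-.}"   # workspace folder holding the devcontainer config
  # Build and start the dev container for this workspace.
  devcontainer up --workspace-folder "$workspace" || return 1
  # Run the same check the workflow's runCmd performs.
  devcontainer exec --workspace-folder "$workspace" docker info
}
```

If `docker info` still fails intermittently when run this way, the action itself is unlikely to be the root cause.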

@metaskills
Author

Thanks, I'm subscribed to that issue too so I replied there.

4 participants