
x/build: illumos-amd64 builder consistently failing #58967

Closed
bcmills opened this issue Mar 10, 2023 · 15 comments
Labels: Builders, FrozenDueToAge, NeedsInvestigation, OS-illumos, WaitingForInfo

Comments

@bcmills
Contributor

bcmills commented Mar 10, 2023

The illumos-amd64 builder has timed out testing cmd/go in every run since CL 473697.

Since the code involved in that change has literally no effect on illumos-amd64, I suspect that something has changed in either the builder itself or its configuration in cmd/coordinator to cause the test to become much slower.

@bcmills bcmills changed the title from "cmd/go,x/build/dashboard: illumos-amd64 builder consistently timing out in cmd/go tests" to "cmd/go,x/build: illumos-amd64 builder consistently timing out in cmd/go tests" Mar 10, 2023
@bcmills bcmills added the Builders, NeedsInvestigation, GoCommand, and OS-illumos labels Mar 10, 2023
@bcmills
Contributor Author

bcmills commented Mar 10, 2023

(attn @jclulow; CC @golang/illumos @golang/release)

@bcmills bcmills added this to the Backlog milestone Mar 10, 2023
@bcmills bcmills changed the title from "cmd/go,x/build: illumos-amd64 builder consistently timing out in cmd/go tests" to "cmd/go,x/build: illumos-amd64 builder consistently timing out in tests" Mar 16, 2023
@bcmills
Contributor Author

bcmills commented Mar 16, 2023

Looking at a few of the failures, it's not just cmd/go — there have also been timeouts in at least runtime and testing.

@jclulow
Contributor

jclulow commented Mar 16, 2023

As an initial note on this: nothing about the builder has changed to the best of my knowledge, unless AWS have moved it onto a defective or noisy host somehow.

@bcmills
Contributor Author

bcmills commented Mar 31, 2023

The failure mode has changed: it is now consistently failing with "no space left on device" errors during bootstrapping.

@bcmills bcmills changed the title from "cmd/go,x/build: illumos-amd64 builder consistently timing out in tests" to "cmd/go,x/build: illumos-amd64 builder consistently failing" Mar 31, 2023
@bcmills bcmills removed the GoCommand label Mar 31, 2023
@jclulow
Contributor

jclulow commented Mar 31, 2023

I will investigate!

@jclulow
Contributor

jclulow commented Mar 31, 2023

I believe disk space exhaustion was caused by the build-up of files in $HOME/.cache/gopls...

```
ncdu 1.15.1 ~ Use the arrow keys to navigate, press ? for help
--- /home/gobuild/.cache -------------------------------------------------------
                         /..
   18.1 GiB [##########] /gopls
  133.7 MiB [          ] /staticcheck
   40.8 MiB [          ] /go-build
   17.0 KiB [          ] /screentest

 Total disk usage:  18.3 GiB  Apparent size:   6.0 GiB  Items: 1406542
```

Should these files be deleted automatically after the job completes? Can I delete them now?

@jclulow
Contributor

jclulow commented Mar 31, 2023

NB: in the interim I have activated the warm standby VM here, which obviously doesn't (yet!) have a full disk and appears to be processing jobs OK.

@bcmills bcmills changed the title from "cmd/go,x/build: illumos-amd64 builder consistently failing" to "x/build: illumos-amd64 builder consistently failing" Apr 3, 2023
@bcmills
Contributor Author

bcmills commented Aug 24, 2023

@jclulow, it appears that the illumos-amd64 builder is completely disconnected from the coordinator at the moment.

@jclulow
Contributor

jclulow commented Aug 24, 2023

> @jclulow, it appears that the illumos-amd64 builder is completely disconnected from the coordinator at the moment.

Oh no!

I'm looking at it now, and the issue appears to be that the buildlet user has consumed all 10G of its quota. (I set the quota to less than the VM disk size on this one because the last builder filled the machine's entire disk, which made it harder to manage.)

I'm running ncdu at the moment, which is taking a while because there are a lot of tiny files. They are once again concentrated in /home/gobuild/.cache/gopls like last time.

@bcmills What is responsible for managing the contents of $HOME/.cache/gopls? Should these files be deleted automatically after the job completes? Can I delete them now?

@heschi
Contributor

heschi commented Aug 24, 2023

Yeah, you can delete them whenever you need to. Maybe a cron job? :-/

@golang/tools-team if you can find a way to make this stop happening that'd be great. I remain unconvinced that a temporary directory that persists across test runs is the right approach.

@findleyr
Contributor

@heschi I thought we already did this in https://go.dev/cl/494297? Does that miss something?

@heschi
Contributor

heschi commented Aug 24, 2023

Hm. Forgot about that. @jclulow, does this machine auto-update the buildlet?

@jclulow
Contributor

jclulow commented Aug 24, 2023

> Hm. Forgot about that. @jclulow, does this machine auto-update the buildlet?

We run the stage0 binary under our service manager, so if stage0 auto-updates the buildlet, then yes, it should. Indeed, when I looked at it, it was trying to do that, but the disk was full so it couldn't. It also seems to have been offline for a while, so maybe that fix landed after the outage started?

I'm currently clearing out .cache/gopls to make room and I'll kick it off after that and we'll see!

@jclulow
Contributor

jclulow commented Aug 25, 2023

I have re-enabled the builder; it obtained a new buildlet and is ostensibly doing work now.

@bcmills bcmills added the WaitingForInfo label Aug 25, 2023
@gopherbot
Contributor

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

@gopherbot gopherbot closed this as not planned Sep 25, 2023
@golang golang locked and limited conversation to collaborators Sep 24, 2024
5 participants