Investigate if ArtifactsSuite/TestArtifactGC is flakey #10230

Closed
Tracked by #10231
isubasinghe opened this issue Dec 14, 2022 · 5 comments · Fixed by #10298
Assignees
Labels
area/build Build or GithubAction/CI issues type/feature Feature request

Comments

@isubasinghe
Member

No description provided.

@isubasinghe isubasinghe added the type/feature Feature request label Dec 14, 2022
@juliev0 juliev0 self-assigned this Dec 14, 2022
@juliev0
Contributor

juliev0 commented Dec 14, 2022

Is it possible to provide a link to the failed run?

@juliev0 juliev0 added the area/build Build or GithubAction/CI issues label Dec 14, 2022
@isubasinghe
Member Author

@juliev0
Contributor

juliev0 commented Dec 16, 2022

> @juliev0 I believe it is this one: https://github.com/argoproj/argo-workflows/actions/runs/3691130662/jobs/6248880760

Thanks for finding it!

@juliev0
Contributor

juliev0 commented Dec 20, 2022

For some reason, the wait container sometimes fails to save the artifact to MinIO. The test expects that the artifact was not deleted and looks for it, but it was never saved in the first place, so the test fails. The wait container is apparently exiting with exit code 2, but no error is displayed.

Attaching result of "kubectl describe pod" as well as wait container log:
describe-pod.log
wait.log

@juliev0
Contributor

juliev0 commented Dec 21, 2022

It seems the wait container may be dying before it has finished, based on this sequence of events:

  1. The Controller determines that it can send a SIGTERM to the Pod as soon as the main container is done, even though the wait container may still be running. (I can see in the log that this is happening.)
  2. The wait container has tied its context to SIGTERM, so the context is marked 'done' when the signal arrives.
  3. The wait container seems to exit with exit code 2 when the SIGTERM arrives. I'm not sure what would cause exit code 2.


2 participants