You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
If you are interested in working on this issue or have submitted a pull request, please leave a comment.
Overview of the Issue
Sometimes stack locks are left intact after atlantis automerging PR.
Reproduction Steps
The conditions for reproducing this race are following:
Enable automerge (GitHub provider, but that's probably not that relevant)
Enable some post-workflow hooks
The issue can't be reproduced reliably, but basically you need to create a PR with any number of affected stacks, and trigger apply all. Then rarely following happens:
Thread 1:
Apply command calls autoMerger.automerge (calling GH API, which triggers merge webhook), returns to RunCommentCommand
RunCommentCommand calls RunPostHooks
RunPostHooks WorkingDir.Clone, which calls some git commands which try to fetch remote and create some files in .git directory
Thread 2:
On receiving GH webhook event ClosedPullEvent we call PullCleaner.CleanUpPull
CleanUpPull calls WorkingDir.Delete, which calls os.RemoveAll(w.repoPullDir(r, p))
As os.RemoveAll traverses over workspace and removes files in second thread, git creates some in first thread, which yields following error: error during cleanup of pull data%!(EXTRA *errors.withStack=cleaning workspace: unlinkat /home/atlantis/data/repos/.../123/default/.git/logs/refs/remotes: directory not empty)
As CleanUpPull fails after merge, we are left with merged PR and locks still intact, that's not great as we need to remove them manually.
Logs
Logs
automerging pull request
PUT /repos/.../pulls/123/merge
post-hooks configured, running...
got workspace lock
clone directory \"/home/atlantis/data/repos/.../123/default\" already exists, checking if it's at the right commit
repo is at correct commit \"dead...beef\" so will not re-clone
Initiating cleanup of pull data.
getting remote update failed: Fetching origin\nFrom https://github.com/...\n abcabc..defdef master -> origin/master\nFetching head\nfatal: 'head' does not appear to be a git repository\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.\nerror: could not fetch head\n
error during cleanup of pull data%!(EXTRA *errors.withStack=cleaning workspace: unlinkat /home/atlantis/data/repos/.../123/default/.git/logs/refs/remotes: directory not empty)
error handling gh post code: 403 err: cleaning workspace: unlinkat /home/atlantis/data/repos/.../123/default/.git/logs/refs/remotes: directory not empty
error: exit status 1: running \"/usr/local/bin/some-post-workflow-hook\" in \"/home/atlantis/data/repos/.../123/default\": \nSome error about missing atlantis.yaml after workspace delete\n
Error running post-workflow hooks exit status 1: running \"/usr/local/bin/some-post-workflow-hook\" in \"/home/atlantis/data/repos/.../123/default\": \nSome error about missing atlantis.yaml after workspace delete\n.
If not running the latest Atlantis version have you tried to reproduce this issue on the latest version: No, but I checked commits for last two releases and there is nothing that might fix this behaviour.
Flags and configs are not provided, as we have quite complex config and simplifying it for this issue would take too long. I think there is enough information in the description to identify the issue. If not, I'll create a small configuration for reproduction.
Additional Context
I think that some of the possible solutions are:
Don't run post-workflow hooks if PR was automerged in command execution
Merge only after running post-workflow hooks. That might be tricky, as automerge is part of apply command, and post-workflow hooks are running in any case.
Somehow take lock on workspace between commands, i.e. take lock while apply command is running, and wait on that lock in thread which processes merge events, this would eliminate the race, as cleanup will be run after apply command and post-hooks are finished
The first one looks easiest, I'll be happy to send a PR if this is fine behaviour.
Don't run post-workflow hooks if PR was automerged in command execution I believe that should be fine but it should still work for people that do not use automerge.
The post-workflow would still need to run in some cases. I think it would be better to clean up the working directory after the post-workflow-hook completes its run, no?
Community Note
Overview of the Issue
Sometimes stack locks are left intact after atlantis automerging PR.
Reproduction Steps
The conditions for reproducing this race are following:
The issue can't be reproduced reliably, but basically you need to create a PR with any number of affected stacks, and trigger apply all. Then rarely following happens:
autoMerger.automerge
(calling GH API, which triggers merge webhook), returns to RunCommentCommandWorkingDir.Clone
, which calls somegit
commands which try to fetch remote and create some files in.git
directoryos.RemoveAll(w.repoPullDir(r, p))
As
os.RemoveAll
traverses over workspace and removes files in second thread,git
creates some in first thread, which yields following error:error during cleanup of pull data%!(EXTRA *errors.withStack=cleaning workspace: unlinkat /home/atlantis/data/repos/.../123/default/.git/logs/refs/remotes: directory not empty)
As CleanUpPull fails after merge, we are left with merged PR and locks still intact, that's not great as we need to remove them manually.
Logs
Logs
Environment details
Flags and configs are not provided, as we have quite complex config and simplifying it for this issue would take too long. I think there is enough information in the description to identify the issue. If not, I'll create a small configuration for reproduction.
Additional Context
I think that some of the possible solutions are:
The first one looks easiest, I'll be happy to send a PR if this is fine behaviour.
Relevant issues:
atlantis/post_workflow_hook
errors afterapply
when using--automerge
flag due to deleted directory #3031The text was updated successfully, but these errors were encountered: