Best practice for resilient CD pipelines #67
I haven't yet found an automated solution I trust enough to recover from apply failures, so I'm super interested to hear what people have tried.

When there are transient failures in busy pipelines, sometimes the next build after the failure succeeds, so the pipeline isn't actually out of action for very long. The second plan will include changes left over from the first, failed change, though, which can be confusing for whoever reviews it. I have actually seen pipelines that run apply twice in a row just to be sure. I don't like it, but it's hard to argue against when it solves the problem.

What I have found useful is to make failures really noisy, e.g. send a message to a public Slack channel, tag someone who is responsible for fixing it, and tag anyone whose changes have potentially not been applied (the authors of every commit since the previous successful commit). This makes sure that the right people are informed and looking at the problem, and people quickly learn to review thoroughly before merging 😁.

For pipelines where the contributors are not as experienced with Terraform, it's useful to require reviews from people who are. PR validation checks are useful too; I've even gone as far as parsing the Terraform code to check for common mistakes.
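A minimal sketch of the "tag everyone whose changes may not have been applied" idea, in Python. It assumes the pipeline records the SHA of the last successful apply somewhere (`last_good_sha` is a hypothetical name; where you store it is up to you). It lists commit author emails with `git log --format=%ae <last_good>..HEAD`, which covers commits reachable from HEAD but not from the last good commit, and builds a plain-text message. Posting to Slack, and mapping emails to Slack user IDs so the mentions actually notify people, are not shown.

```python
import subprocess
from typing import List


def unique_authors(git_log_output: str) -> List[str]:
    """De-duplicate author emails from `git log --format=%ae`, keeping first-seen order."""
    seen = {}
    for line in git_log_output.splitlines():
        email = line.strip()
        if email:
            seen.setdefault(email, None)
    return list(seen)


def authors_since(last_good_sha: str, head: str = "HEAD") -> List[str]:
    """Authors of commits reachable from `head` but not from the last successfully applied commit."""
    result = subprocess.run(
        ["git", "log", "--format=%ae", f"{last_good_sha}..{head}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return unique_authors(result.stdout)


def failure_message(run_url: str, owner: str, authors: List[str]) -> str:
    """Plain-text failure notice; how it gets posted to Slack is not shown here."""
    lines = [
        f"terraform apply failed: {run_url}",
        f"Responsible for fixing: {owner}",
    ]
    if authors:
        lines.append("Changes possibly not applied, from: " + ", ".join(authors))
    return "\n".join(lines)
```

In a failed job you might call something like `failure_message(run_url, "@oncall", authors_since(last_good_sha))` and send the result to the public channel; the owner handle and run URL are placeholders.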
I'm trying to build a CD workflow with GitHub Actions that's fairly generic so it can be reused, initially by my org.
I have a couple of challenges in getting it to the point where it's resilient and I can reasonably share it with people who have little TF experience.
At a basic level I have a simple PR, Review, Merge workflow:
The trouble I have is that sometimes apply fails or only partially succeeds. Maybe this is just flaky Azure resources and I should focus on that space, but I think this will always be a potential issue, and it should have an intentional resolution that is, or at least attempts to be, automated.
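One possible shape for an intentional, automated resolution, sketched in Python under stated assumptions: retry `terraform apply` a bounded number of times (the "apply twice" pattern, with a cap), then run `terraform plan -detailed-exitcode` to check whether the infrastructure has converged. With `-detailed-exitcode`, plan exits 0 when there are no changes, 1 on error and 2 when changes are still pending, so a 2 right after a "successful" apply suggests a partial apply or drift. The attempt count, the delay and the injectable `run` callable are hypothetical choices that keep the sketch testable, and it makes no attempt to tell transient errors from permanent ones.

```python
import subprocess
import time
from typing import Callable, List

Runner = Callable[[List[str]], int]


def subprocess_runner(cmd: List[str]) -> int:
    """Run a command and return its exit code; output streams to the job log."""
    return subprocess.run(cmd).returncode


def apply_with_retry(
    run: Runner = subprocess_runner,
    attempts: int = 2,
    delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry `terraform apply`, then confirm convergence with a detailed-exitcode plan.

    Returns "converged", "drift" or "failed".
    """
    for attempt in range(1, attempts + 1):
        if run(["terraform", "apply", "-input=false", "-auto-approve"]) == 0:
            break
        if attempt < attempts:
            sleep(delay_s)  # give transient (e.g. cloud API) errors time to clear
    else:
        return "failed"  # every apply attempt failed

    # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes still pending
    code = run(["terraform", "plan", "-input=false", "-detailed-exitcode"])
    if code == 0:
        return "converged"
    if code == 2:
        return "drift"
    return "failed"
```

A design caveat: without a saved plan file, each `terraform apply` re-plans against the current state, so a retry may apply something slightly different from what was reviewed in the PR. Treating "drift" and "failed" as triggers for the noisy notifications suggested in the reply keeps a human in the loop rather than retrying indefinitely.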
Things I have considered:
Any tips, advice, or links to blogs or solutions would be awesome. I'm aware that I might have missed some basic principles, as I've come to TF without much background in the space!