Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run configured helm tests during deploy #596

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

davejohnclark
Copy link

Adds a HelmTest specification to the HelmOptions on a Bundle. If enabled, this causes a ReleaseTesting action to be run during deployment, after the install/upgrade.

The happy path works well here: we can see the helm testing action run successfully, the bundle is unready until the tests are complete and once they are it becomes ready. We currently use helm test for our service smoke tests in our non-fleet deployment mechanism, so this is perfect for our use-case.

The error case is a bit less ideal, but does work for our use-case. Our existing non-fleet deployment mechanism uses a helm test failure to trigger a rollback, and that's the behaviour we'd like to be able to build on fleet. This change does give us that - the failure of the tests propagates back up through the bundle status, and then the GitRepo and we can use that to trigger a rollback of the GitRepo to the previous version (we also manage the GitRepo deployment itself via helm). What's not ideal is that after the tests have failed the next attempt to reconcile the bundle re-runs the tests, this happens over and over until something changes to either rollback or fix whatever the issue is. Since the test runs block the fleet-agent this doesn't feel like a great workflow, but one we could definitely work with for the time being (see below for some possible ideas on this - I'm sure it's something you've already put some thought into as well).

The HelmTest specification has an enabled flag to toggle the test run, as well as a timeout and filter spec that matches what is exposed by the helm ReleaseTesting api. I've added a default timeout into the deployer which means we never try and run the tests without a timeout. Since the waiting for the tests is a blocking operation I don't think it would be wise to allow running without a timeout.

I see two main issues with this PR as-is:

  • The fact that test failures cause the agent to endlessly re-run the tests. I've described this above. You can easily reproduce this by setting a small timeout on the test spec and doing a deployment of any helm chart with some tests. The main issue, as I mentioned above, is that they are blocking in the agent which feels less than ideal. A better flow here would, I believe, be to be able to set a flag on the bundle to indicate that in its current state it is not suitable to retry until something upstream changes - a new generation of the GitRepo. The agent could then skip the bundle during reconciliation rather than trying again to make it work. As a change this felt a bit larger in scope than what I was looking to raise, but I'm more than happy to participate in this work if there's a way you see this working. The fluxcd helm-controller (https://github.com/fluxcd/helm-controller/blob/3912a9a4d55bd87eb77d3ad5e5036061a33c4e78/controllers/helmrelease_controller.go#L276) records metadata to be able to make these sorts of decisions, and this feels like a very sane approach.
  • I don't have any tests around the changes that I've introduced. Typically with a change like this that coordinates an external library and resources in the ecosystem I'd like end-to-end tests to prove that they all play nicely together. I don't see any existing testing around the install/upgrade flow here and it's also something I'd be happy to work on - but again introducing a new layer of testing wasn't something I wanted to do in a vacuum. Let me know any thoughts/opinions you have on this

@sgran
Copy link

sgran commented Jan 14, 2022

Bump.

This addresses #527

Can we do anything to help?

@justdan96
Copy link

Bump

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants