Run configured helm tests during deploy #596

davejohnclark · 2021-11-05T14:06:08Z

Adds a HelmTest specification to the HelmOptions on a Bundle. If enabled, this causes a ReleaseTesting action to be run during deployment, after the install/upgrade.

The happy path works well here: we can see the helm testing action run successfully, the bundle is unready until the tests are complete and once they are it becomes ready. We currently use helm test for our service smoke tests in our non-fleet deployment mechanism, so this is perfect for our use-case.

The error case is a bit less ideal, but does work for our use-case. Our existing non-fleet deployment mechanism uses a helm test failure to trigger a rollback, and that's the behaviour we'd like to be able to build on fleet. This change does give us that - the failure of the tests propagates back up through the bundle status, and then the GitRepo and we can use that to trigger a rollback of the GitRepo to the previous version (we also manage the GitRepo deployment itself via helm). What's not ideal is that after the tests have failed the next attempt to reconcile the bundle re-runs the tests, this happens over and over until something changes to either rollback or fix whatever the issue is. Since the test runs block the fleet-agent this doesn't feel like a great workflow, but one we could definitely work with for the time being (see below for some possible ideas on this - I'm sure it's something you've already put some thought into as well).

The HelmTest specification has an enabled flag to toggle the test run, as well as a timeout and filter spec that matches what is exposed by the helm ReleaseTesting api. I've added a default timeout into the deployer which means we never try and run the tests without a timeout. Since the waiting for the tests is a blocking operation I don't think it would be wise to allow running without a timeout.

I see two main issues with this PR as-is:

The fact that test failures cause the agent to endlessly re-run the tests. I've described this above. You can easily reproduce this by setting a small timeout on the test spec and doing a deployment of any helm chart with some tests. The main issue, as I mentioned above, is that they are blocking in the agent which feels less than ideal. A better flow here would, I believe, be to be able to set a flag on the bundle to indicate that in its current state it is not suitable to retry until something upstream changes - a new generation of the GitRepo. The agent could then skip the bundle during reconciliation rather than trying again to make it work. As a change this felt a bit larger in scope than what I was looking to raise, but I'm more than happy to participate in this work if there's a way you see this working. The fluxcd helm-controller (https://github.com/fluxcd/helm-controller/blob/3912a9a4d55bd87eb77d3ad5e5036061a33c4e78/controllers/helmrelease_controller.go#L276) records metadata to be able to make these sorts of decisions, and this feels like a very sane approach.
I don't have any tests around the changes that I've introduced. Typically with a change like this that coordinates an external library and resources in the ecosystem I'd like end-to-end tests to prove that they all play nicely together. I don't see any existing testing around the install/upgrade flow here and it's also something I'd be happy to work on - but again introducing a new layer of testing wasn't something I wanted to do in a vacuum. Let me know any thoughts/opinions you have on this

sgran · 2022-01-14T15:09:10Z

Bump.

This addresses #527

Can we do anything to help?

justdan96 · 2022-08-11T09:38:35Z

Bump

Add helm test spec to bundle and run during helm deploy

c2440c0

davejohnclark force-pushed the helm-test branch from 28e3c61 to c2440c0 Compare November 5, 2021 14:23

kkaempf added the kind/enhancement label Dec 6, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Run configured helm tests during deploy #596

Run configured helm tests during deploy #596

davejohnclark commented Nov 5, 2021

sgran commented Jan 14, 2022

justdan96 commented Aug 11, 2022

Run configured helm tests during deploy #596

Are you sure you want to change the base?

Run configured helm tests during deploy #596

Conversation

davejohnclark commented Nov 5, 2021

sgran commented Jan 14, 2022

justdan96 commented Aug 11, 2022