NiFi upgrade doesn't work #238

Closed
1 of 4 tasks
soenkeliebau opened this issue Mar 17, 2022 · 4 comments

soenkeliebau commented Mar 17, 2022

Affected version

0.5.0

Current and expected behavior

Scenario
A NifiCluster with three nodes was deployed with version 1.13.2 and is up and running.

The NifiCluster custom resource is then changed to version 1.15.0.

Current Behavior
The StatefulSet is updated with the new image, which triggers a rolling restart of the NiFi Pods onto the NiFi 1.15.0 container image.

However, NiFi does not support running a cluster with mixed versions. Instead, a full stop and restart with the new version is required.

Reference ticket:
https://issues.apache.org/jira/browse/NIFI-4068?jql=project%20%3D%20NIFI%20AND%20text%20~%20%22rolling%20upgrade%22

Because of this, the new pod never starts successfully and the rolling restart hangs indefinitely, or until the user deletes all pods and the StatefulSet recreates them, all with the same version.
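
To illustrate why the rollout gets stuck, here is a minimal simulation rather than operator code. It relies on the documented StatefulSet rolling-update behavior (pods are replaced one at a time, highest ordinal first, and the controller waits for each replaced pod to become ready). The `Pod` class, the `is_ready` rule and the function names are hypothetical, and "a node only becomes ready if all nodes run the same version" is a simplifying assumption standing in for NiFi's refusal to form a mixed-version cluster.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Pod:
    ordinal: int
    version: str


def is_ready(pod: Pod, pods: list[Pod]) -> bool:
    # Simplifying assumption: a node only becomes ready if every node in
    # the cluster runs the same version as it does.
    return all(other.version == pod.version for other in pods)


def rolling_update(pods: list[Pod], new_version: str) -> list[int]:
    """Replace pods one at a time, highest ordinal first.

    Returns the ordinals that were replaced before the rollout stalled
    (or all ordinals if it completed).
    """
    replaced = []
    for pod in sorted(pods, key=lambda p: p.ordinal, reverse=True):
        pod.version = new_version
        replaced.append(pod.ordinal)
        if not is_ready(pod, pods):
            # The controller waits here for a readiness that never comes.
            break
    return replaced


pods = [Pod(i, "1.13.2") for i in range(3)]
replaced = rolling_update(pods, "1.15.0")
versions = [p.version for p in pods]
print(replaced, versions)
```

In this model only the pod with the highest ordinal is ever replaced. The other two stay on 1.13.2, which is the mixed-version state described above.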

Expected Behavior
The operator should notice that a version change is happening and trigger a full restart of NiFi.

This is done when

  • I can update my NiFi CR from one NiFi version to another and have the operator automatically run a full restart of NiFi, so that the whole cluster ends up running the new version
  • The solution to implement this can be reused by other operators for similar scenarios
  • The proposed solution has been discussed in the architecture meeting before it's fully implemented
  • The implemented solution has been documented in the Contributor's Guide

Possible solution

The operator needs to be able to recognize a version change during reconciliation and then act accordingly to perform a full cluster restart.

Something along the lines of this code (suggested by @teozkr) might work:

if current_sts.spec.template.spec.image != new_image:
  if current_sts.status.replicas > 0:
    # Wait for all current replicas to die
    new_sts.spec.replicas = 0
  else:
    # All old replicas are dead, do the upgrade
    new_sts.spec.replicas = rolegroup.replicas
    new_sts.spec.template.spec.image = new_image
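
A self-contained version of the sketch above, for trying out the two-phase logic outside a cluster. This is an illustration under assumptions, not the operator's implementation: `StatefulSetState`, `plan` and the `nifi:<version>` image strings are hypothetical stand-ins, and the loop pretends the cluster converges instantly after each reconcile pass. In a real StatefulSet the image is set per container under `spec.template.spec.containers`, not directly on the pod spec.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StatefulSetState:
    image: str            # image currently set in the pod template
    status_replicas: int  # pods that currently exist


def plan(current: StatefulSetState, new_image: str,
         rolegroup_replicas: int) -> tuple[str, int]:
    """Return the (image, replicas) to apply in this reconcile pass."""
    if current.image != new_image:
        if current.status_replicas > 0:
            # Old pods still exist: keep the old image and scale to zero.
            return current.image, 0
        # All old pods are gone: switch the image and scale back up.
        return new_image, rolegroup_replicas
    # No version change pending: apply the desired state as usual.
    return new_image, rolegroup_replicas


state = StatefulSetState(image="nifi:1.13.2", status_replicas=3)
steps = []
for _ in range(3):
    image, replicas = plan(state, "nifi:1.15.0", rolegroup_replicas=3)
    steps.append((image, replicas))
    # Assumption: the cluster reaches the applied state before the next pass.
    state = StatefulSetState(image=image, status_replicas=replicas)

for step in steps:
    print(step)
```

The first pass scales the old version down to zero, the second switches the image and scales back up, and the third is a no-op. Note that the image is only changed once no old replicas remain, so no rolling update is triggered at any point.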

Environment

This should be reproducible independently of the K8s environment.

soenkeliebau added a commit that referenced this issue Aug 11, 2022
soenkeliebau added a commit that referenced this issue Aug 11, 2022
@lfrancke lfrancke moved this to Development: Waiting for Review in Stackable Engineering Aug 12, 2022
@lfrancke lfrancke moved this from Development: Waiting for Review to Development: In Review in Stackable Engineering Aug 12, 2022
@lfrancke lfrancke moved this to Development: Track in Stackable Engineering Aug 23, 2022
@sbernauer sbernauer moved this from Development: Track to Development: In Progress in Stackable Engineering Sep 5, 2022
@lfrancke lfrancke moved this from Development: In Progress to Development: Waiting for Review in Stackable Engineering Sep 20, 2022
@lfrancke lfrancke moved this from Development: Waiting for Review to Development: In Review in Stackable Engineering Sep 20, 2022
@bors bors bot closed this as completed in 6bc536e Sep 23, 2022
@razvan razvan moved this from Development: In Review to Development: Done in Stackable Engineering Sep 23, 2022
@lfrancke lfrancke moved this from Development: Done to Acceptance: Waiting for in Stackable Engineering Sep 26, 2022
@lfrancke lfrancke moved this from Acceptance: Waiting for to Acceptance: In Progress in Stackable Engineering Sep 28, 2022
@lfrancke
Member

None of the boxes have been checked.
Do they still make sense and if so could you check what's been done?

@razvan @soenkeliebau

@soenkeliebau
Member Author

I'm afraid the boxes can't really be checked right now, as we did not implement a generic solution for this. I'm not sure whether I wrote those checkboxes back in the day, but I'd say we don't need anything abstract for now, as this only affects NiFi.
Opinions?

@lfrancke
Member

lfrancke commented Oct 4, 2022

No, I'm fine with that and I'm fine with not checking the boxes.
Do we need a follow-up ticket then or can we create one if something comes up?

@lfrancke lfrancke self-assigned this Oct 4, 2022
@soenkeliebau
Member Author

We can create that as and when needed I think.

@lfrancke lfrancke moved this from Acceptance: In Progress to Done in Stackable Engineering Oct 4, 2022