Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix KafkaRebalanceAssemblyOperator to not ignore exception coming from Cruise Control #10701

Closed
wants to merge 1 commit into from

Conversation

ShubhamRwt
Copy link
Contributor

Type of change

This PR fixes #10631. Now the KafkaRebalanceAssemblyOperator doesn't ignore the Illegal argument excepts which comes from Cruise Control and move the KafkaRebalance to NotReady in case any IllegalArgumentExceptions occurs

  • Bugfix
  • Enhancement / new feature
  • Refactoring
  • Documentation

Description

Please describe your pull request

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md
  • Supply screenshots for visual changes, such as Grafana dashboards

@ShubhamRwt ShubhamRwt added this to the 0.44.0 milestone Oct 11, 2024
Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow the changes, that is probably more something for @ppatierno. But should there be some unit tests to check what you describe? You just changed some unit tests where reconciliation was failing to succeeding. It is not clear to me why. But none seems to test the situation I had where the reconciliation was stuck.

@ppatierno
Copy link
Member

@scholzj I had an offline conversation with @ShubhamRwt about this and right now this is the best solution to fix that issue. Right now errors coming from CC are not handled pretty well. Some of them are "translated" into actual responses (i.e. the not enough data in window) to avoid showing errors to users and putting the KR in a "NotReady" state but showing a "PendingProposal" (because that's what the error means).
Said that, in this case we hava an actual error but failing the future doesn't pop up into the chain of calls moving the resource to "NotReady" so we decided to go this way for now in order to have this fix soon and also @see-quick tests to pass in relation to the stuck situation you see.
After this, I will try to revisit the way we handle CC errors to improve it, but it's better doing that later imho.

@ppatierno
Copy link
Member

@scholzj update :-) ... we'll try to have something better on handling errors from CC for this use case as well, because it seems we do already something this way when getting user tasks status but not when rebalancing requests (as in this scenario). So ... stay tuned ;-)

@scholzj
Copy link
Member

scholzj commented Oct 14, 2024 via email

@ppatierno
Copy link
Member

I went deeper in debugging the issue and I found the related problem which is about a "never ending" reconciliation keeping the Rebalancing state on the KafkaRebalance and of course not leaving the lock for the next reconciliation.
I will open a PR soon with the proposed fix. Then I guess this one could be closed.

@ppatierno
Copy link
Member

After further investigation I found the root cause of the issue and fixed it via #10717.
We can close this one when #10717 is merged.

@scholzj
Copy link
Member

scholzj commented Oct 15, 2024

If #10717 replaces this you should probably close it right away to avoid any confusions?

@ppatierno ppatierno closed this Oct 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove-brokers rebalancing seems to get stuck by race condition
3 participants