Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle the failure due to reaching the servlet capacity when getting user tasks #10768

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

tinaselenge
Copy link
Contributor

@tinaselenge tinaselenge commented Oct 28, 2024

Type of change

Select the type of your PR

  • Bugfix

Description

If failed to get user tasks due to reaching the servlet capacity, it will no longer transition the KafkaRebalance status to NotReady. Instead, KafkaRebalance status will not be changed and in the next reconciliation, it will retry getting the user tasks again. The operator will also report a warning message.

Resolves #10704

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Check RBAC rights for Kubernetes / OpenShift roles
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md
  • Supply screenshots for visual changes, such as Grafana dashboards

@scholzj
Copy link
Member

scholzj commented Oct 29, 2024

/azp run build

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@tinaselenge tinaselenge marked this pull request as ready for review October 29, 2024 12:56
Copy link
Contributor

@katheris katheris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of small suggestions, but otherwise LGTM

*/
CruiseControlUserTasksResponse(String userTaskId, JsonObject json) {
super(userTaskId, json);
// The maximum number of active user tasks that can run concurrently has reached
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this comment be on the isMaxActiveUserTasksReached variable?

* Response to user tasks request
*/
public class CruiseControlUserTasksResponse extends CruiseControlResponse {
private boolean isMaxActiveUserTasksReached;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private boolean isMaxActiveUserTasksReached;
private boolean maxActiveUserTasksReached;

Promise<MapAndStatus<ConfigMap, KafkaRebalanceStatus>> p, String sessionId,
Set<Condition> conditions, KafkaRebalance kafkaRebalance,
ConfigMapOperator configMapOperator, boolean dryRun,
String host, CruiseControlApi apiClient,
AbstractRebalanceOptions.AbstractRebalanceOptionsBuilder<?, ?> rebalanceOptionsBuilder) {
if (cruiseControlResponse.isMaxActiveUserTasksReached()) {
LOGGER.warnCr(reconciliation, "The maximum number of active user tasks that can run concurrently has reached therefore will retry getting user tasks in the next reconciliation. " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LOGGER.warnCr(reconciliation, "The maximum number of active user tasks that can run concurrently has reached therefore will retry getting user tasks in the next reconciliation. " +
LOGGER.warnCr(reconciliation, "The maximum number of active user tasks that Cruise Control can run concurrently has been reached, therefore will retry getting user tasks in the next reconciliation. " +

Promise<MapAndStatus<ConfigMap, KafkaRebalanceStatus>> p, String sessionId,
Set<Condition> conditions, KafkaRebalance kafkaRebalance,
ConfigMapOperator configMapOperator, boolean dryRun,
String host, CruiseControlApi apiClient,
AbstractRebalanceOptions.AbstractRebalanceOptionsBuilder<?, ?> rebalanceOptionsBuilder) {
if (cruiseControlResponse.isMaxActiveUserTasksReached()) {
LOGGER.warnCr(reconciliation, "The maximum number of active user tasks that can run concurrently has reached therefore will retry getting user tasks in the next reconciliation. " +
"If this occurs often, consider increasing the value for max.active.user.tasks configuration.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"If this occurs often, consider increasing the value for max.active.user.tasks configuration.");
"If this occurs often, consider increasing the value for max.active.user.tasks in the Cruise Control configuration.");

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Error getting status of rebalance task via /user_tasks endpoint results in "NotReady" state
4 participants