When getting the status of a rebalance task via /user_tasks, the request can return 500 with an error such as:
2024-10-09 15:44:34 ERROR KafkaCruiseControlRequestHandler:88 - Error processing GET request '/user_tasks' due to: 'There are already 5 active user tasks, which has reached the servlet capacity.'.
java.lang.RuntimeException: There are already 5 active user tasks, which has reached the servlet capacity.
This has nothing to do with the actual rebalance task itself, which may still be in progress. It appears to be a failure to create a new user task for the status request. When one of the existing user tasks completes, it is removed from the active user task list, e.g.:
2024-10-09 15:44:36 INFO UserTaskManager:349 - UserTask 7e280130-47d2-4940-99da-f57f117c3f26 is completed and removed from active tasks list
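To make the behaviour above concrete, here is a minimal toy model of a bounded active-task list. It is only a sketch and not Cruise Control's actual UserTaskManager: the class name `BoundedTaskList` and its methods are hypothetical, and the only details taken from the logs are the capacity of 5 and the wording of the error message.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical toy model of a capacity-limited active task list.
// It only illustrates the behaviour seen in the logs; it is NOT Cruise Control code.
final class BoundedTaskList {
    private final int capacity;
    private final Set<UUID> active = new LinkedHashSet<>();

    BoundedTaskList(int capacity) {
        this.capacity = capacity;
    }

    // Registers a new task, or throws when the list is already full.
    // Note that even a status query needs a free slot in this model.
    synchronized UUID register() {
        if (active.size() >= capacity) {
            throw new RuntimeException("There are already " + active.size()
                    + " active user tasks, which has reached the servlet capacity.");
        }
        UUID id = UUID.randomUUID();
        active.add(id);
        return id;
    }

    // Removes a completed task, which frees a slot for the next request.
    synchronized void complete(UUID id) {
        active.remove(id);
    }

    synchronized int activeCount() {
        return active.size();
    }
}
```

With a capacity of 5, a sixth `register()` call fails until `complete()` is called for one of the earlier tasks, which matches the two-second gap between the ERROR and INFO log lines above.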
Once an existing task is completed and removed, we should be able to send a request to /user_tasks without hitting the 500. Since this failure does not reflect the actual status of the rebalance task we are querying, I don't think it makes sense for the KafkaRebalance to move to "NotReady". We should probably retry the endpoint in the next reconciliation instead.
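A rough sketch of the retry idea, assuming the operator can see the HTTP status and the error text of the /user_tasks response. Everything here is hypothetical (the enum, the class, and the method name are not Strimzi's API), and matching on the message substring is an assumption based on the log line quoted above.

```java
// Hypothetical outcomes of a /user_tasks status check; these names are not from Strimzi.
enum StatusCheckOutcome {
    TASK_STATUS_AVAILABLE,      // response can be parsed for the rebalance status
    RETRY_NEXT_RECONCILIATION,  // transient capacity error, leave the KafkaRebalance state alone
    FAILED                      // any other error, handled as it is today
}

final class UserTasksResponseClassifier {
    // Substring taken from the error message in the log; assumed to appear in the response body.
    private static final String CAPACITY_MARKER = "which has reached the servlet capacity";

    private UserTasksResponseClassifier() { }

    // Classifies a response so that the capacity error is retried instead of causing NotReady.
    static StatusCheckOutcome classify(int httpStatus, String body) {
        if (httpStatus == 200) {
            return StatusCheckOutcome.TASK_STATUS_AVAILABLE;
        }
        if (httpStatus == 500 && body != null && body.contains(CAPACITY_MARKER)) {
            return StatusCheckOutcome.RETRY_NEXT_RECONCILIATION;
        }
        return StatusCheckOutcome.FAILED;
    }
}
```

The point of the separate `RETRY_NEXT_RECONCILIATION` outcome is that the caller can skip the status update entirely and let the next reconciliation query the endpoint again, while other 500 responses keep their current handling.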
Steps to reproduce
Create a KafkaRebalance CR for removing/adding brokers with auto-approval set, and then immediately apply the refresh annotation to create a new rebalance task. The failure is intermittent and depends on how quickly the tasks complete.
Expected behavior
No response
Strimzi version
main
Kubernetes version
1.29
Installation method
No response
Infrastructure
No response
Configuration files and logs
No response
Additional context
No response
Good catch @tinaselenge. I think it could be made easily reproducible by lowering max.active.user.tasks when configuring Cruise Control in the Kafka custom resource. Its value is 5, which is exactly the limit shown in your error.
Triaged on 17/10/2024: agreed to fix this, at least by not moving the KafkaRebalance straight to the NotReady state when this happens, but instead waiting for the next reconciliation(s) to retry. @tinaselenge is going to take a look at it. Thanks Tina!