A rebalance event is triggered; since the user did not poll, it is not handled.
rd_kafka_consumer_close_queue triggers an RD_KAFKA_OP_TERMINATE in cgrp.
While closing, rd_kafka_queue_poll is called and the pending rebalance event is handled.
RD_KAFKA_OP_ASSIGN is sent into the cgrp ops queue.
RD_KAFKA_OP_TERMINATE is handled, and rd_kafka_q_serve exits.
rd_kafka_cgrp_serve calls rd_kafka_q_purge(rkcg->rkcg_ops);, which purges the RD_KAFKA_OP_ASSIGN op.
The main thread never gets a reply to RD_KAFKA_OP_ASSIGN, so it hangs forever.
// in rdkafka.c thread_main:
int cnt = rd_kafka_q_serve(rk->rk_ops, timeout_ms, 0,
                           RD_KAFKA_Q_CB_CALLBACK, NULL, NULL);
if (rk->rk_cgrp) /* FIXME: move to timer-triggered */
        rd_kafka_cgrp_serve(rk->rk_cgrp);
A proper fix is to handle all ops in rkcg->rkcg_ops before calling rd_kafka_q_purge(rkcg->rkcg_ops); see the PR.
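To make the ordering problem concrete, here is a minimal, self-contained toy model in C. It does not use librdkafka; every `toy_*` name is hypothetical and the queue is a deliberately simplified stand-in for rkcg->rkcg_ops. It only illustrates why an op that is purged instead of served leaves its requester without a reply, and how draining before the purge would avoid that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical op types, loosely named after the ops in this report. */
typedef enum { TOY_OP_TERMINATE, TOY_OP_ASSIGN } toy_op_type_t;

typedef struct {
        toy_op_type_t type;
        int *replied; /* set to 1 when the op is answered; may be NULL */
} toy_op_t;

#define TOY_Q_MAX 8

/* Tiny FIFO standing in for the cgrp ops queue (an assumption, not the real type). */
typedef struct {
        toy_op_t ops[TOY_Q_MAX];
        size_t head, tail;
} toy_q_t;

static void toy_q_init(toy_q_t *q) {
        q->head = q->tail = 0;
}

static void toy_q_enq(toy_q_t *q, toy_op_type_t type, int *replied) {
        assert(q->tail < TOY_Q_MAX);
        q->ops[q->tail].type    = type;
        q->ops[q->tail].replied = replied;
        q->tail++;
}

/* Handle one op: answer the requester if it is waiting for a reply. */
static void toy_op_handle(const toy_op_t *op) {
        if (op->replied)
                *op->replied = 1;
}

/* Serve ops in order, but stop right after TERMINATE has been handled. */
static int toy_q_serve_until_terminate(toy_q_t *q) {
        int cnt = 0;
        while (q->head < q->tail) {
                const toy_op_t *op = &q->ops[q->head++];
                toy_op_handle(op);
                cnt++;
                if (op->type == TOY_OP_TERMINATE)
                        break;
        }
        return cnt;
}

/* Serve every op that is still queued. */
static int toy_q_serve_all(toy_q_t *q) {
        int cnt = 0;
        while (q->head < q->tail) {
                toy_op_handle(&q->ops[q->head++]);
                cnt++;
        }
        return cnt;
}

/* Drop the remaining ops without answering them; returns how many were dropped. */
static int toy_q_purge(toy_q_t *q) {
        int dropped = (int)(q->tail - q->head);
        q->head = q->tail;
        return dropped;
}

/* Run the close sequence; returns 1 if the ASSIGN requester got a reply. */
static int toy_close(int drain_before_purge) {
        toy_q_t q;
        int assign_replied = 0;

        toy_q_init(&q);
        toy_q_enq(&q, TOY_OP_TERMINATE, NULL);
        toy_q_enq(&q, TOY_OP_ASSIGN, &assign_replied);

        toy_q_serve_until_terminate(&q); /* TERMINATE handled, serve exits */
        if (drain_before_purge)
                toy_q_serve_all(&q);     /* proposed ordering: serve first */
        toy_q_purge(&q);                 /* anything still queued is lost */

        return assign_replied;
}
```

In this model `toy_close(0)` corresponds to the purge-only path and leaves the ASSIGN request unanswered, while `toy_close(1)` answers it before the purge. The real code is multi-threaded and the requester blocks on a reply queue, so an unanswered op there shows up as a hang rather than a flag that stays at 0.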
How to reproduce
The hang occurs randomly; I managed to reproduce it after modifying the code in several places:
Change the heartbeat response to always return an error (so we are not in the group).
Change rd_kafka_q_serve in the main thread to handle one op at a time.
Disable rd_kafka_cgrp_serve until RD_KAFKA_OP_GET_REBALANCE_PROTOCOL has been called, so that RD_KAFKA_OP_ASSIGN is queued first and is only handled after rd_kafka_cgrp_serve has been called. aiquestion@1bfc1e4
Build rdkafka_complex_consumer_example.c and run it; it will hang. After applying the fix in the PR, it closes successfully.
Description
We encountered an issue when closing the consumer in the Rust SDK: it hangs while calling rd_kafka_assign() in rebalance_cb.
The code looks like this (shown in C):
In some cases the program never exits, and the call stack hangs in
rd_kafka_assign
After investigation, the cause appears to be the sequence of events listed above.