MultiProcessConsumer disregards topic offsets between subsequent runs #173
Comments
What Kafka-Python commit are you using?
I tried several, and invoking MultiProcessConsumer the following way exhibits the above behavior. However, if I invoke SimpleConsumer the same way (minus num_procs), the SimpleConsumer won't consume any messages, since the offset is set at 200.
MultiProcessConsumer(kafka, "marcin-group", "test_1", num_procs=2, auto_commit=True, auto_commit_every_n=100)
I can confirm this behavior as well. I also confirmed that setting auto_commit=True in the SimpleConsumers created within _mp_consume() corrects it. It appears to have been a design choice to manage auto_commit within the master process only. I haven't dug into the code enough yet to understand why.
The design choice makes sense because the master process is what actually "consumes" the messages. Having a child process fetch a message, put it on the queue, and commit its offset before anything has been done with the message would definitely be incorrect. I don't see anything obviously wrong with the code. I don't know when I'll have more time to devote to this, but if someone could put together a failing test case it would be super helpful.
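To illustrate the point about committing only in the master, here is a minimal sketch. It is not kafka-python code: `worker_fetch`, `master_consume`, and the in-memory `committed` dict are hypothetical stand-ins, and a plain `queue.Queue` replaces the multiprocessing queue so the example stays deterministic.

```python
import queue


def worker_fetch(partition, messages, start_offset, out_queue):
    """Hypothetical stand-in for a child process: fetch and enqueue, never commit."""
    for offset in range(start_offset, len(messages)):
        out_queue.put((partition, offset, messages[offset]))


def master_consume(out_queue, committed, handler):
    """Master side: record an offset only after the handler has seen the message."""
    while True:
        try:
            partition, offset, message = out_queue.get_nowait()
        except queue.Empty:
            break
        handler(message)
        # Store the next offset to read, so a restart resumes after this message.
        committed[partition] = offset + 1


committed = {}  # hypothetical in-memory offset store
q = queue.Queue()
worker_fetch(0, ["a", "b", "c"], 0, q)

seen = []
master_consume(q, committed, seen.append)
```

If the worker committed on fetch instead, a crash between `put` and `handler` would mark messages as consumed that the application never saw.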
I'm working on a fix for this. I see two options:
def __init__(self, client, group, topic, partitions=None, auto_commit=True,
             auto_commit_every_n=AUTO_COMMIT_MSG_COUNT,
             auto_commit_every_t=AUTO_COMMIT_INTERVAL,
             load_initial_offsets=False):
I like option 1 the best, but I know they wanted to keep all offset management in the MultiProcessConsumer, so option 2 might fit their design choice better.
I think the best solution here is to fetch committed offsets in the base consumer on
See PR #356 -- I recall now that the reason SimpleConsumer only fetches commits when auto_commit=True is that everyone was using server v0.8.0, which did not support offset commit/fetch. We should change it to check whether client.group is None.
Sounds good to me. It's simpler to reset offsets only when required (the minority of cases), and resuming offsets should be the default behavior.
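For anyone following along, the behavior being agreed on can be sketched roughly as below. This is an illustration only, not the code from PR #356: `InMemoryOffsetStore` and `SketchConsumer` are hypothetical names, the store stands in for whatever holds committed offsets, and I'm assuming the check is made on the group passed to the consumer.

```python
class InMemoryOffsetStore:
    """Hypothetical stand-in for the storage that holds committed offsets."""

    def __init__(self):
        self._offsets = {}

    def fetch(self, group, topic, partition):
        # No commit on record means start from the beginning.
        return self._offsets.get((group, topic, partition), 0)

    def commit(self, group, topic, partition, offset):
        self._offsets[(group, topic, partition)] = offset


class SketchConsumer:
    """Hypothetical consumer: resumes committed offsets whenever a group is set."""

    def __init__(self, store, group, topic, partitions, log):
        self.store, self.group, self.topic, self.log = store, group, topic, log
        self.offsets = {p: 0 for p in partitions}
        if group is not None:  # assumed check; independent of auto_commit
            for p in partitions:
                self.offsets[p] = store.fetch(group, topic, p)

    def consume(self):
        out = []
        for p, start in self.offsets.items():
            msgs = self.log[p][start:]
            out.extend(msgs)
            self.offsets[p] = start + len(msgs)
            if self.group is not None:
                self.store.commit(self.group, self.topic, p, self.offsets[p])
        return out


store = InMemoryOffsetStore()
log = {0: ["m0", "m1", "m2"]}  # one partition with three messages

first_run = SketchConsumer(store, "marcin-group", "test_1", [0], log).consume()
second_run = SketchConsumer(store, "marcin-group", "test_1", [0], log).consume()
no_group = SketchConsumer(store, None, "test_1", [0], log).consume()
```

In this sketch the second run with the same group reads nothing new, while a consumer created without a group starts from offset 0 again.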
#356 merged -- this should be fixed now and will be available in the 0.9.4 release.
I'm not sure whether this works as designed, but every time I invoke MultiProcessConsumer on the same topic/partition (in a new execution thread), the consumer disregards the consumer offset in ZooKeeper and reads all the messages that are available. SimpleConsumer only reads messages that have not been read. Shouldn't MultiProcessConsumer behave the same way?
Code snippet: