GlobalTable not working when using multiple instances #283
Please check partitioning (which instance consumes which partition) and that the RocksDB storage uses a different path on each instance.
@Roman1us regarding paths, I'm using different --datadir and --web-port values when running the instances. For partition consumption, I'm using the topic "greetings_topic" as the source topic to write into the GlobalTable, and every instance is assigned a different partition of "greetings_topic". The problem is that instances can only consume/write data from/into the GlobalTable in the exact same partition as the source event, as if it were a local Table. The only difference with GlobalTable is that at startup the recovery is done across all partitions, but once recovery is done, every instance consumes/updates the GlobalTable only in its assigned partition of "greetings_topic".
We observed the same issue with GlobalTables on multiple workers. Essentially they do not work currently :-(
Same here: we are running 5 worker instances that should take advantage of the same GlobalTable, but they actually see it just as a local Table. It's a shame, since we need to work around it by adding a lot of boilerplate code just to mock the GlobalTable behaviour.
Hi @dario-collavini, what’s the approach you used to mock the GlobalTable? |
@Hamdiovish @dario-collavini how many partitions does your GlobalTable topic have?
@Roman1us as mentioned in the "Steps to reproduce" section, the global table in my example has 2 partitions.
@Hamdiovish so you need the same table data across all workers, right? Maybe setting the topic to 1 partition and use_partitioner=True in the table config will help you?
@Roman1us thank you for the prompt reply.
The wrong behavior is still happening: at initialization, recovery occurs as expected and the two instances show the same content of the global table, but as the outputs below show, they diverge afterwards.
[instance_0 / instance_1 table outputs]
@Hamdiovish show the output of
We saw exactly the same issue reported by @Hamdiovish: updates were not available at runtime. In our case the workaround is to use local Tables instead of GlobalTables and to ensure that topic partitions and key structure match between agents and tables. The topic of each worker instance's agent that needs the table must be partitioned by the same key used to access the table data, so that the global data is split into local subsets, each accessible by the corresponding instance; using the same partition key guarantees this.
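A minimal sketch of what such co-partitioning might look like; the topic and table names follow the thread, while the broker URL, value types, and agent body are assumptions:

```python
# Sketch of the co-partitioning workaround: local Table keyed by the same key
# as the source topic, with matching partition counts.
import faust

app = faust.App('greetings-app', broker='kafka://localhost:9092', store='rocksdb://')

# Source topic and table use the same number of partitions and the same key,
# so each worker only ever reads and writes the slice of the table that
# corresponds to the topic partitions it owns.
greetings_topic = app.topic('greetings_topic', partitions=2,
                            key_type=str, value_type=str)
greetings_table = app.Table('greetings_table', default=str, partitions=2)

@app.agent(greetings_topic)
async def process(stream):
    async for key, value in stream.items():
        # The table entry is keyed by the stream key, so the changelog
        # partition matches the source partition this worker is assigned.
        greetings_table[key] = value
```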
Hi @Roman1us, here are the details:
The output of the source topic:
Regarding Partition assignment:
The boot log of instance_1:
Just a quick note: GlobalTable works pretty well when there is only one instance, but the problems start once another instance is added.
@Hamdiovish okay, I'll try to reproduce this and maybe debug this behavior.
I have experienced a similar problem, with a global table that has a 1-partition changelog topic and multiple workers.

I've narrowed my problem down to the fact that at least one worker out of a group of workers sharing a global table will be assigned only "active" changelog topic partitions and no standbys, while all the other workers get both actives and standbys assigned to them (here). For the worker that has only "actives", when _restart_recovery runs in recovery.py it pauses all active partitions after the initial recovery. That is correct if the "active" partitions are also in the standby set, because the standby partition set gets resumed (here), but when the active partitions are not in the standby set, nothing ever resumes the changelog topic partitions for that worker.

This problem can be masked if the worker that has only "active" partitions assigned is also the leader of the Faust topic partition being used to modify the table: in that case the local modifications keep the table in sync even though the "active" changelog partitions were not resumed (but this pairing is not guaranteed). As soon as the leader of the topic updating the table is a different worker than the one that has only active changelog topic partitions and no standbys, that worker's table will not receive update events and gets out of sync.

I'm not sure this is actually the right place to address the problem, but the workaround I've put in is here. Curious if there is a better place to address it?

One last note for anyone not doing high-volume table updates: if you are checking that your tables stay in sync one change at a time, you will want to set recovery_buffer_size=1 so that your changes are applied immediately; otherwise your tables will appear out of sync even though the change events are being received, just not yet applied.
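A small sketch of that last suggestion, assuming recovery_buffer_size is accepted as an app setting as the comment implies; the app name and broker URL are placeholders:

```python
# Sketch: force changelog events to be applied one at a time during recovery.
import faust

app = faust.App(
    'greetings-app',
    broker='kafka://localhost:9092',  # assumed broker URL
    store='rocksdb://',
    recovery_buffer_size=1,  # apply each received changelog event immediately
)
```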
The current solution to this problem, as of v0.8.6, is to initialize a GlobalTable with the options
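A hedged sketch, assuming the options referred to are the partitions=1 and use_partitioner=True combination suggested earlier in the thread; the table name and broker URL are assumptions:

```python
# Sketch: single-partition GlobalTable with the partitioner enabled for
# changelog writes.
import faust

app = faust.App('greetings-app', broker='kafka://localhost:9092', store='rocksdb://')

greetings_global = app.GlobalTable(
    'greetings_global',
    default=str,
    partitions=1,          # single changelog partition shared by all workers
    use_partitioner=True,  # route changelog writes through the producer partitioner
)
```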
I am also facing this issue. The mentioned solution does not work for us, as we need more than 1 partition in order to split the work across multiple physical nodes. Am I missing something?
Checklist
I have verified that the issue persists when using the master branch of Faust.
Steps to reproduce
Running the following app in 2 instances, then pushing a message to one partition: the instance listening to the related partition detects the message and updates the GlobalTable, while the other instance does not update its version of the GlobalTable and keeps showing nothing.
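A minimal sketch of a reproduction app of this shape; the topic name and partition count come from the discussion above, while the broker URL, agent, and timer are illustrative only:

```python
# Sketch of a reproduction: an agent writes incoming events into a GlobalTable,
# and a timer prints each instance's view of the table.
import faust

app = faust.App('greetings-app', broker='kafka://localhost:9092', store='rocksdb://')

greetings_topic = app.topic('greetings_topic', partitions=2,
                            key_type=str, value_type=str)
greetings_global = app.GlobalTable('greetings_global', default=str, partitions=2)

@app.agent(greetings_topic)
async def update_table(stream):
    async for key, value in stream.items():
        greetings_global[key] = value

@app.timer(interval=5.0)
async def dump_table():
    # With the bug, only the instance that consumed the source partition
    # ever shows the new entry here.
    print(dict(greetings_global))
```

Two instances would then be started with different --datadir and --web-port values, as described earlier in the thread.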
Expected behavior
All instances should detect GlobalTable updates.
Actual behavior
The GlobalTable is not synced across instances.
Versions