Fix double execution after concurrent node bootstraps #380
@dkropachev @Lorak-mmk could you take a look? |
Could you explain more? I'm not that familiar with this code - and in general you should not expect reviewers to be as familiar with a piece of code as you are. Some more specific questions regarding the code:
But in the code I see that
|
@Lorak-mmk I added some extra details to the description |
2d9bbfe to be1debf
In some cases we want to update the pool only if the previous one does not exist or is shut down. This commit adds additional validation to add_or_renew_pool to make sure this condition is met when needed. Fixes: scylladb#317
be1debf to 61aaeed
Rebased on master and reworded the commit message |
Please tell me if I understand everything correctly.
Given the description of As I stated, it does need to care about concurrency - and it probably does try, judging by the existence of a lock. There are however scenarios where it fails. Consider 2 concurrent calls (
I think that we should try to make this function safe (which means properly handling concurrency) - I don't see how we can be sure that anything works when such footguns are present.
Okay, I got it after I wrote the above paragraph. So the underlying issue seems to be that the code assumes that This model is easy to understand for a developer who reads the code and nullifies most of the problems that we see here in Python Driver. Now I assume we can't wait with fixing the linked issue until we can properly fix pool management as I wrote above - so let's move to your patch. I don't see how this flag fixes anything. Currently there is already a check whether the previous pool is None or shut down, in python-driver/cassandra/cluster.py, lines 3359 to 3364 in d768d74
You are performing the exact same check, just a bit later (after you created a new pool) - it doesn't fix anything (because there is no guarantee that assignment to
I think if you change the reproducer to add sleep before assigning to |
@Lorak-mmk You are right, I agree it only reduces the chances of a failure. Thanks for analyzing this carefully |
Do we have an issue for fixing pool management? |
I don't think we do. |
@Lorak-mmk @dkropachev To be honest, I don't have any idea how to solve the issue differently. Do you have any idea for the fix? If not, I will close this PR, since this solution was proven to be incomplete. Additionally, I will create an issue for fixing pool management. As the original issue is not that urgent (#317 (comment)), I think the priority should be lowered to P2, and maybe it can wait for a proper fix. |
One possible idea:
If 1 can't be solved, but 2 can, then perhaps we can re-fetch |
Hmm... I'm not sure if what I wrote makes sense, I have to think about it more. |
Ok, so points 1 and 2 are good in general, but are not enough to fix the issue - two pools would still be opened, and two assignments would happen. |
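To make the "two pools would still be opened" point concrete, here is a minimal sketch of one way to avoid opening the second pool at all: claim the host under the lock before creating anything. This is only an illustration under stated assumptions, not driver code and not something proposed in this thread. ToyPool, ClaimingToySession, ensure_pool, _pending and run_claimed are all hypothetical names.

```python
import threading


class ToyPool:
    """Stand-in for a connection pool; it only tracks shutdown state."""

    def __init__(self, host):
        self.host = host
        self.is_shutdown = False


class ClaimingToySession:
    """Hypothetical model: check and claim happen in one critical section."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pools = {}
        self._pending = set()  # hosts whose pool is currently being created
        self.opened = 0        # how many pools were actually created

    def ensure_pool(self, host):
        with self._lock:
            current = self._pools.get(host)
            if current is not None and not current.is_shutdown:
                return False  # a live pool already exists
            if host in self._pending:
                return False  # another caller is already creating one
            self._pending.add(host)  # claim the host before the slow part
        try:
            new_pool = ToyPool(host)  # slow in reality, so done outside the lock
            with self._lock:
                self.opened += 1
                self._pools[host] = new_pool
        finally:
            with self._lock:
                self._pending.discard(host)
        return True


def run_claimed(callers=8):
    session = ClaimingToySession()
    start = threading.Barrier(callers)  # release all callers at once

    def worker():
        start.wait(timeout=5)
        session.ensure_pool("node2")

    threads = [threading.Thread(target=worker) for _ in range(callers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return session.opened
```

In this model run_claimed() returns 1 regardless of scheduling, because the check and the claim share one critical section and the pool is assigned before the claim is released. Whether a losing caller should return immediately or wait for the in-flight pool is a design question the sketch does not answer.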
I am closing this PR as this is not a correct solution. The issue will need a separate one, probably utilizing some of these ideas.
In some cases we want to update the pool only if the previous one does not exist or is shut down. This commit adds additional validation to add_or_renew_pool to make sure this condition is met.
Problematic scenario:
The reason add_or_renew_pool was executed a second time is:
python-driver/cassandra/cluster.py, lines 3359 to 3364 in d768d74
pool for node 2 is None. That is because add_or_renew_pool was called, but it has not finished yet (python-driver/cassandra/cluster.py, line 3324 in d768d74, self._pools[host]). This is why adding a sleep before that line makes the issue easier to reproduce (022a3aa).
So in that specific case we only want to create a new pool if there was no pool before or it was shut down. But it may be that the pool was created but not assigned yet, so before assigning the second one we want to check whether the previous one meets the conditions.
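The interleaving described above can be reproduced in a small standalone model. The sketch below is not the driver's code: ToyPool, RacyToySession, ensure_pool and run_race are hypothetical names, and the barrier is a test-only device that plays the role of the sleep mentioned above by holding both callers between the check and the assignment.

```python
import threading


class ToyPool:
    """Stand-in for a connection pool; it only tracks shutdown state."""

    def __init__(self, label):
        self.label = label
        self.is_shutdown = False


class RacyToySession:
    """Hypothetical model of the check-then-create pattern, not driver code."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pools = {}
        self.opened = []  # every pool that was created, in creation order
        # Test-only device: holds both callers after the check so the
        # unlucky interleaving happens on every run.
        self._after_check = threading.Barrier(2)

    def ensure_pool(self, host, label):
        pool = self._pools.get(host)
        if pool is not None and not pool.is_shutdown:
            return False  # a live pool exists, nothing to do
        self._after_check.wait(timeout=5)  # both callers saw "no pool"
        new_pool = ToyPool(label)  # the slow part: opening connections
        with self._lock:
            self.opened.append(new_pool)
            self._pools[host] = new_pool  # the late assignment
        return True


def run_race():
    session = RacyToySession()
    threads = [
        threading.Thread(target=session.ensure_pool, args=("node2", label))
        for label in ("first", "second")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(session.opened)
```

In this model run_race() returns 2: both callers pass the check before either one assigns, so both open a pool and the second assignment overwrites the first.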
Fixes: #317