-
Notifications
You must be signed in to change notification settings - Fork 429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve upon (currently dangerous) .from_schedule code #269
Conversation
SolarSchedule.Meta.unique_together guarantes that we run into MultipleObjectsReturned with SolarSchedule. Current value: ('event', 'latitude', 'longitude')
except MultipleObjectsReturned: | ||
cls.objects.filter(**spec).delete() | ||
return cls(**spec) | ||
return cls.objects.get_or_create(**spec)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can still raise MultipleObjectsReturned, why is not handling that better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a uniqueness constraint covering all fields that go into spec
so the database should not be able to contain more than a single match at any time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point, but even the previous logic seems terrible tbh.
As an example, lets say I have 2 different schedules for 2 different tasks, one running every sunrise and one running every sunset (just using solarschedule as the example but this applies to any of them). Now, lets say that my boss comes over and says all tasks running at sunset should be changed to sunrise for only the next week for whatever business reason. Instead of modifying a single schedule, I now have to go through and modify potentially hundreds of different periodic tasks to change their schedule... and then I have to remember which ones I changed so I can change them back later.
It seems like we should be allowing duplicate schedules. I'm really curious why this limitation ever existed in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm digging into where all from_schedule is used to see if we can sort this out some, feel free to do the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, lets say that my boss comes over and says all tasks running at sunset should be changed to sunrise for only the next week for whatever business reason. Instead of modifying a single schedule, I now have to go through and modify potentially hundreds of different periodic tasks to change their schedule... and then I have to remember which ones I changed so I can change them back later.
I was thinking of the same scenario earlier. I think this feature also has downsides: Once you have multiple schedules with the same values, you'll have a hard time telling them apart when creating new jobs.
Personally, I see the value in a unique constraint but also the pain with not supporting to change a schedule into something that already exists.
To fully support the scenario that you have in mind, it would take another layer of indirection, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm digging into where all from_schedule is used to see if we can sort this out some, feel free to do the same.
There is a single place outside the test suite:
model_schedule = model_type.from_schedule(schedule) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I'm reading through the celery scheduler code those inherit from to work out the logic on what scenarios this call stack actually happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, based on what I've looked through, the whole reason for all this is that when you use the django settings to set up the schedules, it deletes/inserts those into the DB. However, instead of having a flag for "inserted from django conf" or whatever, they compare it to what's already in the DB and treat the schedule items as if they are unique constraints. Still kind of playing around with this, but if that's the case then the deletes may be necessary for now, although there are probably several scenarios where things can get borked. Still playing around with this though.
@@ -138,6 +132,7 @@ class Meta: | |||
verbose_name = _('interval') | |||
verbose_name_plural = _('intervals') | |||
ordering = ['period', 'every'] | |||
unique_together = ('every', 'period') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if I want two different jobs to run on the same interval? This would disallow that, that seems like an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a mixup between jobs and schedules here: You assign the same "every 4 days" schedule to two distinct jobs and they both run at the time. That works before and after this pull request.
The current cls.objects.filter(**spec).delete()
is effectively a unique constraint, except that it might cascade destroy data and does not prevent creation in the first place.
I think what the code needs is one of:
- a) A uniqueness constraint for all schedules (as done in the pull request)
- b) A uniqueness constraint for none of the schedules and no dangerous
cls.objects.filter(**spec).delete()
either
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a mixup between jobs and schedules here: You assign the same "every 4 days" schedule to two distinct jobs and they both run at the time. That works before and after this pull request.
The current
cls.objects.filter(**spec).delete()
is effectively a unique constraint, except that it might cascade destroy data and does not prevent creation in the first place.I think what the code needs is one of:
* a) A uniqueness constraint for _all_ schedules (as done in the pull request) * b) A uniqueness constraint for _none_ of the schedules and no dangerous `cls.objects.filter(**spec).delete()` either
What do you think?
just curious if we add UniqueContraint Check here would that be any useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @auvipy I would need get back into the whole topic since it's been so long ago by now. I can do that work but I would need some "guarantee" that we'll actually get somewhere with this in the next few weeks to not invest in a dead end. (In case you're a friend of voice meetings for stuff like that, we can definitely do a Jitsi or Google Meet on the topic if you like, but no pressure, just one more option.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will release a new version tomorrow and do some more analysis to get back to you to reach a consensus
'every': every, | ||
'period': period, | ||
} | ||
return cls.objects.get_or_create(**spec)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question about MultipleObjectsReturned
@@ -174,6 +167,7 @@ class ClockedSchedule(models.Model): | |||
clocked_time = models.DateTimeField( | |||
verbose_name=_('Clock Time'), | |||
help_text=_('Run the task at clocked time'), | |||
unique=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why limit it like this? Having two jobs run at the same time is completely valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see #269 (comment)
except MultipleObjectsReturned: | ||
cls.objects.filter(**spec).delete() | ||
return cls(**spec) | ||
return cls.objects.get_or_create(**spec)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question about MultipleObjectsReturned
@@ -280,6 +268,8 @@ class Meta: | |||
verbose_name_plural = _('crontabs') | |||
ordering = ['month_of_year', 'day_of_month', | |||
'day_of_week', 'hour', 'minute', 'timezone'] | |||
unique_together = ('minute', 'hour', 'day_of_week', 'day_of_month', | |||
'month_of_year', 'timezone') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue about limiting by the schedule.
except MultipleObjectsReturned: | ||
cls.objects.filter(**spec).delete() | ||
return cls(**spec) | ||
return cls.objects.get_or_create(**spec)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question about MultipleObjectsReturned
Any ideas how to proceed with #269 (comment) ? |
Ping 🙏 |
FWIW, here is my take on this: The idea of adding unique constraints is good, but there should be a warning in the changelog because it's not fully backwards compatible, existing code may try to create duplicate schedules (which currently works) and get a IntegrityError exception with the constraints added. Furthermore, because existing databases can already contain duplicate schedules, the from_schedule methods should either be:
|
I might tackle this in the future if needed. |
@auvipy I see two unresolved topics that this pull request documents:
There is no way forward with the uniqueness constraint that doesn't…
…as explained by @liquidpele at https://github.com/celery/django-celery-beat/pull/269/files#r294091489 . For With where we are today on Best, Sebastian |
For your consideration 😄