Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve upon (currently dangerous) .from_schedule code #269

Closed
wants to merge 3 commits into from
Closed

Improve upon (currently dangerous) .from_schedule code #269

wants to merge 3 commits into from

Conversation

hartwork
Copy link
Contributor

For your consideration 😄

SolarSchedule.Meta.unique_together guarantes
that we run into MultipleObjectsReturned with SolarSchedule.
Current value: ('event', 'latitude', 'longitude')
@hartwork hartwork changed the title Improve upon (currently dangourous) .from_schedule code Improve upon (currently dangerous) .from_schedule code Jun 15, 2019
except MultipleObjectsReturned:
cls.objects.filter(**spec).delete()
return cls(**spec)
return cls.objects.get_or_create(**spec)[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can still raise MultipleObjectsReturned, why is not handling that better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a uniqueness constraint covering all fields that go into spec so the database should not be able to contain more than a single match at any time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, but even the previous logic seems terrible tbh.

As an example, lets say I have 2 different schedules for 2 different tasks, one running every sunrise and one running every sunset (just using solarschedule as the example but this applies to any of them). Now, lets say that my boss comes over and says all tasks running at sunset should be changed to sunrise for only the next week for whatever business reason. Instead of modifying a single schedule, I now have to go through and modify potentially hundreds of different periodic tasks to change their schedule... and then I have to remember which ones I changed so I can change them back later.

It seems like we should be allowing duplicate schedules. I'm really curious why this limitation ever existed in the first place?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm digging into where all from_schedule is used to see if we can sort this out some, feel free to do the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, lets say that my boss comes over and says all tasks running at sunset should be changed to sunrise for only the next week for whatever business reason. Instead of modifying a single schedule, I now have to go through and modify potentially hundreds of different periodic tasks to change their schedule... and then I have to remember which ones I changed so I can change them back later.

I was thinking of the same scenario earlier. I think this feature also has downsides: Once you have multiple schedules with the same values, you'll have a hard time telling them apart when creating new jobs.

Personally, I see the value in a unique constraint but also the pain with not supporting to change a schedule into something that already exists.

To fully support the scenario that you have in mind, it would take another layer of indirection, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm digging into where all from_schedule is used to see if we can sort this out some, feel free to do the same.

There is a single place outside the test suite:

model_schedule = model_type.from_schedule(schedule)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I'm reading through the celery scheduler code those inherit from to work out the logic on what scenarios this call stack actually happens.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, based on what I've looked through, the whole reason for all this is that when you use the django settings to set up the schedules, it deletes/inserts those into the DB. However, instead of having a flag for "inserted from django conf" or whatever, they compare it to what's already in the DB and treat the schedule items as if they are unique constraints. Still kind of playing around with this, but if that's the case then the deletes may be necessary for now, although there are probably several scenarios where things can get borked. Still playing around with this though.

@@ -138,6 +132,7 @@ class Meta:
verbose_name = _('interval')
verbose_name_plural = _('intervals')
ordering = ['period', 'every']
unique_together = ('every', 'period')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if I want two different jobs to run on the same interval? This would disallow that, that seems like an issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a mixup between jobs and schedules here: You assign the same "every 4 days" schedule to two distinct jobs and they both run at the time. That works before and after this pull request.

The current cls.objects.filter(**spec).delete() is effectively a unique constraint, except that it might cascade destroy data and does not prevent creation in the first place.

I think what the code needs is one of:

  • a) A uniqueness constraint for all schedules (as done in the pull request)
  • b) A uniqueness constraint for none of the schedules and no dangerous cls.objects.filter(**spec).delete() either

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a mixup between jobs and schedules here: You assign the same "every 4 days" schedule to two distinct jobs and they both run at the time. That works before and after this pull request.

The current cls.objects.filter(**spec).delete() is effectively a unique constraint, except that it might cascade destroy data and does not prevent creation in the first place.

I think what the code needs is one of:

* a) A uniqueness constraint for _all_ schedules (as done in the pull request)

* b) A uniqueness constraint for _none_ of the schedules and no dangerous  `cls.objects.filter(**spec).delete()` either

What do you think?

just curious if we add UniqueContraint Check here would that be any useful?

Copy link
Contributor Author

@hartwork hartwork Jan 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @auvipy I would need get back into the whole topic since it's been so long ago by now. I can do that work but I would need some "guarantee" that we'll actually get somewhere with this in the next few weeks to not invest in a dead end. (In case you're a friend of voice meetings for stuff like that, we can definitely do a Jitsi or Google Meet on the topic if you like, but no pressure, just one more option.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will release a new version tomorrow and do some more analysis to get back to you to reach a consensus

'every': every,
'period': period,
}
return cls.objects.get_or_create(**spec)[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question about MultipleObjectsReturned

@@ -174,6 +167,7 @@ class ClockedSchedule(models.Model):
clocked_time = models.DateTimeField(
verbose_name=_('Clock Time'),
help_text=_('Run the task at clocked time'),
unique=True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why limit it like this? Having two jobs run at the same time is completely valid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see #269 (comment)

except MultipleObjectsReturned:
cls.objects.filter(**spec).delete()
return cls(**spec)
return cls.objects.get_or_create(**spec)[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question about MultipleObjectsReturned

@@ -280,6 +268,8 @@ class Meta:
verbose_name_plural = _('crontabs')
ordering = ['month_of_year', 'day_of_month',
'day_of_week', 'hour', 'minute', 'timezone']
unique_together = ('minute', 'hour', 'day_of_week', 'day_of_month',
'month_of_year', 'timezone')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue about limiting by the schedule.

except MultipleObjectsReturned:
cls.objects.filter(**spec).delete()
return cls(**spec)
return cls.objects.get_or_create(**spec)[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question about MultipleObjectsReturned

@hartwork
Copy link
Contributor Author

Any ideas how to proceed with #269 (comment) ?

@hartwork
Copy link
Contributor Author

hartwork commented Jan 5, 2020

Any ideas how to proceed with #269 (comment) ?

Ping 🙏

django_celery_beat/models.py Show resolved Hide resolved
@izimobil
Copy link
Contributor

izimobil commented Mar 2, 2021

FWIW, here is my take on this:
(see also: #389 that fixes the very dangerous from_schedule implementation)

The idea of adding unique constraints is good, but there should be a warning in the changelog because it's not fully backwards compatible, existing code may try to create duplicate schedules (which currently works) and get a IntegrityError exception with the constraints added.

Furthermore, because existing databases can already contain duplicate schedules, the from_schedule methods should either be:

@hartwork
Copy link
Contributor Author

hartwork commented Mar 2, 2021

Closing as superseded by #389.

CC #322 as related.

@auvipy
Copy link
Member

auvipy commented Mar 3, 2021

I might tackle this in the future if needed.

@auvipy auvipy self-assigned this Mar 3, 2021
@hartwork
Copy link
Contributor Author

hartwork commented Mar 9, 2021

I might tackle this in the future if needed.

@auvipy I see two unresolved topics that this pull request documents:

  • introducing uniqueness constraints and
  • potential to use get_or_create to simplify code and increase atomicity.

There is no way forward with the uniqueness constraint that doesn't

  • require user intervention on setups with existing duplicates and
  • have downsides as well

…as explained by @liquidpele at https://github.com/celery/django-celery-beat/pull/269/files#r294091489 .

For get_or_create, a uniqueness constraint on the involved keys would be "needed" to not keep creating duplicate entries. So I see at least a soft dependency on the uniqueness constraint topic here.

With where we are today on master after #389 has been merged, this pull request titled "Improve upon (currently dangerous) .from_schedule code" is not the best way to keep those two potential todos in my opinion, plus I would like to make myself optional in this equation (which is in some conflict with me being the author of the pull request) and keep my own list of open pull request at https://github.com/pulls organized, so I'll close my pull request here again and ask you to create a new dedicated issue to address the topic of adding uniqueness constraints if that's still of interest. Thanks for your understanding 🙏

Best, Sebastian

@hartwork hartwork closed this Mar 9, 2021
@hartwork hartwork deleted the improve-from-schedule branch March 9, 2021 00:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants