Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ISSUE-10650][BUGFIX] Prevent running again already running cron job #11465

Closed
wants to merge 5 commits into from
Closed

[ISSUE-10650][BUGFIX] Prevent running again already running cron job #11465

wants to merge 5 commits into from

Conversation

sylvainraye
Copy link
Contributor

@sylvainraye sylvainraye commented Oct 15, 2017

Description

Create a logic to get running cron job and prevent them to be run again when pending job should be executed

Fixed Issues (if relevant)

  1. Cron starts when it's already running #10650: Cron starts when it's already running

Manual testing scenarios

  1. Create a cron that takes a while (for example, sleep 15 minutes)
  2. Schedule it to run every 5 minutes

Contribution checklist

  • Pull request has a meaningful description of its purpose
  • All commits are accompanied by meaningful commit messages
  • All new or changed code is covered with unit/integration tests (if applicable)
  • All automated tests passed successfully (all builds on Travis CI are green)

@fooman
Copy link
Contributor

fooman commented Oct 15, 2017

@diglin thanks for this pull request. The proposed change looks okay to me. Would you be able to take a look at adding a test that covers the code branch $this->isJobRunning($schedule->getJobCode() for example with a mock that makes sure that the call to $schedule->getScheduledAt() is not executed.

*/
protected function isJobRunning($jobCode)
{
$runningJobs = $this->_getRunningSchedules();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to replace this part to SQL query, that will be much effective than getting collection and iterate it.

*
* @return bool
*/
protected function isJobRunning($jobCode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to use private instead of protected

@sylvainraye
Copy link
Contributor Author

sylvainraye commented Oct 17, 2017

@ihor-sviziev the other properties and methods of this class are in protected, I kept the same convention. If you want to change it, all other properties and methods need to be changed. It's outside of the scope of this issue. Same for the collection story, outside of the scope of the issue and the core team used the collection, even if I agree with you regarding the collection use.

@fooman I worked few minutes to fix the issue and do manually testing with success. However few hours to try to set a unit test without success on saturday during the hackathon in Essen. I updated the other unit test to pass the test. I really gave a try to cover this case by mocking cron job schedule, class and so on but without success. I'm stuck to have one cron with status running and an other one with pending to run the test. When I use into the method isJobRunning, the $schedule->getJobCode(), it breaks all other unit test of this class, because it returns always true due to how is mocked up the collection, it's a no end work...
I don't have the time and probably the knowledge to manage to create a unit test by M2, really painful and didn't manage during the hackathon to get help from the others participants. I'm open to learn but need help.

@ihor-sviziev
Copy link
Contributor

@diglin unfortunately all other properties we can't change, it will not be backward compatible change. So it's can't be done in scope of this PR.

@dmanners dmanners self-assigned this Oct 24, 2017
@dmanners dmanners added this to the October 2017 milestone Oct 24, 2017
@dmanners dmanners added 2.2.x bug report Reproduced on 2.1.x The issue has been reproduced on latest 2.1 release Reproduced on 2.2.x The issue has been reproduced on latest 2.2 release labels Oct 24, 2017
Copy link
Contributor

@dmanners dmanners left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you consider making the requested changes. Might be a good thing to extract the new code into it's own class here also.

@@ -62,6 +62,11 @@ class ProcessCronQueueObserver implements ObserverInterface
protected $_pendingSchedules;

/**
* @var \Magento\Cron\Model\ResourceModel\Schedule\Collection
*/
protected $_runningSchedules;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is new we can make this private and remove the _ from the name. All new properties should be private or public.

Remove `_` for properties for new properties / methods
@sylvainraye
Copy link
Contributor Author

@dmanners done

@ihor-sviziev
Copy link
Contributor

ihor-sviziev commented Nov 13, 2017

@diglin @dmanners as I see you're checking that current job is already running. What do you think about using cron groups approach that was used in AOE Scheduler, maybe we need to add such groups functionality and prevent cron execution for group?

@paveq
Copy link
Contributor

paveq commented Nov 22, 2017

Perhaps it would be better to implement this on Cron Group level, so that only single instance of given Group is running at once. If implemented in job level, the question is what is the purpose of Cron Groups then? In my mind Groups should be able to be used to define jobs that are related (eg. product import) and should not run concurrently, but can run concurrently with another Group (eg. order export).

Secondly should we detect if the actual Group is running (via some IPC mechanism, or even by checking PID) and not depend on running status? If the process dies and running status is left behind, then it is never restarted.

@ihor-sviziev
Copy link
Contributor

ihor-sviziev commented Nov 22, 2017

About cron job groups
AFAIK magento already have two cron job groups:

  • Default: used in most modules
    <group id="default">
    <job name="catalog_product_alert" instance="Magento\ProductAlert\Model\Observer" method="process">
    <config_path>crontab/default/jobs/catalog_product_alert/schedule/cron_expr</config_path>
    </job>
    </group>
  • Index: Used in index module only
    <group id="index">
    <job name="indexer_reindex_all_invalid" instance="Magento\Indexer\Cron\ReindexAllInvalid" method="execute">
    <schedule>* * * * *</schedule>
    </job>
    <job name="indexer_update_all_views" instance="Magento\Indexer\Cron\UpdateMview" method="execute">
    <schedule>* * * * *</schedule>
    </job>
    <job name="indexer_clean_all_changelogs" instance="Magento\Indexer\Cron\ClearChangelog" method="execute">
    <schedule>0 * * * *</schedule>
    </job>
    </group>

I think this lock should be done in following way:

  1. We need to create "virtual" group - all that will include all cron job groups. This group should be used by default (in order not to brake backward compatibility).
  2. Adjust cron:run command to support two parameters: --include-groups and --exclude-groups:
  • If --include-groups present - run cron jobs from requested groups only in loop.
  • If --exclude-groups exists - run "all" available groups, except listed.
  • If --include-groups and --exclude-groups are present - error should be shown
  • If none of these params are present - run "all" group (should be the same as)

For all cases eache of these groups should be locked before cron execution and unlocked after execution.

As result we will have backward compatible solution with preventing group running in parallel and ability to run cron job groups in parallel for people that really need it.

Next improvement could be ability to manage cron job groups from the admin.

About locks
Locks should NOT be done as "is running" field in db. Cron jobs could be ran on different servers, they could fail with fatal errors, etc.
My proposition - do them in DB. Not sure that this code still there in M2, but I have example how it was done in Magento 1, not sure it was used for cron jobs out of the box, but we used it.

https://github.com/OpenMage/magento-mirror/blob/magento-1.9/app/code/core/Mage/Index/Model/Resource/Helper/Mysql4.php#L58-L84

@dmanners @diglin @okobchenko what do you think?

@ihor-sviziev ihor-sviziev self-assigned this Nov 22, 2017
@dmanners
Copy link
Contributor

@ihor-sviziev I like the proposal. Going for the groups approach seems to make sense to me.

@paveq
Copy link
Contributor

paveq commented Nov 22, 2017

@ihor-sviziev Looks nice, but I did not completely understand purpose of virtual group "all". If we introduce locking on cron group basis, I don't think we lose any backwards compatibility compared to current situation.

Only thing I can think of is that someone has put multiple jobs into single group that should be ran parallel but now are not. That is not breaking any backwards compatibility per se. It will still work, not just parallel, and easily fixed by moving these to separate groups.

In my opinion adding parameters to cron:run is going to wrong direction and introducing complexity on system level, which we should avoid. In my opinion the system should trigger single entry point, cron:run externally, and Magento should manage everything related to internal crons internally.

EDIT: perhaps we should run all cron groups, including default group in a separate process, so that main entry point is not being blocked by anything?

EDIT2: I like proposal of using MySQL based locking. We've used it earlier, and this works across multiple nodes. If we go for this + group based locking, then we can ideally have every node (not just primary node) running Magento cron, and tasks get picked by which ever node got there first.

@ihor-sviziev
Copy link
Contributor

@paveq

  1. My idea with "all" virtual group - it includes all cron job groups, just to visualize separate case.
  2. For cases when someone thought that cron jobs should be ran in parallel - it should be described in release notes, so such people could update their configurations, so it will not be a big issue.
  3. For most of the people this solution will work the same as today, so they could use it without any parameters. In case if we will not change cron:run - we will not have ability to run specific group as example on separate group of servers.
  4. I think it's a good idea, but it will be not a easy solution. Also for doing it - we need to have ability to run separate group, so we need some parameters for running specific cron group, so probably MVP could exclude it.

Copy link
Contributor

@ihor-sviziev ihor-sviziev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diglin could you update your code according to comments above?

Feel free to ask questions if you have any

@dmanners
Copy link
Contributor

I am closing this PR due to inactivity. Feel free to reach out to myself or @ihor-sviziev if you would like to pick up this pull request again later on.

@sylvainraye
Copy link
Contributor Author

All comments provided are new feature which looks great by the way. But sorry, it's outside of the scope of this issue. At the end, you close the issue, do not merge it and NO ONE can benefit of the solution.... Well, why not open a new issue, provide your feedback there and create a new feature for that !

@ihor-sviziev
Copy link
Contributor

@diglin I see your point, but we couldn't merge just working solution when it was fixed not in optimal way. If we will merge this PR, it will be included to next release, but next PR could not be included. As result - next PR will not be backward compatible with these changes which is not really good, so it could not be fixed in good way till 2.3 release. Hope you understood my point.

@sylvainraye
Copy link
Contributor Author

sylvainraye commented Dec 14, 2017

The solution provided by the other here raise an other issue: the whole cron group will be blocked if one cron job is running. If it last too long, the next run after the lock is released will won't necessary run the cron jobs because they are "missed" due to too late run.

@magento-engcom-team magento-engcom-team added the Reproduced on 2.3.x The issue has been reproduced on latest 2.3 release label Dec 14, 2017
@sylvainraye
Copy link
Contributor Author

I finally found why I didn't receive notifications. My bad, my apologies.

@ihor-sviziev
Copy link
Contributor

@diglin thank you for getting it. Probably we need to find solution for that. As a workaround when some cron job will take long time - you could extract it to separate group and this issue will go away.

BTW if you found any issues in implementation that someone did - you could tell about it in that PR.
PS: I added this info to #12497 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug report bugfix Component: Cron Progress: needs update Release Line: 2.2 Reproduced on 2.1.x The issue has been reproduced on latest 2.1 release Reproduced on 2.2.x The issue has been reproduced on latest 2.2 release Reproduced on 2.3.x The issue has been reproduced on latest 2.3 release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants