Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lock jobs while executing them, to allow multiple executors to run in… #24696

Merged
merged 3 commits into from
May 21, 2016

Conversation

nickvergessen
Copy link
Contributor

@nickvergessen nickvergessen commented May 18, 2016

… parallel

Contributes towards #24551

  • Clarify API
  • Fix tests

I implemented the UPDATE with lock and then SELECT because SELECT ... FOR UPDATE does not exist for all ours DBs, so it feels wrong adding that to the IQueryBuilder

@nickvergessen nickvergessen added 3 - To Review technical debt p1-urgent Critical issue, need to consider hotfix with just that issue feature:backgroundjobs labels May 18, 2016
@nickvergessen nickvergessen added this to the 9.1-current milestone May 18, 2016
@@ -267,13 +289,23 @@ private function buildJob($row) {
* @param IJob $job
*/
public function setLastJob($job) {
$query = $this->connection->getQueryBuilder();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From API pov I'd expect this to happen in setLastRun(), but that is currently called before run(), so the job would be unlocked, although it is not finished yet:

$jobList->setLastRun($this);
try {
$this->run($this->argument);

But I guess that's part of #24609 😞

@nickvergessen
Copy link
Contributor Author

cc @DeepDiver1975 that's what I ended up with atm. Feel free to comment or deny, however I don't see a way to get SELECT FOR UPDATE working

@DeepDiver1975
Copy link
Member

Select for Update is one way to Lock a row.
Each DBMS has a mechanism for this.
Worst case we might end up in locking the full table.

Without locking this will not work.

@nickvergessen nickvergessen force-pushed the lock-jobs-while-executing branch 3 times, most recently from 2f17089 to 98478ab Compare May 19, 2016 13:03
@nickvergessen
Copy link
Contributor Author

Updated so it now locks the table for the select+update and then immediately unlocks it again, before building the job instance or anything else to keep the time window as small as possible.

@nickvergessen nickvergessen force-pushed the lock-jobs-while-executing branch 2 times, most recently from 19e3c7b to 8acb253 Compare May 20, 2016 09:17
@nickvergessen
Copy link
Contributor Author

nickvergessen commented May 20, 2016

Test instructions

  1. Create apps/files/lib/BackgroundJob/Test.php with content:

    <?php
    
    namespace OCA\Files\BackgroundJob;
    
    class Test extends \OC\BackgroundJob\TimedJob {
    
    public function __construct() {
        $this->setInterval(10);
    }
    
    protected function run($argument) {
        \OC::$server->getLogger()->error('PID ' . getmypid() . ' - ' . serialize($argument));
        sleep(rand(1, 3));
    }
    }
  2. Drop jobs table content and add fake jobs:

    INSERT INTO `oc_jobs` (`id`, `class`, `argument`, `last_run`, `last_checked`, `reserved_at`) VALUES
    (NULL, 'OCA\\Files\\BackgroundJob\\Test', '1', '0', '0', '0'),
    (NULL, 'OCA\\Files\\BackgroundJob\\Test', '2', '0', '0', '0'),
    (NULL, 'OCA\\Files\\BackgroundJob\\Test', '3', '0', '0', '0'),
    (NULL, 'OCA\\Files\\BackgroundJob\\Test', '4', '0', '0', '0'),
    (NULL, 'OCA\\Files\\BackgroundJob\\Test', '5', '0', '0', '0'),
    (NULL, 'OCA\\Files\\BackgroundJob\\Test', '6', '0', '0', '0'),
    (NULL, 'OCA\\Files\\BackgroundJob\\Test', '7', '0', '0', '0'),
    (NULL, 'OCA\\Files\\BackgroundJob\\Test', '8', '0', '0', '0'),
    (NULL, 'OCA\\Files\\BackgroundJob\\Test', '9', '0', '0', '0'),
    (NULL, 'OCA\\Files\\BackgroundJob\\Test', '10', '0', '0', '0');
  3. Open two console windows and run sudo -u www-data php cron.php in them.

Before:

Window 2:

sudo -u www-data php cron.php
Another instance of cron.php is still running!

Log:

{...,"message":"PID 12196 - i:1;","level":3,"time":"2016-05-20T09:05:53+00:00",...}
{...,"message":"PID 12196 - i:2;","level":3,"time":"2016-05-20T09:05:54+00:00",...}
{...,"message":"PID 12196 - i:4;","level":3,"time":"2016-05-20T09:05:57+00:00",...}
{...,"message":"PID 12196 - i:5;","level":3,"time":"2016-05-20T09:05:59+00:00",...}
{...,"message":"PID 12196 - i:6;","level":3,"time":"2016-05-20T09:06:00+00:00",...}
{...,"message":"PID 12196 - i:7;","level":3,"time":"2016-05-20T09:06:02+00:00",...}
{...,"message":"PID 12196 - i:8;","level":3,"time":"2016-05-20T09:06:04+00:00",...}
{...,"message":"PID 12196 - i:9;","level":3,"time":"2016-05-20T09:06:06+00:00",...}
{...,"message":"PID 12196 - i:10;","level":3,"time":"2016-05-20T09:06:09+00:00",...}
{...,"message":"PID 12196 - i:3;","level":3,"time":"2016-05-20T09:06:12+00:00",...}

After:

Both windows run
Log:

{...,"message":"PID 12554 - i:2;","level":3,"time":"2016-05-20T09:09:59+00:00",...}
{...,"message":"PID 12633 - i:4;","level":3,"time":"2016-05-20T09:10:01+00:00",...}
{...,"message":"PID 12554 - i:5;","level":3,"time":"2016-05-20T09:10:01+00:00",...}
{...,"message":"PID 12554 - i:6;","level":3,"time":"2016-05-20T09:10:02+00:00",...}
{...,"message":"PID 12633 - i:7;","level":3,"time":"2016-05-20T09:10:04+00:00",...}
{...,"message":"PID 12554 - i:8;","level":3,"time":"2016-05-20T09:10:04+00:00",...}
{...,"message":"PID 12633 - i:9;","level":3,"time":"2016-05-20T09:10:06+00:00",...}
{...,"message":"PID 12554 - i:10;","level":3,"time":"2016-05-20T09:10:06+00:00",...}
{...,"message":"PID 12554 - i:3;","level":3,"time":"2016-05-20T09:10:08+00:00",...}
{...,"message":"PID 12633 - i:1;","level":3,"time":"2016-05-20T09:10:09+00:00",...}

As you can see each job is still only executed once, but they are executed from different processes, so you can add more processes, when your job table grows too fast. Also the jobs are executed after ~10 seconds instead of ~20 so they are really executed in parallel.

@DeepDiver1975 @PVince81 ready to review from my pov

</field>

<field>
<name>reserved_at</name>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment to explain what this column is about ?

@PVince81
Copy link
Contributor

Looks good, thanks for the latest changes 👍

@PVince81
Copy link
Contributor

Second review @icewind1991 @Xenopathic @DeepDiver1975 ?

@icewind1991
Copy link
Contributor

Looks good 👍

@PVince81
Copy link
Contributor

@nickvergessen I have some bad news for you.

Hint 1: it starts with an O
Hint 2: wakes up mostly when touching DB code
Hint 3: everybody hates fixing stuff for it

07:45:46 1) Test\BackgroundJob\JobList::testGetNext
07:45:46 Doctrine\DBAL\Exception\DriverException: An exception occurred while executing 'LOCK TABLE "oc_jobs" EXCLUSIVE':
07:45:46 
07:45:46 ORA-01738: missing IN keyword
07:45:46 
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Driver/AbstractOracleDriver.php:76
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/3rdparty/doctrine/dbal/lib/Doctrine/DBAL/DBALException.php:116
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Connection.php:996
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/lib/private/DB/Connection.php:208
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/lib/private/DB/AdapterOCI8.php:35
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/lib/private/DB/Connection.php:321
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/lib/private/BackgroundJob/JobList.php:191
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/tests/lib/backgroundjob/joblist.php:184
07:45:46 
07:45:46 Caused by
07:45:46 Doctrine\DBAL\Driver\OCI8\OCI8Exception: ORA-01738: missing IN keyword
07:45:46 
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Driver/OCI8/OCI8Exception.php:33
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Driver/OCI8/OCI8Statement.php:214
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Driver/OCI8/OCI8Connection.php:140
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Connection.php:993
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/lib/private/DB/Connection.php:208
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/lib/private/DB/AdapterOCI8.php:35
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/lib/private/DB/Connection.php:321
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/lib/private/BackgroundJob/JobList.php:191
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/tests/lib/backgroundjob/joblist.php:184
07:45:46 
07:45:46 2) Test\BackgroundJob\JobList::testGetNextSkipReserved
07:45:46 BadMethodCallException: Can not lock a new table until the previous lock is released.
07:45:46 
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/lib/private/DB/Connection.php:316
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/lib/private/BackgroundJob/JobList.php:191
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/tests/lib/backgroundjob/joblist.php:197
07:45:46 
07:45:46 3) Test\BackgroundJob\JobList::testGetNextSkipNonExisting
07:45:46 BadMethodCallException: Can not lock a new table until the previous lock is released.
07:45:46 
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/lib/private/DB/Connection.php:316
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/lib/private/BackgroundJob/JobList.php:191
07:45:46 /ssd/jenkins/workspace/core-ci-linux@4/database/oci/label/SLAVE/tests/lib/backgroundjob/joblist.php:211
07:45:46 

@nickvergessen
Copy link
Contributor Author

Easy fix, forgot one word in the LOCK TABLE x >>>IN<<< .... sql command, fixed

@PVince81 PVince81 merged commit 8646802 into master May 21, 2016
@PVince81 PVince81 deleted the lock-jobs-while-executing branch May 21, 2016 17:08
@lock
Copy link

lock bot commented Aug 5, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Aug 5, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
4 - To release feature:backgroundjobs p1-urgent Critical issue, need to consider hotfix with just that issue technical debt
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants