
job.save()'d jobs are immediately cancelled? #9

Closed
dandv opened this issue Jun 25, 2014 · 11 comments

@dandv
Contributor

dandv commented Jun 25, 2014

My code creates a bunch of jobs in a foreach loop like this:

var feedParsingJobs = new JobCollection('feedParsing');
feedParsingJobs.startJobs();

// ... foreach loop  {
  var job = feedParsingJobs.createJob('parseFeed', {...});
  job.save();
}

Within each iteration, I see "jobCancel succeeded" in the console. Inspecting db.feedParsing.jobs shows three log messages for each job:

Created
Job Submitted
Job Cancelled

status is cancelled.

The above happens before I get to var queue = feedParsingJobs.processJobs(...).

Shouldn't the job status be "waiting"?

If I add job.restart(); after job.save(), the queue appears to run correctly, but I get (STDERR) jobRestart failed messages.

@vsivsi
Owner

vsivsi commented Jun 25, 2014

What options are being set on the jobs? Are they set to repeat?


@dandv
Contributor Author

dandv commented Jun 25, 2014

No options. Initially I had set the jobs to repeat forever with a 300-second wait, but then I commented out that code and dropped the feedParsing.jobs collection.

@vsivsi
Owner

vsivsi commented Jun 25, 2014

In particular, take a look at the cancelRepeats option on Job.save() to see if it applies to your situation.

@dandv
Contributor Author

dandv commented Jun 25, 2014

Thanks for the pointer - this might be it:

By default, if an infinitely repeating job is added to the job Collection, any existing repeating jobs of the same type that are cancellable, will be cancelled.

Will have to do more testing tomorrow.

In my particular use case, parsing feeds, it wouldn't really make sense to cancel parsing all feeds if parsing one of them fails. Yet they're all of the same type, 'parseFeed', just with different data. Is job.save({cancelRepeats: false}); the solution, or could my code use an alternative architecture?

dandv changed the title from "job.save()'s jobs are immediately cancelled?" to "job.save()'d jobs are immediately cancelled?" on Jun 26, 2014
@vsivsi
Owner

vsivsi commented Jun 26, 2014

Let me recap and see if I understand your workload:

  • You have some well-defined number of feeds, each of which needs to be periodically parsed (for updates?)
  • The only difference between the parsing "workers" is the feed they are pointed at, which can be encapsulated in the job "data".
  • For these reasons, the "function" employed to process each feed is the same for all feeds.
  • The initial approach you have taken to parsing feeds is to create one infinitely repeating job per feed, but all of the same job "type".

If I have all of that correct, then yes, job.save({cancelRepeats: true}); is going to cause you problems. There are multiple ways to deal with this, depending on how you want to structure your app:

  1. Use job.save({cancelRepeats: false}); This will stop the auto-cancelling you are seeing, but it creates a different set of potential issues. The purpose of the cancelRepeats feature is to manage infinitely repeating "housekeeping" type jobs. With the default value, you can simply have your server create a new job each time it restarts; if a prior job already exists, it will be auto-cancelled. This also has the advantage of allowing you to change the job settings, worker code, etc. for such jobs between server restarts without complications. If you turn this feature off, then you need to write code that manually checks whether any old repeating jobs are still there and cancels them (or skips creating a new job...). Otherwise, each server restart will add another job to an ever-growing pile of repeating jobs. In your case, you are creating some number of such jobs, so at startup you'll need to figure out which jobs are already in the collection and what to do about any you find (cancel them, or refrain from creating a new one...)
  2. Stay with the default job.save({cancelRepeats: true}); and instead change the type of each job to match the feed being parsed. So if you have jobs of types ['feed1Parser','feed2Parser','feed3Parser'], the server will automatically do the right thing as each one is saved to the collection. To use .processJobs() to handle them all with a single worker function, simply pass the array of types as the first parameter: jq = jc.processJobs(['feed1Parser','feed2Parser','feed3Parser'], {}, workerFunction); (see the sketch after this list). This will scale reasonably well up to maybe a couple of hundred feeds, after which the overhead of repeatedly processing the large list of types will probably begin to be a concern.
  3. Create a single infinitely repeating job, and give it access to the list of feeds to parse. This "über-job" can then be written to either parse all of the feeds itself, or alternatively it can create a new non-repeating job to process each feed, each time it runs. Obviously this approach only works if all feeds need to be re-parsed at the same rate.
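Here is a minimal sketch of option 2, as referenced above. It assumes the job collection is named jc, and parseFeedUrl() and urlForFeedType() are hypothetical helpers standing in for whatever feed-fetching code the app actually uses:

// One repeating job per feed, each with its own type; a single worker handles all types.
var feedTypes = ['feed1Parser', 'feed2Parser', 'feed3Parser'];

feedTypes.forEach(function (type) {
  var job = jc.createJob(type, { feed: { url: urlForFeedType(type) } });  // hypothetical helper
  job.repeat({ repeats: Job.forever, wait: 300 * 1000 });
  // The default cancelRepeats only cancels older repeating jobs of the *same* type,
  // so each feed's job replaces only its own predecessor on server restart.
  job.save();
});

// Pass the array of types so one worker function serves every feed.
var jq = jc.processJobs(feedTypes, {}, function (job, cb) {
  try {
    parseFeedUrl(job.data.feed.url);  // hypothetical helper
    job.done();
  } catch (err) {
    job.fail(err.toString());
  }
  cb();  // tell the queue this worker is ready for another job
});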

Those are just some thoughts, I'm sure there are other possibilities as well.

@dandv
Contributor Author

dandv commented Jun 26, 2014

Thanks - you have all of that correct. I was thinking about option 2 as well, i.e. naming the job type after the feed, since the feed name is unlikely to change. But that seems a bit brittle, and conceptually somewhat misleading, since the jobs are really all of the same type.

What I've used so far is an unsophisticated option #1: job.save({cancelRepeats: false}); and when creating the jobs at server startup, check if they exist already in the collection by searching on their data payload:

// for each feed...
  if (!feedParsingJobs.findOne({ 'data.feed.url': feed.url })) {
    // add the job only if it doesn't already exist in the persisted collection
    var job = feedParsingJobs.createJob('parseFeed', { ... feed: ... });
    job.repeat({
      repeats: Job.forever,
      wait: 300 * 1000
    });
    job.delay(0).after(new Date());  // start ASAP
    job.save({cancelRepeats: false});  // don't cancel the other repeating 'parseFeed' jobs when saving this one
  }

Is that existence check sufficient? Do I need to be concerned with various feed states at server restart?

An advantage to this job.save({cancelRepeats: false}); approach is that if parsing a feed fails for whatever reason (404, etc.), the other feeds will keep being parsed. Option #2 would do that as well.
Thoughts?

@vsivsi
Owner

vsivsi commented Jun 26, 2014

Having thought about this for a bit, if it were up to me, I think I'd use some version of option 3). It has the advantage that failures of individual feeds are isolated to single jobs, and the "über-job" could check up on which jobs from the last time it ran succeeded or failed. It also has the advantage that the list of feeds can change dynamically without restarting the server or modifying the repeating job.

The only case that doesn't seem to be handled by your code above is what to do with defunct feeds. As coded, their jobs will keep running (and failing) forever...
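For reference, here is a minimal sketch of that option 3 pattern. It assumes a hypothetical Feeds collection holding the feed URLs, and the job type names are only illustrative:

// A single infinitely repeating "über-job" that fans out one non-repeating job per feed.
var uberJob = feedParsingJobs.createJob('scheduleFeedParsing', {});
uberJob.repeat({ repeats: Job.forever, wait: 300 * 1000 });
uberJob.save();  // with the default cancelRepeats, a server restart replaces the old über-job

feedParsingJobs.processJobs('scheduleFeedParsing', {}, function (job, cb) {
  Feeds.find({}).forEach(function (feed) {  // hypothetical Feeds collection
    var parseJob = feedParsingJobs.createJob('parseFeed', { feed: { url: feed.url } });
    parseJob.save();  // non-repeating: runs once, so a failed feed only fails its own job
  });
  job.done();
  cb();
});

// A separate worker (or several) handles the per-feed jobs.
feedParsingJobs.processJobs('parseFeed', {}, function (job, cb) {
  // ... fetch and parse job.data.feed.url ...
  job.done();  // or job.fail(err) if the feed is unreachable
  cb();
});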

@vsivsi
Owner

vsivsi commented Jun 26, 2014

Also, I think the right answer here probably depends heavily on the ultimate number of feeds the system needs to manage. 50 is very different from 500 or 5000.

@dandv
Contributor Author

dandv commented Jun 26, 2014

A problem with 3 is that different feeds may have different update frequencies, so the uber-job would end up reinventing the jobCollection wheel.

There are about 70 feeds right now, and expected to increase to about 100 once fully in production.

@vsivsi
Owner

vsivsi commented Jun 26, 2014

Okay, then it seems like 1) is the right answer for your case.

My takeaway from this discussion is that the default value of cancelRepeats should be changed to false, so that the next person doesn't unwittingly run into this "auto-cancel" scenario. I'll make that change in the next update (probably tomorrow).

Also, in your code, job.delay(0).after(new Date()); is unnecessary, as that's the default.

@vsivsi
Owner

vsivsi commented Jun 28, 2014

I just published v0.0.10 with a change in the default value of cancelRepeats to false. The update also contains all of the documentation updates you contributed. Thanks for reporting this.
