Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Emit a 'stalled' event." (it's misleading) #257

Closed

Conversation

bradvogel
Copy link
Contributor

Reverts #254

@manast I realized that this event is actually misleading. New jobs are being processed through processStalledJobs even though they haven't been attempted before.

For example:

  1. Worker 1: creates job
  2. Worker 2: picks up job in getNextJob() and moves it from 'wait' to 'active'
  3. Worker 3: runs processStalledJobs immediately after and gets the job lock before Worker 2 can get it.

I'm testing this in production with 1M+ jobs/day with 20 workers, and seeing more jobs get picked up by processStalledJobs than are picked up by takeLockAndProcessJob.

So if that's the case, this event is misleading.

@manast
Copy link
Member

manast commented Mar 1, 2016

ok. Seems like an interesting insight though. we would need to create a lock when moving the job from wait to active atomically. It should be possible by having a LUA script that implements both operations, I will check this out.

@bradvogel
Copy link
Contributor Author

Yeah, sounds good. What do you think about merging this though? Until #258 gets fixed.

@manast
Copy link
Member

manast commented Mar 1, 2016

it is un unreleased feature anyway so for my part it is ok to not merging this, nobody will use the version from master anyway (famous last words...).

@bradvogel
Copy link
Contributor Author

Oh ok, but do you think #258 will be done before you do another npm release (from master)? Just didn't want 'stalled' to confuse people before #258 fixes it :)

@bradvogel
Copy link
Contributor Author

bradvogel commented Oct 15, 2016

Addressed in #359. The 'stalled' processor now waits a bit (LOCK_RENEW_TIME) to consider a job stalled.

@bradvogel bradvogel closed this Oct 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants