All concurrency controlled jobs throw exceptions and are rescheduled if they are called using perform_now #591

pgvsalamander · 2022-05-05T18:23:13Z

Any job that includes the GoodJob::ActiveJobExtensions::Concurrency concurrency extension always throws GoodJob::ActiveJobExtensions::Concurrency::ConcurrencyExceededError when it is called using perform_now, regardless of if there are any other jobs with matching concurrency keys. This behavior occurs regardless of if perform_now is called in a rake task or in a controller.

Is this expected/known behavior? My first guess would be that something about the way concurrency works requires that the jobs be run by background workers, but I haven't seen anything in the documentation that would suggest it.

Tested on Ruby 2.6.5, Rails 6.1.5.1, GoodJob 2.14.2

The text was updated successfully, but these errors were encountered:

bensheldon · 2022-05-06T15:47:15Z

@pgvsalamander thank you for opening the issue! I think this is a bug to be fixed.

I went searching to see if there was previous discussion about how the Concurrency extension should interact with perform_now and didn't find anything. So this may be the place to discuss the intended behavior.

Because Rails' perform_now does not invoke the queue adapter, any performance of the job is outside of the context of GoodJob's recordkeeping. This leads me to believe that the Concurrency extension's checks should not be invoked during perform_now at all. Does that seem right?

And regardless of that, when you describe the Concurrency extension always throwing an extension that is clearly not right and I will fix that.

reczy · 2022-05-06T15:57:09Z

Because Rails' perform_now does not invoke the queue adapter, any performance of the job is outside of the context of GoodJob's recordkeeping. This leads me to believe that the Concurrency extension's checks should not be invoked during perform_now at all. Does that seem right?

I think that makes sense as well. However, perhaps an INFO or DEBUG level log along the lines of "Skipping concurrency checks - concurrency restrictions are not enforced with perform_now" to alert users in development?

pgvsalamander · 2022-05-06T16:00:54Z

I would expect that it would bypass the total and enqueue limits, but would respect the perform_limit and block until it can run without exceeding the limit. That said, I imagine trying to enforce perform_limit on a job that doesn't exist in the database and therefore can't be seen by the rest of the runners would be very difficult, to the point where it's probably not worth the trouble to implement.

bensheldon · 2022-05-07T16:07:34Z

I'm having trouble reproducing the exact behavior observed, but I still think there's a bug. Documenting this for future reference.

Call perform_now on a job class with concurrency extension

Within the Concurrency extension's before_perform, there is this line, which is a bug and will prevent the job from executing inline/perform_now because the job id won't be in the database and thus the query check will always fail ⚠️ :

good_job/lib/good_job/active_job_extensions/concurrency.rb

Lines 65 to 67 in bf197d0

    
           allowed_active_job_ids = GoodJob::Execution.unfinished.where(concurrency_key: key).advisory_locked.order(Arel.sql("COALESCE(performed_at, scheduled_at, created_at) ASC")).limit(perform_limit).pluck(:active_job_id) 
        
           # The current job has already been locked and will appear in the previous query 
        
           raise GoodJob::ActiveJobExtensions::Concurrency::ConcurrencyExceededError unless allowed_active_job_ids.include? job.job_id

The GoodJob::ActiveJobExtensions::Concurrency::ConcurrencyExceededError raised in the previous step is caught by the retry_on block that's also part of the extension, which then does enqueue the job with the adapter. That was unexpected.

bensheldon · 2022-05-07T16:27:23Z

huh, I guess that shouldn't have been unexpected to me given I made this contribution to Rails 😅 : rails/rails#43434

Which maybe changes my thinking about how this should work. What about this instead?:

When using perform_now, if the global concurrency limit has not been exceeded, the job will be executed locally (but not be accounted as part of global concurrency count). If the concurrency limit is exceeded, it will be enqueued.)

I'm still not happy about the the "will be enqueued" part because I think that somewhat violates my expected contract for perform_now, but it seems like perform_now already violates that when there are retry_on handlers so I'm ok punting on that for now.

pgvsalamander · 2022-05-09T15:22:34Z

I'm having trouble reproducing the exact behavior observed, but I still think there's a bug. Documenting this for future reference.

This is the behavior I was trying to describe, I just did a bad job explaining it. (Pun not intended!)

When using perform_now, if the global concurrency limit has not been exceeded, the job will be executed locally (but not be accounted as part of global concurrency count). If the concurrency limit is exceeded, it will be enqueued.)

I'm still not happy about the the "will be enqueued" part because I think that somewhat violates my expected contract for perform_now, but it seems like perform_now already violates that when there are retry_on handlers so I'm ok punting on that for now.

This seems like the best solution for now. Would you be open to a PR adding the option to block on perform_now until it can run without exceeding the concurrency count? (If I ever get around to it) I'd imagine it being something that can be enabled on an entire job class, or on a particular instance (<job>.perform_blocking perhaps)

bensheldon · 2022-05-11T14:23:02Z

I've slept on this a few times, and I think I'm going to go back to my more conservative proposal of ignoring concurrency controls entirely on perform_now. My thinking is that it's a lot easier for compatibility/semantic versioning to make it progressively more strict than to make it less strict or differently strict if I focus on this feature in the future.

bensheldon added the bug Something isn't working label May 6, 2022

bensheldon mentioned this issue May 7, 2022

Fix Concurrency extension to not break perform_now #593

Merged

pgvsalamander mentioned this issue May 24, 2022

Concurrency controlled jobs cause infinite loops when perform_limit is exceeded in test environments #609

Closed

bensheldon closed this as completed Jul 6, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

All concurrency controlled jobs throw exceptions and are rescheduled if they are called using perform_now #591

All concurrency controlled jobs throw exceptions and are rescheduled if they are called using perform_now #591

pgvsalamander commented May 5, 2022

bensheldon commented May 6, 2022

reczy commented May 6, 2022

pgvsalamander commented May 6, 2022

bensheldon commented May 7, 2022

bensheldon commented May 7, 2022 •

edited

Loading

pgvsalamander commented May 9, 2022

bensheldon commented May 11, 2022

All concurrency controlled jobs throw exceptions and are rescheduled if they are called using perform_now #591

All concurrency controlled jobs throw exceptions and are rescheduled if they are called using perform_now #591

Comments

pgvsalamander commented May 5, 2022

bensheldon commented May 6, 2022

reczy commented May 6, 2022

pgvsalamander commented May 6, 2022

bensheldon commented May 7, 2022

bensheldon commented May 7, 2022 • edited Loading

pgvsalamander commented May 9, 2022

bensheldon commented May 11, 2022

bensheldon commented May 7, 2022 •

edited

Loading