Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All concurrency controlled jobs throw exceptions and are rescheduled if they are called using perform_now #591

Closed
pgvsalamander opened this issue May 5, 2022 · 7 comments
Labels
bug Something isn't working

Comments

@pgvsalamander
Copy link
Contributor

Any job that includes the GoodJob::ActiveJobExtensions::Concurrency concurrency extension always throws GoodJob::ActiveJobExtensions::Concurrency::ConcurrencyExceededError when it is called using perform_now, regardless of if there are any other jobs with matching concurrency keys. This behavior occurs regardless of if perform_now is called in a rake task or in a controller.

Is this expected/known behavior? My first guess would be that something about the way concurrency works requires that the jobs be run by background workers, but I haven't seen anything in the documentation that would suggest it.

Tested on Ruby 2.6.5, Rails 6.1.5.1, GoodJob 2.14.2

@bensheldon
Copy link
Owner

@pgvsalamander thank you for opening the issue! I think this is a bug to be fixed.

I went searching to see if there was previous discussion about how the Concurrency extension should interact with perform_now and didn't find anything. So this may be the place to discuss the intended behavior.

Because Rails' perform_now does not invoke the queue adapter, any performance of the job is outside of the context of GoodJob's recordkeeping. This leads me to believe that the Concurrency extension's checks should not be invoked during perform_now at all. Does that seem right?

And regardless of that, when you describe the Concurrency extension always throwing an extension that is clearly not right and I will fix that.

@bensheldon bensheldon added the bug Something isn't working label May 6, 2022
@reczy
Copy link
Contributor

reczy commented May 6, 2022

Because Rails' perform_now does not invoke the queue adapter, any performance of the job is outside of the context of GoodJob's recordkeeping. This leads me to believe that the Concurrency extension's checks should not be invoked during perform_now at all. Does that seem right?

I think that makes sense as well. However, perhaps an INFO or DEBUG level log along the lines of "Skipping concurrency checks - concurrency restrictions are not enforced with perform_now" to alert users in development?

@pgvsalamander
Copy link
Contributor Author

I would expect that it would bypass the total and enqueue limits, but would respect the perform_limit and block until it can run without exceeding the limit. That said, I imagine trying to enforce perform_limit on a job that doesn't exist in the database and therefore can't be seen by the rest of the runners would be very difficult, to the point where it's probably not worth the trouble to implement.

@bensheldon
Copy link
Owner

I'm having trouble reproducing the exact behavior observed, but I still think there's a bug. Documenting this for future reference.

  1. Call perform_now on a job class with concurrency extension

  2. Within the Concurrency extension's before_perform, there is this line, which is a bug and will prevent the job from executing inline/perform_now because the job id won't be in the database and thus the query check will always fail ⚠️ :

    allowed_active_job_ids = GoodJob::Execution.unfinished.where(concurrency_key: key).advisory_locked.order(Arel.sql("COALESCE(performed_at, scheduled_at, created_at) ASC")).limit(perform_limit).pluck(:active_job_id)
    # The current job has already been locked and will appear in the previous query
    raise GoodJob::ActiveJobExtensions::Concurrency::ConcurrencyExceededError unless allowed_active_job_ids.include? job.job_id

  3. The GoodJob::ActiveJobExtensions::Concurrency::ConcurrencyExceededError raised in the previous step is caught by the retry_on block that's also part of the extension, which then does enqueue the job with the adapter. That was unexpected.

@bensheldon
Copy link
Owner

bensheldon commented May 7, 2022

huh, I guess that shouldn't have been unexpected to me given I made this contribution to Rails 😅 : rails/rails#43434

Which maybe changes my thinking about how this should work. What about this instead?:

When using perform_now, if the global concurrency limit has not been exceeded, the job will be executed locally (but not be accounted as part of global concurrency count). If the concurrency limit is exceeded, it will be enqueued.)

I'm still not happy about the the "will be enqueued" part because I think that somewhat violates my expected contract for perform_now, but it seems like perform_now already violates that when there are retry_on handlers so I'm ok punting on that for now.

@pgvsalamander
Copy link
Contributor Author

I'm having trouble reproducing the exact behavior observed, but I still think there's a bug. Documenting this for future reference.

This is the behavior I was trying to describe, I just did a bad job explaining it. (Pun not intended!)

When using perform_now, if the global concurrency limit has not been exceeded, the job will be executed locally (but not be accounted as part of global concurrency count). If the concurrency limit is exceeded, it will be enqueued.)

I'm still not happy about the the "will be enqueued" part because I think that somewhat violates my expected contract for perform_now, but it seems like perform_now already violates that when there are retry_on handlers so I'm ok punting on that for now.

This seems like the best solution for now. Would you be open to a PR adding the option to block on perform_now until it can run without exceeding the concurrency count? (If I ever get around to it) I'd imagine it being something that can be enabled on an entire job class, or on a particular instance (<job>.perform_blocking perhaps)

@bensheldon
Copy link
Owner

I've slept on this a few times, and I think I'm going to go back to my more conservative proposal of ignoring concurrency controls entirely on perform_now. My thinking is that it's a lot easier for compatibility/semantic versioning to make it progressively more strict than to make it less strict or differently strict if I focus on this feature in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants