Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

preliminary support for a job queue #1212

Merged
merged 5 commits into from
Sep 25, 2024
Merged

preliminary support for a job queue #1212

merged 5 commits into from
Sep 25, 2024

Conversation

paulfitz
Copy link
Member

Context

Grist has needed a job queue for some time. This adds one, using BullMQ. BullMQ however requires Redis, meaning we couldn't use jobs for the large subset of Grist that needs to be runnable without Redis (e.g. for use on desktop, or on simple self-hosted sites). So simple immediate, delayed, and repeated jobs are supported also in a crude single-process form when Redis is not available.

This code isn't ready for actual use since an important issue remains to be worked out, specifically how to handle draining the queue during deployments to avoid mixing versions (or - if allowing mixed versions - thinking through any extra support needed for the developer to avoid introducing hard-to-test code paths).

Has this been tested?

  • 👍 yes, I added tests to the test suite
  • 💭 no, because this PR is a draft and still needs work
  • 🙅 no, because this is not relevant here
  • 🙋 no, because I need help

Grist has needed a job queue for some time. This adds one, using
BullMQ. BullMQ however requires Redis, meaning we couldn't use
jobs for the large subset of Grist that needs to be runnable without
Redis (e.g. for use on desktop, or on simple self-hosted sites).
So simple immediate, delayed, and repeated jobs are supported also
in a crude single-process form when Redis is not available.

This code isn't ready for actual use since an important issue
remains to be worked out, specifically how to handle draining
the queue during deployments to avoid mixing versions (or - if
allowing mixed versions - thinking through any extra support needed
for the developer to avoid introducing hard-to-test code paths).
@paulfitz paulfitz marked this pull request as ready for review September 18, 2024 15:31
@berhalak berhalak self-requested a review September 19, 2024 15:05
app/server/lib/GristJobs.ts Show resolved Hide resolved
app/server/lib/GristJobs.ts Outdated Show resolved Hide resolved
app/server/lib/GristJobs.ts Show resolved Hide resolved
test/server/lib/GristJobs.ts Show resolved Hide resolved
@berhalak berhalak self-requested a review September 20, 2024 18:43
Copy link
Contributor

@berhalak berhalak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I have one question about the unhandled promise rejections or long running jobs, but this can be sorted out when we actually use it, without redis.

package.json Outdated
@@ -128,6 +128,7 @@
"bootstrap": "3.4.1",
"bootstrap-datepicker": "1.9.0",
"bowser": "2.7.0",
"bullmq": "^5.8.7",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"bullmq": "^5.8.7",
"bullmq": "5.8.7",


// Clean up any jobs left over from previous round of tests.
beforeEach(async function() {
const jobs = new GristBullMQJobs();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment here? I think this is needed only when we run it with redis, am I right? For in memory this doesn't do anything?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. Added to comment.

}
});
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a test like this one? That checks what happens with long running tasks? Or async jobs that throw errors?

The callback we use is not awaited anywhere, maybe the BullMq library handles it properly, but I don't think that our in-memory implementation deals with it.

  it('can stop when job is in the middle', async function() {
    const jobs: GristJobs = new GristBullMQJobs();
    const q = jobs.queue();
    let started = false;
    try {
      q.handleDefault(async () => {
        started = true;
        await delay(2000);
        started = false;
      });
      await q.add('', null, {delay: 500});
      await delay(600);
      assert.isTrue(started);
    } finally {
      await jobs.stop({obliterate: true});
      assert.isFalse(started);
    }
  });

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So long running tasks are a problem since BullMQ cancelation relies on non-open source features:
https://docs.bullmq.io/bullmq-pro/observables/cancelation

And for the Grist Labs SaaS, we are very restricted on how long we can wait for workers to shut down because of ECS/Fargate requirements.

This for me is related to the big operational question mark about how to properly drain queues as workers are replaced, which we will have to deal with. There will be more work to do here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a note about this in GristJobs.

Copy link
Contributor

@berhalak berhalak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@paulfitz paulfitz merged commit 36722c1 into main Sep 25, 2024
10 of 11 checks passed
@paulfitz paulfitz deleted the paulfitz/q-test branch September 25, 2024 19:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants