
tools for managing a 'fleet' of processes #150

Open
jrochkind opened this issue Sep 21, 2020 · 13 comments

@jrochkind
Contributor

Hi, we were talking on reddit some time ago and I suggested it would be useful to have tools for managing a 'fleet' or cluster of separate worker processes -- since on MRI that's the only way to take advantage of multiple cores, which you probably want to be doing when you have a separate host just for background workers, as is usually the case at even moderate scale.

We agreed it's a bit tricky to figure out how to implement that, especially for those of us not experienced in "systems programming".

Recently someone brought this project to my attention, which hypothetically takes care of it for you! https://github.com/stripe/einhorn

It's a bit under-documented (and the README basically says "you're welcome to use this, but don't ask us questions or bother filing bug reports without PRs"), but I've been playing with it a bit and looking at the code, and it looks really nice!

The only real requirement it imposes is that your worker processes treat a USR2 signal as a request to do a graceful shutdown. I'm mentioning this partly to get it on the record, so you don't accidentally use USR2 for anything else and later need a backwards-incompatible change to become einhorn-compatible. :( (Resque uses USR2 for something else, alas. Sidekiq handles USR2 appropriately for einhorn, I think because sidekiq-enterprise actually uses einhorn.)
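For concreteness, here's a minimal sketch of that contract (the `GracefulWorker` class and its toy "doubling" job are made up for illustration, not GoodJob's actual code): the worker traps SIGUSR2 and stops picking up new work once it has been asked to quit.

```ruby
# Hypothetical worker that treats SIGUSR2 as a graceful-shutdown request,
# which is the one requirement einhorn places on worker processes.
class GracefulWorker
  def initialize
    @shutdown = false
    # einhorn sends USR2 to ask a worker to wind down gracefully.
    Signal.trap("USR2") { @shutdown = true }
  end

  # Process jobs until the list is exhausted or a shutdown was requested.
  def run(jobs)
    done = []
    jobs.each do |job|
      break if @shutdown   # stop picking up new work once asked to quit
      done << job * 2      # stand-in for real job processing
    end
    done
  end
end
```

A real worker would also want to finish (or re-enqueue) any in-flight job before exiting, but the signal-handling shape is the same.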

@bensheldon
Owner

I've been thinking more about this lately.

I was searching for projects, and the forked gem looks like it might fit the bill, though I didn't see anything in particular about zombie management, which is something I'd like to trust is taken care of (the complexity of that is also why I'm eager to find a maintained gem that can do it for me).

I also like the look of https://github.com/salsify/delayed_job_worker_pool

@jrochkind
Contributor Author

@bensheldon Einhorn's lack of maintenance makes you reluctant? It does seem to be pretty sophisticated code. It's too bad I can't find an equally high-quality option that is maintained, either.

@sandstrom

If this isn't an issue anymore, maybe we could close it or move it to a discussion.

@bensheldon
Owner

I'm going to close this Issue for now, but am open to continuing the conversation. I do think that having a complete Puma-like fork+multithreaded executable would be really nice, but don't plan to implement that myself in the near future.

@rgaufman

Why not just use systemd for this? I've played around with a lot of different tools for forking and managing processes, including Bluepill, God, and Eye. Eye was the best, but it was still significantly more resource-intensive. Even with Sidekiq, I just run systemctl start sidekiq, which starts all my Sidekiq processes (e.g. sidekiq@worker1, sidekiq@worker2, etc.); stop conversely stops them all.

It's not like there is a need for a shared socket with job processing.
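For readers unfamiliar with the pattern, the sidekiq@worker1 / sidekiq@worker2 naming above comes from a systemd template unit. A sketch along these lines (the unit name, paths, and user are placeholders, not taken from this thread) would give good_job the same treatment:

```ini
# Hypothetical /etc/systemd/system/good_job@.service
# Each instance (good_job@1, good_job@2, ...) is an independent worker process.
[Unit]
Description=GoodJob worker %i
After=network.target

[Service]
Type=simple
User=deployer
WorkingDirectory=/var/www/app/current
ExecStart=/usr/local/bin/bundle exec good_job start
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Then systemctl start good_job@1 good_job@2 starts two workers, and the matching stop command stops them; Restart=on-failure handles the keep-alive duty a process manager would otherwise own.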

@bensheldon
Owner

Memory. Precious memory, especially in containerized environments.

Also, I agree on systemd, but people want to daemonize 🤷‍♀️

@rgaufman

rgaufman commented Jul 13, 2022

How would the forked gem save memory vs. starting 2 processes with systemd?

Hmm, in dev I "daemonize" with foreman; in prod, systemd :) I can understand why this is useful when you need to share a single socket (at the expense of some memory!), but I still don't see how it would save memory in this case.

For example, with Puma: you start a single worker and it takes 7% of RAM; you start a 2-worker cluster and it takes 21% (!!):

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
21339 deployer  20   0  568076 279440  13304 S  35.9 7.0 628:27.11 puma: cluster worker 0: 4963 [current]
21342 deployer  20   0  548076 278540  13328 S  34.6 7.0 632:45.05 puma: cluster worker 1: 4963 [current]
 4963 deployer  20   0  558076 278440  22048 S   1.0 7.0 881:10.39 puma 5.6.4 (tcp://192.168.187.71:3000) [current]

An extra 7% of RAM wasted on the process manager.

@bensheldon
Owner

Are you using Puma's preload_app!?

Puma has a lot of other interesting copy-on-write optimizations: https://github.com/puma/puma/blob/master/docs/fork_worker.md
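For context, preload_app! is set in Puma's configuration file. A typical sketch (worker and thread counts are arbitrary here, and the on_worker_boot body is illustrative):

```ruby
# config/puma.rb (sketch). preload_app! boots the application once in the
# master process before forking, so workers share those memory pages
# copy-on-write instead of each loading their own copy.
workers 2
threads 5, 5
preload_app!

on_worker_boot do
  # Per-process resources (like database connections) must be re-established
  # after fork, since file descriptors can't be safely shared across workers.
  # ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
end
```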

@rgaufman

rgaufman commented Jul 13, 2022

Yes, I am. Interesting, will have a read.

@bensheldon reopened this Jul 13, 2022
@rgaufman

"fork_worker option and refork command for reduced memory usage by forking from a worker process instead of the master process." Ah, ok, so no more master process saves 7% of RAM, but it will still take the equivalent of starting 2 processes, so no saving in the case of good_job, from what I understand?

@bensheldon
Owner

Sorry, I meant to emphasize preload_app!. That's what saves memory through copy-on-write. The different forking strategies I linked to are further attempts to optimize loading as many Ruby constants as possible before forking.
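The copy-on-write effect can be seen in miniature with plain fork (a toy sketch, not Puma or GoodJob code; requires a platform with fork, so MRI on Linux/macOS):

```ruby
# Data allocated before forking ("preloaded") exists once in the parent;
# forked children read it through copy-on-write pages rather than
# re-allocating it per process.
BIG = Array.new(100_000) { |i| i.to_s }

pid = fork do
  # Reading BIG does not duplicate it; the OS only copies a page
  # if the child writes to it.
  exit(BIG.size == 100_000 ? 0 : 1)
end
Process.wait(pid)
puts $?.exitstatus  # 0: the child saw the preloaded data
```

Without preloading, each worker would build its own copy of the application's constants, which is the 7%-per-process cost in the top output above.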

@rgaufman

Interesting, just having a read through this: https://shopify.engineering/ruby-execution-models

@jrochkind
Contributor Author

jrochkind commented Jul 22, 2022

Note that einhorn, the Ruby tool to "run (and keep alive) multiple copies of a single long-lived process", originally from Stripe and for a long time basically unmaintained, has now been adopted by mperham of Sidekiq.

I believe einhorn is used by Sidekiq Pro for managing multiple Sidekiq worker processes, and it probably could be used by good_job as well, perhaps with a few tweaks to good_job, like interpreting SIGUSR2 as a graceful-shutdown request, and possibly more to take full advantage of features like the pre-forking management built into einhorn.

Status: Prioritized Backlog
4 participants