--max-workers flag for gatsby build #11727

Closed
lettertwo opened this issue Feb 13, 2019 · 5 comments · Fixed by #10257

Comments

@lettertwo
Contributor

lettertwo commented Feb 13, 2019

Summary

Currently, Gatsby automatically selects a level of parallelism (based on the reported number of available CPUs) for the render phase of gatsby build. In some scenarios, though, gatsby build can actually run faster if the amount of parallelism is limited or even set to 1.

One such scenario may be encountered in cloud/container environments that have virtualized CPU resources, where it is difficult (or impossible) to discover exactly how many vCPUs or 'cores' are available for parallelizing the build.

On CircleCI, a typical CI server will have 2 vCPUs available, but when Gatsby asks, it will be told there are 18. As a result, the build may start parallelizing many more render tasks than the environment can handle, paradoxically causing the overall build time to grow significantly.
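The mismatch described above can be checked directly on the build machine. This is only a minimal sketch: it uses Node's built-in `os` module, whereas Gatsby at the time relied on the `physical-cpu-count` package, which may report a different figure. The helper name `reportedCpuCount` is made up for illustration.

```javascript
const os = require('os');

// Hypothetical helper: the number of logical CPUs Node reports.
// In a container this typically reflects the host machine rather than
// the CPU quota actually granted to the container.
function reportedCpuCount() {
  return os.cpus().length;
}

console.log(`Node reports ${reportedCpuCount()} logical CPUs`);
```

Running this in a CI job and comparing the result with the plan's advertised vCPU count should show whether the environment over-reports.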

As a real-world example, here are some samples of a Gatsby build of ~2400 pages on CircleCI with numWorkers modified to particular values:

| numWorkers | HTML done in |
| --- | --- |
| 1 | ~58s |
| 2 | ~20s |
| 4 | ~14s |
| 8 | ~11s |
| 16 | server timeout! |
| <reported cpu count> | server timeout! |

The reported CPU count is 18, and the server timeout limit is 10 minutes.

Basic example

Usage would be simple:

gatsby build --max-workers=2

When defined, this value would be used instead of programmatically determining how many workers to use.

The Jest version of this feature also supports percentage values, like:

gatsby build --max-workers=50%

Which might be a nice addition for situations where the number of cores is variable.
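To make the proposal concrete, here is a rough sketch of how such a flag value could be turned into a worker count. It is an assumption about one possible behavior, not Gatsby's or Jest's actual implementation: the function name `resolveMaxWorkers`, the rounding down, and the minimum of 1 worker are all hypothetical choices.

```javascript
const os = require('os');

// Hypothetical helper, not Gatsby's or Jest's actual implementation.
// Accepts an absolute count ("2") or a percentage ("50%") and returns
// a worker count of at least 1.
function resolveMaxWorkers(value, cpuCount = os.cpus().length) {
  const text = String(value).trim();

  // Percentage form: scale the CPU count and round down.
  const percent = /^(\d+(?:\.\d+)?)%$/.exec(text);
  if (percent) {
    const fraction = Number(percent[1]) / 100;
    return Math.max(1, Math.floor(fraction * cpuCount));
  }

  // Absolute form: use the given number as-is, ignoring the CPU count.
  if (/^\d+$/.test(text)) {
    return Math.max(1, Number(text));
  }

  throw new Error(`Invalid --max-workers value: ${value}`);
}

console.log(resolveMaxWorkers('2', 18));   // 2
console.log(resolveMaxWorkers('50%', 18)); // 9
```

Note that a percentage is still computed from the reported CPU count, so on an environment that over-reports cores an absolute value is the safer choice.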

Motivation

This feature is inspired by the fact that Gatsby uses jest-worker to achieve its parallelism, and by the fact that Jest has similar features that are recommended for controlling the parallel characteristics to improve performance in CI environments.

@pieh
Contributor

pieh commented Feb 13, 2019

There is an open PR implementing the ability to specify this (#10257). Honestly, it is very weird that CircleCI times out if you use too many workers.

@lettertwo
Contributor Author

lettertwo commented Feb 13, 2019

I swear I searched! 😅 An env var was actually my first thought, but I opted for a CLI flag thinking it might be more 'idiomatic' Gatsby. I'm happy to close this and follow that PR.

I agree with you that it is odd that CircleCI times out in this case, and I couldn't tell you exactly why that happens. I've also had very occasional successes, so it seems at least in some way bound to the health or load of the underlying resources. In other words, the timeout condition is not 100% reproducible, but it is close to 100%.

@pieh
Contributor

pieh commented Feb 13, 2019

> I swear i searched! 😅 An env var was actually my first thought, but i elected for a CLI flag thinking it might be more 'idiomatic' Gatsby.

So this is of course subject to change (we could also support both).

> I'm happy to close this and follow that PR.

Let's keep it. I will add the "fixes" thingie to the PR so this issue can be tracked.

@lettertwo
Contributor Author

I managed to temporarily hack in support for artificially limiting parallelism in gatsby build on CircleCI. In case this is useful for anyone else who finds themselves googling for a way to keep Gatsby and CircleCI playing nicely, here is what I did:

Added a small module that will inject the value of GATSBY_CPU_COUNT, if it is defined, rather than using the physical CPU count:

gatsby-cpu-count.js

// Only intercept module loading when GATSBY_CPU_COUNT is defined.
let shouldProxy = process.env.GATSBY_CPU_COUNT != null;

if (shouldProxy) {
  const Module = require('module');
  // Replacement source for `physical-cpu-count`: export the env var as a
  // number (env vars are always strings), falling back to 1.
  const src = `module.exports = Number(process.env.GATSBY_CPU_COUNT) || 1;`;
  Module._load = new Proxy(Module._load, {
    apply(target, thisArg, argumentsList) {
      const [request, parent] = argumentsList;
      if (shouldProxy && /physical-cpu-count/.test(request)) {
        const filename = Module._resolveFilename(...argumentsList);
        const module = new Module(filename, parent);
        module._compile(src, filename);
        Module._cache[filename] = module;
        // Later requires hit the cache, so stop intercepting.
        shouldProxy = false;
        return module.exports;
      }
      return Reflect.apply(target, thisArg, argumentsList);
    },
  });
}

Configured CircleCI to provide the GATSBY_CPU_COUNT env var, and to run gatsby build with the gatsby-cpu-count.js module loaded up front:

.circleci/config.yml

# ...
jobs:
  build_frontend:
    # ...
    steps:
      # ...
      - run:
          name: Build frontend
          environment:
            # Artificially restrict the number of CPUs available to Gatsby.
            # Using `4` even though CircleCI provides only 2 vCPUs by default,
            # because assuming twice as many logical cores as physical CPUs
            # seems safe enough and has a positive impact on perf.
            GATSBY_CPU_COUNT: 4
          command: node --require ./gatsby-cpu-count.js ./node_modules/.bin/gatsby build
      # ...
# ...
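Stripped of the module-loader trickery, the behavior this hack emulates is just "prefer the env var over the detected count". A minimal sketch of that logic follows. It assumes `os.cpus().length` as the detected fallback (Gatsby itself used `physical-cpu-count` at the time), and `cpuCountFromEnv` is a hypothetical name, not a Gatsby API.

```javascript
const os = require('os');

// Hypothetical helper: use GATSBY_CPU_COUNT when it is a positive
// integer, otherwise fall back to the detected CPU count.
function cpuCountFromEnv(env = process.env, detected = os.cpus().length) {
  const parsed = Number.parseInt(env.GATSBY_CPU_COUNT, 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : detected;
}

console.log(cpuCountFromEnv({ GATSBY_CPU_COUNT: '4' }, 18)); // 4
console.log(cpuCountFromEnv({}, 18));                        // 18
```

Falling back to the detected count on a missing or malformed value keeps local builds unaffected when the variable is not set.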

@LiteSoul

LiteSoul commented Nov 8, 2021

This saved my life...!
CircleCI was failing the build after I migrated to Gatsby 4 (locally it was fine), so I set GATSBY_CPU_COUNT=8 on CircleCI and it now works!
