Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(gatsby): configure physical cores, logical_cores or fixed number #10257

Conversation

dominicfallows
Copy link
Contributor

@dominicfallows dominicfallows commented Dec 3, 2018

Gatsby 2 now utilises multi-core builds using jest-worker. By default Gatsby creates a pool of workers equal to the number of physical cores on your machine, see build-html.js.

In some scenarios it may be appropriate to tell Gatsby to use a different method to calculate the number of worker pools.

For example, if you are running a Cloud server (like AWS EC2) your DevOps engineers may want to control the number of worker pools to improve the efficiency of server resource usage.

The proposal here is to accept an options env var GATSBY_CPU_COUNT to change the method that Gatsby uses to calculate the number of worker pools.

closes #11727

@dominicfallows dominicfallows self-assigned this Dec 3, 2018
@dominicfallows dominicfallows requested a review from a team as a code owner December 3, 2018 16:15
@dominicfallows dominicfallows requested a review from a team December 3, 2018 16:15
@dominicfallows dominicfallows requested a review from a team as a code owner December 3, 2018 16:15
@KyleAMathews
Copy link
Contributor

So the scenario is you're running builds on a server with other work also going on and you want to limit the number of cores Gatsby uses?

@dominicfallows
Copy link
Contributor Author

@KyleAMathews either that yes, or the opposite, sometimes we want to allow logical CPU count (more cores). For example, our AWS instances are dedicated to this process, and therefore we want to be able to push them to their limits and speed up our builds.

@KyleAMathews
Copy link
Contributor

Hmm ok — we'll have to think about this a bit as we're also adding in the future multi-core support for running GraphQL queries. So a global CPU limit/count could make sense.

Also BTW, if you're site does any image transformations with sharp, it's already using all CPUs as well. Any solution we do should also probably figure out how to limit sharp.

@dominicfallows
Copy link
Contributor Author

Ok, sounds interesting. I'll have a dig around and look for the sharp handling to see how that could work alongside this.

In future, would you see a global CPU limit/count using an env var as I've drafted, or another config option? etc.

Copy link

@paulca99 paulca99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@dominicfallows
Copy link
Contributor Author

@KyleAMathews ok, so I've made a first attempt at aligning sharp usage with proposed Gatsby cpuCount usage. Along the right lines based on what you were thinking?

@KyleAMathews
Copy link
Contributor

Could you post build logs from using Physical vs. Logical CPU counts?

If that generally speeds up builds — we should just change our default to use the logical CPU count.

I'd rather not add config if we don't need to.

Could you try running some of the benchmarks w/ both physical/logical and see how that changes things? https://github.com/gatsbyjs/gatsby/tree/master/benchmarks

docs/docs/multi-core-builds.md Outdated Show resolved Hide resolved
docs/docs/multi-core-builds.md Outdated Show resolved Hide resolved
www/src/data/sidebars/doc-links.yaml Outdated Show resolved Hide resolved
docs/docs/multi-core-builds.md Outdated Show resolved Hide resolved
@mik-laj
Copy link

mik-laj commented Jan 1, 2019

Should the main file be called cpu-count? I am afraid that this name can be misleading. I am afraid that few people run the program in a multi-CPU environment. However, everyone runs the program in a multi-core environment. The better name will be core-count, but it is still not correct.

Valid name for me is parallelism.

Reference: https://learn-gevent-socketio.readthedocs.io/en/latest/general_concepts.html#what-s-the-difference-between-concurrency-and-parallelism

@wardpeet wardpeet added the status: awaiting reviewer response A pull request that is currently awaiting a reviewer's response label Feb 27, 2019
@wardpeet
Copy link
Contributor

pinging @KyleAMathews & @pieh wdyt?

I think this is great, especially on VMs.

@KyleAMathews
Copy link
Contributor

Haven't looked at the PR but 👍 to the concept

@wardpeet wardpeet changed the title Topics/CPU control in htm-renderer-queue.js (multi-core builds) feat(gatsby): configure physical cores, logical_cores or fixed number Feb 28, 2019
@wardpeet
Copy link
Contributor

@dominicfallows did you have some benchmarks? I'm a bit hesitant to add this, it's opt-in but for example sharp will probably have a negative impact as it probably best to use cores instead of threads as it has so many heavy computations.

I've found an issue at the Parcel repo and it is not moving to logical cores:
parcel-bundler/parcel#1554 (comment)

@wardpeet wardpeet added status: awaiting author response Additional information has been requested from the author and removed status: awaiting reviewer response A pull request that is currently awaiting a reviewer's response labels Feb 28, 2019
- change name of util file and function to prevent confusion
- Refactor default count response
- Sharp functions now back to default cpu core count (physical or 1)
- Throw error if we can't calculate logical_core count
@dominicfallows
Copy link
Contributor Author

@mik-laj updated the file name to cpu-core-count.js and helper function to cpuCoreCount which does seem more accurate, thanks :)

@dominicfallows
Copy link
Contributor Author

dominicfallows commented Feb 28, 2019

@KyleAMathews RE: #10257 (comment)
@wardpeet RE: #10257 (comment)

Benchmarks are proving quite difficult to show using the existing benchmark examples. The improvements are so dependant on the infrastructure setup and setup of the app, hence why I went for an 'opt-in' approach.

I could create a new benchmark example though, that would go to show the type of app setup and infrastructure where I'm seeing improvements (cloud containers and instances, for example), would that help?

I also wondered about a different approach, somehow having this calculation as a plugin. It would still require changes to the core code (to allow plugins to override the CPU core count calculation), but would move away from needing an env var.

@wardpeet
Copy link
Contributor

wardpeet commented Mar 1, 2019

I guess you're using this already on AWS? If so maybe share the numbers you're encountering? Or maybe a shared blog post or document where this really shows physical vs logical on the cloud infra 😄

I also wondered about a different approach, somehow having this calculation as a plugin. It would still require changes to the core code (to allow plugins to override the CPU core count calculation), but would move away from needing an env var.

I don't think we want to expose such an API to the world 😄. For now I think env var is good enough and we need to figure out this a bit more for other things as well like experiments so we're probably going to revisit this for Gatsby 3.

@paulca99
Copy link

paulca99 commented Mar 1, 2019

Hi, I'm the dev ops tech guy dealing with the codebuild images running this stuff. To be blunt, Doms "Logical cores" change knocked nearly 30% off our AWS Codebuild timings. If it's a concern to you, make it a variable... use logical . vs . use physical.... give users the option.

@paulca99
Copy link

paulca99 commented Mar 1, 2019

it's been 3 months since Dom, discovered this and you're still debating things. if I were you I'd have added a switch, then debated the output later once you had some user results to judge it on.

@wardpeet
Copy link
Contributor

wardpeet commented Mar 1, 2019

I was going to merge this when tests are passing.

@paulca99 Sorry but we have to deal with a lot of PRs and issues. Because of the holidays, it slipped our mind. There is nothing wrong to ask for validation and extra information before merging even if it's opt-in. Extra code could lead to bugs or build errors.

@paulca99
Copy link

paulca99 commented Mar 2, 2019

Sorry @wardpeet I didn't notice there were tests failing.

@dominicfallows
Copy link
Contributor Author

@wardpeet Hey, I've updated a test in the gatsby-plugin-manifest package and tests now look good. Let me know what else I can share and I'll do it asap.

Copy link
Contributor

@wardpeet wardpeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dominicfallows thanks for creating this PR! I'm going to merge this one but it would be great if you could share some build times when using it.

Thanks for all your patience!

@wardpeet wardpeet removed the status: awaiting author response Additional information has been requested from the author label Mar 4, 2019
@wardpeet wardpeet merged commit c51440e into gatsbyjs:master Mar 4, 2019
@sidharthachatterjee
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

--max-workers flag for gatsby build
7 participants