Develop/Document multi-level parallelism policy #644
I'm seeing about three different things being raised here:
stack does none of these right now. The only thing stack does in parallel is process individual packages in parallel. Does that answer your question? As to what should stack be doing... I don't see a downside to passing
One tricky thing is deciding how many threads GHC should be running if stack is running multiple builds (i.e. if you have 8 CPUs and stack is running 8 builds, each GHC shouldn't be running 8 of its own threads). Simplest might be to only pass
Given that neither stack nor cabal nor GHC has a notion of global resource management on the machine, I guess in the medium term what I'd like is enough knobs to experiment with this. For example, it would be great to be able to "turn it up" to max parallelism -- parallel packages, parallel targets within a package, parallel modules within

We can already vary things pretty easily by generating

By the way, as far as I know, no one's properly analyzed GHC builds from a parallel algorithms perspective. I.e. we need to profile the dependence graph and do a work/span analysis to figure out what is limiting our scaling. (We're working on a research project where we may be hacking on fine-grain parallelism inside certain GHC phases, but it only makes sense if there's a coherent picture for parallelism at the coarser grain too.)

In lieu of a GHC server mode (which has been discussed on various issues), we can't directly implement an inter-process work-stealing type policy that mimics the now-standard intra-process load balancing approach. But we can do what that Gentoo builder seems to be doing and simply delay some tasks to manage resource use. The more look-ahead or previous knowledge we have about the task DAG, the smarter we can be in prioritizing the critical path.
I'm tempted to close this issue, and just add a link to my comment above to the FAQ. Any objection?
That's fine. Fixing this so that stack does something smart would be a big project not solved in a day, and that addresses the "document" part.
It would be great if this issue, since it's linked from the FAQ, told me how to build at least my package in parallel using some ghc option, possibly specified in the cabal file or as an option to stack. Alternatively, it could explicitly say that this cannot be done.
@alexanderkjeldaas No such flag necessary, stack will build your package in parallel with other things if it can. Perhaps you mean having ghc build modules in parallel? Unfortunately in my experience this doesn't speed things up as much as I'd hoped. You can do
Yes, that option is interesting - ghc seems to be able to use 6x the CPU and finish in exactly the same time. Impressive!
Related GHC ticket: https://ghc.haskell.org/trac/ghc/ticket/9221

I did some simple timings on my machine (i3-2350M, 2 physical cores + hyperthreading) and always got the shortest build times with

I was wondering how hard it would be to detect when
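For concreteness, one way to try module-level parallelism (results vary a lot per project and machine) is to pass GHC's -j through stack's --ghc-options:

stack build --ghc-options=-j4

This is separate from stack's own -j/--jobs flag, which controls how many packages are built concurrently; -j inside --ghc-options controls how many modules each GHC invocation compiles at once.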
Related question (might need its own issue): how do I tell stack to set -j4 by default for itself, aside from

Studying this issue suggests that builds are also parallel, and stack's source suggests it defaults to
@mgsloan I'm asking for docs on setting that by default, for all invocations. |
Also, a different setting for the current project only would make sense.
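One way to get that (going by Stack's configuration documentation rather than anything stated in this thread, so treat it as an assumption) is the jobs setting in Stack's YAML config; in the global ~/.stack/config.yaml it applies to every invocation, and in a project's stack.yaml it applies to that project only:

# ~/.stack/config.yaml (global) or stack.yaml (per-project); assumed syntax
jobs: 4

The command-line -j/--jobs flag then overrides the config value for a single invocation.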
Can anyone tell me how many parallel ghc processes stack will spawn? This is distinct from the ghc

I'm benchmarking some code on a 32-core machine, and stack seems to only spawn 8 concurrent ghc instances (building 8 dependencies in parallel), resulting in, at most, 25% use (8) of the 32 cores. Based on my testing this figure should be equal to at least the number of available CPU cores, perhaps as much as five times that number, as ghc often seems to have a hard time fully using even a single core (doing IO, I presume). So if we set it to

An actual use case for this would be automatically spawning the build process onto high-CPU VMs, so we can build stuff in 2 minutes rather than half an hour.
@runeksvendsen By default, it will build

Often we can't actually build that many packages concurrently, and so your CPUs remain unsaturated. You can pass
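As an illustration of combining the two levels (hypothetical numbers, and subject to the memory caveats further down this thread): on a 32-core machine one could raise the package-level job count and also give each GHC some module-level parallelism, e.g.

stack build -j8 --ghc-options=-j4

so that up to 8 packages build at once, with up to 4 module-compilation threads each.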
@snoyberg the above link, added here: https://github.com/commercialhaskell/stack/wiki/FAQ#how-does-stack-handle-parallel-builds-what-exactly-does-it-run-in-parallel, is dead, and the new FAQ links back to this issue, making it circular, as this issue was closed because this is supposedly documented. Reopen the issue?
Also a separate issue that I'll mention here: a
Reopened as requested. After the issue was closed, it seems more than one question was asked and not addressed by the docs (sorry if I'm wrong). |
I actually don't know what
So what I'd need is some quick fix to make CI work out of the box.
For practical purposes, something like

Not a big issue, but if someone is going to look at this, might as well add it.
For what it's worth, I experienced this on a 600M RAM f1-micro Google Cloud instance:
@alexanderkjeldaas You might want to run
I can confirm that I had to completely give up trying to build my project on a 600M RAM machine. It worked OK to begin with, and it built GHC fine, but the closer it got towards actually finishing, the quicker all RAM was consumed. I found a 1.7G RAM machine to be sufficient, however, although the build process sometimes requires a restart due to the occasional out-of-memory error (which, as mentioned, can be avoided -- while capping concurrency/performance -- by using e.g.
Quick note for the maintainers of the Windows build in case they're not already using it: |
Is that accurate for Windows? I believe I'm seeing a difference in build time when running
@Anrock That should be correct for Windows too; to debug, please describe your machine (maybe in a new issue?) — if you have hyperthreading, it's not obvious whether "number of processors" will actually count physical cores or logical threads, though it appears to count threads by default. Just to double-check, please try calling by hand the underlying GHC API we use:

$ ghci
GHCi, version 8.4.2: http://www.haskell.org/ghc/ :? for help
Prelude> import GHC.Conc
Prelude GHC.Conc> getNumProcessors
8

Sources I consulted:
@Blaisorblade false alarm, it works as expected. I did some benchmarking and for builds with

As a note: I'm running Win10 Pro 1803 on an AMD FX8350 with 4 physical cores and hyperthreading, so 8 logical cores total.
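For anyone who wants to repeat that check outside ghci, a minimal standalone sketch (an illustration only, not code from stack) that also prints the RTS capability count for comparison:

import GHC.Conc (getNumProcessors, getNumCapabilities)

main :: IO ()
main = do
  procs <- getNumProcessors   -- what the OS reports, usually logical threads
  caps  <- getNumCapabilities -- what this process is currently set to use
  putStrLn ("processors: " ++ show procs ++ ", capabilities: " ++ show caps)

Compile with -threaded (and run with +RTS -N) if you want the capability count to reflect more than one core.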
There are no clear steps to be taken here, closing. If people would like to see doc improvements in the FAQ, please consider sending a PR. |
Is it possible to get stack to build a single package with module-level parallelism? I find that building the Cabal library is often a bottleneck in the dependency graph of many projects, and I'd like to be able to force it to run that one in parallel, since it has 234 modules in it as of Nov '20.

I saw some discussion up-thread about not wanting every package to have its own parallelism at the same level, since it might cause a lot of CPU thrash. Ideally the entire build system would share a single work queue rather than forcing packages into a particular "lane". As someone with a 24-core dev machine this is something that is immensely useful to me, but I'm not sure how to go about thinking about this. I suspect that the design of Cabal itself might be the limiting factor here, but I do not know enough to be able to say one way or another.

Is there any sort of workaround such that
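One possible workaround for the Cabal-library case specifically (this is my reading of Stack's ghc-options configuration, so treat it as unconfirmed): stack.yaml lets you attach GHC flags to individual packages, which would give just that one package module-level parallelism, e.g.

# stack.yaml -- assumed snippet
ghc-options:
  Cabal: -j4

while every other package keeps compiling single-threaded within its own build slot.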
Thumb twiddling on a multicore box. Would this option be hard to add, mostly for building cabal packages in parallel?

stack build --genmake; make -j
This is solvable with a simple algorithm. Stack knows how many parallel builds it's running at the end of executing each parallel build for a package. It can also know how many CPUs it has available, if it gets to control how many each build should be using. So if it's at the stage where it can only build one package and it has 4 CPUs available, it can start this particular build with
Yes, that sounds like a reasonable heuristic. Note, however, that an adversarial schedule can screw over that strategy by a bunch of jobs finishing and freeing available CPUs right after you make the decision to do

But in GHC's case the problem is also made a bit easier by the fact that GHC's internal parallelism is not very scalable. So telling GHC to use 32 cores wouldn't make sense...
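To make that heuristic concrete, a small sketch (an illustration only, not Stack code; the names are made up) of picking a per-build GHC -j value from the number of free CPUs and the number of builds still running:

-- Sketch: divide the free CPUs among the builds that are still running.
ghcJobsFor :: Int -> Int -> Int
ghcJobsFor cpusAvailable activeBuilds =
  max 1 (cpusAvailable `div` max 1 activeBuilds)

-- e.g. ghcJobsFor 4 1 == 4  (one package left: give GHC all 4 CPUs)
--      ghcJobsFor 8 8 == 1  (full pipeline: keep each GHC single-threaded)

As noted above, this can still be unlucky if jobs finish right after the decision is made, and very large values are not worth passing to GHC anyway.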
That assumption is already shaky in the C world, where jobs = CPUs is a somewhat OK heuristic. In Haskell, a single package can blow up to 16 GB of RAM, and two such packages built in parallel can bring your entire system down. This happened frequently at one of my previous companies with pandoc + amazonka. We had to run stack with -j1 in order to not trigger the OOM killer or cause swapping that made the machine unresponsive for 15+ minutes.
There's a relatively simple solution for the two-level problem: implement the GNU make jobserver protocol at all levels. See for example cargo + rustc:
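For reference, a rough sketch of what the client side of that protocol can look like in Haskell (an illustration only, assuming a POSIX pipe-based jobserver and the unix package; the fd numbers are hypothetical and would really be parsed from MAKEFLAGS, e.g. --jobserver-auth=R,W):

import System.Posix.IO (fdToHandle)
import System.Posix.Types (Fd(..))
import System.IO (Handle, hGetChar, hPutChar, hFlush)
import Control.Exception (bracket)

-- Hold one jobserver token for the duration of an action: read a byte from
-- the pipe before starting (blocks until a slot is free) and write it back
-- when done, even if the action throws.
withJobToken :: Handle -> Handle -> IO a -> IO a
withJobToken readEnd writeEnd action =
  bracket (hGetChar readEnd)
          (\tok -> hPutChar writeEnd tok >> hFlush writeEnd)
          (\_tok -> action)

main :: IO ()
main = do
  readEnd  <- fdToHandle (Fd 3)  -- hypothetical fds inherited from the parent
  writeEnd <- fdToHandle (Fd 4)
  withJobToken readEnd writeEnd $
    putStrLn "compiling one extra module while holding a token"

The point is that stack, cabal and GHC could all draw from one shared token pool instead of each guessing its own -j.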
I can see that stack test defaults to parallel builds. But that refers to parallelism between different test-suites, right?

In the case of both builds and tests (with test-framework or tasty) there's the issue of "inner loop" parallelism within each build or test target. I suppose we can't parallelize the tests until we have some way for the build-tool to know it's a tasty/test-framework suite, not just exitcode-stdio-1.0, but that still leaves build-parallelism. Does Stack currently pass -j to GHC?

I haven't seen comprehensive benchmark numbers, but a small value like -j2 or -j4 for GHC offers some benefit. The current implementation is not scalable however (for example, I haven't seen anything good come of -j32).

Nested parallelism of course raises the issue of N * N jobs being spawned for N cores. As long as it's quadratic oversubscription and not exponential, I think this is not that big a problem for CPU usage, but it very often can create problems with memory usage (or hitting ulimits / max # of processes).

There are several related cabal issues, but it's a little hard to tell the current status with them spanning a lot of time and various merged and unmerged pull requests:

-j should build package components in parallel haskell/cabal#2623
should build package components in parallel haskell/cabal#2623The text was updated successfully, but these errors were encountered: