Parallelize rustdoc rendering #82741

Open
1 of 5 tasks
camelid opened this issue Mar 4, 2021 · 15 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. I-compiletime Issue: Problems and improvements with respect to compile times. S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.

Comments

@camelid
Member

camelid commented Mar 4, 2021

I and others would like to parallelize rustdoc rendering. It is "embarrassingly parallel" ... except that rustdoc has some technical debt in the form of global mutable state that will have to be dealt with first.

Steps

  • Get rid of CURRENT_DEPTH thread-local (rustdoc: Get rid of CURRENT_DEPTH thread-local variable #82742)
  • Make Span etc. Send and Sync
  • Use RwLock instead of RefCell for SharedContext fields (or make Context, and by extension SharedContext, Send and Sync some other way)
  • So many more things...
  • Add parallelism via MPSC channels and rayon
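A minimal sketch of how the last steps in the list might fit together. `SharedCtx`, `render_item`, and `render_all` are hypothetical stand-ins, not rustdoc's real types or functions. The shared state sits behind an `RwLock` so that it is `Sync`, and std scoped threads plus an MPSC channel stand in for rayon so the example needs no external crate.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::sync::{Mutex, RwLock};
use std::thread;

/// Hypothetical stand-in for rustdoc's shared rendering state.
/// Unlike `RefCell`, `RwLock` is `Sync` when its contents are `Send + Sync`.
struct SharedCtx {
    /// Maps an item name to the title shown on its page.
    titles: RwLock<HashMap<String, String>>,
}

/// Hypothetical per-item renderer: returns (file name, HTML).
fn render_item(shared: &SharedCtx, item: &str) -> (String, String) {
    let titles = shared.titles.read().unwrap();
    let title = titles.get(item).map(String::as_str).unwrap_or(item);
    (format!("{item}.html"), format!("<h1>{title}</h1>"))
}

/// Renders all items on `workers` threads and collects the pages over a channel.
fn render_all(shared: &SharedCtx, items: &[&str], workers: usize) -> Vec<(String, String)> {
    let (tx, rx) = mpsc::channel();
    // A shared work queue: each worker takes the next unrendered item.
    let queue = Mutex::new(items.iter());
    thread::scope(|s| {
        for _ in 0..workers.max(1) {
            let tx = tx.clone();
            let queue = &queue;
            s.spawn(move || loop {
                // Hold the queue lock only long enough to take one item.
                let next = queue.lock().unwrap().next();
                match next {
                    Some(item) => tx.send(render_item(shared, item)).unwrap(),
                    None => break,
                }
            });
        }
    });
    // All workers have finished; drop the last sender so the receiver ends.
    drop(tx);
    let mut pages: Vec<_> = rx.into_iter().collect();
    // Sort so the result does not depend on thread scheduling.
    pages.sort();
    pages
}
```

Every worker only takes a read lock on `titles`, so rendering does not serialize on the shared state. Code that still needs to mutate shared fields during rendering would take the write lock instead and would contend accordingly.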

cc @rylev @jyn514
cc https://rust-lang.zulipchat.com/#narrow/stream/247081-t-compiler.2Fperformance/topic/windows-rs.20perf/near/226957262 (was going to open this issue anyway, but noticed this discussion so thought I'd link to it)

@camelid camelid added T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. E-hard Call for participation: Hard difficulty. Experience needed to fix: A lot. C-enhancement Category: An issue proposing an enhancement or a PR with one. I-compiletime Issue: Problems and improvements with respect to compile times. labels Mar 4, 2021
@jyn514
Member

jyn514 commented Mar 4, 2021

This is a duplicate of #82294 I think.

@camelid
Member Author

camelid commented Mar 4, 2021

Maybe I'll add this issue description to the description of the other one?

@camelid
Member Author

camelid commented Mar 4, 2021

I would like to have a check-list of things to do before we can parallelize.

@jyn514
Member

jyn514 commented Mar 4, 2021

Personally I think that because this is blocked on parallel_compiler, it's not worth spending too much time on (note that TyCtxt is not thread-safe without cfg(parallel_compiler)). parallel_compiler is on the backburner for T-compiler and AFAIK will be for quite a while.

@camelid
Member Author

camelid commented Mar 4, 2021

Okay, that makes sense 👍

@camelid camelid added the S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. label Mar 4, 2021
@the8472
Member

the8472 commented Mar 4, 2021

> Add parallelism via MPSC channels and rayon

jobserver and/or -j N parameter will be needed too to obey parallelism limits of cargo and bootstrap.

@jyn514
Member

jyn514 commented Aug 22, 2021

> jobserver and/or -j N parameter will be needed too to obey parallelism limits of cargo and bootstrap.

@the8472 hmm, I'm not sure -j makes sense in this context - normally -j is intended for cpu-bound tasks, like compiling. But this is almost purely IO, so we should be able to potentially have thousands of files being written at once without too much resource contention.

@jyn514
Member

jyn514 commented Aug 22, 2021

Oh, I might be misunderstanding - @camelid are you using "rendering" to mean "generating the HTML" or "writing the HTML to disk"? I was assuming the latter.

@camelid
Member Author

camelid commented Aug 22, 2021

Both, but mainly generating the HTML.

@jyn514
Member

jyn514 commented Aug 22, 2021

Ok, in that case -j probably does make sense. I'm working separately on making writing the HTML parallel (right now it's windows only for some reason: #60971 (comment))

@camelid
Member Author

camelid commented Aug 22, 2021

> (right now it's windows only for some reason: #60971 (comment))

That's weird!

@rbtcollins
Contributor

I would note that exceeding the available workers in any dimension will cause queuing or thrashing, whether that dimension is IO or CPU. As a cap, -j should be set to roughly min(CPUs, maxconcurrentIOs), unless something more sophisticated is available. Treating -j as 'CPU only' is therefore likely to be problematic, particularly on esoteric machines where IO requires CPU to perform.
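The suggested cap can be sketched as a small helper. This is only an illustration: `effective_jobs`, `requested_jobs`, and `max_concurrent_io` are hypothetical names, and the choice to honor an explicit `-j` value as given is an assumption rather than something stated in the thread.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Picks a worker count.
/// If the user passed `-j N` (`requested_jobs`), that value is used.
/// Otherwise the default is min(CPUs, max concurrent IO operations).
/// The result is never below 1.
fn effective_jobs(requested_jobs: Option<usize>, cpus: usize, max_concurrent_io: usize) -> usize {
    let default_cap = cpus.min(max_concurrent_io);
    requested_jobs.unwrap_or(default_cap).max(1)
}

/// CPU count as reported by the standard library, falling back to 1
/// when it cannot be determined.
fn detected_cpus() -> usize {
    thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1)
}
```

A caller might combine the two as `effective_jobs(None, detected_cpus(), io_limit)`, where `io_limit` is whatever bound on concurrent IO is known for the machine.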

bors added a commit to rust-lang-ci/rust that referenced this issue Sep 16, 2021
rustdoc: reduce number of copies when using parallel IO

This is Windows-only for now; I was getting really bad slowdowns from this on linux for some reason.

Helps with rust-lang#82741. Follow-up to rust-lang#60971.
@jyn514
Member

jyn514 commented Nov 26, 2021

> Add parallelism via MPSC channels and rayon

Note that this was actually a performance regression on linux for some reason: https://perf.rust-lang.org/compare.html?start=9faa714154dbc03faa174a7d4f72d6bbbfd61f7c&end=853cac83440612cc4564f31c8b0ea39ef4389bf1

@the8472
Member

the8472 commented Nov 26, 2021

That link shows total CPU instructions spent, across all cores. That's expected to get worse when you parallelize things due to communication overhead. But wall-time should go down, which didn't happen if you switch to those results.

But that PR only parallelizes IO, right? I.e. it doesn't parallelize the CPU-heavy parts, like querying the compiler internals or perhaps compression (if any). Write IO is fast on Linux because it all goes into the page cache and is written out in the background by separate kernel threads. You're not actually waiting on disk IO. So there's not much to gain from parallelizing that.
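To illustrate the point about the page cache, here is a minimal sketch using only the standard library. `write_page` is a hypothetical helper, not something rustdoc has. Without the `sync_all` call, the function returns once the data has been handed to the OS. With it, the call also waits until the OS reports the data as flushed to the device.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Writes `contents` to `path`.
/// With `durable == false`, this returns as soon as the OS has accepted the
/// data (on Linux, typically into the page cache).
/// With `durable == true`, it additionally waits for `sync_all`, which asks
/// the OS to flush the file's data and metadata to the storage device.
fn write_page(path: &Path, contents: &[u8], durable: bool) -> io::Result<()> {
    let mut file = File::create(path)?;
    file.write_all(contents)?;
    if durable {
        file.sync_all()?;
    }
    Ok(())
}
```

Only the `durable == true` path actually waits on the disk, which is why spreading plain writes across threads has little to hide on Linux.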

Windows might be different due to anti-virus running synchronously when you write files.

@jyn514
Member

jyn514 commented Nov 26, 2021

> Write IO is fast on linux because it all goes into the page-cache and written out in the background by separate kernel threads. You're not actually waiting on disk IO. So there's not much to gain from parallelizing that.

Oh! I didn't realize writes happen asynchronously. Yeah, this probably doesn't have much benefit on Linux then (maybe macOS would benefit though?)


5 participants