Make precompile files relocatable/servable #47943

Closed
staticfloat opened this issue Dec 20, 2022 · 14 comments · Fixed by #49866
Labels
compiler:codegen Generation of LLVM IR and native code compiler:precompilation Precompilation of modules

Comments

@staticfloat
Member

staticfloat commented Dec 20, 2022

Cross-ref: #45215


Precompile files (.ji and soon, package images) should be as relocatable as possible. Use cases for this include:

  • Serving precompiled .so files over the network to obviate precompilation delays when installing new packages.
  • Moving depots around on the local file system

The biggest known problem here is that precompilation caches currently embed the absolute paths to their dependencies (e.g. .jl source files, Artifacts.toml files, etc.). I propose that we introduce a notion of "depot-local" files. Here's how I see it working:

If we have a package Foo.jl which has been precompiled, that cache should reference the source files as something like @depot/packages/Foo/a2b2c3/src/Foo.jl. During cache header verification, we string-replace ^@depot with the depot that the cache file is currently being loaded from. I do not think it's worthwhile to allow this path to resolve to other depots on the depot path; it's too easy to find the "wrong" files, and I think it's reasonable to expect an atomic move of all associated resources for a package image.

When serializing out dependency paths, we simply check if the depot we're serializing into is a superpath of the dependency path. If it is, we emit @depot/..., otherwise we emit the full path.
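The two steps described above (emit @depot/... when serializing, substitute the loading depot when verifying) can be sketched roughly as follows. This is an illustration in Python rather than Julia's actual loading code; the function names `encode_dependency_path` and `decode_dependency_path` are hypothetical, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath

DEPOT_TOKEN = "@depot"  # token name taken from the proposal above


def encode_dependency_path(dep_path: str, depot: str) -> str:
    """Serialize: emit @depot/... if dep_path lives inside depot, else the full path."""
    try:
        # relative_to compares whole path components, so "/a/.julia2" is not
        # mistaken for a child of "/a/.julia".
        relative = PurePosixPath(dep_path).relative_to(PurePosixPath(depot))
    except ValueError:
        return dep_path  # not depot-local: keep the absolute path
    return f"{DEPOT_TOKEN}/{relative}"


def decode_dependency_path(stored: str, loading_depot: str) -> str:
    """Load: replace a leading @depot with the depot the cache is loaded from."""
    if stored == DEPOT_TOKEN or stored.startswith(DEPOT_TOKEN + "/"):
        return loading_depot.rstrip("/") + stored[len(DEPOT_TOKEN):]
    return stored  # absolute paths pass through unchanged


if __name__ == "__main__":
    stored = encode_dependency_path(
        "/home/alice/.julia/packages/Foo/a2b2c3/src/Foo.jl", "/home/alice/.julia"
    )
    print(stored)
    print(decode_dependency_path(stored, "/opt/shared_depot"))
```

Only the depot the cache file is loaded from is ever substituted, matching the suggestion above not to search other entries of the depot path.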

@giordano
Contributor

This isn't going to happen anytime soon, if ever, but in view of #4630 it'd be good to think about how to make this XDG-friendly (in which case the depot doesn't have a single entry point), if possible.

@staticfloat
Member Author

I suppose we could emit more fine-grained tokens, such as @compiled, @artifacts, etc... and then in most cases, these are equal to @depot/compiled, @depot/artifacts, etc... but in other cases, they could be equal to ~/.local/share, ~/.cache, etc...
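One way to picture the fine-grained tokens is as a lookup table from token to root directory, where the table differs between a single-depot layout and an XDG-style layout. The sketch below is in Python for illustration only; the function names and the exact XDG subdirectories (`julia/compiled`, `julia/artifacts`) are assumptions, not something settled in this thread.

```python
def single_depot_roots(depot: str) -> dict[str, str]:
    """Classic layout: every token resolves to a subdirectory of one depot."""
    depot = depot.rstrip("/")
    return {
        "@compiled": f"{depot}/compiled",
        "@artifacts": f"{depot}/artifacts",
    }


def xdg_style_roots(home: str) -> dict[str, str]:
    """Hypothetical XDG-style layout: tokens resolve to unrelated directories."""
    home = home.rstrip("/")
    return {
        "@compiled": f"{home}/.cache/julia/compiled",            # assumed location
        "@artifacts": f"{home}/.local/share/julia/artifacts",    # assumed location
    }


def resolve(stored: str, roots: dict[str, str]) -> str:
    """Expand a leading token using the given table; other paths pass through."""
    token, sep, rest = stored.partition("/")
    if token in roots:
        return roots[token] + sep + rest
    return stored


if __name__ == "__main__":
    path = "@artifacts/abc123/lib/libfoo.so"
    print(resolve(path, single_depot_roots("/home/alice/.julia")))
    print(resolve(path, xdg_style_roots("/home/alice")))
```

The stored path stays the same in both layouts; only the table consulted at load time changes.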

@vchuravy
Member

Bonus points if we accidentally solve parts of #33065; having stack traces (including GDB??) not point to the build machines would be grand.

@DilumAluthge
Member

See also #45215

@KristofferC
Member

servable

Build scripts (deps/build.jl) make serving .ji files a bit tricky. Since a build script executes arbitrary Julia code before compilation, you cannot really content-address the "compile artifact" of such a package. And everything depending on such a package therefore cannot be content-addressed either. So a package with a build script pretty much "poisons" everything above it transitively.
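The transitive "poisoning" can be made concrete as a graph traversal: start from every package that has a build script and walk the reverse dependency edges. A minimal Python sketch follows; the dependency-map format and the package names in the usage example are made up for illustration.

```python
from collections import deque


def poisoned_packages(deps: dict[str, list[str]], has_build_script: set[str]) -> set[str]:
    """Return every package that has a build script or transitively depends on one.

    deps maps each package name to the list of packages it depends on.
    """
    # Invert the graph: for each package, who depends on it directly?
    dependents: dict[str, list[str]] = {}
    for pkg, requires in deps.items():
        for dep in requires:
            dependents.setdefault(dep, []).append(pkg)

    # Breadth-first walk upward from the packages with build scripts.
    poisoned = set(has_build_script)
    queue = deque(has_build_script)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in poisoned:
                poisoned.add(parent)
                queue.append(parent)
    return poisoned


if __name__ == "__main__":
    # Hypothetical ecosystem: App -> Mid -> Low (has deps/build.jl); Pure stands alone.
    deps = {"App": ["Mid"], "Mid": ["Low"], "Low": [], "Pure": []}
    print(sorted(poisoned_packages(deps, {"Low"})))
```

Packages outside the returned set are the ones whose precompile files could, in principle, be content-addressed and served.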

@staticfloat
Member Author

That totally makes sense; I think eliminating self-modification is a good thing for the ecosystem overall, but even if 50% of all packages are poisoned, we can still reap the benefit for the other 50%.

@StefanKarpinski
Member

StefanKarpinski commented Dec 21, 2022

More ammo to motivate people to stop using deps/build.jl also helps. Even if we can only serve pre-generated precompile files for 20% of the ecosystem, we can probably start a campaign to get rid of the remaining deps/build.jl usages and rapidly push the percentage up by targeting the ones that poison the largest portion of the ecosystem.
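To decide which build scripts to target first, one rough heuristic is to rank each package that has a deps/build.jl by how many other packages transitively depend on it. The Python sketch below assumes a simple dependency map and made-up package names; it ignores the case where a dependent is also poisoned by some other build script, so the counts are an upper bound on what removing a single script would unlock.

```python
def transitive_dependents(deps: dict[str, list[str]], target: str) -> set[str]:
    """All packages that depend on target, directly or indirectly."""
    dependents: dict[str, list[str]] = {}
    for pkg, requires in deps.items():
        for dep in requires:
            dependents.setdefault(dep, []).append(pkg)

    seen: set[str] = set()
    stack = [target]
    while stack:
        current = stack.pop()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def rank_build_scripts(deps: dict[str, list[str]], has_build_script: set[str]) -> list[tuple[str, int]]:
    """Sort build-script packages by number of transitive dependents, largest first."""
    counts = [(pkg, len(transitive_dependents(deps, pkg))) for pkg in has_build_script]
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(counts, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    deps = {"App": ["Mid"], "Mid": ["LowA"], "Tool": ["LowB"], "LowA": [], "LowB": []}
    for name, count in rank_build_scripts(deps, {"LowA", "LowB"}):
        print(name, count)
```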

@StefanKarpinski
Member

Oops, mouse slip!

@brenhinkeller brenhinkeller added compiler:codegen Generation of LLVM IR and native code compiler:precompilation Precompilation of modules labels Dec 22, 2022
@fatteneder
Member

This isn't going to happen anytime soon, if ever, ...

May I ask what are the reasons for this not happening soon or even never?

@giordano
Contributor

May I ask what are the reasons for this not happening soon or even never?

No one signalled intention to work on #4630, and code is typically written by someone dedicating some time to it, not just by good will, so that looks to me like an indication that's not going to happen anytime soon.

@fatteneder
Member

Sorry, I misunderstood your comment. I thought you were referring to the problem of this issue and not #4630.

@ericphanson
Contributor

Just to add one more reason for wanting this: at work we often develop locally some amount (and along the way, precompile the env), and then move into a docker container to continue work on a k8s pod (maybe time to run that big chunky computation). It would be real nice if we could copy the precompilation cache into the container to avoid re-precompiling.

@sloede
Contributor

sloede commented Jan 21, 2023

Here's another use case: massively parallel HPC runs, where we want to pack up the depot with precompilation files / pkgimages and move it to a node-local RAM disk to alleviate undue stress on the I/O subsystem.

Just as a measure of the magnitude of improvement this can bring: when using a RAM disk, I am able to keep the time for using OrdinaryDiffEq at a steady 10 seconds, instead of up to 15 minutes when loading all files via the parallel file system.

@timholy
Member

timholy commented Jan 23, 2023

I don't think anyone disagrees that this would be useful. When

(people who care a lot about this issue) ∩ (people who already know the code base well enough to fix it) = ∅,

usually the way it gets fixed is that someone in the first set decides to put in enough effort to join the second set.
