Build artifacts caching support for non-relocatable binaries #480
Comments
This is not blocking us, but we have to apply a workaround: we always remove the cache before installation.
Looking over the package install scripts, it looks like the build artifacts are copied to the cache post install. One way to address this might be to do the build in the cache (as the configurable directory you're referring to). Related, and this might have a really obvious solution, but why do package managers like npm/bower/yarn all install local copies of all of the packages, as opposed to having a central store of installed packages and then symlinking them, similar to how npm link works with local packages? This would solve this issue, and maintain a single copy of (a version of) a dependency on the system.
@dxu Building this in the cache should work. I think the reason that npm and other package managers don't do this is that yarn tries to solve both of:
npm doesn't bother caching build artifacts because packages seldom have install fields. (npm doesn't even have parallel installation support.)
Yeah, I can understand npm not doing this. There also isn't any notion of a central place to store all packages apart from global packages or an offline cache. I guess I was wondering whether you guys have thought about doing it, and what the disadvantages of doing so are. My confusion is mainly about why it's important to have both the cache in yarn and the local node_modules copy (especially if you start building packages in the cache), as opposed to just keeping everything in ~/.yarn and symlinking the node_modules folder to the dependency fetched/built in ~/.yarn, which is basically how
On second thought, you'd still run into problems if you build in the cache (or any configurable directory) and then copy over, and the cache is later cleared. You'd still need to build it per project, or the user would have to know to reinstall to rebuild it. I'm not sure how you can address this without forcing the build each time, or how to set something up to handle it on a case-by-case basis. A single source of truth would help in this case as well, though.
@dxu Good points. I'm convinced that symlinking may be a better idea here. As for your concern about clearing the cache: yes, we will still have the same problem. But if we are explicit about the cache location configuration (i.e., telling the user that this is the source of truth and shouldn't be removed), then users will understand the behavior.
I wish the binaries were relocatable, but sometimes binaries can be large and copying can actually take a while too (12 seconds would be common), so symlinking seems like it would work pretty well aside from the cache clearing issue. It might be a worthwhile tradeoff though. Especially if
Just to clarify, doesn't the cache clearing issue only apply to the copy case? As you mentioned, using symlinks actually allows for easy checking of cache misses (the path no longer exists). As I see it, with the current setup where node_modules contains copies of the cache:
Problems with these two:
No, there's an issue with symlinks too (though I think it's worth it). If you have a symlink to the cache and then clear the cache, any project that has a symlink to the cache in its node_modules will start failing. That isn't the behavior you'd expect from a "cache", though I think it's still useful and I'd opt into this behavior for faster installs.
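To make the failure mode above concrete, here is a minimal sketch in Python (not yarn's code). The directory layout and the names `is_dangling_symlink` and `demo_symlink_after_cache_clear` are assumptions made up for illustration. It links a project's node_modules entry to a cached package, clears the cache, and shows that the link then points at nothing, which is also how a cache miss could be detected.

```python
import os
import shutil
import tempfile


def is_dangling_symlink(path):
    """True when path is a symlink whose target no longer exists."""
    # os.path.exists follows symlinks, so it is False for a broken link.
    return os.path.islink(path) and not os.path.exists(path)


def demo_symlink_after_cache_clear():
    root = tempfile.mkdtemp()
    # Hypothetical layout: <root>/cache/ocaml and <root>/project/node_modules
    cache_pkg = os.path.join(root, "cache", "ocaml")
    node_modules = os.path.join(root, "project", "node_modules")
    os.makedirs(cache_pkg)
    os.makedirs(node_modules)
    with open(os.path.join(cache_pkg, "bin"), "w") as f:
        f.write("built artifact")

    # The project only holds a symlink into the cache.
    link = os.path.join(node_modules, "ocaml")
    os.symlink(cache_pkg, link)
    dangling_before = is_dangling_symlink(link)

    # "Clear the cache": every project linking into it is now broken.
    shutil.rmtree(os.path.join(root, "cache"))
    dangling_after = is_dangling_symlink(link)

    shutil.rmtree(root)
    return dangling_before, dangling_after
```

An installer could run a check like `is_dangling_symlink` over node_modules and rebuild only the entries that fail it, which is the "easy checking of cache misses" idea mentioned earlier in the thread.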
And perhaps we should see if the newest versions of the ocaml compiler are relocatable, which would make the copy option viable/preferred. However, aren't there some other popular programs that are also not relocatable?
Oh, I see what you meant, and yeah, you're right. I was differentiating the symlink and copy cases in my head: after clearing the cache in the copy case, my understanding is that you'd get an ocaml-package-specific file error due to the relocated/removed build artifacts, whereas with the symlink you just get the generic "command ocaml not found" (since the link points to nothing). That should be more digestible and understandable for users, since it's the same error they got in the past when they didn't run npm install. I can't confirm what the current error with ocaml says. I can't seem to reproduce this on my laptop right now; yarn seems to be rebuilding each time I run yarn add.
Make it a hardlink then, not a symlink. Then all the links would point to the same inode, and clearing the cache will not break it (the files won't actually be deleted until all links are deleted). As a bonus, you can create hardlinks and junctions without admin permissions on Windows, whereas symlinks require admin permissions.
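The hardlink behavior described above can be sketched as follows. This is a Python illustration under assumed names (`demo_hardlink_after_cache_clear` and the temp directory layout are hypothetical), not yarn's implementation. It assumes the cache and the project live on the same filesystem, since hardlinks cannot cross filesystems, and it links a single file, since hardlinks apply to files rather than directories.

```python
import os
import shutil
import tempfile


def demo_hardlink_after_cache_clear():
    root = tempfile.mkdtemp()
    cache_dir = os.path.join(root, "cache")      # hypothetical cache location
    project_dir = os.path.join(root, "project")  # hypothetical project location
    os.makedirs(cache_dir)
    os.makedirs(project_dir)

    cached = os.path.join(cache_dir, "artifact.bin")
    with open(cached, "w") as f:
        f.write("built artifact")

    # Hardlink the cached file into the project: two names, one inode.
    linked = os.path.join(project_dir, "artifact.bin")
    os.link(cached, linked)
    same_inode = os.stat(cached).st_ino == os.stat(linked).st_ino

    # Clearing the cache removes one name; the data stays reachable
    # through the project's link until that is removed too.
    shutil.rmtree(cache_dir)
    with open(linked) as f:
        content = f.read()

    shutil.rmtree(root)
    return same_inode, content
```

Note that a hardlinked file has exactly the same bytes as the cached one, so any absolute path baked into the artifact at build time is unchanged by linking.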
An alternative, if links (hard or soft) aren't possible (see #499), is to fetch to the cache directly before running postinstall scripts, and to always run postinstall scripts after copying from cache -> However, I'm not able to reproduce the original problem with the above example, @yunxing; it seems to be rebuilding each time. Are you able to reproduce
@dxu After some testing, I think the problem is not that the binaries are non-relocatable. The problem is that some installations are not idempotent (caching the installed scripts and then reinstalling essentially means installing the same thing twice).
Is there any update on this? I am getting the same issue with
We have a potential solution for binary relocation from cache. We've tried it with esy and we can bring the solution to Yarn, perhaps. Binary relocation doesn't work in all cases, and there are some limitations.
The current caching mechanism assumes all binaries are relocatable. However, that's not always the case. As an example, the ocaml compiler itself has hardcoded paths inside the binary.
The problem today is that if we build ocaml in project A and use the cache when building project B, the artifacts in project B will point to project A, where they were originally built. This means that if we remove project A, B will stop working.
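The scenario above can be simulated with a small Python sketch. Everything in it is a stand-in: `build` writes a text file that records its own absolute location (playing the role of a binary with hardcoded paths), `run` succeeds only if that recorded path still exists, and the directory names are hypothetical. It does not invoke the real ocaml build.

```python
import os
import shutil
import tempfile


def build(pkg_dir):
    """Stand-in for a non-relocatable build: the artifact records an absolute path."""
    lib_dir = os.path.join(pkg_dir, "lib")
    os.makedirs(lib_dir)
    with open(os.path.join(pkg_dir, "compiler"), "w") as f:
        f.write(lib_dir)  # absolute path baked in at build time


def run(pkg_dir):
    """The 'compiler' works only if its baked-in lib directory still exists."""
    with open(os.path.join(pkg_dir, "compiler")) as f:
        baked_path = f.read()
    return os.path.isdir(baked_path)


def demo_cache_from_project_a():
    root = tempfile.mkdtemp()
    pkg_a = os.path.join(root, "projectA", "node_modules", "ocaml")
    pkg_b = os.path.join(root, "projectB", "node_modules", "ocaml")
    cache = os.path.join(root, "cache", "ocaml")

    build(pkg_a)                   # built inside project A
    shutil.copytree(pkg_a, cache)  # artifacts copied to the cache post install
    shutil.copytree(cache, pkg_b)  # project B installs from the cache

    works_before = run(pkg_b)      # B silently depends on A's directory
    shutil.rmtree(os.path.join(root, "projectA"))
    works_after = run(pkg_b)       # removing A breaks B

    shutil.rmtree(root)
    return works_before, works_after
```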
One possible solution to this problem is to always run the build in a configurable directory, instead of in node_modules in the current project. After the build we can then copy the build artifacts back to the destination (either project A or B). This way, the non-relocatable artifacts will only depend on a directory that users are aware of.
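A rough Python sketch of this proposal, under the same simulated assumptions: `build` and `run` are stand-ins for a build with hardcoded paths, and `install_from_stable_dir` and the `build_root` parameter are hypothetical names, not yarn configuration. The build runs once in the configurable directory and the results are copied into each project, so removing one project no longer affects the other, while removing the build directory itself still breaks both.

```python
import os
import shutil
import tempfile


def build(pkg_dir):
    """Stand-in for a non-relocatable build: the artifact records an absolute path."""
    lib_dir = os.path.join(pkg_dir, "lib")
    os.makedirs(lib_dir)
    with open(os.path.join(pkg_dir, "compiler"), "w") as f:
        f.write(lib_dir)


def run(pkg_dir):
    """The 'compiler' works only if its baked-in lib directory still exists."""
    with open(os.path.join(pkg_dir, "compiler")) as f:
        return os.path.isdir(f.read())


def install_from_stable_dir(build_root, name, dest):
    """Build once under build_root (if not already built), then copy to dest."""
    stable = os.path.join(build_root, name)
    if not os.path.isdir(stable):
        build(stable)
    shutil.copytree(stable, dest)


def demo_stable_build_dir():
    root = tempfile.mkdtemp()
    build_root = os.path.join(root, "yarn-builds")  # assumed configurable directory
    pkg_a = os.path.join(root, "projectA", "node_modules", "ocaml")
    pkg_b = os.path.join(root, "projectB", "node_modules", "ocaml")

    install_from_stable_dir(build_root, "ocaml", pkg_a)
    install_from_stable_dir(build_root, "ocaml", pkg_b)

    shutil.rmtree(os.path.join(root, "projectA"))
    b_after_removing_a = run(pkg_b)      # B no longer depends on A

    shutil.rmtree(build_root)
    b_after_removing_build = run(pkg_b)  # but it does depend on build_root

    shutil.rmtree(root)
    return b_after_removing_a, b_after_removing_build
```

The second result is the tradeoff raised in the comments: the configurable directory becomes a source of truth that users should not remove.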
To repro:
@bestander @jordwalke @dxu