Types and specifically "mentioned items" take up a lot of space in Rust metadata files #122936
Comments
I thought that's what we already do. Anything that was already encoded just gets encoded as an offset to where the actual value was encoded before.
Hm. Then either these lists are a lot bigger than I expected, or it's not working somehow?
Yeah, it certainly looks like there's a cache that avoids repeatedly encoding the same type: `rust/compiler/rustc_middle/src/ty/codec.rs`, lines 112 to 116 in 2f2350e.
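The mechanism described in the last few comments (write a value in full the first time, and afterwards only an offset pointing back at that first encoding) can be sketched roughly as follows. This is a minimal illustration, not the code in `codec.rs`: `ShorthandEncoder`, the `TAG_FULL`/`TAG_BACKREF` tag bytes, and the use of strings in place of `Ty<'tcx>` are assumptions made for the example, and offsets are assumed to be written as unsigned LEB128 varints.

```rust
use std::collections::HashMap;

/// Append `value` as unsigned LEB128: 7 payload bits per byte,
/// high bit set on every byte except the last.
fn write_uleb128(out: &mut Vec<u8>, mut value: usize) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

// Hypothetical tag bytes distinguishing a full encoding from a back-reference.
const TAG_FULL: u8 = 0;
const TAG_BACKREF: u8 = 1;

/// Hypothetical encoder: each distinct value is written in full once;
/// later occurrences only store the position of that first encoding.
struct ShorthandEncoder {
    out: Vec<u8>,
    // value -> byte position where its full encoding starts
    cache: HashMap<String, usize>,
}

impl ShorthandEncoder {
    fn new() -> Self {
        ShorthandEncoder { out: Vec::new(), cache: HashMap::new() }
    }

    fn encode(&mut self, value: &str) {
        if let Some(&pos) = self.cache.get(value) {
            // Seen before: emit a tag plus the offset of the first encoding.
            self.out.push(TAG_BACKREF);
            write_uleb128(&mut self.out, pos);
            return;
        }
        // First occurrence: record where it starts, then write tag, length, bytes.
        let start = self.out.len();
        self.out.push(TAG_FULL);
        write_uleb128(&mut self.out, value.len());
        self.out.extend_from_slice(value.as_bytes());
        self.cache.insert(value.to_string(), start);
    }
}
```

Encoding `"Vec<u32>"` twice with this sketch takes 10 bytes the first time and 2 bytes the second, so repeats become cheap but are never free: every occurrence still pays for a tag and an offset.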
In that case, I have no idea why the size regressed here. Is there any way to figure out what is taking up that extra space?
Could it be lots of bodies with lots of mentioned items just adding up, even if the items themselves are fairly small?
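To get a feel for whether many small references could add up, here is a back-of-the-envelope sketch. It assumes each reference to an already-encoded type is stored as an unsigned LEB128 offset (7 payload bits per byte); `uleb128_len` and `reference_bytes` are hypothetical helpers, and the counts below are illustrative, not measurements from #122568.

```rust
/// Number of bytes needed to write `value` as unsigned LEB128.
fn uleb128_len(mut value: u64) -> u64 {
    let mut len = 1;
    // Each extra byte holds 7 more bits of the value.
    while value >= 0x80 {
        value >>= 7;
        len += 1;
    }
    len
}

/// Upper bound on the bytes spent on `references` back-references
/// when every offset is at most `max_offset`.
fn reference_bytes(references: u64, max_offset: u64) -> u64 {
    references * uleb128_len(max_offset)
}
```

For example, offsets into a 10 MB file need 4 bytes each (2^21 ≤ 10,000,000 < 2^28), so 100,000 references would cost up to about 400,000 bytes even if every referenced type had already been encoded once.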
I remember @saethlin doing some encoder debugging before. Got any ideas?
Well, I don't think my encoder debugging rig is useful here; that's for finding what data is at some offset in the file. But this doesn't look particularly complicated to understand by generating a file of …

In case it's not obvious, this sort of thing will be incredibly slow, and even just having the code compiled in might have prohibitive runtime overhead.
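One generic way to approach "what is taking up that extra space" is to attribute every byte the encoder writes to a category and compare the totals. The sketch below is a hypothetical wrapper (`AccountingEncoder`, `section`, and the labels are illustrative names); it is not the approach from the comment above, whose details are cut off, and it does not reflect rustc's real encoder API.

```rust
use std::collections::BTreeMap;

/// Hypothetical wrapper that charges every written byte to a label,
/// e.g. "mentioned_items" vs. "other", so per-category sizes can be compared.
struct AccountingEncoder {
    out: Vec<u8>,
    bytes_per_label: BTreeMap<&'static str, usize>,
}

impl AccountingEncoder {
    fn new() -> Self {
        AccountingEncoder { out: Vec::new(), bytes_per_label: BTreeMap::new() }
    }

    /// Runs `f`, which writes into the output buffer, and adds the number
    /// of bytes it wrote to the running total for `label`.
    /// Note: nesting `section` calls would count the inner bytes twice.
    fn section<F: FnOnce(&mut Vec<u8>)>(&mut self, label: &'static str, f: F) {
        let start = self.out.len();
        f(&mut self.out);
        *self.bytes_per_label.entry(label).or_insert(0) += self.out.len() - start;
    }
}
```

After encoding, printing `bytes_per_label` would show which category dominates the file size.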
#122568 was a significant size regression for Rust metadata. The extra information now stored in MIR bodies is a bunch of `Ty<'tcx>`, so in memory I think this is actually not that much. But in our metadata format it seems to take up quite a bit of space. This not only makes the library files bigger; it also accounts for a large fraction (I think even the majority) of the compile-time regression from that PR.

I don't know if there's something that can be done to improve this, either by storing different information in `mentioned_items` that needs less space on disk, or by representing `Ty<'tcx>` more efficiently on disk. One drastic option would be to "intern" types in the on-disk format, i.e. have one global table of types that everything else just indexes into. That would certainly save space when the same type appears multiple times, though I don't know whether that is what happens here.
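The "global type table" option could look roughly like this: each distinct type is stored once in a table, and every occurrence elsewhere becomes a small integer index. This is a minimal sketch under stated assumptions: `TypeTable`, `intern`, and `encode_indices` are hypothetical names, strings stand in for `Ty<'tcx>`, and a real on-disk format would still need a compact encoding for the indices themselves.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical on-disk type table: each distinct type appears once,
/// and everything else refers to it by index.
struct TypeTable<T> {
    types: Vec<T>,
    index_of: HashMap<T, u32>,
}

impl<T: Clone + Eq + Hash> TypeTable<T> {
    fn new() -> Self {
        TypeTable { types: Vec::new(), index_of: HashMap::new() }
    }

    /// Returns the index for `ty`, appending it to the table on first use.
    fn intern(&mut self, ty: &T) -> u32 {
        if let Some(&idx) = self.index_of.get(ty) {
            return idx;
        }
        let idx = self.types.len() as u32;
        self.types.push(ty.clone());
        self.index_of.insert(ty.clone(), idx);
        idx
    }
}

/// Encodes one list of types (e.g. one body's mentioned items) as table indices.
fn encode_indices<T: Clone + Eq + Hash>(table: &mut TypeTable<T>, tys: &[T]) -> Vec<u32> {
    tys.iter().map(|ty| table.intern(ty)).collect()
}
```

The table only pays off when the same types recur across many bodies; if most mentioned types are unique, it mainly adds an indirection.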