Types and specifically "mentioned items" take up a lot of space in Rust metadata files #122936

RalfJung · 2024-03-23T11:09:49Z

#122568 was a significant size regression for Rust metadata. The extra information that is stored in MIR bodies now is a bunch of Ty<'tcx>, so in memory I think this is actually not that much. But it seems like in our metadata format this takes up quite a bit of space. This not only makes the library files bigger, it also accounts for a large fraction (I think even the majority) of the compile-time regression from that PR.

I don't know if there's something that can be done to improve this -- either by storing different information in mentioned_items that needs less space on disk, or by representing Ty<'tcx> more efficiently on disk. One drastic option would be to "intern" types in the on-disk format, i.e. have one global table of types that everything else just indexes into. That would certainly save space when the same type appears multiple times. I don't know if that is what happens here though.

oli-obk · 2024-03-23T19:00:51Z

> One drastic option would be to "intern" types in the on-disk format, i.e. have one global table of types that everything else just indexes into. That would certainly save space when the same type appears multiple times. I don't know if that is what happens here though.

I thought that's what we already did. Anything that was already encoded will just get encoded as an offset to where the actual value was encoded before.
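To make the shorthand idea concrete, here is a minimal sketch of an encoder that writes a value in full the first time and a back-reference to its offset afterwards. It is not the rustc implementation: `ToyTy`, `ToyEncoder`, the tag bytes, and the fixed 5-byte shorthand are all hypothetical choices made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a type; the real compiler encodes `Ty<'tcx>`.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
enum ToyTy {
    I32,
    Bool,
    Ref(Box<ToyTy>),
    Tuple(Vec<ToyTy>),
}

// Made-up tag bytes for this toy format.
const TAG_I32: u8 = 0;
const TAG_BOOL: u8 = 1;
const TAG_REF: u8 = 2;
const TAG_TUPLE: u8 = 3;
const TAG_SHORTHAND: u8 = 0xFF;
/// One tag byte plus a 4-byte little-endian offset.
const SHORTHAND_LEN: usize = 5;

struct ToyEncoder {
    buf: Vec<u8>,
    /// Maps an already-encoded type to the offset of its full encoding.
    shorthands: HashMap<ToyTy, u32>,
}

impl ToyEncoder {
    fn new() -> Self {
        ToyEncoder { buf: Vec::new(), shorthands: HashMap::new() }
    }

    fn encode_ty(&mut self, ty: &ToyTy) {
        // Seen before: emit a back-reference instead of the full encoding.
        if let Some(&pos) = self.shorthands.get(ty) {
            self.buf.push(TAG_SHORTHAND);
            self.buf.extend_from_slice(&pos.to_le_bytes());
            return;
        }
        let start = self.buf.len();
        match ty {
            ToyTy::I32 => self.buf.push(TAG_I32),
            ToyTy::Bool => self.buf.push(TAG_BOOL),
            ToyTy::Ref(inner) => {
                self.buf.push(TAG_REF);
                self.encode_ty(inner);
            }
            ToyTy::Tuple(elems) => {
                self.buf.push(TAG_TUPLE);
                self.buf.push(elems.len() as u8); // toy format: at most 255 elements
                for elem in elems {
                    self.encode_ty(elem);
                }
            }
        }
        // Cache only when a back-reference would be shorter than the full encoding.
        if self.buf.len() - start > SHORTHAND_LEN {
            self.shorthands.insert(ty.clone(), start as u32);
        }
    }
}
```

In this toy format a type that encodes to 10 bytes costs 10 bytes the first time and 5 bytes on each repeat, so a cache like this bounds the cost per mention but does not make mentions free. That is consistent with many small entries still adding up across many bodies.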

RalfJung · 2024-03-23T19:39:54Z

Hm. Then either these lists are a lot bigger than I expected or it's not working somehow?

RalfJung · 2024-04-24T20:52:49Z

Yeah it certainly looks like there's a cache that avoids repeatedly encoding the same type:

rust/compiler/rustc_middle/src/ty/codec.rs

Lines 112 to 116 in 2f2350e

    impl<'tcx, E: TyEncoder<I = TyCtxt<'tcx>>> Encodable<E> for Ty<'tcx> {
        fn encode(&self, e: &mut E) {
            encode_with_shorthand(e, self, TyEncoder::type_shorthands);
        }
    }

In that case, I have no idea why the size regressed here. Is there any way to figure out what is taking up that extra size?

oli-obk · 2024-04-24T20:58:14Z

Lots of bodies with lots of mentioned items just adding up? Even if the items themselves are fairly small.

oli-obk · 2024-04-24T20:59:21Z

I remember @saethlin doing some encoder debugging before. Got any ideas?

saethlin · 2024-04-25T00:01:18Z

Well I don't think that my encoder debugging rig is useful here; that's for finding what data is at some offset in the file.

But this doesn't look particularly complicated to understand by generating a file in inferno's folded stacks format: https://crates.io/crates/inferno. I've analyzed memory consumption of programs by turning strace -k -e mmap dumps into folded stacks with this: https://github.com/saethlin/strace-flamegraph. For this case I'd use backtrace to collect a backtrace from all the primitive write operations in FileEncoder and print one line of output per write: the backtrace, semicolon-delimited, then a space and the number of bytes written. Pipe that into inferno-flamegraph and you should get a flamegraph of what is using up file size in metadata encoding. If you have two files, you can use inferno-diff-folded to get a diff-style flamegraph between the two.

In case it's not obvious, this sort of thing will be incredibly slow, and even just having the code compiled in might have prohibitive runtime overhead.
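A rough sketch of the output side of that approach: accumulate bytes written per call stack and render one folded-stack line per distinct stack. Capturing a real backtrace inside FileEncoder is left out; `FoldedStacks` is a hypothetical helper, and the frame labels are passed in by the caller, outermost frame first.

```rust
use std::collections::BTreeMap;
use std::fmt::Write as _;

/// Hypothetical helper: sums bytes written per call stack and renders
/// lines of the form `frame;frame;frame <bytes>`.
#[derive(Default)]
struct FoldedStacks {
    bytes_by_stack: BTreeMap<String, u64>,
}

impl FoldedStacks {
    /// Record `bytes` written while `frames` (outermost first) was the call stack.
    fn record(&mut self, frames: &[&str], bytes: u64) {
        // `;` separates frames in the folded format, so replace it inside labels.
        let key = frames
            .iter()
            .map(|frame| frame.replace(';', ":"))
            .collect::<Vec<_>>()
            .join(";");
        *self.bytes_by_stack.entry(key).or_insert(0) += bytes;
    }

    /// One line per distinct stack, sorted by stack string.
    fn render(&self) -> String {
        let mut out = String::new();
        for (stack, bytes) in &self.bytes_by_stack {
            // Writing to a String cannot fail, so the result is ignored.
            let _ = writeln!(out, "{stack} {bytes}");
        }
        out
    }
}
```

Recording two writes of 4 and 6 bytes under the same stack yields a single line ending in `10`. The rendered text is what would be piped into inferno-flamegraph, and two such files from different compiler builds are what inferno-diff-folded would compare.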

rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Mar 23, 2024

RalfJung mentioned this issue Mar 23, 2024

recursively evaluate the constants in everything that is 'mentioned' #122568

Merged

RalfJung changed the title ~~Types and specificaly "mentioned items" take up a lot of space in Rust metadata files~~ Types and specifically "mentioned items" take up a lot of space in Rust metadata files Mar 23, 2024

jieyouxu added A-metadata Area: Crate metadata T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 23, 2024

fmease added A-const-eval Area: Constant evaluation (MIR interpretation) I-heavy Issue: Problems and improvements with respect to binary size of generated code. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Mar 23, 2024
