
incr.comp.: Don't encode Fingerprint values with leb128. #45875

Closed
michaelwoerister opened this issue Nov 8, 2017 · 18 comments
Labels
A-incr-comp Area: Incremental compilation C-enhancement Category: An issue proposing an enhancement or a PR with one. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. WG-incr-comp Working group: Incremental compilation

Comments

@michaelwoerister (Member)

Since Fingerprint values have roughly random distribution, most of them will not profit from being stored in a variable-length encoding:

  • encoding to leb128 takes additional time, and
  • in the common case, the leb128 representation of the two 64 bit numbers in a Fingerprint will take up around 160 bits, so we are even wasting space.

We should not do that. UseSpecializedEncodable and UseSpecializedDecodable might be the way to circumvent the standard encoding methods of the opaque::Encoder.
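To see why a fixed-width encoding wins here, a minimal sketch of unsigned LEB128 length calculation (not the libserialize implementation, just an illustration of the format's cost model) shows what happens for values with high bits set:

```rust
// Illustrative sketch: how many bytes unsigned LEB128 needs for a u64.
// LEB128 stores 7 payload bits per byte, so small values are short but
// large values pay an overhead over a fixed 8-byte encoding.
fn leb128_len(mut value: u64) -> usize {
    let mut len = 0;
    loop {
        len += 1;
        value >>= 7;
        if value == 0 {
            return len;
        }
    }
}

fn main() {
    // Small values win: 300 fits in 2 bytes instead of 8.
    assert_eq!(leb128_len(300), 2);
    // But a value with its high bits set -- the common case for a
    // random 64-bit hash -- needs 10 bytes, 2 more than fixed-width.
    assert_eq!(leb128_len(u64::MAX), 10);
    // Two such values per Fingerprint: up to 160 bits instead of 128.
    println!("worst case per Fingerprint: {} bits", 2 * leb128_len(u64::MAX) * 8);
}
```

Since the two halves of a `Fingerprint` are hash values, they land near the worst case essentially always, which is where the "around 160 bits" figure above comes from.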

@michaelwoerister michaelwoerister added A-incr-comp Area: Incremental compilation C-enhancement Category: An issue proposing an enhancement or a PR with one. E-needs-mentor WG-incr-comp Working group: Incremental compilation labels Nov 8, 2017
@Xanewok (Member)

Xanewok commented Nov 9, 2017

I’d like to take that on, since it seems self-contained and at least I know the Fingerprint type :laugh:
Would you mind posting more specific mentoring instructions on how you would like to see it implemented?

@michaelwoerister (Member, Author)

@Xanewok Sure, let's see:

  • Our libserialize defines the UseSpecializedEncodable and UseSpecializedDecodable traits. If you implement one of these traits for a type, you can provide encoder- and decoder-specific implementations for encoding and decoding. This way you can have implementations that access things that are not part of the general Encoder and Decoder interface. When implementing the UseSpecializedXxx traits, you have to provide a fallback implementation that relies only on the regular Encoder and Decoder, like here:

    rust/src/libsyntax/ast.rs

    Lines 240 to 250 in d5ff0e6

    impl serialize::UseSpecializedEncodable for NodeId {
        fn default_encode<S: Encoder>(&self, s: &mut S) -> Result<(), S::Error> {
            s.emit_u32(self.0)
        }
    }

    impl serialize::UseSpecializedDecodable for NodeId {
        fn default_decode<D: Decoder>(d: &mut D) -> Result<NodeId, D::Error> {
            d.read_u32().map(NodeId)
        }
    }
  • We can use the above functionality to do two things:
    • Extend serialize::opaque::Encoder and serialize::opaque::Decoder with methods that allow saving raw bytes.
    • Provide serialization methods for Fingerprint that are specialized to serialize::opaque::Encoder and serialize::opaque::Decoder and use these new methods directly. An example of how such a specialized implementation can be provided can be found here:
      impl<'a, 'tcx> SpecializedDecoder<CrateNum> for DecodeContext<'a, 'tcx> {
          fn specialized_decode(&mut self) -> Result<CrateNum, Self::Error> {
              let cnum = CrateNum::from_u32(u32::decode(self)?);
              if cnum == LOCAL_CRATE {
                  Ok(self.cdata().cnum)
              } else {
                  Ok(self.cdata().cnum_map.borrow()[cnum])
              }
          }
      }
      In this example, you would replace DecodeContext with serialize::opaque::Decoder.

Let me know if you need more info to get started.
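A toy model of the mechanism described above might look like this. The real traits live in libserialize and are wired up through specialization; the names `default_encode`/`specialized_encode` follow the real API, but everything else here is a simplified stand-in:

```rust
// Toy model: a type can be encoded via a generic fallback that only
// uses the Encoder trait, or via an encoder-specific implementation
// that reaches into internals the trait does not expose.
struct Fingerprint(u64, u64);

trait Encoder {
    fn emit_u64(&mut self, v: u64);
}

// Analogous to default_encode(): only the generic interface is available.
fn default_encode<E: Encoder>(fp: &Fingerprint, e: &mut E) {
    e.emit_u64(fp.0);
    e.emit_u64(fp.1);
}

// Stand-in for serialize::opaque::Encoder.
struct OpaqueEncoder {
    data: Vec<u8>,
}

impl Encoder for OpaqueEncoder {
    fn emit_u64(&mut self, v: u64) {
        // The real opaque encoder would leb128-encode here;
        // fixed-width for brevity.
        self.data.extend_from_slice(&v.to_le_bytes());
    }
}

// Analogous to specialized_encode(): free to access `data` directly
// and pick a representation the generic interface can't express.
fn specialized_encode(fp: &Fingerprint, e: &mut OpaqueEncoder) {
    e.data.extend_from_slice(&fp.0.to_le_bytes());
    e.data.extend_from_slice(&fp.1.to_le_bytes());
}

fn main() {
    let mut via_generic = OpaqueEncoder { data: Vec::new() };
    default_encode(&Fingerprint(1, 2), &mut via_generic);

    let mut via_specialized = OpaqueEncoder { data: Vec::new() };
    specialized_encode(&Fingerprint(1, 2), &mut via_specialized);

    assert_eq!(via_generic.data.len(), 16);
    assert_eq!(via_specialized.data.len(), 16);
}
```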

@michaelwoerister (Member, Author)

It would also be great to get some numbers on how this affects the size of the dep-graph.bin file in the incr. comp. session directory.

@michaelwoerister michaelwoerister added E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. and removed E-needs-mentor labels Nov 10, 2017
@wesleywiser (Member)

Hi @Xanewok, are you still working on this?

@Xanewok (Member)

Xanewok commented Dec 7, 2017

Sorry, couldn't find more time to work on this. I'm afraid it's not going to change until the holidays, so @wesleywiser it's all yours if you want to work on this!

@michaelwoerister (Member, Author)

@wesleywiser, how is it going? Did you make any progress?

@wesleywiser (Member)

@michaelwoerister I made some progress over the weekend. I've added methods to the Encoder and Decoder types for manipulating raw bytes, using the same technique as the code that handles str values (emit a usize length prefix, then the raw bytes).

This got me thinking: using these functions for the Fingerprint type will take one usize plus a [u8; 16], consuming 192 bits on a 64-bit platform. Was there a reason we wanted to do the raw bytes approach instead of just calling {emit,read}_u64 twice? It looks like there is also a u128 variant we could use if we changed Fingerprint to be a tuple struct of one u128 instead of two u64 values (or, I suppose, we could just combine them during serialization). Thoughts?

@michaelwoerister (Member, Author)

This got me thinking: using these functions for the Fingerprint type will take one usize plus a [u8; 16], consuming 192 bits on a 64-bit platform.

For these methods we assume that we already know how many bytes we want to read or write, so you don't have to encode the length. I suggest making the parameter for the write method &[u8] and the parameter for the read method &mut [u8]. In both cases the length of the slice tells you how many bytes you are dealing with. Both methods would return a Result<(), _> and for the read method we assume that the requested bytes are in the buffer if the method returned Ok(()).
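The read-side contract can be sketched like this (a hypothetical stand-in for serialize::opaque::Decoder, with an explicit bounds check added to honor the Result):

```rust
// Sketch of the contract described above: the slice length tells the
// decoder how many bytes to copy, no length prefix is stored, and
// Ok(()) guarantees the buffer was filled. The `Decoder` type and its
// fields are stand-ins, not the real libserialize definitions.
struct Decoder {
    data: Vec<u8>,
    position: usize,
}

impl Decoder {
    fn read_raw_bytes(&mut self, s: &mut [u8]) -> Result<(), String> {
        let start = self.position;
        let end = start + s.len();
        if end > self.data.len() {
            return Err(format!("unexpected end of data at byte {}", end));
        }
        s.copy_from_slice(&self.data[start..end]);
        self.position = end;
        Ok(())
    }
}

fn main() {
    let mut dec = Decoder { data: vec![1, 2, 3, 4, 5], position: 1 };
    let mut buf = [0u8; 3];
    dec.read_raw_bytes(&mut buf).unwrap();
    assert_eq!(buf, [2, 3, 4]); // read from `position`, not the start
    assert_eq!(dec.position, 4); // cursor advanced past the bytes read
    assert!(dec.read_raw_bytes(&mut [0u8; 9]).is_err()); // bounds-checked
}
```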

Was there a reason we wanted to do the raw bytes approach instead of just calling {emit,read}_u64 twice?

These methods will use the leb128 encoding, which is what we want to avoid (because we know that fingerprints are random values and thus won't profit from a variable-length encoding).

@wesleywiser (Member)

Thanks @michaelwoerister, that's really helpful! I've implemented UseSpecialized{Encodable,Decodable} and Specialized{Encoder,Decoder} but it doesn't seem to be working. I've stubbed out the function bodies with panic!()s, but running the resulting compiler doesn't panic. Any ideas?

(Code is here and I'm invoking the compiler with CARGO_INCREMENTAL=1 cargo build)

@michaelwoerister (Member, Author)

@wesleywiser, try removing RustcEncodable, RustcDecodable from the #[derive(...)] attribute on Fingerprint. They might somehow shadow the specialized implementations. I think I ran into this myself a few weeks ago. Very annoying :)

@wesleywiser (Member)

Thanks @michaelwoerister, that did it.

@wesleywiser (Member)

@michaelwoerister I'm getting test failures from the incremental tests with my changes. From what I can tell, the new encode functions are never being called during the tests, only the decode functions are. Since the encode functions aren't being called, the decode is failing since the serialized format has changed. Is there additional test data somewhere I need to update or am I missing something?

(My code is here FYI)

@michaelwoerister (Member, Author)

I spot a bug in the decoding function:

    pub fn read_raw_bytes(&mut self, s: &mut [u8]) -> Result<(), String> {
        let len = s.len();

        self.position += len;

        s.copy_from_slice(&self.data[0..len]); 
        // ^^^ always reads from the beginning, not from `position`

        Ok(())
    }

Should be something like:

    pub fn read_raw_bytes(&mut self, s: &mut [u8]) -> Result<(), String> {
        let start = self.position;
        let end = self.position + s.len();
        s.copy_from_slice(&self.data[start .. end]); 
        self.position = end;
        Ok(())
    }

From what I can tell, the new encode functions are never being called during the tests, only the decode functions are.

Are you sure about that? The setup looks correct to me.

@wesleywiser (Member)

Ah, yes, you're right! Good catch.

Are you sure about that? The setup looks correct to me.

I'm fairly sure of that because I added some logging statements into those functions and then captured the test output. I can see no instances of the encode functions being called based on that. I'll fix the bug you found and re-run the tests though and see if this changes anything.

@wesleywiser (Member)

Ok. It's still failing with the same issue. I've pushed a commit that shows the logging I added to the encoder functions. From the failing test output, I can see that nothing is being logged to stderr.

@michaelwoerister (Member, Author)

I'll look into it.

@michaelwoerister (Member, Author)

OK, I see what the problem is: We have two kinds of encoders that have to handle fingerprints, the one for writing crate metadata and the one for writing the incr. comp. on-disk-cache. Both of these use an opaque::Encoder internally, so I thought we could handle both cases at once by providing a specialized implementation for that. However, this isn't how the UseSpecializedEncodable infrastructure works. If a type is marked as UseSpecializedEncodable then it will use the specialization for exactly the encoder given. So if we try to encode crate metadata, it will use the specialization for rustc_metadata::encoder::EncodeContext. But since we don't have a specialized implementation for that, it will fall back to the default_encode() method, which uses plain emit_u64() calls again and our specialization for opaque::Encoder is never hit.

In order to avoid this problem, we have to provide specializations for rustc_metadata::encoder::EncodeContext, rustc_metadata::decoder::DecodeContext, rustc::ty::maps::on_disk_cache::CacheEncoder, and rustc::ty::maps::on_disk_cache::CacheDecoder. So, I would write helper methods Fingerprint::encode_opaque(&self, encoder: &mut opaque::Encoder) -> Result<(), _> and Fingerprint::decode_opaque(decoder: &mut opaque::Decoder) -> Result<Fingerprint, _> in fingerprint.rs and then call these from the specializations in librustc_metadata/encoder.rs, librustc_metadata/decoder.rs, and librustc/ty/maps/on_disk_cache.rs.

In order to make sure that we don't accidentally hit the default_encode/default_decode methods, I would just not implement them. (We already follow this strategy for many other things like DefIndex).
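The helper-method strategy might be sketched as follows, with plain buffers standing in for serialize::opaque::{Encoder, Decoder}. The helper names follow the suggestion above; everything else is an assumption:

```rust
// Sketch: one pair of opaque helpers on Fingerprint, called from each
// encoder/decoder specialization, so the fixed 16-byte layout is
// defined in exactly one place.
#[derive(Debug, PartialEq)]
struct Fingerprint(u64, u64);

impl Fingerprint {
    // Called from every encoder specialization: 16 fixed bytes, no leb128.
    fn encode_opaque(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.0.to_le_bytes());
        out.extend_from_slice(&self.1.to_le_bytes());
    }

    // Called from every decoder specialization; advances the cursor.
    fn decode_opaque(data: &[u8], position: &mut usize) -> Result<Fingerprint, String> {
        let end = *position + 16;
        if end > data.len() {
            return Err(String::from("not enough bytes for a Fingerprint"));
        }
        let mut lo = [0u8; 8];
        let mut hi = [0u8; 8];
        lo.copy_from_slice(&data[*position..*position + 8]);
        hi.copy_from_slice(&data[*position + 8..end]);
        *position = end;
        Ok(Fingerprint(u64::from_le_bytes(lo), u64::from_le_bytes(hi)))
    }
}

fn main() {
    let fp = Fingerprint(42, u64::MAX);
    let mut buf = Vec::new();
    fp.encode_opaque(&mut buf);
    assert_eq!(buf.len(), 16); // always 128 bits on disk

    let mut pos = 0;
    let decoded = Fingerprint::decode_opaque(&buf, &mut pos).unwrap();
    assert_eq!(decoded, fp); // round-trips exactly
}
```

Leaving default_encode/default_decode unimplemented then turns any accidental fall-through to the generic path into a compile-time or loud runtime failure instead of a silently different on-disk format.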

wesleywiser added a commit to wesleywiser/rust that referenced this issue Jan 6, 2018
@wesleywiser (Member)

Thanks! That did the trick!

wesleywiser added a commit to wesleywiser/rust that referenced this issue Jan 10, 2018
This saves about 32 bits of storage per `Fingerprint`.
On average, this reduces the size of the `/target/{mode}/incremental`
folder by roughly 5%.

Fixes rust-lang#45875
bors added a commit that referenced this issue Jan 11, 2018

[incremental] Specialize encoding and decoding of Fingerprints

This saves about 32 bits of storage per `Fingerprint`.
On average, this reduces the size of the `/target/{mode}/incremental`
folder by roughly 5%. [Full details here](https://gist.github.com/wesleywiser/264076314794fbd6a4c110d7c1adc43e).

Fixes #45875

r? @michaelwoerister