
incr.comp.: Don't encode Fingerprint values with leb128. #45875

Closed
michaelwoerister opened this issue Nov 8, 2017 · 18 comments
Labels
A-incr-comp Area: Incremental compilation C-enhancement Category: An issue proposing an enhancement or a PR with one. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. WG-incr-comp Working group: Incremental compilation

Comments

@michaelwoerister (Member)

Since Fingerprint values have roughly random distribution, most of them will not profit from being stored in a variable-length encoding:

  • encoding to leb128 takes additional time, and
  • in the common case, the leb128 representation of the two 64 bit numbers in a Fingerprint will take up around 160 bits, so we are even wasting space.

We should not do that. UseSpecializedEncodable and UseSpecializedDecodable might be the way to circumvent the standard encoding methods of the opaque::Encoder.
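To see why a fixed-width encoding wins here, a minimal sketch of unsigned LEB128 length calculation (not the libserialize implementation, just an illustration of the format's cost model) shows what happens for values with high bits set:

```rust
// Illustrative sketch: how many bytes unsigned LEB128 needs for a u64.
// LEB128 stores 7 payload bits per byte, so small values are short but
// large values pay an overhead over a fixed 8-byte encoding.
fn leb128_len(mut value: u64) -> usize {
    let mut len = 0;
    loop {
        len += 1;
        value >>= 7;
        if value == 0 {
            return len;
        }
    }
}

fn main() {
    // Small values win: 300 fits in 2 bytes instead of 8.
    assert_eq!(leb128_len(300), 2);
    // But a value with its high bits set -- the common case for a
    // random 64-bit hash -- needs 10 bytes, 2 more than fixed-width.
    assert_eq!(leb128_len(u64::MAX), 10);
    // Two such values per Fingerprint: up to 160 bits instead of 128.
    println!("worst case per Fingerprint: {} bits", 2 * leb128_len(u64::MAX) * 8);
}
```

Since the two halves of a `Fingerprint` are hash values, they land near the worst case essentially always, which is where the "around 160 bits" figure above comes from.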

@michaelwoerister michaelwoerister added A-incr-comp Area: Incremental compilation C-enhancement Category: An issue proposing an enhancement or a PR with one. E-needs-mentor WG-incr-comp Working group: Incremental compilation labels Nov 8, 2017
@Xanewok (Member)

Xanewok commented Nov 9, 2017

I’d like to take that on, since it seems self-contained and at least I know the Fingerprint type :laugh:
Would you mind posting more specific mentoring instructions on how you would like to see it implemented?

@michaelwoerister (Member, Author)

@Xanewok Sure, let's see:

  • Our libserialize defines the UseSpecializedEncodable and UseSpecializedDecodable traits. If you implement one of these traits for a type, you can provide encoder- and decoder-specific implementations for encoding and decoding. This way you can have implementations that access things that are not part of the general Encoder and Decoder interface. When implementing the UseSpecializedXxx traits, you have to provide a fallback implementation that relies only on the regular Encoder and Decoder, like here:

    rust/src/libsyntax/ast.rs

    Lines 240 to 250 in d5ff0e6

    impl serialize::UseSpecializedEncodable for NodeId {
        fn default_encode<S: Encoder>(&self, s: &mut S) -> Result<(), S::Error> {
            s.emit_u32(self.0)
        }
    }

    impl serialize::UseSpecializedDecodable for NodeId {
        fn default_decode<D: Decoder>(d: &mut D) -> Result<NodeId, D::Error> {
            d.read_u32().map(NodeId)
        }
    }
  • We can use the above functionality to do two things:
    • Extend serialize::opaque::Encoder and serialize::opaque::Decoder with methods that allow saving raw bytes.
    • Provide serialization methods for Fingerprint that are specialized to serialize::opaque::Encoder and serialize::opaque::Decoder and use these new methods directly. An example of how such a specialized implementation can be provided can be found here:
      impl<'a, 'tcx> SpecializedDecoder<CrateNum> for DecodeContext<'a, 'tcx> {
          fn specialized_decode(&mut self) -> Result<CrateNum, Self::Error> {
              let cnum = CrateNum::from_u32(u32::decode(self)?);
              if cnum == LOCAL_CRATE {
                  Ok(self.cdata().cnum)
              } else {
                  Ok(self.cdata().cnum_map.borrow()[cnum])
              }
          }
      }
      In this example, you would replace DecodeContext with serialize::opaque::Decoder.

Let me know if you need more info to get started.
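A toy model of the mechanism described above might look like this. The real traits live in libserialize and are wired up through specialization; the names `default_encode`/`specialized_encode` follow the real API, but everything else here is a simplified stand-in:

```rust
// Toy model: a type can be encoded via a generic fallback that only
// uses the Encoder trait, or via an encoder-specific implementation
// that reaches into internals the trait does not expose.
struct Fingerprint(u64, u64);

trait Encoder {
    fn emit_u64(&mut self, v: u64);
}

// Analogous to default_encode(): only the generic interface is available.
fn default_encode<E: Encoder>(fp: &Fingerprint, e: &mut E) {
    e.emit_u64(fp.0);
    e.emit_u64(fp.1);
}

// Stand-in for serialize::opaque::Encoder.
struct OpaqueEncoder {
    data: Vec<u8>,
}

impl Encoder for OpaqueEncoder {
    fn emit_u64(&mut self, v: u64) {
        // The real opaque encoder would leb128-encode here;
        // fixed-width for brevity.
        self.data.extend_from_slice(&v.to_le_bytes());
    }
}

// Analogous to specialized_encode(): free to access `data` directly
// and pick a representation the generic interface can't express.
fn specialized_encode(fp: &Fingerprint, e: &mut OpaqueEncoder) {
    e.data.extend_from_slice(&fp.0.to_le_bytes());
    e.data.extend_from_slice(&fp.1.to_le_bytes());
}

fn main() {
    let mut via_generic = OpaqueEncoder { data: Vec::new() };
    default_encode(&Fingerprint(1, 2), &mut via_generic);

    let mut via_specialized = OpaqueEncoder { data: Vec::new() };
    specialized_encode(&Fingerprint(1, 2), &mut via_specialized);

    assert_eq!(via_generic.data.len(), 16);
    assert_eq!(via_specialized.data.len(), 16);
}
```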

@michaelwoerister (Member, Author)

It would also be great to get some numbers on how this affects the size of the dep-graph.bin file in the incr. comp. session directory.

@michaelwoerister michaelwoerister added E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. and removed E-needs-mentor labels Nov 10, 2017
@wesleywiser (Member)

Hi @Xanewok, are you still working on this?

@Xanewok (Member)

Xanewok commented Dec 7, 2017

Sorry, couldn't find more time to work on this. I'm afraid it's not going to change until the holidays, so @wesleywiser it's all yours if you want to work on this!

@michaelwoerister (Member, Author)

@wesleywiser, how is it going? Did you make any progress?

@wesleywiser (Member)

@michaelwoerister I made some progress over the weekend. I've added methods to the Encoder and Decoder types for manipulating raw bytes, using the same technique as the code that handles str values (emit a usize length prefix, then the raw bytes).

This got me thinking: using these functions for the Fingerprint type will take one usize plus a [u8; 16], consuming 192 bits on a 64-bit platform. Was there a reason we wanted to do the raw bytes approach instead of just calling {emit,read}_u64 twice? It looks like there is also a u128 variant we could use if we changed Fingerprint to be a tuple struct of one u128 instead of two u64 values (or, I suppose, we could just combine them during serialization). Thoughts?

@michaelwoerister (Member, Author)

This got me thinking: using these functions for the Fingerprint type will take one usize plus a [u8; 16], consuming 192 bits on a 64-bit platform.

For these methods we assume that we already know how many bytes we want to read or write, so you don't have to encode the length. I suggest making the parameter for the write method &[u8] and the parameter for the read method &mut [u8]. In both cases the length of the slice tells you how many bytes you are dealing with. Both methods would return a Result<(), _> and for the read method we assume that the requested bytes are in the buffer if the method returned Ok(()).
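The read-side contract can be sketched like this (a hypothetical stand-in for serialize::opaque::Decoder, with an explicit bounds check added to honor the Result):

```rust
// Sketch of the contract described above: the slice length tells the
// decoder how many bytes to copy, no length prefix is stored, and
// Ok(()) guarantees the buffer was filled. The `Decoder` type and its
// fields are stand-ins, not the real libserialize definitions.
struct Decoder {
    data: Vec<u8>,
    position: usize,
}

impl Decoder {
    fn read_raw_bytes(&mut self, s: &mut [u8]) -> Result<(), String> {
        let start = self.position;
        let end = start + s.len();
        if end > self.data.len() {
            return Err(format!("unexpected end of data at byte {}", end));
        }
        s.copy_from_slice(&self.data[start..end]);
        self.position = end;
        Ok(())
    }
}

fn main() {
    let mut dec = Decoder { data: vec![1, 2, 3, 4, 5], position: 1 };
    let mut buf = [0u8; 3];
    dec.read_raw_bytes(&mut buf).unwrap();
    assert_eq!(buf, [2, 3, 4]); // read from `position`, not the start
    assert_eq!(dec.position, 4); // cursor advanced past the bytes read
    assert!(dec.read_raw_bytes(&mut [0u8; 9]).is_err()); // bounds-checked
}
```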

Was there a reason we wanted to do the raw bytes approach instead of just calling {emit,read}_u64 twice?

These methods will use the leb128 encoding, which is what we want to avoid (because we know that fingerprints are random values and thus won't profit from a variable-length encoding).

@wesleywiser (Member)

Thanks @michaelwoerister, that's really helpful! I've implemented UseSpecialized{Encodable,Decodable} and Specialized{Encoder,Decoder} but it doesn't seem to be working. I've stubbed out the function bodies with panic!()s, but running the resulting compiler doesn't panic. Any ideas?

(Code is here and I'm invoking the compiler with CARGO_INCREMENTAL=1 cargo build)

@michaelwoerister (Member, Author)

@wesleywiser, try removing RustcEncodable, RustcDecodable from the #[derive(...)] attribute on Fingerprint. They might somehow shadow the specialized implementations. I think I ran into this myself a few weeks ago. Very annoying :)

@wesleywiser (Member)

Thanks @michaelwoerister, that did it.

@wesleywiser (Member)

@michaelwoerister I'm getting test failures from the incremental tests with my changes. From what I can tell, the new encode functions are never being called during the tests, only the decode functions are. Since the encode functions aren't being called, the decode is failing since the serialized format has changed. Is there additional test data somewhere I need to update or am I missing something?

(My code is here FYI)

@michaelwoerister (Member, Author)

I spot a bug in the decoding function:

    pub fn read_raw_bytes(&mut self, s: &mut [u8]) -> Result<(), String> {
        let len = s.len();

        self.position += len;

        s.copy_from_slice(&self.data[0..len]); 
        // ^^^ always reads from the beginning, not from `position`

        Ok(())
    }

Should be something like:

    pub fn read_raw_bytes(&mut self, s: &mut [u8]) -> Result<(), String> {
        let start = self.position;
        let end = self.position + s.len();
        s.copy_from_slice(&self.data[start .. end]); 
        self.position = end;
        Ok(())
    }

From what I can tell, the new encode functions are never being called during the tests, only the decode functions are.

Are you sure about that? The setup looks correct to me.

@wesleywiser (Member)

Ah, yes, you're right! Good catch.

Are you sure about that? The setup looks correct to me.

I'm fairly sure of that because I added some logging statements into those functions and then captured the test output. I can see no instances of the encode functions being called based on that. I'll fix the bug you found and re-run the tests though and see if this changes anything.

@wesleywiser (Member)

Ok. It's still failing with the same issue. I've pushed a commit that shows the logging I added to the encoder functions. From the failing test output, I can see that nothing is being logged to stderr.

@michaelwoerister (Member, Author)

I'll look into it.

@michaelwoerister (Member, Author)

OK, I see what the problem is: We have two kinds of encoders that have to handle fingerprints, the one for writing crate metadata and the one for writing the incr. comp. on-disk-cache. Both of these use an opaque::Encoder internally, so I thought we could handle both cases at once by providing a specialized implementation for that. However, this isn't how the UseSpecializedEncodable infrastructure works. If a type is marked as UseSpecializedEncodable then it will use the specialization for exactly the encoder given. So if we try to encode crate metadata, it will use the specialization for rustc_metadata::encoder::EncodeContext. But since we don't have a specialized implementation for that, it will fall back to the default_encode() method, which uses plain emit_u64() calls again and our specialization for opaque::Encoder is never hit.

In order to avoid this problem, we have to provide specializations for rustc_metadata::encoder::EncodeContext, rustc_metadata::decoder::DecodeContext, rustc::ty::maps::on_disk_cache::CacheEncoder, and rustc::ty::maps::on_disk_cache::CacheDecoder. So, I would write helper methods Fingerprint::encode_opaque(&self, encoder: &mut opaque::Encoder) -> Result<(), _> and Fingerprint::decode_opaque(decoder: &mut opaque::Decoder) -> Result<Fingerprint, _> in fingerprint.rs and then call these from the specializations in librustc_metadata/encoder.rs, librustc_metadata/decoder.rs, and librustc/ty/maps/on_disk_cache.rs.

In order to make sure that we don't accidentally hit the default_encode/default_decode methods, I would just not implement them. (We already follow this strategy for many other things like DefIndex).
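The helper-method strategy might be sketched as follows, with plain buffers standing in for serialize::opaque::{Encoder, Decoder}. The helper names follow the suggestion above; everything else is an assumption:

```rust
// Sketch: one pair of opaque helpers on Fingerprint, called from each
// encoder/decoder specialization, so the fixed 16-byte layout is
// defined in exactly one place.
#[derive(Debug, PartialEq)]
struct Fingerprint(u64, u64);

impl Fingerprint {
    // Called from every encoder specialization: 16 fixed bytes, no leb128.
    fn encode_opaque(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.0.to_le_bytes());
        out.extend_from_slice(&self.1.to_le_bytes());
    }

    // Called from every decoder specialization; advances the cursor.
    fn decode_opaque(data: &[u8], position: &mut usize) -> Result<Fingerprint, String> {
        let end = *position + 16;
        if end > data.len() {
            return Err(String::from("not enough bytes for a Fingerprint"));
        }
        let mut lo = [0u8; 8];
        let mut hi = [0u8; 8];
        lo.copy_from_slice(&data[*position..*position + 8]);
        hi.copy_from_slice(&data[*position + 8..end]);
        *position = end;
        Ok(Fingerprint(u64::from_le_bytes(lo), u64::from_le_bytes(hi)))
    }
}

fn main() {
    let fp = Fingerprint(42, u64::MAX);
    let mut buf = Vec::new();
    fp.encode_opaque(&mut buf);
    assert_eq!(buf.len(), 16); // always 128 bits on disk

    let mut pos = 0;
    let decoded = Fingerprint::decode_opaque(&buf, &mut pos).unwrap();
    assert_eq!(decoded, fp); // round-trips exactly
}
```

Leaving default_encode/default_decode unimplemented then turns any accidental fall-through to the generic path into a compile-time or loud runtime failure instead of a silently different on-disk format.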

wesleywiser added a commit to wesleywiser/rust that referenced this issue Jan 6, 2018
@wesleywiser (Member)

Thanks! That did the trick!

wesleywiser added a commit to wesleywiser/rust that referenced this issue Jan 10, 2018
This saves about 32 bits of storage per `Fingerprint`.
On average, this reduces the size of the `/target/{mode}/incremental`
folder by roughly 5%.

Fixes rust-lang#45875
bors added a commit that referenced this issue Jan 11, 2018

[incremental] Specialize encoding and decoding of Fingerprints

This saves about 32 bits of storage per `Fingerprint`.
On average, this reduces the size of the `/target/{mode}/incremental`
folder by roughly 5%. [Full details here](https://gist.github.com/wesleywiser/264076314794fbd6a4c110d7c1adc43e).

Fixes #45875

r? @michaelwoerister