Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redesign DeserializerAdapter trait to support deserializing using DeserializeSeed #313

Closed
wants to merge 22 commits into from

Conversation

phil-opp
Copy link
Contributor

@phil-opp phil-opp commented Dec 19, 2023

Since the CDR format is not self-describing, the target type needs to be known for deserialization. The current RustDDS implementation ensures this by requiring a serde::Deserialize implementation for all target types. Unfortunately this makes it impossible to deserialize when the type information is only available at runtime. For example, it is not possible to base the deserialization on type information parsed from ROS2 msg files at runtime.

This pull request solves this limitation by adding basic support for deserialization based on serde's DeserializeSeed trait, which allows to use additional state for deserialization.

The main changes are:

  • Add a new Decode trait for deserializing a byte slice with access to the RepresentationIdentifier. The trait is implemented for the following new types:
    • PlCdrDeserializer is an empty newtype that implements Decode for all types that implement PlCdrDeserialize.
    • CdrDeserializeDecoder is an empty newtype that implements Decode for all types that implement serde::Deserialize.
    • CdrDeserializeSeedDecoder wraps a DeserializeSeed implementation and implements Decode using the seed for deserialization.
  • Redesign the DeserializerAdapter trait
    • Add a from_bytes_with method that takes an additional decoder: impl Decode argument. The method has a default implementation that calls the decoder.
    • Add a new associated type named Deserialized, which describes the type that the decoder will deserialize to.
      • Most of the time, the type is equal to the generic type D.
      • Some adapters, mainly DAWrapper, need to do an additional transformation after deserializing. This transformation is possible by implementing a new transform_deserialized trait method. For example, DAWrapper wraps the deserialized value into a NoKeyWrapper.
    • Create a new DefaultDecoder sub-trait, which describes adapters that can be used without supplying a decoder (since they provide a default one).
      • Gate the from_bytes and from_vec_bytes methods on the DefaultDecoder trait and provide a default implementation using the *_with variants and the default decoder.
      • The DefaultDecoder trait is implemented:
        • for CDRDeserializerAdapter if D implements serde::Deserialize.
        • for PlCdrDeserializerAdapter if D implements PlCdrDeserialize.
  • Use the DefaultDecoder trait to support both stateful and stateless deserialization in most functions, without duplicating the code
  • Create new try_take_one_with and as_async_stream_with API methods to allow ros2-client and others to plug in custom stateful deserializers.

I tried my best to limit the scope of this PR as much as possible and to document all the changes. Unfortunately, I failed to make this a non-breaking change, mainly because of the new DeserializerAdapter design. I hope you still consider this PR, as this feature is critical for our use case in the dora project.

Please let me know if you have any questions or ideas for improvements!

(edit: The commit list got a bit messy, sorry. I can squash if you like.)

@phil-opp
Copy link
Contributor Author

With these changes, we can update ros2-client to provide take_seed and async_stream_seed functions on Subscription. I opened a draft PR for this at jhelovuo/ros2-client#20 to show the required ros2-client changes.

@phil-opp
Copy link
Contributor Author

I'm not sure why the "Code Coverage" CI job fails, but it appears to be broken on the master branch as well: https://github.com/jhelovuo/RustDDS/actions/workflows/coverage.yml . So I don't think that the failure is related to this PR.

@jhelovuo
Copy link
Owner

jhelovuo commented Dec 21, 2023

Hello,

Thank you for the interesting idea and PR. I was not aware of the DeserializeSeed trait before. I can see the use case here.

If the deserialization would be changed as you propose, how would the serialization go then?

And no need to worry about the code coverage CI job.

@phil-opp
Copy link
Contributor Author

Serialization doesn't need to be changed, since the serde::Serialize trait gives access to the runtime value that should be serialized. So we can use the type information parsed from ROS2 message files in a custom Serialize implementation.

For example:

pub struct TypedPythonValue<'a> {
    pub value: &'a pyo3::Any,              // given from Python
    pub type_info: &'a Ros2TypeInfo,       // parsed from ROS2 message files
}


impl serde::Serialize for TypedValue<'_> {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: serde::Serializer,
    {
        match &self.type_info.data_type {
            DataType::UInt8 => {
                // the ROS2 topic expects an `u8` value, so try to convert the Python value
                let value: u8 = self.value.extract().map_err(serde::ser::Error::custom)?;
                serializer.serialize_u8(value)
            }
            .... // handle other ROS2 data types
            _ => todo!(),
        }
    }
}

This approach doesn't work with the standard serde::Deserialize trait because it is completely stateless, i.e. there is no self argument. So for deserialization based on runtime type information, we need to use DeserializeSeed.

@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 52.17391% with 66 lines in your changes are missing coverage. Please review.

Project coverage is 73.26%. Comparing base (28ed26f) to head (4e8e37d).
Report is 130 commits behind head on master.

Files Patch % Lines
src/dds/no_key/simpledatareader.rs 0.00% 25 Missing ⚠️
src/dds/with_key/simpledatareader.rs 48.64% 19 Missing ⚠️
src/dds/adapters.rs 52.94% 16 Missing ⚠️
src/serialization/cdr_deserializer.rs 77.77% 6 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #313      +/-   ##
==========================================
- Coverage   73.53%   73.26%   -0.28%     
==========================================
  Files          94       95       +1     
  Lines       19845    20192     +347     
==========================================
+ Hits        14594    14793     +199     
- Misses       5251     5399     +148     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jhelovuo
Copy link
Owner

Thanks for keeping your PR up to date. I assume this means you are using it for yourself.

We are considering accepting this PR, even though it adds extra complication. We may have another use case where it will be useful. This is just to let you know that the PR is not dead.

@phil-opp
Copy link
Contributor Author

Thanks for the update! Yes, we're using this PR for the https://github.com/dora-rs/dora project. We need it because we precompile Python bindings on a different machine, which need to be able to deserialize ROS2 messages based on type information parsed at runtime.

@jhelovuo
Copy link
Owner

Looking at the central part of the new design, namely, new code to adapters.rs , terms "decode" and "transform" are used. Are these interchangeable? If yes, perhaps only one term should be used for clarity?

@jhelovuo
Copy link
Owner

Looking at the central part of the new design, namely, new code to adapters.rs , terms "decode" and "transform" are used. Are these interchangeable? If yes, perhaps only one term should be used for clarity?

This question makes no sense. Apparently, "decode" and "deserialize" are more like synonyms. So the pipeline is

byte slice → deserialize/decode → value of type Self::Deserialized → call
[Self::transform_deserialized] → value of type D

And it makes sense to use "decode", because the term "deserialize" is already taken.

Am I now on the correct track?

@jhelovuo
Copy link
Owner

Another question: What is the use case for transform_deserialized() ?
In this PR and the corresponding ros2-client PR it always seems to be the identity function.

@jhelovuo
Copy link
Owner

I just created a new branch deserialize-with-seed for developing this feature. Please redirect the PR there.

@jhelovuo
Copy link
Owner

Is there a specific reason in

fn from_bytes_with<S>(
      input_bytes: &[u8],
      encoding: RepresentationIdentifier,
      decoder: S,
    ) -> Result<D, S::Error>
    where
      S: Decode<Self::Decoded>,
    {
...
    }

for the decoder to be passed by value rather than reference? Correspondingly, Decode::decode_bytes() takes self rather than &self. (Or maybe &mut self?)

If using a reference is possible, then we could get rid of the Clone requirement in DefaultDecoder::Decoder. Requiring Clone is a bit awkward, because it would really do the cloning at every sample processed.

@phil-opp
Copy link
Contributor Author

So the pipeline is

byte slice → deserialize/decode → value of type Self::Deserialized → call [Self::transform_deserialized] → value of type D

Yes, exactly!

Another question: What is the use case for transform_deserialized() ?

There are some adapters that add a wrapper around the deserialized values. With the normal Deserialize trait, this could happen behind the scenes because the adapters could just call deserialize on the desired type. With DeserializeSeed, the adapters need to explicitly specify the type that the DeserializeSeed instance should create, which might be different than the output type D.

In this PR and the corresponding ros2-client PR it always seems to be the identity function.

Yes, it's almost always the identity function. The one exception is no_key::DAWrapper, which wraps a DeserializerAdapter<D>, but returns a DeserializerAdapter<NoKeyWrapper<D>>. So it's transform_deserialized call wraps the result in a new NoKeyWrapper instance:

NoKeyWrapper::<D> {
    d: DA::transform_deserialized(deserialized),
}

(This implementation is at src/dds/no_key/wrappers.rs#L94-L96.)

@phil-opp
Copy link
Contributor Author

I just created a new branch deserialize-with-seed for developing this feature. Please redirect the PR there.

I just tried, but it looks like all the commits of this PR are already part of the branch.

@phil-opp
Copy link
Contributor Author

Is there a specific reason in

fn from_bytes_with<S>(
      input_bytes: &[u8],
      encoding: RepresentationIdentifier,
      decoder: S,
    ) -> Result<D, S::Error>
    where
      S: Decode<Self::Decoded>,
    {
...
    }

for the decoder to be passed by value rather than reference? Correspondingly, Decode::decode_bytes() takes self rather than &self. (Or maybe &mut self?)

If using a reference is possible, then we could get rid of the Clone requirement in DefaultDecoder::Decoder. Requiring Clone is a bit awkward, because it would really do the cloning at every sample processed.

I just added that code last week to get things working again with the latest changes from master. I'm not satisfied with that part of the implementation either, so it would be great if we could find a better solution.

The reason for 4e8e37d was the loop introduced in 6908f40, which calls deserialize_with multiple times.

I agree that taking a (mutable?) reference to the decoder would be the cleaner solution and would be fine for our use case. Unfortunately, the DeserializeSeed::deserialize method takes self by value, so I'm not sure how we could avoid a Clone bound...

(Side note: If your DeserializeSeed type T only needs &self, you could implement DeserializeSeed for &T as well. Then the .clone() call would be just a pointer copy.)

@jhelovuo
Copy link
Owner

So the pipeline is
byte slice → deserialize/decode → value of type Self::Deserialized → call [Self::transform_deserialized] → value of type D

Yes, exactly!

Ok. Your technical implementation is brilliant. This is quite complicated abstract metaprogramming.

In order to make it more accessible to others, I decided to rename some things. The basic renaming idea is

deserialize = decode + transform.

In your original code, there was confusion between deserialize and decode, as these were used as synonyms. Now it should be more clear that the whole two-step process is "deserialize" and the first step is called "decode".

I am migrating this PR to PR #334 , so we can continue the work there.

@jhelovuo
Copy link
Owner

Is there a specific reason in

...code...

for the decoder to be passed by value rather than reference? Correspondingly, Decode::decode_bytes() takes self rather than &self. (Or maybe &mut self?)
If using a reference is possible, then we could get rid of the Clone requirement in DefaultDecoder::Decoder. Requiring Clone is a bit awkward, because it would really do the cloning at every sample processed.

I just added that code last week to get things working again with the latest changes from master. I'm not satisfied with that part of the implementation either, so it would be great if we could find a better solution.

The reason for 4e8e37d was the loop introduced in 6908f40, which calls deserialize_with multiple times.

The new loop was logically necessary, because it can happen that in a with_key DDS topic, e.g. most of the Discovery topics, we could receive a Dispose instead of data update in a DATA submessage. The Dispose is indicated by sending a Key instead of Data, or alternatively, sending an MD5 hash of the Key.

In case we, as a DataReader, join an already long-running Topic, it may have some past data values published that we can no longer receive, because the History buffers are not deep enough. But we can at some point receive Dispose notifications of these samples we have never heard of. In case the Dispose was via a hash only, there is no way we can tell the application about the Dispose, because we do not know the actual key, so it is best to just ignore it, and move on to the next incoming sample. This is why the looping is needed.

This is something we could perhaps work around to avoid the need for cloning.

I agree that taking a (mutable?) reference to the decoder would be the cleaner solution and would be fine for our use case. Unfortunately, the DeserializeSeed::deserialize method takes self by value, so I'm not sure how we could avoid a Clone bound...

(Side note: If your DeserializeSeed type T only needs &self, you could implement DeserializeSeed for &T as well. Then the .clone() call would be just a pointer copy.)

Ok, the pass-by-value of self in DeserializeSeed::deserialize is something we cannot work around.

Then we must keep the Clone bound, and document that there will be quite a lot of cloning, about one .clone() per deserialized sample, so the application designer knows to invest in making it cheap, like a pointer copy.

@jhelovuo jhelovuo closed this Apr 14, 2024
@phil-opp
Copy link
Contributor Author

Thanks a lot!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants