Support for RFC-7049 Canonical CBOR key ordering #13
Comments
I think that's a bit beyond the scope of this crate. Can this be done outside of this crate? We could have something like:
struct CanonicalSerializer<W>(Cbor4iiSerializer<W>);
impl<W: Write> Serializer for &mut CanonicalSerializer<W> {
    type SerializeSeq = <Cbor4iiSerializer<W> as Serializer>::SerializeSeq;
    // forward to Cbor4iiSerializer
    fn collect_map<K, V, I>(self, iter: I) -> Result<(), Self::Error> {
        // your code
    }
}
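The wrapping idea can be illustrated without serde. The following is only a minimal sketch of the newtype pattern, not cbor4ii's or serde's API: `Encoder`, `PlainEncoder`, `CanonicalEncoder`, `write_bytes` and `write_map` are hypothetical names invented for this example. The wrapper forwards everything to the inner encoder and overrides only the map method, which is the role `collect_map` would play in a real `Serializer` implementation.

```rust
/// Hypothetical stand-in for a serializer trait (not serde's `Serializer`).
trait Encoder {
    /// Append already-encoded bytes to the output.
    fn write_bytes(&mut self, bytes: &[u8]);

    /// Default behaviour: write pre-encoded map entries in iteration order.
    fn write_map(&mut self, entries: Vec<Vec<u8>>) {
        for entry in &entries {
            self.write_bytes(entry);
        }
    }
}

/// Hypothetical "upstream" encoder that keeps the default map behaviour.
struct PlainEncoder {
    out: Vec<u8>,
}

impl Encoder for PlainEncoder {
    fn write_bytes(&mut self, bytes: &[u8]) {
        self.out.extend_from_slice(bytes);
    }
}

/// Newtype wrapper: forwards to the inner encoder, but sorts map entries first.
struct CanonicalEncoder<E>(E);

impl<E: Encoder> Encoder for CanonicalEncoder<E> {
    fn write_bytes(&mut self, bytes: &[u8]) {
        // Plain forwarding to the wrapped encoder.
        self.0.write_bytes(bytes);
    }

    fn write_map(&mut self, mut entries: Vec<Vec<u8>>) {
        // The only overridden behaviour: sort the encoded entries byte-wise.
        entries.sort_unstable();
        self.0.write_map(entries);
    }
}
```

With this shape, the upstream type stays untouched and the custom ordering lives entirely in the wrapper.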
That's fine. But it brings me to the next point (let me know if I should open a separate issue). As you can see on that implementation, I need access to some things (like |
The thing I can think of that prevents |
Thanks! I missed that one, that is exactly what I needed.
Oh right, I should've thought of this, I'll just do that.
As I mentioned, I need a few other changes that are not really general purpose (like this sort order), hence don't make sense to be upstreamed/published in a generic way. So I thought I just fork the whole Serde part and copy it over. Though the idea of wrapping the serializer sounds interesting. I'll have a look and come back to you in case I need additional things exposed. One more thing in regards to the original issue. I find it quite useful if you're able to serialize things into memory. In the above code I've implemented |
I didn't initially think about how to expose them, so I chose not to. Currently you can implement I would consider adding a mod that provides these helper types.
I find the |
I would like to expose a |
Thanks a lot for all the help. I think I can now solely rely on the core features (without For those who need this kind of collation, this is the version updated for cbor4ii 0.2.20:
fn collect_map<K, V, I>(self, iter: I) -> Result<(), Self::Error>
where
    K: ser::Serialize,
    V: ser::Serialize,
    I: IntoIterator<Item = (K, V)>,
{
    use serde::ser::SerializeMap;
    #[cfg(not(feature = "use_std"))]
    use crate::alloc::vec::Vec;
    use crate::core::utils::BufWriter;

    // CBOR RFC-7049 specifies a canonical sort order, where keys are sorted by length first.
    // This was later revised with RFC-8949, but we need to stick to the original order to stay
    // compatible with existing data.
    // We first serialize each map entry into a buffer and then sort those buffers. Byte-wise
    // comparison gives us the right order as keys in DAG-CBOR are always strings and prefixed
    // with the length. Once sorted they are written to the actual output.
    let mut buffer = BufWriter::new(Vec::new());
    let mut mem_serializer = Serializer::new(&mut buffer);
    let mut serializer = Collect {
        bounded: true,
        ser: &mut mem_serializer,
    };
    let mut entries = Vec::new();
    for (key, value) in iter {
        serializer.serialize_entry(&key, &value)
            .map_err(|_| enc::Error::Msg("Map entry cannot be serialized.".into()))?;
        entries.push(serializer.ser.writer.buffer().to_vec());
        serializer.ser.writer.clear();
    }

    enc::MapStartBounded(entries.len()).encode(&mut self.writer)?;
    entries.sort_unstable();
    for entry in entries {
        self.writer.push(&entry)?;
    }
    Ok(())
}
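The claim in the code comment, that byte-wise comparison of the encoded entries yields a length-first order when all keys are strings, can be checked in isolation. The sketch below uses only the standard library; `encode_text` and `sort_keys_by_encoding` are hypothetical helper names for this example and are not part of cbor4ii. It relies on the CBOR text string layout (major type 3, a head carrying the byte length, then the UTF-8 bytes), so a longer string always gets a larger head.

```rust
/// Encode a CBOR text string (major type 3): a head that carries the byte
/// length, followed by the UTF-8 bytes.
fn encode_text(s: &str) -> Vec<u8> {
    let len = s.len() as u64;
    let mut out = Vec::with_capacity(s.len() + 9);
    match len {
        // Lengths up to 23 fit directly into the initial byte.
        0..=23 => out.push(0x60 | len as u8),
        24..=0xff => {
            out.push(0x78);
            out.push(len as u8);
        }
        0x100..=0xffff => {
            out.push(0x79);
            out.extend_from_slice(&(len as u16).to_be_bytes());
        }
        0x1_0000..=0xffff_ffff => {
            out.push(0x7a);
            out.extend_from_slice(&(len as u32).to_be_bytes());
        }
        _ => {
            out.push(0x7b);
            out.extend_from_slice(&len.to_be_bytes());
        }
    }
    out.extend_from_slice(s.as_bytes());
    out
}

/// Sort string keys by comparing their encoded bytes, with no explicit
/// length comparison.
fn sort_keys_by_encoding<'a>(keys: &[&'a str]) -> Vec<&'a str> {
    let mut pairs: Vec<(Vec<u8>, &str)> =
        keys.iter().map(|k| (encode_text(k), *k)).collect();
    pairs.sort_unstable_by(|a, b| a.0.cmp(&b.0));
    pairs.into_iter().map(|(_, k)| k).collect()
}
```

Sorting whole entries (key followed by value) gives the same result as sorting keys alone, as long as keys are unique: a CBOR item is self-delimiting, so two different encoded keys always differ before either one ends.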
I re-open this issue as I'm coming back to your proposal from #13 (comment), where I try to wrap your serializer. It almost works. The problem is the serializer to serialize the map. In the code from the comment above, I use:
let mut serializer = Collect {
    bounded: true,
    ser: &mut mem_serializer.0,
};
But currently I then tried it with:
let mut serializer = Self::SerializeMap {
    bounded: true,
    ser: &mut mem_serializer.0,
};
But that errors with:
Next try (thanks to the help from a colleague) was:
let mut serializer = <&mut Serializer<_> as serde::Serializer>::SerializeMap {
    bounded: true,
    ser: &mut mem_serializer.0,
};
That would work, but sadly only on nightly due to rust-lang/rust#86935. So what to do now? Is making |
We should be able to implement a separate or we just need to serialize these two objects into the same buffer, it doesn't necessarily need like
let mut buffer = BufWriter::new(Vec::new());
let mut entries = Vec::new();
for (key, value) in iter {
    let mut mem_serializer = Serializer::new(&mut buffer);
    key.serialize(&mut mem_serializer)?;
    value.serialize(&mut mem_serializer)?;
    entries.push(buffer.buffer().to_vec());
    buffer.clear();
}
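The same buffer-and-sort approach can be sketched end to end with only the standard library. This is a simplified illustration under stated assumptions, not the cbor4ii code: `write_head`, `write_text`, `write_uint` and `encode_sorted_map` are hypothetical helpers, keys are restricted to strings, and values to unsigned integers. Each key and value pair is written into one reusable scratch buffer, copied out, and the copies are sorted before the map header and entries are emitted.

```rust
/// Append a CBOR head for `major` (0..=7) with argument `n`.
fn write_head(out: &mut Vec<u8>, major: u8, n: u64) {
    let m = major << 5;
    match n {
        0..=23 => out.push(m | n as u8),
        24..=0xff => {
            out.push(m | 24);
            out.push(n as u8);
        }
        0x100..=0xffff => {
            out.push(m | 25);
            out.extend_from_slice(&(n as u16).to_be_bytes());
        }
        0x1_0000..=0xffff_ffff => {
            out.push(m | 26);
            out.extend_from_slice(&(n as u32).to_be_bytes());
        }
        _ => {
            out.push(m | 27);
            out.extend_from_slice(&n.to_be_bytes());
        }
    }
}

/// Text string: major type 3, byte length in the head, then UTF-8 bytes.
fn write_text(out: &mut Vec<u8>, s: &str) {
    write_head(out, 3, s.len() as u64);
    out.extend_from_slice(s.as_bytes());
}

/// Unsigned integer: major type 0, value in the head.
fn write_uint(out: &mut Vec<u8>, n: u64) {
    write_head(out, 0, n);
}

/// Encode a map with string keys and unsigned integer values, with entries
/// ordered by byte-wise comparison of their encoded form.
fn encode_sorted_map(entries: &[(&str, u64)]) -> Vec<u8> {
    let mut scratch = Vec::new();
    let mut encoded: Vec<Vec<u8>> = Vec::with_capacity(entries.len());
    for (key, value) in entries {
        // Key and value go into the same scratch buffer, back to back.
        write_text(&mut scratch, key);
        write_uint(&mut scratch, *value);
        encoded.push(scratch.clone());
        scratch.clear();
    }
    encoded.sort_unstable();

    let mut out = Vec::new();
    // Definite-length map header (major type 5) with the entry count.
    write_head(&mut out, 5, encoded.len() as u64);
    for entry in &encoded {
        out.extend_from_slice(entry);
    }
    out
}
```

For example, `encode_sorted_map(&[("bb", 2), ("a", 1)])` writes the entry for `"a"` before the entry for `"bb"`, regardless of the input order.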
Wow, that's even better. I've been spending so much time on Serde, but I'm still not good at it. Thanks a lot, this works! I just need to look into some error handling issues, but that should be easy.
Here comes the full source of the updated version:
fn collect_map<K, V, I>(self, iter: I) -> Result<(), Self::Error>
where
    K: ser::Serialize,
    V: ser::Serialize,
    I: IntoIterator<Item = (K, V)>,
{
    // CBOR RFC-7049 specifies a canonical sort order, where keys are sorted by length first.
    // This was later revised with RFC-8949, but we need to stick to the original order to stay
    // compatible with existing data.
    // We first serialize each map entry into a buffer and then sort those buffers. Byte-wise
    // comparison gives us the right order as keys in DAG-CBOR are always strings and prefixed
    // with the length. Once sorted they are written to the actual output.
    let mut buffer = BufWriter::new(Vec::new());
    let mut entries = Vec::new();
    for (key, value) in iter {
        let mut mem_serializer = Serializer::new(&mut buffer);
        key.serialize(&mut mem_serializer)
            .map_err(|_| enc::Error::Msg("Map key cannot be serialized.".into()))?;
        value
            .serialize(&mut mem_serializer)
            .map_err(|_| enc::Error::Msg("Map value cannot be serialized.".into()))?;
        entries.push(buffer.buffer().to_vec());
        buffer.clear();
    }

    enc::MapStartBounded(entries.len()).encode(&mut self.0.writer())?;
    entries.sort_unstable();
    for entry in entries {
        self.0.writer().push(&entry)?;
    }
    Ok(())
}
My pub struct Serializer<W>(cbor4ii::serde::Serializer<W>);
impl<W> Serializer<W> {
    pub fn new(writer: W) -> Serializer<W> {
        Serializer(Cbor4iiSerializer::new(writer))
    }
}
This library explicitly specifies RFC-8949, so this request might be out of scope.
In the project I want to use cbor4ii for, I'm stuck with RFC-7049 Canonical CBOR key ordering. This means that keys are sorted by their length first. I wonder if that could perhaps be added behind a feature flag. Here is an implementation that seems to work. I didn't create a PR as this clearly needs more discussion first.
I'd also like to note that I need even more changes for my use case (it's a subset of CBOR), for which I will need to fork this library. Nonetheless, I think it would be a useful addition, and I'd prefer the fork to be as minimal as possible. I thought I'd bring it up to make clear that it won't be a showstopper if this change isn't accepted.
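To make the difference between the two orderings concrete, here is a small standard-library sketch. The function names (`cmp_rfc7049`, `cmp_rfc8949`, `example_keys`, `sorted_by`) are hypothetical and chosen for this example only. The key encodings are the ones used as an example in RFC 8949, Section 4.2.3. The RFC-7049 rule (Section 3.9) compares the encoded key length first and then the bytes; the RFC-8949 rule (Section 4.2.1) compares the encoded bytes only.

```rust
use std::cmp::Ordering;

/// RFC 7049 canonical order: shorter encoded keys sort first, ties are
/// broken by byte-wise comparison.
fn cmp_rfc7049(a: &[u8], b: &[u8]) -> Ordering {
    a.len().cmp(&b.len()).then_with(|| a.cmp(b))
}

/// RFC 8949 core deterministic order: plain byte-wise lexicographic
/// comparison of the encoded keys.
fn cmp_rfc8949(a: &[u8], b: &[u8]) -> Ordering {
    a.cmp(b)
}

/// Encoded map keys of mixed types.
fn example_keys() -> Vec<Vec<u8>> {
    vec![
        vec![0x0a],             // 10
        vec![0x18, 0x64],       // 100
        vec![0x20],             // -1
        vec![0x61, 0x7a],       // "z"
        vec![0x62, 0x61, 0x61], // "aa"
        vec![0x81, 0x18, 0x64], // [100]
        vec![0x81, 0x20],       // [-1]
        vec![0xf4],             // false
    ]
}

/// Return the example keys sorted with the given comparison function.
fn sorted_by(cmp: fn(&[u8], &[u8]) -> Ordering) -> Vec<Vec<u8>> {
    let mut keys = example_keys();
    keys.sort_by(|a, b| cmp(a, b));
    keys
}
```

With mixed key types, `sorted_by(cmp_rfc7049)` and `sorted_by(cmp_rfc8949)` produce different sequences. When all keys are text strings, the length is part of the encoded head, so both rules give the same result.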