serdect: is it actually constant-time? #1111
Comments
The README describes it as follows:
Really the goal is to be less noisy than e.g. a JSON serialization of bytes as an integer array. PRs accepted to improve the wording in the README.
The slice serializer just uses the
|
It seems that it is different, since
That's true, but then binary formats instead prefix separate elements, which kind of defeats the point. |
Can you point to a specific implementation, and ideally, write some test cases which capture the issue? |
cc @daxpedda |
Let me make an MRE. Also, speaking of |
Again, we provide both options. Use the slice serializer if you're okay with a length prefix. |
They are not completely interchangeable. |
I am loath to change it without an actual empirical demonstration of timing variability or other performance issue. Can you put one together? |
Also note there is a long, long history of anecdotal reports like this with the various serde serializers across the @RustCrypto project, which is why we put this crate together and why it tests against various format implementations, so I would really like to see running code as the motivation for any changes. |
On it now. But if the length of the serialized bytestring is data-dependent, doesn't timing variability naturally follow? Not to mention that it's quite unexpected to get variable-sized serialized data when you thought you were serializing a constant-sized array. |
If the length varies, then the timing will as well, yes. However, that would seem like a sharp edge and counterclaim to your argument that using a length-prefixed serialization would be better than one guaranteed fixed-length by the type system, if anything. |
```rust
use serde::{Serialize, Serializer};

// Variant A: goes through serdect's slice helper.
#[derive(Serialize)]
struct A(
    #[serde(serialize_with = "serdect::slice::serialize_hex_lower_or_bin")]
    [u8; 8],
);

// Variant B: goes through serdect's array helper.
#[derive(Serialize)]
struct B(
    #[serde(serialize_with = "serdect::array::serialize_hex_lower_or_bin")]
    [u8; 8],
);

// Variant C: calls serialize_bytes() directly, bypassing serdect.
struct C([u8; 8]);

impl Serialize for C {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        serializer.serialize_bytes(&self.0)
    }
}

fn main() {
    // The same eight bytes are used for every variant; four of them are above 127.
    let a = A([1, 2, 254, 255, 3, 4, 252, 253]);
    let b = rmp_serde::encode::to_vec(&a);
    println!("messagepack + slice: {:?}", b);
    let b = bincode::serialize(&a).unwrap();
    println!("bincode + slice: {:?}", b);

    let a = B([1, 2, 254, 255, 3, 4, 252, 253]);
    let b = rmp_serde::encode::to_vec(&a);
    println!("messagepack + array: {:?}", b);
    let b = bincode::serialize(&a).unwrap();
    println!("bincode + array: {:?}", b);

    let a = C([1, 2, 254, 255, 3, 4, 252, 253]);
    let b = rmp_serde::encode::to_vec(&a);
    println!("messagepack + serialize_bytes: {:?}", b);
    let b = bincode::serialize(&a).unwrap();
    println!("bincode + serialize_bytes: {:?}", b);
}
```
Output:
So Bincode behaves well, MessagePack doesn't.
The type system guarantees that the format will receive a constant-sized tuple, but not what it will do with it. Of course, giving it a byte slice instead does not guarantee it either, but it seems to me that it is a better "best effort" for constant-timeness. |
I'm not sure I see anything actionable there? |
The "204" elements are not in the actual data and are added before any byte with the value above 127, making the serialization length (and time) data-dependent. Also it shows the difference between |
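To make the "204" prefix concrete, here is a minimal std-only sketch of the two encodings as the MessagePack specification describes them. It is a hand-written model, not code taken from rmp_serde or serdect; the function names `encode_as_tuple` and `encode_as_bin` are made up for illustration, and it assumes inputs short enough for the one-byte fixarray and bin8 headers.

```rust
// Hypothetical model of the MessagePack encoding rules; not rmp_serde's implementation.

/// Encode bytes the way a tuple of u8 elements is encoded:
/// a fixarray header followed by one integer object per element.
fn encode_as_tuple(data: &[u8]) -> Vec<u8> {
    assert!(data.len() <= 15, "fixarray header only covers up to 15 elements");
    let mut out = vec![0x90 | data.len() as u8];
    for &b in data {
        if b > 0x7f {
            // Values above 127 do not fit a positive fixint,
            // so they are written as a uint8 object: marker 0xcc (204) + value.
            out.push(0xcc);
        }
        out.push(b);
    }
    out
}

/// Encode the same bytes as one bin8 object: marker, length, raw payload.
fn encode_as_bin(data: &[u8]) -> Vec<u8> {
    assert!(data.len() <= 255, "bin8 header only covers up to 255 bytes");
    let mut out = vec![0xc4, data.len() as u8];
    out.extend_from_slice(data);
    out
}

fn main() {
    let low = [1u8, 2, 3, 4, 5, 6, 7, 8];
    let mixed = [1u8, 2, 254, 255, 3, 4, 252, 253];
    println!("tuple, low bytes:   {} bytes", encode_as_tuple(&low).len());
    println!("tuple, mixed bytes: {} bytes", encode_as_tuple(&mixed).len());
    println!("bin,   low bytes:   {} bytes", encode_as_bin(&low).len());
    println!("bin,   mixed bytes: {} bytes", encode_as_bin(&mixed).len());
}
```

In this model the tuple form takes 9 bytes for the all-low input and 13 bytes for the mixed input (one extra byte per value above 127), while the bin form takes 10 bytes in both cases.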
What are you suggesting as a mitigation? (Honestly, this seems like an argument against MessagePack, not |
It sounds like moving to It also sounds like it might be a difficult breaking change, depending on how various format implementations handle it. |
There may be other formats that do the same. My argument is that Yes, it will be certainly a breaking ABI change, there's no way around it. |
We can only provide guarantees for the formats we test against, so for any format you want to have guarantees, it needs to be added to our test suite. Obviously we can't provide guarantees for things we haven't tested against. |
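One cheap property such a per-format test could check is that the encoded length does not depend on the byte values. The sketch below is only an illustration of that idea: `has_constant_length`, `leaky_encode`, and `raw_encode` are hypothetical names, and the two encoders are stand-ins rather than real format implementations. Equal lengths are a necessary condition only; they do not by themselves show that an encoder runs in constant time.

```rust
/// Returns true if `encode` produces the same output length for every sample.
fn has_constant_length<const N: usize>(
    encode: impl Fn(&[u8; N]) -> Vec<u8>,
    samples: &[[u8; N]],
) -> bool {
    let mut lengths = samples.iter().map(|s| encode(s).len());
    match lengths.next() {
        Some(first) => lengths.all(|len| len == first),
        None => true, // no samples, nothing to compare
    }
}

/// Stand-in encoder (hypothetical): adds a marker byte before each value above 127.
fn leaky_encode(data: &[u8; 4]) -> Vec<u8> {
    let mut out = Vec::new();
    for &b in data {
        if b > 0x7f {
            out.push(0xcc);
        }
        out.push(b);
    }
    out
}

/// Stand-in encoder (hypothetical): copies the bytes unchanged.
fn raw_encode(data: &[u8; 4]) -> Vec<u8> {
    data.to_vec()
}

fn main() {
    // Samples chosen to straddle the 127/128 boundary.
    let samples = [[0u8; 4], [0x7f; 4], [0x80; 4], [0xff; 4]];
    println!("leaky: {}", has_constant_length(leaky_encode, &samples));
    println!("raw:   {}", has_constant_length(raw_encode, &samples));
}
```

In a real test suite the stand-in encoders would be replaced by calls into the format crate under test, with samples covering the value boundaries that format treats specially.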
If one variant works (in the sense of providing constant-time guarantees) in Bincode and CBOR (currently tested ones), and the other works in Bincode, CBOR, and MessagePack, I would argue that the latter variant is better. MessagePack is about 70 times more popular than CBOR according to crates.io download data. |
Can you open a separate issue for a potential switch to Also I think we can close this issue as "asked and answered". If you feel the existing documentation is insufficient, please open a PR. |
Still waiting for the new issue, but I'm not exactly following what the issue is here.
Also this doesn't seem like a constant-time issue to me, that different sizes have different timings is kind of unavoidable. I don't think this is what |
Deficiency or not, it's a de-facto standard used by a lot of people.
Same sizes, different data. |
We deal with both types of data. Whether or not we need to add a redundant length prefix to every other format for encoded arrays as a workaround for how MessagePack handles tuples is a rather ugly and highly debatable tradeoff. I think we should perhaps get a bit more diligent about our list of supported format implementations as it seems we should be worrying about formats that do wacky things like serialize binary data in variable-time. It'd be good to get a table added to the README.md at least. |
To be fair, with the tuple approach, we are serializing separate |
So MessagePack seems to support fixed-size arrays and tuples.
Are you saying the output length is different depending on data if the size is the same? |
Yes, see the posted example. Every byte over 127 gets prefixed with |
Curious if there are other Rust (or otherwise) implementations of MessagePack which handle this better |
It's not about that. The problem is that serializing |
Apologies, I think I finally understand the problem now, I should have been reading more carefully. Yeah, that's definitely an issue. So as you are saying, |
I'm looking into it now, and it doesn't seem like the format supports fixed-size |
But shouldn't it add this prefix for every |
Nope, see the first line:
A single byte starting from the bit 0 is an |
Ah I see ... that's so weird ... Apologies if you already explained that, but why is |
|
I guess the issue is that @tarcieri I think I'm in favor of changing this now, simply because |
As I said, I think it might make sense for slices. Arrays are more debatable. |
Personally, while the array thing is inconvenient, I can live without it, but I would like to use |
Yeah, I'm in favor of both. Though I understand that there is a tradeoff here. |
By the way, apparently CBOR uses packing as well. I replaced |
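For reference, the CBOR packing can be modelled the same way. This is a std-only sketch based on the integer and byte-string rules in the CBOR specification (RFC 8949), not on any particular Rust CBOR crate; the function names are invented for the example, and it assumes fewer than 24 elements so that the length fits in the initial byte.

```rust
// Hypothetical model of CBOR's encoding rules; not taken from a CBOR crate.

/// CBOR unsigned integer (major type 0) for a u8:
/// values 0..=23 fit in the initial byte, 24..=255 need 0x18 plus one byte.
fn cbor_u8(value: u8, out: &mut Vec<u8>) {
    if value < 24 {
        out.push(value);
    } else {
        out.push(0x18);
        out.push(value);
    }
}

/// Array (major type 4) of separately encoded u8 elements.
fn cbor_array_of_u8(data: &[u8]) -> Vec<u8> {
    assert!(data.len() < 24, "sketch only handles lengths that fit the initial byte");
    let mut out = vec![0x80 | data.len() as u8];
    for &b in data {
        cbor_u8(b, &mut out);
    }
    out
}

/// Byte string (major type 2): header followed by the raw payload.
fn cbor_byte_string(data: &[u8]) -> Vec<u8> {
    assert!(data.len() < 24, "sketch only handles lengths that fit the initial byte");
    let mut out = vec![0x40 | data.len() as u8];
    out.extend_from_slice(data);
    out
}

fn main() {
    let zeros = [0u8; 8];
    let mixed = [1u8, 2, 254, 255, 3, 4, 252, 253];
    println!("array, zeros:       {} bytes", cbor_array_of_u8(&zeros).len());
    println!("array, mixed:       {} bytes", cbor_array_of_u8(&mixed).len());
    println!("byte string, zeros: {} bytes", cbor_byte_string(&zeros).len());
    println!("byte string, mixed: {} bytes", cbor_byte_string(&mixed).len());
}
```

Under these rules the per-element form grows by one byte for every value of 24 or more, so the threshold is lower than MessagePack's 128, while the byte-string form stays at a fixed length for a fixed-size input.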
Created PR #1112 |
See #1111 for the discussion leading to this PR.
The binary serializer uses `serializer.serialize_tuple()` and `serialize_element()` which, in some formats at least, makes it data-dependent. E.g. MessagePack prepends every element greater than 127 with `0xCC`.
Also, this contradicts the documentation claim:
What was the reason behind not using `serialize_bytes()`? Seems like it would provide better constant-time guarantees?