-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Semantics of deserialisation #870
Comments
I assume that its not ok to chicken out and say “if the argument is not a subtype of the expected type, the only guarantee is that the result is a member of the expected type”, i.e. partial correctness + safety afterwards, else you wouldn't raise this point, right? |
You mean if they aren't subtypes it could be any type-correct value? So |
Yes… it is one way out. It’s not like the caller could not have put |
But this may get them banned without being guilty. Why is this laxer condition better than the one above? |
What do you mean?
Easier to implement? But maybe it is what we do already. Probably it is. I didn’t notice you slightly changed it over what we discussed on the other PR. So maybe it is ok. |
Did I change it? Not consciously. I mean that it's like a man-in-the-middle attack, were the man is undefined behaviour. |
I wonder what is the threat model: We only ever get into that situation if someone sends a value of a wrong type. How much pity do we need to have for that case? Isn’t it just a case of garbage in - garbabe out? It would be the same under the “old“ scheme without the type description, so why does it matter now? |
Well, the type mismatch could be accidental. There is no concrete threat model, but it still seems rather fishy. Violating the idea of typing that way. |
But you agree that we are trying to fix a problem that we also had before we had the type section, and that would have been unfixable back then, and it didn’t bother us then? |
I am not entirely sure, to be honest. Before, we had a weaker format, it was basically untyped and did not promise much. Now it's typed and you may think that you can rely on that. But you can't. In a way, that is worse. |
If we find an elegant characterization of what deserialization does on badly typed input, I am not opposed, but given that it didn’t bother us before I would hesitsate to go out of our way (significant complication or overhead during deserialization) to fix it. And extra work will have to be done. For example, the current code would happily skip over a a malformed The decoder doesn't do that yet, but for any type with a statically known size (say, |
What for? It doesn't sound like a relevant cost to check wf of a singleton values. The case of an unused Text field is a bit more interesting, but even there I would argue that it is not worth skipping over it. It doesn't simplify the implementation notably (you have the decoder anyway), and the cost also probably never matters in practice. So I'd argue against premature optimisation of this sort. |
I don't think it’s premature optimization, because I am not complicating the code in order to make it faster, I am just resiting against extra complexitiy for an unclear use case. But point taken that these two examples are not compelling enough. So the goal of the above criterion (as opposed to the simple “check it’s a subtype” is that we don't want to do needless work checking types of values that we can ignore (because there is no such value), but we do want to do the work of check the bytes of the values that are there but which are ignored. Seems a bit arbitrary, but I can live with that (and add the necessary checks in the We of course would still accept ignored garbabe in future types, where we cannot know the structure. So it stays a best-effort approach. So does that decoding algorithm satisfy the above criterion? … probably … So I guess I am on board here. (I just hope that this is all not just motivated by wanting to get the negative subtyping check in the propsal #832 acceptable :-)) |
The primary goal here is to have a sufficiently clean correctness condition for deserialisation, since I find UB troublesome. But future type are another good point, hadn't considered those. So yeah, in it's best effort for types that are not understood by the deserialiser and the spec it operates against. I'd very much like to get rid of the subtype check in #832. Can't say I have a good idea, though. :( |
The Rust implementation does check types while skipping the value. The |
Does it also check values? I.e. check that the a text value is valid UTF8? |
Yes, it will call the corresponding deserialize function as if it were not skipped and then throw away the result: https://github.com/dfinity-lab/sdk/blob/master/lib/serde_idl/src/de.rs#L278 |
Even if we validate stuff like UTF8, what do we do for function references? There we can't do any kind of checks, unless we do a full, proper subtype check – and then we could just do it for the whole message and only accept “real subtypes” of the expected type. |
The discussion spilled a bit into #1830 (comment), good questions there. |
I believe we can close this, the Candid spec is now pretty clear on how deserialization should work. |
The design doc currently defines deserialisation of IDL values by an elaboration function that is only defined when subtyping holds. However, that is not how we implement it, because deserialisation actually proceeds by dynamically traversing types and value. In particular, it avoids descending into portions of the types for which no corresponding value is present, such as unused cases of a variant or the element type of empty options or vectors.
The consequence is that more values deserialise than actually should according to the spec. In particular, it is not always discovered when the annotated type is not actually a subtype of the target type. For example,
will succeed if v is an empty vector, but T and U are incompatible vector types, such as
vec Text
vsvec Nat
. Similarly for options or variants.We want to specify the precise (necessary and sufficient) precondition under which deserialisation succeeds. One candidate condition might be
i.e., the value has a "more principle" type T' that is a subtype of both annotated and target type. In the previous example, that might be
vec empty
. Effectively, this works around the requirement to always produce a principal type.However, this is a somewhat brittle condition. For example, with some of the possible implementation choices discussed on #832 (branching on the truth of a subtype statement) it would break.
Unclear what to specify here, or whether we should actually enforce the subtype check (undesirable).
The text was updated successfully, but these errors were encountered: