Revising the (`Try`)`FromBytes` Conversion Methods in 0.8 #1095

jswrenn · 2024-04-05T15:34:42Z · Maintainer

The 0.7 edition of zerocopy defines a large set of constructors on the FromBytes trait. These methods construct Self, &Self, &mut Self, &[Self] and &mut [Self] from input byte slices of the appropriate mutability.
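To make the five output shapes concrete, here is a minimal sketch built on a hypothetical, simplified trait. FromBytesSketch is not zerocopy's FromBytes, and its signatures are assumptions. It is implemented for u8, where every shape can be produced without unsafe code because any single byte is a valid, 1-aligned u8:

```rust
/// Hypothetical, simplified stand-in for the 0.7 `FromBytes` constructors.
/// NOT zerocopy's trait: it only illustrates the five output shapes.
pub trait FromBytesSketch: Sized {
    /// Copies a `Self` out of exactly `size_of::<Self>()` bytes.
    fn read_from(bytes: &[u8]) -> Option<Self>;
    /// Borrows `&Self` from exactly `size_of::<Self>()` bytes.
    fn ref_from(bytes: &[u8]) -> Option<&Self>;
    /// Mutably borrows `&mut Self` from exactly `size_of::<Self>()` bytes.
    fn mut_from(bytes: &mut [u8]) -> Option<&mut Self>;
    /// Borrows `&[Self]` from a buffer whose length is a multiple of `size_of::<Self>()`.
    fn slice_from(bytes: &[u8]) -> Option<&[Self]>;
    /// Mutably borrows `&mut [Self]` from such a buffer.
    fn mut_slice_from(bytes: &mut [u8]) -> Option<&mut [Self]>;
}

impl FromBytesSketch for u8 {
    fn read_from(bytes: &[u8]) -> Option<u8> {
        match bytes {
            [b] => Some(*b), // copy the single byte out
            _ => None,       // wrong length
        }
    }
    fn ref_from(bytes: &[u8]) -> Option<&u8> {
        match bytes {
            [b] => Some(b),
            _ => None,
        }
    }
    fn mut_from(bytes: &mut [u8]) -> Option<&mut u8> {
        match bytes {
            [b] => Some(b),
            _ => None,
        }
    }
    // Every length is a multiple of `size_of::<u8>() == 1`, so these always succeed.
    fn slice_from(bytes: &[u8]) -> Option<&[u8]> {
        Some(bytes)
    }
    fn mut_slice_from(bytes: &mut [u8]) -> Option<&mut [u8]> {
        Some(bytes)
    }
}
```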

As we prepare for the 0.8 release, we would like to reconsider the naming and semantics of these methods. In 0.8, we will additionally be introducing a TryFromBytes trait, which should adhere to the same conventions adopted for FromBytes.

Naming

Decision: Option (1). See #1095 (comment) for notes.

We are considering these naming conventions:

Option (1): (try_){mut,ref,read,slice,mut_slice}_from(_{prefix,suffix})
Produces:
- mut_from
- mut_from_prefix
- mut_from_suffix
- mut_slice_from
- mut_slice_from_prefix
- mut_slice_from_suffix
- read_from
- read_from_prefix
- read_from_suffix
- ref_from
- ref_from_prefix
- ref_from_suffix
- slice_from
- slice_from_prefix
- slice_from_suffix
- try_mut_from
- try_mut_from_prefix
- try_mut_from_suffix
- try_mut_slice_from
- try_mut_slice_from_prefix
- try_mut_slice_from_suffix
- try_read_from
- try_read_from_prefix
- try_read_from_suffix
- try_ref_from
- try_ref_from_prefix
- try_ref_from_suffix
- try_slice_from
- try_slice_from_prefix
- try_slice_from_suffix
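The first convention is a Cartesian product of three independent choices: fallibility, output shape, and position. This hypothetical helper expands the pattern and confirms that it yields the same 30 names as the list above. The names come out in generation order rather than alphabetically:

```rust
/// Hypothetical helper: expands `(try_){mut,ref,read,slice,mut_slice}_from(_{prefix,suffix})`
/// into every method name it describes, in the order the grammar is written.
pub fn convention_one_names() -> Vec<String> {
    let fallibility = ["", "try_"];
    let outputs = ["mut", "ref", "read", "slice", "mut_slice"];
    let positions = ["", "_prefix", "_suffix"];

    let mut names = Vec::new();
    for f in fallibility {
        for o in outputs {
            for p in positions {
                names.push(format!("{f}{o}_from{p}"));
            }
        }
    }
    names
}
```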
Option (2): (try_)from_{mut,ref,read,slice,mut_slice}(_{prefix,suffix})
Produces:
- from_mut
- from_mut_prefix
- from_mut_slice ← Misleading (Special-case?)
- from_mut_slice_prefix ← Misleading (Special-case?)
- from_mut_slice_suffix ← Misleading (Special-case?)
- from_mut_suffix
- from_read ← Ungrammatical (Special-case?)
- from_read_prefix ← Ungrammatical (Special-case?)
- from_read_suffix ← Ungrammatical (Special-case?)
- from_ref
- from_ref_prefix
- from_ref_suffix
- from_slice ← Misleading (Special-case?)
- from_slice_prefix ← Misleading (Special-case?)
- from_slice_suffix ← Misleading (Special-case?)
- try_from_mut
- try_from_mut_prefix
- try_from_mut_slice ← Misleading (Special-case?)
- try_from_mut_slice_prefix ← Misleading (Special-case?)
- try_from_mut_slice_suffix ← Misleading (Special-case?)
- try_from_mut_suffix
- try_from_read ← Ungrammatical (Special-case?)
- try_from_read_prefix ← Ungrammatical (Special-case?)
- try_from_read_suffix ← Ungrammatical (Special-case?)
- try_from_ref
- try_from_ref_prefix
- try_from_ref_suffix
- try_from_slice ← Misleading (Special-case?)
- try_from_slice_prefix ← Misleading (Special-case?)
- try_from_slice_suffix ← Misleading (Special-case?)

Observations:

- The distinction between these conventions is whether they emphasize the nature of the type being constructed or the nature of the buffer it is constructed from.
- read does not fit neatly into the latter convention. Do we special-case it so these methods are instead named in the form read_from rather than from_read?
- slice does not fit neatly into the latter convention either. Do we special-case it so slice appears early in the name, as it does in the first convention?
- In both conventions, read is the only verb. We need something there, because we don't want a method named just from. Using val instead of read might be more consistent but, on the other hand, read clearly signposts that this method performs a copy (which is valuable!).

Input

Decision: APIs will continue to take concrete slices rather than generic parameters. See #1095 (comment) for notes.

To preserve object safety, these methods consume &[u8] and &mut [u8] rather than impl ByteSlice and impl ByteSliceMut. This continues to seem sensible.

We do not know whether our users rely on FromBytes being object safe. Defensively, we might consider preemptively violating object safety, so that we have the SemVer freedom to either restore it later or generalize these methods to ByteSlice and ByteSliceMut.
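The object-safety concern follows a general Rust rule. A trait whose methods take concrete types such as &[u8] can be used as a trait object. Adding a generic method, for example one taking impl AsRef<[u8]>, without a where Self: Sized bound makes dyn Trait a compile error. The sketch below uses a hypothetical Validate trait, not FromBytes, to show the dyn-compatible form:

```rust
/// Hypothetical trait whose method takes a concrete `&[u8]`, so it stays object safe.
pub trait Validate {
    fn is_valid(&self, bytes: &[u8]) -> bool;
    // A generic method such as
    //     fn is_valid_generic(&self, bytes: impl AsRef<[u8]>) -> bool;
    // added without `where Self: Sized` would make `dyn Validate` a compile error.
}

pub struct NonEmpty;
pub struct MaxLen(pub usize);

impl Validate for NonEmpty {
    fn is_valid(&self, bytes: &[u8]) -> bool {
        !bytes.is_empty()
    }
}

impl Validate for MaxLen {
    fn is_valid(&self, bytes: &[u8]) -> bool {
        bytes.len() <= self.0
    }
}

/// Dynamic dispatch over heterogeneous validators requires object safety.
pub fn all_valid(checks: &[Box<dyn Validate>], bytes: &[u8]) -> bool {
    checks.iter().all(|c| c.is_valid(bytes))
}
```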

Panicking

Decision: APIs will continue to return explicit failure values rather than panicking, although we leave open the possibility of adding panicking methods in addition as a convenience. See #1095 (comment) for notes.

To leave the decision of whether to panic up to customers, these methods do not panic; they return either an Option or a Result.
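As a sketch of this policy, a non-panicking constructor reports failure through its return type. A caller who prefers a panic opts in with .expect() at the call site or through a thin convenience wrapper of the kind left open above. The function names are hypothetical, not zerocopy's, and the little-endian u32 stands in for an arbitrary T:

```rust
/// Hypothetical non-panicking constructor: reads a little-endian `u32` from the
/// start of `bytes` and also returns the remaining bytes.
pub fn read_u32_le_from_prefix(bytes: &[u8]) -> Option<(u32, &[u8])> {
    if bytes.len() < 4 {
        return None; // failure is reported through the return type, never by panicking
    }
    let (head, rest) = bytes.split_at(4);
    let value = u32::from_le_bytes([head[0], head[1], head[2], head[3]]);
    Some((value, rest))
}

/// Hypothetical panicking convenience wrapper, layered on top of the
/// non-panicking method rather than replacing it.
pub fn read_u32_le_from_prefix_or_panic(bytes: &[u8]) -> (u32, &[u8]) {
    read_u32_le_from_prefix(bytes).expect("buffer is shorter than 4 bytes")
}
```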

Output

Presently, most of these methods only return the deserialized value in the success case; e.g.:

let Some(des) = T::mut_from_prefix(buffer) else {
    /* handle failure */
};

But is this all that these methods should return?

Return the Remaining Bytes?

Decision: Lean towards returning remaining bytes, but this is not yet committed to. See #1095 (comment) for notes.

In many use cases, the underlying buffer must be advanced after parsing. With the above API, this can be done manually:

let Some(des) = T::mut_from_prefix(buffer) else {
    /* handle failure */
};
buffer = &mut buffer[core::mem::size_of_val(&*des)..];

The commonality of this pattern suggests we modify our API to additionally return the excess bytes:

let Some((des, rem)) = T::mut_from_prefix(buffer) else {
    // handle failure
};
buffer = rem;
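Here is a minimal, self-contained sketch of the remainder-returning shape. It uses a hypothetical mut_u8_from_prefix for the simplest possible type (u8) in place of a real zerocopy method. Advancing the buffer with the returned remainder is enough to walk it element by element:

```rust
/// Hypothetical remainder-returning constructor for `u8`: mutably borrows the
/// first byte and hands back the rest of the buffer.
pub fn mut_u8_from_prefix(buffer: &mut [u8]) -> Option<(&mut u8, &mut [u8])> {
    // `split_first_mut` computes the split point once and returns both halves.
    buffer.split_first_mut()
}

/// Walks a buffer, advancing it with the returned remainder after each parse.
pub fn increment_each(mut buffer: &mut [u8]) -> usize {
    let mut visited = 0;
    while let Some((des, rem)) = mut_u8_from_prefix(buffer) {
        *des = des.wrapping_add(1);
        buffer = rem; // advance past the parsed value; no `size_of_val` needed
        visited += 1;
    }
    visited
}
```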

Return the Original Buffer?

Decision: Return the original buffer in the Err variant of a Result. See #1095 (comment) for notes.

In the event of failure, neither the minimal nor the above augmented API permits using the original buffer in error handling. For instance, attempting this:

loop {
    let Some((des, rem)) = T::mut_from_prefix(buffer) else {
        panic!("could not parse from {buffer:?}");
    };
    buffer = rem;
}

...produces this error message:

error[E0502]: cannot borrow `buffer` as immutable because it is also borrowed as mutable
 --> src/main.rs:9:38
  |
8 |     let Some((d, remainder)) = mut_from_prefix_split::<u8>(buffer) else {
  |                                                            ------ mutable borrow occurs here
9 |         panic!("could not parse from {buffer:?}");
  |                                      ^^^^^^^^^^
  |                                      |
  |                                      immutable borrow occurs here
  |                                      mutable borrow later used here

We can remedy this by returning a Result instead, which provides the original buffer upon failure, permitting, e.g.:

let (des, rem) = match T::mut_from_prefix(buffer) {
    Ok((des, rem)) => (des, rem),
    Err(buffer) => {
        panic!("could not parse from {buffer:?}");
    }
};
buffer = rem;
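The following sketch assumes a hypothetical mut_pair_from_prefix that borrows a 2-byte prefix and, on failure, returns the original buffer in Err. Because the buffer comes back in the error, the loop can report the unparsed bytes, which is the use the borrow checker rejected above:

```rust
/// Hypothetical Result-returning constructor: borrows a 2-byte prefix and returns
/// the remainder, or hands the untouched original buffer back in `Err`.
pub fn mut_pair_from_prefix(buffer: &mut [u8]) -> Result<(&mut [u8], &mut [u8]), &mut [u8]> {
    if buffer.len() < 2 {
        return Err(buffer);
    }
    Ok(buffer.split_at_mut(2))
}

/// Sums little-endian `u16`s. On failure, the message can mention the bytes that
/// could not be parsed, because the `Err` variant gives them back.
pub fn sum_u16s(mut buffer: &mut [u8]) -> Result<u32, String> {
    let mut total = 0u32;
    while !buffer.is_empty() {
        let (des, rem) = match mut_pair_from_prefix(buffer) {
            Ok(split) => split,
            Err(buffer) => return Err(format!("could not parse from {buffer:?}")),
        };
        total += u32::from(u16::from_le_bytes([des[0], des[1]]));
        buffer = rem;
    }
    Ok(total)
}
```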

Output Considerations

Ergonomics

How much complexity does each API add for users who only need the deserialized value? Under each output API, such a user would write:

Minimal:

let Some(des) = T::mut_from_prefix(buffer) else {
    /* handle failure */
};

Returning remainder:

let Some((des, _)) = T::mut_from_prefix(buffer) else {
    /* handle failure */
};

Also returning original buffer after failure:

let Ok((des, _)) = T::mut_from_prefix(buffer) else {
    /* handle failure */
};

How much complexity does each API add for users who need the excess bytes?

Minimal:

let Some(des) = T::mut_from_prefix(buffer) else {
    /* handle failure */
};
buffer = &mut buffer[core::mem::size_of_val(&*des)..];

Returning remainder:

let Some((des, rem)) = T::mut_from_prefix(buffer) else {
    /* handle failure */
};
buffer = rem;

Also returning original buffer after failure:

let Ok((des, rem)) = T::mut_from_prefix(buffer) else {
    /* handle failure */
};
buffer = rem;

Performance

Presently, zerocopy computes the deserialization–remainder split point at its very lowest levels of abstraction. Under the minimal API, this split is discarded. A user (as seen above) can easily recompute the remainder with size_of_val, but doing so may generate additional runtime bounds checks. The remainder-returning option might offer a slight performance improvement for users requiring the excess bytes.
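The difference can be sketched with byte arrays standing in for T. The functions below are hypothetical, not zerocopy's. The remainder-returning version hands back the halves that split_at already produced. A caller of the minimal version must instead re-slice with size_of_val, an indexing operation the compiler may or may not be able to prove in bounds:

```rust
use core::convert::TryFrom;
use core::mem::size_of_val;

/// Hypothetical minimal API: returns only the value; the split point is discarded.
pub fn array_ref_from_prefix<const N: usize>(bytes: &[u8]) -> Option<&[u8; N]> {
    let head = bytes.get(..N)?; // length check
    <&[u8; N]>::try_from(head).ok()
}

/// Hypothetical remainder-returning API: the split computed once is handed back.
pub fn array_ref_from_prefix_split<const N: usize>(bytes: &[u8]) -> Option<(&[u8; N], &[u8])> {
    if bytes.len() < N {
        return None;
    }
    let (head, rest) = bytes.split_at(N);
    Some((<&[u8; N]>::try_from(head).ok()?, rest))
}

/// What a caller of the minimal API writes to get the remainder back.
pub fn remainder_via_size_of_val<const N: usize>(bytes: &[u8]) -> Option<&[u8]> {
    let des = array_ref_from_prefix::<N>(bytes)?;
    // This indexing re-checks `size_of_val(des) <= bytes.len()` unless the optimizer
    // can prove it from the earlier check.
    Some(&bytes[size_of_val(des)..])
}
```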

Optimization Misses

It is plausible that, under some circumstances, all three of these conditions hold:

- A customer might only need the deserialized value.
- The customer's compiler might fail to optimize out the computations for the unused remainder bytes returned by the more fully fledged APIs.
- This failure to optimize might be intolerable.

For these circumstances, zerocopy provides the Unalign wrapper type and the transmute macros. The transmute macros are the closest thing to a truly zero-cost API that zerocopy provides: they perform no runtime checks, and they do not invoke any functions besides mem::transmute.

For example, to replicate Foo::ref_from with full control over runtime checks, one could write something like:

use core::mem::size_of;
use zerocopy::{transmute_ref, Unalign};

// runtime check: do we have exactly `size_of::<Foo>()` bytes?
let Ok(bytes) = <&[u8; size_of::<Foo>()]>::try_from(buffer) else {
    panic!("wrong number of bytes");
};

// no runtime checks
let des: &Unalign<Foo> = transmute_ref!(bytes);

// runtime check: is the deserialization well-aligned?
let Some(des) = des.try_deref() else {
    panic!("wrong alignment");
};

Precise control over the semantics and runtime checks is achieved by changing the code before and after the transmute_ref! invocation.

Conclusions

To be determined.

Appendix: Higher Fidelity Errors

None of the return types evaluated in this proposal provide detailed information about why failures occurred. Did a FromBytes deserialization fail because of misalignment? Because of insufficient bytes? Or, in the case of TryFromBytes, because of invalid data?

In the Result-returning API, the failure reason can, at least, be re-computed. However, perhaps we should provide the failure reason explicitly; e.g.:

pub enum DeserializationError<A, L, V> {
    /// The deserialization buffer was improperly aligned.
    Alignment(A),
    /// The deserialization buffer was of insufficient length.
    Length(L),
    /// The deserialization buffer contained invalid data.
    Validity(V),
}

pub struct AlignmentError(/* ... */);
pub struct LengthError(/* ... */);
pub struct ValidityError(/* ... */);

// `Infallible` (a stable stand-in for the unstable `!` type) signals that the validity error condition is unreachable.
type FromBytesError = DeserializationError<AlignmentError, LengthError, core::convert::Infallible>;
type TryFromBytesError = DeserializationError<AlignmentError, LengthError, ValidityError>;

See #528 for further discussion.

A point of comparison: bytemuck provides this information with its PodCastError type. A cursory search of GitHub suggests that this level of detail is useful to some customers.
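Here is a sketch of how such errors might look on stable Rust, where core::convert::Infallible can stand in for the unstable ! type. The types mirror the proposal above. check_layout, try_read_bool, and describe are hypothetical helpers, and the fields of LengthError are assumptions. check_layout also shows how a caller holding the original buffer could recompute the failure reason:

```rust
use core::convert::Infallible;

#[derive(Debug, PartialEq)]
pub enum DeserializationError<A, L, V> {
    Alignment(A),
    Length(L),
    Validity(V),
}

#[derive(Debug, PartialEq)]
pub struct AlignmentError;

#[derive(Debug, PartialEq)]
pub struct LengthError {
    pub expected: usize, // hypothetical fields
    pub actual: usize,
}

#[derive(Debug, PartialEq)]
pub struct ValidityError;

/// `FromBytes`-style failures can never be validity failures.
pub type FromBytesError = DeserializationError<AlignmentError, LengthError, Infallible>;
pub type TryFromBytesError = DeserializationError<AlignmentError, LengthError, ValidityError>;

/// Hypothetical check: could `bytes` back a reference to a value of `size` bytes
/// with alignment `align` (a nonzero power of two)? Reports which check failed.
pub fn check_layout(bytes: &[u8], size: usize, align: usize) -> Result<(), FromBytesError> {
    if bytes.len() != size {
        let e = LengthError { expected: size, actual: bytes.len() };
        return Err(DeserializationError::Length(e));
    }
    if (bytes.as_ptr() as usize) % align != 0 {
        return Err(DeserializationError::Alignment(AlignmentError));
    }
    Ok(())
}

/// Hypothetical `TryFromBytes`-style read of a `bool`: bytes other than 0 and 1 are invalid.
pub fn try_read_bool(bytes: &[u8]) -> Result<bool, TryFromBytesError> {
    match bytes {
        [0] => Ok(false),
        [1] => Ok(true),
        [_] => Err(DeserializationError::Validity(ValidityError)),
        _ => Err(DeserializationError::Length(LengthError { expected: 1, actual: bytes.len() })),
    }
}

/// Because `Infallible` is uninhabited, the `Validity` arm needs no real handling.
pub fn describe(err: &FromBytesError) -> String {
    match err {
        DeserializationError::Alignment(_) => "misaligned".to_string(),
        DeserializationError::Length(e) => format!("expected {} bytes, got {}", e.expected, e.actual),
        DeserializationError::Validity(never) => match *never {},
    }
}
```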

Related Discussions

joshlf · 2024-04-08T19:01:28Z · Maintainer

Notes from a synchronous conversation with @jswrenn.

Naming

Going with option (1). I've updated #5 to list renaming the TryFromBytes methods as a blocker.

Input

Reborrowing is very low-cost from a syntax perspective, and is often performed implicitly. Given that reborrowing is already possible, the API complexity cost of adding impl ByteSlice[Mut] is not worth it.
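For illustration, take a hypothetical zero_prefix function standing in for any method that takes &mut [u8]. A Vec, an array, and an existing &mut [u8] can all be passed to it without explicit conversion. The slice reference also remains usable after the call, because the compiler reborrows it implicitly:

```rust
/// Hypothetical API that, like the methods discussed above, takes a concrete `&mut [u8]`.
pub fn zero_prefix(bytes: &mut [u8], n: usize) -> usize {
    let n = n.min(bytes.len());
    bytes[..n].fill(0);
    n
}

/// Shows that callers rarely need to spell out conversions to `&mut [u8]`.
pub fn reborrow_demo() -> (Vec<u8>, [u8; 3]) {
    let mut v = vec![1u8, 1, 1, 1];
    zero_prefix(&mut v, 2); // `&mut Vec<u8>` deref-coerces to `&mut [u8]`

    let mut a = [1u8, 1, 1];
    let s: &mut [u8] = &mut a; // `&mut [u8; 3]` unsize-coerces to `&mut [u8]`
    zero_prefix(s, 1); // `s` is implicitly reborrowed (`&mut *s`), not moved...
    zero_prefix(&mut s[1..], 1); // ...so it can still be used, e.g. to reslice
    (v, a)
}
```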

Panicking

Keeping the status quo: our API does not panic. Users can easily write .unwrap() or .expect(). We leave open the possibility of adding panicking methods later for convenience, but these will always be considered convenience wrappers and will be secondary in importance to our non-panicking methods.

Output

Return the Remaining Bytes?

We lean towards returning the remaining bytes, but we don't commit to it yet. It's easier to discard extra information that isn't needed than to manually recompute the remaining bytes if they are needed but our API does not return them. Users who have stricter requirements regarding what code is effectively optimized can use a technique like that described in the "Performance" sub-section.

Note that this is not a firm decision: There are still discussions to resolve in #884, #1051, and #1059.

Return the Original Buffer?

We will return the original buffer in the Err variant of a Result. Code that doesn't care about the original buffer is either completely unmodified or only slightly modified compared to the existing API (which returns an Option). Examples like the "loop" case are quite painful to work around if the original buffer is not returned.


joshlf · 2024-05-07T15:52:40Z · Maintainer

The discussion itself is complete, and the work is now either done or tracked (in #871), so I'm going to close this.

