Rollup merge of #117534 - RalfJung:str, r=Mark-Simulacrum
clarify that the str invariant is a safety, not validity, invariant

Updates these docs to match rust-lang/reference#792
matthiaskrgr committed Nov 4, 2023
2 parents 805a56f + 0550ba5 commit 1ee5e12
Showing 1 changed file with 17 additions and 11 deletions.
28 changes: 17 additions & 11 deletions library/core/src/primitive_docs.rs
@@ -291,7 +291,7 @@ mod prim_never {}
/// Surrogate code points, used by UTF-16, are in the range 0xD800 to 0xDFFF.
///
/// No `char` may be constructed, whether as a literal or at runtime, that is not a
-/// Unicode scalar value:
+/// Unicode scalar value. Violating this rule causes undefined behavior.
///
/// ```compile_fail
/// // Each of these is a compiler error
@@ -308,9 +308,10 @@ mod prim_never {}
/// let _ = unsafe { char::from_u32_unchecked(0x110000) };
/// ```
///
-/// USVs are also the exact set of values that may be encoded in UTF-8. Because
-/// `char` values are USVs and `str` values are valid UTF-8, it is safe to store
-/// any `char` in a `str` or read any character from a `str` as a `char`.
+/// Unicode scalar values are also the exact set of values that may be encoded in UTF-8. Because
+/// `char` values are Unicode scalar values and functions may assume [incoming `str` values are
+/// valid UTF-8](primitive.str.html#invariant), it is safe to store any `char` in a `str` or read
+/// any character from a `str` as a `char`.
///
/// The gap in valid `char` values is understood by the compiler, so in the
/// below example the two ranges are understood to cover the whole range of
@@ -324,11 +325,10 @@ mod prim_never {}
/// };
/// ```
///
-/// All USVs are valid `char` values, but not all of them represent a real
-/// character. Many USVs are not currently assigned to a character, but may be
-/// in the future ("reserved"); some will never be a character
-/// ("noncharacters"); and some may be given different meanings by different
-/// users ("private use").
+/// All Unicode scalar values are valid `char` values, but not all of them represent a real
+/// character. Many Unicode scalar values are not currently assigned to a character, but may be in
+/// the future ("reserved"); some will never be a character ("noncharacters"); and some may be given
+/// different meanings by different users ("private use").
///
/// `char` is guaranteed to have the same size and alignment as `u32` on all
/// platforms.
@@ -894,8 +894,6 @@ mod prim_slice {}
/// type. It is usually seen in its borrowed form, `&str`. It is also the type
/// of string literals, `&'static str`.
///
-/// String slices are always valid UTF-8.
-///
/// # Basic Usage
///
/// String literals are string slices:
@@ -949,6 +947,14 @@ mod prim_slice {}
/// Note: This example shows the internals of `&str`. `unsafe` should not be
/// used to get a string slice under normal circumstances. Use `as_str`
/// instead.
+///
+/// # Invariant
+///
+/// Rust libraries may assume that string slices are always valid UTF-8.
+///
+/// Constructing a non-UTF-8 string slice is not immediate undefined behavior, but any function
+/// called on a string slice may assume that it is valid UTF-8, which means that a non-UTF-8 string
+/// slice can lead to undefined behavior down the road.
#[stable(feature = "rust1", since = "1.0.0")]
mod prim_str {}
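
The sketch below is not part of this commit. It illustrates the practical side of the documented invariants: code that sticks to the checked constructors never produces a `str` or `char` that breaks them. The helper names `checked_str` and `round_trip` are hypothetical and exist only for illustration; the standard library calls used are `std::str::from_utf8`, `char::from_u32`, and `char::encode_utf8`.

```rust
/// Hypothetical helper (not in the commit): checked conversion from raw
/// bytes to `&str`. Returns `None` instead of creating a non-UTF-8 slice.
fn checked_str(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// Hypothetical helper (not in the commit): store a `char` in a `str`
/// and read it back, relying on the fact that every `char` is a
/// Unicode scalar value and therefore encodable as UTF-8.
fn round_trip(c: char) -> Option<char> {
    let mut buf = [0u8; 4];
    let s: &str = c.encode_utf8(&mut buf);
    s.chars().next()
}

fn main() {
    // Well-formed UTF-8 is accepted.
    assert_eq!(checked_str(b"hello"), Some("hello"));

    // 0xFF never occurs in well-formed UTF-8, so this is rejected.
    assert_eq!(checked_str(&[0xFF, 0xFE]), None);

    // The UTF-8-style encoding of the surrogate U+D800 is rejected too.
    assert_eq!(checked_str(&[0xED, 0xA0, 0x80]), None);

    // Surrogates and values above 0x10FFFF are not Unicode scalar values,
    // so the checked `char` constructor refuses them.
    assert_eq!(char::from_u32(0xD800), None);
    assert_eq!(char::from_u32(0x110000), None);

    // Any valid `char` survives a trip through a `str`.
    assert_eq!(round_trip('a'), Some('a'));
    assert_eq!(round_trip('\u{10FFFF}'), Some('\u{10FFFF}'));
}
```

The unchecked counterparts (`from_utf8_unchecked`, `char::from_u32_unchecked`) are deliberately left out: per the updated docs, an invalid `char` is undefined behavior, while a non-UTF-8 `str` is not immediately so but may cause undefined behavior in any function later called on it.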
