From 1c70ae7a5edca3cf21b0a0e637160bc5af323c9a Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 20 Nov 2022 11:59:45 +0000 Subject: [PATCH 1/5] literal-expr.md: say that suffixes are removed before interpreting numbers --- src/expressions/literal-expr.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/expressions/literal-expr.md b/src/expressions/literal-expr.md index 4eec37dcb..ac42dabd4 100644 --- a/src/expressions/literal-expr.md +++ b/src/expressions/literal-expr.md @@ -96,6 +96,8 @@ The value of the expression is determined from the string representation of the * If the radix is not 10, the first two characters are removed from the string. +* Any suffix is removed from the string. + * Any underscores are removed from the string. * The string is converted to a `u128` value as if by [`u128::from_str_radix`] with the chosen radix. @@ -136,6 +138,8 @@ let x: f64 = 2.; // type f64 The value of the expression is determined from the string representation of the token as follows: +* Any suffix is removed from the string. + * Any underscores are removed from the string. * The string is converted to the expression's type as if by [`f32::from_str`] or [`f64::from_str`]. From c36b3c27ac305483d8ea261cded87980e392e9ba Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 20 Nov 2022 11:59:45 +0000 Subject: [PATCH 2/5] tokens.md: add one more case to the "Reserved forms" lexer block --- src/tokens.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/tokens.md b/src/tokens.md index 8f9bcb1f7..39b95cd40 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -536,7 +536,7 @@ Examples of such tokens: >    | OCT_LITERAL \[`8`-`9`​]\ >    | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) `.` \ >          _(not immediately followed by `.`, `_` or an XID_Start character)_\ ->    | ( BIN_LITERAL | OCT_LITERAL ) `e`\ +>    | ( BIN_LITERAL | OCT_LITERAL ) (`e`|`E`)\ >    | `0b` `_`\* _end of input or not BIN_DIGIT_\ >    | `0o` `_`\* _end of input or not OCT_DIGIT_\ >    | `0x` `_`\* _end of input or not HEX_DIGIT_\ @@ -549,7 +549,7 @@ Due to the possible ambiguity these raise, they are rejected by the tokenizer in * An unsuffixed binary, octal, or hexadecimal literal followed, without intervening whitespace, by a period character (with the same restrictions on what follows the period as for floating-point literals). -* An unsuffixed binary or octal literal followed, without intervening whitespace, by the character `e`. +* An unsuffixed binary or octal literal followed, without intervening whitespace, by the character `e` or `E`. * Input which begins with one of the radix prefixes but is not a valid binary, octal, or hexadecimal literal (because it contains no digits). From b333c25a43fcb1964aa1e688e168c0f713865706 Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 20 Nov 2022 11:59:45 +0000 Subject: [PATCH 3/5] tokens.md: remove the "examples of invalid integer literals" These are covered under "Reserved forms similar to number literals" --- src/tokens.md | 14 -------------- 1 file changed, 14 deletions(-) diff --git a/src/tokens.md b/src/tokens.md index 39b95cd40..ef1f07c58 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -405,20 +405,6 @@ Examples of integer literals of various forms: Note that `-1i8`, for example, is analyzed as two tokens: `-` followed by `1i8`. -Examples of invalid integer literals: - -```rust,compile_fail -// uses numbers of the wrong base - -0b0102; -0o0581; - -// bin, hex, and octal literals must have at least one digit - -0b_; -0b____; -``` - #### Tuple index > **Lexer**\ From ccde77edcb4d4607f212bfd9fa65a913defb5055 Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 20 Nov 2022 11:59:45 +0000 Subject: [PATCH 4/5] Numeric literals: say that the parser is now more lenient This reflects the changes made in in rust-lang/rust#102944 . Previously, unknown suffixes were rejected by the parser. Now they are accepted by the parser and rejected at a later stage. Similarly, integer literals too large to fit in u128 are now accepted by the parser. Forms like 5f32 are now INTEGER_LITERAL rather than FLOAT_LITERAL. The notion of a 'pseudoliteral' is no longer required. --- src/expressions/literal-expr.md | 14 ++-- src/tokens.md | 125 +++++++++++++++----------------- 2 files changed, 64 insertions(+), 75 deletions(-) diff --git a/src/expressions/literal-expr.md b/src/expressions/literal-expr.md index ac42dabd4..e5bc2dff4 100644 --- a/src/expressions/literal-expr.md +++ b/src/expressions/literal-expr.md @@ -8,11 +8,9 @@ >    | [BYTE_LITERAL]\ >    | [BYTE_STRING_LITERAL]\ >    | [RAW_BYTE_STRING_LITERAL]\ ->    | [INTEGER_LITERAL][^out-of-range]\ +>    | [INTEGER_LITERAL]\ >    | [FLOAT_LITERAL]\ >    | `true` | `false` -> -> [^out-of-range]: A value ≥ 2128 is not allowed. A _literal expression_ is an expression consisting of a single token, rather than a sequence of tokens, that immediately and directly denotes the value it evaluates to, rather than referring to it by name or some other evaluation rule. @@ -54,7 +52,7 @@ A string literal expression consists of a single [BYTE_STRING_LITERAL] or [RAW_B An integer literal expression consists of a single [INTEGER_LITERAL] token. -If the token has a [suffix], the suffix will be the name of one of the [primitive integer types][numeric types]: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, or `isize`, and the expression has that type. +If the token has a [suffix], the suffix must be the name of one of the [primitive integer types][numeric types]: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, or `isize`, and the expression has that type. If the token has no suffix, the expression's type is determined by type inference: @@ -101,7 +99,7 @@ The value of the expression is determined from the string representation of the * Any underscores are removed from the string. * The string is converted to a `u128` value as if by [`u128::from_str_radix`] with the chosen radix. -If the value does not fit in `u128`, the expression is rejected by the parser. +If the value does not fit in `u128`, it is a compiler error. * The `u128` value is converted to the expression's type via a [numeric cast]. @@ -113,9 +111,11 @@ If the value does not fit in `u128`, the expression is rejected by the parser. ## Floating-point literal expressions -A floating-point literal expression consists of a single [FLOAT_LITERAL] token. +A floating-point literal expression has one of two forms: + * a single [FLOAT_LITERAL] token + * a single [INTEGER_LITERAL] token which has a suffix and no radix indicator -If the token has a [suffix], the suffix will be the name of one of the [primitive floating-point types][floating-point types]: `f32` or `f64`, and the expression has that type. +If the token has a [suffix], the suffix must be the name of one of the [primitive floating-point types][floating-point types]: `f32` or `f64`, and the expression has that type. If the token has no suffix, the expression's type is determined by type inference: diff --git a/src/tokens.md b/src/tokens.md index ef1f07c58..721c4add6 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -72,13 +72,13 @@ Literals are tokens used in [literal expressions]. #### Numbers -| [Number literals](#number-literals)`*` | Example | Exponentiation | Suffixes | -|----------------------------------------|---------|----------------|----------| -| Decimal integer | `98_222` | `N/A` | Integer suffixes | -| Hex integer | `0xff` | `N/A` | Integer suffixes | -| Octal integer | `0o77` | `N/A` | Integer suffixes | -| Binary integer | `0b1111_0000` | `N/A` | Integer suffixes | -| Floating-point | `123.0E+77` | `Optional` | Floating-point suffixes | +| [Number literals](#number-literals)`*` | Example | Exponentiation | +|----------------------------------------|---------|----------------| +| Decimal integer | `98_222` | `N/A` | +| Hex integer | `0xff` | `N/A` | +| Octal integer | `0o77` | `N/A` | +| Binary integer | `0b1111_0000` | `N/A` | +| Floating-point | `123.0E+77` | `Optional` | `*` All number literals allow `_` as a visual separator: `1_234.0E+18f64` @@ -86,17 +86,26 @@ Literals are tokens used in [literal expressions]. A suffix is a sequence of characters following the primary part of a literal (without intervening whitespace), of the same form as a non-raw identifier or keyword. -Any kind of literal (string, integer, etc) with any suffix is valid as a token, -and can be passed to a macro without producing an error. + +> **Lexer**\ +> SUFFIX : IDENTIFIER_OR_KEYWORD\ +> SUFFIX_NO_E : SUFFIX _not beginning with `e` or `E`_ + +Any kind of literal (string, integer, etc) with any suffix is valid as a token. + +A literal token with any suffix can be passed to a macro without producing an error. The macro itself will decide how to interpret such a token and whether to produce an error or not. +In particular, the `literal` fragment specifier for by-example macros matches literal tokens with arbitrary suffixes. ```rust macro_rules! blackhole { ($tt:tt) => () } +macro_rules! blackhole_lit { ($l:literal) => () } blackhole!("string"suffix); // OK +blackhole_lit!(1suffix); // OK ``` -However, suffixes on literal tokens parsed as Rust code are restricted. +However, suffixes on literal tokens which are interpreted as literal expressions or patterns are restricted. Any suffixes are rejected on non-numeric literal tokens, and numeric literal tokens are accepted only with suffixes from the list below. @@ -329,7 +338,7 @@ literal_. The grammar for recognizing the two kinds of literals is mixed. > **Lexer**\ > INTEGER_LITERAL :\ >    ( DEC_LITERAL | BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) -> INTEGER_SUFFIX? +> SUFFIX_NO_E? > > DEC_LITERAL :\ >    DEC_DIGIT (DEC_DIGIT|`_`)\* @@ -350,10 +359,6 @@ literal_. The grammar for recognizing the two kinds of literals is mixed. > DEC_DIGIT : \[`0`-`9`] > > HEX_DIGIT : \[`0`-`9` `a`-`f` `A`-`F`] -> -> INTEGER_SUFFIX :\ ->       `u8` | `u16` | `u32` | `u64` | `u128` | `usize`\ ->    | `i8` | `i16` | `i32` | `i64` | `i128` | `isize` An _integer literal_ has one of four forms: @@ -369,11 +374,11 @@ An _integer literal_ has one of four forms: (`0b`) and continues as any mixture (with at least one digit) of binary digits and underscores. -Like any literal, an integer literal may be followed (immediately, without any spaces) by an _integer suffix_, which must be the name of one of the [primitive integer types][numeric types]: -`u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, or `isize`. +Like any literal, an integer literal may be followed (immediately, without any spaces) by a suffix as described above. +The suffix may not begin with `e` or `E`, as that would be interpreted as the exponent of a floating-point literal. See [literal expressions] for the effect of these suffixes. -Examples of integer literals of various forms: +Examples of integer literals which are accepted as literal expressions: ```rust # #![allow(overflowing_literals)] @@ -396,15 +401,29 @@ Examples of integer literals of various forms: 0usize; -// These are too big for their type, but are still valid tokens - +// These are too big for their type, but are accepted as literal expressions. 128_i8; 256_u8; +// This is an integer literal, accepted as a floating-point literal expression. +5f32; ``` Note that `-1i8`, for example, is analyzed as two tokens: `-` followed by `1i8`. + +Examples of integer literals which are not accepted as literal expressions: + +```rust +# #[cfg(FALSE)] { +0invalidSuffix; +123AFB43; +0b010a; +0xAB_CD_EF_GH; +0b1111_f32; +# } +``` + #### Tuple index > **Lexer**\ @@ -428,9 +447,8 @@ let cat = example.01; // ERROR no field named `01` let horse = example.0b10; // ERROR no field named `0b10` ``` -> **Note**: The tuple index may include an `INTEGER_SUFFIX`, but this is not -> intended to be valid, and may be removed in a future version. See -> for more information. +> **Note**: Tuple indices may include certain suffixes, but this is not intended to be valid, and may be removed in a future version. +> See for more information. #### Floating-point literals @@ -438,38 +456,32 @@ let horse = example.0b10; // ERROR no field named `0b10` > FLOAT_LITERAL :\ >       DEC_LITERAL `.` > _(not immediately followed by `.`, `_` or an XID_Start character)_\ ->    | DEC_LITERAL FLOAT_EXPONENT\ ->    | DEC_LITERAL `.` DEC_LITERAL FLOAT_EXPONENT?\ ->    | DEC_LITERAL (`.` DEC_LITERAL)? -> FLOAT_EXPONENT? FLOAT_SUFFIX +>    | DEC_LITERAL `.` DEC_LITERAL SUFFIX_NO_E?\ +>    | DEC_LITERAL (`.` DEC_LITERAL)? FLOAT_EXPONENT SUFFIX?\ > > FLOAT_EXPONENT :\ >    (`e`|`E`) (`+`|`-`)? > (DEC_DIGIT|`_`)\* DEC_DIGIT (DEC_DIGIT|`_`)\* > -> FLOAT_SUFFIX :\ ->    `f32` | `f64` -A _floating-point literal_ has one of three forms: +A _floating-point literal_ has one of two forms: * A _decimal literal_ followed by a period character `U+002E` (`.`). This is optionally followed by another decimal literal, with an optional _exponent_. * A single _decimal literal_ followed by an _exponent_. -* A single _decimal literal_ (in which case a suffix is required). Like integer literals, a floating-point literal may be followed by a suffix, so long as the pre-suffix part does not end with `U+002E` (`.`). -There are two valid _floating-point suffixes_: `f32` and `f64` (the names of the 32-bit and 64-bit [primitive floating-point types][floating-point types]). +The suffix may not begin with `e` or `E` if the literal does not include an exponent. See [literal expressions] for the effect of these suffixes. -Examples of floating-point literals of various forms: +Examples of floating-point literals which are accepted as literal expressions: ```rust 123.0f64; 0.1f64; 0.1f32; 12E+99_f64; -5f32; let x: f64 = 2.; ``` @@ -479,39 +491,16 @@ to call a method named `f64` on `2`. Note that `-1.0`, for example, is analyzed as two tokens: `-` followed by `1.0`. -#### Number pseudoliterals - -> **Lexer**\ -> NUMBER_PSEUDOLITERAL :\ ->       DEC_LITERAL ( . DEC_LITERAL )? FLOAT_EXPONENT\ ->          ( NUMBER_PSEUDOLITERAL_SUFFIX | INTEGER_SUFFIX )\ ->    | DEC_LITERAL . DEC_LITERAL\ ->          ( NUMBER_PSEUDOLITERAL_SUFFIX_NO_E | INTEGER SUFFIX )\ ->    | DEC_LITERAL NUMBER_PSEUDOLITERAL_SUFFIX_NO_E\ ->    | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL )\ ->          ( NUMBER_PSEUDOLITERAL_SUFFIX_NO_E | FLOAT_SUFFIX ) -> -> NUMBER_PSEUDOLITERAL_SUFFIX :\ ->    IDENTIFIER_OR_KEYWORD _not matching INTEGER_SUFFIX or FLOAT_SUFFIX_ -> -> NUMBER_PSEUDOLITERAL_SUFFIX_NO_E :\ ->    NUMBER_PSEUDOLITERAL_SUFFIX _not beginning with `e` or `E`_ - -Tokenization of numeric literals allows arbitrary suffixes as described in the grammar above. -These values generate valid tokens, but are not valid [literal expressions], so are usually an error except as macro arguments. +Examples of floating-point literals which are not accepted as literal expressions: -Examples of such tokens: -```rust,compile_fail -0invalidSuffix; -123AFB43; -0b010a; -0xAB_CD_EF_GH; +```rust +# #[cfg(FALSE)] { 2.0f80; 2e5f80; 2e5e6; 2.0e5e6; 1.3e10u64; -0b1111_f32; +# } ``` #### Reserved forms similar to number literals @@ -547,13 +536,13 @@ Examples of reserved forms: 0b0102; // this is not `0b010` followed by `2` 0o1279; // this is not `0o127` followed by `9` 0x80.0; // this is not `0x80` followed by `.` and `0` -0b101e; // this is not a pseudoliteral, or `0b101` followed by `e` -0b; // this is not a pseudoliteral, or `0` followed by `b` -0b_; // this is not a pseudoliteral, or `0` followed by `b_` -2e; // this is not a pseudoliteral, or `2` followed by `e` -2.0e; // this is not a pseudoliteral, or `2.0` followed by `e` -2em; // this is not a pseudoliteral, or `2` followed by `em` -2.0em; // this is not a pseudoliteral, or `2.0` followed by `em` +0b101e; // this is not a suffixed literal, or `0b101` followed by `e` +0b; // this is not an integer literal, or `0` followed by `b` +0b_; // this is not an integer literal, or `0` followed by `b_` +2e; // this is not a floating-point literal, or `2` followed by `e` +2.0e; // this is not a floating-point literal, or `2.0` followed by `e` +2em; // this is not a suffixed literal, or `2` followed by `em` +2.0em; // this is not a suffixed literal, or `2.0` followed by `em` ``` ## Lifetimes and loop labels From 018b14be9ded4e5890813dcf1209961298b0873e Mon Sep 17 00:00:00 2001 From: Matthew Woodcraft Date: Sun, 20 Nov 2022 11:59:45 +0000 Subject: [PATCH 5/5] tokens.md: add optional SUFFIX to the Lexer blocks for non-numeric literals --- src/tokens.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/tokens.md b/src/tokens.md index 721c4add6..0067b647d 100644 --- a/src/tokens.md +++ b/src/tokens.md @@ -119,7 +119,7 @@ and numeric literal tokens are accepted only with suffixes from the list below. > **Lexer**\ > CHAR_LITERAL :\ ->    `'` ( ~\[`'` `\` \\n \\r \\t] | QUOTE_ESCAPE | ASCII_ESCAPE | UNICODE_ESCAPE ) `'` +>    `'` ( ~\[`'` `\` \\n \\r \\t] | QUOTE_ESCAPE | ASCII_ESCAPE | UNICODE_ESCAPE ) `'` SUFFIX? > > QUOTE_ESCAPE :\ >    `\'` | `\"` @@ -145,7 +145,7 @@ which must be _escaped_ by a preceding `U+005C` character (`\`). >       | ASCII_ESCAPE\ >       | UNICODE_ESCAPE\ >       | STRING_CONTINUE\ ->    )\* `"` +>    )\* `"` SUFFIX? > > STRING_CONTINUE :\ >    `\` _followed by_ \\n @@ -205,7 +205,7 @@ following forms: > **Lexer**\ > RAW_STRING_LITERAL :\ ->    `r` RAW_STRING_CONTENT +>    `r` RAW_STRING_CONTENT SUFFIX? > > RAW_STRING_CONTENT :\ >       `"` ( ~ _IsolatedCR_ )* (non-greedy) `"`\ @@ -242,7 +242,7 @@ r##"foo #"# bar"##; // foo #"# bar > **Lexer**\ > BYTE_LITERAL :\ ->    `b'` ( ASCII_FOR_CHAR | BYTE_ESCAPE ) `'` +>    `b'` ( ASCII_FOR_CHAR | BYTE_ESCAPE ) `'` SUFFIX? > > ASCII_FOR_CHAR :\ >    _any ASCII (i.e. 0x00 to 0x7F), except_ `'`, `\`, \\n, \\r or \\t @@ -262,7 +262,7 @@ _number literal_. > **Lexer**\ > BYTE_STRING_LITERAL :\ ->    `b"` ( ASCII_FOR_STRING | BYTE_ESCAPE | STRING_CONTINUE )\* `"` +>    `b"` ( ASCII_FOR_STRING | BYTE_ESCAPE | STRING_CONTINUE )\* `"` SUFFIX? > > ASCII_FOR_STRING :\ >    _any ASCII (i.e 0x00 to 0x7F), except_ `"`, `\` _and IsolatedCR_ @@ -293,7 +293,7 @@ following forms: > **Lexer**\ > RAW_BYTE_STRING_LITERAL :\ ->    `br` RAW_BYTE_STRING_CONTENT +>    `br` RAW_BYTE_STRING_CONTENT SUFFIX? > > RAW_BYTE_STRING_CONTENT :\ >       `"` ASCII* (non-greedy) `"`\