📎 Formatter: Number literal separators #3294

MichaReiser · 2022-09-30T11:10:14Z

Description

Consistently format separators in number literals example

Test Cases

js/literal-numeric-separator/test.js
js/literal/number.js

The text was updated successfully, but these errors were encountered:

MichaReiser · 2022-10-11T08:55:58Z

@xunilrj would this be a formatter improvement you're interested in tackling? I know you're good at high performance string manipulation :)

github-actions · 2022-10-25T12:05:10Z

This issue is stale because it has been open 14 days with no activity.

jeysal · 2022-11-01T19:56:21Z

Hi! I wanted to get to know the Rome codebase a bit so I thought I'd look at a formatter issue and then a lint rule, or something like that. I've been trying this one out and made some progress so I think I'll be able to submit a PR for it soon.

From what I understand (from the comments on #2442), we can't use regex like Prettier does (not sure why, binary size maybe? Or because running multiple regex replaces in sequence is too inefficient?), so I'm hand-writing the string manipulation that Prettier does with the regexes, and because you mentioned high performance I'm trying to make sure it

only clones if the number literal actually needs changing,
doesn't do multiple passes through the number literal, and
doesn't do insertions or deletions in the middle of a string or anything like that.

Feels a little bit like I'm optimizing prematurely given that number literals are rarely long, but it's been kind of fun so whatever 😅

Let me know if I need to know anything else :)

MichaReiser · 2022-11-02T08:01:27Z

Feels a little bit like I'm optimizing prematurely given that number literals are rarely long, but it's been kind of fun so whatever 😅

That's awesome that you're enjoying it.

We aren't using RegEx because it is a rather large crate and we have so far been able to implement similar replacement logic without.

You can find a similar "normalisation" in

tools/crates/rome_js_formatter/src/utils/string_utils.rs

Lines 336 to 361 in 5e45610

    
           fn normalise_string_literal(&self, string_information: StringInformation) -> Cow<'token, str> { 
        
               let preferred_quote = string_information.preferred_quote; 
        
               let polished_raw_content = self.normalize_string(&string_information); 
        
               match polished_raw_content { 
        
                   Cow::Borrowed(raw_content) => { 
        
                       let final_content = self.swap_quotes(raw_content, &string_information); 
        
                       match final_content { 
        
                           Cow::Borrowed(final_content) => Cow::Borrowed(final_content), 
        
                           Cow::Owned(final_content) => Cow::Owned(final_content), 
        
                       } 
        
                   } 
        
                   Cow::Owned(s) => { 
        
                       // content is owned, meaning we allocated a new string, 
        
                       // so we force replacing quotes, regardless 
        
                       let final_content = std::format!( 
        
                           "{}{}{}", 
        
                           preferred_quote.as_char(), 
        
                           s.as_str(), 
        
                           preferred_quote.as_char() 
        
                       ); 
        
                       Cow::Owned(final_content) 
        
                   } 
        
               } 
        
           }

The main idea is to return a Cow::Borrowed if the original string doesn't change and only allocate a new string if the string differs.

An alternative solution would be to implement an iterator that returns the "cleaned" text and then simply compare the text one by one. Here a rather complicated solution for handling JSX:

use std::borrow::Cow;
use std::iter::Peekable;
use std::str::Chars;


pub fn clean_jsx_text(input: &str) -> Cow<str> {
    let mut cleaned = CleanJsxWhitespace::new(input);
    let mut actual = input.chars();

    let mut pos = 0;

    loop {
        let next_cleaned = cleaned.next();
        let next_actual = actual.next();

        match (next_cleaned, next_actual) {
            (Some(cleaned_char), Some(actual_char)) => {
                if cleaned_char == actual_char {
                    // all good
                    pos += cleaned_char.len_utf8();
                } else {
                    // difference, copy what we have so far, then take everything from cleaned
                    let mut result = String::from(&input[..pos]);

                    result.push(cleaned_char);

                    for cleaned_char in cleaned {
                        result.push(cleaned_char);
                    }

                    return Cow::Owned(result);
                }
            }
            (None, Some(_)) => {
                // anything following in actual now can only be truncated away text
                return Cow::Borrowed(&input[..pos]);
            }
            (Some(_), None) => {
                // This should never happen because cleaned only removes
                unreachable!()
            }
            (None, None) => return Cow::Borrowed(input),
        }
    }
}

#[derive(Debug)]
struct CleanJsxWhitespace<'a> {
    inner: Peekable<Chars<'a>>,
    /// Leading/trailing has the same definition as in the Formatter/Parser
    /// trailing: Any whitespace following a character up to, but not including a new line
    /// leading: Any whitespace before the next token, including all line breaks before the last token.
    /// Flag set to true as soon as the iterator starts processing trailing trivia (
    is_trailing_whitespace: bool,

    /// Tracks if there has been any trailing whitespace
    has_trailing_whitespace: bool,

    /// Tracks if there has been a new line
    has_leading_newline: bool,

    /// Tracks if this is the first line
    first_line: bool,
}

impl<'a> CleanJsxWhitespace<'a> {
    pub fn new(input: &'a str) -> Self {
        Self {
            inner: input.chars().peekable(),
            has_trailing_whitespace: false,
            has_leading_newline: false,
            is_trailing_whitespace: false,
            first_line: true,
        }
    }
}

// * First Line:
//      * Renders single space even if the text is all empty
//      * Render single space before first character.

impl Iterator for CleanJsxWhitespace<'_> {
    type Item = char;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            if let Some(next) = self.inner.peek() {
                match next {
                    ' ' | '\t' => {
                        // Eat over space
                        self.inner.next();

                        if self.first_line {
                            match self.inner.peek() {
                                // `<>    </>` -> `<> </>`
                                None => {
                                    return Some(' ');
                                }
                                // Collapse succeeding whitespace
                                Some(' ' | '\t' | '\n' | '\r') => {}
                                // `<>  a<>` -> `<> a</>`
                                Some(_) => {
                                    return Some(' ');
                                }
                            }
                        }

                        // Track all trailing trivia
                        if self.is_trailing_whitespace {
                            self.has_trailing_whitespace = true;
                        }
                    }

                    '\n' | '\r' => {
                        // Only track the newline if it has been after a non-whitespace character.
                        // (e.g. don't track newlines if the text so far has been empty)
                        if self.is_trailing_whitespace {
                            self.has_leading_newline = true;
                        }

                        self.has_trailing_whitespace = false;
                        self.is_trailing_whitespace = false;
                        self.first_line = false;

                        // Skip new line character
                        self.inner.next();
                    }
                    _ => {
                        // In case there's any whitespace or newline between two characters, print one space
                        // `<>a   a</>` -> `<>a a</>`
                        if self.has_trailing_whitespace || self.has_leading_newline {
                            self.has_trailing_whitespace = false;
                            self.has_leading_newline = false;

                            return Some(' ');
                        }

                        self.is_trailing_whitespace = true;

                        return self.inner.next();
                    }
                }
            } else {
                // `<>a\n b </>` -> `<>a b </>`
                if self.has_trailing_whitespace {
                    self.has_trailing_whitespace = false;
                    return Some(' ');
                }

                return None;
            }
        }
    }
}

#[cfg(test)]
mod tests {
    use crate::clean_jsx_text;

    #[test]
    fn clean_jsx_text_works() {
        assert_eq!("", clean_jsx_text(""));
        assert_eq!(" ", clean_jsx_text(" "));
        assert_eq!("Foo", clean_jsx_text("Foo"));
        assert_eq!(" Foo", clean_jsx_text(" Foo"));
        assert_eq!("Foo", clean_jsx_text("\nFoo"));
        assert_eq!(" Foo", clean_jsx_text("\tFoo"));
        assert_eq!("Foo", clean_jsx_text("\n \t Foo"));
        assert_eq!("Foo", clean_jsx_text("\n \t \n \t\nFoo"));
        assert_eq!(" Foo bar lorem", clean_jsx_text(" Foo bar lorem"));
        assert_eq!("Foo ", clean_jsx_text("Foo "));
        assert_eq!("Foo", clean_jsx_text("Foo\n"));
        assert_eq!("Foo ", clean_jsx_text("Foo\t"));
        assert_eq!("Foo", clean_jsx_text("Foo\t \n "));
        assert_eq!("Foo", clean_jsx_text("Foo\n \t \n \t\n"));
        assert_eq!("Foo Bar", clean_jsx_text("Foo\n \t\t\n \tBar"));
        assert_eq!(
            "Foo Bar",
            clean_jsx_text("\n \t\t\n \tFoo\n \t\t\n \tBar\n \t\t\n \t")
        );
    }
}

and there are probably other ways to do this. Looking forward to see what you come up with!

jeysal · 2022-11-02T11:42:27Z

@MichaReiser thanks for the tips! The solution I've started yesterday is roughly the iterator solution but I didn't impl Iterator, just wrote a

pub struct FormatNumberLiteral<'token> {
    token: &'token JsSyntaxToken,
    text: Cow<'token, str>,
}
impl Format<JsFormatContext> for FormatNumberLiteral<'_> {
    fn fmt(&self, f: &mut JsFormatter) -> FormatResult<()> {
}
impl<'token> FormatNumberLiteral<'token> {
    pub fn from_number_literal_token(token: &'token JsSyntaxToken) -> Self {
}

The loop is getting pretty complicated since there are quite a few cases. I'll set aside some more time later today and try to finish it.

MichaReiser added task A task, an action that needs to be performed A-Formatter Area: formatter labels Sep 30, 2022

MichaReiser added this to Rome 2022 Sep 30, 2022

MichaReiser changed the title ~~📎 Number literal separators~~ 📎 Formatter: Number literal separators Sep 30, 2022

MichaReiser mentioned this issue Oct 11, 2022

☂️ Formatter 10.0.0 #3378

Closed

6 tasks

MichaReiser mentioned this issue Oct 18, 2022

Normalize numeric literals #2442

Closed

github-actions bot added the S-Stale label Oct 25, 2022

MichaReiser added S-Wishlist Possible interesting features not on the current roadmap and removed S-Stale S-Wishlist Possible interesting features not on the current roadmap labels Oct 25, 2022

MichaReiser assigned jeysal Nov 2, 2022

jeysal mentioned this issue Nov 3, 2022

feat(rome_js_formatter): number literals, number literal types, bigint literal types #3554

Merged

MichaReiser closed this as completed in #3554 Nov 3, 2022

MichaReiser moved this to Done in Rome 2022 Nov 3, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

📎 Formatter: Number literal separators #3294

📎 Formatter: Number literal separators #3294

MichaReiser commented Sep 30, 2022

MichaReiser commented Oct 11, 2022

github-actions bot commented Oct 25, 2022

jeysal commented Nov 1, 2022

MichaReiser commented Nov 2, 2022 •

edited

Loading

jeysal commented Nov 2, 2022

📎 Formatter: Number literal separators #3294

📎 Formatter: Number literal separators #3294

Comments

MichaReiser commented Sep 30, 2022

Description

Test Cases

MichaReiser commented Oct 11, 2022

github-actions bot commented Oct 25, 2022

jeysal commented Nov 1, 2022

MichaReiser commented Nov 2, 2022 • edited Loading

jeysal commented Nov 2, 2022

MichaReiser commented Nov 2, 2022 •

edited

Loading