Skip to content
This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

📎 Formatter: Number literal separators #3294

Closed
Tracked by #3378
MichaReiser opened this issue Sep 30, 2022 · 5 comments · Fixed by #3554
Closed
Tracked by #3378

📎 Formatter: Number literal separators #3294

MichaReiser opened this issue Sep 30, 2022 · 5 comments · Fixed by #3554
Assignees
Labels
A-Formatter Area: formatter S-Wishlist Possible interesting features not on the current roadmap task A task, an action that needs to be performed

Comments

@MichaReiser
Copy link
Contributor

Description

Consistently format separators in number literals example

Test Cases

  • js/literal-numeric-separator/test.js
  • js/literal/number.js
@MichaReiser MichaReiser added task A task, an action that needs to be performed A-Formatter Area: formatter labels Sep 30, 2022
@MichaReiser MichaReiser changed the title 📎 Number literal separators 📎 Formatter: Number literal separators Sep 30, 2022
@MichaReiser
Copy link
Contributor Author

@xunilrj would this be a formatter improvement you're interested in tackling? I know you're good at high performance string manipulation :)

@github-actions
Copy link

This issue is stale because it has been open 14 days with no activity.

@MichaReiser MichaReiser added S-Wishlist Possible interesting features not on the current roadmap and removed S-Stale S-Wishlist Possible interesting features not on the current roadmap labels Oct 25, 2022
@jeysal
Copy link
Contributor

jeysal commented Nov 1, 2022

Hi! I wanted to get to know the Rome codebase a bit so I thought I'd look at a formatter issue and then a lint rule, or something like that. I've been trying this one out and made some progress so I think I'll be able to submit a PR for it soon.

From what I understand (from the comments on #2442), we can't use regex like Prettier does (not sure why, binary size maybe? Or because running multiple regex replaces in sequence is too inefficient?), so I'm hand-writing the string manipulation that Prettier does with the regexes, and because you mentioned high performance I'm trying to make sure it

  1. only clones if the number literal actually needs changing,
  2. doesn't do multiple passes through the number literal, and
  3. doesn't do insertions or deletions in the middle of a string or anything like that.

Feels a little bit like I'm optimizing prematurely given that number literals are rarely long, but it's been kind of fun so whatever 😅

Let me know if I need to know anything else :)

@MichaReiser
Copy link
Contributor Author

MichaReiser commented Nov 2, 2022

Feels a little bit like I'm optimizing prematurely given that number literals are rarely long, but it's been kind of fun so whatever 😅

That's awesome that you're enjoying it.

We aren't using RegEx because it is a rather large crate and we have so far been able to implement similar replacement logic without.

You can find a similar "normalisation" in

fn normalise_string_literal(&self, string_information: StringInformation) -> Cow<'token, str> {
let preferred_quote = string_information.preferred_quote;
let polished_raw_content = self.normalize_string(&string_information);
match polished_raw_content {
Cow::Borrowed(raw_content) => {
let final_content = self.swap_quotes(raw_content, &string_information);
match final_content {
Cow::Borrowed(final_content) => Cow::Borrowed(final_content),
Cow::Owned(final_content) => Cow::Owned(final_content),
}
}
Cow::Owned(s) => {
// content is owned, meaning we allocated a new string,
// so we force replacing quotes, regardless
let final_content = std::format!(
"{}{}{}",
preferred_quote.as_char(),
s.as_str(),
preferred_quote.as_char()
);
Cow::Owned(final_content)
}
}
}

The main idea is to return a Cow::Borrowed if the original string doesn't change and only allocate a new string if the string differs.

An alternative solution would be to implement an iterator that returns the "cleaned" text and then simply compare the text one by one. Here a rather complicated solution for handling JSX:

use std::borrow::Cow;
use std::iter::Peekable;
use std::str::Chars;


pub fn clean_jsx_text(input: &str) -> Cow<str> {
    let mut cleaned = CleanJsxWhitespace::new(input);
    let mut actual = input.chars();

    let mut pos = 0;

    loop {
        let next_cleaned = cleaned.next();
        let next_actual = actual.next();

        match (next_cleaned, next_actual) {
            (Some(cleaned_char), Some(actual_char)) => {
                if cleaned_char == actual_char {
                    // all good
                    pos += cleaned_char.len_utf8();
                } else {
                    // difference, copy what we have so far, then take everything from cleaned
                    let mut result = String::from(&input[..pos]);

                    result.push(cleaned_char);

                    for cleaned_char in cleaned {
                        result.push(cleaned_char);
                    }

                    return Cow::Owned(result);
                }
            }
            (None, Some(_)) => {
                // anything following in actual now can only be truncated away text
                return Cow::Borrowed(&input[..pos]);
            }
            (Some(_), None) => {
                // This should never happen because cleaned only removes
                unreachable!()
            }
            (None, None) => return Cow::Borrowed(input),
        }
    }
}

#[derive(Debug)]
struct CleanJsxWhitespace<'a> {
    inner: Peekable<Chars<'a>>,
    /// Leading/trailing has the same definition as in the Formatter/Parser
    /// trailing: Any whitespace following a character up to, but not including a new line
    /// leading: Any whitespace before the next token, including all line breaks before the last token.
    /// Flag set to true as soon as the iterator starts processing trailing trivia (
    is_trailing_whitespace: bool,

    /// Tracks if there has been any trailing whitespace
    has_trailing_whitespace: bool,

    /// Tracks if there has been a new line
    has_leading_newline: bool,

    /// Tracks if this is the first line
    first_line: bool,
}

impl<'a> CleanJsxWhitespace<'a> {
    pub fn new(input: &'a str) -> Self {
        Self {
            inner: input.chars().peekable(),
            has_trailing_whitespace: false,
            has_leading_newline: false,
            is_trailing_whitespace: false,
            first_line: true,
        }
    }
}

// * First Line:
//      * Renders single space even if the text is all empty
//      * Render single space before first character.

impl Iterator for CleanJsxWhitespace<'_> {
    type Item = char;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            if let Some(next) = self.inner.peek() {
                match next {
                    ' ' | '\t' => {
                        // Eat over space
                        self.inner.next();

                        if self.first_line {
                            match self.inner.peek() {
                                // `<>    </>` -> `<> </>`
                                None => {
                                    return Some(' ');
                                }
                                // Collapse succeeding whitespace
                                Some(' ' | '\t' | '\n' | '\r') => {}
                                // `<>  a<>` -> `<> a</>`
                                Some(_) => {
                                    return Some(' ');
                                }
                            }
                        }

                        // Track all trailing trivia
                        if self.is_trailing_whitespace {
                            self.has_trailing_whitespace = true;
                        }
                    }

                    '\n' | '\r' => {
                        // Only track the newline if it has been after a non-whitespace character.
                        // (e.g. don't track newlines if the text so far has been empty)
                        if self.is_trailing_whitespace {
                            self.has_leading_newline = true;
                        }

                        self.has_trailing_whitespace = false;
                        self.is_trailing_whitespace = false;
                        self.first_line = false;

                        // Skip new line character
                        self.inner.next();
                    }
                    _ => {
                        // In case there's any whitespace or newline between two characters, print one space
                        // `<>a   a</>` -> `<>a a</>`
                        if self.has_trailing_whitespace || self.has_leading_newline {
                            self.has_trailing_whitespace = false;
                            self.has_leading_newline = false;

                            return Some(' ');
                        }

                        self.is_trailing_whitespace = true;

                        return self.inner.next();
                    }
                }
            } else {
                // `<>a\n b </>` -> `<>a b </>`
                if self.has_trailing_whitespace {
                    self.has_trailing_whitespace = false;
                    return Some(' ');
                }

                return None;
            }
        }
    }
}

#[cfg(test)]
mod tests {
    use crate::clean_jsx_text;

    #[test]
    fn clean_jsx_text_works() {
        assert_eq!("", clean_jsx_text(""));
        assert_eq!(" ", clean_jsx_text(" "));
        assert_eq!("Foo", clean_jsx_text("Foo"));
        assert_eq!(" Foo", clean_jsx_text(" Foo"));
        assert_eq!("Foo", clean_jsx_text("\nFoo"));
        assert_eq!(" Foo", clean_jsx_text("\tFoo"));
        assert_eq!("Foo", clean_jsx_text("\n \t Foo"));
        assert_eq!("Foo", clean_jsx_text("\n \t \n \t\nFoo"));
        assert_eq!(" Foo bar lorem", clean_jsx_text(" Foo bar lorem"));
        assert_eq!("Foo ", clean_jsx_text("Foo "));
        assert_eq!("Foo", clean_jsx_text("Foo\n"));
        assert_eq!("Foo ", clean_jsx_text("Foo\t"));
        assert_eq!("Foo", clean_jsx_text("Foo\t \n "));
        assert_eq!("Foo", clean_jsx_text("Foo\n \t \n \t\n"));
        assert_eq!("Foo Bar", clean_jsx_text("Foo\n \t\t\n \tBar"));
        assert_eq!(
            "Foo Bar",
            clean_jsx_text("\n \t\t\n \tFoo\n \t\t\n \tBar\n \t\t\n \t")
        );
    }
}

and there are probably other ways to do this. Looking forward to see what you come up with!

@jeysal
Copy link
Contributor

jeysal commented Nov 2, 2022

@MichaReiser thanks for the tips! The solution I've started yesterday is roughly the iterator solution but I didn't impl Iterator, just wrote a

pub struct FormatNumberLiteral<'token> {
    token: &'token JsSyntaxToken,
    text: Cow<'token, str>,
}
impl Format<JsFormatContext> for FormatNumberLiteral<'_> {
    fn fmt(&self, f: &mut JsFormatter) -> FormatResult<()> {
}
impl<'token> FormatNumberLiteral<'token> {
    pub fn from_number_literal_token(token: &'token JsSyntaxToken) -> Self {
}

The loop is getting pretty complicated since there are quite a few cases. I'll set aside some more time later today and try to finish it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Formatter Area: formatter S-Wishlist Possible interesting features not on the current roadmap task A task, an action that needs to be performed
Projects
Status: Done
2 participants