-
Notifications
You must be signed in to change notification settings - Fork 660
📎 Formatter: Number literal separators #3294
Comments
@xunilrj would this be a formatter improvement you're interested in tackling? I know you're good at high performance string manipulation :) |
This issue is stale because it has been open 14 days with no activity. |
Hi! I wanted to get to know the Rome codebase a bit so I thought I'd look at a formatter issue and then a lint rule, or something like that. I've been trying this one out and made some progress so I think I'll be able to submit a PR for it soon. From what I understand (from the comments on #2442), we can't use regex like Prettier does (not sure why, binary size maybe? Or because running multiple regex replaces in sequence is too inefficient?), so I'm hand-writing the string manipulation that Prettier does with the regexes, and because you mentioned high performance I'm trying to make sure it
Feels a little bit like I'm optimizing prematurely given that number literals are rarely long, but it's been kind of fun so whatever 😅 Let me know if I need to know anything else :) |
That's awesome that you're enjoying it. We aren't using RegEx because it is a rather large crate and we have so far been able to implement similar replacement logic without. You can find a similar "normalisation" in tools/crates/rome_js_formatter/src/utils/string_utils.rs Lines 336 to 361 in 5e45610
The main idea is to return a An alternative solution would be to implement an iterator that returns the "cleaned" text and then simply compare the text one by one. Here a rather complicated solution for handling JSX: use std::borrow::Cow;
use std::iter::Peekable;
use std::str::Chars;
pub fn clean_jsx_text(input: &str) -> Cow<str> {
let mut cleaned = CleanJsxWhitespace::new(input);
let mut actual = input.chars();
let mut pos = 0;
loop {
let next_cleaned = cleaned.next();
let next_actual = actual.next();
match (next_cleaned, next_actual) {
(Some(cleaned_char), Some(actual_char)) => {
if cleaned_char == actual_char {
// all good
pos += cleaned_char.len_utf8();
} else {
// difference, copy what we have so far, then take everything from cleaned
let mut result = String::from(&input[..pos]);
result.push(cleaned_char);
for cleaned_char in cleaned {
result.push(cleaned_char);
}
return Cow::Owned(result);
}
}
(None, Some(_)) => {
// anything following in actual now can only be truncated away text
return Cow::Borrowed(&input[..pos]);
}
(Some(_), None) => {
// This should never happen because cleaned only removes
unreachable!()
}
(None, None) => return Cow::Borrowed(input),
}
}
}
#[derive(Debug)]
struct CleanJsxWhitespace<'a> {
inner: Peekable<Chars<'a>>,
/// Leading/trailing has the same definition as in the Formatter/Parser
/// trailing: Any whitespace following a character up to, but not including a new line
/// leading: Any whitespace before the next token, including all line breaks before the last token.
/// Flag set to true as soon as the iterator starts processing trailing trivia (
is_trailing_whitespace: bool,
/// Tracks if there has been any trailing whitespace
has_trailing_whitespace: bool,
/// Tracks if there has been a new line
has_leading_newline: bool,
/// Tracks if this is the first line
first_line: bool,
}
impl<'a> CleanJsxWhitespace<'a> {
pub fn new(input: &'a str) -> Self {
Self {
inner: input.chars().peekable(),
has_trailing_whitespace: false,
has_leading_newline: false,
is_trailing_whitespace: false,
first_line: true,
}
}
}
// * First Line:
// * Renders single space even if the text is all empty
// * Render single space before first character.
impl Iterator for CleanJsxWhitespace<'_> {
type Item = char;
fn next(&mut self) -> Option<Self::Item> {
loop {
if let Some(next) = self.inner.peek() {
match next {
' ' | '\t' => {
// Eat over space
self.inner.next();
if self.first_line {
match self.inner.peek() {
// `<> </>` -> `<> </>`
None => {
return Some(' ');
}
// Collapse succeeding whitespace
Some(' ' | '\t' | '\n' | '\r') => {}
// `<> a<>` -> `<> a</>`
Some(_) => {
return Some(' ');
}
}
}
// Track all trailing trivia
if self.is_trailing_whitespace {
self.has_trailing_whitespace = true;
}
}
'\n' | '\r' => {
// Only track the newline if it has been after a non-whitespace character.
// (e.g. don't track newlines if the text so far has been empty)
if self.is_trailing_whitespace {
self.has_leading_newline = true;
}
self.has_trailing_whitespace = false;
self.is_trailing_whitespace = false;
self.first_line = false;
// Skip new line character
self.inner.next();
}
_ => {
// In case there's any whitespace or newline between two characters, print one space
// `<>a a</>` -> `<>a a</>`
if self.has_trailing_whitespace || self.has_leading_newline {
self.has_trailing_whitespace = false;
self.has_leading_newline = false;
return Some(' ');
}
self.is_trailing_whitespace = true;
return self.inner.next();
}
}
} else {
// `<>a\n b </>` -> `<>a b </>`
if self.has_trailing_whitespace {
self.has_trailing_whitespace = false;
return Some(' ');
}
return None;
}
}
}
}
#[cfg(test)]
mod tests {
use crate::clean_jsx_text;
#[test]
fn clean_jsx_text_works() {
assert_eq!("", clean_jsx_text(""));
assert_eq!(" ", clean_jsx_text(" "));
assert_eq!("Foo", clean_jsx_text("Foo"));
assert_eq!(" Foo", clean_jsx_text(" Foo"));
assert_eq!("Foo", clean_jsx_text("\nFoo"));
assert_eq!(" Foo", clean_jsx_text("\tFoo"));
assert_eq!("Foo", clean_jsx_text("\n \t Foo"));
assert_eq!("Foo", clean_jsx_text("\n \t \n \t\nFoo"));
assert_eq!(" Foo bar lorem", clean_jsx_text(" Foo bar lorem"));
assert_eq!("Foo ", clean_jsx_text("Foo "));
assert_eq!("Foo", clean_jsx_text("Foo\n"));
assert_eq!("Foo ", clean_jsx_text("Foo\t"));
assert_eq!("Foo", clean_jsx_text("Foo\t \n "));
assert_eq!("Foo", clean_jsx_text("Foo\n \t \n \t\n"));
assert_eq!("Foo Bar", clean_jsx_text("Foo\n \t\t\n \tBar"));
assert_eq!(
"Foo Bar",
clean_jsx_text("\n \t\t\n \tFoo\n \t\t\n \tBar\n \t\t\n \t")
);
}
} and there are probably other ways to do this. Looking forward to see what you come up with! |
@MichaReiser thanks for the tips! The solution I've started yesterday is roughly the iterator solution but I didn't pub struct FormatNumberLiteral<'token> {
token: &'token JsSyntaxToken,
text: Cow<'token, str>,
}
impl Format<JsFormatContext> for FormatNumberLiteral<'_> {
fn fmt(&self, f: &mut JsFormatter) -> FormatResult<()> {
}
impl<'token> FormatNumberLiteral<'token> {
pub fn from_number_literal_token(token: &'token JsSyntaxToken) -> Self {
} The loop is getting pretty complicated since there are quite a few cases. I'll set aside some more time later today and try to finish it. |
Description
Consistently format separators in number literals example
Test Cases
js/literal-numeric-separator/test.js
js/literal/number.js
The text was updated successfully, but these errors were encountered: