Option to strip trailing CR #344

Merged
merged 6 commits into from
Dec 27, 2023

Conversation

oBusk
Contributor

@oBusk oBusk commented Feb 5, 2022

Simple idea/suggestion.

Fixes #343, #275

See #343 for in-depth explanation.

src/diff/line.js Outdated
Comment on lines 7 to 8
// remove all CR (\r) characters from the input string
value = value.replace(/\r+\n/g, '\n');
Collaborator

@ExplodingCabbage ExplodingCabbage Dec 18, 2023

The comment, code, and analogy to GNU diff's --strip-trailing-cr option have a three-way disagreement about what the behaviour here is meant to be:

  • --strip-trailing-cr only removes at most ONE \r before an \n, and so by choosing that name we imply we've got the same behaviour
  • the actual code here removes a sequence of ANY number of consecutive \r characters before a \n
  • the comment says that all \r characters will be removed from ANYWHERE in the string, even if they don't appear before a \n

We should tweak the behaviour to be consistent with the GNU diff option whose name we're copying, and we should make the comment accurately reflect the behaviour.
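To make the three behaviours listed above concrete, here is a minimal sketch that applies each one to the same string. The sample input and variable names are made up for illustration, and the first regex is just one way to get the "at most one `\r`" behaviour described in the comment above, not necessarily what the PR ended up using.

```javascript
// Sample input: a CRLF line, a line ending in CR CR LF, and a lone CR mid-line.
const input = 'a\r\nb\r\r\nc\rd\n';

// 1. --strip-trailing-cr as described above: drop one CR directly before each LF.
const stripOne = input.replace(/\r\n/g, '\n');

// 2. The regex under review: drop a whole run of CRs directly before each LF.
const stripRun = input.replace(/\r+\n/g, '\n');

// 3. What the code comment claims: drop every CR, wherever it appears.
const stripAll = input.replace(/\r/g, '');

console.log(JSON.stringify(stripOne)); // "a\nb\r\nc\rd\n"
console.log(JSON.stringify(stripRun)); // "a\nb\nc\rd\n"
console.log(JSON.stringify(stripAll)); // "a\nb\ncd\n"
```

The three results only diverge on inputs with doubled or stray `\r` characters, which is why the mismatch is easy to miss with ordinary CRLF files.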

Contributor Author

@oBusk oBusk Dec 22, 2023

That's a good point! Have pushed an update!

README.md Outdated
@@ -37,6 +37,8 @@ npm install diff --save

Options
* `ignoreWhitespace`: `true` to ignore leading and trailing whitespace. This is the same as `diffTrimmedLines`
* `stripTrailingCr`: `true` to remove all trailing CR (`\r`) characters before performing the diff.
This helps to get a useful diff when comparing files from UNIX/Windows respectively.
Collaborator

English nit: "respectively" here doesn't make sense.

Suggested wording:

This helps to get a useful diff when diffing UNIX text files against Windows text files.

@ExplodingCabbage
Collaborator

I am broadly in favour of this change; it's useful and consistent with the precedent provided by diff. Just have some nits. I'll polish it up in due course if @oBusk doesn't get to it first!

    // remove all CR (\r) characters from the input string
    value = value.replace(/\r+\n/g, '\n');
  }

  let retLines = [],
      linesAndNewlines = value.split(/(\n|\r\n)/);
Collaborator

This should probably just be value.split("\n") after this change, right?

Contributor Author

Well only if stripTrailingCr===true, to not be breaking. But even in that case, I'm not sure I understand the benefit?

Collaborator

@ExplodingCabbage ExplodingCabbage Dec 23, 2023

Ah, I was getting confused by #275 (comment). After reading that issue, but without testing anything to confirm, I had wrongly believed that because \n appears first in the pipe-separated list of alternatives in /(\n|\r\n)/, it would always match in preference to \r\n, making the entire regex equivalent to just /(\n)/, and that the logic using that regex had therefore always been fundamentally broken. So with my comment above I thought I was just proposing to clean up some misleading code that never really worked.

But some quick experimentation suggests that I'm wrong, at least in Node, Chrome, and Firefox!

> "foo\r\nbar\r\nbaz\r\n".split(/(\n|\r\n)/)
[
  'foo', '\r\n',
  'bar', '\r\n',
  'baz', '\r\n',
  ''
]

I'm not sure, then, why @cctakaso thought, in #275, that reordering the \r\n and \n in the regex would fix anything. Is it possible there is some JavaScript environment out there, that @cctakaso was using, with a regex engine where the order of alternatives really does affect the result in the way that issue suggests? I would've thought this would be standardised and all implementations would follow the standard, but tbh I don't want to spend hours unpicking the meaning of the ECMAScript 3 standard where regexes got added to JavaScript to confirm that the behaviour here has always been unambiguously specified!

My new suggestion, then, is that I'd be in favour of reversing the order as @cctakaso suggested, to /(\r\n|\n)/, just to make absolutely sure there's no ambiguity and avoid any need to research this - even though I'm now like 85% sure the order doesn't matter and this won't actually have any effect!
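For what it's worth, the experiment above can be extended into a small sketch comparing both orders directly. The likely explanation is that the order of alternatives only breaks ties between matches starting at the same position, and `\n` can never match at a position holding `\r`. The second pair of regexes is a made-up contrast case (not from the library) where both alternatives can match at the same position, so the order does change the result.

```javascript
const text = 'foo\r\nbar\nbaz\r\n';

// Both orders split identically: at a '\r', only the '\r\n' alternative can match.
const lfFirst = text.split(/(\n|\r\n)/);
const crlfFirst = text.split(/(\r\n|\n)/);

console.log(lfFirst); // [ 'foo', '\r\n', 'bar', '\n', 'baz', '\r\n', '' ]
console.log(JSON.stringify(lfFirst) === JSON.stringify(crlfFirst)); // true

// Contrast: here both alternatives can match starting at the '\r',
// so whichever is listed first wins.
const crFirst = 'a\r\nb'.split(/(\r|\r\n)/);
const crlfBeforeCr = 'a\r\nb'.split(/(\r\n|\r)/);

console.log(crFirst);      // [ 'a', '\r', '\nb' ]
console.log(crlfBeforeCr); // [ 'a', '\r\n', 'b' ]
```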

Contributor Author

Yeah, the problem #275 presents will be fixed by the option, but the solution they suggested wouldn't change much, since the array that the regex produces would still contain all newlines as separate elements, and '\n' !== '\r\n'.
I think there's just a misunderstanding and we shouldn't change the code for it.
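A rough sketch of that point, assuming a simplified tokenizer: `tokenizeLines` is a hypothetical helper loosely modelled on the split shown earlier in this thread, not the library's actual code. Whichever way the alternatives are ordered, a line ending in `\r\n` is a different string from the same line ending in `\n`, so only normalising the input makes the two files compare equal.

```javascript
// Hypothetical helper: split into [text, newline, text, newline, ..., tail]
// and glue each line back together with its own newline.
function tokenizeLines(value, splitter) {
  const parts = value.split(splitter);
  const lines = [];
  for (let i = 0; i < parts.length; i += 2) {
    const line = parts[i] + (parts[i + 1] ?? '');
    if (line !== '') lines.push(line); // skip the empty tail after a final newline
  }
  return lines;
}

const unix = 'foo\nbar\n';
const windows = 'foo\r\nbar\r\n';

// Either alternation order gives the same tokens, and they never match across files.
for (const splitter of [/(\n|\r\n)/, /(\r\n|\n)/]) {
  const a = tokenizeLines(unix, splitter);    // [ 'foo\n', 'bar\n' ]
  const b = tokenizeLines(windows, splitter); // [ 'foo\r\n', 'bar\r\n' ]
  console.log(a.some((line, i) => line === b[i])); // false
}

// Stripping one CR before each LF first (an assumed form of the option) fixes it.
const stripped = tokenizeLines(windows.replace(/\r\n/g, '\n'), /(\n|\r\n)/);
const unixLines = tokenizeLines(unix, /(\n|\r\n)/);
console.log(JSON.stringify(stripped) === JSON.stringify(unixLines)); // true
```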

Collaborator

@ExplodingCabbage ExplodingCabbage left a comment

LGTM. Thank you for the contribution and for coming back to tweak & comment & set me straight through multiple rounds of review! :)

Development

Successfully merging this pull request may close these issues.

Option to strip/ignore cr at end of line(?)
2 participants