-
Notifications
You must be signed in to change notification settings - Fork 856
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Allow carriage return as valid newline sequence #837
Comments
If I interpret the I think this should be clarified in the text spec though. Currently the text spec explicitly mentions that unescaped CRs can be used in MLBs (with no mention of a following LF being required). |
You're right that clarification is worthwhile. But to be clear, there's no such thing as an "escaped" carriage return in TOML. There's a limited set of characters that can be escaped (with a backslash) in basic strings, and carriage returns aren't among them. (Even inside escaped end-of-line whitespace in MLBs, carriage returns are parts of newline sequences.) And other representations of carriage returns like |
Thanks for the correction. FWIW, I currently think it may be best to reject this proposal in favor of the clarifying PR #838 |
Hm shouldn't TOML use the the Unicode line breaking algorithm since documents are utf-8 encoded? |
@ethanhs Based on quick glance, it seems that the algorithm is an annex that is not part of the core spec. It also seems to do a slightly different thing than what we want, specifying mandatory and optional break opportunities for text display. E.g. how web browsers wrap text at space characters. |
Happy to accept a PR adding this. |
TOML is a format for general config. In currently computer world, Windows use CRLF, Linux and Mac use LF; CR only existed in history (old iOS). So I think keep EOL as CRLF/LF is good. If all newlines are considered, they will be too many:
|
I have some sympathy for @LongTengDao's viewpoint, but at the other hand I think: If it works for CommonMark, it can work for us too! The alternative would be fragmentation, since it doesn't sound wise to require all implementations to reject lonely CR's as errors. Hence, if we don't clarify this, some will continue to accept and others to reject them. Not an ideal state of affairs. |
Note that I didn't request this because "I want all newlines" 😄, but mostly because the Markdown spec seems to need either this or the clarifying PR to fix the first struck-through issue item in the original post. I'm also slightly in favor of the clarifying PR and leaving lone CR in the history books. The nice thing is that it is a clarification, not a spec change, so less disturbing for implementations. E.g. I don't feel strongly at all however, as long as one of the two proposed changes happen. |
Also, from @LongTengDao's comment one can see that both JS and CSS accept lonely CR's as linebreaks, so it seems we're heading in the right direction. (I don't see a reason to allow additional linebreak characters, though. Let's keep it simple, like CommonMark does.). |
As far as I can tell, lone CRs are accepted in most places (think: browsers, editors, language interpreters/compilers etc) as a valid newline character sequence. I don't think I'd be expanding the specification to allow other characters. |
CR was used in the old pre-OSX MacOS and some other long-obsolete systems. Systems and standards that go back to the 80s and 90s support it because of that, but there's really no practical reason to add it in 2021. |
In addition to git's lack of support for lone CRs, also the Golang compiler seems to completely refuse to parse them. So even if we stick to just LF and CRLF, it seems we're not an outlier among other modern projects. |
I don't think lone
Modern things should not repeat the mistakes of predecessorsIdeally, EOL should be a uniform thing in the electronic world. The first people who made computers created so many kinds of EOL, perhaps for laziness, perhaps for business competition. Anyway, the past decades have been a big lesson. Nowadays, only Adding dying things to new specs is not wise. "Lone
|
@Lemmingh 's comment makes a lot of sense. While I don't have strong opinion on this whole issue, I agree that the rationale for the suggested change seems pretty week indeed. Which sane (or even insane) editor people might use these days to edit TOML files would use CR to write newlines? Probably none. So, it seems there is no problem and hence no need for a solution/change. |
Given the last few comments were over 2 months ago, is there an official verdict on this yet? |
Not yet, evidently. But considering the last two comments and the number of 👍 on them, I guess the proposal will be rejected. |
I too see no need for this change.
Just to invalidate the last of your reasons (after 1 and 2a were already retracted): The newline parsing is a feature of python and not an implementation error in the toml libraries. If you do not want that normalization you are free to open the file in byte-mode. The libraries which I have tested will correctly reject files containing a lone |
Well, looks like we have concensus within the broader group and... well, I don't care strongly enough to push against that. Rejecting this proposal, although I do appreciate everyone who has pitched in on these discussions! ^>^ |
Related to some of the discussion in #835
I'd like to propose adding CR (carriage return) character that is not followed by an LF character to the list of allowed newline sequences. This is how e.g. CommonMark defines line ending.
The reasoning for the change is:
Avoid roundtrip instability when an LF is prefixed by more than one CR characters.EDIT: Not an issue as per.abnf
In Python (perhaps other languages?) the standard way of opening files supports something called Universal Newline Support, meaning that LF, CR, and CRLF are all normalised to LF. This normalisation should NOT be used with the current TOML spec because
CR characters in strings will be converted to LFs when not allowed to do so.EDIT: Not an issue as per.abnf
As far as I know, all popular Python parsers and their documented API make the mistake of doing this incorrect newline normalisation.
Now for point 2, a valid counter-argument is to "just fix all Python parsers". 😄 However, perhaps other languages do similar normalisation by default or otherwise we agree that a spec change makes more sense here instead.
The text was updated successfully, but these errors were encountered: