fix parsing CRLF files, part 2 #4
FYI @warpfork @rvagg. Before this fix, Windows with CRLF on ipld-prime failed:
because we'd try to dagjson decode the wrong bytes,
ooof, nasty. Trying to do it on a per-line basis may also have some interesting interactions with [...]. Maybe the safest fix is to do that global replace before the line split here too? https://github.com/rvagg/testmark.js/pull/3/files#diff-3a20575d9588a8fc87b3e20b46ac1213b1b38af7497cb9f0753160165ccb02c7
I'll leave the actual fix up to Eric; I'm not familiar with the architecture of this whole library. Having Patch do a "dos2unix" might not be a good idea, I'm not sure. My only intuition is that doing a search-and-replace on the whole file is somewhat unnecessary if we can just strip the CR from every line we read.
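For comparison, the two options discussed here (stripping the CR from each line as it is read, versus a global "dos2unix"-style replace before splitting) can be sketched roughly as below. This is only an illustration: `splitStripCR` and `normalizeThenSplit` are hypothetical names, not functions from this library, and the real parser may read lines differently.

```go
package main

import (
	"bytes"
	"fmt"
)

// splitStripCR (hypothetical) splits on LF, then strips at most one
// trailing CR from each line. CRs elsewhere in a line are left alone.
func splitStripCR(data []byte) [][]byte {
	lines := bytes.Split(data, []byte("\n"))
	for i, line := range lines {
		lines[i] = bytes.TrimSuffix(line, []byte("\r"))
	}
	return lines
}

// normalizeThenSplit (hypothetical) rewrites every CRLF pair to LF across
// the whole input first, then splits on LF. This copies the entire input.
func normalizeThenSplit(data []byte) [][]byte {
	normalized := bytes.ReplaceAll(data, []byte("\r\n"), []byte("\n"))
	return bytes.Split(normalized, []byte("\n"))
}

func main() {
	input := []byte("alpha\r\nbeta\r\n")
	fmt.Printf("%q\n", splitStripCR(input))
	fmt.Printf("%q\n", normalizeThenSplit(input))
}
```

For line-oriented input like this, both approaches yield the same lines; the per-line version avoids copying the whole file, while the global version keeps the rest of the parser unaware of CRs.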
I think this probably looks right, but I'm nervous enough about it that I'm going to backfill some tests onto the master branch first, and then I might ask you to rebase so we can make sure those don't break. I don't recall if our last round of hacks for dealing with [...] FWIW, I'm pretty sure
(((Opinions incoming: My holistic approach to this is grounded in "if you're using files with [...]

That's not just blind Windows hostility. I developed on Windows for longer in my life than I care to admit, and even a decade ago I had already switched everything in my environment on Windows to use LF-only. It's just the least wrong approach. Once you start letting tools put CRs back into things (as git will do on checkouts!), all sanity has gone out the window and the only actions remaining are those of damage minimization.

When it comes to damage minimization, I'm willing to merge PRs with that as a goal. (Especially for the read path.) It's just even more important to hew to "simplest thing that works" with damage minimization code than it is with any other code. End of Opinions.)))
I think that's the same approach all other languages take. For example, the Go parser accepts CRLF (it treats CR as whitespace), but when printing/formatting via e.g. gofmt, it just uses LF newlines.
There are now some new tests on the tip of master. I had to put a
I hadn't noticed we also used len(line) to work out offsets. Since those are offsets into the input file, they must use the original line length before trimming.
Done. The parse works now; the test just fails later on, in its sanity checks. I'll leave that to you as I'm not sure what should happen.
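The offset point above can be illustrated with a small sketch. It assumes a scanner that records, for each line, where its content sits in the original input; `lineSpan` and `scanLines` are hypothetical names for illustration, not this library's API. The key detail is that the running offset advances by the untrimmed length (including CR and LF), while the CR is only dropped from the reported content.

```go
package main

import (
	"bytes"
	"fmt"
)

// lineSpan (hypothetical) holds byte offsets into the original input.
// End excludes the line terminator (LF or CRLF).
type lineSpan struct {
	Start, End int
}

// scanLines (hypothetical) walks the input line by line.
func scanLines(data []byte) []lineSpan {
	var spans []lineSpan
	offset := 0
	for offset < len(data) {
		rest := data[offset:]
		rawLen := len(rest)     // bytes consumed, including the terminator
		contentLen := len(rest) // bytes before the LF, if there is one
		if n := bytes.IndexByte(rest, '\n'); n >= 0 {
			rawLen = n + 1
			contentLen = n
		}
		// Trim the CR only for the reported content.
		line := bytes.TrimSuffix(rest[:contentLen], []byte("\r"))
		spans = append(spans, lineSpan{Start: offset, End: offset + len(line)})
		// Advance by the ORIGINAL length; using len(line)+1 here would
		// drift by one byte per CRLF line.
		offset += rawLen
	}
	return spans
}

func main() {
	data := []byte("ab\r\ncd\r\n")
	for _, s := range scanLines(data) {
		fmt.Printf("%d..%d %q\n", s.Start, s.End, data[s.Start:s.End])
	}
}
```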
We can choose between having something that's slightly closer to "binary safe" (but breaks when used on Windows with the default git checkout config), or having something that isn't "binary safe" but actually works in extremely common scenarios. Let's do the latter.

... ugh, this means not being able to just return the whole sub-slice. ..... UGH, it means we even have to look for those bytes to decide whether we can return a subslice or not, wasting time even for non-Windows users who are basically never going to have this problem. (Minuscule time, in practice, but, just, ow.) Despite what I said a minute ago about not being hostile to Windows, my emotional state is.... turning. 😠
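The trade-off described above (return the original sub-slice when possible, copy only when CR bytes are present) might look roughly like the sketch below. `normalizeBody` is a hypothetical helper, not the function the library actually uses, and it assumes CRs only matter as part of CRLF pairs.

```go
package main

import (
	"bytes"
	"fmt"
)

// normalizeBody (hypothetical) returns body with CRLF pairs rewritten to LF.
// When no CR is present it returns the input slice itself, so the common
// LF-only case costs one scan and no allocation.
func normalizeBody(body []byte) []byte {
	if bytes.IndexByte(body, '\r') < 0 {
		return body // still a sub-slice of the original input
	}
	// A CR was found, so we can no longer alias the input: build a copy.
	return bytes.ReplaceAll(body, []byte("\r\n"), []byte("\n"))
}

func main() {
	fmt.Printf("%q\n", normalizeBody([]byte("plain\nbody\n")))
	fmt.Printf("%q\n", normalizeBody([]byte("dos\r\nbody\r\n")))
}
```

The scan is the "minuscule" cost mentioned above: every caller pays for one pass over the bytes, but only inputs that contain a CR pay for the copy.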
This is a plaintext file format for test files. I don't think stripping certain whitespace characters is extraordinary :)
Friendly nudge. I don't have strong opinions here, but we should have a fix :) I'm happy to change the implementation here if Eric wants me to. I'm blocked on continuing work on portable CI for ipld-prime until then.
Seems fine to me. I'd still prefer just stripping them entirely and globally before entering processing, but this seems to have roughly the same effect in the end.
Yeah, okay. I'll merge this, and then I'll do the remaining normalization patches to be able to remove the skip from the tests on master shortly. Here we go.
(see commit message)