
[2; .txt, .js] Outputs entirety of generated file despite there being only two changes; tries to allocate 16.3 TiB #653

Closed
berrymot opened this issue Feb 28, 2024 · 7 comments

berrymot commented Feb 28, 2024

Buy one issue, get another free!

(1) A description of the issue. A screenshot is often helpful too.

This occurred when running git diff on this before committing.

[screenshot]

This continues for all 25,109 lines, excluding the empty final one.

It diffs data/data.txt fine, but then tries to allocate ~18 trillion bytes for the diff of data/jbo.js, probably at least partly because that file is 4.54 MiB squished onto one line. I admit I don't know what to do about this.

[screenshot]
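To check the "squished onto one line" hypothesis, a small shell sketch can report the longest line of a file in bytes. The helper name `longest_line_bytes` and the demo file `sample.js` are hypothetical; the demo builds a throwaway file rather than touching the real repository.

```shell
#!/bin/sh
# Hypothetical helper: print the length, in bytes, of the longest line in a file.
# LC_ALL=C makes awk count bytes rather than multibyte characters.
longest_line_bytes() {
  LC_ALL=C awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$1"
}

# Throwaway demo file: one short line, then one 5000-byte line of "x".
printf 'short\n%s\n' "$(head -c 5000 /dev/zero | tr '\0' 'x')" > sample.js

longest_line_bytes sample.js
```

Run against a file that really is one long line, as data/jbo.js is described here, the printed length would be expected to land close to the whole file size.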

(2) A copy of what you're diffing. If you're diffing files, include the before and after files. If you're using difftastic with a VCS repository (e.g. git), include the URL and commit hash.

Link above; hash is 2190a95d50cfe15039df80f53a310f66a40008e3.

(3) The version of difftastic you're using (see difft --version) and your operating system.

0.55, Windows 11

Wilfred (Owner) commented Feb 29, 2024

lojbo .ui (Lojban, roughly: "Lojban, yay!")

Thanks for the report.

I think the first issue is probably caused by diffing lines ending in \r\n against lines ending in just \n and concluding that they're different.
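A minimal shell sketch of that suspected first issue, assuming GNU diff: the same two lines written once with LF and once with CRLF endings compare as different byte-for-byte, while GNU diff's `--strip-trailing-cr` option makes them compare equal. The file names are hypothetical.

```shell
#!/bin/sh
# Same content, different line endings (file names are hypothetical).
printf 'one\ntwo\n' > lf.txt
printf 'one\r\ntwo\r\n' > crlf.txt

# Byte-for-byte, every line differs because of the trailing \r.
if diff lf.txt crlf.txt > /dev/null; then echo "plain diff: same"; else echo "plain diff: different"; fi

# Ignoring a trailing carriage return on each line, the files match.
if diff --strip-trailing-cr lf.txt crlf.txt > /dev/null; then echo "strip-cr diff: same"; else echo "strip-cr diff: different"; fi
```

This only illustrates the line-ending effect with plain diff; it says nothing about how difftastic itself treats \r.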

The second issue is probably due to the line-based diff trying to highlight changed words within the same line.
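To get a feel for why highlighting words inside one enormous line can blow up, here is a rough shell estimate. It assumes, purely for illustration, a naive comparison that fills an m×n table for the words of the old and new line; difftastic's actual algorithm is not described in this thread. `count_words` and the demo files are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count whitespace-separated words in a file.
count_words() {
  wc -w < "$1" | tr -d ' '
}

# Two single-line demo files standing in for the old and new one-line file.
printf 'a b c d e f g h\n' > old_line.js
printf 'a b c X e f g h i j\n' > new_line.js

m=$(count_words old_line.js)
n=$(count_words new_line.js)
# A naive m x n table grows with the product of the two word counts.
echo "table cells: $(( m * n ))"
```

The product is what hurts. As a worst case that treats every byte as a token, two lines of about 4.54 MiB (roughly 4,760,535 bytes) each give 4,760,535² ≈ 2.3 × 10^13 cells, about 20 TiB at one byte per cell: the same order of magnitude as the 16.3 TiB allocation in the title.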

Wilfred (Owner) commented Feb 29, 2024

I can repro the second issue, but not the first. Can you reproduce the first issue with local copies of the files? If so, could you attach them here?

Wilfred (Owner) commented Feb 29, 2024

OK, I can reproduce the first issue if I have two files that differ by line ending, but I'm not convinced that difftastic's behaviour is wrong here.

```
$ echo "one\ntwo" > a.txt
$ echo "one\nfoo\ntwo" > b.txt
$ unix2dos b.txt

$ difft a.txt b.txt
b.txt --- Text
1 one                             1 one
.                                 2 foo
2 two                             3 two

$ diff a.txt b.txt
1,2c1,3
< one
< two
---
> one
> foo
> two
```

Plain GNU diff also considers these files to be completely different. Do you have any special crlf settings in your Windows git setup?
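One way to answer the CRLF-settings question is to ask git directly. `git config --show-origin --get-all core.autocrlf` prints each `core.autocrlf` value with the config file it came from, and `git ls-files --eol` shows the line endings git sees in the index (`i/`) and the working tree (`w/`). The sketch below runs both in a throwaway repository with a hypothetical CRLF file so it is self-contained.

```shell
#!/bin/sh
set -e
# Throwaway repository (hypothetical name) so nothing real is touched.
rm -rf eol-demo
git init -q eol-demo
cd eol-demo

# Print every core.autocrlf value in effect and the config file it comes from.
# Exit status 1 only means the key is not set anywhere.
git config --show-origin --get-all core.autocrlf || echo "core.autocrlf is not set"

# Stage a CRLF file with conversion explicitly disabled, then ask git which
# line endings it sees in the index (i/) and the working tree (w/).
printf 'one\r\ntwo\r\n' > words.txt
git -c core.autocrlf=false add words.txt
git ls-files --eol words.txt
```

In a real checkout, running `git ls-files --eol` on the generated files would show whether they are stored as LF or CRLF; a mismatch between the committed and regenerated versions would support the line-ending theory.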

berrymot (Author) commented Feb 29, 2024

> I can repro the second issue, but not the first. Can you reproduce the first issue with local copies of the files? If so, could you attach them here?

Unfortunately I'll be away from my computer until next week, sorry. I don't think it's the line endings, though? I'm pretty sure allwords.txt has been CRLF all along. When I get back I'll try adding a test entry to the database, regenerating allwords.txt, and seeing whether this still happens.

As for

> Do you have any special crlf settings in your Windows git setup?

I don't remember ever messing with that; the only thing I know I've done to etc/config is make it use difftastic rather than the default diff, lol.

ni'o lu «lojbo .ui» li'u zo'u .ue xu do jbopre (Lojban, roughly: "About 'lojbo .ui': oh! Are you a Lojbanist?")

berrymot (Author) commented Mar 1, 2024

The test entry has been made (it happens to mean 'line terminator', lol). I'll rerun the parsing stuff when I can.

berrymot (Author) commented Mar 3, 2024

Reran the script with the new word.

allwords.txt still printed the entire file:
[screenshot]

data.txt still didn't:
[screenshot]

Replaced every \n with \r\n in the script, reran.

This fixed allwords.txt, but data.txt's changes were too big.
[screenshot]

Reverted the changes to those two, manually replaced each LF with CRLF, committed, reran.

[screenshot]

wheeeeeee
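What eventually worked here was making the generated files consistently CRLF. The generation script isn't shown, so as an illustration only, here is a shell sketch of a hypothetical `to_crlf` helper that converts a file to CRLF without doubling carriage returns on lines that already have one. `unix2dos`, used in the reproduction earlier in the thread, does the same job if it is installed.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a file so every line ends in CRLF.
# Any existing trailing \r is stripped first, so already-CRLF lines do not
# become \r\r\n and running the helper twice is harmless.
to_crlf() {
  awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Demo file with deliberately mixed endings: LF, CRLF, LF.
printf 'one\ntwo\r\nthree\n' > mixed.txt
to_crlf mixed.txt
to_crlf mixed.txt
od -c mixed.txt
```

Stripping before appending is the design choice that matters: a plain "replace \n with \r\n" pass, applied to a file that is already partly CRLF, would produce \r\r\n line endings.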

berrymot (Author) commented Mar 7, 2024

Updated to 0.56.1. It turns out printing such a giant diff is REALLY bad for my terminal, lol.

Wilfred closed this as completed in f52ca70 on Apr 9, 2024.