Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: segfault on missing final newline #301

Closed
leahneukirchen opened this issue Jan 20, 2020 · 11 comments
Closed

Bug: segfault on missing final newline #301

leahneukirchen opened this issue Jan 20, 2020 · 11 comments
Labels

Comments

@leahneukirchen
Copy link

--xcat breaks when the input file does not contain a trailing newline.

% echo -n foo >foo
% mlr --xtab cat foo
zsh: segmentation fault  mlr --xtab cat foo

Sometimes, it doesn't segfault but print something like

malloc(18446744072608038873) failed.

Note that this doesn't appear when stdin is used.
--no-mmap works too, so this may be a regression of #29.

Reproducible with Miller 5.6.2 on Void Linux x86_64 glibc, Kernel 5.2.21, glibc 2.30.

@aborruso
Copy link
Contributor

aborruso commented Jan 20, 2020 via email

@leahneukirchen
Copy link
Author

Still doesn't justify a segfault on user-controlled input.

@aborruso
Copy link
Contributor

aborruso commented Jan 20, 2020 via email

@johnkerl
Copy link
Owner

@leahneukirchen thanks for reporting. @aborruso segfaults are never acceptable (unless someone does a print statement with %s formatting and non-string data). And the huge-malloc fail is clearly a memory-corruption issue.

@johnkerl johnkerl added the bug label Jan 21, 2020
@johnkerl
Copy link
Owner

@leahneukirchen as a workaround please use either mlr --xtab cat < foo (the <) or mlr --no-mmap --xtab cat foo until I get to the bottom of this.

@johnkerl
Copy link
Owner

P.S. Trailing carriage returns are not mandatory in Miller.

@johnkerl
Copy link
Owner

johnkerl commented Jan 26, 2020

@leahneukirchen Having two implementations of all file-format readers (mmap & stdin) has been a development overhead for me, and error-prone for me (as you've encountered), and causes issues with too-many or too-large files (memory-mapped pages aren't freed by the OS, as one would intuit, despite all my attempts on #160):

#29
#102
#160
#181
#256
#296

I am highly tempted to jettison the entire mmap approach and leave only the stdin readers in place. I hesitate only because I explicitly created the mmap stuff for performance improvement -- and it is a bit faster.

But mmap is already opted out when there are too many files, or or files too large (as noted above) so I think it is already unused at those times when its performance improvement would most help.

@johnkerl
Copy link
Owner

... 2632ddc 4-KLOC red diff. :)

@johnkerl
Copy link
Owner

$ echo -n foo > foo
$ mlr --xtab cat foo
foo

I'm still accepting --mmap and --no-mmap from the command line, as no-ops, for backward compatibility.

@johnkerl
Copy link
Owner

This will go out in 5.6.3. Thanks for reporting!! :)

@johnkerl
Copy link
Owner

P.S. @leahneukirchen having a single stdio code-path for each file-format reader will make #304 easier to implement. So this is a double win. :)

This was referenced Jan 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants