Skip to content

Commit

Permalink
Closes #1573, when fill=TRUE, fread looks for maximum col length inst…
Browse files Browse the repository at this point in the history
…ead of longest run.
  • Loading branch information
arunsrinivasan committed Mar 6, 2016
1 parent 8521072 commit e58a857
Show file tree
Hide file tree
Showing 4 changed files with 18 additions and 2 deletions.
4 changes: 3 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@

13. GForce is optimised for `.SD[val]` and `col[val]` where `val` is a positive length-1 value. Partly addresses [#523](https://github.com/Rdatatable/data.table/issues/523).

14. `fread` gains `fill` argument with default `FALSE` for backwards compatibility. Closes [#536](https://github.com/Rdatatable/data.table/issues/536).
14. `fread` gains `fill` argument with default `FALSE` for backwards compatibility. Closes [#536](https://github.com/Rdatatable/data.table/issues/536). Also, `fill=TRUE` prioritises maximum cols instead of longest run with identical columns when `fill=TRUE` which allows handle missing columns slightly more robustly, [#1573](https://github.com/Rdatatable/data.table/issues/1573).

15. `fread` gains `key` argument, [#590](https://github.com/Rdatatable/data.table/issues/590).

Expand All @@ -60,6 +60,8 @@

22. `all.equal.data.table` gets new features for testing equality of data.tables, new arguments are `check.attributes`, `ignore.col.order`, `ignore.row.order`.

23. Use of `mult='first'` or `mult='last'` when `i` argument is *logical/numeric* now provides a warning that `mult` argument is ignored, [#1295](https://github.com/Rdatatable/data.table/issues/1295). Thanks to @nkurz.

#### BUG FIXES

1. Now compiles and runs on IBM AIX gcc. Thanks to Vinh Nguyen for investigation and testing, [#1351](https://github.com/Rdatatable/data.table/issues/1351).
Expand Down
8 changes: 8 additions & 0 deletions inst/tests/issue_1573_fill.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
SD1 ST1 SMS1 SD2 ST2 SMS2 SD3 ST3 SMS3 SD4 ST4 SMS4
01-11-2015 00:00:01 323 2015-11-01 00:00:01 551
01-11-2015 00:00:02 289 2015-11-01 00:00:02 618
01-11-2015 01:13:16 253 2015-11-01 01:13:25 511 2015-11-01 01:13:33 489 2015-11-01 01:13:44 870
01-11-2015 00:00:11 986 2015-11-01 00:00:12 602
01-11-2015 00:00:27 48 2015-11-01 00:00:27 391 2015-11-01 00:00:27 429
01-11-2015 00:00:13 750 2015-11-01 00:00:14 255
01-11-2015 00:00:28 773 2015-11-01 00:00:29 114
6 changes: 6 additions & 0 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -7654,6 +7654,12 @@ if (!truelength(DT)) test(1620, truelength(as.data.table(DT)), 100L)
# fix for #1116, (#1239 and #1201)
test(1621, fread("issue_1116_fread_few_lines.txt"), setDT(read.delim("issue_1116_fread_few_lines.txt", stringsAsFactors=FALSE, sep=",", check.names=FALSE)))

# fix for #1573
ans1 = fread("issue_1573_fill.txt", fill=TRUE, na.strings="")
ans2 = setDT(read.table("issue_1573_fill.txt", header=TRUE, fill=TRUE, stringsAsFactors=FALSE, na.strings=""))
test(1622.1, ans1, ans2)
test(1622.2, ans1, fread("issue_1573_fill.txt", fill=TRUE, sep=" ", na.strings=""))

##########################

# TODO: Tests involving GForce functions needs to be run with optimisation level 1 and 2, so that both functions are tested all the time.
Expand Down
2 changes: 1 addition & 1 deletion src/fread.c
Original file line number Diff line number Diff line change
Expand Up @@ -824,7 +824,7 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
if (ch==eof || i==30 || ncol!=thisNcol) {
// this* still refers to the previous run which has just finished
// Rprintf("\nRun: s='%c' thisLine=%d thisLen=%d thisNcol=%d", sep, thisLine, thisLen, thisNcol);
if (thisNcol>1 && (thisLen>topLen || // longest run wins
if (thisNcol>1 && (( thisLen>topLen && (!fill || (fill && thisNcol>topNcol)) ) || // longest run wins as long as fill=FALSE. If not, retain longest column so far. Fixes
(thisLen==topLen && sep==topSep && thisNcol>topNcol))) { // if tied, the one that divides it more (test 1328, 2 rows)
topStart = thisStart;
topLine = thisLine;
Expand Down

0 comments on commit e58a857

Please sign in to comment.