Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fwrite integer rownames #5098

Merged
merged 8 commits into from
Aug 10, 2021
Merged

Fwrite integer rownames #5098

merged 8 commits into from
Aug 10, 2021

Conversation

ColeMiller1
Copy link
Contributor

Closes #4957 .

The biggest change is behavior of quote = auto for default row names and for integer assigned row-names because this PR makes all row names characters.

library(data.table)
DT = data.table(foo=1:3,bar=c(1.2,9.8,-6.0))

# 1.14.0
fwrite(DT,row.names=TRUE, quote = 'auto')
## "",foo,bar
## "1",1,1.2
## "2",2,9.8
## "3",3,-6

#1.14.1 with PR. Double quote are lost
fwrite(DT,row.names=TRUE, quote = 'auto')
## "",foo,bar
## 1,1,1.2
## 2,2,9.8
## 3,3,-6

I believe this better matches existing behavior with quote = 'auto' as the current behavior will not add double quotes to character row.names:

row.names(DT) = letters[1:3]
fwrite(DT,row.names=TRUE, quote = 'auto')

## "",foo,bar
## a,1,1.2
## b,2,9.8
## c,3,-6

Finally, based on this PR, this part of fwrite.c shown below is no longer used. In the issue, I comment about how to do this more directly in C. I would be happy to move forward with a C approach but it just seems like a higher diff for little productivity. While maybe big data people who need performance use row names, overall I doubt it.

data.table/src/fwrite.c

Lines 876 to 881 in 831013a

if (args.rowNames==NULL) {
if (doQuote!=0/*NA'auto' or true*/) *ch++='"';
int64_t rn = i+1;
writeInt64(&rn, 0, &ch);
if (doQuote!=0) *ch++='"';
} else {

@mattdowle mattdowle added this to the 1.14.1 milestone Aug 10, 2021
R/fwrite.R Outdated
@@ -36,6 +36,7 @@ fwrite = function(x, file="", append=FALSE, quote="auto",
nThread = as.integer(nThread)
# write.csv default is 'double' so fwrite follows suit. write.table's default is 'escape'
# validate arguments
rn = if (row.names) row.names(x) else NULL # allocate row.names in R to address integer row.names #4957
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your reasoning for this; i.e. option 1 you detailed in issue. However I'm just a bit concerned about any inadvertent usage that ends up somehow calling row.names=TRUE without the user intending to. Then 1:nrow will be coerced to character by this line and the global character cache gets clobbered. Would prefer not to leave that door open. Another door open would be benchmarkers who either deliberately or by accident display results with row.names = TRUE and show poor performance.
Hence I went for option 3 as you described.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant! Thanks for your time - changes look great!

@mattdowle mattdowle merged commit c3d1100 into master Aug 10, 2021
@mattdowle mattdowle deleted the fwrite_integer_rownames branch August 10, 2021 07:55
@jangorecki jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fwrite substitutes class integer rownames by 1,2,3...
4 participants