Fwrite integer rownames #5098

ColeMiller1 · 2021-08-08T22:39:36Z

Closes #4957.

The biggest change is the behavior of quote = 'auto' for default row names and for integer-assigned row names. Because this PR makes all row names character, these row names are no longer wrapped in double quotes.

library(data.table)
DT = data.table(foo=1:3,bar=c(1.2,9.8,-6.0))

# 1.14.0
fwrite(DT,row.names=TRUE, quote = 'auto')
## "",foo,bar
## "1",1,1.2
## "2",2,9.8
## "3",3,-6

# 1.14.1 with PR: double quotes are lost
fwrite(DT,row.names=TRUE, quote = 'auto')
## "",foo,bar
## 1,1,1.2
## 2,2,9.8
## 3,3,-6

I believe this better matches the existing quote = 'auto' behavior, which already does not add double quotes to character row.names:

row.names(DT) = letters[1:3]
fwrite(DT,row.names=TRUE, quote = 'auto')

## "",foo,bar
## a,1,1.2
## b,2,9.8
## c,3,-6
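Why a, b and c stay unquoted: as described in ?fwrite, quote = 'auto' surrounds a character field with double quotes only when it needs them, i.e. when the field contains the separator, a line ending or the double quote itself (or sep2 when list columns are present). The sketch below expresses that rule in C. It assumes a single-byte separator, ignores the sep2 case, and needs_quote is a hypothetical helper, not data.table's actual implementation. Under this rule, the character row names "1", "2", "3" produced by the PR are left unquoted, exactly like "a", "b", "c".

```c
/* Hypothetical sketch of the documented quote = 'auto' rule for one
   character field: return 1 if the field must be quoted, else 0.
   Not data.table's code; the sep2 (list column) case is ignored. */
static int needs_quote(const char *field, char sep) {
    for (const char *p = field; *p != '\0'; p++) {
        /* separator, embedded double quote, or line ending forces quoting */
        if (*p == sep || *p == '"' || *p == '\n') return 1;
    }
    return 0; /* plain text such as "a" or "1" is written bare */
}
```

For example, needs_quote("1", ',') and needs_quote("a", ',') both return 0, while needs_quote("a,b", ',') returns 1.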

Finally, with this PR, the part of fwrite.c shown below is no longer used. In the issue, I commented on how to do this more directly in C. I would be happy to move forward with a C approach, but it seems like a larger diff for little benefit. Perhaps some big-data users who need performance do write row names, but overall I doubt it.

data.table/src/fwrite.c

Lines 876 to 881 in 831013a

    if (args.rowNames==NULL) {
      if (doQuote!=0/*NA'auto' or true*/) *ch++='"';
      int64_t rn = i+1;
      writeInt64(&rn, 0, &ch);
      if (doQuote!=0) *ch++='"';
    } else {
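For reference, the fwrite.c branch quoted above writes the 1-based row number i+1 and, whenever doQuote is nonzero (quote = TRUE, or NA meaning 'auto'), wraps it in double quotes; that is where the "1","2","3" in the 1.14.0 output come from. The standalone sketch below reproduces that behavior under stated assumptions: write_default_rowname is a hypothetical name, doQuote is taken as a plain int, and sprintf stands in for data.table's internal writeInt64.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the branch above: write the 1-based row
   number for 0-based row i at ch, quoted when doQuote != 0.
   The caller must provide room for at least 23 bytes
   (2 quotes + up to 20 characters for an int64 + terminating NUL). */
static char *write_default_rowname(char *ch, int64_t i, int doQuote) {
    if (doQuote != 0) *ch++ = '"';
    int64_t rn = i + 1;                        /* R row names start at 1 */
    ch += sprintf(ch, "%lld", (long long)rn);  /* replaces writeInt64 */
    if (doQuote != 0) *ch++ = '"';
    *ch = '\0';                                /* keep buffer a valid C string */
    return ch;                                 /* next write position */
}
```

With a nonzero doQuote, rows 0, 1 and 2 give "1", "2" and "3", matching the 1.14.0 output; with the PR, row names come from R as character instead, so this branch is no longer reached.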


mattdowle · 2021-08-10T07:33:16Z

R/fwrite.R

@@ -36,6 +36,7 @@ fwrite = function(x, file="", append=FALSE, quote="auto",
  nThread = as.integer(nThread)
  # write.csv default is 'double' so fwrite follows suit. write.table's default is 'escape'
  # validate arguments
+  rn = if (row.names) row.names(x) else NULL # allocate row.names in R to address integer row.names #4957


I like your reasoning for this, i.e. option 1 that you detailed in the issue. However, I'm a bit concerned about inadvertent usage that somehow ends up calling row.names=TRUE without the user intending it. Then 1:nrow would be coerced to character by this line, and R's global character cache would get clobbered. I'd prefer not to leave that door open. Another open door would be benchmarkers who, deliberately or by accident, run with row.names = TRUE and report poor performance.
Hence I went for option 3 as you described.

Brilliant! Thanks for your time - changes look great!

ColeMiller1 and others added 6 commits August 8, 2021 17:57

Update fwrite.R · 8eb2dc8
    Include row.names from R
Update fwriteR.c · 5a4842e
Update tests.Rraw · 3882d89
Update NEWS.md · c200f31
merge master · 05a3107
news item shorten and standard style (code/funnction first word, issue number at the end of the first sentence) · b819857

mattdowle added this to the 1.14.1 milestone Aug 10, 2021

mattdowle reviewed Aug 10, 2021


mattdowle added 2 commits August 10, 2021 01:34

option 3 · f80ad26
news item tweak · cb4cf4c

mattdowle merged commit c3d1100 into master Aug 10, 2021

mattdowle deleted the fwrite_integer_rownames branch August 10, 2021 07:55

jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023
