as.data.table(dfidx) fails #4526

RicoDiel · 2020-06-04T13:50:34Z

The dfidx package provides the class dfidx. The idea is to glue an idx data.frame to another data.frame, so that the idx object always stays attached when you work with the data. This is used, for example, in the mlogit package. Converting an object of this class to a data.table fails.

library(data.table)   # needed for as.data.table()
library(mlogit)
data("TravelMode", package = "AER")
Mtravel <- dfidx(data     = TravelMode,
                 shape    = "long",
                 choice   = "choice",
                 chid.var = "individual",
                 alt.var  = "mode")
DTtravel <- as.data.table(Mtravel)

Gives the following error message:

Error in .subset2(x, i, exact = exact) : 
  attempt to select less than one element in get1index

The mlogit package requires such an object to carry out the desired regression. dfidx objects also come with their own '['.
More on this here:
https://cran.microsoft.com/web/packages/dfidx/vignettes/dfidx.html#more_advanced_use_of_dfidx

One could always use dcast() and melt() to reshape data.tables between wide and long, and call dfidx only right before the estimation. Another alternative would be to write a function like the following (which is, of course, terribly inefficient):

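my.as.dt <- function(x){
  # skip the last column, which holds the idx data.frame
  wide = ncol(x) - 1
  long = nrow(x)

  # extract the remaining columns one at a time
  DTprep = list()
  for(i in seq_len(wide)){
    DTprep[[i]] = x[1:long, i][1:long]
  }

  DT = as.data.table(DTprep)
  colnames(DT) = colnames(x)[seq_len(wide)]
  return(DT)
}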

Another approach would be to use setDT, which also delivers a warning:

DT = setDT(Mtravel)
Warning message:
In setDT(Mtravel) :
  Some columns are a multi-column type (such as a matrix column): [8]. setDT will retain these columns as-is but subsequent operations like grouping and joining may fail. Please consider as.data.table() instead which will create a new column for each embedded column.

Printing the resulting object then fails with an error:

DT
Error in `[.data.table`(x, i, , ) : 
  Column 8 ['idx'] is a data.frame or data.table; malformed data.table.

Hopefully you can just give as.data.table() and setDT() the power to overwrite this and drop the idx data.frame (potentially with a warning that some commands which require it in the future may fail).
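
To make the request above concrete, here is a minimal base-R sketch of "drop the nested data.frame column, with a warning". The helper name drop_nested_frames and the toy data are hypothetical; this is not data.table's or dfidx's behaviour, only an illustration of the idea under the assumption that the nested column can simply be discarded.

```r
# Hypothetical helper: remove any column that is itself a data.frame.
drop_nested_frames <- function(x) {
  cols <- unclass(x)  # plain list, so class-specific `[` / `[[` methods are bypassed
  nested <- vapply(cols, is.data.frame, logical(1))
  if (any(nested)) {
    warning("Dropping data.frame column(s): ",
            paste(names(cols)[nested], collapse = ", "))
  }
  as.data.frame(cols[!nested], stringsAsFactors = FALSE)
}

# Toy stand-in for a dfidx-like object: two ordinary columns plus a nested idx frame.
toy <- data.frame(wait = c(69, 34, 35), cost = c(59, 31, 25))
toy$idx <- data.frame(id1 = c(1L, 1L, 1L), id2 = c("air", "train", "bus"))

flat <- suppressWarnings(drop_nested_frames(toy))
names(flat)
```

The result is an ordinary data.frame without the idx column, which could then be handed to setDT() or as.data.table().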


MichaelChirico · 2020-06-06T09:29:49Z

I would lean towards closing... I don't think it's data.table's place to offer as.data.table methods for all possible input classes from third-party packages. The maintainers of dfidx could offer such a method...

In the meantime I would probably just do:

setDT(as.data.frame(Mtravel))

jangorecki · 2020-06-06T10:27:42Z

@MichaelChirico shouldn't as.data.table use the default method, which should behave like as.data.table(as.data.frame(x))?

MichaelChirico · 2020-06-06T10:29:49Z

No, because dfidx inherits from data.frame, so as.data.table.data.frame is attempted, but the structure of a dfidx object is slightly different, so that fails. Forcing the dfidx object to be a proper data.frame makes the method work.
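
The dispatch rule described here can be seen with a toy S3 generic. This is only a sketch: convert and toy_idx are made-up names standing in for as.data.table and dfidx, and nothing below touches either package.

```r
# Toy generic with a default method and a data.frame method.
convert <- function(x, ...) UseMethod("convert")
convert.default <- function(x, ...) "default method"
convert.data.frame <- function(x, ...) "data.frame method"

# Hypothetical subclass that, like dfidx, keeps "data.frame" in its class vector.
obj <- structure(list(a = 1:3),
                 class = c("toy_idx", "data.frame"),
                 row.names = 1:3)

# There is no convert.toy_idx, so S3 falls through to the next class in the
# vector and picks the data.frame method; the default method is never reached.
dispatched <- convert(obj)
dispatched
```

Because the data.frame method is chosen purely on the class vector, it runs even when the object's internal layout is not what that method assumes.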

jangorecki · 2020-06-06T10:32:05Z

OK, makes sense

MichaelChirico · 2020-06-06T10:39:18Z

Hmm, actually looking, maybe it's a data.table bug.

The issue is that in as.data.table.list, we make an edit to x as a list by changing the idx object of the dfidx.

Then we try to extract columns again:

xi = x[[i]]

BUT this is calling [[.dfidx, which expects a dfidx object, whereas we've already edited the object so we can't rely on that class method to apply anymore. It might make sense to be sure we've dropped classes in as.data.table.list before proceeding.

i.e. we have two loops in as.data.table.list:

(1) checking structure of final object:

data.table/R/as.data.table.R

Lines 132 to 151 in 3436568

    
for (i in seq_len(n)) {
  xi = x[[i]]
  if (is.null(xi)) next    # eachncol already initialized to 0 by integer() above
  if (!is.null(dim(xi)) && missing.check.names) check.names=TRUE
  if ("POSIXlt" %chin% class(xi)) {
    warning("POSIXlt column type detected and converted to POSIXct. We do not recommend use of POSIXlt at all because it uses 40 bytes to store one date.")
    xi = x[[i]] = as.POSIXct(xi)
  } else if (is.matrix(xi) || is.data.frame(xi)) {
    if (!is.data.table(xi)) {
      xi = x[[i]] = as.data.table(xi, keep.rownames=keep.rownames)  # we will never allow a matrix to be a column; always unpack the columns
    }
    # else avoid dispatching to as.data.table.data.table (which exists and copies)
  } else if (is.table(xi)) {
    xi = x[[i]] = as.data.table.table(xi, keep.rownames=keep.rownames)
  } else if (is.function(xi)) {
    xi = x[[i]] = list(xi)
  }
  eachnrow[i] = NROW(xi)    # for a vector (including list() columns) returns the length
  eachncol[i] = NCOL(xi)    # for a vector returns 1
}

(2) building the object:

data.table/R/as.data.table.R

Lines 171 to 191 in 3436568

    
for(i in seq_len(n)) {
  xi = x[[i]]
  if (is.null(xi)) { n_null = n_null+1L; next }
  if (eachnrow[i]>1L && nrow%%eachnrow[i]!=0L)   # in future: eachnrow[i]!=nrow
    warning("Item ", i, " has ", eachnrow[i], " rows but longest item has ", nrow, "; recycled with remainder.")
  if (eachnrow[i]==0L && nrow>0L && is.atomic(xi))   # is.atomic to ignore list() since list() is a common way to initialize; let's not insist on list(NULL)
    warning("Item ", i, " has 0 rows but longest item has ", nrow, "; filled with NA")  # the rep() in recycle() above creates the NA vector
  if (is.data.table(xi)) {   # matrix and data.frame were coerced to data.table above
    prefix = if (!isFALSE(.named[i]) && isTRUE(nchar(names(x)[i])>0L)) paste0(names(x)[i],".") else ""  # test 2058.12
    for (j in seq_along(xi)) {
      ans[[k]] = recycle(xi[[j]], nrow)
      vnames[k] = paste0(prefix, names(xi)[j])
      k = k+1L
    }
  } else {
    nm = names(x)[i]
    vnames[k] = if (length(nm) && !is.na(nm) && nm!="") nm else paste0("V",i-n_null)  # i (not k) tested by 2058.14 to be the same as the past for now
    ans[[k]] = recycle(xi, nrow)
    k = k+1L
  }
}

In the first loop, we can rely on [[ methods to dispatch correctly. But because we do xi = x[[i]] = ..., we may break the object's structure, which means we can't rely on the class method in the second loop anymore (we have to use the default [[).
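
The hazard with the two loops can be reproduced in base R with a toy class. Everything here is an assumption made for illustration: toy_idx, its [[ method, and two_pass are invented names, and the unclass() call only sketches the "drop classes before proceeding" idea; it is not the actual change in data.table.

```r
# A custom `[[` method that assumes the "idx" element is still a data.frame.
`[[.toy_idx` <- function(x, i) {
  cols <- unclass(x)
  idx <- cols[["idx"]]
  stopifnot(is.data.frame(idx))  # structural assumption of the class
  out <- cols[[i]]
  if (!is.data.frame(out)) attr(out, "idx") <- idx
  out
}

obj <- structure(list(a = 1:3, idx = data.frame(id = 1:3)), class = "toy_idx")

two_pass <- function(x, drop_class = FALSE) {
  if (drop_class) x <- unclass(x)  # from here on, the default `[[` is used
  n <- length(x)
  # Pass 1: inspect each element and rewrite nested data.frames in place.
  for (i in seq_len(n)) {
    xi <- x[[i]]
    if (is.data.frame(xi)) x[[i]] <- as.list(xi)
  }
  # Pass 2: extract again; a class method now sees a modified object.
  out <- vector("list", n)
  for (i in seq_len(n)) out[[i]] <- x[[i]]
  out
}

failed <- inherits(try(two_pass(obj), silent = TRUE), "try-error")
fixed <- two_pass(obj, drop_class = TRUE)
```

With the class kept, the second pass errors because the method's assumption about idx no longer holds; with the class dropped up front, both passes use the default extraction and complete.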

MichaelChirico · 2020-06-06T11:34:42Z

#4529 would make it so that as.data.table([dfidx object]) works without error. Note, however, that the output differs from setDT(as.data.frame(x)), because dfidx offers a custom as.data.frame method. To align the two, it would be on the dfidx maintainer to offer an as.data.table method.

jangorecki · 2020-06-06T11:57:10Z

AFAIK (@MichaelChirico, please correct me if I am wrong), it should be enough for the maintainer of dfidx to provide an as.list.dfidx method that handles the preprocessing.

MichaelChirico · 2020-06-06T12:15:16Z

that sounds right to me

jangorecki closed this as completed Jun 6, 2020

MichaelChirico reopened this Jun 6, 2020

MichaelChirico mentioned this issue Jun 6, 2020

when as.data.frame dispatches to list method, force input to list #4529

Merged

mattdowle added this to the 1.14.1 milestone Apr 15, 2021

mattdowle closed this as completed in #4529 Apr 27, 2021

jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023
