Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Triggered internal error with an expression in j while grouping #3708

Open
MichaelChirico opened this issue Jul 18, 2019 · 3 comments
Open
Labels
internals Low non-atomic column e.g. list columns, S4 vector columns

Comments

@MichaelChirico
Copy link
Member

Breaking things trying to do coverage and hit an "internal" error:

data.table(e = expression(1), c = '1')[ , e, by = c]
Error in `[.data.table`(data.table(e = expression(1), c = "1"), , e, by = c) : 
  Internal error: column type 'expression' not supported by data.table subset. All known types are supported so please report as bug.

Of course this should stay an error, but handled differently.

@udaydatar7
Copy link

@MichaelChirico what do you mean by handled?

@jangorecki
Copy link
Member

jangorecki commented Dec 25, 2019

This should not be an internal error. Should be caught sooner than it currently is. Expressions should be wrapped into list to be stored as a column.

tlapak added a commit to tlapak/data.table that referenced this issue Mar 6, 2020
@tlapak
Copy link
Contributor

tlapak commented Mar 6, 2020

I've been poking around a bit and this isn't directly related to the grouping:

data.table(e = expression(1))[1]
# Error in `[.data.table`(data.table(e = expression(1)), 1) : 
#  Internal error: column type 'expression' not supported by data.table subset. All known types are supported so please report as bug.

The printout with a multi-row expression column is also weird:

data.table(e = expression(1, 2), c=c('1', '2'))
#                   e      c
#              <expr> <char>
# 1: expression(1, 2)      1
# 2: expression(1, 2)      2

You get the same issues with columns of type language:

data.table(e=as.call(list(sum, 1)), c='1')[1]
# Error in `[.data.table`(data.table(e = as.call(list(sum, 1)), c = "1"),  : 
#  Internal error: column type 'language' not supported by data.table subset. All known types are supported so please report as bug.

data.table(e=as.call(list(sum, 1)), c='1')
#                       e      c
#                  <call> <char>
# 1: .Primitive("sum")(1)      1
# 2: .Primitive("sum")(1)      1

Which seems extra unintuitive to me.
You also can't print or subset data.tables that have a column of type name:

data.table(e=as.name('a'), c='1')
# Error in format.default(char.trunc(col), justify = justify, ...) : 
#  Found no format() method for class "name" 

data.table(e=as.name('a'), c='1')[1]
# Error in `[.data.table`(data.table(e = as.name("a"), c = "1"), 1) : 
#  invalid type/length (symbol/1) in vector allocation

Type environment can also get funky:

data.table(e=environment())
# Error in dimnames(x) <- dn : 
#  length of 'dimnames' [1] not equal to array extent 

I would also generally expect that data.table(something) == data.table(c(something)) == data.table(c(something, something_else))[1] unless explicitly stated otherwise.

From what @jangorecki said, and the work in #4196, it seems to me that the default should be to wrap anything that's not atomic (or a data.frame or matrix) in a list column. Maybe with a warning?
I have a commit here that achieves this: f1ca3f3.

It makes a few tests redundant. It would also make much of the work in #4196 redundant, as far as I can tell. Or are there other ways to get a data.table with non-atomic columns? It would also close #4040 and #546 as won't fix, I guess.

I'll prepare the pull request if people think this is the way to go.

@jangorecki jangorecki added non-atomic column e.g. list columns, S4 vector columns and removed beginner-task labels Apr 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internals Low non-atomic column e.g. list columns, S4 vector columns
Projects
None yet
Development

No branches or pull requests

4 participants