Setting data.table column outputs whole table to screen #109

Closed
abielr opened this issue May 8, 2015 · 52 comments

Comments

@abielr
Contributor

abielr commented May 8, 2015

If I run the code below, the second line will cause the entire dat object to be output, whereas at an R console it wouldn't return anything. The syntax used is the special syntax for setting columns with the popular data.table package. I'm using the 1.9.5 devel version of data.table.

library(data.table)
dat <- data.table(x1=1:10)
dat[, x2 := 1:10]
@takluyver
Member

withVisible(d[, x2:=1:10]) tells me that the result should be visible. I thought that was how R determined whether or not to print it. Clearly there's something else going on.

@abielr
Contributor Author

abielr commented May 9, 2015

Just a note if testing this: the same behavior you see in the notebook is also cropping up at the console in R 3.2.0, as described at Rdatatable/data.table#1122. So if trying to fix you will likely want to use R < 3.2.0 until they get this bug fixed.

@abielr
Contributor Author

abielr commented May 12, 2015

Fixing this on the IRkernel side is probably not reasonable. Under the hood, when you run a statement with the data.table := syntax, it sets a global variable equal to address(x), which is then checked inside of the print.data.table command to see if the output should be suppressed. In other words, when you run the command

dat[, x2 := 1:10]

It is triggering [.data.table followed by print.data.table, where print.data.table sees that it should not print this time, after which it resets the state of the global variable so that other print statements will run. However, when IRkernel is running evaluate(), the print.data.table statement never gets called, so all you see is the regular data.table object.
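
The flag mechanism described above can be sketched in base R. This is only a toy illustration: the class `toy`, the helper `set_col`, and the `.state` environment are made-up names, and data.table's real code keys its flag on address(x) rather than on a simple TRUE/FALSE.

```r
# Toy illustration only; data.table's real code keys the flag on address(x).
.state <- new.env()
.state$suppress <- FALSE

toy <- function(x) structure(list(x = x), class = "toy")

# Stand-in for `:=`: record that the next print should be skipped.
set_col <- function(obj) {
  .state$suppress <- TRUE
  obj
}

print.toy <- function(x, ...) {
  if (.state$suppress) {
    .state$suppress <- FALSE  # reset so later prints behave normally
    return(invisible(x))
  }
  cat("<toy with", length(x$x), "values>\n")
  invisible(x)
}

obj <- toy(1:3)
after_set <- capture.output(print(set_col(obj)))  # print method stays silent
normal    <- capture.output(print(obj))           # prints one line

# A front end that formats the value itself never calls print.toy(),
# so the flag is left set and the object is shown anyway.
shown_anyway <- format(unclass(set_col(obj))$x)
```

The last line mirrors what happens in the notebook: the value is rendered without going through the print method, so the suppression never takes effect.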

@takluyver
Member

Ah, R ;-)

@abielr
Contributor Author

abielr commented May 15, 2015

Unfortunately, because [ is an R primitive function, its methods cannot return an invisible object, see here. This is why the developers of data.table have resorted to the workaround of using an internal global variable to alter the behavior of print.data.table.
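
A small base-R check of that visibility behaviour, using a made-up class `demo` (nothing here is data.table code). An ordinary closure can hide its result with invisible(), while the same attempt inside a `[` method is overridden because the primitive forces visibility back on:

```r
demo_obj <- structure(list(1:3), class = "demo")

# An ordinary closure can hide its result with invisible().
quiet <- function(x) invisible(x)

# A `[` method tries the same thing, but it is dispatched from the primitive `[`.
`[.demo` <- function(x, ...) invisible(x)

vis_closure <- withVisible(quiet(demo_obj))$visible  # FALSE
vis_bracket <- withVisible(demo_obj[1])$visible      # TRUE
```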

A fix on the IRkernel side is to check the length of data[['text/plain']] inside the execution.R/handle_value() function. If the length is zero, i.e. nchar(data[['text/plain']]) == 0, then don't send back a response: at the console you would get nothing, and in general we would expect the printing of text objects in the notebook to mimic the console. This still allows the notebook to work properly with functions that return a blank string or NULL, which will print properly.
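
A rough sketch of that check in base R. The helper names `text_repr` and `should_send` and the class `silent` are hypothetical; IRkernel's actual handle_value() is organised differently.

```r
# Hypothetical helper names; IRkernel's actual handle_value() differs.
text_repr <- function(obj) {
  paste(utils::capture.output(print(obj)), collapse = "\n")
}

# Only send a response when printing produced some text.
should_send <- function(obj) nchar(text_repr(obj)) > 0

# A class whose print method produces no output at all.
silent <- structure(list(), class = "silent")
print.silent <- function(x, ...) invisible(x)

send_silent <- should_send(silent)  # FALSE: nothing was printed
send_blank  <- should_send("")      # TRUE: prints as [1] ""
send_null   <- should_send(NULL)    # TRUE: prints as NULL
```

A blank string and NULL still produce console text, so they are not affected by the check.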

On a related note, it would be desirable to have a repr option that controls the maximum number of rows to print for a generic data.frame or matrix, similar to what RStudio does. By default this should be set to something not too large. Otherwise the user who accidentally prints their 10 million element matrix to screen ends up waiting a long time for the HTML to be built and displayed.
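
One possible shape for such a limit, as a sketch only. The function `limit_rows` and its default of 60 rows are illustrative assumptions, not an existing repr option:

```r
# `limit_rows` and its default are illustrative, not an existing repr option.
limit_rows <- function(df, max_rows = 60L) {
  n <- nrow(df)
  if (n <= max_rows) return(df)
  half <- max_rows %/% 2L
  # Keep the first and last `half` rows and drop the middle.
  rbind(utils::head(df, half), utils::tail(df, half))
}

big   <- data.frame(id = seq_len(1000), value = seq_len(1000) * 2)
small <- limit_rows(big, max_rows = 10L)
dim(small)  # 10 2
```

Truncating before any HTML is built keeps the cost proportional to the rows actually shown.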

@takluyver
Member

@flying-sheep : the idea of checking whether there's any text output, and suppressing all output if there isn't, sounds basically reasonable to me. Do you see any problems with that?

@flying-sheep
Member

Generally yes, but the question is what that means.

Will print per default do something unless you override it?

Because if it is overridden to not output anything, then it will usually do something unrelated, right? Like plotting or something.

@abielr
Contributor Author

abielr commented May 16, 2015

In general I would say an overridden print statement would not do anything else, though to be frank this use in data.table is the only place I've seen it. If the intention was to print graphics, the more canonical form would be to have a plot function, so that the user is typing plot(myobj).

But regardless, if a graphics command is called inside the print statement, the graphics callback used with the output handler in evaluate will still pick it up, allowing you to send back the plot even if there is no console output. For example,

library(evaluate)

mat <- function(x) {
  class(x) <- "mat"
  x
}

print.mat <- function(x, ...) {
  plot(rnorm(10))
  return(invisible())
}

oh <- new_output_handler(
  value = function(obj) {
    print("VALUE")
    val <- capture.output(print(obj))
    # Check the length of val to see if we should send text output back
  },

  graphics = function(plotobj) {
    print("GRAPHICS") # This will always get run
  }
)

m1 <- mat(matrix(1:4, 2, 2))
evaluate("m1", output_handler = oh)

@flying-sheep
Member

true. as said: i think we should do it. this was just a side thought :)

@takluyver
Member

OK, @abielr, do you want to make a pull request?

@mattdowle

Very much appreciate the kind language in this thread. Yes all correct.

Have just fixed Rdatatable/data.table#1122. About to release v1.9.6 to CRAN.

Note new wording of bug fixes in https://github.com/Rdatatable/data.table/blob/master/README.md :

if (TRUE) DT[,LHS:=RHS] no longer prints, #869 and #1122. Tests added. To get this to work we've had to live with one downside: if a := is used inside a function with no DT[] before the end of the function, then the next time DT or print(DT) is typed at the prompt, nothing will be printed. A repeated DT or print(DT) will print. To avoid this: include a DT[] after the last := in your function. If that is not possible (e.g., it's not a function you can change) then DT[] at the prompt is guaranteed to print. As before, adding an extra [] on the end of a := query is a recommended idiom to update and then print; e.g. > DT[,foo:=3L][]. Thanks to Jureiss and Jan Gorecki for reporting.

DT[FALSE,LHS:=RHS] no longer prints either, #887. Thanks to Jureiss for reporting.

:= no longer prints in knitr for consistency with behaviour at the prompt, #505. Output of a test knit("knitr.Rmd") is now in data.table's unit tests. Thanks to Corone for the illustrated report.

We had to add a workaround in data.table for knitr. Obviously ugly and not ideal. But := by reference is so fundamental in the DT[where, select|update|do, by] general form, that it was worth this hassle, so far. Could add a similar workaround for IRkernel too if that helps - let me know.

@ericwatt

ericwatt commented Dec 1, 2015

This seems to be the same or a related issue I am seeing. I found a simple reproducible example to show it.

The following data.table creation and := works correctly (though it does output to the screen).

DT1 = data.table(x=rep(c("a","b","c", "d"),each=15), 
                 y=c(1,3,NA,9), 
                 v=c(1:6,NA,NA,NA,NA,NA,NA), 
                 z=1:12)
DT1[,min:=pmin(y, v, na.rm=TRUE)]

When I make the data.table a bit larger by increasing each, I get a warning/error

DT2 = data.table(x=rep(c("a","b","c", "d"),each=18), 
                 y=c(1,3,NA,9), 
                 v=c(1:6,NA,NA,NA,NA,NA,NA), 
                 z=1:12)
DT2[,min:=pmin(y, v, na.rm=TRUE)]

Error in rbindlist(l, use.names, fill, idcol): Item 2 of list input is not a data.frame, data.table or list

If I separate the commands, the warning is from the line DT2[,min:=pmin(y, v, na.rm=TRUE)]. However, the resulting DT2 prints to screen, and it is modified correctly with column min added. But even this is a bit strange.

print(DT2)

Gives no error, and outputs all 72 rows and 4 columns, just like in RStudio.

DT2

Gives the same error

Error in rbindlist(l, use.names, fill, idcol): Item 2 of list input is not a data.frame, data.table or list

but then outputs the same 72x4 data.table.

In my actual script, which has a much larger data.table and adds several new columns with :=, this causes several of these errors to be reported and the data.table to be output, but the end result seems to be the same as what I get in RStudio with no errors. The assignment seems to work. It does seem to run MUCH slower than in RStudio. I'm not sure if this is because it's outputting the table at each step, or if it's not assigning by reference in place.

Session info with versions is below:

sessionInfo()

R version 3.2.2 (2015-08-14)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Workstation release 6.7 (Santiago)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.9.6

loaded via a namespace (and not attached):
[1] magrittr_1.5 IRdisplay_0.3 tools_3.2.2 base64enc_0.1-3
[5] uuid_0.1-2 stringi_1.0-1 rzmq_0.7.7 IRkernel_0.5
[9] jsonlite_0.9.17 stringr_1.0.0 digest_0.6.8 chron_2.3-47
[13] repr_0.4 evaluate_0.8

jupyter --version
4.0.6

python --version
Python 2.7.10 :: Anaconda 2.4.0 (64-bit)

@takluyver
Member

I would guess that when you see the error followed by the table, the error comes from the code in repr that attempts to generate an HTML version of that table. That's failing, so it falls back to showing the plain text table.

@takluyver
Member

You can check this by doing:

repr::repr_html(DT2)

@ericwatt

ericwatt commented Dec 1, 2015

You're right, repr::repr_html(DT2) gives the same error, without showing the table. Perhaps the error is because repr_html() has a size limit for the resulting table?

I also get the same error with a much taller table, perhaps in this case it's the --- causing the issue:

      x  y  v  z min
   1: a  1  1  1   1
   2: a  3  2  2   2
   3: a NA  3  3   3
   4: a  9  4  4   4
   5: a  1  5  5   1
  ---               
7196: d  9 NA  8   9
7197: d  1 NA  9   1
7198: d  3 NA 10   3
7199: d NA NA 11  NA
7200: d  9 NA 12   9

@ericwatt

ericwatt commented Dec 1, 2015

head(DT2, 60) gives no error and outputs as a nicely formatted table, head(DT2, 61) gives error and outputs the table as plain text.

@flying-sheep
Member

why ‘---’? we insert horizontal and vertical ellipses!

@takluyver
Member

If it goes wrong at 61 rows, it seems likely that it's something going wrong when we truncate the table and insert ellipses.

@flying-sheep
Member

jup, that’s why i said that it’s strange to see “---” there

@ericwatt

ericwatt commented Dec 2, 2015

The --- is inserted by data.table if the number of rows is greater than 100 (by default). With >100 rows, it will print the first 5 and the last 5, separated by ---, just as my comment above shows.

It seems like there are two issues happening. If the data.table has nrows where 60 < nrows < 101, data.table will show all of the rows, but Jupyter is having trouble rendering this to an html table and giving an error as takluyver found when suggesting I check repr::repr_html(DT2). Above 100 rows, data.table itself is trying to print a summary, and the --- in the middle of the table it prints may be causing a similar issue with repr::repr_html(DT2). With a tall table (say 7200 rows), if I do a print(DT2) there is no error, and I get:

      x  y  v  z min
   1: a  1  1  1   1
   2: a  3  2  2   2
   3: a NA  3  3   3
   4: a  9  4  4   4
   5: a  1  5  5   1
  ---               
7196: d  9 NA  8   9
7197: d  1 NA  9   1
7198: d  3 NA 10   3
7199: d NA NA 11  NA
7200: d  9 NA 12   9

Which looks exactly like it does in the console. If the command is instead DT2 I get the error I reported above, and then the same output as I just showed for print(DT2).

When I try to print a 61 row data.table, no ellipses are inserted. It just prints 61 rows but in text format, not as a table.

Of course, in my case I don't want the data.table printed at all, as I'm using a := assignment, which doesn't usually output to the console, as mentioned in the comments before mine.

@jankatins
Contributor

This is affected by #285:

Currently a table does not output anything. IMO this needs a fix in data.table itself to put in the same workaround as for knit_print.

@breschke

This problem appears to be causing performance issues for me. A simple assignment by reference dt[,newvar:=1] on a data.table of 30 million+ rows takes less than a second in my console, but hangs endlessly in a Jupyter notebook running on the same machine. I may have stumbled on a workaround: when I wrapped the assignment by reference in system.time(), i.e. system.time(dt[,newvar:=1]), to compare console and notebook behavior, the command executed as fast in the notebook as in the console.

@jankatins
Contributor

jankatins commented May 19, 2016

@breschke Could be that it prints the thing and then sends it over to the frontend (=Browser) which crashes?

Does this also work:

{
dt[,newvar:=1]
NULL
}

[The {...} makes evaluate treat this as one statement whose value is the NULL, so the data.table itself is never printed]

@jankatins
Contributor

@breschke Which versions of data.table, IRkernel, IRdisplay, and repr are you using? sessionInfo() should print these.

@jankatins
Contributor

jankatins commented May 19, 2016

There is also this: https://github.com/Rdatatable/data.table/blob/9d2d71098d849c99e6eebb0e0b539eb58d723b05/R/cedta.R#L4

Maybe repr should be added there as well?

You may try this:

assignInNamespace("cedta.pkgEvalsUserCode", c(data.table:::cedta.pkgEvalsUserCode,"repr"), "data.table")

and then execute the assignment in a new cell?

But I still suspect that we need to get repr into this here:
https://github.com/Rdatatable/data.table/blob/d3567006b7b1d4cbb3a29ff22f8576e948e9c3e9/R/data.table.R#L35

@jankatins
Contributor

See also Rdatatable/data.table#933 where I just commented...

@breschke

@JanSchulz This works without hanging:

{
dt[,newvar:=1]
NULL
}

R version 3.2.2
data.table_1.9.6
IRkernel_0.5
IRdisplay_0.3
repr_0.7

Running on Ubuntu 15.10.

@jankatins
Contributor

Then this looks like Rdatatable/data.table#933 :-(

@breschke

breschke commented May 19, 2016

@JanSchulz I attempted your other suggestion:

assignInNamespace("cedta.pkgEvalsUserCode", c(data.table:::cedta.pkgEvalsUserCode,"repr"), "data.table")

then assign by reference in a new cell. It appears to hang endlessly (doesn't execute immediately as it would wrapped as above). I'm not familiar enough with repr to comment on whether it should be included.

Note: it doesn't kill the kernel---I can interrupt and continue the session.

@mattdowle

I'm almost following this one. The general root of this problematic area is explained in FAQ 2.22.

Thanks for finding and trying the assignInNamespace() test @breschke. If that didn't work, then unfortunately adding repr to the whitelist in data.table isn't going to help. That test is a manual way to add to the whitelist. IRkernel is already present in the whitelist.

Can someone debug the hanging-endlessly point and establish what is happening there? Is it transferring the entire data.table between the processes for some reason or is it hanging for another reason? Might be able to tell using htop or other monitoring tools.

@mattdowle

mattdowle commented May 19, 2016

I now see @JanSchulz's suggestion in Rdatatable/data.table#933. Making the change now.

@jankatins
Contributor

Can someone debug the hanging-endlessly point and establish what is happening there? Is it transferring the entire data.table between the processes for some reason or is it hanging for another reason?

I suspect that there are multiple reasons:

I suspect we shouldn't transform it into a data.frame?

@jankatins
Contributor

jankatins commented May 19, 2016

from the FAQ:

To solve this problem, the key was to stop trying to stop the print method running after a :=. Instead, inside := we now (from v1.8.3) set a global flag which the print method uses to know whether to actually print or not.

What is this flag? If it is useable outside of print, then we could use that as well?

Eg:

if (is.data.table(obj)) {
   if (data.table.should.print(obj)) { # does such a function exist?
       # do our converting to the right representation...
   }
}

@mattdowle

mattdowle commented May 19, 2016

@JanSchulz Might work. See first line of data.table:::print.data.table.
From your package would need to prefix with ::: to get to .global e.g.
if (data.table:::.global$print != "" && address(obj) == data.table:::.global$print)
If that works, we could export .global for you.

Edit: if that works we should export a function for you to isolate you from data.table internals; e.g. data.table.should.print as you suggest.

@mattdowle

mattdowle commented May 19, 2016

And yes, running as.data.frame on it doesn't sound right as that will copy. If you are willing to import or depend on data.table then you could setDF() on it to save the copy. But that's only necessary because IRkernel has been added to the whitelist, since you mimic the user running code at the prompt. The user code may be data.table-aware or not but the data.table calls are coming from your package rather than the global environment. Another way is perhaps when you eval(), pass the global environment to eval() rather than eval()ing in your own environment, to truly mimic what happens in the global environment. I'm just guessing now and haven't looked at your code.

@jankatins
Contributor

Another way is perhaps when you eval(), pass the global environment to eval() rather than eval()ing in your own environment, to truly mimic what happens in the global environment. I'm just guessing now and haven't looked at your code.

That we already do: https://github.com/IRkernel/IRkernel/blob/master/R/execution.r#L268-L272 :-)

Edit: if that works we should export a function for you to isolate you from data.table internals; e.g. data.table.should.print as you suggest.

If we go with the workaround, then this is definitely preferred... We already have our share of workarounds around CRAN's "no ::: usage" policy :-)

@jankatins
Contributor

Just to ask: if it is a global and we would use any data.table function, this global would be reset?

Eg what happens if a repr_text.data.table would subset the dt to print a shorter version (head and tail) and then repr_html.data.table would do the same? On the other hand, if it signals "not print", then it wouldn't be touched as we would never alter the dt (e.g. not call a function which sets this global flag)...

Is this the right idea about this flag?

@mattdowle

That we already do: https://github.com/IRkernel/IRkernel/blob/master/R/execution.r#L268-L272 :-)

Great. Then maybe IRkernel shouldn't be in the whitelist after all. Perhaps that's the problem. Just to quickly test, try assignInNamespace("cedta.pkgEvalsUserCode", NULL, "data.table") and then try again.

If that works then it simplifies a lot as you don't need to be data.table-aware at all.

@jankatins
Contributor

I'm pretty sure we need to: we are basically doing the same as knitr does with evaluate and knit_print: we evaluate code via evaluate and then print returned values (i.e. everything which is not invisible) via the repr_xxx functions.

I tried this:

library(data.table)
assignInNamespace("cedta.pkgEvalsUserCode", NULL, "data.table")
dat <- data.table(x1=1:10)
dat[, x2 := 1:10]

and it printed the table...

@jankatins
Contributor

jankatins commented May 19, 2016

This is basically what our implementation would be if there were an exported function to get the flag: IRkernel/IRkernel#343 [That it errors is a bug on our side...]. You could probably implement something similar on your side for knit_print.data.table (or also take the above functions into your package :-)) instead of using the hack with the callstack in https://github.com/Rdatatable/data.table/blob/d3567006b7b1d4cbb3a29ff22f8576e948e9c3e9/R/data.table.R#L34-L39.

@mattdowle

I just added mimicsAutoPrint to the callstack hack:
Rdatatable/data.table@689b624
If you could fetch that, add your function name using assignInNamespace() and test please. Then if it works let me know the function name I should add: repr_print.default ?

I also added and exported shouldPrint() to expose the flag (Rdatatable/data.table@3ec2d61). It resets the flag within it so it is a read-once function. If you need it twice in your logic, store the value from the first call.
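
The read-once behaviour can be pictured with a small base-R sketch. The names `.flag` and `should_print_once` are hypothetical stand-ins; this is not the implementation of data.table's shouldPrint().

```r
# Hypothetical stand-in; not the implementation of data.table::shouldPrint().
.flag <- new.env()
.flag$suppress <- TRUE   # pretend a := just ran

should_print_once <- function() {
  ok <- !.flag$suppress
  .flag$suppress <- FALSE  # reading the flag also clears it
  ok
}

first  <- should_print_once()  # FALSE: suppression was requested
second <- should_print_once()  # TRUE: the flag was already cleared

# So a caller that needs the answer in several places stores it once:
.flag$suppress <- TRUE
ok <- should_print_once()
text_out <- if (ok) "table as text" else NULL
html_out <- if (ok) "<table>...</table>" else NULL
```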

Does this resolve everything? Should we continue to keep IRkernel on the whitelist or remove it? Maybe we can remove knitr from the whitelist too since the comment there is "knitr's eval is passed envir=globalenv() so doesn't need to be listed here currently, but we include it in case it decides to change that."

@breschke

I updated to current dev versions of data.table and IRkernel and ran the following described in #343:

assignInNamespace("cedta.pkgEvalsUserCode", NULL, "data.table")
repr_html.data.table <- function(obj, ...){
    if (data.table:::.global$print != "" && address(obj) == data.table:::.global$print) {
        NULL
    } else {
        NextMethod()
    }
}
repr_latex.data.table <- function(obj, ...){
    if (data.table:::.global$print != "" && address(obj) == data.table:::.global$print) {
        NULL
    } else {
        NextMethod()
    }
}
repr_text.data.table <- function(obj, ...){
    if (data.table:::.global$print != "" && address(obj) == data.table:::.global$print) {
        NULL
    } else {
        NextMethod()
    }
}

Now assignment by reference no longer displays output:

dat <- data.table(x1=1:10)
dat[, x2 := 1:10]

but calling the object no longer prints (a summary of) the data.table--i.e., this does nothing:

dat

Personally, I'm fine with that, though I suspect others will dislike this behavior.

sessionInfo():

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.10

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.9.7

loaded via a namespace (and not attached):
 [1] R6_2.1.2        magrittr_1.5    IRdisplay_0.3   pbdZMQ_0.2-3   
 [5] tools_3.2.2     base64enc_0.1-3 uuid_0.1-2      stringi_1.0-1  
 [9] IRkernel_0.6    jsonlite_0.9.20 stringr_1.0.0   digest_0.6.9   
[13] repr_0.7        evaluate_0.9 

@jankatins
Contributor

jankatins commented May 26, 2016

Ok, I tried this:

library(data.table)
old = data.table:::mimicsAutoPrint
old
reprs = c("repr_text.data.table", "repr_latex.data.table", "repr_markdown.data.table", "repr_html.data.table", "repr_text.default")
assignInNamespace("mimicsAutoPrint", c(old, reprs), "data.table")
dat <- data.table(x1=1:10)
dat[, x2 := 1:10]

and it still prints because we do have a repr_text.data.frame which is used and prints the dat because it is also a data.frame.

I also tried to remove the repr_text.data.frame method, but then repr ran into this callstack:

[...]
[[20]]
withCallingHandlers(withVisible(value_fun(ev$value, ev$visible)), 
    warning = wHandler, error = eHandler, message = mHandler)

[[21]]
withVisible(value_fun(ev$value, ev$visible))

[[22]]
value_fun(ev$value, ev$visible)

[[23]]
value_handler(x)

[[24]]
prepare_mimebundle(obj, .self$handle_display_error)

[[25]]
repr_text(obj)

[[26]]
repr_text.data.frame(obj)

[[27]]
NextMethod()

[[28]]
repr_text.default(obj)

[[29]]
paste(utils::capture.output(print(obj)), collapse = "\n")

[[30]]
utils::capture.output(print(obj))

[[31]]
evalVis(expr)

[[32]]
withVisible(eval(expr, pf))

[[33]]
eval(expr, pf)

[[34]]
eval(expr, envir, enclos)

[[35]]
print(obj)

[[36]]
print.data.table(obj)

which seems not to match ( length(SYS) > 3L && as.character(SYS[[length(SYS)-3L]][[1L]]) %chin% mimicsAutoPrint )

So Rdatatable/data.table@689b624 seems to be not working here because the callstack is just too different than the one from knit_print :-(
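
Why a fixed-offset callstack lookup is fragile can be seen in a small base-R sketch. All names here (`allowed`, `called_from_allowed`, `auto_print`, `middle`, `indirect`) are made up for illustration and the offset is simplified compared with data.table's check:

```r
# Hypothetical names; a simplified version of a "who called me" check.
allowed <- c("auto_print")

called_from_allowed <- function() {
  calls <- sys.calls()
  if (length(calls) < 2L) return(FALSE)
  # Look exactly one frame up and compare the function name with the list.
  caller <- as.character(calls[[length(calls) - 1L]][[1L]])[1L]
  caller %in% allowed
}

auto_print <- function() called_from_allowed()  # direct caller is on the list
middle     <- function() called_from_allowed()  # an extra frame not on the list
indirect   <- function() middle()

direct  <- auto_print()  # TRUE
wrapped <- indirect()    # FALSE: the frame at the fixed offset is `middle`
```

Any extra function between the listed caller and the check shifts the offset, which is what the longer repr callstack does here.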

So we do have to go the repr_text.data.table with shouldPrint() way.

repr_text.data.table <- function(obj, ...){
    if (!data.table::shouldPrint(obj)) {
        invisible(NULL) # in IRkernel, will prevent any other repr_xx methods from being called
    } else {
        NextMethod() # falls back to `repr_text.default`, which uses print(obj)
    }
}
# No need for repr_html/... as the only reason we have the above method 
# is to return null, which indicates to the IRkernel that nothing else should be printed.
# the shouldPrint() actually resets the flag, so it can't be used twice anyway...

This works as intended:

dat <- data.table(x1=1:10)
dat[, x2 := 1:10] # does not print
dat # prints

But now we have a different problem:

Currently we implicitly assume that each repr_* is independent of the others, so by calling shouldPrint() once in repr_text.data.table() we do not prevent printing with the other methods (e.g. a new repr_html.data.table) because the flag is reset. The actual situation is a bit different, because when repr_text returns an empty string, the repr methods for the other mimetypes are not called in IRkernel.

So maybe we should make this explicit in the documentation of repr?

The alternative is adding a repr_should_represent method which irkernel could then use and which has a special repr_should_represent.data.table which uses the flag.

I would currently prefer the former because it's basically what we do, and it will only confuse users if they test in IRkernel and it works and then (in the future) it works differently in another lib. On the other hand, for performance reasons, we should probably add a repr_get_shorter_version() so that we don't do the subsetting 4 times and do not convert such big data.tables to data.frames... @flying-sheep @takluyver ?

@mattdowle: would you be able to include the repr_text.data.table function in the data.table? That way we do not need to guard against data.table being loaded and against older versions of the data.table package.

(In general, we would like packages to export repr_xxx implementations for their data structures instead of having these implementations in the repr package, for these reasons...)

As a bonus, you could probably replace the knit_print-specific callstack lookup ( || ( length(SYS) > 3L && as.character(SYS[[length(SYS)-3L]][[1L]]) %chin% mimicsAutoPrint ) ) with

knit_print.data.table <- function(x, ...) {
    if (!data.table::shouldPrint(x)) {
        invisible(NULL)
    } else {
        NextMethod() # which will fall back to your normal print, which will now see `shouldPrint() == TRUE`
    }
}

This would prevent a problem if knitr ever changed its knit_print.default implementation so that the above callstack would be different.

@mattdowle

Thanks all! This is great info. Ok yes I see what you mean that adding the print methods to data.table might be best. Happy to give that a go. Will do.

@breschke

breschke commented Sep 21, 2016

Any update on this issue? Same behavior with data.table_1.9.6. If the data.table is very large, I find this causes intolerable lags in performance (hanging while trying to print).

The following prints the data.table during assignment by reference:

library(data.table)
dat <- data.table(x1=1:10)
dat[, x2 := 1:10]

but when I run this first:

assignInNamespace("cedta.pkgEvalsUserCode", NULL, "data.table")
repr_html.data.table <- function(obj, ...){
    if (data.table:::.global$print != "" && address(obj) == data.table:::.global$print) {
        NULL
    } else {
        NextMethod()
    }
}
repr_latex.data.table <- function(obj, ...){
    if (data.table:::.global$print != "" && address(obj) == data.table:::.global$print) {
        NULL
    } else {
        NextMethod()
    }
}
repr_text.data.table <- function(obj, ...){
    if (data.table:::.global$print != "" && address(obj) == data.table:::.global$print) {
        NULL
    } else {
        NextMethod()
    }
}

printing is suppressed during assignment by reference, but:

dat

prints no output.

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.10

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.9.6

loaded via a namespace (and not attached):
 [1] R6_2.1.2           magrittr_1.5       IRdisplay_0.4.9000 pbdZMQ_0.2-3      
 [5] tools_3.2.2        crayon_1.3.2       uuid_0.1-2         stringi_1.0-1     
 [9] IRkernel_0.7       jsonlite_1.1       stringr_1.0.0      digest_0.6.10     
[13] chron_2.3-47       repr_0.9.9000      evaluate_0.9 

@flying-sheep
Member

flying-sheep commented Sep 21, 2016

Hi, looks like it works fine, but it needs the as-of-now unreleased shouldPrint

[screenshot: data table]

@jeffwong-nflx

Hi, this still seems to be an issue with the new notebook feature in the recent RStudio 1.0 release. Any time a data table is modified with :=, the notebook shows the output inline. I was reading this thread and saw that @mattdowle did something for knitr to avoid this behavior; can something be done with IRkernel too? The issue is very visible now that notebooks are so mainstream inside RStudio 1.0.

@takluyver
Member

Is the RStudio notebook using IRkernel? I know nothing of this.

@flying-sheep
Member

i don’t think so. i think it has nothing to do with us.

@abielr
Contributor Author

abielr commented Feb 19, 2017

Are there any plans to integrate the repr_XXX.data.table functions described above into the repr package so that this issue is fixed by default for anyone running a recent version of data.table? They work, but at the moment I end up copying and pasting them into the top of every notebook.

@flying-sheep
Member

PRs welcome!

@flying-sheep flying-sheep transferred this issue from IRkernel/IRkernel Jan 16, 2019
@flying-sheep
Member

Fixed according to Rdatatable/data.table#933
