-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Setting data.table column outputs whole table to screen #109
Comments
|
Just a note if testing this: the same behavior you see in the notebook is also cropping up at the console in R 3.2.0, as described at Rdatatable/data.table#1122. So if trying to fix you will likely want to use R < 3.2.0 until they get this bug fixed. |
Fixing this on the IRkernel side is probably not reasonable. Under the hood, when you run a statement with the data.table
It is triggering |
Ah, R ;-) |
Unfortunately because A fix on the IRkernel side is to check of the length of On a related note, it would be desirable to have a |
@flying-sheep : the idea of checking whether there's any text output, and suppressing all output if there isn't, sounds basically reasonable to me. Do you see any problems with that? |
Generally yes, but the question is what that means. Will Because if it is overridden to not output anything, then it will usually do something unrelated, right? Like plotting or something. |
In general I would say an overridden But regardless, if a graphics command is called inside the library(evaluate)
mat <- function(x) {
class(x) <- "mat"
x
}
print.mat <- function(x, ...) {
plot(rnorm(10))
return(invisible())
}
oh <- new_output_handler(
value = function(obj) {
print("VALUE")
val <- capture.output(print(obj))
# Check the length of val to see if we should send text output back
},
graphics = function(plotobj) {
print("GRAPHICS") # This will always get run
}
)
m1 <- mat(matrix(1:4, 2, 2))
evaluate("m1", output_handler = oh) |
true. as said: i think we should do it. this was just a side thought :) |
OK, @abielr, do you want to make a pull request? |
Very much appreciate the kind language in this thread. Yes all correct. Have just fixed Rdatatable/data.table#1122. About to release v1.9.6 to CRAN. Note new wording of bug fixes in https://github.com/Rdatatable/data.table/blob/master/README.md :
We had to add a workaround in data.table for knitr. Obviously ugly and not ideal. But |
This seems to be the same or a related issue I am seeing. I found a simple reproducible example to show it. The following data.table creation and DT1 = data.table(x=rep(c("a","b","c", "d"),each=15),
y=c(1,3,NA,9),
v=c(1:6,NA,NA,NA,NA,NA,NA),
z=1:12)
DT1[,min:=pmin(y, v, na.rm=TRUE)] When I make the data.table a bit larger by increasing DT2 = data.table(x=rep(c("a","b","c", "d"),each=18),
y=c(1,3,NA,9),
v=c(1:6,NA,NA,NA,NA,NA,NA),
z=1:12)
DT2[,min:=pmin(y, v, na.rm=TRUE)]
If I separate the commands, the warning is from the line print(DT2) Gives no error, and outputs all 72 rows and 4 columns, just like in RStudio. DT2 Gives the same error
but then outputs the same 72x4 data.table. In my actual script, which has a much larger data.table, where I am doing several new columns with Session info with versions is below: sessionInfo()
|
I would guess that when you see the error followed by the table, the error comes from the code in repr that attempts to generate an HTML version of that table. That's failing, so it falls back to showing the plain text table. |
You can check this by doing: repr::repr_html(DT2) |
You're right, I also get the same error with a much taller table, perhaps in this case it's the
|
|
why ‘ |
If it goes wrong at 61 rows, it seems likely that it's something going wrong when we truncate the table and insert ellipses. |
jup, that’s why i said that it’s strange to see “ |
The It seems like there are two issues happening. If the data.table has nrows where 60 < nrows < 101, data.table will show all of the rows, but Jupyter is having trouble rendering this to an html table and giving an error as takluyver found when suggesting I check
Which looks exactly like it does in the console. If the command is instead When I try to print a 61 row data.table, no ellipses are inserted. It just prints 61 rows but in text format, not as a table. Of course, in my case I don't want the data.table printed as all, as I'm using a |
This is affected by #285: Currently a |
This problem appears to be causing performance issues for me. A simple assignment by reference |
@breschke Could be that it prints the thing and then sends it over to the frontend (=Browser) which crashes? Does this also work: {
dt[,newvar:=1]
NULL
} [The |
@breschke Which version of data.table and which version of IRkernel, IRdisplay, repr, and data.table? |
There is also this: https://github.com/Rdatatable/data.table/blob/9d2d71098d849c99e6eebb0e0b539eb58d723b05/R/cedta.R#L4 Maybe repr should be added there as well? You may try this:
and then execute the assignment in a new cell? But I still suspect that we need to get repr into this here: |
See also Rdatatable/data.table#933 where I just commented... |
@JanSchulz This works without hanging:
R version 3.2.2 Running on Ubuntu 15.10. |
Then this looks like Rdatatable/data.table#933 :-( |
@JanSchulz I attempted your other suggestion:
then assign by reference in a new cell. It appears to hang endlessly (doesn't execute immediately as it would wrapped as above). I'm not familiar enough with repr to comment on whether it should be included. Note: it doesn't kill the kernel---I can interrupt and continue the session. |
I'm almost following this one. The general root of this problematic area is explained in FAQ 2.22. Thanks for finding and trying the assignInNamepspace() test @breschke. If that didn't work then adding Can someone debug the hanging-endlessly point and establish what is happening there? Is it transferring the entire data.table between the processes for some reason or is it hanging for another reason? Might be able to tell using htop or other monitoring tools. |
I now see @JanSchulz's suggestion in Rdatatable/data.table#933. Making the change now. |
I suspect that there are multiple reasons:
I suspect we shouldn't transform it into a data.frame? |
from the FAQ:
What is this flag? If it is useable outside of print, then we could use that as well? Eg:
|
@JanSchulz Might work. See first line of Edit: if that works we should export a function for you to isolate you from data.table internals; e.g. |
And yes, running as.data.frame on it doesn't sound right as that will copy. If you are willing to import or depend on data.table then you could |
That we already do: https://github.com/IRkernel/IRkernel/blob/master/R/execution.r#L268-L272 :-)
If we should take the workaround, then this is definitely prefered... We already have our share of workarounds around CRANs "no |
Just to ask: if it is a global and we would use any Eg what happens if a Is this the right idea about this flag? |
Great. Then maybe IRKernel shouldn't be in the whitelist after all. Perhaps that's the problem. Just to quickly test, try If that works then it simplifies a lot as you don't need to be data.table-aware at all. |
I'm pretty sure we need to: we are basically doing the same as knitr does with evaluate and knit_print: we evaluate code via I tried this:
and it printed the table... |
This is basically our implementation if there would be an exported function to get the flag: IRkernel/IRkernel#343 [That it errors is a bug on our side...]. You could probably implement something similar on your side for |
I just added I also added and exported Does this resolve everything? Should we continue to keep IRkernel on the whitelist or remove it? Maybe we can remove knitr from the whitelist too since the comment there is "knitr's eval is passed envir=globalenv() so doesn't need to be listed here currently, but we include it in case it decides to change that." |
I updated to current dev versions of data.table and IRkernel and ran the following described in #343:
Now assignment by reference no longer displays output:
but calling the object no longer prints (a summary of) the data.table--i.e., this does nothing:
Personally, I'm fine with that, though I suspect others will dislike this behavior. sessionInfo():
|
Ok, I tried this: library(data.table)
old = data.table:::mimicsAutoPrint
old
reprs = c("repr_text.data.table", "repr_latex.data.table", "repr_markdown.data.table", "repr_html.data.table", "repr_text.default")
assignInNamespace("mimicsAutoPrint", c(old, reprs), "data.table")
dat <- data.table(x1=1:10)
dat[, x2 := 1:10] and it still prints because we do have a I also tried to remove the
which seems not to match So Rdatatable/data.table@689b624 seems to be not working here because the callstack is just too different than the one from knit_print :-( So we do have to go the repr_text.data.table <- function(obj, ...){
if (!data.table::shouldPrint(obj)) {
invisible(NULL) # in IRkernel, will prevent any other repr_xx methods from being called
} else {
NextMethod() # fallsback to `repr_text.default`, which uses print(obj)
}
}
# No need for repr_html/... as the only reason we have the above method
# is to return null, which indicates to the IRkernel that nothing else should be printed.
# the shouldPrint() actually resets the flag, so it can't be used twice anyway... This works as intended: dat <- data.table(x1=1:10)
dat[, x2 := 1:10] # does not print
dat # prints But now we have a different problem: Currently we implicitly assume that each So maybe we should make this explicit in the documentation of repr? The alternative is adding a I'm currently would prefere the former because it's basically what we do and it will only confuse users if they test in irkernel and it works and then (in the future) in another lib it works differently. On the other hand, for performance reasons, weh should probably add a @mattdowle: would you be able to include the (In general, we would like to have packages exporting As a bonus, you could probably replace the knit_print specific callstack lookup (
This would prevent a problem if knitr would ever change it's knit_print.default implementation so that the above callstack would be different. |
Thanks all! This is great info. Ok yes I see what you mean that adding the print methods to data.table might be best. Happy to give that a go. Will do. |
Any update on this issue? Same behavior with data.table_1.9.6. If the data.table is very large, I find this causes intolerable lags in performance (hanging while trying to print). The following prints the data.table during assignment by reference: library(data.table)
dat <- data.table(x1=1:10)
dat[, x2 := 1:10] but when I run this first: assignInNamespace("cedta.pkgEvalsUserCode", NULL, "data.table")
repr_html.data.table <- function(obj, ...){
if (data.table:::.global$print != "" && address(obj) == data.table:::.global$print) {
NULL
} else {
NextMethod()
}
}
repr_latex.data.table <- function(obj, ...){
if (data.table:::.global$print != "" && address(obj) == data.table:::.global$print) {
NULL
} else {
NextMethod()
}
}
repr_text.data.table <- function(obj, ...){
if (data.table:::.global$print != "" && address(obj) == data.table:::.global$print) {
NULL
} else {
NextMethod()
}
} printing is suppressed during assignment by reference, but: dat prints no output. R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.10
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.9.6
loaded via a namespace (and not attached):
[1] R6_2.1.2 magrittr_1.5 IRdisplay_0.4.9000 pbdZMQ_0.2-3
[5] tools_3.2.2 crayon_1.3.2 uuid_0.1-2 stringi_1.0-1
[9] IRkernel_0.7 jsonlite_1.1 stringr_1.0.0 digest_0.6.10
[13] chron_2.3-47 repr_0.9.9000 evaluate_0.9 |
Hi, this still seems to be an issue with the new notebook feature in the recent Rstudio 1.0 release. Any time a data table is modified with |
Is the Rstudio notebook using IRkernel? I know nothing of this. |
i don’t think so. i think it has nothing to do with us. |
Are there any plans to integrate the |
PRs welcome! |
Fixed according to Rdatatable/data.table#933 |
If I run the code below, the second line will cause the entire
dat
object to be output, whereas at an R console it wouldn't return anything. The syntax used is the special syntax for setting columns with the populardata.table
package. I'm using the 1.9.5 devel version ofdata.table
.The text was updated successfully, but these errors were encountered: