setkey changing data (not just sorting) #2540

Closed
patrickhowerter opened this issue Dec 27, 2017 · 2 comments · Fixed by #2557

@patrickhowerter

I have found a case where setkey changes the underlying rows of data, not just their order. It is as if the structure that indexes the rows of each column vector were out of sync.

The case happens when:

  1. I update columns with the syntax dt[ , c('col1', 'col2') := somecolumn] rather than dt[, .(col1=somecolumn, col2=somecolumn)].
  2. I then run setkey on the columns that were updated in step 1.

Please see the reproducible example:

library(data.table)

# set up some dummy data
a <- c('A', 'B', 'D', 'C')
b <- as.numeric(c(20160101,20160131, 20160102 ))
ab <- CJ(a=a, b=b, sorted = FALSE)
c <- as.numeric(c(20170101,20170131, 20170102 ))

ab2 <- CJ(a = a, b = c, sorted = FALSE)
ab <- rbindlist(list(ab, ab2))

# set up the test data.table that will give us strange results
test <- data.table(a = ab$a)
# this assignment seems to be what triggers the issue
test[, c('astart', 'aend') := as.integer(ab$b)]

# once we set the keys, some unique records are removed and some are duplicated
setkey(test, a, astart, aend)

# duplicate data
ab[ (a == "A") & (b == 20160101)] # there was one row
test[(a == "A") & (astart == 20160101)] # now there are two rows?

# some of the rows have been removed
test[(a == "A") & (astart == 20170101)] # now there are no rows where a == "A"?
ab[ (a == "A") & (b == 20170101)] # there was one row
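
The spot checks above look at individual rows. As a more general check, a minimal sketch along the following lines compares a snapshot taken before keying with the keyed table. `same_rows` is a hypothetical helper written for this illustration, not part of data.table, and the small table `dt` is made-up data rather than the table from the report. On a version without the bug, the final call is expected to return TRUE.

```r
library(data.table)

# Hypothetical helper (not part of data.table): TRUE when `after` holds the
# same rows as `before`, ignoring row order and attributes such as the key.
same_rows <- function(before, after) {
  isTRUE(all.equal(before, after,
                   ignore.row.order = TRUE,
                   check.attributes = FALSE))
}

# Made-up illustration data
dt <- data.table(a = c("B", "A", "C"), x = c(3L, 1L, 2L))

snapshot <- copy(dt)  # deep copy taken before keying
setkey(dt, a, x)      # should only reorder rows, never change them

same_rows(snapshot, dt)
```

Taking the snapshot with `copy()` matters here, because a plain `snapshot <- dt` would refer to the same table and be reordered along with it.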

# Output of sessionInfo()

R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] shiny_1.0.5 mdo_0.3.3 data.table_1.10.4-3

loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 compiler_3.4.2 bindr_0.1 tools_3.4.2 xts_0.10-0 digest_0.6.12 bit_1.1-12 evaluate_0.10.1 lubridate_1.7.1 jsonlite_1.5 tibble_1.3.4 lattice_0.20-35
[13] ff_2.2-13 pkgconfig_2.0.1 rlang_0.1.4 fastmatch_1.1-0 rstudioapi_0.7 yaml_2.1.15 bindrcpp_0.2 dplyr_0.7.4 stringr_1.2.0 knitr_1.17 htmlwidgets_0.9 rprojroot_1.2
[25] DT_0.2 grid_3.4.2 glue_1.2.0 R6_2.2.2 bookdown_0.5 rmarkdown_1.8 magrittr_1.5 backports_1.1.1 htmltools_0.3.6 rsconnect_0.8.5 assertthat_0.2.0 mime_0.5
[37] xtable_1.8-2 httpuv_1.3.5 stringi_1.1.6 zoo_1.8-0

MarkusBonsch self-assigned this Dec 30, 2017
@MarkusBonsch
Contributor

Thank you for the good report. I can reproduce it and will investigate and solve the issue ASAP.

@MarkusBonsch
Contributor

Dear Patrick,
This is a known bug that was fixed previously (#185) but has reappeared. I know what needs to be done and will fix it. Until then, the easiest workaround is to wrap the right-hand side of the := assignment in a list() call:
test[, c('astart', 'aend') := list(as.integer(ab$b))]
This will avoid the issue until my bugfix has made it into an official release.

Cheers,
Markus
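
For readers who want to confirm the workaround on their own installation, here is a minimal sketch that rebuilds the data from the report, applies the list() wrapping suggested above, and checks that keying leaves every row intact. The checks themselves (`counts` and the `stopifnot` calls) are additions for illustration and were not part of the thread. They should pass wherever the workaround or the eventual fix is effective.

```r
library(data.table)

# Rebuild the data from the original report
a  <- c("A", "B", "D", "C")
ab <- rbindlist(list(
  CJ(a = a, b = as.numeric(c(20160101, 20160131, 20160102)), sorted = FALSE),
  CJ(a = a, b = as.numeric(c(20170101, 20170131, 20170102)), sorted = FALSE)
))

test <- data.table(a = ab$a)

# Workaround from the comment above: wrap the right-hand side in list()
test[, c("astart", "aend") := list(as.integer(ab$b))]
setkey(test, a, astart, aend)

# 4 values of `a` crossed with 6 dates gives 24 rows, and each
# (a, astart) pair should still appear exactly once after keying.
counts <- test[, .N, by = .(a, astart)]
stopifnot(nrow(test) == 24L, nrow(counts) == 24L, all(counts$N == 1L))

# Both new columns were filled from the same values, so they should agree.
stopifnot(test[, all(astart == aend)])
```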
