Segfault merging data.tables with keyed NA_character_ columns #5070

Closed
JorisChau opened this issue Jul 9, 2021 · 1 comment · Fixed by #5170

JorisChau commented Jul 9, 2021

Issue

Joining two data.tables where one data.table (or both) has a keyed column containing only NA_character_ values, and then subsetting the result, produces a segfault and crashes the R session.

Reproducible example

library(data.table)

dt1 <- data.table(x1 = rep(letters[1:4], each = 3), x2 = NA_character_)
dt2 <- data.table(x1 = letters[1:3])
  
setkey(dt1, x2)

dt3 <- dt1[dt2, on = "x1"]
dt3[, .(x1, x2)]

With valgrind enabled, using the current data.table development version (1.14.1), the above code returns:

==10795== Use of uninitialised value of size 8
==10795==    at 0x4FB7910: LEVELS (in /usr/lib/R/lib/libR.so)
==10795==    by 0x101824AA: issorted (in /home/jchau/R/x86_64-pc-linux-gnu-library/4.0/data.table/libs/datatable.so)
==10795==    by 0x4F352AB: ??? (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4F7540B: ??? (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4F7F66F: Rf_eval (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4F8148E: ??? (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4F82256: Rf_applyClosure (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4F76908: ??? (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4F7F66F: Rf_eval (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4F8148E: ??? (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4F82256: Rf_applyClosure (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4FC5362: ??? (in /usr/lib/R/lib/libR.so)
==10795==  Uninitialised value was created by a heap allocation
==10795==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==10795==    by 0x4FBE353: ??? (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4FBFE81: Rf_allocVector3 (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4F892F7: R_bcEncode (in /usr/lib/R/lib/libR.so)
==10795==    by 0x50215C6: ??? (in /usr/lib/R/lib/libR.so)
==10795==    by 0x502163F: ??? (in /usr/lib/R/lib/libR.so)
==10795==    by 0x502095C: ??? (in /usr/lib/R/lib/libR.so)
==10795==    by 0x501FCA9: ??? (in /usr/lib/R/lib/libR.so)
==10795==    by 0x5021A2D: R_Unserialize (in /usr/lib/R/lib/libR.so)
==10795==    by 0x5022DC9: ??? (in /usr/lib/R/lib/libR.so)
==10795==    by 0x5023200: ??? (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4F7FBF5: Rf_eval (in /usr/lib/R/lib/libR.so)
==10795== 
==10795== Invalid read of size 2
==10795==    at 0x4FB7910: LEVELS (in /usr/lib/R/lib/libR.so)
==10795==    by 0x101824AA: issorted (in /home/jchau/R/x86_64-pc-linux-gnu-library/4.0/data.table/libs/datatable.so)
==10795==    by 0x4F352AB: ??? (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4F7540B: ??? (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4F7F66F: Rf_eval (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4F8148E: ??? (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4F82256: Rf_applyClosure (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4F76908: ??? (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4F7F66F: Rf_eval (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4F8148E: ??? (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4F82256: Rf_applyClosure (in /usr/lib/R/lib/libR.so)
==10795==    by 0x4FC5362: ??? (in /usr/lib/R/lib/libR.so)
==10795==  Address 0x1000000010001 is not stack'd, malloc'd or (recently) free'd
==10795== 

 *** caught segfault ***
address (nil), cause 'unknown'

Traceback:
 1: is.sorted(jval, by = key(x))
 2: `[.data.table`(dt3, , .(x1, x2))
 3: dt3[, .(x1, x2)]

Session info

R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.14.1

loaded via a namespace (and not attached):
[1] compiler_4.0.2

Note

An explicit merge() call works as expected:

library(data.table)

dt1 <- data.table(x1 = rep(letters[1:4], each = 3), x2 = NA_character_)
dt2 <- data.table(x1 = letters[1:3])
  
setkey(dt1, x2)

dt3 <- merge(dt1, dt2, by = "x1")
dt3[, .(x1, x2)]

#>    x1 x2
#> 1:  a <NA>
#> 2:  a <NA>
#> 3:  a <NA>
#> 4:  b <NA>
#> 5:  b <NA>
#> 6:  b <NA>
#> 7:  c <NA>
#> 8:  c <NA>
#> 9:  c <NA>
JorisChau changed the title from "Segfault merging data.tables with keyed NA columns" to "Segfault merging data.tables with keyed NA_character_ columns" on Jul 9, 2021

tlapak (Contributor) commented Sep 2, 2021

This must be a multithreading issue, because it works as expected if you first call setDTthreads(1).
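
For reference, the single-thread workaround could be sketched as follows. This is a minimal sketch that reuses the reproducible example from the issue and only adds the setDTthreads() calls. The variable name old_threads is introduced here for illustration. It relies on the report above that one thread avoids the crash, so it sidesteps the symptom rather than addressing the underlying cause.

```r
library(data.table)

# Assumption (from the report above): limiting data.table to one thread
# avoids the segfault. setDTthreads() returns the previous setting invisibly.
old_threads <- setDTthreads(1)

dt1 <- data.table(x1 = rep(letters[1:4], each = 3), x2 = NA_character_)
dt2 <- data.table(x1 = letters[1:3])

setkey(dt1, x2)

# Same join and subset as in the original report
dt3 <- dt1[dt2, on = "x1"]
dt3[, .(x1, x2)]

# Restore the previous thread setting
setDTthreads(old_threads)
```

Storing and restoring the previous thread count keeps the change local to the affected code, so the rest of the session still uses multiple threads.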
