dplyr 0.4.2 crashes R (reproducible) #1316

Closed
kogreger opened this issue Aug 13, 2015 · 12 comments

@kogreger

Hi!

Unfortunately I have encountered what appears to be a bug in the latest version of dplyr (0.4.2 from CRAN). I did not encounter this issue until I updated dplyr earlier this week to benefit from its improved handling of label attributes when working with imported SPSS data. Since the update, scripts that worked flawlessly before reproducibly crash R (3.1.3) as well as RStudio 0.99.441.

Here is some data that (using the below code) crashes R: https://app.box.com/s/jqhbpsr9ufa8xa9dwscdhwv9z93dsunk

The code:
load("data.Rdata")
library(dplyr)
# replace NA with 0 in every column except gid, plr_id and plr_name
data2 <- data %>% mutate_each(funs(removeNAs = ifelse(is.na(.), 0, .)), -gid, -plr_id, -plr_name)
head(data2)  # R crashes here

Interestingly enough, it's not the mutate_each() call that crashes but the head() (or whatever command is issued first on the mutate_each()'d data).
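As a stopgap until a fixed dplyr is installed, the same NA-to-zero replacement can be done in base R without going through dplyr's compiled code. This is a minimal sketch, not dplyr's API: the helper name `replace_na_with_zero` is hypothetical, and it assumes the columns to be filled are numeric and the columns to skip are `gid`, `plr_id` and `plr_name` as in the call above.

```r
# Hypothetical helper (not part of dplyr): replace NA with 0 in every column
# except those listed in `exclude`, using only base R.
replace_na_with_zero <- function(df, exclude = character()) {
  cols <- setdiff(names(df), exclude)
  # Overwrite only the selected columns; excluded columns are left untouched
  df[cols] <- lapply(df[cols], function(x) {
    x[is.na(x)] <- 0
    x
  })
  df
}

# Usage with the data from the report (assumes `data` was loaded from data.Rdata):
# data2 <- replace_na_with_zero(data, exclude = c("gid", "plr_id", "plr_name"))
# head(data2)
```

Unlike `ifelse()`, assigning into `x[is.na(x)]` keeps each column's existing attributes (such as labels from imported SPSS data).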

I then tried to narrow the problem down, which led me even deeper down the rabbit hole... I used slice() to check whether a single row in the data is causing the trouble, like so (note that data has 447 rows):
data2 <- slice(data, 1:100)
data2 <- data2 %>% mutate_each(funs(removeNAs = ifelse(is.na(.), 0, .)), -gid, -plr_id, -plr_name)
head(data2)

It appears to work for some slices but crashes on others, and I have not been able to narrow it down to single rows that cause the crash. For example, 100:120 as well as 110:130 worked, but 115:125 (which contains only rows covered by the other two slices) crashed.

I think this is a very strange error for three reasons:

  1. it is only partly reproducible: results are not consistent even on the same data and slices (which makes me believe it's not about my data),
  2. it doesn't issue an error message or warning, but crashes the R session completely - sometimes without an error message, sometimes with a notice by the Microsoft Visual C++ Runtime Library telling me that "This application has requested the Runtime to terminate it in an unusual way."
  3. it appears to have been introduced in the latest release 0.4.2 of dplyr...

Here's my sessionInfo():
R version 3.1.3 (2015-03-09)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

I'd be grateful if somebody knowledgeable could look into this and would be happy to provide more details.

Kind regards,
Konstantin

@hadley
Member

hadley commented Aug 13, 2015

Please try the dev version.

@hadley hadley closed this as completed Aug 13, 2015
@kogreger
Author

Thanks for the quick reply.

I'd love to try it out, but it appears to be impossible to get Rtools 3.3 up and running on my Windows machine. After installing Rtools 3.3, all I get from library(devtools) is: "WARNING: Rtools is required to build R packages, but is not currently installed. Please download and install Rtools 3.3 from http://cran.r-project.org/bin/windows/Rtools/ and then run find_rtools()."

Hence I'm unable to build the packages dplyr and lazyeval from GitHub... :o(

@romainfrancois
Member

I'm pretty sure this has been fixed.

@kogreger
Author

I beg to differ. ;o)

I also looked around for ways to get Rtools 3.3 not only installed but also recognized by R. I came across r-lib/devtools#497 and tried the suggestions given there (the error messages below were originally in German):
devtools::has_devel() produced
"C:/Users/greg_ko/Documents/R/R-3.2.1/bin/x64/R" --no-site-file --no-environ --no-save --no-restore CMD SHLIB foo.c
Warning message: running command 'make -f "C:/Users/greg_ko/DOCUME~1/R/R-32~1.1/etc/x64/Makeconf" -f "C:/Users/greg_ko/DOCUME~1/R/R-32~1.1/share/make/winshlib.mk" SHLIB="foo.dll" WIN=64 TCLBIN=64 OBJECTS="foo.o"' had status 127
Error: Command failed (1)

find_rtools(TRUE) produced
Scanning path...
Scanning registry...
WARNING: Rtools is required to build R packages, but is not currently installed. Please download and install Rtools 3.3 from http://cran.r-project.org/bin/windows/Rtools/ and then run find_rtools().

@romainfrancois
Member

I meant I'm pretty sure the issue is fixed in the dev version of dplyr. Good luck with Rtools.

@romainfrancois
Member

FWIW it works fine for me:

> data2 <- data %>% mutate_each(funs(removeNAs = ifelse(is.na(.), 0, .)), -gid, -plr_id, -plr_name) %>% tbl_df
> head(data2)
Source: local data frame [6 x 95]

  gid   plr_id           plr_name   hh_0029   hh_3039   hh_4049   hh_5059
1   1 01011101      St\xfclerstr. 0.1138438 0.1451994 0.1946726 0.1641798
2   3 01011103      L\xfctzowstr. 0.1211218 0.1486497 0.2139949 0.1828808
3   4 01011104      K\xf6rnerstr. 0.1103363 0.1756611 0.2389407 0.2101758
4   7 01011204     Leipziger Str. 0.1489445 0.1536207 0.1976222 0.1595184
5   8 01011301  Charit\xe9viertel 0.1385754 0.1616661 0.2335387 0.1913695
6   9 01011302 Oranienburger Str. 0.1453255 0.1690992 0.2381201 0.1904056
Variables not shown: hh_6069 (dbl), hh_70o (dbl), hh_ek1 (dbl), hh_ek2 (dbl),
  hh_ek3 (dbl), hh_ek4 (dbl), hh_ek5 (dbl), hh_ek6 (dbl), hh_1 (dbl), hh_2
  (dbl), hh_3o (dbl), ststr1 (dbl), ststr2 (dbl), ststr3 (dbl), ststr4 (dbl),
  ststr5 (dbl), ststr6 (dbl), ststr8 (dbl), ststr10 (dbl), ststr11 (dbl),
  ststr12 (dbl), ststr13 (dbl), ststr14 (dbl), ststr15 (dbl), ststr16 (dbl),
  ststr17 (dbl), ststr18 (dbl), flt1 (dbl), flt2 (dbl), flt3 (dbl), flt6 (dbl),
  flt7 (dbl), flt8 (dbl), flt9 (dbl), flt10 (dbl), flt11 (dbl), flt12 (dbl),
  flt13 (dbl), flt16 (dbl), flt17 (dbl), flt21 (dbl), flt22 (dbl), flt23 (dbl),
  flt24 (dbl), flt25 (dbl), flt27 (dbl), flt29 (dbl), flt30 (dbl), flt31 (dbl),
  flt32 (dbl), flt33 (dbl), flt36 (dbl), flt37 (dbl), flt38 (dbl), flt41 (dbl),
  flt43 (dbl), flt44 (dbl), flt45 (dbl), flt46 (dbl), flt47 (dbl), flt49 (dbl),
  flt51 (dbl), flt53 (dbl), flt54 (dbl), flt55 (dbl), flt56 (dbl), flt57 (dbl),
  flt58 (dbl), flt59 (dbl), flt60 (dbl), flt72 (dbl), flt73 (dbl), flt91 (dbl),
  flt92 (dbl), flt93 (dbl), flt94 (dbl), flt99 (dbl), flt100 (dbl), trips_wr
  (dbl), trips_br (dbl), trips_mr (dbl), trips_cr (dbl), trips_pr (dbl),
  trips_sr (dbl), trips_bpr (dbl), trips_cpr (dbl), trips_ppr (dbl), trips_ssr

@kogreger
Author

Great, thanks for testing it!
At least now I know I'd be done once I get Rtools sorted out...

@kogreger
Author

Additional update: It works fine on my Mac using dplyr 0.4.2 from CRAN...
Should be a Windows issue then...

@mschubert

0.4.2 crashes for me reproducibly on Linux - please push a new release to CRAN as soon as possible.

@diegogarcilazo

0.4.2 crashes for me in several scripts that worked fine before. I use Windows 10, RStudio 0.99.467 and R 3.2.1. I should migrate my code.

@kogreger
Author

Version 0.4.3 solves this issue. Thanks for the great work!

@allenwbenson

I am using dplyr's bind_rows(). I have to load plyr first, because I need to mutate first and dplyr's mutate() gives me "Error: not compatible with STRSXP". I do my plyr mutate first, then load dplyr, then try bind_rows(). Calling head() on any object from that point on shuts down both R (3.2.2) and RStudio (0.99.486). I am using dplyr 0.4.3, and my OS is Windows 10.

@lock lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018

6 participants