Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gc() race in data.table with R-devel #2882

Merged
merged 5 commits into from
May 22, 2018
Merged

gc() race in data.table with R-devel #2882

merged 5 commits into from
May 22, 2018

Conversation

mattdowle
Copy link
Member

@mattdowle mattdowle commented May 15, 2018

Closes #2866
Closes #2767

The approach of this PR is to ensure ALTREP vectors are not allowed as columns in a DT. We like and benefit from ALTREP very much in R code such as [.data.table where sequence vectors are used a lot. But as columns in a data.table, ALTREPs are not so appropriate. Internals like := assign by reference are rewritten on the basis of columns already being materialized (expanded).

setDT() now expands ALTREP columns. The reproducible examples used setDT() to create the test data.table because data.table() already expanded ALTREP columns (by happy accident) in CcopyNamedInList. That function now checks for ALTREP just in case (as well as MAYBE_REFERENCED as before) just to be safe.

Luke said that ALTREPs may in future be more than just sequence vectors; e.g. distributed arrays that cannot be expanded. But in that case, data.table will need code changes anyway to deal with such arrays. If and when that happens, the expansion will fail on such ALTREPs which is reasonable, graceful behaviour; much better than a subsequent gc race at least.

The above is long-term approach with no plans to change; i.e. data.table is unlikely to ever support ALTREP columns in a data.table.

What is short term though, is in all the parallel regions this PR will add checks that no SEXP being used inside the parallel region are ALTREPs (and fail if so). In R-devel, there's only a problem with INTEGER(), REAL() etc on ALTREP vectors. Those functions are still thread-safe on regular vectors, currently. This is a short term solution in the interests of getting an update to CRAN which is intermittently in error state on R-devel due to the gc race. In future we will still take all API use out of parallel regions as Luke suggested. That involves a new approach around the parallel regions which will take time to work through.

  • setDT() expands ALTREPs (data.table() already did)
  • tests added
  • news item
    Add checks before all parallel regions that no ALTREPs are present.
    All files with parallel regions :
  • subset.c

The following were moved to follow up PR #2899 : between.c, freadR.c, fwrite.c, fsort.c, reorder.c

@mattdowle mattdowle added this to the 1.11.4 milestone May 15, 2018
@mattdowle mattdowle changed the title DATAPTR gc() race in R-devel gc() race in R-devel May 15, 2018
@mattdowle mattdowle changed the title gc() race in R-devel gc() race in data.table with R-devel May 15, 2018
@codecov-io
Copy link

codecov-io commented May 15, 2018

Codecov Report

Merging #2882 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2882      +/-   ##
==========================================
+ Coverage   93.47%   93.47%   +<.01%     
==========================================
  Files          61       61              
  Lines       12355    12356       +1     
==========================================
+ Hits        11549    11550       +1     
  Misses        806      806
Impacted Files Coverage Δ
src/subset.c 97.35% <100%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update acca623...6ac5831. Read the comment docs.

@mattdowle mattdowle merged commit 47560d0 into master May 22, 2018
@mattdowle mattdowle deleted the dataptr branch May 22, 2018 21:02
@jangorecki
Copy link
Member

jangorecki commented May 23, 2018

memtest after this build from R devel Windows

i386

memtest_i386

x64

memtest_x64
looks it is fixed 🥇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants