.SD is locked for DT[, DT2[.SD]] joins #1926

Closed
franknarf1 opened this issue Nov 23, 2016 · 11 comments · Fixed by #3774

@franknarf1
Contributor

franknarf1 commented Nov 23, 2016

Based on an SO question, I was trying something like...

library(data.table)
DT = data.table(id = 1:2, v = 3:4)
DT2 = data.table(id = 1, x = 5)
DT[id == 1, DT2[.SD, on="id"]]

Error in set(i, j = lc, value = newval) :
.SD is locked. Updating .SD by reference using := or set are reserved for future use. Use := in j directly. Or use copy(.SD) as a (slow) last resort, until shallow() is exported.

I wasn't expecting to see this error since I'm not using set.

Other SO posts to update when fixed:

@skanskan

In this case, does .SD refer to the outer data.table or to the inner one?

@sebastian-c

If you try DT[id == 1, DT2[dput(.SD), on="id"]]

You end up with

structure(list(id = 1L, v = 3L), .Names = c("id", "v"), class = c("data.table", 
"data.frame"), row.names = c(NA, -1L), .data.table.locked = TRUE)

This clearly comes from DT.

@skanskan

skanskan commented Dec 7, 2016

And what if I want to refer to DT2 instead?

@franknarf1
Contributor Author

@skanskan You cannot normally refer to .SD in i, so that would be a new FR.

@MichaelChirico
Member

I usually just do DT2[DT[id==1], on = "id"] in this case. But I think it is unexpected that .SD errors here.
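For readers landing here from SO, the two workarounds mentioned in this thread can be sketched side by side on the toy tables from the first post. This is only a minimal illustration, assuming those same `DT` and `DT2`; the result names `res_reordered` and `res_copied` are made up for the example.

```r
library(data.table)

# Toy tables from the first post: id is integer in DT and double in DT2.
DT  = data.table(id = 1:2, v = 3:4)
DT2 = data.table(id = 1, x = 5)

# Workaround 1: subset DT first, then pass the result as i of the join,
# so .SD is never involved.
res_reordered = DT2[DT[id == 1], on = "id"]

# Workaround 2: keep the nested form but join on a copy of .SD,
# as the error message itself suggests (slower, since it copies).
res_copied = DT[id == 1, DT2[copy(.SD), on = "id"]]
```

Both forms should return the single matching row with columns id, x and v. The reordered form avoids the copy, so it is probably the better default when the nesting is not needed.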

@franknarf1
Contributor Author

franknarf1 commented Apr 12, 2017

Weird new example:

library(data.table)
df1 = setDT(structure(list(Measurement = structure(c(3L, 1L, 2L, 4L), .Label = c("Breadth", 
"Height", "Length", "Width"), class = "factor"), When = structure(c(1491592742, 
1486735990, 1484325914, 1479090924), class = c("POSIXct", "POSIXt"
), tzone = "")), .Names = c("Measurement", "When"), class = "data.frame", row.names = c(NA, 
-4L)))
df2 = setDT(structure(list(Measurement = structure(c(3L, 3L, 3L, 3L, 1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L), .Label = c("Breadth", 
"Height", "Length", "Width"), class = "factor"), Datetime = structure(c(1491679142, 
1491765542, 1491769142, 1491851942, 1486822390, 1486908790, 1486995190, 
1487081590, 1487167990, 1484844314, 1484930714, 1485017114, 1485189914, 
1485535514, 1484325914, 1479004524, 1479177324, 1479436524, 1479609324, 
1479616524), class = c("POSIXct", "POSIXt"), tzone = ""), PassFail = structure(c(1L, 
1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 
2L, 1L, 2L), .Label = c("Fail", "Pass"), class = "factor")), .Names = c("Measurement", 
"Datetime", "PassFail"), row.names = c(NA, -20L), class = "data.frame"))

# this treats .SD as NULL? ... returns null data.table
df1[ 
  df2[.SD, on=.(Measurement, Datetime > When), 
    all(head(x.PassFail, 2) == "Fail")
  , by=.EACHI]$V1 
]

# errors with complaint about missing on=
df1[ 
  df2[.SD, on=.(Measurement, Datetime > When)]
]

# errors regarding .SD being locked as in my first post
df1[, df2[.SD, on=.(Measurement, Datetime > When), "bah", by=.EACHI]]

Data was borrowed from SO: http://stackoverflow.com/q/43373889/

Another example to update: http://stackoverflow.com/questions/43660562/find-matches-to-several-tables-conditional-full-join-using-data-table

Another example: https://stackoverflow.com/a/47818115/ should be able to write [.SD where it now is [mydf

One more example: https://stackoverflow.com/questions/48995398/sum-of-data-frames-rows-in-range-defined-by-columns#comment84996740_48995696

And https://stackoverflow.com/a/55029120/

One more: https://stackoverflow.com/questions/57167921/conditionally-merging-data-from-two-data-frames/57168406?noredirect=1#comment100887545_57168406

@MichaelChirico
Member

Was just about to post this on SO as a new Q:

    library(Lahman)
    library(data.table)
    
    Teams = as.data.table(Teams)
    Pitching = as.data.table(Pitching)
    
    Pitching[G > 5, rank_in_team := frank(ERA), by = .(teamID, yearID)]
    Pitching[rank_in_team == 1, team_performance := 
               Teams[.SD, Rank, on = .(teamID, yearID)]]

A fix that works is:

    Pitching[rank_in_team == 1, team_performance := 
               Teams[copy(.SD), Rank, on = .(teamID, yearID)]]
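The same pattern can be tried without installing Lahman. The sketch below uses two small made-up tables (`players` and `teams`, with hypothetical columns `team`, `year`, `best` and `Rank`) standing in for Pitching and Teams. It only illustrates the `copy(.SD)` form of the lookup and is not meant to reproduce the original error, which may depend on the column types involved.

```r
library(data.table)

# Hypothetical stand-ins for Pitching and Teams.
players = data.table(
  team = c("A", "A", "B"),
  year = c(2000L, 2000L, 2001L),
  best = c(TRUE, FALSE, TRUE)
)
teams = data.table(
  team = c("A", "B"),
  year = c(2000L, 2001L),
  Rank = c(1L, 3L)
)

# For the rows selected in i, look up Rank in teams by joining on a
# copy of .SD, then assign the looked-up values by reference.
players[best == TRUE,
        team_rank := teams[copy(.SD), Rank, on = .(team, year)]]
```

Rows not selected in `i` are left as NA in the new `team_rank` column.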

@MichaelChirico
Member

^ The above example is in the new .SD vignette. We should fix it there once this issue is resolved.

@MichaelChirico
Member

Hey @franknarf1 could you file the other seemingly broken examples as a separate issue? I have a fix for the .data.table.locked thing going but it doesn't address those other two cases

@franknarf1
Contributor Author

franknarf1 commented Aug 19, 2019

Thanks @MichaelChirico! -- I never noticed coercion was the common element, d'oh.

If you want to move ahead with what you have, maybe you could close this issue and I will look at the other examples soon? (It might be a week or so. Pretty busy atm, and I have not installed except from CRAN or the master binaries for Windows in a long while.)

@jangorecki
Member

jangorecki commented Aug 20, 2019

@franknarf1 just to let you know, data.table::update.dev.pkg() will install from Windows binaries if you are on R 3.6 (or R-devel). No extra arguments are needed, just data.table::update.dev.pkg().

@mattdowle mattdowle added this to the 1.12.4 milestone Aug 28, 2019