Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature request: proper inplace argument (like in Pandas) #3682

Closed
randomgambit opened this issue Jul 5, 2019 · 14 comments
Closed

feature request: proper inplace argument (like in Pandas) #3682

randomgambit opened this issue Jul 5, 2019 · 14 comments

Comments

@randomgambit
Copy link

Hello there,

Its me again. Sorry for posting all these requests but I am super excited about DT and I feel just a few extra features are missing to make DT absolutely great.

Something that I find quite annoyting when using := is that it operates in place by default. This is super annoying because during data work I create new variables all the time (for testing/exploring the data). This does not mean I want these variables to be added to my dataframe!

I know I can do something like

annoyingfunc <- function(DT){
  DT <- copy(DT)
  DT[, mytempvar := 1][]
}

but this is really not efficient at all. Could it be possible to add some inplace = FALSE like in Pandas so that the output of
DT[, newvar := 'hello', inplace = FALSE][]

prints BUT my DT is not modified? What do you think?
Thanks!!

@MichaelChirico
Copy link
Member

Actually you could do that quite easily like:

iris = setDT(copy(iris))

iris[TRUE][ , temp_var := 1]
names(iris)
# [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species" 

I understand -- I believe you could also do iris[NULL] or iris[1:.N] instead.

Alternatively, you could add the new column in j without :=:

iris[ , c(.SD, temp_var = 1)][1]
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species temp_var
# 1:          5.1         3.5          1.4         0.2  setosa        1

@randomgambit
Copy link
Author

ah! crazy trick! This should added to the main DT page !
thanks!

@MichaelChirico
Copy link
Member

Sure thing! The above are all a bit hack-ish (besides the c(.SD, ...) approach), putting it in the FAQ/intro would mean committing to supporting & maintaining this API going forward.

So let's leave this issue open as the place to decide how we want to go about supporting affirmatively what's mentioned here.

@randomgambit
Copy link
Author

ah ok but do you mean the safest way to do so is with the c(.SD) trick?

@jangorecki
Copy link
Member

Safest IMO is c(.SD, list(var=val)), I recommend you to understand why the extra 'list'

@randomgambit
Copy link
Author

thanks! I actually dont undersdant why I need the extra list here. That was working without it?!

@moodymudskipper
Copy link

because you're concatenating a list (.SD) with a length one element here, if you has a length 2 element you would add 2 elements to your list, which might not be what you want. If you wrap it in list you're safe

@MichaelChirico
Copy link
Member

MichaelChirico commented Jul 5, 2019

Just that that is very canonical way to select all-existing-columns-plus-one-new-column & the return is a copy of the original (not updating the original)

Just saw Jan's comment with which I agree, I was just trying to be concise but personally I also would do the c(.SD, list(var=val)) approach

@randomgambit
Copy link
Author

oh I see. then if I have multiple columns I can just type c(.SD, list(var=val, michael = 'chirico')), right?

@MichaelChirico
Copy link
Member

yep, as well as defining nontrivial columns by dropping in a full-length vector

@shrektan
Copy link
Member

shrektan commented Jul 6, 2019

IMO, a shallow copy elegant solves this... but can anybody tell me the reason that shallow() is not ready to be exported?

@randomgambit
Copy link
Author

so to summarize, the best way to work with a copy without the need for the function I defined above (which is actually mentioned in the FAQ) is to use the powerful chirico[TRUE]chirico[do.some.stuff.without.consequences]

Thanks!!

@franknarf1
Copy link
Contributor

[ @shrektan ] IMO, a shallow copy elegant solves this... but can anybody tell me the reason that shallow() is not ready to be exported?

Fyi, #2323 links to SO where Matt's answer and comments might explain. It looks like he was thinking the more elegant way would be to change copy to act like shallow (so the latter can be removed and the user doesn't have to think about the distinction between deep and shallow copies).

@jangorecki
Copy link
Member

jangorecki commented Jul 30, 2019

Not sure but one of the reasons might be lack of reference counting mechanism in base R. Actually it is there, implemented, ready, but not enabled.
According to comments in this thread there is no need for new inplace argument, using c(.SD, ...) or (shallow) copy is a way to go, closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants