Added skip_absent() feature to setnames() #3111

Merged
merged 27 commits into from
Nov 14, 2018
Changes from 9 commits
baaf311
First commit. Code builds and checks correctly. Needs to be debugged.
M-YD Oct 3, 2018
12f5576
Added skip_absent() feature to setnames().
M-YD Oct 17, 2018
ece7dd4
Removed test file and updated description in NEWS.md
M-YD Oct 17, 2018
f16e15f
Merge branch 'master' into master
M-YD Oct 17, 2018
8b3139b
Moved new item to bottom of list.
M-YD Oct 17, 2018
e213af8
Split branch logic onto new lines.
M-YD Oct 17, 2018
fa11f45
Added newline.
M-YD Oct 17, 2018
bb75fe4
Removed files that were unwittingly ignored.
M-YD Oct 17, 2018
f7a1915
Moved new line onto newline.
M-YD Oct 17, 2018
364e843
Removed as per @MichaelChirico's request.
M-YD Oct 17, 2018
19dbcfd
Added .dll to .gitignore as this caused problems on my Windows 10 mac…
M-YD Oct 17, 2018
21bb14c
Added ^.*\.dll$ to .Rbuildignore as this caused problems on my Window…
M-YD Oct 17, 2018
4525d63
Added details about new feature (skip_absent()) to NEWS.
M-YD Oct 17, 2018
74e98ca
Added skip_absent() feature to setnames().
M-YD Oct 17, 2018
ad52224
Merge branch 'master' of https://github.com/MusTheDataGuy/data.table
M-YD Oct 17, 2018
c30963d
Expanded logical flow of conditional statement to improve readability.
M-YD Oct 17, 2018
bfa1db0
Update man/setattr.Rd
mattdowle Oct 18, 2018
543b9c7
Removed duplicate news item. Added thanks to @MusTheDataGuy.
M-YD Oct 18, 2018
ea36552
Merge branch 'master' of https://github.com/MusTheDataGuy/data.table
M-YD Oct 18, 2018
21d12a3
Cleaned up merge conflicts to bring code to latest version.
M-YD Oct 18, 2018
fee02b6
Update man/setattr.Rd
mattdowle Oct 18, 2018
c5debe7
Removed backtick character.
M-YD Oct 24, 2018
bf140dc
Merge branch 'master' into master
mattdowle Nov 13, 2018
13e857a
Update DESCRIPTION
mattdowle Nov 14, 2018
f491437
Update NEWS.md
mattdowle Nov 14, 2018
66f311f
indentation
mattdowle Nov 14, 2018
bf8db81
Added suggested tests
mattdowle Nov 14, 2018
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -21,3 +21,4 @@
^bus$
^Dockerfile$
^Dockerfile\.in$
+^.*\.dll$
63 changes: 32 additions & 31 deletions .gitignore
@@ -1,31 +1,32 @@
-# Source: https://github.com/github/gitignore/blob/master/R.gitignore
-# History files
-.RData
-.Rhistory
-.Rapp.history
-
-# Package build process
-*-Ex.R
-data.table_*.tar.gz
-data.table.Rcheck
-
-# Emacs IDE files
-.emacs.desktop
-.emacs.desktop.lock
-
-# RStudio IDE files
-.Rproj.user
-data.table.Rproj
-
-# produced vignettes
-vignettes/*.html
-vignettes/*.pdf
-
-# object and shared objects
-*.o
-*.so
-
-*~
-.DS_Store
-.idea
-*.sw[op]
+# Source: https://github.com/github/gitignore/blob/master/R.gitignore
+# History files
+.RData
+.Rhistory
+.Rapp.history
+
+# Package build process
+*-Ex.R
+data.table_*.tar.gz
+data.table.Rcheck
+
+# Emacs IDE files
+.emacs.desktop
+.emacs.desktop.lock
+
+# RStudio IDE files
+.Rproj.user
+data.table.Rproj
+
+# produced vignettes
+vignettes/*.html
+vignettes/*.pdf
+
+# object and shared objects
+*.o
+*.so
+*.dll
+
+*~
+.DS_Store
+.idea
+*.sw[op]
3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -12,7 +12,8 @@ Authors@R: c(
person("Eduard","Antonyan", role="ctb"),
person("Markus","Bonsch", role="ctb"),
person("Hugh","Parsonage", role="ctb"),
-person("Scott","Ritchie", role="ctb"))
+person("Scott","Ritchie", role="ctb"),
+person("Mus","Yaramaz-David", role="ctb"))
Member

People don't normally add themselves here. There are 53 contributors who have done something, no matter how small: https://github.com/Rdatatable/data.table/graphs/contributors. You would be listed there, and in NEWS. When a contributor has made sustained contributions, or a few big ones, I add them to DESCRIPTION, which is displayed on CRAN. The contributors listed in DESCRIPTION are roughly those at the top of https://github.com/Rdatatable/data.table/graphs/contributors.
At least, this is how it has been done to date. We could change this so that every single contributor is listed in DESCRIPTION if people want to.

Contributor Author

M-YD Oct 18, 2018

I think that something midway between the two would be ideal.

For example, those who make small, non-functional changes (such as correcting typos) wouldn't be included here but those who extend the functionality of data.table in some meaningful and/or useful way should be included.

That way it would prevent this file from becoming too large, whilst still retaining its relevance and meaning.

Member

I think the author field should be limited to those who have contributed code that the authors would not have ordinarily come up with. This PR is an enhancement but its implementation is quite elementary so should not qualify.

Member

I'll remove this addition for now in this PR but have created #3144 to discuss further. If #3144 goes ahead, Mus would be added then along with other names. Trying to be utterly fair to everyone, past and present.

Depends: R (>= 3.1.0)
Imports: methods
Suggests: bit64, curl, R.utils, knitr, xts, nanotime, zoo
535 changes: 535 additions & 0 deletions NEWS.html

Large diffs are not rendered by default.

7 changes: 3 additions & 4 deletions NEWS.md
@@ -1,4 +1,3 @@
-
**If you are viewing this file on CRAN, please check [latest news on GitHub](https://github.com/Rdatatable/data.table/blob/master/NEWS.md) where the formatting is also better.**

### Changes in v1.11.9 (to be v1.12.0)
@@ -7,7 +6,9 @@

1. `fread()` can now read a remote compressed file in one step; `fread("https://domain.org/file.csv.bz2")`. The `file=` argument now supports `.gz` and `.bz2` too; i.e. `fread(file="file.csv.gz")` works now where only `fread("file.csv.gz")` worked in 1.11.8.

-2. `nomatch=NULL` now does the same as `nomatch=0L`; i.e. discards missing values silently (inner join). The default is still `nomatch=NA` (outer join) for statistical safety so that missing values are retained by default. You have to explicitly write `nomatch=NULL` to indicate to the reader of your code that you intend to discard missing values silently. After several years have elapsed, we will start to deprecate `0L`; please start using `NULL`. TO DO ... `nomatch=.(0)` fills with `0` instead of `NA`, [#857](https://github.com/Rdatatable/data.table/issues/857) and `nomatch="error"`.
+3. `nomatch=NULL` now does the same as `nomatch=0L`; i.e. discards missing values silently (inner join). The default is still `nomatch=NA` (outer join) for statistical safety so that missing values are retained by default. You have to explicitly write `nomatch=NULL` to indicate to the reader of your code that you intend to discard missing values silently. After several years have elapsed, we will start to deprecate `0L`; please start using `NULL`. TO DO ... `nomatch=.(0)` fills with `0` instead of `NA`, [#857](https://github.com/Rdatatable/data.table/issues/857) and `nomatch="error"`.
+
+4. In those cases where you need to rename columns in a `DT` but the columns aren't always known, `setnames()` now contains an additional argument (`skip_absent`) to skip them if they aren't present. For example, if you know that columns `a`, `b` and `d` are present in `DT`, but you don't know if column `c` is or isn't, then you can include `c` in `old` and if it isn't found, `setnames()` will simply skip to the next item of `old` rather than exit the function. **Note: The default behaviour of `setnames()` has not been altered as `skip_absent` is set to `FALSE` by default.** [#3030](https://github.com/Rdatatable/data.table/issues/3030)

#### BUG FIXES

@@ -540,5 +541,3 @@ When `j` is a symbol (as in the quanteda and xgboost examples above) it will con


### Old news from v1.9.8 (Nov 2016) back to v1.2 (Aug 2008) has been moved to [NEWS.0.md](https://github.com/Rdatatable/data.table/blob/master/NEWS.0.md)
-
-
12 changes: 9 additions & 3 deletions R/data.table.R
@@ -1,4 +1,3 @@
-
dim.data.table <- function(x)
{
.Call(Cdim, x)
@@ -2497,7 +2496,7 @@ setattr <- function(x,name,value) {
invisible(x)
}

-setnames <- function(x,old,new) {
+setnames <- function(x,old,new,skip_absent=FALSE) {
# Sets by reference, maintains truelength, no copy of table at all.
# But also more convenient than names(DT)[i]="newname" because we can also do setnames(DT,"oldname","newname")
# without an onerous match() ourselves. old can be positions, too, but we encourage by name for robustness.
@@ -2532,7 +2531,14 @@ setnames <- function(x,old,new) {
if (!is.character(old)) stop("'old' is type ",typeof(old)," but should be integer, double or character")
if (any(duplicated(old))) stop("Some duplicates exist in 'old': ", paste(old[duplicated(old)],collapse=","))
i = chmatch(old,names(x))
-if (anyNA(i)) stop("Items of 'old' not found in column names: ",paste(old[is.na(i)],collapse=","))
+if (anyNA(i)){ if (skip_absent == TRUE){
Member

Use either `if (skip_absent)` or (generally preferred) `if (isTRUE(skip_absent))`.

+w <- old %chin% names(x)
M-YD marked this conversation as resolved.
+old = old[w]
+new = new[w]
+i = i[w]
+} else {
+stop("Items of 'old' not found in column names: ",paste(old[is.na(i)],collapse=",")) }
+}
if (any(tt<-!is.na(chmatch(old,names(x)[-i])))) stop("Some items of 'old' are duplicated (ambiguous) in column names: ",paste(old[tt],collapse=","))
}
if (length(new)!=length(i)) stop("'old' is length ",length(i)," but 'new' is length ",length(new))
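
The review comment on the `skip_absent == TRUE` condition can be illustrated with a minimal base R sketch. The helper names `flag_eq` and `flag_true` are hypothetical and exist only for this illustration; they are not part of data.table. The sketch shows how the two forms differ once the flag is something other than a plain `TRUE` or `FALSE`.

```r
# Hypothetical helpers for illustration; neither is a data.table function.
flag_eq   <- function(flag) if (flag == TRUE) "skip" else "stop"   # form used in the diff
flag_true <- function(flag) if (isTRUE(flag)) "skip" else "stop"   # form suggested in review

# Both forms agree on a plain TRUE or FALSE.
stopifnot(flag_eq(TRUE) == "skip", flag_true(TRUE) == "skip")
stopifnot(flag_eq(FALSE) == "stop", flag_true(FALSE) == "stop")

# NA == TRUE evaluates to NA, so if() raises an error; isTRUE(NA) is simply FALSE.
na_eq <- tryCatch(flag_eq(NA), error = function(e) "error")
stopifnot(na_eq == "error", flag_true(NA) == "stop")

# The string "TRUE" compares equal to TRUE after coercion; isTRUE() rejects it.
stopifnot(flag_eq("TRUE") == "skip", flag_true("TRUE") == "stop")
```

With `isTRUE()`, anything other than a single non-missing logical `TRUE` falls through to the existing `stop()` branch, so the default behaviour is preserved for malformed input.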
5 changes: 5 additions & 0 deletions inst/tests/tests.Rraw
@@ -12328,6 +12328,11 @@ test(1951.4, d1[d2, nomatch=3], error="nomatch= must be either NA or NULL .or 0
test(1952.1, d1[a==2, which=3], error="which= must be a logical vector length 1. Either FALSE, TRUE or NA.")
test(1952.2, d1[a==2, 2, which=TRUE], error="which==TRUE.*but j is also supplied")

+# skip values that are not present in old, #3030
+DT <- data.table(a = 1, b = 2, d = 3)
+old <- c("a", "b", "c", "d")
+new <- c("A", "B", "C", "D")
+test(1953, setnames(DT, old, new, skip_absent = TRUE), DT <- data.table(A = 1, B = 2, D = 3))
Member

Please also test potential erroneous behavior or misuse of `skip_absent`.

Member

mattdowle Oct 17, 2018

Agreed, more tests are needed please. `skip_absent` should accept only `TRUE` and `FALSE` and fail with an error on anything else (use `stopifnot`). There should be tests for `skip_absent=TRUE` when all items of `old` are missing, when none are missing, when two are missing, and when some are missing and the non-missing items in `old` don't appear in the same order as in `names(DT)`. The `DT <-` in the `y` part of test 1953 can be removed too.
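
The cases requested above could be sketched roughly as follows. This is an illustration in plain R with `stopifnot()`, not the tests that were eventually committed, and it does not use the `test()` harness from `tests.Rraw`. It assumes a data.table release that includes `skip_absent` (v1.12.0 or later); the helper `fresh()` is hypothetical.

```r
library(data.table)

# Hypothetical helper: a fresh three-column table for each case.
fresh <- function() data.table(a = 1, b = 2, d = 3)

# None of 'old' is missing: all columns are renamed.
DT <- fresh()
setnames(DT, c("a", "b", "d"), c("A", "B", "D"), skip_absent = TRUE)
stopifnot(identical(names(DT), c("A", "B", "D")))

# All of 'old' is missing: names are left unchanged.
DT <- fresh()
setnames(DT, c("x", "y"), c("X", "Y"), skip_absent = TRUE)
stopifnot(identical(names(DT), c("a", "b", "d")))

# Two items of 'old' are missing: only the present ones are renamed.
DT <- fresh()
setnames(DT, c("a", "c", "e", "d"), c("A", "C", "E", "D"), skip_absent = TRUE)
stopifnot(identical(names(DT), c("A", "b", "D")))

# Missing items plus a different order from names(DT).
DT <- fresh()
setnames(DT, c("d", "c", "a"), c("D", "C", "A"), skip_absent = TRUE)
stopifnot(identical(names(DT), c("A", "b", "D")))

# Default behaviour (skip_absent = FALSE) still raises an error.
DT <- fresh()
res <- tryCatch(setnames(DT, "c", "C"), error = function(e) "error")
stopifnot(identical(res, "error"))

# A non-logical flag is expected to be rejected if the argument is validated.
bad <- tryCatch(setnames(DT, "a", "A", skip_absent = "yes"), error = function(e) e)
print(inherits(bad, "error"))
```

Each case rebuilds the table because `setnames()` modifies its input by reference.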


###################################
# Add new tests above this line #
12 changes: 10 additions & 2 deletions man/setattr.Rd
@@ -7,15 +7,17 @@
}
\usage{
setattr(x,name,value)
-setnames(x,old,new)
+setnames(x,old,new,skip_absent=FALSE)
}
\arguments{
\item{x}{ \code{setnames} accepts \code{data.frame} and \code{data.table}. \code{setattr} accepts any input; e.g, list, columns of a \code{data.frame} or \code{data.table}. }
\item{name}{ The character attribute name. }
\item{value}{ The value to assign to the attribute or \code{NULL} removes the attribute, if present. }
\item{old}{ When \code{new} is provided, character names or numeric positions of column names to change. When \code{new} is not provided, the new column names, which must be the same length as the number of columns. See examples. }
\item{new}{ Optional. New column names, must be the same length as columns provided to \code{old} argument. }
+\item{skip_absent}{ Skips cases where there is no match in \code{old}. Set to \code{FALSE} by default. Switch flag to \code{TRUE} to activate. }
M-YD marked this conversation as resolved.
}
+
\details{

\code{setnames} operates on \code{data.table} and \code{data.frame} not other types like \code{list} and \code{vector}. It can be used to change names \emph{by name} with built-in checks and warnings (e.g., if any old names are missing or appear more than once).
@@ -34,6 +36,13 @@ setnames(x,old,new)
}
\examples{

+DT <- data.table(a = 1, b = 2, d = 3)
+
+old <- c("a", "b", "c", "d")
+new <- c("A", "B", "C", "D")
+
+setnames(DT, old, new, skip_absent = TRUE) # check for column names in old and skip if item is absent
M-YD marked this conversation as resolved.
+
DF = data.frame(a=1:2,b=3:4) # base data.frame to demo copies and syntax
if (capabilities()["profmem"]) # usually memory profiling is available but just in case
tracemem(DF)
@@ -70,4 +79,3 @@ attr(DT,"myFlag2") # NULL

}
\keyword{ data }