Added skip_absent() feature to setnames() #3111
.gitignore
Outdated
.DS_Store
.idea
*.sw[op]
inst/tests/winallquoted.csv.bz2
why are these being ignored? aren't they part of the test suite?
NEWS.md
Outdated
**If you are viewing this file on CRAN, please check [latest news on GitHub](https://github.com/Rdatatable/data.table/blob/master/NEWS.md) where the formatting is also better.**

### Changes in v1.11.9 (to be v1.12.0)

#### NEW FEATURES

1. `fread()` can now read a remote compressed file in one step; `fread("https://domain.org/file.csv.bz2")`. The `file=` argument now supports `.gz` and `.bz2` too; i.e. `fread(file="file.csv.gz")` works now where only `fread("file.csv.gz")` worked in 1.11.8.
1. In those cases where you need to rename columns in a `DT` but the columns aren't always known, `setnames()` now contains an additional argument (`skip_absent`) to skip them if they aren't present. For example, if you know that columns `a`, `b` and `d` are present in `DT`, but you don't know if column `c` is or isn't, then you can include `c` in `old` and if it isn't found, `setnames()` will simply skip to the next item of `old` rather than exit the function. **Note: The default behaviour of `setnames()` has not been altered as `skip_absent` is set to `FALSE` by default.** [#3030](https://github.com/Rdatatable/data.table/issues/3030)
Please add new NEWS items at the bottom.
R/data.table.R
Outdated
@@ -2532,7 +2531,7 @@ setnames <- function(x,old,new) {
if (!is.character(old)) stop("'old' is type ",typeof(old)," but should be integer, double or character")
if (any(duplicated(old))) stop("Some duplicates exist in 'old': ", paste(old[duplicated(old)],collapse=","))
i = chmatch(old,names(x))
if (anyNA(i)) stop("Items of 'old' not found in column names: ",paste(old[is.na(i)],collapse=","))
if (anyNA(i)){ if (skip_absent == TRUE){ w <- old %chin% names(x); old = old[w]; new = new[w]; i = i[w] } else { stop("Items of 'old' not found in column names: ",paste(old[is.na(i)],collapse=",")) } }
please split the branch logic onto new lines, it's a bit hard to read at the moment
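For illustration, this is one way the branch could read once it is split across lines. It is a minimal sketch, not the PR's final code: `resolve_old` is a hypothetical standalone helper, and base R's `match()` and `%in%` stand in for data.table's `chmatch()` and `%chin%` so that it runs without the package.

```r
# Hypothetical helper mirroring the branch under review; not data.table's actual code.
# match() and %in% are base R stand-ins for chmatch() and %chin%.
resolve_old <- function(col_names, old, new, skip_absent = FALSE) {
  i <- match(old, col_names)
  if (anyNA(i)) {
    if (isTRUE(skip_absent)) {
      # Keep only the entries of 'old' that exist, together with their paired 'new' names.
      w <- old %in% col_names
      old <- old[w]
      new <- new[w]
      i <- i[w]
    } else {
      stop("Items of 'old' not found in column names: ",
           paste(old[is.na(i)], collapse = ","))
    }
  }
  list(old = old, new = new, i = i)
}

res <- resolve_old(c("a", "b", "d"),
                   old = c("a", "b", "c", "d"),
                   new = c("A", "B", "C", "D"),
                   skip_absent = TRUE)
print(res$new)  # "A" "B" "D"
print(res$i)    # 1 2 3
```

With `skip_absent` left at its default, the same call stops with the original error message.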
inst/tests/tests.Rraw
Outdated
DT <- data.table(a = 1, b = 2, d = 3)
old <- c("a", "b", "c", "d")
new <- c("A", "B", "C", "D")
test(1953, setnames(DT, old, new, skip_absent = TRUE), DT <- data.table(A = 1, B = 2, D = 3))
please also test potential erroneous behavior/mis-use of skip_absent
agree need more tests please. `skip_absent` should only accept `TRUE` and `FALSE` and fail with error on anything else (use `stopifnot`). There should be a test when `TRUE` and all are missing, none are missing, two are missing, and when there are missings and the non-missings in old don't correspond to `names(DT)` in the same order. The `DT <-` in the y part of test 1953 can be removed too.
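The cases listed in this comment can be sketched as plain assertions. This is only an illustration under stated assumptions: `rename_cols` and `fresh` are hypothetical helpers operating on a base R `data.frame` (which is copied and returned, whereas `setnames()` renames by reference), and the checks use `stopifnot()` rather than the package's own `test()` function.

```r
# Hypothetical stand-in for setnames(); not data.table's implementation.
rename_cols <- function(x, old, new, skip_absent = FALSE) {
  # Accept exactly TRUE or FALSE; NA, "TRUE", 1 or a longer vector is an error.
  stopifnot(identical(skip_absent, TRUE) || identical(skip_absent, FALSE))
  i <- match(old, names(x))
  if (anyNA(i)) {
    if (!skip_absent) {
      stop("Items of 'old' not found in column names: ",
           paste(old[is.na(i)], collapse = ","))
    }
    # Drop absent entries, keeping each remaining 'new' paired with its position.
    new <- new[!is.na(i)]
    i <- i[!is.na(i)]
  }
  names(x)[i] <- new
  x
}

fresh <- function() data.frame(a = 1, b = 2, d = 3)
fails <- function(expr) inherits(try(expr, silent = TRUE), "try-error")

none_missing <- names(rename_cols(fresh(), c("a", "b", "d"), c("A", "B", "D"), skip_absent = TRUE))
two_missing  <- names(rename_cols(fresh(), c("a", "x", "y", "d"), c("A", "X", "Y", "D"), skip_absent = TRUE))
all_missing  <- names(rename_cols(fresh(), c("x", "y"), c("X", "Y"), skip_absent = TRUE))
# Present entries of 'old' given in a different order from names(x)
reordered    <- names(rename_cols(fresh(), c("d", "c", "a"), c("D", "C", "A"), skip_absent = TRUE))

# Misuse: a non-TRUE/FALSE flag, and a missing column with the default flag
bad_flag       <- fails(rename_cols(fresh(), "a", "A", skip_absent = NA))
default_errors <- fails(rename_cols(fresh(), c("a", "c"), c("A", "C")))
```

Building a fresh table for each case keeps the cases independent, which matters more for the real `setnames()` because it modifies its input in place.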
Thanks for the PR!
Codecov Report
@@ Coverage Diff @@
## master #3111 +/- ##
==========================================
+ Coverage 91.95% 91.96% +<.01%
==========================================
Files 61 61
Lines 11437 11444 +7
==========================================
+ Hits 10517 10524 +7
Misses 920 920
Could you remove the new |
R/data.table.R
Outdated
@@ -2532,7 +2531,13 @@ setnames <- function(x,old,new) {
if (!is.character(old)) stop("'old' is type ",typeof(old)," but should be integer, double or character")
if (any(duplicated(old))) stop("Some duplicates exist in 'old': ", paste(old[duplicated(old)],collapse=","))
i = chmatch(old,names(x))
if (anyNA(i)) stop("Items of 'old' not found in column names: ",paste(old[is.na(i)],collapse=","))
if (anyNA(i)){ if (skip_absent == TRUE){
either `if (skip_absent)` or (generally more preferred) `if (isTRUE(skip_absent))`
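A short base R sketch of why `isTRUE()` tends to be the safer condition; the variable names are illustrative only and nothing here depends on data.table.

```r
# isTRUE() is TRUE only for a single, non-NA logical TRUE.
on_true   <- isTRUE(TRUE)     # TRUE
on_na     <- isTRUE(NA)       # FALSE: an NA flag is simply "not TRUE"
on_string <- isTRUE("TRUE")   # FALSE: a string is not a logical

# == coerces its operands, so the string "TRUE" compares equal to TRUE.
string_equals_true <- "TRUE" == TRUE   # TRUE

# With an NA flag, both if (x == TRUE) and if (x) stop with
# "missing value where TRUE/FALSE needed".
na_eq_fails   <- inherits(try(if (NA == TRUE) 1, silent = TRUE), "try-error")
na_bare_fails <- inherits(try(if (NA) 1, silent = TRUE), "try-error")
```

Note that `isTRUE()` quietly treats invalid input as `FALSE`, so it complements rather than replaces an explicit check that the argument is `TRUE` or `FALSE`.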
# Conflicts: # .gitignore # R/data.table.R
DESCRIPTION
Outdated
@@ -12,7 +12,8 @@ Authors@R: c(
person("Eduard","Antonyan", role="ctb"),
person("Markus","Bonsch", role="ctb"),
person("Hugh","Parsonage", role="ctb"),
person("Scott","Ritchie", role="ctb"))
person("Scott","Ritchie", role="ctb"),
person("Mus","Yaramaz-David", role="ctb"))
People don't normally add themselves here. There are 53 contributors who have done something, no matter how small: https://github.com/Rdatatable/data.table/graphs/contributors. You would be listed there, and in NEWS. When a contributor has made sustained contributions, or a few big ones, I add them to DESCRIPTION which is displayed on CRAN. The contributors listed in DESCRIPTION are roughly listed at the top of https://github.com/Rdatatable/data.table/graphs/contributors.
At least, this is the way this has been done to date. We could change so that every single contributor is listed in DESCRIPTION if people want to.
I think that something midway between the two would be ideal.
For example, those who make small, non-functional changes (such as correcting typos) wouldn't be included here but those who extend the functionality of data.table in some meaningful and/or useful way should be included.
That way it would prevent this file from becoming too large, whilst still retaining its relevance and meaning.
I think the author field should be limited to those who have contributed code that the authors would not have ordinarily come up with. This PR is an enhancement but its implementation is quite elementary so should not qualify.
NEWS.md
Outdated
4. In those cases where you need to rename columns in a `DT` but the columns aren't always known, `setnames()` now contains an additional argument (`skip_absent`) to skip them if they aren't present. For example, if you know that columns `a`, `b` and `d` are present in `DT`, but you don't know if column `c` is or isn't, then you can include `c` in `old` and if it isn't found, `setnames()` will simply skip to the next item of `old` rather than exit the function. **Note: The default behaviour of `setnames()` has not been altered as `skip_absent` is set to `FALSE` by default.** [#3030](https://github.com/Rdatatable/data.table/issues/3030)

3. In those cases where you need to rename columns in a `DT` but the columns aren't always known, `setnames()` now contains an additional argument (`skip_absent`) to skip them if they aren't present. For example, if you know that columns `a`, `b` and `d` are present in `DT`, but you don't know if column `c` is or isn't, then you can include `c` in `old` and if it isn't found, `setnames()` will simply skip to the next item of `old` rather than exit the function. **Note: The default behaviour of `setnames()` has not been altered as `skip_absent` is set to `FALSE` by default.** [#3030](https://github.com/Rdatatable/data.table/issues/3030)
Looks great. Just a few minor tweaks and then it can be merged (see comments inline). Thanks!
That's right, I'm running Windows 10. I have updated the file now and will do this for any other affected files also. For those who are interested, the way I fixed the problem was to change the setting of my code editor (VS Code). It is set to |
R/data.table.R
Outdated
@@ -18,7 +17,7 @@ setPackageName("data.table",.global)

.SD = .N = .I = .GRP = .BY = .EACHI = NULL
# These are exported to prevent NOTEs from R CMD check, and checkUsage via compiler.
# But also exporting them makes it clear (to users and other packages) that data.table uses these as symbols.
# But also exporting them makes it clear (to users and other packages) that data.table uses these as symbols.`
Please delete.
Minor remark, in
|
Closes #3030

Added the `skip_absent` feature to `setnames()` so that elements of `old` that are not present in the column names won't cause the function to exit but will instead be skipped, moving on to the next element. `skip_absent` is set to `FALSE` by default and thus will not affect the default behaviour of `setnames()`.

When building this feature, I became aware of some issues when building the package on a Windows 10 machine that related to `.dll` file conflicts, and as such updated `.gitignore` and `.Rbuildignore`, which rectified the problem:
- `*.dll` to `.gitignore`
- `^.*\.dll$` to `.Rbuildignore`

I initially intended to add the `isTRUE(skip_absent)` check as a standalone element within `setnames()` but found that it kept failing due to `i` being `NA`, which subsequently halted the `setnames()` function.

From there it was clear that the solution was to incorporate the `isTRUE(skip_absent)` check within the `if (anyNA(i)) { ... }` check that occurs, so that it is called when `i` is `NA` (i.e. absent). This in turn indicates that said element is not present, which logically implies that this is the correct place to check for any missing elements of `old` and to skip accordingly.

I would also like to thank @mattdowle for his assistance throughout this process.