Added `skip_absent()` feature to `setnames()` #3111

M-YD · 2018-10-17T10:09:45Z

Added skip_absent() feature to setnames() so that elements not present in old won't cause the function to exit but will instead skip onto the next element.

skip_absent() is set to FALSE by default and thus will not affect the default behaviour of setnames().

When building this element, I became aware of some issues when building the package on a Windows 10 machine that related to .dll file conflicts, and as such updated .gitignore and .Rbuildignore which rectified the problem:

Added *.dll to .gitignore
Added ^.*\.dll$ to .Rbuildignore

I initially intended to add the isTrue(skip_absent) check as a standalone element within setnames() but found that it kept failing due to i being NA, which subsequently halted the setnames() function.

From there it was clear that the solution was to incorporate the isTrue(skip_absent) check within the if(anyNA(i)) { ... } check that occurs so that it is called when i is NA (i.e. absent). This in turn indicates that said element is not present, which logically implies that this is the correct place to check for any missing elements of old and to skip accordingly.

I would also like to thank @mattdowle for his assistance throughout this process.

MichaelChirico · 2018-10-17T12:26:12Z

.gitignore

+.DS_Store
+.idea
+*.sw[op]
+inst/tests/winallquoted.csv.bz2


why are these being ignored? aren't they part of the test suite?

MichaelChirico · 2018-10-17T12:28:05Z

NEWS.md

 **If you are viewing this file on CRAN, please check [latest news on GitHub](https://github.com/Rdatatable/data.table/blob/master/NEWS.md) where the formatting is also better.**

 ### Changes in v1.11.9 (to be v1.12.0)

 #### NEW FEATURES

-1. `fread()` can now read a remote compressed file in one step; `fread("https://domain.org/file.csv.bz2")`. The `file=` argument now supports `.gz` and `.bz2` too; i.e. `fread(file="file.csv.gz")` works now where only `fread("file.csv.gz")` worked in 1.11.8.
+1. In those cases where you need to rename columns in a `DT` but the columns aren't always known, `setnames()` now contains an additional argument (`skip_absent`) to skip them if they aren't present.  For example, if you know that columns `a`, `b` and `d` are present in `DT`, but you don't know if column `c` is or isn't, then you can include `c` in `old` and if it isn't found, `setnames()` will simply skip to the next item of `old` rather than exit the function.  **Note: The default behaviour of `setnames()` has not been altered as `skip_absent` is set to `FALSE` by default.** [#3030](https://github.com/Rdatatable/data.table/issues/3030)


Please add new NEWS items at the bottom

MichaelChirico · 2018-10-17T12:28:52Z

R/data.table.R

@@ -2532,7 +2531,7 @@ setnames <- function(x,old,new) {
      if (!is.character(old)) stop("'old' is type ",typeof(old)," but should be integer, double or character")
      if (any(duplicated(old))) stop("Some duplicates exist in 'old': ", paste(old[duplicated(old)],collapse=","))
      i = chmatch(old,names(x))
-      if (anyNA(i)) stop("Items of 'old' not found in column names: ",paste(old[is.na(i)],collapse=","))
+      if (anyNA(i)){ if (skip_absent == TRUE){ w <- old %chin% names(x); old = old[w]; new = new[w]; i = i[w] } else { stop("Items of 'old' not found in column names: ",paste(old[is.na(i)],collapse=",")) } }


please split the branch logic onto new lines, it's a bit hard to read at the moment

MichaelChirico · 2018-10-17T12:29:20Z

inst/tests/tests.Rraw

+DT <- data.table(a = 1, b = 2, d = 3)
+old <- c("a", "b", "c", "d")
+new <- c("A", "B", "C", "D") 
+test(1953, setnames(DT, old, new, skip_absent = TRUE), DT <- data.table(A = 1, B = 2, D = 3))


please also test potential erroneous behavior/mis-use of skip_absent

agree need more tests please. skip_absent should only accept TRUE and FALSE and fail with error on anything else (use stopifnot). There should be a test when TRUE and all are missing, none are missing, two are missing, and when there are missings and the non-missings in old don't correspond to names(DT) in the same order. The DT <- in the y part of test 1953 can be removed too.

MichaelChirico · 2018-10-17T12:29:34Z

Thanks for the PR!

codecov · 2018-10-17T12:35:26Z

Codecov Report

Merging #3111 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #3111      +/-   ##
==========================================
+ Coverage   91.95%   91.96%   +<.01%     
==========================================
  Files          61       61              
  Lines       11437    11444       +7     
==========================================
+ Hits        10517    10524       +7     
  Misses        920      920

Impacted Files	Coverage Δ
R/data.table.R	`92.62% <100%> (+0.02%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9935f2d...bf8db81. Read the comment docs.

Improved readability.

MichaelChirico · 2018-10-17T12:56:37Z

I'm seeing a big whitespace diff on .gitignore... you're on Windows right? I suspect your editor is inserting \r\n newlines instead of the standard \n... could you try and confirm and, if so, stick to \n? 😅

MichaelChirico · 2018-10-17T12:57:57Z

Could you remove the new NEWS.html?

MichaelChirico · 2018-10-17T12:59:21Z

R/data.table.R

@@ -2532,7 +2531,13 @@ setnames <- function(x,old,new) {
      if (!is.character(old)) stop("'old' is type ",typeof(old)," but should be integer, double or character")
      if (any(duplicated(old))) stop("Some duplicates exist in 'old': ", paste(old[duplicated(old)],collapse=","))
      i = chmatch(old,names(x))
-      if (anyNA(i)) stop("Items of 'old' not found in column names: ",paste(old[is.na(i)],collapse=","))
+      if (anyNA(i)){ if (skip_absent == TRUE){


either if (skip_absent) or (generally more preferred) if (isTRUE(skip_absent))

R/data.table.R

…hine.

…s 10 machine.

# Conflicts: # .gitignore # R/data.table.R

mattdowle · 2018-10-17T22:01:00Z

inst/tests/tests.Rraw

+DT <- data.table(a = 1, b = 2, d = 3)
+old <- c("a", "b", "c", "d")
+new <- c("A", "B", "C", "D") 
+test(1953, setnames(DT, old, new, skip_absent = TRUE), DT <- data.table(A = 1, B = 2, D = 3))


agree need more tests please. skip_absent should only accept TRUE and FALSE and fail with error on anything else (use stopifnot). There should be a test when TRUE and all are missing, none are missing, two are missing, and when there are missings and the non-missings in old don't correspond to names(DT) in the same order. The DT <- in the y part of test 1953 can be removed too.

man/setattr.Rd

mattdowle · 2018-10-17T22:39:52Z

DESCRIPTION

@@ -12,7 +12,8 @@ Authors@R: c(
  person("Eduard","Antonyan", role="ctb"),
  person("Markus","Bonsch",   role="ctb"),
  person("Hugh","Parsonage",  role="ctb"),
-  person("Scott","Ritchie",   role="ctb"))
+  person("Scott","Ritchie",   role="ctb"),
+  person("Mus","Yaramaz-David", role="ctb"))


People don't normally add themselves here. There are 53 contributors who have done something, no matter how small: https://github.com/Rdatatable/data.table/graphs/contributors. You would be listed there, and in NEWS. When a contributor has made sustained contributions, or a few big ones, I add them to DESCRIPTION which is displayed on CRAN. The contributors listed in DESCRIPTION are roughly listed at the top of https://github.com/Rdatatable/data.table/graphs/contributors.
At least, this is the way this has been done to date. We could change so that every single contributor is listed in DESCRIPTION if people want to.

I think that something midway between the two would be ideal.

For example, those who make small, non-functional changes (such as correcting typos) wouldn't be included here but those who extend the functionality of data.table in some meaningful and/or useful way should be included.

That way it would prevent this file from becoming too large, whilst still retaining its relevance and meaning.

I think the author field should be limited to those who have contributed code that the authors would not have ordinarily come up with. This PR is an enhancement but its implementation is quite elementary so should not qualify.

I'll remove this addition for now in this PR but have created #3144 to discuss further. If #3144 goes ahead, Mus would be added then along with other names. Trying to be utterly fair to everyone, past and present.

mattdowle · 2018-10-17T22:50:26Z

NEWS.md

+
+4. In those cases where you need to rename columns in a `DT` but the columns aren't always known, `setnames()` now contains an additional argument (`skip_absent`) to skip them if they aren't present.  For example, if you know that columns `a`, `b` and `d` are present in `DT`, but you don't know if column `c` is or isn't, then you can include `c` in `old` and if it isn't found, `setnames()` will simply skip to the next item of `old` rather than exit the function.  **Note: The default behaviour of `setnames()` has not been altered as `skip_absent` is set to `FALSE` by default.** [#3030](https://github.com/Rdatatable/data.table/issues/3030)
+
+3. In those cases where you need to rename columns in a `DT` but the columns aren't always known, `setnames()` now contains an additional argument (`skip_absent`) to skip them if they aren't present.  For example, if you know that columns `a`, `b` and `d` are present in `DT`, but you don't know if column `c` is or isn't, then you can include `c` in `old` and if it isn't found, `setnames()` will simply skip to the next item of `old` rather than exit the function.  **Note: The default behaviour of `setnames()` has not been altered as `skip_absent` is set to `FALSE` by default.** [#3030](https://github.com/Rdatatable/data.table/issues/3030)


Looks like a duplicate new item here. Please also add "Thanks to @MusTheDataGuy for the suggestion (#3030) and the PR (#3111)." (i.e. item 2 in Contributing guide mentions this.)

mattdowle · 2018-10-17T22:52:03Z

Looks great. Just a few minor tweaks and then it can be merged (see comments inline). Thanks!

M-YD · 2018-10-18T10:27:59Z

I'm seeing a big whitespace diff on .gitignore... you're on Windows right? I suspect your editor is inserting \r\n newlines instead of the standard \n... could you try and confirm and, if so, stick to \n? 😅

That's right, I'm running Windows 10. I have updated the file now and will do this for any other affected files also.

For those who are interested, the way I fixed the problem was to change the setting of my code editor (VS Code). It is set to CRLF by default; I have switched it to LF and the problem is now fixed.

HughParsonage · 2018-10-20T12:50:56Z

R/data.table.R

@@ -18,7 +17,7 @@ setPackageName("data.table",.global)

 .SD = .N = .I = .GRP = .BY = .EACHI = NULL
 # These are exported to prevent NOTEs from R CMD check, and checkUsage via compiler.
-# But also exporting them makes it clear (to users and other packages) that data.table uses these as symbols.
+# But also exporting them makes it clear (to users and other packages) that data.table uses these as symbols.`


Please delete.

Atrebas · 2019-04-05T06:07:50Z

Minor remark, in ?setnames, the comments on the lines below are inaccurate:

setnames(DT,"b","B")               # by name, no match() needed (warning if "b" is missing)
setnames(DT,c("a","E"),c("A","F")) # multiple by name (warning if either "a" or "E" is missing)

M-YD added 4 commits October 3, 2018 09:29

First commit. Code builds and checks correctly. Needs to be debugged.

baaf311

Added skip_absent() feature to setnames().

12f5576

Removed test file and updated description in NEWS.md

ece7dd4

Merge branch 'master' into master

f16e15f

MichaelChirico reviewed Oct 17, 2018

View reviewed changes

M-YD added 4 commits October 17, 2018 13:43

Moved new item to bottom of list.

8b3139b

Split branch logic onto new lines.

e213af8

Improved readability.

Added newline.

fa11f45

Removed files that were unwittingly ignored.

bb75fe4

Moved new line onto newline.

f7a1915

MichaelChirico reviewed Oct 17, 2018

View reviewed changes

R/data.table.R Outdated Show resolved Hide resolved

M-YD added 7 commits October 17, 2018 14:29

Removed as per @MichaelChirico's request.

364e843

Added .dll to .gitignore as this caused problems on my Windows 10 mac…

19dbcfd

…hine.

Added ^.*\.dll$ to .Rbuildignore as this caused problems on my Window…

21bb14c

…s 10 machine.

Added details about new feature (skip_absent()) to NEWS.

4525d63

Added skip_absent() feature to setnames().

74e98ca

Merge branch 'master' of https://github.com/MusTheDataGuy/data.table

ad52224

# Conflicts: # .gitignore # R/data.table.R

Expanded logical flow of conditional statement to improve readability.

c30963d

mattdowle added this to the 1.12.0 milestone Oct 17, 2018

mattdowle changed the title ~~[Closes #3030] Added skip_absent() feature to setnames()~~ Added skip_absent() feature to setnames() Oct 17, 2018

mattdowle reviewed Oct 17, 2018

View reviewed changes

mattdowle and others added 4 commits October 18, 2018 09:31

Update man/setattr.Rd

bfa1db0

Removed duplicate news item. Added thanks to @MusTheDataGuy.

543b9c7

Merge branch 'master' of https://github.com/MusTheDataGuy/data.table

ea36552

Cleaned up merge conflicts to bring code to latest version.

21d12a3

M-YD closed this Oct 18, 2018

M-YD reopened this Oct 18, 2018

Update man/setattr.Rd

fee02b6

HughParsonage reviewed Oct 20, 2018

View reviewed changes

M-YD and others added 2 commits October 24, 2018 09:08

Removed backtick character.

c5debe7

Merge branch 'master' into master

bf140dc

mattdowle mentioned this pull request Nov 14, 2018

Expand contributors field in DESCRIPTION? #3144

Closed

mattdowle added 4 commits November 13, 2018 16:26

Update DESCRIPTION

13e857a

Update NEWS.md

f491437

indentation

66f311f

Added suggested tests

bf8db81

mattdowle merged commit 41b63ea into Rdatatable:master Nov 14, 2018

MichaelChirico mentioned this pull request Nov 9, 2020

Allow setnames to include old columns not in x #1099

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added `skip_absent()` feature to `setnames()` #3111

Added `skip_absent()` feature to `setnames()` #3111

M-YD commented Oct 17, 2018 •

edited

Loading

MichaelChirico Oct 17, 2018

MichaelChirico Oct 17, 2018

MichaelChirico Oct 17, 2018

MichaelChirico Oct 17, 2018

mattdowle Oct 17, 2018 •

edited

Loading

MichaelChirico commented Oct 17, 2018

codecov bot commented Oct 17, 2018 •

edited

Loading

MichaelChirico commented Oct 17, 2018 •

edited

Loading

MichaelChirico commented Oct 17, 2018

MichaelChirico Oct 17, 2018

mattdowle Oct 17, 2018 •

edited

Loading

mattdowle Oct 17, 2018

M-YD Oct 18, 2018 •

edited

Loading

HughParsonage Oct 20, 2018

mattdowle Nov 14, 2018

mattdowle Oct 17, 2018

mattdowle commented Oct 17, 2018

M-YD commented Oct 18, 2018

HughParsonage Oct 20, 2018

Atrebas commented Apr 5, 2019 •

edited

Loading


		4. In those cases where you need to rename columns in a `DT` but the columns aren't always known, `setnames()` now contains an additional argument (`skip_absent`) to skip them if they aren't present. For example, if you know that columns `a`, `b` and `d` are present in `DT`, but you don't know if column `c` is or isn't, then you can include `c` in `old` and if it isn't found, `setnames()` will simply skip to the next item of `old` rather than exit the function. Note: The default behaviour of `setnames()` has not been altered as `skip_absent` is set to `FALSE` by default. [#3030](https://github.com/Rdatatable/data.table/issues/3030)

		3. In those cases where you need to rename columns in a `DT` but the columns aren't always known, `setnames()` now contains an additional argument (`skip_absent`) to skip them if they aren't present. For example, if you know that columns `a`, `b` and `d` are present in `DT`, but you don't know if column `c` is or isn't, then you can include `c` in `old` and if it isn't found, `setnames()` will simply skip to the next item of `old` rather than exit the function. Note: The default behaviour of `setnames()` has not been altered as `skip_absent` is set to `FALSE` by default. [#3030](https://github.com/Rdatatable/data.table/issues/3030)

Added skip_absent() feature to setnames() #3111

Added skip_absent() feature to setnames() #3111

Conversation

M-YD commented Oct 17, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattdowle Oct 17, 2018 • edited Loading

Choose a reason for hiding this comment

MichaelChirico commented Oct 17, 2018

codecov bot commented Oct 17, 2018 • edited Loading

Codecov Report

MichaelChirico commented Oct 17, 2018 • edited Loading

MichaelChirico commented Oct 17, 2018

Choose a reason for hiding this comment

mattdowle Oct 17, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

M-YD Oct 18, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattdowle commented Oct 17, 2018

M-YD commented Oct 18, 2018

Choose a reason for hiding this comment

Atrebas commented Apr 5, 2019 • edited Loading

Added `skip_absent()` feature to `setnames()` #3111

Added `skip_absent()` feature to `setnames()` #3111

M-YD commented Oct 17, 2018 •

edited

Loading

mattdowle Oct 17, 2018 •

edited

Loading

codecov bot commented Oct 17, 2018 •

edited

Loading

MichaelChirico commented Oct 17, 2018 •

edited

Loading

mattdowle Oct 17, 2018 •

edited

Loading

M-YD Oct 18, 2018 •

edited

Loading

Atrebas commented Apr 5, 2019 •

edited

Loading