
Separate transmute() from mutate(.keep = "none") #6086

Closed
DavisVaughan opened this issue Nov 16, 2021 · 0 comments · Fixed by #6087


@DavisVaughan
Member

In #6035 we reworked .keep to be more consistent for mutate(). I still stand by this decision in the context of mutate(), but it has changed the behavior of transmute(), because transmute() is currently implemented as mutate(.keep = "none"). The following example shows the change in behavior for both grouping variables and modified non-grouping variables.

library(dplyr)

df <- tibble(
  g1 = 1:3,
  g2 = 1:3,
  x = 1:3,
  y = 4:6
)

gdf <- group_by(df, g1, g2)

transmute(gdf, x = x + 1, z = x + 1, y = y + 1, g1 = g1 + 1)

# CRAN:
# - The column order supplied in `...` is kept.
# - Grouping vars not modified by `...` are kept at the front.

#> # A tibble: 3 × 5
#> # Groups:   g1, g2 [3]
#>      g2     x     z     y    g1
#>   <int> <dbl> <dbl> <dbl> <dbl>
#> 1     1     2     3     5     2
#> 2     2     3     4     6     3
#> 3     3     4     5     7     4

# Current dev:
# The column ordering comes from:
# - Modified columns are altered in place
# - New columns are added at the end

#> # A tibble: 3 × 5
#> # Groups:   g1, g2 [3]
#>      g1    g2     x     y     z
#>   <dbl> <int> <dbl> <dbl> <dbl>
#> 1     2     1     2     5     3
#> 2     3     2     3     6     4
#> 3     4     3     4     7     5

It turns out that many people want transmute() to keep the current CRAN behavior because they use it to select and mutate columns in a single call and rely on the resulting column order.

We previously believed that transmute() had switched between these two outputs across dplyr releases, but that was not the case, as shown in a comment on #6080. The current CRAN behavior is how transmute() has always worked. In light of this, we believe we should retain the CRAN behavior of transmute() in the next dplyr release.

That said, mutate(.keep = "none") should retain its current dev behavior. This is an experimental argument, so changing it should affect relatively few users. The dev behavior of .keep = "none" is more consistent with the other .keep options for mutate(), makes the output easier to predict when combined with .before and .after, and simplifies the implementation: .keep never affects column ordering and mainly determines which columns get dropped (#6035 discusses this in great detail).
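To make the two ordering rules concrete, here is a minimal sketch in base R that computes the output column order from column names alone, following the rules described in the example above. `transmute_order()` and `keep_none_order()` are hypothetical helpers written for illustration; they are not dplyr functions. The sketch assumes that `...` only creates or modifies columns (no `NULL` removals) and ignores `.before`/`.after`.

```r
# Hypothetical helper (not part of dplyr): CRAN transmute() ordering.
# Grouping vars not named in `...` stay at the front, followed by the
# columns in the order they are supplied in `...`.
transmute_order <- function(groups, dots) {
  c(setdiff(groups, dots), unique(dots))
}

# Hypothetical helper (not part of dplyr): dev mutate(.keep = "none") ordering.
# Grouping vars and modified columns keep their original positions;
# newly created columns are appended at the end.
keep_none_order <- function(cols, groups, dots) {
  kept <- cols[cols %in% union(groups, dots)]
  added <- setdiff(dots, cols)
  c(kept, added)
}

cols <- c("g1", "g2", "x", "y")
groups <- c("g1", "g2")
dots <- c("x", "z", "y", "g1")  # LHS names from the transmute() call above

transmute_order(groups, dots)        # g2, x, z, y, g1 (the CRAN order)
keep_none_order(cols, groups, dots)  # g1, g2, x, y, z (the dev order)
```

The first helper depends on the order of `...`, while the second depends only on the original column order plus which names are new, which is why .keep = "none" can be described as affecting only which columns are dropped.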

So, the action items are:

  • Fix transmute() to revert to the CRAN behavior, which requires giving it its own implementation separate from mutate()
  • Update the NEWS bullet to only mention the change in .keep
  • Separate any comparison of transmute() and .keep = "none" in the documentation, making it clear how those are different
jonkeane pushed a commit to apache/arrow that referenced this issue Apr 7, 2022
…option is set

This PR does two things to match some dplyr behaviour around column order:

1) Mimics the dplyr implementation of `mutate(..., .keep = "none")`, which appends new columns after the existing columns (if suggested), as [per](tidyverse/dplyr#6086)

2) As per this [discussion](tidyverse/dplyr#6086), this required a bespoke approach to `transmute`, as it is not simply a wrapper for `mutate(..., .keep = "none")`. This cascades into needing to catch a couple of edge cases.

I have also added some tests which will test for this behaviour.

Closes #12818 from boshek/mutate-keep

Authored-by: SAm Albers <[email protected]>
Signed-off-by: Jonathan Keane <[email protected]>
jcralmeida pushed a commit to rafael-telles/arrow that referenced this issue Apr 19, 2022