Add `group_initial_split()` and `group_validation_split()` #315

mikemahoney218 · 2022-06-28T17:36:42Z

This PR adds group_initial_split() and group_validation_split(). These are documented as part of initial_split() and validation_split(), and don't have their own classes, in order to match the time-based methods. We might consider doing the same for group_vfold_cv() and group_mc_cv() (or possibly changing these to use their own classes).

It also fixes a bug where group_mc_cv(..., times = 1) would never produce an assessment set, because all data would be assigned to the same fold.
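A minimal sketch of how the fixed behavior could be checked, assuming a version of rsample that includes this change. The data frame `dat` and its columns `x` and `g` are made up for illustration and are not part of the PR.

```r
library(rsample)

set.seed(1)
# Hypothetical toy data: 20 rows in 4 groups of 5 rows each
dat <- data.frame(x = 1:20, g = rep(1:4, 5))

# Request a single grouped Monte Carlo resample
res <- group_mc_cv(dat, group = g, times = 1)

# Rows held out for assessment in that one resample
held_out <- assessment(res$splits[[1]])

# With the fix in place this should be TRUE: at least one group is held out
nrow(held_out) > 0
```

Before the fix, the description above implies the assessment set would have been empty for `times = 1`.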

library(rsample)

set.seed(2)
dat1 <- data.frame(a = 1:20, b = letters[1:20], c = rep(1:4, 5))

group_initial_split(dat1, c) |> 
  testing()
#>     a b c
#> 1   1 a 1
#> 5   5 e 1
#> 9   9 i 1
#> 13 13 m 1
#> 17 17 q 1

group_initial_split(dat1, c) |> 
  testing()
#>     a b c
#> 2   2 b 2
#> 6   6 f 2
#> 10 10 j 2
#> 14 14 n 2
#> 18 18 r 2

group_validation_split(dat1, c)$splits[[1]] |> 
  assessment()
#>     a b c
#> 4   4 d 4
#> 8   8 h 4
#> 12 12 l 4
#> 16 16 p 4
#> 20 20 t 4

group_validation_split(dat1, c)$splits[[1]] |> 
  assessment()
#>     a b c
#> 3   3 c 3
#> 7   7 g 3
#> 11 11 k 3
#> 15 15 o 3
#> 19 19 s 3

Created on 2022-06-28 by the reprex package (v2.0.1)
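The reprex shows whole groups landing in the test set. One way to confirm that no group is shared between the two sets is sketched below, reusing `dat1` from the reprex; the variable names `grp_split`, `train_groups`, and `test_groups` are illustrative only.

```r
library(rsample)

set.seed(2)
dat1 <- data.frame(a = 1:20, b = letters[1:20], c = rep(1:4, 5))

grp_split <- group_initial_split(dat1, c)

# Distinct group labels on each side of the split
train_groups <- unique(training(grp_split)$c)
test_groups  <- unique(testing(grp_split)$c)

# Groups appearing on both sides; a grouped split should leave this empty
intersect(train_groups, test_groups)

# Every row should fall in exactly one of the two sets
nrow(training(grp_split)) + nrow(testing(grp_split)) == nrow(dat1)
```

The same check should carry over to the split returned by group_validation_split() by swapping in analysis() and assessment().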

juliasilge · 2022-06-29T17:01:05Z

Can you open another issue to outline the inconsistency in the classes? And walk through which ones could be addressed?

The way names have ended up is a bit of a bummer. What is the best option?

- initial_split() + initial_time_split() + group_initial_split()? This one, not creating an rset but a split, I could see making consistent and all starting with initial_*().
- validation_split() + validation_time_split() + group_validation_split()? For this one, it definitely makes more sense to have the new function start with group_*() because it belongs with all the other group_*() rset functions. Should we change the name of validation_time_split()? Or just leave it?

mikemahoney218 · 2022-06-29T17:56:34Z

Opened #318 for classes.

As for names: I personally don't hate that the names are inconsistent for these two. If we had more time-based functions, I think I'd rather prefix with time_ (so change validation_time_split() so it matches sliding_*, group_*, int_*), but I don't know that it's necessary here (especially since these aren't really time-based, so much as "number of rows" based). Not the strongest opinion I've ever had, but I think I would rather the initial_split/validation_split derivatives be inconsistent, instead of the group_* derivatives.

juliasilge · 2022-06-29T18:23:29Z

OK, sounds good on the naming! 👍

juliasilge

So great! 🚀

github-actions · 2022-07-14T02:27:10Z

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

mikemahoney218 added 3 commits June 28, 2022 12:57

Add group_initial_split

dc158dc

Add group_validation_split

f2637ba

Add group_validation_split() to docs

1766b2a

mikemahoney218 marked this pull request as ready for review June 28, 2022 18:54

mikemahoney218 requested review from juliasilge and hfrick June 28, 2022 18:54

This was referenced Jun 28, 2022

Add group_bootstraps() #316

Merged

Stratification in grouped resampling #317

Closed

mattwarkentin mentioned this pull request Jun 29, 2022

Reversing analysis and assessment splits #284

Closed

mikemahoney218 mentioned this pull request Jun 29, 2022

Grouped resamples are inconsistently of a group_* class #318

Closed

juliasilge approved these changes Jun 29, 2022

Update NEWS

0b581c4

juliasilge merged commit 243559d into main Jun 29, 2022

juliasilge deleted the mike/group_all_the_things branch June 29, 2022 18:29

This was referenced Jun 29, 2022

Add new grouped resampling to vctrs compatibility #320

Closed

Make default for v consistent between vfold_cv() and group_vfold_cv() #328

Open

github-actions bot locked and limited conversation to collaborators Jul 14, 2022
