Add group_mc_cv() #313

Merged
merged 3 commits into from
Jun 28, 2022

mikemahoney218 (Member) commented Jun 27, 2022

This PR addresses #207 by adding a new function, group_mc_cv(), which can be used to create grouped Monte Carlo cross-validation resamples. Note that the user-facing function doesn't have a balance argument, because it doesn't make sense to balance anything other than the proportion of data assigned to the assessment fold.

This PR also reorganizes the code so that grouped functions now live in the same file as their ungrouped variants. I believe we agreed that this would make the most sense going forward, but I'm happy to undo it if not.

library(rsample)
data(ames, package = "modeldata")
set.seed(123)
group_mc_cv(ames, group = Neighborhood, times = 5)
#> # Grouped Monte Carlo cross-validation (0.75/0.25) with 5 resamples  
#> # A tibble: 5 × 2
#>   splits             id       
#>   <list>             <chr>    
#> 1 <split [2182/748]> Resample1
#> 2 <split [2196/734]> Resample2
#> 3 <split [1980/950]> Resample3
#> 4 <split [2056/874]> Resample4
#> 5 <split [2178/752]> Resample5

Created on 2022-06-27 by the reprex package (v2.0.1)
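As a rough check of what "grouped" means here, the sketch below verifies that no group lands in both the analysis and assessment sets of a resample. It assumes an rsample version that includes group_mc_cv(); the toy data frame and its column names (g, x) are hypothetical and not part of this PR.

```r
library(rsample)

# Hypothetical toy data: 20 groups with 5 rows each.
set.seed(123)
toy <- data.frame(
  g = rep(paste0("group_", 1:20), each = 5),
  x = rnorm(100)
)

# Five grouped Monte Carlo resamples, aiming for ~75% of rows in analysis.
resamples <- group_mc_cv(toy, group = g, prop = 0.75, times = 5)

# For each split, count the groups that appear in both sets.
shared <- vapply(
  resamples$splits,
  function(split) {
    length(intersect(analysis(split)$g, assessment(split)$g))
  },
  integer(1)
)

# Every group should sit entirely on one side of each split.
stopifnot(all(shared == 0L))
```

Because whole groups are assigned to one side or the other, the realized split sizes can drift from the requested proportion, which is consistent with the varying split sizes in the reprex output above.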

mikemahoney218 changed the title from "Add group_mc_cv" to "Add group_mc_cv()" on Jun 27, 2022
mikemahoney218 (Member, Author) commented:

I think that if we like how this is implemented, then we can also quickly add group_validation_split(), group_initial_split(), and group_bootstraps() (slightly harder, but only slightly). That's everything mentioned in #207, although I think we also want to implement stratification before considering that "done".

mikemahoney218 marked this pull request as ready for review on June 27, 2022, 17:35
...) {
rlang::check_dots_used(call = rlang::caller_env())
Review comment (Member):

nice!

juliasilge (Member) left a comment:

This is looking great! 🙌 I have just a couple of comments, one of which is about an existing example.

Review threads: R/vfold.R (outdated, resolved); R/make_groups.R (resolved)
github-actions (bot) commented:
This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions (bot) locked and limited conversation to collaborators on Jul 13, 2022

3 participants