Add group_bootstraps() #316

Merged
merged 10 commits into from
Jun 30, 2022


mikemahoney218
Member

This PR is based on #315.

This PR fixes #207 (and we're going to open a new issue for stratification with grouping, to get input from a wider audience).

This PR adds group_bootstraps(), patterned after group_mc_cv(), to provide a way to bootstrap by predefined groups. To support that, it adds a replace argument to balance_prop(), which is TRUE for group_bootstraps() and FALSE for group_mc_cv().

library(rsample)

set.seed(2)
dat1 <- tibble::tibble(a = 1:20, b = letters[1:20], c = rep(1:4, 5))

group_bootstraps(dat1, c, times = 2)$splits[[1]] |> 
  analysis()
#> # A tibble: 20 × 3
#>        a b         c
#>    <int> <chr> <int>
#>  1     1 a         1
#>  2     1 a         1
#>  3     2 b         2
#>  4     4 d         4
#>  5     5 e         1
#>  6     5 e         1
#>  7     6 f         2
#>  8     8 h         4
#>  9     9 i         1
#> 10     9 i         1
#> 11    10 j         2
#> 12    12 l         4
#> 13    13 m         1
#> 14    13 m         1
#> 15    14 n         2
#> 16    16 p         4
#> 17    17 q         1
#> 18    17 q         1
#> 19    18 r         2
#> 20    20 t         4

group_bootstraps(dat1, c, times = 2)$splits[[1]] |> 
  analysis()
#> # A tibble: 20 × 3
#>        a b         c
#>    <int> <chr> <int>
#>  1     1 a         1
#>  2     2 b         2
#>  3     2 b         2
#>  4     3 c         3
#>  5     5 e         1
#>  6     6 f         2
#>  7     6 f         2
#>  8     7 g         3
#>  9     9 i         1
#> 10    10 j         2
#> 11    10 j         2
#> 12    11 k         3
#> 13    13 m         1
#> 14    14 n         2
#> 15    14 n         2
#> 16    15 o         3
#> 17    17 q         1
#> 18    18 r         2
#> 19    18 r         2
#> 20    19 s         3

group_bootstraps(dat1, c, times = 2)$splits[[1]] |> 
  assessment()
#> # A tibble: 5 × 3
#>       a b         c
#>   <int> <chr> <int>
#> 1     2 b         2
#> 2     6 f         2
#> 3    10 j         2
#> 4    14 n         2
#> 5    18 r         2

group_bootstraps(dat1, c, times = 2)$splits[[1]] |> 
  assessment()
#> # A tibble: 5 × 3
#>       a b         c
#>   <int> <chr> <int>
#> 1     1 a         1
#> 2     5 e         1
#> 3     9 i         1
#> 4    13 m         1
#> 5    17 q         1

Created on 2022-06-28 by the reprex package (v2.0.1)
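
For readers who want the idea without the package internals, here is a minimal base-R sketch of group-level bootstrapping. It is not the code in this PR: group_boot_sketch() is a hypothetical helper, and it does not reproduce what balance_prop() does or the row ordering shown in the reprex. It only illustrates drawing whole groups with replacement and treating the groups that were never drawn as the assessment set.

```r
# Hypothetical helper (not part of rsample): a sketch of a grouped bootstrap.
group_boot_sketch <- function(data, group) {
  groups <- unique(data[[group]])

  # Resample whole groups with replacement, as many draws as there are groups.
  picked <- sample(groups, size = length(groups), replace = TRUE)

  # Keep every row of each drawn group, once per time the group was drawn.
  in_id <- unlist(lapply(picked, function(g) which(data[[group]] == g)))

  # Rows whose group was never drawn are out-of-bag.
  # This can be empty if every group happens to be drawn.
  out_id <- which(!data[[group]] %in% picked)

  list(
    analysis = data[in_id, , drop = FALSE],
    assessment = data[out_id, , drop = FALSE]
  )
}

set.seed(2)
dat1 <- data.frame(a = 1:20, b = letters[1:20], c = rep(1:4, 5))
split <- group_boot_sketch(dat1, "c")
```

Because whole groups move together, no value of c can appear in both split$analysis and split$assessment, which is the property the reprex above shows for group_bootstraps().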

@mikemahoney218 mikemahoney218 marked this pull request as ready for review June 28, 2022 19:23
@mikemahoney218
Member Author

E> Cannot initiate the connection to ppa.launchpad.net:80

It looks like the Ubuntu checks are failing for network reasons.

Merge branch 'main' into mike/group_one_specific_thing

# Conflicts:
#	NAMESPACE
#	R/compat-vctrs-helpers.R
Member

@juliasilge juliasilge left a comment

This is really nice! ⭐

@juliasilge juliasilge merged commit c257ec5 into main Jun 30, 2022
@juliasilge juliasilge deleted the mike/group_one_specific_thing branch June 30, 2022 00:16
@github-actions

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 14, 2022
Successfully merging this pull request may close these issues.

more group-based splitting methods