Clarify discussion of strata #245

Merged
merged 4 commits into from
Jun 8, 2021


juliasilge
Member

This PR clarifies how strata works, especially with regard to numeric stratification variables. It uses templating to minimize repetition.

@juliasilge juliasilge requested a review from hfrick June 7, 2021 20:30
Member

@hfrick hfrick left a comment


the reduction of repetition is rather nice 😍

@juliasilge juliasilge merged commit 62902cb into master Jun 8, 2021
@juliasilge juliasilge deleted the strata-docs branch June 8, 2021 14:52
@samuelmacedo83

Hi @juliasilge,
I was reading the documentation for make_strata and vfold_cv and I'm a little confused.
In make_strata, the documentation says that numeric variables are divided by percentiles, but in vfold_cv the numeric variables are divided by quartiles.

So, even if I use, for example, 10-fold CV, will the numeric variable still be divided into quartiles?

@juliasilge
Member Author

The function make_strata() uses the stats::quantile() function with a default of breaks = 4, which means quartiles:

pctls <- quantile(x, probs = (0:breaks) / breaks, na.rm = TRUE)

You can create different bins (e.g. use different percentiles of the data) by changing breaks. Maybe we can clarify the make_strata() documentation a bit more.
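To make the quantile() call above concrete, here is a minimal base-R sketch of how breaks = 4 turns a numeric variable into quartile bins, and how a different breaks value (here 10) gives deciles instead. The simulated data and variable names are illustrative only; rsample's make_strata() also handles details such as pooling small bins (its pool argument), which this sketch skips.

```r
# Minimal sketch (base R only): binning a numeric variable by quantiles.
# Data and names are illustrative, not taken from rsample.
set.seed(123)
x <- rnorm(100)

breaks <- 4
# breaks = 4 gives 5 probabilities: 0, 0.25, 0.5, 0.75, 1
probs <- (0:breaks) / breaks
pctls <- quantile(x, probs = probs, na.rm = TRUE)

# Assign each value to one of the 4 quartile bins
bins <- cut(x, breaks = pctls, include.lowest = TRUE)
table(bins)  # roughly equal counts per bin

# A different breaks value gives different bins, e.g. 10 -> deciles
deciles <- cut(x, breaks = quantile(x, probs = (0:10) / 10),
               include.lowest = TRUE)
nlevels(deciles)
```

Note that nothing here depends on how many resampling folds are used later; breaks only controls how the stratification variable is binned.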

@samuelmacedo83

I think I understand. So the breaks argument of vfold_cv controls how the variable given in the strata argument is binned.
It makes sense :)
Thank you @juliasilge
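The distinction settled above (v sets the number of folds, breaks sets the number of strata bins) can be illustrated with a small base-R sketch. The helper assign_folds() is hypothetical and is not rsample's implementation: it bins a numeric outcome into quantile-based strata, then deals the rows of each stratum out across folds 1..v in random order, so every fold gets a similar mix of low and high values.

```r
# Hypothetical sketch, NOT rsample's code: stratified fold assignment
# where the number of folds (v) is independent of the number of
# strata bins (breaks).
assign_folds <- function(y, v = 10, breaks = 4) {
  # Bin the numeric outcome into quantile-based strata
  cuts <- quantile(y, probs = (0:breaks) / breaks, na.rm = TRUE)
  strata <- cut(y, breaks = cuts, include.lowest = TRUE)

  folds <- integer(length(y))
  for (s in levels(strata)) {
    idx <- which(strata == s)
    # Within each stratum, spread rows evenly over folds 1..v,
    # shuffled so the assignment is random
    folds[idx] <- sample(rep_len(seq_len(v), length(idx)))
  }
  folds
}

set.seed(42)
y <- rexp(200)
folds <- assign_folds(y, v = 10, breaks = 4)
table(folds)  # 10 folds built from 4 quartile strata
```

In rsample itself, the analogous call would be along the lines of vfold_cv(data, v = 10, strata = y, breaks = 4), where changing v changes the number of folds but the strata are still quartiles unless breaks is changed.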

@github-actions

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jun 24, 2021
3 participants