sparse data in tidymodels testing #229

Open
EmilHvitfeldt opened this issue Nov 15, 2024 · 0 comments

Because we aren’t perfect, we need an argument in control_workflow() to override the options.

None of this should depend on whether the tibble contains sparse vectors or not, as we will go off the sparsity of the data.

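| ID | recipe produce sparsity | sparsity | model support | control arg |
|----|-------------------------|----------|---------------|-------------|
| 1  | yes | high | yes | auto   |
| 2  | yes | high | yes | dense  |
| 3  | yes | high | yes | sparse |
| 4  | yes | high | no  | auto   |
| 5  | yes | high | no  | dense  |
| 6  | yes | high | no  | sparse |
| 7  | yes | low  | yes | auto   |
| 8  | yes | low  | yes | dense  |
| 9  | yes | low  | yes | sparse |
| 10 | yes | low  | no  | auto   |
| 11 | yes | low  | no  | dense  |
| 12 | yes | low  | no  | sparse |
| 13 | no  | high | yes | auto   |
| 14 | no  | high | yes | dense  |
| 15 | no  | high | yes | sparse |
| 16 | no  | high | no  | auto   |
| 17 | no  | high | no  | dense  |
| 18 | no  | high | no  | sparse |
| 19 | no  | low  | yes | auto   |
| 20 | no  | low  | yes | dense  |
| 21 | no  | low  | yes | sparse |
| 22 | no  | low  | no  | auto   |
| 23 | no  | low  | no  | dense  |
| 24 | no  | low  | no  | sparse |
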
  • recipe produce sparsity means that the recipe contains a step with a sparse argument.

  • sparsity is how sparse the data is: high means that there is a lot of sparsity in the data.

  • model support means that the parsnip model supports sparse data, i.e. allow_sparse_x = TRUE.

  • control arg is what is specified in control_workflow().

I think all the above combinations should be tested. In general:
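
As a rough sketch, the 24 rows of the table could be generated rather than typed out, which makes it easier to loop over them in tests. This uses only base R; the column names are my own shorthand for the table headers, not names used anywhere in tidymodels.

```r
# Enumerate every combination from the table. expand.grid() varies its first
# argument fastest, so the columns are listed in reverse order and then
# flipped back to match the table layout.
grid <- expand.grid(
  control_arg = c("auto", "dense", "sparse"),
  model_support = c("yes", "no"),
  sparsity = c("high", "low"),
  recipe_produce_sparsity = c("yes", "no"),
  stringsAsFactors = FALSE
)
grid <- grid[, rev(names(grid))]

# Add the ID column so rows can be matched against the table.
grid <- cbind(ID = seq_len(nrow(grid)), grid)
```

Each row of `grid` could then drive one test case, with the ID used in the test description.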

If control is set to "sparse", then the recipe should be updated to set sparse = "yes" in steps that have sparse = "auto", and the data should be converted to a dgCMatrix before being passed to the engine model.

If control is set to "dense", then the recipe should be updated to set sparse = "no" in steps that have sparse = "auto", and the data should not be converted to a dgCMatrix before being passed to the engine model.
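
To make the recipe half of these two rules concrete, here is a minimal sketch. `override_step_sparse()` is a hypothetical helper, not an existing recipes or workflows function, and the character vector stands in for the `sparse` argument of each step in a recipe.

```r
# Hypothetical helper: given the `sparse` setting of each recipe step and the
# control value, rewrite only the steps that are still set to "auto".
override_step_sparse <- function(step_sparse, control = c("sparse", "dense")) {
  control <- match.arg(control)
  replacement <- if (control == "sparse") "yes" else "no"
  # Steps the user explicitly set to "yes" or "no" are left alone.
  step_sparse[step_sparse == "auto"] <- replacement
  step_sparse
}

forced_sparse <- override_step_sparse(c("auto", "no", "yes", "auto"), "sparse")
forced_dense <- override_step_sparse(c("auto", "no", "yes", "auto"), "dense")
```

Note that explicit "yes" and "no" settings survive the override, which is how I read "steps that have sparse = "auto"" above.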

What should happen if control arg is "auto" is listed below.

  • if the model doesn’t support sparsity, then don’t give it sparse data, and stop recipes from creating sparsity, regardless of how sparse the data is
  • if sparsity is high and the model supports it, give it sparse data
  • if sparsity is low and the model supports sparse data, don’t give it sparse data, and make sure that the recipe doesn’t produce sparse data
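
The three "auto" rules above collapse into a single condition, sketched here in base R. `auto_use_sparse()` is a hypothetical name, and reducing sparsity to a high/low flag is an assumption; how the sparsity would actually be measured is not specified here.

```r
# Hypothetical helper for control = "auto": sparse data is used only when the
# model supports it AND the sparsity is high. The same answer decides whether
# recipe steps with sparse = "auto" are allowed to produce sparse output.
auto_use_sparse <- function(model_support, high_sparsity) {
  isTRUE(model_support) && isTRUE(high_sparsity)
}

# All four "auto" cases from the table (IDs 1, 4, 7 and 10, and again 13-22).
auto_cases <- expand.grid(
  model_support = c(TRUE, FALSE),
  high_sparsity = c(TRUE, FALSE)
)
auto_cases$use_sparse <- mapply(
  auto_use_sparse,
  auto_cases$model_support,
  auto_cases$high_sparsity
)
```

Only the row with both `model_support` and `high_sparsity` set to `TRUE` ends up with `use_sparse = TRUE`.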