Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

efficient way of storing synthetic test data #216

Closed
pawelru opened this issue Jul 13, 2022 · 3 comments
Closed

efficient way of storing synthetic test data #216

pawelru opened this issue Jul 13, 2022 · 3 comments

Comments

@pawelru
Copy link
Contributor

pawelru commented Jul 13, 2022

First of all - the current way is incorrect. chevron::syn_test_data make use of scda package which is not listed in package imports but in suggest instead. This makes installation process incomplete (I am facing this issue right now).

Above could be addressed in one PR only but I feel we need to find a target way of handling test data. Right now it's a function in the package namespace. Assuming we have ~100 tlgs we would have the same number of examples (not to mention tests) which means that this function is executed the same number of times. This is suboptimal as every time we call this function it takes the cached data, manipulate it and creates dm object. It take some time to complete if we execute it many many times. It has an impact on R CMD CHECK timings.

There are few options available (please feel free to edit my post in order to further refine existing options or to add yet another one).

  1. Extend scda package to export dm object as well. This introduces a dependency to dm there but I feel it's not a big deal as this package is intended to be in suggest.
  2. Store dm object of hard-coded scda snapshot here in data/ directory and refresh manually (using a dedicated script in inst/) from time to time. This introduce some manual process overhead on package development.
  3. Keep a separate function in package namespace and introduce a dependency (via import) to scda. This is something that I dislike the most as this is the very first time where our package imports scda.
@pawelru
Copy link
Contributor Author

pawelru commented Jul 13, 2022

@donyunardi @gogonzo @insightsengineering/chevron please have a look at above.

@BFalquet
Copy link
Contributor

BFalquet commented Jul 19, 2022

@pawelru I would be in favor of storing the dm object in chevron. We need stable data if we want to perform some snapshot tests one day. Controlling when the data is modified is therefor not a bad thing.

If it is stored in scda, we would still need to perform some cleaning operations that we are currently performing when cleaning the function.

Will storing these data in the package cause some problem for the open sourcing ?

@BFalquet
Copy link
Contributor

Outdated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants