efficient way of storing synthetic test data #216

pawelru · 2022-07-13T10:42:07Z

First of all - the current way is incorrect. chevron::syn_test_data make use of scda package which is not listed in package imports but in suggest instead. This makes installation process incomplete (I am facing this issue right now).

Above could be addressed in one PR only but I feel we need to find a target way of handling test data. Right now it's a function in the package namespace. Assuming we have ~100 tlgs we would have the same number of examples (not to mention tests) which means that this function is executed the same number of times. This is suboptimal as every time we call this function it takes the cached data, manipulate it and creates dm object. It take some time to complete if we execute it many many times. It has an impact on R CMD CHECK timings.

There are few options available (please feel free to edit my post in order to further refine existing options or to add yet another one).

Extend scda package to export dm object as well. This introduces a dependency to dm there but I feel it's not a big deal as this package is intended to be in suggest.
Store dm object of hard-coded scda snapshot here in data/ directory and refresh manually (using a dedicated script in inst/) from time to time. This introduce some manual process overhead on package development.
Keep a separate function in package namespace and introduce a dependency (via import) to scda. This is something that I dislike the most as this is the very first time where our package imports scda.

The text was updated successfully, but these errors were encountered:

pawelru · 2022-07-13T10:44:50Z

@donyunardi @gogonzo @insightsengineering/chevron please have a look at above.

BFalquet · 2022-07-19T15:46:38Z

@pawelru I would be in favor of storing the dm object in chevron. We need stable data if we want to perform some snapshot tests one day. Controlling when the data is modified is therefor not a bad thing.

If it is stored in scda, we would still need to perform some cleaning operations that we are currently performing when cleaning the function.

Will storing these data in the package cause some problem for the open sourcing ?

BFalquet · 2023-03-21T15:10:51Z

Outdated

pawelru mentioned this issue Dec 8, 2022

Decouple scda - reorganise example and test data samples insightsengineering/nestdevs-tasks#1

Closed

18 tasks

BFalquet closed this as completed Mar 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

efficient way of storing synthetic test data #216

efficient way of storing synthetic test data #216

pawelru commented Jul 13, 2022 •

edited

Loading

pawelru commented Jul 13, 2022

BFalquet commented Jul 19, 2022 •

edited

Loading

BFalquet commented Mar 21, 2023

efficient way of storing synthetic test data #216

efficient way of storing synthetic test data #216

Comments

pawelru commented Jul 13, 2022 • edited Loading

pawelru commented Jul 13, 2022

BFalquet commented Jul 19, 2022 • edited Loading

BFalquet commented Mar 21, 2023

pawelru commented Jul 13, 2022 •

edited

Loading

BFalquet commented Jul 19, 2022 •

edited

Loading