I am wondering whether you all had previously considered reading in assert/verify statements from a file.
In my explorations of TDD with data analysis, I came across the nifty tdda Python package. It appears to overlap a fair amount with what is doable in assertr. My sense from their white papers is that tdda can algorithmically create constraints by summarizing the columns of a data frame and writing them to a file or some other data structure. Those constraints can then be fine-tuned by hand and reused in subsequent assertions when new data is considered.
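To make the "summarize columns into constraints" idea concrete, here is a minimal sketch in plain Python. It is not tdda's actual API or file format. The function name `discover_constraints`, the keys `type`, `min`, `max`, `allowed_values` and `allow_null`, and the sample rows are all made up for illustration.

```python
import json


def discover_constraints(rows):
    """Summarize each column of a list-of-dicts table into simple constraints.

    Hypothetical helper for illustration; not part of tdda or assertr.
    """
    constraints = {}
    columns = sorted({name for row in rows for name in row})
    for name in columns:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        # A column allows nulls if at least one value was missing.
        spec = {"allow_null": len(present) < len(values)}
        if present and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        ):
            spec["type"] = "number"
            spec["min"] = min(present)
            spec["max"] = max(present)
        elif present and all(isinstance(v, str) for v in present):
            spec["type"] = "string"
            spec["allowed_values"] = sorted(set(present))
        constraints[name] = spec
    return constraints


# Made-up sample data standing in for a data frame.
rows = [
    {"id": 1, "species": "setosa", "petal_length": 1.4},
    {"id": 2, "species": "virginica", "petal_length": 5.1},
    {"id": 3, "species": "setosa", "petal_length": None},
]

constraints = discover_constraints(rows)
# Serialize so the constraints can be saved, edited by hand, and reused later.
constraints_json = json.dumps(constraints, indent=2, sort_keys=True)
print(constraints_json)
```

The generated JSON is only a starting point. The intent is that someone reviews it and loosens or tightens the bounds before it is used against new data.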
I think it would be pretty powerful to have a YAML or JSON file that specifies the assertions for a dataset and that can be loaded by assertr and applied to a data frame within a pipeline. A benefit of a file-based approach is that the file could also serve as a type of data dictionary that would be more readable than assertr code.
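As a rough illustration of what I have in mind, here is a sketch in plain Python rather than R. Nothing in it is an existing assertr or tdda feature. The spec layout and the names `check_value`, `verify` and `assert_rows` are assumptions, and the JSON is inlined as a string only to keep the example self-contained (in practice it would be read from a file).

```python
import json

# Hypothetical spec; it doubles as a small data dictionary for the table.
SPEC_JSON = """
{
  "id": {"type": "number", "min": 1, "allow_null": false},
  "species": {"type": "string", "allowed_values": ["setosa", "virginica"], "allow_null": false},
  "petal_length": {"type": "number", "min": 0, "max": 10, "allow_null": true}
}
"""


def check_value(value, spec):
    """Return a failure message for one value, or None if it satisfies the spec."""
    if value is None:
        return None if spec.get("allow_null", True) else "null not allowed"
    if spec.get("type") == "number":
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return "expected a number"
        if "min" in spec and value < spec["min"]:
            return f"below min {spec['min']}"
        if "max" in spec and value > spec["max"]:
            return f"above max {spec['max']}"
    if spec.get("type") == "string" and not isinstance(value, str):
        return "expected a string"
    if "allowed_values" in spec and value not in spec["allowed_values"]:
        return "value not in allowed_values"
    return None


def verify(rows, spec):
    """Collect (row index, column, message) for every violated constraint."""
    failures = []
    for index, row in enumerate(rows):
        for column, column_spec in spec.items():
            problem = check_value(row.get(column), column_spec)
            if problem:
                failures.append((index, column, problem))
    return failures


def assert_rows(rows, spec):
    """Raise on any violation, otherwise return rows so the call can sit in a pipeline."""
    failures = verify(rows, spec)
    if failures:
        raise ValueError(f"{len(failures)} constraint violation(s): {failures}")
    return rows


spec = json.loads(SPEC_JSON)
good = [{"id": 1, "species": "setosa", "petal_length": 1.4}]
bad = [{"id": 0, "species": "tulip", "petal_length": None}]

checked = assert_rows(good, spec)  # passes and hands the data on unchanged
failures = verify(bad, spec)       # reports the bad id and the unknown species
```

Returning the data unchanged on success mirrors how assertr's checks pass the data frame through, so a file-driven check could sit in the middle of a pipeline like any other assertion.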
I suspect this might be a fair amount of effort to implement, and I was curious whether it's something that's already on your roadmap or whether you would be interested in contributions along these lines.
That's so funny you mention that... about a week back I realized I would have to implement something like this for the work I'm doing now.
Not sure I want to have assertions written purely in JSON/YAML, or any other markup language
But maybe an assertr chain can be expressed in a separate file, and then included/sourced by the analysis script
I'll have to do some thinking about the best way to implement it (I want assertr to do the right thing)
I'd love to hear more about your experience with tdda and anything else you were thinking regarding this idea