Add large test data cases to help detect non-deterministic challenges #193

d33bs · 2024-04-18T17:34:51Z

Probably best served by a different PR, but I'm wondering if it is worth adding a test that ensures, in a large enough dataset, that there are no duplicate rows and all expected rows are present.

Originally posted by @gwaybio in #182 (review)

This could be done with a synthetic dataset created by duplicating source data rows with minor variations. Alternatively, the test could reference a remote dataset.
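As a rough illustration of the synthetic-dataset idea, the sketch below duplicates a few source rows with small numeric variations and then checks a result for duplicate or missing rows. It uses only the standard library. The names `make_synthetic_rows`, `check_rows_complete`, `value`, and `row_id` are hypothetical and are not taken from the project; a real test would run the project's own processing between the two steps.

```python
import random


def make_synthetic_rows(source_rows, copies, seed=0):
    """Duplicate each source row `copies` times with a slight variation.

    Every output row receives a unique `row_id` (hypothetical column) so
    duplicated or missing rows can be detected after processing.
    """
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    rows = []
    for copy_index in range(copies):
        for source_index, row in enumerate(source_rows):
            varied = dict(row)
            # minor variation: jitter the (hypothetical) numeric `value` column
            varied["value"] = row["value"] + rng.uniform(-0.01, 0.01)
            varied["row_id"] = copy_index * len(source_rows) + source_index
            rows.append(varied)
    return rows


def check_rows_complete(result_rows, expected_ids):
    """Return (duplicate_ids, missing_ids) for a processed result."""
    seen = set()
    duplicates = set()
    for row in result_rows:
        if row["row_id"] in seen:
            duplicates.add(row["row_id"])
        seen.add(row["row_id"])
    missing = set(expected_ids) - seen
    return duplicates, missing
```

A test could then assert that both returned sets are empty after the synthetic rows pass through the code under test.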

It would likely be useful to add a pytest marker (for example, "large") identifying these tests, so that developers can skip them by default.
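One way to get skip-by-default behavior is the opt-in pattern from the pytest documentation, placed in a `conftest.py`. This is only a sketch: the `--run-large` flag name and the marker description are assumptions made for illustration, not something the project defines.

```python
# conftest.py (sketch)
import pytest


def pytest_addoption(parser):
    # `--run-large` is a hypothetical flag name chosen for this sketch
    parser.addoption(
        "--run-large",
        action="store_true",
        default=False,
        help="run tests marked as large",
    )


def pytest_configure(config):
    # register the marker so `--strict-markers` does not reject it
    config.addinivalue_line("markers", "large: test uses a large dataset")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-large"):
        return  # opt-in given: leave large tests enabled
    skip_large = pytest.mark.skip(reason="needs --run-large to run")
    for item in items:
        if "large" in item.keywords:
            item.add_marker(skip_large)
```

With this in place, a test decorated with `@pytest.mark.large` is skipped by a plain `pytest` run and executed with `pytest --run-large`.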

d33bs added the "enhancement" (New feature or request) label Apr 18, 2024

d33bs mentioned this issue Apr 18, 2024

Add order to limit and offset queries for deterministic results #182

Merged

13 tasks

d33bs mentioned this issue May 2, 2024

Increase sorting scalability via CytoTable metadata columns #204

Merged

13 tasks

d33bs mentioned this issue Jun 6, 2024

Error on processing moto-based files within scheduled, un-locked dependency tests #198

Closed

d33bs mentioned this issue Jun 25, 2024

Use real AWS S3 data tests and apply related fixes #212

Merged

13 tasks

d33bs closed this as completed in #212 Jul 3, 2024
