Add diff API #293

peytondmurray · 2023-11-20T01:33:17Z

Changes

This PR adds an API for calculating the diff between two versions of a dataset; closes #264.

Tests are provided which check that:

- The diff between a version and itself is empty
- The diff between two versions of a dataset gives the expected slices and data, including for nested datasets
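
To make the two tested properties concrete, here is a minimal, self-contained sketch of a chunk-wise diff over plain 1-D numpy arrays. It is not the `get_diff` implementation from this PR: the function name `chunk_diff`, its `chunk_size` parameter, and its return format are assumptions made purely for illustration.

```python
import numpy as np


def chunk_diff(old, new, chunk_size):
    """Hypothetical helper: list (slice, old_chunk, new_chunk) for each differing chunk.

    Only equal-shaped 1-D arrays are handled in this sketch.
    """
    if old.ndim != 1 or old.shape != new.shape:
        raise ValueError("this sketch only handles equal-shaped 1-D arrays")

    changed = []
    for start in range(0, old.shape[0], chunk_size):
        sl = slice(start, min(start + chunk_size, old.shape[0]))
        # Record the chunk only if any element differs between the versions.
        if not np.array_equal(old[sl], new[sl]):
            changed.append((sl, old[sl].copy(), new[sl].copy()))
    return changed


v1 = np.arange(10)
v2 = v1.copy()
v2[7] = -1  # modify one element, which lives in the chunk covering 4..7

same = chunk_diff(v1, v1, chunk_size=4)   # a version diffed against itself
diff = chunk_diff(v1, v2, chunk_size=4)   # two versions that differ in one chunk
```

Here `same` is an empty list, and `diff` holds a single entry for `slice(4, 8)` together with the old and new contents of that chunk.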

An intersphinx mapping for numpy was also added so that numpy references in the docstring of this new function resolve correctly.

Testing

I wrote a small benchmark to check how fast this implementation is. It compares calling get_diff against iterating over two versions of a dataset and comparing the chunks using np.testing.assert_equal. For very small datasets, direct comparison with numpy is fastest. As datasets get larger, directly iterating over the data becomes impractical.
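
For reference, the direct-comparison baseline described above could look roughly like the following. This is a sketch under assumptions, not the benchmark code from this PR: `equal_by_iteration` and `chunk_size` are hypothetical names, and plain numpy arrays stand in for the two dataset versions.

```python
import numpy as np


def equal_by_iteration(old, new, chunk_size):
    """Hypothetical baseline: walk both arrays chunk by chunk and compare directly."""
    if old.shape != new.shape:
        return False
    for start in range(0, old.shape[0], chunk_size):
        sl = slice(start, start + chunk_size)
        try:
            # Raises AssertionError if any element in the chunk differs.
            np.testing.assert_equal(old[sl], new[sl])
        except AssertionError:
            return False
    return True


a = np.arange(12)
b = a.copy()
b[10] = 0  # a single changed element in the last chunk

unchanged = equal_by_iteration(a, a.copy(), chunk_size=3)
changed = equal_by_iteration(a, b, chunk_size=3)
```

Every chunk of both versions has to be read and compared, so the cost of this approach grows with the total size of the data rather than with the amount that changed.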

@rahasurana Can you take a look at this?

Add diff API

51a5df6

peytondmurray marked this pull request as ready for review November 21, 2023 19:00

peytondmurray requested a review from ArvidJB November 21, 2023 19:00

peytondmurray self-assigned this Nov 30, 2023

peytondmurray merged commit 5c3b6d5 into deshaw:master Dec 13, 2023
7 checks passed

peytondmurray deleted the add-diff-api branch December 13, 2023 00:34
