Flexible snapshot number for data shuffling #570
Merged
This PR allows the user to specify an arbitrary number of snapshots during the data shuffling procedure, even if the snapshot number isn't an exact divisor of the number of grid points in each snapshot.
In the non-exact-divisor case, a very small number of data points is left out. To give an idea of what "very small" means, consider a typical grid of 200x200x200 points and a user requesting up to 10,000 snapshots. From the graph below, we can see that no more than ~0.1% of the initial data is left out for any of these choices.
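The estimate above can be reproduced with a short script. This is only a rough sketch, not the code in this PR: it assumes that each snapshot's grid points are split into equally sized chunks, one per shuffled snapshot, and that the remainder of that integer division is what gets dropped. The names `leftover_points` and `leftover_fraction` are hypothetical helpers for illustration.

```python
# Sketch under an assumption: the points left out per snapshot are the
# remainder of dividing the grid size by the requested snapshot number.
GRID_POINTS = 200 * 200 * 200  # 8,000,000 grid points per snapshot
MAX_SNAPSHOTS = 10_000


def leftover_points(grid_points: int, n_shuffled: int) -> int:
    """Number of grid points per snapshot that do not fit into equal chunks."""
    return grid_points % n_shuffled


def leftover_fraction(grid_points: int, n_shuffled: int) -> float:
    """Fraction of the initial data that is left out."""
    return leftover_points(grid_points, n_shuffled) / grid_points


# Scan every snapshot number from 1 to 10,000 and keep the worst case.
worst_n = max(
    range(1, MAX_SNAPSHOTS + 1),
    key=lambda n: leftover_points(GRID_POINTS, n),
)
worst_fraction = leftover_fraction(GRID_POINTS, worst_n)

print(f"worst case: {worst_n} snapshots, {100 * worst_fraction:.4f}% left out")
```

Under this assumption the remainder is always smaller than the snapshot number, so the left-out fraction stays below 10,000 / 8,000,000 = 0.125%, which is in line with the ~0.1% figure quoted above.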
This PR is a fix for issue #509.
Right now, I've implemented this only for the numpy case, so it's not yet possible with openPMD. Frankly, I have no idea how to do it for the latter. I don't know how many users currently use openPMD and also want an arbitrary snapshot number; if the answer is ~0, we can just keep this as a TODO for the future.
As mentioned in issue #564, I think the whole shuffling procedure could be simplified. But this PR ignores that possibility and integrates the new functionality into the code without making other changes.