Non-integer inputs in DataSplitGenerator #296

Open
atc3 opened this issue Sep 25, 2024 · 0 comments

Contributor

atc3 commented Sep 25, 2024

Describe the bug

I have an SBEM dataset collected at a voxel size of 9.8 x 9.8 x 25 nm. When I create the datasplit with:

from dacapo.experiments.datasplits import DataSplitGenerator
from funlib.geometry import Coordinate

input_resolution = Coordinate(25, 9.8, 9.8) # Z, Y, X
output_resolution = Coordinate(25, 9.8, 9.8)
datasplit_config = DataSplitGenerator.generate_from_csv(
    "/path/to/datasplit.csv",
    input_resolution,
    output_resolution,
).compute()

The Y and X resolutions are rounded down from 9.8 to 9, because input_resolution and output_resolution are both cast to integers in the constructor of DataSplitGenerator (they would also be cast to int downstream, in the funlib Coordinate class).

print(datasplit.train[0].raw.voxel_size)
> (25, 9, 9)
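The truncation can be reproduced without DaCapo installed. The snippet below is only a minimal sketch: `cast_to_int` is a hypothetical stand-in for the integer cast described above, not the library's actual code.

```python
# Hypothetical stand-in for the per-axis int() cast described in this report;
# it is NOT copied from DaCapo or funlib.
def cast_to_int(resolution):
    """Cast every axis of a resolution tuple to int (truncates toward zero)."""
    return tuple(int(axis) for axis in resolution)


requested = (25, 9.8, 9.8)  # Z, Y, X in nm
stored = cast_to_int(requested)

print("requested:", requested)
print("stored:   ", stored)
```

Note that `int()` truncates rather than rounds, so 9.8 becomes 9 and not 10.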

This rounding down immediately desynchronizes the ROIs, which is visible in the neuroglancer preview of the datasplit.
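To give a rough sense of why the mismatch shows up so quickly, the sketch below estimates how far a voxel's assumed physical position drifts from its true position along Y or X. The helper `physical_offset_nm` and the sample voxel indices are illustrative assumptions, not values taken from the dataset.

```python
def physical_offset_nm(voxel_index, true_size_nm, stored_size_nm):
    """Gap between a voxel's true position and the position implied by
    the truncated voxel size, in nm, along one axis."""
    return voxel_index * (true_size_nm - stored_size_nm)


true_size_nm = 9.8                  # actual Y/X voxel size
stored_size_nm = int(true_size_nm)  # what remains after the int cast

# Illustrative voxel indices along one axis (assumed, not from the dataset).
for voxel_index in (1, 100, 1000):
    offset = physical_offset_nm(voxel_index, true_size_nm, stored_size_nm)
    print(f"voxel {voxel_index}: about {offset:.1f} nm off")
```

The error grows linearly with the voxel index, at roughly 0.8 nm per voxel, so a region 1000 voxels from the origin would be misplaced by about 800 nm.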

As a temporary workaround, I've lied to the datasplit and told it that the resolution is 25, 10, 10 instead of 25, 9.8, 9.8, and also altered my raw data to reflect that resolution change. I'm not sure what consequences lying about the resolution will have, but everything else is working now.

Is there a reason resolution has to be cast to an integer? Or -- if lying about resolution is ok, then there should be an error or warning in DataSplitGenerator for the user to ensure that their data resolution is in integers.

To Reproduce

Take any dataset with a non-integer resolution and feed it into DataSplitGenerator, then preview it in neuroglancer. I can provide this dataset if requested.

Expected behavior

Non-integer resolutions should be supported, or the datasplit generator function should throw an error and force the user to modify their data to have integer resolutions.
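The second option could look something like the sketch below. `require_integer_resolution` is a hypothetical helper written for illustration. It is not part of DaCapo, and where such a check would belong in DataSplitGenerator is left open.

```python
def require_integer_resolution(resolution, name="resolution"):
    """Return the resolution as a tuple of ints, or raise instead of truncating.

    Hypothetical helper for illustration only; not an existing DaCapo function.
    """
    bad_axes = [axis for axis in resolution if not float(axis).is_integer()]
    if bad_axes:
        raise ValueError(
            f"{name}={tuple(resolution)} has non-integer values {bad_axes}; "
            "casting to int would silently truncate them."
        )
    return tuple(int(axis) for axis in resolution)


# Integer-valued input passes through unchanged.
print(require_integer_resolution((25, 10, 10), name="input_resolution"))

# Non-integer input is rejected with an explicit message.
try:
    require_integer_resolution((25, 9.8, 9.8), name="input_resolution")
except ValueError as err:
    print("rejected:", err)
```

Raising an error, rather than only warning, keeps the truncated value from ever reaching the downstream ROI computation.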

Versions

  • OS: Ubuntu 22.04 x64
  • Python: 3.10.14
  • Version: dacapo: 3430d6