
use float32 for images, and validate #1329

Merged: 10 commits merged into cta-observatory:master from fix/large_image_sizes on May 18, 2020

Conversation

@kosack (Contributor) commented May 14, 2020

  • Changes Containers for images to specify they should be float32
  • Call validate() when making the schema in HDF5TableWriter

So far this doesn't change the computation of the images; we still need to ensure ImageExtractor creates float32s.
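As a rough illustration of the first point, a hedged sketch (not the actual diff; it assumes Field's dtype/ndim arguments and uses a simplified example container):

import numpy as np
from ctapipe.core import Container, Field


class ExampleDL1CameraContainer(Container):
    """Illustrative only; the real containers live in ctapipe/containers.py."""

    # declaring the dtype lets the writer build a float32 column and lets
    # validate() reject images stored with the wrong precision
    image = Field(None, "per-pixel image in photoelectrons", dtype=np.float32, ndim=1)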

@kosack requested a review from watsonjj as a code owner May 15, 2020 11:38
@maxnoe (Member) commented May 15, 2020

@kosack I added a test that checks, for every container defined in ctapipe/containers.py, whether the default instance can be written using the HDF5 writer; this turned up some more missing units / bad defaults.

I also changed three time ranges into two fields, min / max, to avoid needless non-scalar fields.
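An illustrative sketch of such a test (the real one is in ctapipe/io/tests/test_hdf5.py; the collection logic here is an assumption):

import inspect

import pytest

import ctapipe.containers
from ctapipe.core import Container
from ctapipe.io import HDF5TableWriter

# collect every Container subclass visible in ctapipe/containers.py
all_containers = [
    cls
    for _, cls in inspect.getmembers(ctapipe.containers, inspect.isclass)
    if issubclass(cls, Container) and cls is not Container
]


@pytest.mark.parametrize("container_class", all_containers)
def test_default_container_is_writable(container_class, tmp_path):
    # writing the default instance fails if a field lacks units or has a bad default
    with HDF5TableWriter(str(tmp_path / "containers.h5"), group_name="data") as writer:
        writer.write("table", container_class())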

@vuillaut previously approved these changes May 15, 2020
@vuillaut self-requested a review May 15, 2020 11:53
@codecov bot commented May 15, 2020

Codecov Report

Merging #1329 into master will increase coverage by 0.13%.
The diff coverage is 98.90%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1329      +/-   ##
==========================================
+ Coverage   90.92%   91.05%   +0.13%     
==========================================
  Files         179      179              
  Lines       12164    12192      +28     
==========================================
+ Hits        11060    11102      +42     
+ Misses       1104     1090      -14     
Impacted Files Coverage Δ
ctapipe/calib/camera/flatfield.py 95.04% <ø> (ø)
ctapipe/core/container.py 88.35% <ø> (+8.90%) ⬆️
ctapipe/image/extractor.py 86.86% <ø> (ø)
ctapipe/io/tests/test_hdf5.py 97.79% <94.73%> (-0.20%) ⬇️
ctapipe/calib/camera/pedestals.py 94.94% <100.00%> (ø)
ctapipe/calib/camera/tests/test_calibrator.py 100.00% <100.00%> (ø)
ctapipe/containers.py 100.00% <100.00%> (ø)
ctapipe/core/tests/test_traits.py 100.00% <100.00%> (ø)
ctapipe/image/muon/intensity_fitter.py 93.00% <100.00%> (ø)
ctapipe/image/muon/ring_fitter.py 100.00% <100.00%> (ø)
... and 12 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c9eb246...5588fe4. Read the comment docs.

@kosack (Contributor, Author) commented May 15, 2020

Thanks, this looks good. I'll just update the ImageExtractor then to produce the right data type.

@kosack (Contributor, Author) commented May 15, 2020

I think the fix is just to change extract_around_peak, which currently returns float64s, to return float32s instead. @watsonjj is that enough?

An alternative would be to down-sample it afterward, but that is slower and probably not a good idea. If we rely on any operations being on 64-bit versions, then the answer would change when we re-read DL1 data stored in 32-bits and re-compute things, which is not a good design.
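A simplified sketch of that idea (the real extract_around_peak is a numba guvectorize kernel with a different signature; this only illustrates returning float32):

import numpy as np
from numba import njit


@njit(cache=True)
def extract_around_peak_f32(waveform, peak_index, width, shift):
    start = peak_index - shift
    end = start + width
    total = 0.0
    weighted = 0.0
    for i in range(start, end):
        if 0 <= i < waveform.size:   # guard against windows at the waveform edge
            total += waveform[i]
            weighted += waveform[i] * i
    peak_time = weighted / total if total > 0 else float(peak_index)
    # downcast once, so downstream code sees the same dtype that will be stored in DL1
    return np.float32(total), np.float32(peak_time)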

@kosack requested a review from HealthyPear as a code owner May 15, 2020 13:07
@HealthyPear (Member) commented:

> I think the fix is just to change extract_around_peak, which currently returns float64s, to return float32s instead. @watsonjj is that enough?
>
> An alternative would be to down-sample it afterward, but that is slower and probably not a good idea. If we rely on any operations being on 64-bit versions, then the answer would change when we re-read DL1 data stored in 32-bits and re-compute things, which is not a good design.

In TwoPassWindowSum I had to force the conversion (line 903); is this still needed? I was getting an error if I didn't convert to 32.

@kosack (Contributor, Author) commented May 15, 2020

> In TwoPassWindowSum I had to force the conversion (line 903); is this still needed? I was getting an error if I didn't convert to 32.

Do you recall what error you were getting? I don't see anything that would have forbidden 64 bits, in ctapipe at least (if it was in the prototype, you have your own IO system, and may have defined it as 32 bits).

@maxnoe (Member) commented May 15, 2020

@kosack I had to require an rtol of only 1e-4 to make that test pass with float32.

I guess 1e-4 on waveforms is still plenty, but are you sure we don't want to use float64 internally?

@HealthyPear (Member) commented:

> In TwoPassWindowSum I had to force the conversion (line 903); is this still needed? I was getting an error if I didn't convert to 32.

> Do you recall what error you were getting? I don't see anything that would have forbidden 64 bits, in ctapipe at least (if it was in the prototype, you have your own IO system, and may have defined it as 32 bits).

Ah yes, it is possible that was a ctapipe 0.7 issue... probably that conversion is not needed anymore.

As soon as we merge this, I will try taking it out to check that nothing happens (as I expect).

@kosack (Contributor, Author) commented May 15, 2020

> I guess 1e-4 on waveforms is still plenty, but are you sure we don't want to use float64 internally?

Perhaps, or else we need to think about restructuring operations to minimize the growth of floating-point error, which you can normally ignore in 64 bits, but not in 32.

@kosack (Contributor, Author) commented May 15, 2020

The worry I have is that if we store a 64-bit version in the Container (and down-sample in the writer), then running the analysis without writing a DL1 file would do all operations in 64 bits, while writing and re-reading would change them to 32 bits and thus give different results. So no matter what, we should write 32 bits to the container in memory, so further algorithms don't expect a higher bit depth.

@maxnoe (Member) commented May 15, 2020

I'll try to improve the numerical stability of the extractor.

@maxnoe (Member) commented May 15, 2020

When using numpy.sum in extract_around_peak, the relative error is one order of magnitude smaller (numpy.sum uses a middle-ground implementation between floating-point accuracy and speed, instead of a plain loop sum).

This increases runtime from 430 µs to 450 µs in the test I did. I think this is acceptable, right?

Should I push this?
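A small, illustrative way to see that effect (not from the PR; the printed magnitudes are typical, not exact):

import numpy as np

rng = np.random.default_rng(42)
samples = rng.uniform(0, 100, size=100_000).astype(np.float32)

reference = np.sum(samples, dtype=np.float64)  # effectively exact for this data

naive = np.float32(0.0)
for s in samples:
    naive += s                      # plain accumulating loop, rounds at every step

pairwise = np.sum(samples)          # float32 pairwise summation

print(abs(naive - reference) / reference)     # roughly 1e-5 to 1e-4
print(abs(pairwise - reference) / reference)  # roughly an order of magnitude smaller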

@kosack (Contributor, Author) commented May 15, 2020

> Should I push this?

Sounds good, I say go ahead. You might want to add a comment explaining why it is used, in case somebody changes it back later.

@watsonjj (Contributor) commented May 15, 2020

> When using numpy.sum in extract_around_peak, the relative error is one order of magnitude smaller (numpy.sum uses a middle-ground implementation between floating-point accuracy and speed, instead of a plain loop sum).
>
> This increases runtime from 430 µs to 450 µs in the test I did. I think this is acceptable, right?
>
> Should I push this?

I'm confused: what does using numpy.sum solve? I specifically didn't use it so that I have more control over the sum, i.e. to correctly sum samples when the window is at the edge of the waveform, and to not include negative values in the pulse_time calculation.

@kosack (Contributor, Author) commented May 15, 2020

It's just that now that we've switched to 32-bit images, we have to worry more about numerical rounding error, so the way things are summed is important.

It's not clear which algorithm the Numba version of np.sum uses, but I guess it's more stable, based on @maxnoe's test. A naive sum (loop over and accumulate) is usually not the ideal way to do it. What we could do instead is leave extract_around_peak() in 64 bit to avoid this, and always call it with

image, peak = extract_around_peak()
image = image.astype("float32")
peak = peak.astype("float32")

but then each extractor must do that, and there is some speed overhead.

@kosack (Contributor, Author) commented May 15, 2020

You can see that now, in 32 bit, the tests fail, since the difference is around 1e-3 while the tolerance is 1e-7:

>       np.testing.assert_allclose(event.dl1.tel[telid].image, y.sum(1))
E       AssertionError:
E       Not equal to tolerance rtol=1e-07, atol=0
E
E       Mismatched elements: 1613 / 2048 (78.8%)
E       Max absolute difference: 0.0012207
E       Max relative difference: 6.832399e-05
E        x: array([402.1624 , 694.62103, 981.598  , ..., 525.88116, 796.07776,
E              855.5047 ], dtype=float32)
E        y: array([402.16254, 694.6211 , 981.59845, ..., 525.881  , 796.07776,
E              855.50446], dtype=float32)

Or we just keep it with rtol=1e-4, as already implemented. It's up to you! @watsonjj you can decide, since you have to review this anyhow ;)

@watsonjj (Contributor) commented:

I see. I'm surprised at the difference!

@kosack (Contributor, Author) commented May 15, 2020

> I see. I'm surprised at the difference!

It was so much more difficult back in the days of 32-bit (or even 16-bit!) computers and low RAM! Using doubles used to have a large speed and memory impact, so we had to use 32-bit floats for everything and always think about numerical stability.

Anyhow, I think 1e-4 PE is still pretty reasonable, so maybe we don't do anything for now.

@maxnoe (Member) commented May 15, 2020

@watsonjj numpy sum is only used for the image sum, not the weighted average.

numpy.sum uses an algorithm optimized as a tradeoff between speed and floating-point accuracy, which reduces rounding errors.

The other extreme would be to use math.fsum, which keeps track of the rounding errors exactly and produces the sum without any rounding error.
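For illustration (not part of the PR), the difference is easy to check:

import math

import numpy as np

x = np.random.default_rng(1).uniform(0, 100, 10_000).astype(np.float32)
exact = math.fsum(x)                      # exactly rounded sum (apart from the final rounding)
print(abs(np.sum(x) - exact) / exact)     # small but nonzero for float32 input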

@kosack (Contributor, Author) commented May 15, 2020

See e.g. https://en.wikipedia.org/wiki/Kahan_summation_algorithm or https://code.activestate.com/recipes/393090/

I guess fsum is slow, and we don't really need that level of precision, so either stick with low precision (assuming we don't add too many more samples), or use np.sum, which I assume uses something like the Kahan algorithm [edit: see below].

From the numpy manual:

> For floating point numbers the numerical precision of sum (and np.add.reduce) is in general limited by directly adding each number individually to the result causing rounding errors in every step. However, often numpy will use a numerically better approach (partial pairwise summation) leading to improved precision in many use-cases. This improved precision is always provided when no axis is given. When axis is given, it will depend on which axis is summed. Technically, to provide the best speed possible, the improved precision is only used when the summation is along the fast axis in memory. Note that the exact precision may vary depending on other parameters. In contrast to NumPy, Python's math.fsum function uses a slower but more precise approach to summation. Especially when summing a large number of lower precision floating point numbers, such as float32, numerical errors can become significant. In such cases it can be advisable to use dtype="float64" to use a higher precision for the output.
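For reference, a minimal Kahan (compensated) summation looks like this (illustrative only; this PR went with np.sum's pairwise summation instead):

import numpy as np


def kahan_sum(values):
    total = np.float32(0.0)
    compensation = np.float32(0.0)        # running estimate of the lost low-order bits
    for value in values:
        y = np.float32(value) - compensation
        t = total + y                     # low-order bits of y are lost here...
        compensation = (t - total) - y    # ...and recovered here for the next step
        total = t
    return total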

@maxnoe (Member) commented May 15, 2020

So what should we do here?

I would say this is the last blocker for 0.8. Should we revert the numpy sum and increase the rtol or leave it like it is now?

@kosack @watsonjj

Would be nice to get it out today, pending the review of #1163

@watsonjj (Contributor) commented:

I think an rtol of 1e-4 is still fine for our purpose. There was no particular reason I previously used 1e-7.

@maxnoe (Member) commented May 16, 2020

We could also use numba.double to make the accumulator a double before assigning to the 32-bit array.
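A hypothetical sketch of that suggestion (not what was merged):

import numpy as np
from numba import njit


@njit(cache=True)
def sum_window(waveform, start, end):
    acc = np.float64(0.0)        # 64-bit accumulator inside the kernel
    for i in range(start, end):
        acc += waveform[i]       # waveform may be float32; accumulate in double
    return np.float32(acc)       # downcast only once, when storing the result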

@maxnoe (Member) commented May 18, 2020

@watsonjj why does the loop go over all samples and then do branching to check if 0 < i < n_samples?

I replaced this with min / max before the loop and it's now twice as fast.
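Roughly, the change being described (illustrative, not the exact diff):

import numpy as np


def window_sum_branching(waveform, start, end):
    """Original style: loop over the full window and branch on every sample."""
    total = np.float32(0.0)
    for i in range(start, end):
        if 0 <= i < waveform.size:
            total += waveform[i]
    return total


def window_sum_clipped(waveform, start, end):
    """Clip the bounds once before the loop; the inner loop stays branch-free."""
    total = np.float32(0.0)
    for i in range(max(0, start), min(waveform.size, end)):
        total += waveform[i]
    return total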

@watsonjj (Contributor) left a review comment:

Great, works for me.

@kosack merged commit 10b058f into cta-observatory:master May 18, 2020
@kosack deleted the fix/large_image_sizes branch October 2, 2020 09:57