Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make value type generic and independent from coordinate type #14

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

netthier
Copy link
Contributor

@netthier netthier commented Apr 15, 2024

This PR adds a new trait called ContourValue (though I'm not too happy with that name and would love better suggestions), which is defined as

pub trait ContourValue: PartialOrd + Copy + Num + NumCast{}
impl<T> ContourValue for T where T: PartialOrd + Copy + Num + NumCast {}

All method signatures that took some type of values before are now rewritten to something such as

pub fn contours<V: ContourValue>(&self, values: &[V], thresholds: &[V]) -> Result<Vec<Contour<V>>>

This allows for substantial memory and performance improvements in cases where the additional precision is not needed.
The PR is currently missing tests however, and it includes the changes from #13, meaning that should be merged first.

src/contourbuilder.rs Outdated Show resolved Hide resolved
@netthier
Copy link
Contributor Author

The geojson feature is also currently not working, because V needs to be converted to a serde_json::Value in those functions. Adding a Serialize bound shouldn't be an issue, but would make those functions fallible, adding another breaking change 🤔

@mthh
Copy link
Owner

mthh commented Apr 19, 2024

Thanks for your PR and sorry it took me a while to reply.

Indeed, making value type generic and independent from coordinate type is a great thing 👍
(and it was probably a debatable design choice to have linked the numerical type of coordinates and the numerical type of values in the input grid as we saw in issue #12).

Adding a Serialize bound shouldn't be an issue, but would make those functions fallible, adding another breaking change

If this is the only way to keep the geojson feature working, we might as well take advantage of the breaking changes this PR is already making to make this one too, what do you thing ?

a new trait called ContourValue (though I'm not too happy with that name and would love better suggestions)

I don't have any particular problems with ContourValue. From a user point of view, we could also talk about GridValue (since it designates the type of input values) or simply Value (since this library has a single, fairly clear purpose, that of making contours from a grid of values, there's not necessarily too much confusion possible).

@mthh
Copy link
Owner

mthh commented Apr 25, 2024

Hi @netthier!
Is there anything I can do to help finalise this PR?
Would you like me to continue the work you have started?

@netthier
Copy link
Contributor Author

netthier commented Apr 25, 2024

If this is the only way to keep the geojson feature working, we might as well take advantage of the breaking changes this PR is already making to make this one too, what do you thing ?

I'll do so then. Who knows what kind of numbers people might put in there, maybe some can indeed fail serialization.

I don't have any particular problems with ContourValue. From a user point of view, we could also talk about GridValue (since it designates the type of input values) or simply Value (since this library has a single, fairly clear purpose, that of making contours from a grid of values, there's not necessarily too much confusion possible).

I prefer GridValue over ContourValue. Value feels a bit too generic tbh.

I'll try to finalize this PR tommorow, but I'm still unsure about the part of the code I commented on in smooth_linear. I believe that the cast should be infallible in normal usage, as NumCast wlll just return +Inf or -Inf if the input is outside the float's range. But it is conceivable that someone could implement NumCast for their own number type and have it fail for whatever reason.
Maybe it makes sense to create a new error kind and have smooth_linear return a Result? Most other methods seem to already do so, so it wouldn't be that extreme of a change.

@mthh
Copy link
Owner

mthh commented Apr 25, 2024

Who knows what kind of numbers people might put in there, maybe some can indeed fail serialization.

I don't really know either but let's say "better safe than sorry".

I prefer GridValue over ContourValue. Value feels a bit too generic tbh.

Let's go with GridValue 👍

Maybe it makes sense to create a new error kind and have smooth_linear return a Result? Most other methods seem to already do so, so it wouldn't be that extreme of a change.

This seems like a good idea to me, with a new error type to be explicit about the reason for the error if it occurs (and since the smooth_linear caller already returns a Result it won't change the API for the user).

@netthier netthier force-pushed the generic-num branch 3 times, most recently from 0a062c2 to f645f2c Compare April 26, 2024 14:43
@netthier
Copy link
Contributor Author

netthier commented Apr 26, 2024

This could use some tests tbh, but I'm not sure if I have the time and knowledge required to write good ones, let me know how you want to proceed there 🤔
I also found out that you cannot edit this PR because I made it from an organization repo, while edits by maintainers are apparently only possible on personal repos. I'll have to find out how to do it better in the future.

@netthier netthier force-pushed the generic-num branch 6 times, most recently from 744fee0 to 46483ea Compare April 26, 2024 15:00
@netthier
Copy link
Contributor Author

netthier commented Apr 26, 2024

I'm also unsure how my changes to ContourBuilder::smooth_linear affect performance. I added more casts than previously, as I noticed that numbers without negative values might panic at the value - v0 and v1 - v0 parts.

@mthh
Copy link
Owner

mthh commented Apr 26, 2024

Thanks!

This could use some tests tbh, but I'm not sure if I have the time and knowledge required to write good ones

I don't really have an idea of how to make meaningful tests for all this yet, but I'm thinking about it a bit.

I'm also unsure how my changes to ContourBuilder::smooth_linear affect performance. I added more casts than previously, as I noticed that numbers without negative values might panic at the value - v0 and v1 - v0 parts.

I'm going to set up some benchmarks over the weekend to try to measure the difference in performance between 0.13.0 and your PR, and review your PR in the process.

Copy link
Owner

@mthh mthh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the changes!

I did some performance testing: extending the lib's current benchmarks to test with larger datasets, in f64 and f32; and on real use cases.

There does seem to be a slight slowdown (sometimes around 5% extra time) but the variability of timings between runs (both on benchmarks with test::Bencher and used elsewhere in an application) don't seem to allow me to conclude that this slowdown is a problem.

Have you had a chance to test performance on real data sets on your side?

I'm rather inclined to merge it; I think the flexibility that PR provides with regard to the type of input data is a good thing 👍

src/band.rs Outdated Show resolved Hide resolved
src/contour.rs Outdated Show resolved Hide resolved
src/line.rs Outdated Show resolved Hide resolved
src/contourbuilder.rs Outdated Show resolved Hide resolved
@netthier netthier force-pushed the generic-num branch 4 times, most recently from b479461 to aa21a5f Compare April 29, 2024 08:33
@netthier
Copy link
Contributor Author

netthier commented Apr 29, 2024

I guess a slowdown would be expected with a lot of more handling of results now. I'm also seeing differences when running cargo bench locally, with the worst difference I've seen being 10%.
The use case I implemented this for was one that didn't require floating point precision, and in those cases performance is far better than with floats, meaning the overhead added is irrelevant.
One thing I noticed is that if smoothing is disabled, a lot of functions are infallible and returning results for no reason.
It should be possible to optimize this more 🤔
And a cast to a float failing seems like such an esoteric thing (I dont think any std types fail this) that adding all of that overhead just for that seems bad. If users upgrade and want to keep using floats, they'd get a slowdown in exchange for having error handling on a bunch of f32->f32 conversions D:
I wonder if using some infallible conversion trait instead of NumCast or straight up panicking would be better.

EDIT: Unwrapping and not propagating results at all isn't much faster, I wonder where all that overhead is coming from

@netthier
Copy link
Contributor Author

I'll have to profile this, I'm a bit surprised that even the no_smoothing benchmark is slower when I remove all the result propagation 🤔

@mthh
Copy link
Owner

mthh commented Apr 29, 2024

if users upgrade and want to keep using floats, they'd get a slowdown in exchange for having error handling on a bunch of f32->f32 conversions
I wonder if using some infallible conversion trait instead of NumCast or straight up panicking would be better.

Unwrapping and not propagating results at all isn't much faster, I wonder where all that overhead is coming from

Indeed, there's no rush to merge this, let's investigate a little bit the overhead it adds first.

I'll have to profile this, I'm a bit surprised that even the no_smoothing benchmark is slower when I remove all the result propagation 🤔

Great if you get the chance to do this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants