Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of LitPop #720

Merged
merged 10 commits into from
May 24, 2023
Merged

Improve performance of LitPop #720

merged 10 commits into from
May 24, 2023

Conversation

peanutfun
Copy link
Member

@peanutfun peanutfun commented May 12, 2023

Changes proposed in this PR:

  • Speed up country-code detection in LitPop by assuming the data is gridded (it is)
  • Streamline gridpoints_core_calc to use numpy array tools and broadcasting, and avoid copying arrays
  • Add option to pass region ID to LitPop.from_shape. This avoids country-code detection.

PR Author Checklist

PR Reviewer Checklist

* Assume data from LitPop is gridded when determining region IDs.
* Add option to pass 'region_id' parameter to LitPop.from_shape.
* Avoid copying data by using np.asarray.
* Streamline computations by relying on numpy broadcasting.
@peanutfun peanutfun marked this pull request as ready for review May 12, 2023 11:22
@peanutfun peanutfun requested a review from chahank May 12, 2023 12:16
@chahank
Copy link
Member

chahank commented May 15, 2023

Looks all very reasonable! Any estimates of the gained computation time? Also, changelog needs to be updated. Maybe also the documentation to point people towards how to use custom shapes correctly.

@peanutfun
Copy link
Member Author

Any estimates of the gained computation time?

I was afraid you would ask that 😬 I can come up with some benchmarks tomorrow. The most important improvement is from using get_country_code(gridded=True). The actual LitPop computations are very fast, and avoiding copies does not actually help much with that. In my use case, LitPop was occupied for some 50s, 90% of which were spent with get_country_code. The other large bottleneck I found was geopandas.points_from_xy, but that resolved itself after upgrading from shapely v1.8 to v2.0.

So, tldr: from_shape can be by a factor 10 faster with this PR. But I'll run a benchmark just to be sure.

@peanutfun
Copy link
Member Author

Maybe also the documentation to point people towards how to use custom shapes correctly.

Not sure what you mean by that. How are people using it "incorrectly" in your perspective?

@chahank
Copy link
Member

chahank commented May 15, 2023

Not sure what you mean by that. How are people using it "incorrectly" in your perspective?

Sorry for being unclear. I was referring to the original issue which was about how to use arbitrary shapes with LitPop. This functionality was clearly not obvious, so a documentation entry could help.

@peanutfun
Copy link
Member Author

Ah, sorry, you were referring to #715. I'll see what I can do.

@peanutfun
Copy link
Member Author

So here are some benchmark results.

LitPop.from_countries

(basically no change)

on develop
Screenshot 2023-05-16 at 12 30 12

on this branch
Screenshot 2023-05-16 at 12 27 38

LitPop.from_shape

(time spent in get_country_code dropped from 48% to 13 %)

on develop
Screenshot 2023-05-16 at 12 29 52

on this branch

Screenshot 2023-05-16 at 12 27 07

* Rename return value in local function.
* Remove named argument call in `LitPop.from_shape`.
* Improve `region_id` parameter documentation.
@peanutfun
Copy link
Member Author

Sorry for being unclear. I was referring to the original issue which was about how to use arbitrary shapes with LitPop. This functionality was clearly not obvious, so a documentation entry could help.

I moved the Issue to Discussions/Q&A and marked my workaround as answer, see #723

@chahank
Copy link
Member

chahank commented May 16, 2023

So here are some benchmark results.

Small improvements, but certainly useful! In particular since from_countries is likely not to be used too often as country data is available on the API, but the from_shape is more important for tailored applications.

@@ -992,47 +1017,49 @@ def gridpoints_core_calc(data_arrays, offsets=None, exponents=None,

Returns
-------
result_array : np.array of same shape as arrays in data_arrays
result : np.array of same shape as arrays in data_arrays
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really nitpicking, but result is a terrible variable name ^^.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but result_array was even worse. Do you have something better? The name can also be omitted here:

Suggested change
result : np.array of same shape as arrays in data_arrays
np.array of same shape as arrays in data_arrays

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the variable name still would be there. Maybe convoluted_array since this is essentially doing a convolution? Or rescaled_product ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about leaving out the result variable and storing everything in the data variable? I think that the new variable definition actually is not needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fine. Although data is not a better name than result.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I'll do that. My goal is not to refactor this function 😅

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 620d346

@peanutfun
Copy link
Member Author

Small improvements, but certainly useful!

Yes, I think I mixed something up. The major improvement was upgrading to shapely v2.0 on my side.

@peanutfun
Copy link
Member Author

@chahank Ready to merge from my side!

@chahank chahank merged commit f95b267 into develop May 24, 2023
@emanuel-schmid emanuel-schmid deleted the faster-litpop branch May 25, 2023 08:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants