feat: Extrapolate_flat parameter in interpolate_by #18355

agossard · 2024-08-24T23:48:12Z

Adds "extrapolate_flat" as a parameter to interpolate_by. If True, values outside the range spanned by the min/max of the by column are filled with the expression values found at the positions of the min and max by values. This is intended to match the default behavior of numpy.interp.

This parameter defaults to False, in order to preserve existing behavior. In a vacuum, I would consider having this parameter default to True.
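To make the description above concrete, here is a minimal pure-Python sketch of the semantics as I read them. It is not the polars implementation: the function name interpolate_by_sketch is hypothetical, and it assumes the "range" is measured over rows whose value is non-null.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def interpolate_by_sketch(
    values: Sequence[Optional[float]],
    by: Sequence[float],
    extrapolate_flat: bool = False,
) -> List[Optional[float]]:
    """Illustration only (hypothetical helper, not the polars code)."""
    # Known points ordered by `by`; the input rows need not be sorted.
    known = sorted((b, v) for b, v in zip(by, values) if v is not None)
    if not known:
        return list(values)
    xs = [b for b, _ in known]
    ys = [v for _, v in known]

    out: List[Optional[float]] = []
    for b, v in zip(by, values):
        if v is not None:
            out.append(v)  # existing values are kept as they are
        elif b < xs[0]:
            # Below the known range: null, or the value at the min `by`.
            out.append(ys[0] if extrapolate_flat else None)
        elif b > xs[-1]:
            # Above the known range: null, or the value at the max `by`.
            out.append(ys[-1] if extrapolate_flat else None)
        else:
            i = bisect_left(xs, b)
            if xs[i] == b:
                out.append(ys[i])
            else:
                x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
                out.append(y0 + (y1 - y0) * (b - x0) / (x1 - x0))
    return out


by = [0.0, 1.0, 2.0, 3.0, 4.0]
values = [None, 10.0, None, 30.0, None]

default_result = interpolate_by_sketch(values, by)
flat_result = interpolate_by_sketch(values, by, extrapolate_flat=True)
print(default_result)  # [None, 10.0, 20.0, 30.0, None]
print(flat_result)     # [10.0, 10.0, 20.0, 30.0, 30.0]
```

With the flag off, the rows outside the known range stay null (the existing behavior); with it on, they take the boundary values.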

codecov · 2024-08-25T00:22:51Z

Codecov Report

Attention: Patch coverage is 97.66082% with 4 lines in your changes missing coverage. Please review.

Project coverage is 79.83%. Comparing base (dd1fc86) to head (3a89261).

Files	Patch %	Lines
crates/polars-plan/src/dsl/function_expr/mod.rs	50.00%	2 Missing ⚠️
.../polars-python/src/lazyframe/visitor/expr_nodes.rs	0.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #18355      +/-   ##
==========================================
- Coverage   79.84%   79.83%   -0.01%     
==========================================
  Files        1496     1496              
  Lines      200333   200456     +123     
  Branches     2841     2841              
==========================================
+ Hits       159952   160044      +92     
- Misses      39856    39887      +31     
  Partials      525      525


MarcoGorelli · 2024-08-25T08:36:43Z

thanks @agossard ! will take a look

MarcoGorelli

Really nice, thanks @agossard !

I've just left some minor comments

Is the name extrapolate_flat taken from anywhere?

MarcoGorelli · 2024-08-28T12:11:27Z

py-polars/tests/unit/operations/test_interpolate_by.py

-    assert_frame_equal(result, expected)
+    # assert_frame_equal(result, expected)


why is this commented out?

Probably by mistake. Good catch. I'll do the other clean ups (though what do you mean by missing full stop?)

Thanks. And let me know what you want to do with the name of the parameter.

MarcoGorelli · 2024-08-28T12:12:09Z

py-polars/polars/series/series.py

+        extrapolate_flat
+            If True, extrapolate the highest and lowest values of the expression in
+            the regions below and above the highest/lowest by values


could we include an example which uses extrapolate_flat? users tend to learn better from examples than from descriptions

MarcoGorelli · 2024-08-28T12:12:35Z

py-polars/polars/series/series.py

+        extrapolate_flat
+            If True, extrapolate the highest and lowest values of the expression in
+            the regions below and above the highest/lowest by values


missing full stop

MarcoGorelli · 2024-08-28T12:12:40Z

py-polars/polars/expr/expr.py

+        extrapolate_flat
+            If True, extrapolate the highest and lowest values of the expression in
+            the regions below and above the highest/lowest by values


MarcoGorelli · 2024-08-28T12:13:17Z

crates/polars-plan/src/dsl/function_expr/mod.rs

+                map_as_slice!(dispatch::interpolate_by, extrapolate_flat)
+                //map_as_slice!(dispatch::interpolate_by, extrapolate_flat)


commented out

agossard · 2024-08-28T20:16:57Z

> Really nice, thanks @agossard !
>
> I've just left some minor comments
>
> Is the name extrapolate_flat taken from anywhere?

I'm not at all wedded to it. And no, it didn't come from anywhere. I feel like "extrapolate" could be fine and maybe better. I just added the "flat" to try to be extra pedantic about how it works... i.e. it's not like it's going to extrapolate based on the relationship between the two series or anything... or the slope of the last two points or something... it's just going to be flat (in the y-axis)

MarcoGorelli · 2024-08-28T21:38:14Z

ok thanks

going to think about this more and look more what other tools do (because a counterargument to adding this could be: just do .interpolate_by(...).forward_fill().backward_fill())

agossard · 2024-08-29T01:16:58Z

> ok thanks
>
> going to think about this more and look more what other tools do (because a counterargument to adding this could be: just do .interpolate_by(...).forward_fill().backward_fill())

Doesn’t that rely on the by column being sorted?

The PR behavior (with True) is the default behavior of np.interp. https://numpy.org/doc/stable/reference/generated/numpy.interp.html

They allow you to specify the values to use above and below the range, but default is to use the max and min.
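For reference, a small check of the numpy behavior described above. It uses only numpy.interp and its documented left and right arguments; the sample arrays are made up for illustration.

```python
import numpy as np

# Known points (xp must be increasing for np.interp).
xp = np.array([1.0, 2.0, 3.0])
fp = np.array([10.0, 20.0, 30.0])

# Query points: one below the range, one inside, one above.
x = np.array([0.0, 1.5, 4.0])

# Default: points outside [xp[0], xp[-1]] take fp[0] and fp[-1].
default_fill = np.interp(x, xp, fp)

# Explicit fill values for the regions below and above the range.
custom_fill = np.interp(x, xp, fp, left=-1.0, right=99.0)

print(default_fill.tolist())  # [10.0, 15.0, 30.0]
print(custom_fill.tolist())   # [-1.0, 15.0, 99.0]
```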

agossard · 2024-08-29T01:33:15Z

Ok, did a quick survey of a few more libraries. Some of them (Matlab, scipy) provide all sorts of fancy fitting functionality (cubic splines, etc, etc). If you’re doing that, then it is possible to “extrapolate” with this actual functional form. These libraries call that operation “extrapolate” and refer to the thing I (and numpy) am doing as something more like “fill value.” Based on the fact that we are implementing a simple linear interpolation process, and not offering curve fitting, I continue to think that providing this option and matching the numpy functionality makes sense. However, I don’t necessarily think the parameter should involve the word “extrapolate” anymore.

MarcoGorelli · 2024-08-29T07:29:02Z

> Doesn’t that rely on the by column being sorted?

ah you're right thanks, I'd forgotten that interpolate_by already doesn't assume sortedness
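A toy illustration of the sortedness point, in plain Python rather than polars. The forward_fill and backward_fill helpers are hypothetical stand-ins that fill in row order, and the values column is assumed to be what an interpolation without extrapolation would leave behind (nulls outside the known range).

```python
from typing import List, Optional


def forward_fill(col: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each null with the last non-null value seen in row order."""
    out: List[Optional[float]] = []
    last: Optional[float] = None
    for v in col:
        if v is not None:
            last = v
        out.append(last)
    return out


def backward_fill(col: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each null with the next non-null value in row order."""
    return forward_fill(col[::-1])[::-1]


# Unsorted `by`; rows with by=9 and by=0 lie outside the known range [1, 3].
by = [2.0, 9.0, 3.0, 0.0, 1.0]
values = [20.0, None, 30.0, None, 10.0]

# Row-order fills pick up whatever value happens to be adjacent.
filled = backward_fill(forward_fill(values))

# Flat fill keyed on `by`: value at min `by` below, value at max `by` above.
known = [(b, v) for b, v in zip(by, values) if v is not None]
lo_by, lo_val = min(known)
hi_by, hi_val = max(known)
flat = [
    v if v is not None else (lo_val if b < lo_by else hi_val)
    for b, v in zip(by, values)
]

print(filled)  # [20.0, 20.0, 30.0, 30.0, 10.0]
print(flat)    # [20.0, 30.0, 30.0, 10.0, 10.0]
```

The two results agree only when the rows are already ordered by the by column.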

> However, I don’t necessarily think the parameter should involve the word “extrapolate” anymore.

ok let's bikeshed on the name a bit longer then 😄

agossard · 2024-09-01T13:10:46Z

One thing to consider is how strongly we feel about maintaining the existing behavior, even as an option. If we want to make a breaking change, we can simply match the interface of numpy interp exactly. In other words, you can pass “left” and “right” (not that I love THOSE parameter names) and it will use those as the fill values below and above the range. If they are null, it will use fp[0] and fp[-1]. So there is no way to end up with your existing behavior in that scenario.

I would lean this way personally. When I reach for interpolation, it’s generally because I have some null values and I want to end up with no null values. That can be accomplished with different levels of complexity and fanciness… but the goal is always “get rid of the missing values” so the existing polars behavior means I’m not done. If I don’t like linear interpolation with values above the range filled in with a scalar… well… then maybe I should reach for a more complicated science package outside of polars. “Simple linear interpolation in polars in the range, and then something custom above/below the range” doesn’t seem like all that common of a workflow.

agossard added 6 commits July 30, 2024 21:29

Support float values in interpolate_by and implement extrapolate_flat

e8d0bf0

Remove debugging lines

ff24f86

Merge branch 'main' into feature/interpolate_by_enhancements

f3b3f52

Compiling at least

8fadf67

Fix some problems and line cleanup

eb7bf0f

Add hypothesis test and final lint

e673863

agossard requested review from ritchie46, c-peters, alexander-beedie, MarcoGorelli, reswqa and orlp as code owners August 24, 2024 23:48

github-actions bot added the title needs formatting label Aug 24, 2024

agossard changed the title ~~Feature/interpolate by enhancements~~ feat: Extrapolate_flat parameter in interpolate_by Aug 24, 2024

github-actions bot added the enhancement, python and rust labels and removed the title needs formatting label Aug 24, 2024

Last formatting error (hopefully!)

3a89261

MarcoGorelli reviewed Aug 28, 2024
