Look into contributing / integrating with polars? #19
I'd be interested in both points since it seems not possible to use |
Hi @jmakov, I'll touch upon both points here. But first some questions.
Below I'll share a bit more about the underlying Rust code and how I envision this integration.
|
Thanks for the quick response. I'm using Another question: in my code I'm looking for the first occurrence (index) where a value is greater than given value (in the same column). And I wanted to call argminmax from |
Indeed! I haven't created any proper numpy (or Python) bindings for One (somewhat hacky) solution is to use Here is a reusable Python implementation ⬇️
import numpy as np
from tsdownsample import MinMaxDownsampler

downsampler = MinMaxDownsampler()

def argminmax(arr: np.ndarray) -> tuple[int, int]:
    """Returns (min_index, max_index) for the given array."""
    # Call argminmax exactly once on the data (by downsampling it to 2 datapoints)
    (idx1, idx2) = downsampler.downsample(arr, n_out=2)
    # Return the (argmin_index, argmax_index)
    if arr[idx1] < arr[idx2]:
        # idx1 is the argmin_index
        return idx1, idx2
    else:
        # idx1 is the argmax_index
        return idx2, idx1
But if you will be working with numpy arrays - you might as well just call |
Was already calling |
Can you share some (minimal) Python code of how you use => If polars calls
|
I'm using
$ test.py
import numpy
import polars

numpy.random.seed(42)
df_size = 10**5
df = polars.DataFrame({"idx": range(0, df_size), "col1": numpy.random.randint(1, 1_000, df_size)})

def get_idx_where_first_value_greater(df):
    for row in df.iter_rows():
        idx = row[0]
        (df[idx, 1] < df[idx:, 1]).arg_max()

get_idx_where_first_value_greater(df)
You can run it like this to prevent multithreading: |
I monitored the runtime of the See my experiment below ⬇️
import polars
import numpy as np
import timeit

np.random.seed(42)
df_size = 10**4  # set this to 10x lower (otherwise the benchmark takes way too long)
df = polars.DataFrame({"idx": range(0, df_size), "col1": np.random.randint(1, 1_000, df_size)})

# Only create a bool Series
def get_idx_where_first_value_greater_cmp(df):
    for row in df.iter_rows():
        idx = row[0]
        (df[idx, 1] < df[idx:, 1])  # .arg_max()

# Original code
def get_idx_where_first_value_greater_arg_max(df):
    for row in df.iter_rows():
        idx = row[0]
        (df[idx, 1] < df[idx:, 1]).arg_max()

print("polars diff")
res = timeit.timeit('get_idx_where_first_value_greater_cmp(df)', globals=globals(), number=7)
print(res, "s")
print("polars diff argmax")
res = timeit.timeit('get_idx_where_first_value_greater_arg_max(df)', globals=globals(), number=7)
print(res, "s")
Can you confirm this? |
In the issue I referenced somebody ran |
Ran
Line #      Hits           Time   Per Hit   % Time  Line Contents
==============================================================
    10                                              def get_idx_where_first_value_greater(df):
    11    100000    114806212.0    1148.1      0.4      for row in df.iter_rows():
    12    100000     48883868.0     488.8      0.2          idx = row[0]
    13    100000  25935368268.0  259353.7     98.4          d = (df[idx, 1] < df[idx:, 1])
    14    100000    265593691.0    2655.9      1.0          d.arg_max() |
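The profile above attributes about 98% of the runtime to building the comparison Series for every row, so one option might be to avoid the per-row slice altogether. Below is a minimal pure-Python sketch, not code from this thread or from polars. It assumes the goal is, for each position, the offset to the first later value that is strictly greater, and it uses 0 for "no such value" (a convention chosen here). `next_greater_offsets` is a hypothetical name.

```python
def next_greater_offsets(values):
    """For each position i, return the offset to the first later value that is
    strictly greater than values[i], or 0 if no later value is greater."""
    offsets = [0] * len(values)
    stack = []  # indices still waiting for a strictly greater value
    for i, v in enumerate(values):
        # Every waiting index with a smaller value has found its answer at i.
        while stack and values[stack[-1]] < v:
            j = stack.pop()
            offsets[j] = i - j
        stack.append(i)
    return offsets


print(next_greater_offsets([3, 1, 4, 1, 5, 9, 2, 6]))
```

Each index is pushed and popped at most once, so the whole pass is linear in the column length instead of comparing a full slice per row. The input could be obtained from a column with something like `df["col1"].to_list()`.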
Thanks! I think that solves the issue with looking for faster |
Not really sure about this, but I think using arg_where might suit your problem better than using argmax. Was fun toying around with some polars code! 👌 |
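To illustrate why a "where"-style lookup can fit this problem better than argmax, here is a small sketch that uses NumPy rather than polars (an assumption made only to keep the example self-contained). `first_index_greater` is a hypothetical helper name. With argmax on a boolean mask, an all-False mask yields 0, which cannot be told apart from a real match at position 0. Collecting the matching indices makes the no-match case explicit.

```python
import numpy as np


def first_index_greater(arr, threshold):
    """Return the first index where arr > threshold, or None if there is none."""
    hits = np.flatnonzero(arr > threshold)  # indices of all True entries
    return int(hits[0]) if hits.size else None


arr = np.array([3, 1, 4, 1, 5])

# argmax on an all-False mask returns 0, which looks like a match at index 0.
ambiguous = int((arr > 9).argmax())

print(first_index_greater(arr, 3), first_index_greater(arr, 9), ambiguous)
```

For large arrays where only the first match matters, the argmax form may still be attractive because it returns a single index, as long as the result is checked against the mask afterwards.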
Didn't know that existed, will check it out, thanks! |
Closing as pola-rs/polars#8074 brings |
It would be great to support this (meaning this package's) high level functionality with arrow arrays and/or polars dataframes/series directly too. |
Integration / operability with polars can be realized in two manners:
This Issue is, of course, open for discussion about how users / contributors see this
On another note, perhaps argminmax (its runtime speed / algorithm) can be contributed to various parts of polars?