I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import os
from pathlib import Path

import polars as pl

# Run this script from the `py-polars` folder, or otherwise make sure it points to that folder
io_files_path = Path() / "tests" / "unit" / "io" / "files"
df = pl.read_csv(io_files_path / "foods*.csv")

os.environ["POLARS_FORCE_OOC"] = "1"
os.environ["POLARS_STREAMING_GROUPBY_SPILL_SIZE"] = "256"

# this creates 10M rows
q = df.lazy()
q = q.join(q, how="cross").select(df.columns).head(10_000)

# uses out-of-core unique
df1 = q.join(q.head(1000), how="cross").unique().collect(streaming=True)
print(df1)
Log output
run UdfExec
RUN STREAMING PIPELINE
df -> cross_join_sink
RefCell { value: [df -> placeholder -> slice_sink -> fast_projection -> cross_join_sink, df -> cross_join_sink, df -> placeholder -> slice_sink -> fast_projection -> slice_sink -> placeholder -> re-project-sink -> ordered_sink] }
OOC group_by started
Temporary directory path in use: /tmp
Temporary directory path in use: /tmp
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
OOC group_by started
process partition 0 during generic-group_by-source
thread '<unnamed>' panicked at crates/polars-pipe/src/executors/sinks/group_by/generic/mod.rs:97:44:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("invalid series dtype: expected `BinaryOffset`, got `binary`"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/home/stijn/code/polars/py-polars/repro.py", line 18, in <module>
    df1 = q.join(q.head(1000), how="cross").unique().collect(streaming=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stijn/code/polars/py-polars/polars/lazyframe/frame.py", line 1935, in collect
    return wrap_df(ldf.collect())
                   ^^^^^^^^^^^^^
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("invalid series dtype: expected `BinaryOffset`, got `binary`"))
Issue description
There is actually a test for this in our test suite. It runs fine when you run it together with other tests, but fails like the script above when you run it independently:
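To separate the two situations, the script can be run in a fresh interpreter so that no state from earlier tests carries over. The following is only a sketch and not the test from the suite: the helper name `run_isolated` is hypothetical, and it assumes the reproduction script above has been saved as `repro.py` in the current folder.

```python
import os
import subprocess
import sys


def run_isolated(python_args, extra_env=None):
    """Run a Python command in a fresh interpreter (hypothetical helper).

    The child process starts with a copy of the current environment plus
    `extra_env`; no in-process state from earlier code is shared with it.
    """
    env = {**os.environ, **(extra_env or {})}
    return subprocess.run(
        [sys.executable, *python_args],
        env=env,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Assumption: the reproduction script is saved as repro.py.
    # The script sets these variables itself; passing them here only
    # makes the settings explicit at the call site.
    result = run_isolated(
        ["repro.py"],
        {"POLARS_FORCE_OOC": "1", "POLARS_STREAMING_GROUPBY_SPILL_SIZE": "256"},
    )
    print(result.returncode)
    print(result.stderr)
```

A non-zero return code together with the `PanicException` in `stderr` would match the failure shown in the log output.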
Expected behavior
The script should complete without panicking and print the resulting DataFrame.
Installed versions
main