df.max()/min() on 1 column df leads to "could not broadcast input array from shape (6,) into shape (5,)" when loaded from parquet with ray #7367

Closed
Liquidmasl opened this issue Aug 12, 2024 · 1 comment
Labels
question ❓ Questions about Modin Triage 🩹 Issues that need triage

Comments

@Liquidmasl

I am still trying to reliably save to and load from parquet, but I keep running into new problems.

It seems most of my problems are Windows related, as on Linux the experience is a lot less painful.

While attempting to get manageable .parquet chunks, I used a partition column.
But modin.read_parquet does not support partition columns and defaults to pandas, which blows up my RAM.
ray.from_parquet works, though, and with .to_modin() I get a Modin dataframe again that looks fine.

But when I do

df['z'].max()
I get

could not broadcast input array from shape (6,) into shape (5,)

Those two integers are not always the same, though. They depend on the file I load and, I think, on the partitions that were set.
But I can't seem to figure out what they mean.
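
For reference, the wording of this error comes from NumPy: it is raised when an array of one length is assigned into a buffer of a different length. The sketch below reproduces the message with plain NumPy only. It does not show where Modin or Ray triggers it; the helper names `fill_buffer` and `try_fill` are hypothetical, and any link to partition metadata disagreeing with the actual partition length is an assumption, not something confirmed in this issue.

```python
import numpy as np


def fill_buffer(expected_len, values):
    """Copy `values` into a preallocated 1-D buffer of length `expected_len`."""
    out = np.empty(expected_len, dtype=float)
    # Slice assignment needs matching shapes (or a length-1 source that can
    # be broadcast); any other length raises ValueError.
    out[:] = values
    return out


def try_fill(expected_len, actual_len):
    """Return the error text if the copy fails, or 'ok' if it succeeds."""
    try:
        fill_buffer(expected_len, np.arange(actual_len, dtype=float))
    except ValueError as exc:
        return str(exc)
    return "ok"


# Hypothetical numbers taken from the error text in this issue:
# 6 values arrive, but the destination was sized for 5.
message = try_fill(expected_len=5, actual_len=6)
print(message)
```

In the NumPy message, the first shape is the incoming data and the second is the destination it was supposed to fit into, so `(6,)` into `(5,)` means one more value arrived than the receiving side expected.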

I tried repartitioning, but that did not help.

Any hints as to what's going on here?

To make my code run on Windows as well, it would be great if I could use this workaround. On Linux, it seems Modin's parquet load and save methods work a lot better.

@Liquidmasl Liquidmasl added question ❓ Questions about Modin Triage 🩹 Issues that need triage labels Aug 12, 2024
@Liquidmasl
Author

Probably the same issue as #7383.
