[Data] map_batches fails on multiple calls on data with nested lists #39559
Comments
@raulchen Can you please take a look?
Confirmed it's a bug. We should fix it in the next release.
Thanks! Is there a timeline for the next release that we should be aware of?
@keerthanvasist @raulchen would this be a valid output?
I would say it's not. I will take a look at the CR to try and appreciate the engineering constraints though.
@keerthanvasist forgive my noobness, but what if all the nicknames are typed as follows:
I am also new to Ray. I think this would be okay, but you have to check with someone who has better context on what the contracts are for different block types. Thanks for working on this!
Fixed by #45287
What happened + What you expected to happen
I have data that contains nested lists. A simple reproducible example is shown below. It is an extension of the example on the Ray documentation page for `map_batches`.
Let us consider this example.
When I apply this identity function to it:
Now, `nicknames` has different types across rows. Somehow the type check does not fail. This is already problematic, but it gets worse.
When I run two successive map functions on it, even without performing any mutations (the `identity` function below), it throws an exception.
The expected behavior is that the dataframe is unchanged across any number of `identity` function applications.
Versions / Dependencies
Ray version: 2.6.3
Python: 3.10.12
Mac OS
Reproduction script
Simplest reproduction script:
Issue Severity
High: It blocks me from completing my task.
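The expectation described under "What happened" can be sketched in plain Python, without Ray. This is only an illustration of the invariant, not the reporter's script: the rows, the dict-of-lists batch layout, and the helpers `nesting_depth`, `column_depths`, and `apply_repeatedly` are all hypothetical names introduced for this example.

```python
from typing import Any, Callable, Dict, List, Set

# Assumed batch layout for this sketch: column name -> list of row values.
Batch = Dict[str, List[Any]]


def identity(batch: Batch) -> Batch:
    """Return the batch untouched, like the identity function in the report."""
    return batch


def nesting_depth(value: Any) -> int:
    """Count how many list levels wrap the innermost elements (0 for a scalar)."""
    if not isinstance(value, list):
        return 0
    if not value:
        return 1
    return 1 + max(nesting_depth(item) for item in value)


def column_depths(batch: Batch, column: str) -> Set[int]:
    """Collect the distinct nesting depths seen across the rows of one column."""
    return {nesting_depth(value) for value in batch[column]}


def apply_repeatedly(fn: Callable[[Batch], Batch], batch: Batch, times: int) -> Batch:
    """Apply fn to the batch the given number of times in sequence."""
    for _ in range(times):
        batch = fn(batch)
    return batch


# Hypothetical rows with a nested-list column (not the data from the report).
batch: Batch = {
    "name": ["Luna", "Rory"],
    "nicknames": [[["Lulu", "Lu"], ["Moon"]], [["Ro"]]],
}

result = apply_repeatedly(identity, batch, times=3)

# Expected invariants: the data is unchanged, and every row of the nested
# column keeps the same nesting depth after any number of identity passes.
unchanged = result == batch
depths = column_depths(result, "nicknames")
```

If a pipeline behaves as the report expects, `unchanged` stays true and `depths` contains a single value no matter how many identity passes are applied.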