-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[data] fix nested ragged ndarray #44236
[data] fix nested ragged ndarray #44236
Conversation
Signed-off-by: Hao Chen <[email protected]>
return True | ||
if np.isscalar(udf_return_col[0]): | ||
return True | ||
return is_scalar_list(udf_return_col[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's meant by "scalar list" here? I expect it to mean a list[float]
like [1, 2, 3]
and not a nested arrays like [[1], [2]]
, but with this implementation nested lists would be considered scalar lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we only need to check udf_return_col[0]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bveeramani @c21 I've updated the PR with a new way to fix. I.e, do not convert each individual element, as long as the whole list is a nested list.
Signed-off-by: Hao Chen <[email protected]>
Signed-off-by: Hao Chen <[email protected]>
Signed-off-by: Hao Chen <[email protected]>
Signed-off-by: Hao Chen <[email protected]>
Signed-off-by: Hao Chen <[email protected]>
# scalar lists though, since those can be represented as pyarrow list type | ||
# without needing to go through our tensor extension. | ||
if all( | ||
is_valid_udf_return(e) and not is_scalar_list(e) for e in udf_return_col |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't think is_scalar_list
is used anywhere other than here. Let's remove the function?
Signed-off-by: Hao Chen <[email protected]>
--------- Signed-off-by: Hao Chen <[email protected]>
Why are these changes needed?
Currently we support single level of ragged ndarray, i.e, shape of each row is different. But "nested ragged ndarray" isn't supported. I.e, when a row already contains a ragged ndarray, the following error will occur.
Repro:
Error:
Related issue number
Fixes #44235, #41078, #44062
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.