[Datasets] [Pandas Block] Implement PandasBlockAccessor in pandas-native ways #21296

Closed
2 tasks done
kfstorm opened this issue Dec 30, 2021 · 3 comments · Fixed by #26313
Labels
data Ray Data-related issues enhancement Request for new feature and/or capability P1 Issue that should be fixed within a few weeks

@kfstorm
Member

kfstorm commented Dec 30, 2021

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

#20988 introduced support for a pandas block format in Ray Datasets. However, some methods of PandasBlockAccessor are implemented by converting the block to Arrow format and back, which may be slower than a pandas-native implementation because of the conversion overhead. We need to re-implement these methods.

Interfaces to be implemented:

  • sort_and_partition
  • combine
  • merge_sorted_blocks
  • aggregate_combined_blocks
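For illustration, here is a minimal sketch of what "pandas-native" could mean for the four methods listed above: plain pandas operations (`sort_values`, `groupby`, `concat`) with no conversion to Arrow and back. The function names mirror the interface, but the signatures are simplified and hypothetical (a single ascending sort key, and a fixed sum/count/mean aggregation); Ray's actual PandasBlockAccessor methods take different arguments.

```python
from typing import List

import numpy as np
import pandas as pd


def sort_and_partition(block: pd.DataFrame, key: str, boundaries: List) -> List[pd.DataFrame]:
    """Sort one block by `key` and split it at ascending boundary values.

    Hypothetical simplified signature: one sort key, ascending order only.
    Returns len(boundaries) + 1 partitions; partition i holds rows with
    boundaries[i-1] <= key < boundaries[i].
    """
    # Stable sort directly on the DataFrame, with no Arrow round trip.
    sorted_block = block.sort_values(by=key, kind="mergesort", ignore_index=True)
    # For each boundary, the index of the first row whose key is >= boundary.
    cuts = np.searchsorted(sorted_block[key].to_numpy(), boundaries, side="left").tolist()
    starts = [0] + cuts
    ends = cuts + [len(sorted_block)]
    return [sorted_block.iloc[s:e] for s, e in zip(starts, ends)]


def merge_sorted_blocks(blocks: List[pd.DataFrame], key: str) -> pd.DataFrame:
    """Merge already-sorted blocks into one sorted block.

    Simple approach: concatenate, then stable-sort again. A k-way merge
    could avoid the full re-sort; this sketch does not attempt that.
    """
    if not blocks:
        return pd.DataFrame()
    merged = pd.concat(blocks, ignore_index=True)
    return merged.sort_values(by=key, kind="mergesort", ignore_index=True)


def combine(block: pd.DataFrame, key: str, value: str) -> pd.DataFrame:
    """Partial aggregation of one block: per-group sum and count of `value`.

    Hypothetical fixed aggregation (a mean); the real methods are generic.
    """
    return block.groupby(key, sort=True)[value].agg(["sum", "count"]).reset_index()


def aggregate_combined_blocks(partials: List[pd.DataFrame], key: str) -> pd.DataFrame:
    """Merge partial (sum, count) results from several blocks and finalize the mean."""
    merged = pd.concat(partials, ignore_index=True)
    totals = merged.groupby(key, sort=True)[["sum", "count"]].sum().reset_index()
    totals["mean"] = totals["sum"] / totals["count"]
    return totals
```

For example, partitioning a block with keys [3, 1, 2, 5, 4] at boundaries [2, 4] yields partitions with keys [1], [2, 3], and [4, 5]. Splitting the aggregation into `combine` and `aggregate_combined_blocks` lets each block emit small partial results (sum and count) that can be merged after the shuffle, with the mean computed only at the end.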

Use case

No response

Related issues

#20719

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@kfstorm kfstorm added enhancement Request for new feature and/or capability P2 Important issue, but not time-critical labels Dec 30, 2021
@kfstorm kfstorm self-assigned this Dec 30, 2021
@kfstorm kfstorm changed the title [Feature] Implement PandasBlockAccessor in pandas-native ways [Dataset] [DataFrame] Implement PandasBlockAccessor in pandas-native ways Dec 30, 2021
@clarkzinzow clarkzinzow added this to the Datasets GA milestone Jan 27, 2022
@clarkzinzow clarkzinzow added the data Ray Data-related issues label Apr 18, 2022
@clarkzinzow clarkzinzow changed the title [Dataset] [DataFrame] Implement PandasBlockAccessor in pandas-native ways [Datasets] [Pandas Block] Implement PandasBlockAccessor in pandas-native ways May 2, 2022
@clarkzinzow
Contributor

@kfstorm Any chance that you might have bandwidth to take this on in the near future?

@clarkzinzow clarkzinzow assigned clarkzinzow and unassigned kfstorm Jun 27, 2022
@clarkzinzow clarkzinzow added P1 Issue that should be fixed within a few weeks and removed P2 Important issue, but not time-critical labels Jun 27, 2022
@kfstorm
Member Author

kfstorm commented Jul 11, 2022

@clarkzinzow Sorry for the late reply. Unfortunately, I don't have the bandwidth to work on this.

@clarkzinzow
Contributor

Hey @kfstorm, no worries, I opened a PR for this last week! #26313
