Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Datasets] [Hotfix] Add ray_remote_args to read_text. #23764

Merged

Conversation

clarkzinzow
Copy link
Contributor

Adds ray_remote_args to ray.data.read_text(). This exists on all other ray.data.read_* APIs, so it appears that read_text() was missed.

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@clarkzinzow clarkzinzow self-assigned this May 9, 2022
@stale
Copy link

stale bot commented Jun 10, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@stale stale bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jun 10, 2022
@clarkzinzow
Copy link
Contributor Author

Will be coming back to this in the next week or so.

@stale stale bot removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Jun 10, 2022
@clarkzinzow clarkzinzow force-pushed the datasets/fix/read-text-remote-args branch from fc9ae40 to bb0bbac Compare June 17, 2022 00:59
Kai Fricke added 3 commits June 17, 2022 10:05
@krfricke
Copy link
Contributor

krfricke commented Jun 17, 2022

I've taken a look.

  • The test is stochastic, as sometimes the remote tasks are scheduled only on the bar node (as expected)
  • There were two sources of errors
    1. We didn't pass the remote args to flat_map(), which were sometimes scheduled on the foo node
    2. ds.take() also schedules remote tasks, which were sometimes scheduled on the foo node
  • To resolve this, we're now passing remote args to flat_map()
  • I don't think we need to solve ds.take() - but we need to move that assertion to the bottom of the test.

This all passes locally for me every time now.

See these changes: 7b4ee95

@krfricke krfricke added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Jun 17, 2022
Copy link
Contributor

@krfricke krfricke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit of a self-accept but still - cc @clarkzinzow to merge

Copy link
Contributor Author

@clarkzinzow clarkzinzow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krfricke Thanks a million for finding the blocker bug! Definitely a 🤦 moment for me. 😁

I think this is good to merge, with a P2 follow-up around adding a TextDatasource to get rid of this two-stage read.

python/ray/data/read_api.py Outdated Show resolved Hide resolved
python/ray/data/read_api.py Outdated Show resolved Hide resolved
@richardliaw richardliaw merged commit c2ab73f into ray-project:master Jun 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests-ok The tagger certifies test failures are unrelated and assumes personal liability.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants