[docs] Document using a different separator for read_csv #27850

Merged: 10 commits merged on Sep 5, 2022

@pcmoritz (Contributor) commented Aug 13, 2022

Why are these changes needed?

See discussion in #27738

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

>>> # because by default read_csv only reads .csv files.
>>> from pyarrow import csv
>>> parse_options = csv.ParseOptions(delimiter="\\t")
>>> ray.data.read_csv( # doctest: +SKIP
@pcmoritz (Contributor Author):

This needs to be skipped for now, which is very sad; it's tracked in #27853.
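The `# doctest: +SKIP` directive is standard Python `doctest`: the example still appears in the docstring (and in the rendered docs), but the runner never executes it. A minimal standard-library sketch; the function `documented` is hypothetical and only exists to hold two examples, one of which is skipped:

```python
import doctest


def documented():
    """
    >>> 1 + 1
    2
    >>> this_name_does_not_exist  # doctest: +SKIP
    'never evaluated, so no NameError is raised'
    """


def run_examples(func):
    # Collect the docstring examples of `func` and run them.
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func):
        runner.run(test)
    # TestResults named tuple with `failed` and `attempted` counts.
    return runner.summarize(verbose=False)


results = run_examples(documented)
# Only the first example is attempted; the skipped one is not counted.
print(results.failed, results.attempted)  # 0 1
```

The trade-off is the one described in the comment above: a skipped example keeps the docs readable but is no longer checked by the test run.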

Signed-off-by: Philipp Moritz <[email protected]>
@matthewdeng (Contributor) left a comment

@c21 take a look?

python/ray/data/read_api.py (outdated, resolved)
@c21 (Contributor) left a comment

Thanks @pcmoritz! Looks good apart from a few minor comments.

>>> # Read files that use a different delimiter. The partition_filter=None is needed here
>>> # because by default read_csv only reads .csv files.
>>> from pyarrow import csv
>>> parse_options = csv.ParseOptions(delimiter="\\t")
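The comment in this snippet says `partition_filter=None` is needed because `read_csv` only picks up `.csv` files by default. The mechanism can be illustrated with a toy extension filter. This is a hypothetical sketch in plain Python, not Ray's implementation: `extension_filter`, `select_paths`, and the example paths are made up for illustration.

```python
from typing import Callable, List, Optional

PathFilter = Callable[[List[str]], List[str]]


def extension_filter(extension: str) -> PathFilter:
    # Hypothetical helper: keep only paths ending in ".<extension>" (case-insensitive).
    suffix = "." + extension.lower()

    def _filter(paths: List[str]) -> List[str]:
        return [p for p in paths if p.lower().endswith(suffix)]

    return _filter


def select_paths(
    paths: List[str],
    partition_filter: Optional[PathFilter] = extension_filter("csv"),
) -> List[str]:
    # Hypothetical reader front end: by default only .csv files survive,
    # while passing None disables filtering entirely.
    if partition_filter is None:
        return list(paths)
    return partition_filter(paths)


paths = ["data/sample.csv", "data/sample.tsv"]
print(select_paths(paths))                         # ['data/sample.csv']
print(select_paths(paths, partition_filter=None))  # ['data/sample.csv', 'data/sample.tsv']
```

Under this sketch, a `.tsv` file is silently dropped by the default filter, which matches the behavior the snippet's comment warns about.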
Contributor:

It should be delimiter="\t", right? I tested it in #27738 (comment).

@pcmoritz (Contributor Author):

I tried that before, but it didn't render correctly in the docs. Let me try again :)

@pcmoritz (Contributor Author):

OK, so the solution that makes this work in all cases (both the raw docstring and Read the Docs) seems to be to put an r before the docstring and not escape the \t, which I've now done :)
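Why the `r` prefix helps: in a regular docstring, Python turns `\t` into an actual tab character before Sphinx or `help()` ever see the text, so the rendered example presumably shows whitespace where `\t` should be. A raw docstring keeps the backslash and the `t` as two characters, and when the example is executed the literal `"\t"` still evaluates to a tab. A small standard-library check; `plain` and `raw` are hypothetical functions used only to hold the two docstring variants:

```python
import ast


def plain():
    """parse_options = csv.ParseOptions(delimiter="\t")"""


def raw():
    r"""parse_options = csv.ParseOptions(delimiter="\t")"""


# Regular docstring: "\t" has become a real tab, so no backslash survives.
assert "\t" in plain.__doc__ and "\\" not in plain.__doc__

# Raw docstring: the text keeps a backslash followed by "t".
assert "\\t" in raw.__doc__ and "\t" not in raw.__doc__

# When the example runs, the literal "\t" in the raw text still evaluates to a tab.
literal = raw.__doc__.split("delimiter=")[1].rstrip(")")
assert ast.literal_eval(literal) == "\t"
print(repr(literal))  # '"\\t"'
```

The alternative, writing `"\\t"` in a regular docstring, produces the same docstring text, but the source then reads differently from the rendered example, which is the confusion raised in this thread.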

@@ -0,0 +1,150 @@
5.1 3.5 1.4 0.2 setosa
Contributor:

We also need to define the schema (the column names) on the first line:

sepal.length sepal.width petal.length petal.width variety

@pcmoritz (Contributor Author):

Thanks, I fixed that now :)

6.3 2.5 5.0 1.9 virginica
6.5 3.0 5.2 2.0 virginica
6.2 3.4 5.4 2.3 virginica
5.9 3.0 5.1 1.8 virginica
Contributor:

nit: add a newline at the end of the file.
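Without a final newline, the last line of the file is unterminated, and `git diff` flags it with "\ No newline at end of file". A hypothetical helper, `ensure_trailing_newline`, sketches the check and fix using only the standard library; the sample row uses the values from the last line of this PR's data file:

```python
import tempfile
from pathlib import Path


def ensure_trailing_newline(path: Path) -> bool:
    # Append "\n" to a non-empty file that does not already end with one.
    # Returns True if the file was modified.
    data = path.read_bytes()
    if data and not data.endswith(b"\n"):
        with path.open("ab") as f:
            f.write(b"\n")
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.tsv"
    sample.write_bytes(b"5.9\t3.0\t5.1\t1.8\tvirginica")  # no trailing newline
    first = ensure_trailing_newline(sample)   # newline appended
    second = ensure_trailing_newline(sample)  # already terminated, nothing to do
    contents = sample.read_bytes()

print(first, second, contents.endswith(b"\n"))  # True False True
```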

@pcmoritz (Contributor Author):

fixed

@pcmoritz (Contributor Author) commented Sep 5, 2022

Thanks for all the feedback; this should be ready to merge now. @c21, can you approve when you get a chance?

@c21 (Contributor) left a comment

Thanks @pcmoritz, LGTM!

@pcmoritz merged commit 2a0ff1b into ray-project:master Sep 5, 2022
@pcmoritz deleted the fix-tsv-reading branch September 5, 2022 23:47
kira-lin pushed a commit to kira-lin/ray that referenced this pull request Sep 8, 2022
ilee300a pushed a commit to ilee300a/ray that referenced this pull request Sep 12, 2022
3 participants