[MINOR][DOCS] JSON APIs related documentation fixes #17602

Closed

Conversation

HyukjinKwon
Member

@HyukjinKwon HyukjinKwon commented Apr 11, 2017

What changes were proposed in this pull request?

This PR proposes the following corrections related to the JSON APIs:

  • Rendering links in the Python documentation
  • Replacing RDD with Dataset in the programming guide
  • Adding the missing description of JSON Lines consistently to DataFrameReader.json in the Python API
  • De-duplicating a little of the DataFrameReader.json documentation in the Scala/Java API

How was this patch tested?

Manually built the documentation via jekyll build. The corresponding screenshots are left as comments on the code changes.

Note that Javadoc 8 currently breaks in several places. Fixes for these are proposed in #17477, so this PR does not address them.

- `JSON Lines <http://jsonlines.org/>`_(newline-delimited JSON) is supported by default.
- For JSON (one record per file), set the `wholeFile` parameter to ``true``.
+ `JSON Lines <http://jsonlines.org/>`_ (newline-delimited JSON) is supported by default.
+ For JSON (one record per file), set the ``wholeFile`` parameter to ``true``.
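
To make the distinction in this docstring concrete, here is a minimal sketch using only Python's standard `json` module, not Spark itself. It contrasts JSON Lines (one record per line) with a regular JSON document that spans several lines. The function names `parse_json_lines` and `parse_whole_file` and the sample records are hypothetical and only illustrate the two input formats; they do not reflect how Spark implements either mode.

```python
import json

# JSON Lines: every line is a complete, self-contained JSON value.
json_lines_text = '{"name": "Michael"}\n{"name": "Andy", "age": 30}\n'

# Regular JSON: a single value that may span several lines.
whole_file_text = '[\n  {"name": "Michael"},\n  {"name": "Andy", "age": 30}\n]\n'


def parse_json_lines(text):
    """Parse newline-delimited JSON, one record per non-empty line (hypothetical helper)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_whole_file(text):
    """Parse the entire text as one JSON document (hypothetical helper)."""
    return json.loads(text)


records_from_lines = parse_json_lines(json_lines_text)
records_from_file = parse_whole_file(whole_file_text)

# Parsing a multi-line document line by line fails on the first line ("["),
# which is why such files need a separate whole-file mode.
try:
    parse_json_lines(whole_file_text)
    line_mode_handles_multiline = True
except json.JSONDecodeError:
    line_mode_handles_multiline = False
```

Both inputs decode to the same two records, but only the whole-file parser can read the multi-line document.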
Member Author

Before

2017-04-11 10 10 08

After

2017-04-11 10 06 33

- `JSON Lines <http://jsonlines.org/>`_(newline-delimited JSON) is supported by default.
- For JSON (one record per file), set the `wholeFile` parameter to ``true``.
+ `JSON Lines <http://jsonlines.org/>`_ (newline-delimited JSON) is supported by default.
+ For JSON (one record per file), set the ``wholeFile`` parameter to ``true``.
Member Author

Before

2017-04-11 10 10 08

After

2017-04-11 10 11 46

- * Loads a JSON file (<a href="http://jsonlines.org/">JSON Lines text format or
- * newline-delimited JSON</a>) and returns the result as a `DataFrame`.
+ * Loads a JSON file and returns the results as a `DataFrame`.
*
Member Author

@HyukjinKwon HyukjinKwon Apr 11, 2017

This de-duplicates the documentation, since it already points to the overloaded json() documented below.

Before

2017-04-11 10 33 18

After

2017-04-11 12 36 03

@@ -634,7 +634,9 @@ def saveAsTable(self, name, format=None, mode=None, partitionBy=None, **options)

@since(1.4)
def json(self, path, mode=None, compression=None, dateFormat=None, timestampFormat=None):
- """Saves the content of the :class:`DataFrame` in JSON format at the specified path.
+ """Saves the content of the :class:`DataFrame` in JSON format
+ (`JSON Lines text format or newline-delimited JSON <http://jsonlines.org/>`_) at the
Member Author

Before
2017-04-11 10 02 21

After

2017-04-11 12 49 38

Note that this is not consistent with the Scala/Java versions:

2017-04-11 12 50 13
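
As a rough illustration of the JSON Lines output format that the updated writer docstring refers to, the sketch below serializes a few records with Python's standard `json` module, one JSON object per line. It does not use Spark; `write_json_lines` and the sample records are hypothetical and only show the shape of the output.

```python
import io
import json

# Hypothetical sample records standing in for the rows of a DataFrame.
records = [{"name": "Michael"}, {"name": "Andy", "age": 30}]


def write_json_lines(rows, out):
    """Write each row as one compact JSON object followed by a newline."""
    for row in rows:
        # json.dumps without `indent` never emits a raw newline,
        # so every record stays on a single line.
        out.write(json.dumps(row))
        out.write("\n")


buffer = io.StringIO()
write_json_lines(records, buffer)
output = buffer.getvalue()
```

Because each line is an independent JSON value, a file in this format can be read back line by line without parsing the whole file first.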

@@ -883,7 +883,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession

<div data-lang="scala" markdown="1">
Spark SQL can automatically infer the schema of a JSON dataset and load it as a `Dataset[Row]`.
- This conversion can be done using `SparkSession.read.json()` on either an RDD of String,
+ This conversion can be done using `SparkSession.read.json()` on either a `Dataset[String]`,
Member Author

Output:

2017-04-11 1 43 06

Example:

2017-04-11 1 43 10

@@ -897,7 +897,7 @@ For a regular multi-line JSON file, set the `wholeFile` option to `true`.

<div data-lang="java" markdown="1">
Spark SQL can automatically infer the schema of a JSON dataset and load it as a `Dataset<Row>`.
- This conversion can be done using `SparkSession.read().json()` on either an RDD of String,
+ This conversion can be done using `SparkSession.read().json()` on either a `Dataset<String>`,
Member Author

Output:
2017-04-11 1 43 15

Example:
2017-04-11 1 43 18

@SparkQA

SparkQA commented Apr 11, 2017

Test build #75689 has finished for PR 17602 at commit fd64e49.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 11, 2017

Test build #75691 has finished for PR 17602 at commit 9043f01.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 11, 2017

Test build #75690 has finished for PR 17602 at commit 82aadaa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 11, 2017

Test build #75692 has finished for PR 17602 at commit 3f60861.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Apr 12, 2017

Merged to master

@asfgit asfgit closed this in bca4259 Apr 12, 2017
@HyukjinKwon
Member Author

Thank you @srowen.

@HyukjinKwon HyukjinKwon deleted the minor-json-documentation branch January 2, 2018 03:38
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
## What changes were proposed in this pull request?

This PR proposes the following corrections related to the JSON APIs:

- Rendering links in the Python documentation
- Replacing `RDD` with `Dataset` in the programming guide
- Adding the missing description of JSON Lines consistently to `DataFrameReader.json` in the Python API
- De-duplicating a little of the `DataFrameReader.json` documentation in the Scala/Java API

## How was this patch tested?

Manually built the documentation via `jekyll build`. The corresponding screenshots are left as comments on the code changes.

Note that Javadoc 8 currently breaks in several places. Fixes for these are proposed in apache#17477, so this PR does not address them.

Author: hyukjinkwon <[email protected]>

Closes apache#17602 from HyukjinKwon/minor-json-documentation.