[MINOR][DOCS] JSON APIs related documentation fixes #17602

HyukjinKwon · 2017-04-11T03:45:18Z

What changes were proposed in this pull request?

This PR proposes the following corrections related to the JSON APIs:

- Rendering links correctly in the Python documentation
- Replacing `RDD` with `Dataset` in the programming guide
- Adding the missing description of JSON Lines consistently to `DataFrameReader.json` in the Python API
- De-duplicating a little of the `DataFrameReader.json` documentation in the Scala/Java API

How was this patch tested?

Manually built the documentation via `jekyll build`. Corresponding snapshots will be left as comments on the code.

Note that currently there are Javadoc8 breaks in several places. These are proposed to be handled in #17477. So, this PR does not fix those.

HyukjinKwon · 2017-04-11T03:48:24Z

python/pyspark/sql/readwriter.py

-        `JSON Lines <http://jsonlines.org/>`_(newline-delimited JSON) is supported by default.
-        For JSON (one record per file), set the `wholeFile` parameter to ``true``.
+        `JSON Lines <http://jsonlines.org/>`_ (newline-delimited JSON) is supported by default.
+        For JSON (one record per file), set the ``wholeFile`` parameter to ``true``.
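
To make the distinction in this docstring concrete, here is a minimal sketch in plain Python. It uses only the standard library and is not how Spark's reader is implemented; the function names `parse_json_lines` and `parse_whole_file` and the sample records are hypothetical, chosen only for illustration. It shows why a multi-line JSON document cannot be read line by line, which is the case the `wholeFile` parameter is meant to cover.

```python
import json


def parse_json_lines(text):
    """Parse newline-delimited JSON: one complete JSON value per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_whole_file(text):
    """Parse the entire text as a single JSON document, which may span many lines."""
    return json.loads(text)


# JSON Lines input: every line is a self-contained record.
json_lines_text = '{"name": "Michael"}\n{"name": "Andy", "age": 30}\n'

# A regular JSON file: one record spread over several lines.
multi_line_text = '{\n  "name": "Justin",\n  "age": 19\n}\n'

records = parse_json_lines(json_lines_text)
single = parse_whole_file(multi_line_text)

# Feeding the multi-line document to the line-based parser fails, because
# the first line ("{") is not a complete JSON value on its own.
try:
    parse_json_lines(multi_line_text)
    line_based_ok = True
except json.JSONDecodeError:
    line_based_ok = False
```

In this sketch, `records` holds one dictionary per input line, `single` holds the one multi-line record, and `line_based_ok` ends up `False`.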


Before

After

HyukjinKwon · 2017-04-11T03:53:39Z

python/pyspark/sql/streaming.py

-        `JSON Lines <http://jsonlines.org/>`_(newline-delimited JSON) is supported by default.
-        For JSON (one record per file), set the `wholeFile` parameter to ``true``.
+        `JSON Lines <http://jsonlines.org/>`_ (newline-delimited JSON) is supported by default.
+        For JSON (one record per file), set the ``wholeFile`` parameter to ``true``.
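
The link fix in this hunk is a single space inserted after `` `_ ``. A rough sketch of a check for that pattern is below. It assumes the reStructuredText rule that an inline markup end-string must be followed by whitespace or one of a small set of closing punctuation characters, so an opening parenthesis directly after `` `_ `` keeps the hyperlink from being recognized. The helper `find_unrendered_links` is hypothetical and is not part of Spark or Sphinx.

```python
import re

# Characters that may directly follow an inline-markup end-string
# (in addition to whitespace), as assumed from the reStructuredText rules.
ALLOWED_AFTER = set("'\")]}>-/:.,;!?\\")

# An embedded-URI hyperlink reference such as `text <url>`_
LINK = re.compile(r"`[^`]+ <[^>]+>`_")


def find_unrendered_links(docstring):
    """Return each link reference together with the character that breaks it."""
    broken = []
    for match in LINK.finditer(docstring):
        following = docstring[match.end():match.end() + 1]
        if not following or following.isspace():
            continue  # end of text or whitespace: the link is recognized
        if following == "_" or following in ALLOWED_AFTER:
            continue  # anonymous reference (`...`__) or permitted punctuation
        broken.append(match.group(0) + following)
    return broken


before = "`JSON Lines <http://jsonlines.org/>`_(newline-delimited JSON) is supported by default."
after = "`JSON Lines <http://jsonlines.org/>`_ (newline-delimited JSON) is supported by default."

broken_before = find_unrendered_links(before)
broken_after = find_unrendered_links(after)
```

Run against the two versions of the docstring line from this diff, the check flags only the old one.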


Before

After

HyukjinKwon · 2017-04-11T03:55:08Z

sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala

-   * Loads a JSON file (<a href="http://jsonlines.org/">JSON Lines text format or
-   * newline-delimited JSON</a>) and returns the result as a `DataFrame`.
+   * Loads a JSON file and returns the results as a `DataFrame`.
+   *


This de-duplicates the documentation, since this method already points to the overloaded `json()` below.

Before

After

HyukjinKwon · 2017-04-11T04:04:39Z

python/pyspark/sql/readwriter.py

@@ -634,7 +634,9 @@ def saveAsTable(self, name, format=None, mode=None, partitionBy=None, **options)

    @since(1.4)
    def json(self, path, mode=None, compression=None, dateFormat=None, timestampFormat=None):
-        """Saves the content of the :class:`DataFrame` in JSON format at the specified path.
+        """Saves the content of the :class:`DataFrame` in JSON format
+        (`JSON Lines text format or newline-delimited JSON <http://jsonlines.org/>`_) at the


Before

After

Note that this is not consistent with Scala/Java ones:

HyukjinKwon · 2017-04-11T04:43:59Z

docs/sql-programming-guide.md

@@ -883,7 +883,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession

 <div data-lang="scala"  markdown="1">
 Spark SQL can automatically infer the schema of a JSON dataset and load it as a `Dataset[Row]`.
-This conversion can be done using `SparkSession.read.json()` on either an RDD of String,
+This conversion can be done using `SparkSession.read.json()` on either a `Dataset[String]`,


Output:

Example:

HyukjinKwon · 2017-04-11T04:44:17Z

docs/sql-programming-guide.md

@@ -897,7 +897,7 @@ For a regular multi-line JSON file, set the `wholeFile` option to `true`.

 <div data-lang="java"  markdown="1">
 Spark SQL can automatically infer the schema of a JSON dataset and load it as a `Dataset<Row>`.
-This conversion can be done using `SparkSession.read().json()` on either an RDD of String,
+This conversion can be done using `SparkSession.read().json()` on either a `Dataset<String>`,


Output:

Example:

SparkQA · 2017-04-11T06:15:19Z

Test build #75689 has finished for PR 17602 at commit fd64e49.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-04-11T06:23:10Z

Test build #75691 has finished for PR 17602 at commit 9043f01.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-04-11T06:32:41Z

Test build #75690 has finished for PR 17602 at commit 82aadaa.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-04-11T06:35:20Z

Test build #75692 has finished for PR 17602 at commit 3f60861.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2017-04-12T08:16:55Z

Merged to master

HyukjinKwon · 2017-04-12T08:21:26Z

Thank you @srowen.

## What changes were proposed in this pull request?

This PR proposes corrections related to JSON APIs as below:

- Rendering links in Python documentation
- Replacing `RDD` with `Dataset` in the programming guide
- Adding missing description about JSON Lines consistently in `DataFrameReader.json` in Python API
- De-duplicating a little of `DataFrameReader.json` in Scala/Java API

## How was this patch tested?

Manually built the documentation via `jekyll build`. Corresponding snapshots will be left on the code.

Note that currently there are Javadoc8 breaks in several places. These are proposed to be handled in apache#17477. So, this PR does not fix those.

Author: hyukjinkwon <[email protected]>

Closes apache#17602 from HyukjinKwon/minor-json-documentation.

JSON related documentation fixes

9043f01

HyukjinKwon force-pushed the minor-json-documentation branch from 82aadaa to 9043f01 Compare April 11, 2017 04:04

HyukjinKwon added 2 commits April 11, 2017 13:17

Fix typos

e358615

Fix typos

3f60861

srowen approved these changes Apr 11, 2017

asfgit closed this in bca4259 Apr 12, 2017

HyukjinKwon deleted the minor-json-documentation branch January 2, 2018 03:38
