[MINOR][DOCS] JSON APIs related documentation fixes #17602

Closed
4 changes: 2 additions & 2 deletions docs/sql-programming-guide.md
@@ -883,7 +883,7 @@ Configuration of Parquet can be done using the `setConf` method on `SparkSession

<div data-lang="scala" markdown="1">
Spark SQL can automatically infer the schema of a JSON dataset and load it as a `Dataset[Row]`.
-This conversion can be done using `SparkSession.read.json()` on either an RDD of String,
+This conversion can be done using `SparkSession.read.json()` on either a `Dataset[String]`,
Review comment (Member Author):
Output: [screenshot: 2017-04-11 1 43 06]
Example: [screenshot: 2017-04-11 1 43 10]

or a JSON file.

Note that the file that is offered as _a json file_ is not a typical JSON file. Each
@@ -897,7 +897,7 @@ For a regular multi-line JSON file, set the `wholeFile` option to `true`.

<div data-lang="java" markdown="1">
Spark SQL can automatically infer the schema of a JSON dataset and load it as a `Dataset<Row>`.
-This conversion can be done using `SparkSession.read().json()` on either an RDD of String,
+This conversion can be done using `SparkSession.read().json()` on either a `Dataset<String>`,
Review comment (Member Author):
Output: [screenshot: 2017-04-11 1 43 15]
Example: [screenshot: 2017-04-11 1 43 18]

or a JSON file.

Note that the file that is offered as _a json file_ is not a typical JSON file. Each
@@ -215,7 +215,7 @@ private static void runJsonDatasetExample(SparkSession spark) {
// +------+

// Alternatively, a DataFrame can be created for a JSON dataset represented by
-// an Dataset[String] storing one JSON object per string.
+// a Dataset<String> storing one JSON object per string.
List<String> jsonData = Arrays.asList(
"{\"name\":\"Yin\",\"address\":{\"city\":\"Columbus\",\"state\":\"Ohio\"}}");
Dataset<String> anotherPeopleDataset = spark.createDataset(jsonData, Encoders.STRING());
@@ -139,7 +139,7 @@ object SQLDataSourceExample {
// +------+

// Alternatively, a DataFrame can be created for a JSON dataset represented by
-// an Dataset[String] storing one JSON object per string
+// a Dataset[String] storing one JSON object per string
val otherPeopleDataset = spark.createDataset(
"""{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil)
val otherPeople = spark.read.json(otherPeopleDataset)
8 changes: 5 additions & 3 deletions python/pyspark/sql/readwriter.py
@@ -173,8 +173,8 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
"""
Loads JSON files and returns the results as a :class:`DataFrame`.
-`JSON Lines <http://jsonlines.org/>`_(newline-delimited JSON) is supported by default.
-For JSON (one record per file), set the `wholeFile` parameter to ``true``.
+`JSON Lines <http://jsonlines.org/>`_ (newline-delimited JSON) is supported by default.
+For JSON (one record per file), set the ``wholeFile`` parameter to ``true``.
Review comment (Member Author):
Before: [screenshot: 2017-04-11 10 10 08]
After: [screenshot: 2017-04-11 10 06 33]

If the ``schema`` parameter is not specified, this function goes
through the input once to determine the input schema.
@@ -634,7 +634,9 @@ def saveAsTable(self, name, format=None, mode=None, partitionBy=None, **options)

@since(1.4)
def json(self, path, mode=None, compression=None, dateFormat=None, timestampFormat=None):
-"""Saves the content of the :class:`DataFrame` in JSON format at the specified path.
+"""Saves the content of the :class:`DataFrame` in JSON format
+(`JSON Lines text format or newline-delimited JSON <http://jsonlines.org/>`_) at the
Review comment (Member Author):
Before: [screenshot: 2017-04-11 10 02 21]
After: [screenshot: 2017-04-11 12 49 38]
Note that this is not consistent with Scala/Java ones: [screenshot: 2017-04-11 12 50 13]

+specified path.
:param path: the path in any Hadoop supported file system
:param mode: specifies the behavior of the save operation when data already exists.
4 changes: 2 additions & 2 deletions python/pyspark/sql/streaming.py
@@ -405,8 +405,8 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
"""
Loads a JSON file stream and returns the results as a :class:`DataFrame`.

-`JSON Lines <http://jsonlines.org/>`_(newline-delimited JSON) is supported by default.
-For JSON (one record per file), set the `wholeFile` parameter to ``true``.
+`JSON Lines <http://jsonlines.org/>`_ (newline-delimited JSON) is supported by default.
+For JSON (one record per file), set the ``wholeFile`` parameter to ``true``.
Review comment (Member Author):
Before: [screenshot: 2017-04-11 10 10 08]
After: [screenshot: 2017-04-11 10 11 46]

If the ``schema`` parameter is not specified, this function goes
through the input once to determine the input schema.
@@ -268,8 +268,8 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
}

/**
-* Loads a JSON file (<a href="http://jsonlines.org/">JSON Lines text format or
-* newline-delimited JSON</a>) and returns the result as a `DataFrame`.
+* Loads a JSON file and returns the results as a `DataFrame`.
+*
Review comment (Member Author, @HyukjinKwon, Apr 11, 2017):
This de-duplicates the documentation as it points the overloaded json() out below.
Before: [screenshot: 2017-04-11 10 33 18]
After: [screenshot: 2017-04-11 12 36 03]

* See the documentation on the overloaded `json()` method with varargs for more details.
*
* @since 1.4.0
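
For readers unfamiliar with the distinction the docstrings in this diff draw between JSON Lines (the default) and a regular multi-line JSON file (the `wholeFile` case), the following is a minimal sketch in plain Python using only the standard-library `json` module. It does not call Spark, and the sample data and helper names (`parse_json_lines`, `parse_whole_document`) are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical sample data, not taken from the pull request.
# JSON Lines: every line is a complete, self-contained JSON value.
json_lines_text = (
    '{"name": "Yin", "city": "Columbus"}\n'
    '{"name": "Michael", "city": "Dayton"}\n'
)

# A regular JSON document: one value that spans several lines.
whole_file_text = """[
  {"name": "Yin", "city": "Columbus"},
  {"name": "Michael", "city": "Dayton"}
]"""


def parse_json_lines(text):
    """Parse newline-delimited JSON: one JSON value per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_whole_document(text):
    """Parse text that holds a single JSON document."""
    return json.loads(text)


records = parse_json_lines(json_lines_text)
document = parse_whole_document(whole_file_text)

# A multi-line document is not valid JSON Lines: its first line ("[")
# is not a complete JSON value, so line-by-line parsing fails.
try:
    parse_json_lines(whole_file_text)
    multi_line_is_json_lines = True
except json.JSONDecodeError:
    multi_line_is_json_lines = False
```

Both inputs carry the same two records, but only the first can be read one line at a time, which is why a multi-line file needs a separate reader option.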