[SPARK-35104][SQL] Fix ugly indentation of multiple JSON records in a single split file generated by JacksonGenerator when pretty option is true

### What changes were proposed in this pull request?

This PR fixes an issue where the indentation of multiple output JSON records in a single split file is broken for every record except the first one in the split when the `pretty` option is `true`.
```
// Run in the Spark Shell.
// Set spark.sql.leafNodeDefaultParallelism to 1 for the current master.
// Or set spark.default.parallelism for the previous releases.
spark.conf.set("spark.sql.leafNodeDefaultParallelism", 1)
val df = Seq("a", "b", "c").toDF
df.write.option("pretty", "true").json("/path/to/output")

# Run in a Shell
$ cat /path/to/output/*.json
{
  "value" : "a"
}
 {
  "value" : "b"
}
 {
  "value" : "c"
}
```

### Why are the changes needed?

The output is not pretty even though the `pretty` option is `true`: every record after the first one in a split starts with a stray leading space.
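The stray space appears to come from the default root value separator of Jackson's `DefaultPrettyPrinter` (a single space), which is written before every root value after the first. The following is a minimal sketch of that behavior outside Spark, assuming `jackson-core` is on the classpath (as it is in the Spark shell). The helper `render` and its `fixed` flag are hypothetical names used only for illustration, and the loop only approximates how `JacksonGenerator` writes one record per row followed by a line separator.

```scala
import java.io.StringWriter

import com.fasterxml.jackson.core.JsonFactory
import com.fasterxml.jackson.core.util.DefaultPrettyPrinter

// Hypothetical helper: writes one {"value": v} object per input and a line
// feed after each object, roughly mimicking one JSON record per row.
def render(values: Seq[String], fixed: Boolean): String = {
  val writer = new StringWriter()
  val base = new JsonFactory().createGenerator(writer).setRootValueSeparator(null)
  val gen =
    if (fixed) {
      // After the fix: a pretty printer with an empty root value separator.
      base.setPrettyPrinter(new DefaultPrettyPrinter(""))
    } else {
      // Before the fix: the default pretty printer keeps its own separator.
      base.useDefaultPrettyPrinter()
    }
  values.foreach { v =>
    gen.writeStartObject()
    gen.writeStringField("value", v)
    gen.writeEndObject()
    gen.flush()
    writer.write("\n") // assumed line separator between records
  }
  gen.close()
  writer.toString
}

println(render(Seq("a", "b", "c"), fixed = false))
println(render(Seq("a", "b", "c"), fixed = true))
```

With `fixed = false`, the second and third records should start with a leading space, as in the output shown above. With `fixed = true`, every record should start at column zero.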

### Does this PR introduce _any_ user-facing change?

I think "No". Indentation style is changed but JSON format is not changed.

### How was this patch tested?

New test.

Closes #32203 from sarutak/fix-ugly-indentation.

Authored-by: Kousuke Saruta <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
sarutak authored and MaxGekk committed Apr 16, 2021
1 parent 345c380 commit 95db7e6
Showing 2 changed files with 27 additions and 1 deletion.
@@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.json
 import java.io.Writer

 import com.fasterxml.jackson.core._
+import com.fasterxml.jackson.core.util.DefaultPrettyPrinter

 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.SpecializedGetters
@@ -73,7 +74,7 @@ private[sql] class JacksonGenerator(

   private val gen = {
     val generator = new JsonFactory().createGenerator(writer).setRootValueSeparator(null)
-    if (options.pretty) generator.useDefaultPrettyPrinter() else generator
+    if (options.pretty) generator.setPrettyPrinter(new DefaultPrettyPrinter("")) else generator
   }

   private val lineSeparator: String = options.lineSeparatorInWrite
@@ -2844,6 +2844,31 @@ abstract class JsonSuite
       assert(readback.collect sameElements Array(Row(0), Row(1), Row(2)))
     }
   }
+
+  test("SPARK-35104: Fix wrong indentation for multiple JSON even if `pretty` option is true") {
+    withSQLConf(SQLConf.LEAF_NODE_DEFAULT_PARALLELISM.key -> "1") {
+      withTempPath { path =>
+        val basePath = path.getCanonicalPath
+        val df = Seq("a", "b", "c").toDF
+        df.write.option("pretty", "true").json(basePath)
+
+        val expectedText =
+          s"""{
+             |  "value" : "a"
+             |}
+             |{
+             |  "value" : "b"
+             |}
+             |{
+             |  "value" : "c"
+             |}
+             |""".stripMargin
+        val actualText = spark.read.option("wholetext", "true")
+          .text(basePath).map(_.getString(0)).collect().mkString
+        assert(actualText === expectedText)
+      }
+    }
+  }
 }

 class JsonV1Suite extends JsonSuite {
