[SPARK-17374][SQL] Better error messages when parsing JSON using DataFrameReader #14929
@@ -62,8 +68,35 @@ class JacksonParser(
      throw new RuntimeException(s"Malformed line in FAILFAST mode: $record")
    }
    if (options.dropMalformed) {
      logWarning(s"Dropping malformed line: $record")
      if (!isWarningPrintedForMalformedRecord) {
        logWarning(
Only print the warning the first time a malformed record is encountered.
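A minimal sketch of the warn-once behaviour discussed in this thread, written in plain Scala with no Spark dependency. `ParseMode`, `MalformedRecordHandler`, `warnOnce`, the `log` callback, and the permissive-mode warning text are hypothetical names and wording chosen for illustration; only the FAILFAST and "Dropping malformed line" messages come from the diff above. It assumes that permissive mode emits a single all-null row for a malformed record.

```scala
// Hypothetical, simplified stand-ins for illustration; not Spark's actual classes.
sealed trait ParseMode
case object Permissive extends ParseMode
case object DropMalformed extends ParseMode
case object FailFast extends ParseMode

final class MalformedRecordHandler(mode: ParseMode, log: String => Unit) {
  // Flipped after the first warning so that later malformed records stay quiet.
  private var isWarningPrinted: Boolean = false

  // Logs the message only on the first call for this handler instance.
  private def warnOnce(message: String): Unit = {
    if (!isWarningPrinted) {
      log(message)
      isWarningPrinted = true
    }
  }

  /** Returns the rows to emit for a record that failed to parse. */
  def handle(record: String, numFields: Int): Seq[Seq[Any]] = mode match {
    case FailFast =>
      throw new RuntimeException(s"Malformed line in FAILFAST mode: $record")
    case DropMalformed =>
      warnOnce(s"Dropping malformed line: $record")
      Nil
    case Permissive =>
      // Assumed wording; the actual message in the PR is not shown here.
      warnOnce(s"Found malformed line, filling its fields with null: $record")
      Seq(Seq.fill[Any](numFields)(null))
  }
}
```

Because the flag lives on the handler instance, each handler warns at most once, however many malformed records it sees. Passing the logger in as a function keeps the sketch easy to exercise without a logging framework.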
private val emptyRow: Seq[InternalRow] = Seq(new GenericInternalRow(schema.length))

@transient
private var isWarningPrintedForMalformedRecord: Boolean = false
private[this]?
retest this please
StructField("c", StringType, true) :: Nil)

val jsonDF = spark.read.schema(schema).json(corruptRecords)
jsonDF.createOrReplaceTempView("jsonTable")
Why create this temp view? We can check the DataFrame directly:
checkAnswer(jsonDF.select($"a", $"b", $"c"), Seq(Row...))
Thanks, merging to master! @clockfly can you send a follow-up PR to address the minor comment in the test?
What changes were proposed in this pull request?
This PR adds better error messages for malformed records when reading a JSON file using DataFrameReader.
For example, for query:
Before change:
We silently replace the corrupted line with null.
After change:
Add an explicit warning message:
How was this patch tested?
Unit test.