fix: include path of field that causes JSON deserialization error #4249

big-andy-coates · 2020-01-08T15:44:06Z

Description

For complex types, the current deserialization error can be a bit lacking, as it does not include details of which field causes the error. This commit enhances the JSON deserializer to include the JSON path of where the error occurred in the error message.

Error message will now look something like:

{
  "ROWTIME": 1578332584042,
  "ROWKEY": null,
  "LOGGER": "processing.<QUERY ID>.KsqlTopic.source.deserializer",
  "LEVEL": "ERROR",
  "TIME": 1578332584041,
  "MESSAGE": {
    "type": 0,
    "DESERIALIZATIONERROR": {
      "ERRORMESSAGE": "Error deserializing JSON message from topic: <TOPIC NAME>",
      "RECORDB64": "<RECORD>",
      "CAUSE": [
        "Can't coerce string to type. targetType: BIGINT, path: $.colA.fieldB[3].x"
      ]
    },
    "RECORDPROCESSINGERROR": null,
    "PRODUCTIONERROR": null
  }
}

The design keep the happy path quick by using the call stack to maintain the stack of fields, rather than tracking them in some separate stack in memory. Figured this was worth while for such a performance sensitive area of the code. There's no detectable perf hit from this change.

I've also increased the times on the SerdeBenchmark, as from my own experience 10 seconds is not enough time for the JVM to optimise the byte code.

Perf test on JSON deserialization before changes:

# JMH version: 1.21
# VM version: JDK 1.8.0_162, Java HotSpot(TM) 64-Bit Server VM, 25.162-b12
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/jre/bin/java
# VM options: -javaagent:/Applications/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=58499:/Applications/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
# Warmup: 6 iterations, 30 s each
# Measurement: 3 iterations, 60 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: io.confluent.ksql.benchmark.SerdeBenchmark.deserialize
# Parameters: (schemaName = metrics, serializationFormat = JSON)

# Run progress: 0.00% complete, ETA 00:06:00
# Fork: 1 of 1
# Warmup Iteration   1: 5.188 us/op
# Warmup Iteration   2: 5.124 us/op
# Warmup Iteration   3: 5.193 us/op
# Warmup Iteration   4: 5.529 us/op
# Warmup Iteration   5: 5.378 us/op
# Warmup Iteration   6: 5.266 us/op
Iteration   1: 5.209 us/op
Iteration   2: 5.183 us/op
Iteration   3: 5.274 us/op

Result "io.confluent.ksql.benchmark.SerdeBenchmark.deserialize":
  5.222 ±(99.9%) 0.854 us/op [Average]
  (min, avg, max) = (5.183, 5.222, 5.274), stdev = 0.047
  CI (99.9%): [4.368, 6.075] (assumes normal distribution)

# Run complete. Total time: 00:06:01

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                   (schemaName)  (serializationFormat)  Mode  Cnt  Score   Error  Units
SerdeBenchmark.deserialize       metrics                   JSON  avgt    3  5.222 ± 0.854  us/op

After:

# JMH version: 1.21
# VM version: JDK 1.8.0_162, Java HotSpot(TM) 64-Bit Server VM, 25.162-b12
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/jre/bin/java
# VM options: -javaagent:/Applications/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=60943:/Applications/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8
# Warmup: 6 iterations, 30 s each
# Measurement: 3 iterations, 60 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: io.confluent.ksql.benchmark.SerdeBenchmark.deserialize
# Parameters: (schemaName = metrics, serializationFormat = JSON)

# Run progress: 0.00% complete, ETA 00:06:00
# Fork: 1 of 1
# Warmup Iteration   1: 5.284 us/op
# Warmup Iteration   2: 5.114 us/op
# Warmup Iteration   3: 5.185 us/op
# Warmup Iteration   4: 5.105 us/op
# Warmup Iteration   5: 5.171 us/op
# Warmup Iteration   6: 5.119 us/op
Iteration   1: 5.208 us/op
Iteration   2: 5.248 us/op
Iteration   3: 5.198 us/op

Result "io.confluent.ksql.benchmark.SerdeBenchmark.deserialize":
  5.218 ±(99.9%) 0.477 us/op [Average]
  (min, avg, max) = (5.198, 5.218, 5.248), stdev = 0.026
  CI (99.9%): [4.740, 5.695] (assumes normal distribution)

# Run complete. Total time: 00:06:02

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                   (schemaName)  (serializationFormat)  Mode  Cnt  Score   Error  Units
SerdeBenchmark.deserialize       metrics                   JSON  avgt    3  5.218 ± 0.477  us/op

Testing done

Perf tested + unit tests added.

Reviewer checklist

Ensure docs are updated if necessary. (eg. if a user visible feature is being added or changed).
Ensure relevant issues are linked (description should include text like "Fixes #")

Fixes: confluentinc#4238 For complex types, the current deserialization error can be a bit lacking, as it does not include details of which field causes the error. This commit enhances the JSON deserializer to include the JSON path of where the error occurred in the error message. The design keep the happy path quick by using the call stack to maintain the stack of fields, rather than tracking them in some separate stack in memory. Figured this was worth while for such a performance sensitive area of the code. There's no detectable perf hit from this change. I've also increased the times on the `SerdeBenchmark`, as from my own experience 10 seconds is not enough time for the JVM to optimise the byte code. Perf test on JSON deserialization before changes: ``` # JMH version: 1.21 # VM version: JDK 1.8.0_162, Java HotSpot(TM) 64-Bit Server VM, 25.162-b12 # VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/jre/bin/java # VM options: -javaagent:/Applications/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=58499:/Applications/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8 # Warmup: 6 iterations, 30 s each # Measurement: 3 iterations, 60 s each # Timeout: 10 min per iteration # Threads: 1 thread, will synchronize iterations # Benchmark mode: Average time, time/op # Benchmark: io.confluent.ksql.benchmark.SerdeBenchmark.deserialize # Parameters: (schemaName = metrics, serializationFormat = JSON) # Run progress: 0.00% complete, ETA 00:06:00 # Fork: 1 of 1 # Warmup Iteration 1: 5.188 us/op # Warmup Iteration 2: 5.124 us/op # Warmup Iteration 3: 5.193 us/op # Warmup Iteration 4: 5.529 us/op # Warmup Iteration 5: 5.378 us/op # Warmup Iteration 6: 5.266 us/op Iteration 1: 5.209 us/op Iteration 2: 5.183 us/op Iteration 3: 5.274 us/op Result "io.confluent.ksql.benchmark.SerdeBenchmark.deserialize": 5.222 ±(99.9%) 0.854 us/op [Average] (min, avg, max) = (5.183, 5.222, 5.274), stdev = 0.047 CI (99.9%): [4.368, 6.075] (assumes normal distribution) # Run complete. Total time: 00:06:01 REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial experiments, perform baseline and negative tests that provide experimental control, make sure the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts. Do not assume the numbers tell you what you want them to tell. Benchmark (schemaName) (serializationFormat) Mode Cnt Score Error Units SerdeBenchmark.deserialize metrics JSON avgt 3 5.222 ± 0.854 us/op ``` After: ``` # JMH version: 1.21 # VM version: JDK 1.8.0_162, Java HotSpot(TM) 64-Bit Server VM, 25.162-b12 # VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/jre/bin/java # VM options: -javaagent:/Applications/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=60943:/Applications/IntelliJ IDEA.app/Contents/bin -Dfile.encoding=UTF-8 # Warmup: 6 iterations, 30 s each # Measurement: 3 iterations, 60 s each # Timeout: 10 min per iteration # Threads: 1 thread, will synchronize iterations # Benchmark mode: Average time, time/op # Benchmark: io.confluent.ksql.benchmark.SerdeBenchmark.deserialize # Parameters: (schemaName = metrics, serializationFormat = JSON) # Run progress: 0.00% complete, ETA 00:06:00 # Fork: 1 of 1 # Warmup Iteration 1: 5.284 us/op # Warmup Iteration 2: 5.114 us/op # Warmup Iteration 3: 5.185 us/op # Warmup Iteration 4: 5.105 us/op # Warmup Iteration 5: 5.171 us/op # Warmup Iteration 6: 5.119 us/op Iteration 1: 5.208 us/op Iteration 2: 5.248 us/op Iteration 3: 5.198 us/op Result "io.confluent.ksql.benchmark.SerdeBenchmark.deserialize": 5.218 ±(99.9%) 0.477 us/op [Average] (min, avg, max) = (5.198, 5.218, 5.248), stdev = 0.026 CI (99.9%): [4.740, 5.695] (assumes normal distribution) # Run complete. Total time: 00:06:02 REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial experiments, perform baseline and negative tests that provide experimental control, make sure the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts. Do not assume the numbers tell you what you want them to tell. Benchmark (schemaName) (serializationFormat) Mode Cnt Score Error Units SerdeBenchmark.deserialize metrics JSON avgt 3 5.218 ± 0.477 us/op ```

agavra

LGTM! Nice change :)

vcrfxia · 2020-01-09T04:39:23Z

ksql-benchmark/src/main/java/io/confluent/ksql/benchmark/SerdeBenchmark.java

+        ? new CommandLineOptions(args)
+        : new OptionsBuilder()
+            .include(SerdeBenchmark.class.getSimpleName())
+            .shouldFailOnError(true)


What is this argument that controls whether to fail on error? Is this documented somewhere?

Not sure where it's documented. Just means if the code under test throws exceptions it fails, which is what we want.

vcrfxia · 2020-01-09T04:48:55Z

ksql-benchmark/src/main/java/io/confluent/ksql/benchmark/SerdeBenchmark.java

@@ -68,8 +68,8 @@
 */
 @BenchmarkMode(Mode.AverageTime)
 @OutputTimeUnit(TimeUnit.MICROSECONDS)
-@Warmup(iterations = 3, time = 10)
-@Measurement(iterations = 3, time = 10)
+@Warmup(iterations = 6, time = 30)


I've also increased the times on the SerdeBenchmark, as from my own experience 10 seconds is not enough time for the JVM to optimise the byte code.

Pardon the naivete, but isn't the point of using JMH to avoid optimizations? Why are we trying to allow the JVM time to perform optimizations? Wouldn't that be less representative of how these segments of code are used in practice?

Not at all. We want to JVM to be given time to optimise the code. That's what the warm-up is all about. When KSQL is running in production it will be running for a long time and the JVM will optimise such code paths. So a meaningful benchmark needs to give the JVM time to do this before running tests.

Perhaps my understanding gap is in JVM optimizations, then. It's true that when KSQL is running in production the JVM will have time to optimize this code, but KSQL running in production wouldn't just be running this serde code over and over again. Intuitively it feels like the JVM would be able to make stronger optimizations on the serde code if that's all that's being run, compared to the the serde code being run as part of other KSQL code. Is this not true / not a concern for benchmarking?

vcrfxia · 2020-01-09T04:49:36Z

ksql-serde/src/main/java/io/confluent/ksql/serde/json/KsqlJsonDeserializer.java

+          });
+      return handler.apply(context);
+    } catch (final CoercionException e) {
+      throw new CoercionException(e.getRawMessage(), pathPart + e.getPath(), e);


Very cool! I spent some time thinking about how to implement this and didn't come up with anything nearly as nice. Glad to have had the opportunity to learn from you :)

big-andy-coates requested a review from a team as a code owner January 8, 2020 15:44

agavra approved these changes Jan 8, 2020

View reviewed changes

agavra requested a review from a team January 8, 2020 19:34

agavra self-assigned this Jan 8, 2020

vcrfxia reviewed Jan 9, 2020

View reviewed changes

big-andy-coates merged commit 5cc718b into confluentinc:master Jan 10, 2020

big-andy-coates deleted the json_path branch January 10, 2020 13:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: include path of field that causes JSON deserialization error #4249

fix: include path of field that causes JSON deserialization error #4249

big-andy-coates commented Jan 8, 2020

agavra left a comment

vcrfxia Jan 9, 2020

big-andy-coates Jan 9, 2020

vcrfxia Jan 9, 2020

big-andy-coates Jan 9, 2020

vcrfxia Jan 9, 2020

vcrfxia Jan 9, 2020

big-andy-coates Jan 9, 2020

fix: include path of field that causes JSON deserialization error #4249

fix: include path of field that causes JSON deserialization error #4249

Conversation

big-andy-coates commented Jan 8, 2020

Description

Testing done

Reviewer checklist

agavra left a comment

Choose a reason for hiding this comment

vcrfxia Jan 9, 2020

Choose a reason for hiding this comment

big-andy-coates Jan 9, 2020

Choose a reason for hiding this comment

vcrfxia Jan 9, 2020

Choose a reason for hiding this comment

big-andy-coates Jan 9, 2020

Choose a reason for hiding this comment

vcrfxia Jan 9, 2020

Choose a reason for hiding this comment

vcrfxia Jan 9, 2020

Choose a reason for hiding this comment

big-andy-coates Jan 9, 2020

Choose a reason for hiding this comment