
[SPARK-5307] SerializationDebugger #4098

Closed
wants to merge 3 commits into from

rxin (Contributor) commented Jan 19, 2015

This patch adds a SerializationDebugger that adds the serialization path to a NotSerializableException. When a NotSerializableException is encountered, the debugger visits the object graph to find the path to the object that cannot be serialized, and constructs information to help the user locate that object.

The patch relies on the internals of JVM serialization (in particular, it makes heavy use of ObjectStreamClass). Compared with an earlier attempt, this one provides extra information, including field names, array indices, writeExternal calls, etc.
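The traversal described in the first paragraph can be illustrated with a deliberately simplified sketch. It is hypothetical and is not the patch's code: `NaiveSerializationPath` and `find` are made-up names, and it walks fields with plain `java.lang.reflect` instead of ObjectStreamClass. Because of that it ignores `writeObject`, `writeReplace` and `writeExternal`, which is exactly the information ObjectStreamClass supplies (for example, `java.util.ArrayList` keeps its elements in a transient array and writes them from a custom `writeObject`). Fields that reflection is not allowed to open are skipped.

```scala
import java.lang.reflect.{Array => JArray, Field, Modifier}
import java.util.{Collections, IdentityHashMap}
import scala.util.Try

// Hypothetical, simplified sketch (not Spark's SerializationDebugger): depth-first search
// over reflective fields, returning the serialization path innermost-first.
object NaiveSerializationPath {

  def find(root: AnyRef): List[String] =
    visit(root, Collections.newSetFromMap(new IdentityHashMap[AnyRef, java.lang.Boolean]()))

  private def visit(o: AnyRef, seen: java.util.Set[AnyRef]): List[String] =
    if (o == null || !seen.add(o)) Nil // null, or already visited (handles cycles)
    else {
      val cls = o.getClass
      if (cls.isArray) {
        if (cls.getComponentType.isPrimitive) Nil // primitive arrays always serialize
        else {
          val n = JArray.getLength(o)
          (0 until n).iterator
            .map(i => (i, visit(JArray.get(o, i), seen)))
            .collectFirst { case (i, path) if path.nonEmpty =>
              path ++ List(s"element of array (index: $i)", s"array (class ${cls.getName}, size $n)")
            }
            .getOrElse(Nil)
        }
      } else if (!o.isInstanceOf[java.io.Serializable]) {
        List(s"object not serializable (class: ${cls.getName}, value: $o)")
      } else {
        candidateFields(cls).iterator
          .map(f => (f, visit(f.get(o), seen)))
          .collectFirst { case (f, path) if path.nonEmpty =>
            path ++ List(
              s"field (class: ${f.getDeclaringClass.getName}, name: ${f.getName}, type: ${f.getType})",
              s"object (class ${cls.getName}, $o)")
          }
          .getOrElse(Nil)
      }
    }

  // The class and all of its superclasses, most specific first.
  private def hierarchy(c: Class[_]): List[Class[_]] =
    if (c == null) Nil else c :: hierarchy(c.getSuperclass)

  // Non-static, non-transient, reference-typed fields that reflection is allowed to open.
  private def candidateFields(cls: Class[_]): List[Field] =
    hierarchy(cls).flatMap(_.getDeclaredFields.toList).filter { f =>
      val m = f.getModifiers
      !Modifier.isStatic(m) && !Modifier.isTransient(m) && !f.getType.isPrimitive &&
        Try(f.setAccessible(true)).isSuccess
    }
}
```

For example, `find(Array[Object]("a", Array[Object](new Object)))` yields an "object not serializable" frame for `java.lang.Object`, followed by element/array frames for index 0 of the inner array and index 1 of the outer one. A field-only walker like this cannot see objects that are written only from `writeExternal` (the "writeExternal data" frame in the example below), which is one reason the patch goes through ObjectStreamClass instead.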

An example serialization stack:

```
Serialization stack:
  - object not serializable (class: org.apache.spark.serializer.NotSerializable, value: org.apache.spark.serializer.NotSerializable@2c43caa4)
  - element of array (index: 0)
  - array (class [Ljava.lang.Object;, size 1)
  - field (class: org.apache.spark.serializer.SerializableArray, name: arrayField, type: class [Ljava.lang.Object;)
  - object (class org.apache.spark.serializer.SerializableArray, org.apache.spark.serializer.SerializableArray@193c5908)
  - writeExternal data
  - externalizable object (class org.apache.spark.serializer.ExternalizableClass, org.apache.spark.serializer.ExternalizableClass@320bdadc)
```
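To see where such a stack comes from, here is a hypothetical reconstruction of an object graph that fails the same way under plain JDK serialization. Only the class and field names (`NotSerializable`, `SerializableArray.arrayField`, `ExternalizableClass`) come from the stack; the class bodies, the omitted package declaration and the helper `ReproduceExample.serialize` are assumptions, not the code in the patch's test suite.

```scala
import java.io.{ByteArrayOutputStream, Externalizable, NotSerializableException,
  ObjectInput, ObjectOutput, ObjectOutputStream}

// Hypothetical reconstruction; names come from the example stack, bodies are assumptions.
class NotSerializable

class SerializableArray extends java.io.Serializable {
  // Element 0 is the culprit ("element of array (index: 0)").
  val arrayField: Array[Object] = Array[Object](new NotSerializable)
}

class ExternalizableClass extends Externalizable {
  // Writing from writeExternal is what produces the "writeExternal data" frame.
  override def writeExternal(out: ObjectOutput): Unit = out.writeObject(new SerializableArray)
  override def readExternal(in: ObjectInput): Unit = ()
}

object ReproduceExample {
  // Plain JDK serialization: Left(exception) on failure, Right(bytes) on success.
  def serialize(obj: AnyRef): Either[NotSerializableException, Array[Byte]] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try {
      out.writeObject(obj)
      out.flush()
      Right(bytes.toByteArray)
    } catch {
      case e: NotSerializableException => Left(e)
    } finally {
      out.close()
    }
  }
}
```

With `sun.io.serialization.extendedDebugInfo` off, the JDK's `NotSerializableException` message is just the offending class name; the frames in the stack above are the extra context this patch adds.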

rxin (Contributor Author) commented Jan 19, 2015

Links to the earlier attempts: #4093 by me and #3518 by @ilganeli

SparkQA commented Jan 19, 2015

Test build #25740 has started for PR 4098 at commit b349b77.

  • This patch merges cleanly.

SparkQA commented Jan 19, 2015

Test build #25741 has started for PR 4098 at commit 572d0cb.

  • This patch merges cleanly.

SparkQA commented Jan 19, 2015

Test build #25740 has finished for PR 4098 at commit b349b77.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • val elem = s"array (class $
    • val elem = s"externalizable object (class $
    • val elem = s"object (class $
    • implicit class ObjectStreamClassMethods(val desc: ObjectStreamClass) extends AnyVal

AmplabJenkins commented:

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25740/

SparkQA commented Jan 19, 2015

Test build #25741 has finished for PR 4098 at commit 572d0cb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • val elem = s"array (class $
    • val elem = s"externalizable object (class $
    • val elem = s"object (class $
    • implicit class ObjectStreamClassMethods(val desc: ObjectStreamClass) extends AnyVal

AmplabJenkins commented:

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25741/



```
// Bar is not serializable
class NotSerializable
```

A contributor left a review comment on this line:

Nit: add a new line here.

SparkQA commented Jan 19, 2015

Test build #25753 has started for PR 4098 at commit 553b3ff.

  • This patch merges cleanly.

SparkQA commented Jan 19, 2015

Test build #25753 has finished for PR 4098 at commit 553b3ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • val elem = s"array (class $
    • val elem = s"externalizable object (class $
    • val elem = s"object (class $
    • implicit class ObjectStreamClassMethods(val desc: ObjectStreamClass) extends AnyVal

AmplabJenkins commented:

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25753/

liancheng (Contributor) commented:

LGTM. It would be good to add a comment pointing out that the debugger is disabled if sun.io.serialization.extendedDebugInfo is enabled.
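When the JVM runs with `-Dsun.io.serialization.extendedDebugInfo=true`, ObjectOutputStream already appends its own serialization trace to the NotSerializableException message, so presumably a second debugger on top would be redundant. A minimal sketch of such a check, assuming the property is read with `java.lang.Boolean.getBoolean`; `SerializationDebugSwitch` and `debuggerEnabled` are hypothetical names, not the patch's API.

```scala
// Hypothetical sketch: keep the debugger off when the JDK's own extended debug info is on.
object SerializationDebugSwitch {
  val ExtendedDebugInfoProperty = "sun.io.serialization.extendedDebugInfo"

  // java.lang.Boolean.getBoolean is true only if the system property equals "true"
  // (ignoring case); an unset property therefore leaves the debugger enabled.
  def debuggerEnabled: Boolean = !java.lang.Boolean.getBoolean(ExtendedDebugInfoProperty)
}
```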

pwendell (Contributor) commented:

This is really cool.

rxin changed the title from "[SPARK-5307] SerializationDebugger - take 2" to "[SPARK-5307] SerializationDebugger" on Jan 28, 2015
pwendell (Contributor) commented:

@rxin LGTM. I took a quick look through; I'm not an expert on this, but I think it's good.

rxin (Contributor Author) commented Jan 31, 2015

Thanks. Merging into master.

asfgit closed this in 740a568 on Jan 31, 2015
markhamstra pushed a commit to markhamstra/spark that referenced this pull request Jan 31, 2015
This patch adds a SerializationDebugger that adds the serialization path to a NotSerializableException. When a NotSerializableException is encountered, the debugger visits the object graph to find the path to the object that cannot be serialized, and constructs information to help the user locate that object.

The patch relies on the internals of JVM serialization (in particular, it makes heavy use of ObjectStreamClass). Compared with an earlier attempt, this one provides extra information, including field names, array indices, writeExternal calls, etc.

An example serialization stack:
```
Serialization stack:
  - object not serializable (class: org.apache.spark.serializer.NotSerializable, value: org.apache.spark.serializer.NotSerializable@2c43caa4)
  - element of array (index: 0)
  - array (class [Ljava.lang.Object;, size 1)
  - field (class: org.apache.spark.serializer.SerializableArray, name: arrayField, type: class [Ljava.lang.Object;)
  - object (class org.apache.spark.serializer.SerializableArray, org.apache.spark.serializer.SerializableArray@193c5908)
  - writeExternal data
  - externalizable object (class org.apache.spark.serializer.ExternalizableClass, org.apache.spark.serializer.ExternalizableClass@320bdadc)
```

Author: Reynold Xin

Closes apache#4098 from rxin/SerializationDebugger and squashes the following commits:

553b3ff [Reynold Xin] Update SerializationDebuggerSuite.scala
572d0cb [Reynold Xin] Disable automatically when reflection fails.
b349b77 [Reynold Xin] [SPARK-5307] SerializationDebugger to help debug NotSerializableException - take 2
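The commit "Disable automatically when reflection fails" describes a fail-safe pattern: since the debugger depends on non-public ObjectStreamClass members, a failed reflective lookup should switch the debugger off instead of breaking serialization. A minimal sketch of that pattern, assuming the package-private JDK method `ObjectStreamClass.hasWriteObjectMethod` as the probe (an assumption about JDK internals; the patch may probe different members). `ReflectionGuard` and `hasCustomWriteObject` are hypothetical names. On JDKs with strong module encapsulation, `setAccessible` typically throws unless `java.base/java.io` is opened, which this sketch simply treats as "disabled".

```scala
import java.io.ObjectStreamClass
import java.lang.reflect.Method
import scala.util.Try

// Hypothetical sketch: probe a non-public ObjectStreamClass method once; any failure
// (missing method, module encapsulation) disables the feature instead of throwing.
object ReflectionGuard {
  // Assumption: JDK-internal method name; it may be absent or inaccessible on some JDKs.
  private val probe: Option[Method] = Try {
    val m = classOf[ObjectStreamClass].getDeclaredMethod("hasWriteObjectMethod")
    m.setAccessible(true)
    m
  }.toOption

  def enabled: Boolean = probe.isDefined

  /** Some(answer) if reflection works and cls is serializable; None otherwise. */
  def hasCustomWriteObject(cls: Class[_]): Option[Boolean] =
    for {
      m    <- probe
      desc <- Option(ObjectStreamClass.lookup(cls)) // lookup returns null for non-serializable classes
    } yield m.invoke(desc).asInstanceOf[Boolean]
}
```

Callers check `enabled` (or handle `None`) and fall back to the plain exception, so a JDK change degrades the error message rather than the job.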