[HUDI-5681] Fixing Kryo being instantiated w/ invalid SparkConf #7821

Merged
merged 2 commits into from
Feb 2, 2023

Conversation

alexeykudinkin
Contributor

@alexeykudinkin alexeykudinkin commented Feb 1, 2023

Change Logs

This addresses a misconfiguration of the Kryo object used specifically to serialize Spark's internal structures (such as Expressions): previously we were configuring it with a default SparkConf instance, whereas we should have used the one provided by SparkEnv.
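
To illustrate the difference between the two configurations, here is a minimal sketch, not the code from this PR. It assumes the serializer is built from the conf of the active SparkEnv when one exists. The object name `KryoConfSketch`, the helper `resolveConf`, and the fallback to a default SparkConf are hypothetical and exist only for this example.

```scala
import java.nio.ByteBuffer

import org.apache.spark.{SparkConf, SparkEnv}
import org.apache.spark.serializer.{KryoSerializer, SerializerInstance}

// Hypothetical name; this is not the helper added by the PR.
object KryoConfSketch {

  // Prefer the conf of the active SparkEnv, which carries the settings the
  // application was actually started with. SparkEnv.get returns null when no
  // environment has been initialized, hence the Option wrapper. Falling back
  // to a default SparkConf is an assumption made for this sketch.
  private def resolveConf(): SparkConf =
    Option(SparkEnv.get).map(_.conf).getOrElse(new SparkConf())

  // A SerializerInstance should be used by one thread at a time,
  // so keep a separate instance per thread.
  private val serializer = new ThreadLocal[SerializerInstance] {
    override protected def initialValue(): SerializerInstance =
      new KryoSerializer(resolveConf()).newInstance()
  }

  // Serialize an object and copy the resulting buffer into a byte array.
  def toBytes(o: Any): Array[Byte] = {
    val buf = serializer.get.serialize(o)
    val bytes = new Array[Byte](buf.remaining())
    buf.get(bytes)
    bytes
  }

  // Deserialize a byte array produced by toBytes.
  def fromBytes(bytes: Array[Byte]): Any =
    serializer.get.deserialize[Any](ByteBuffer.wrap(bytes))
}
```

The point of the sketch is only where the conf comes from: a `new SparkConf()` created on the spot may not carry Kryo-related settings that were applied to the running application, while `SparkEnv.get.conf` does.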

Impact

Addresses the NPE/ClassCastException that occurs when running Merge Into statements in Spark SQL.

Risk level (write none, low, medium or high below)

Low

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@@ -188,7 +188,6 @@ trait ProvidesHoodieConfig extends Logging {
PRECOMBINE_FIELD.key -> preCombineField,
PARTITIONPATH_FIELD.key -> partitionFieldsStr,
PAYLOAD_CLASS_NAME.key -> payloadClassName,
HoodieWriteConfig.COMBINE_BEFORE_INSERT.key -> String.valueOf(hasPrecombineColumn),
Member

why do we change this file for this PR?

Contributor Author

This PR is stacked on top of another one (for testing); this change will be cleaned up.

private val SERIALIZER_THREAD_LOCAL = new ThreadLocal[SerializerInstance] {
private lazy val conf = {
val conf = Option(SparkEnv.get)
// TODO elaborate
Member

fix comment

@alexeykudinkin changed the title from "[MINOR] Fixing Kryo being instantiated w/ invalid SparkConf" to "[HUDI-5681] Fixing Kryo being instantiated w/ invalid SparkConf" on Feb 1, 2023
Contributor

@yihua yihua left a comment

Looks ok to me. @alexeykudinkin we need to test the PR thoroughly before merging it. @xushiyan @YannByron could you also take another look?

@@ -328,7 +328,7 @@ case class MergeIntoHoodieTableCommand(mergeInto: MergeIntoTable) extends Hoodie
}).toMap
// Serialize the Map[UpdateCondition, UpdateAssignments] to base64 string
val serializedUpdateConditionAndExpressions = Base64.getEncoder
- .encodeToString(SerDeUtils.toBytes(updateConditionToAssignments))
+ .encodeToString(Serializer.toBytes(updateConditionToAssignments))
Contributor

Does this work for all Spark versions?

Contributor Author

What exactly are you referring to?

Comment on lines +472 to +473
private[hudi] object Serializer {

Contributor

Have you tested this on all Spark versions (Spark 2.4, 3.1, 3.2, 3.3) in a cluster environment (multiple nodes)?

Contributor Author

Will check

Contributor Author

Checked Spark 3.1, 3.2 and 3.3, working fine

@hudi-bot

hudi-bot commented Feb 2, 2023

CI report:


@alexeykudinkin alexeykudinkin merged commit e93fbee into apache:master Feb 2, 2023
yihua pushed a commit that referenced this pull request Feb 2, 2023
nsivabalan pushed a commit to nsivabalan/hudi that referenced this pull request Mar 22, 2023
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023
4 participants