Upgrade Delta Lake version to 2.4 for deletion vector support #349
Conversation
Support for Deletion Vectors was added in Delta Lake version 2.4. The upgrade also requires updating the Spark runtime version to 3.4+. In addition to changing the versions of the dependencies, this change incorporates all the backward-incompatible changes in the Delta API:

1. `getSnapshotAt` change: it no longer accepts an optional timestamp (second method argument). This argument was not provided or used by XTable and can safely be dropped from all invocations.
2. `addFile` API change: it now requires information about the deletion vector as a parameter. Since writing deletion vectors is not supported in the current version of XTable, a null is passed to the `addFile` method call.
3. Update-transaction API change: it now requires a Catalyst Expression object, instead of a generic string, to be linked to an update operation. This change replaces the string used by XTable with a Literal expression.
4. `getSnapshot` API change: it no longer requires a timestamp to initialize the current snapshot. This change removes the extra argument from the method invocation in XTable.
5. `DeltaLog` metadata change: the metadata is now available through the `DeltaLog`'s snapshot instance instead of through the `DeltaLog` itself, as in older versions.
6. Change in the update API: it now requires the caller to choose whether defaults should be ignored. Defaults appear to need to be ignored for operations like copy; for most operations the default value of ignore-defaults is false, so that is the value chosen in XTable as well.
7. The updated Spark version requires catalog and SQL extension configurations in the session definition. This change adds these two configs wherever a Spark instance is created for writing Delta Lake commits.
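The catalog and SQL extension settings mentioned in point 7 are the two standard configuration keys from Delta Lake's documented Spark setup; a minimal sketch of the expected values (where exactly XTable sets them in its session builders is an assumption here):

```properties
# Enables Delta's SQL extensions on the Spark session (required for Delta 2.4 on Spark 3.4+)
spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension
# Routes the built-in session catalog through Delta's catalog implementation
spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
```

In Java these would typically be passed via `SparkConf.set(...)` before `SparkSession.builder().config(sparkConf).getOrCreate()` is called.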
@@ -77,8 +77,6 @@ public static void initSpark() {
    spark = SparkSession.builder().config(sparkConf).getOrCreate();
  }

@ParameterizedTest
TODO: Need help in fixing this test. This test is currently failing with an error related to serialization.
Caused by: java.lang.AbstractMethodError: Receiver class org.apache.spark.sql.adapter.Spark3_4Adapter does not define or inherit an implementation of the resolved method 'abstract org.apache.spark.sql.avro.HoodieAvroSerializer createAvroSerializer(org.apache.spark.sql.types.DataType, org.apache.avro.Schema, boolean)' of interface org.apache.spark.sql.hudi.SparkAdapter. at org.apache.hudi.AvroConversionUtils$.createInternalRowToAvroConverter(AvroConversionUtils.scala:59)
I'll take a look at this one but there are more tests failing.
Force-pushed from f11984a to dd72522
Fixes #340