Hi! We’ve noticed that after creating an empty table in the Glue catalog with PyIceberg, the table initially has no snapshots. If we then run a Glue job with a MERGE statement to upsert some CDC records into this table, the job fails with the error: Error Category: UNCLASSIFIED_ERROR; IllegalArgumentException: Cannot parse missing long: current-snapshot-id. A workaround that resolved this was to insert and delete a dummy row after creating the table, but that doesn’t seem right. Is this a Glue / Spark bug, or did we miss something?
Sung:
This is actually an issue with some older Java applications incorrectly assuming that current_snapshot_id is a required field. As @Kevin Liu noted, it is an optional attribute in the Iceberg spec, so a newly created table with no snapshots can legitimately omit it.
In PyIceberg, we do have an environment variable you can set on your application that forces it to write metadata files with '-1' as the current_snapshot_id to work around this issue. It's a hack, but it produces files that are compatible with these older Java applications. Please let me know if that's helpful! #473
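As a rough sketch of the workaround described above: the variable name `PYICEBERG_LEGACY_CURRENT_SNAPSHOT_ID` is an assumption here (the reply does not name it, so please check the PyIceberg configuration docs for your version), and `snapshot_id_status` is a hypothetical helper, not part of PyIceberg. It only inspects the JSON text of a table metadata file so you can see whether an older Java reader is likely to reject it.

```python
import json
import os

# Assumption: this is the environment variable the reply refers to.
# Set it before PyIceberg is imported or writes any table metadata.
os.environ["PYICEBERG_LEGACY_CURRENT_SNAPSHOT_ID"] = "True"


def snapshot_id_status(metadata_json: str) -> str:
    """Classify how a metadata file records current-snapshot-id.

    Hypothetical helper for illustration only.
    """
    metadata = json.loads(metadata_json)
    value = metadata.get("current-snapshot-id")
    if value is None:
        # Field absent or null: allowed by the spec, but the case that
        # older Java readers fail to parse.
        return "missing"
    if value == -1:
        # Placeholder written for an empty table in legacy mode.
        return "legacy-empty"
    return "set"
```

To use it, you could read the table's current metadata JSON file from storage and pass its contents to `snapshot_id_status`. A result of "missing" on a freshly created table would match the failure reported in the Glue job.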
Apache Iceberg version
None
Please describe the bug 🐞
From Slack: the question quoted at the top of this issue.