[HUDI-2757] Implement Hudi AWS Glue sync #5076
A few minor comments. Looks good mostly.
So we can't really write tests for AWSGlueCatalogSync? If we mock, we have to mock pretty much everything, and then there is little point in the test. Is my understanding right?
Mostly minor comments. Once addressed, we can land.
hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogClient.java
}

@Override
public void updateLastCommitTimeSynced(String tableName) {
A general question about updating HOODIE_LAST_COMMIT_TIME_SYNC. In HiveSyncTool, we call hoodieHiveClient.updateLastCommitTimeSynced(tableName) at the end.
But updating partitions and updating the last commit time synced are not atomic, and within updateLastCommitTimeSynced we read the latest commit time from the active timeline and store it as the last commit time synced.
So, in the case of multiple writers, the timeline may have moved between updating partitions and calling updateLastCommitTimeSynced.
Maybe we should file a JIRA and follow up on this logic.
Yes, this should be a follow-up item.
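To make the concern in this thread concrete, here is a minimal sketch of the race and of one possible follow-up fix. It is not Hudi code: `Timeline`, `Catalog`, `syncRacy`, and `syncPinned` are hypothetical stand-ins, and the `Runnable` passed as `syncPartitions` simulates another writer committing while partitions are being synced. The assumed fix is to capture the latest instant before syncing partitions and record that captured value, instead of re-reading the timeline afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Illustration only: every type and method here is a hypothetical stand-in, not a Hudi API. */
final class LastSyncedSketch {

  /** Stand-in for the active timeline: an append-only list of commit instants. */
  static final class Timeline {
    private final List<String> commits = new ArrayList<>();

    synchronized void commit(String instant) {
      commits.add(instant);
    }

    synchronized String latest() {
      return commits.isEmpty() ? null : commits.get(commits.size() - 1);
    }
  }

  /** Stand-in for the catalog table property holding the last synced commit time. */
  static final class Catalog {
    final AtomicReference<String> lastCommitTimeSynced = new AtomicReference<>();
  }

  /** Racy order: sync partitions, then re-read the timeline and record whatever is latest. */
  static void syncRacy(Timeline timeline, Catalog catalog, Runnable syncPartitions) {
    syncPartitions.run();
    catalog.lastCommitTimeSynced.set(timeline.latest());
  }

  /** Pinned order (assumed fix): capture the instant first, sync, then record the captured value. */
  static void syncPinned(Timeline timeline, Catalog catalog, Runnable syncPartitions) {
    String pinned = timeline.latest();
    syncPartitions.run();
    catalog.lastCommitTimeSynced.set(pinned);
  }

  /** Runs both variants; in each, a second writer commits "002" during the partition sync. */
  static String[] demo() {
    Timeline t1 = new Timeline();
    t1.commit("001");
    Catalog racy = new Catalog();
    syncRacy(t1, racy, () -> t1.commit("002"));

    Timeline t2 = new Timeline();
    t2.commit("001");
    Catalog pinned = new Catalog();
    syncPinned(t2, pinned, () -> t2.commit("002"));

    return new String[] {racy.lastCommitTimeSynced.get(), pinned.lastCommitTimeSynced.get()};
  }

  public static void main(String[] args) {
    String[] result = demo();
    System.out.println("racy   -> " + result[0]);
    System.out.println("pinned -> " + result[1]);
  }
}
```

In the racy variant the catalog ends up marked as synced through a commit whose partitions were never synced, so a later incremental sync could skip them. The pinned variant records only the instant that was actually covered.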
LGTM
lgtm. Approved.
Add AWS Glue sync implementation to allow sync to AWS Glue catalog directly via AWS SDK APIs.