[SUPPORT] restart flink job got InvalidAvroMagicException: Not an Avro data file #10285
Comments
The clean function is always there, but it will stop cleaning depending on what you set up. You can check the hoodie timeline for existing inflight/requested compaction metadata files, which live in the timeline folder:
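As a concrete way to do that check, here is a minimal sketch that scans the timeline with the Hadoop FileSystem API. It assumes the default timeline location under `<table path>/.hoodie` and uses a hypothetical table path:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object ListPendingCompactions {
  def main(args: Array[String]): Unit = {
    // Hypothetical table path; substitute your own.
    val tablePath = new Path("hdfs:///warehouse/my_hudi_table")
    // The active timeline lives under <table path>/.hoodie by default.
    val timelineDir = new Path(tablePath, ".hoodie")
    val fs = FileSystem.get(timelineDir.toUri, new Configuration())

    fs.listStatus(timelineDir)
      .map(_.getPath.getName)
      // Pending plans appear as <instant>.compaction.requested / .inflight.
      .filter(_.contains("compaction"))
      .sorted
      .foreach(println)
  }
}
```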
It only triggers once, in the #open method.
@danny0405 I see. So what is the purpose of the hudi sink triggering this clean action when the job starts? I now find that even when I run a compaction service outside the sink job with a clean trigger strategy, the clean action in the #open method still cleans old files, because I didn't set the same trigger strategy there. That impact is not expected.
It is designed for batch execution jobs; maybe we should add back the decision with
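For illustration only (this is not the actual Hudi source), such a decision could look like a flag check in the operator's open(); `cleanAsyncEnabled` and `triggerClean()` are hypothetical stand-ins for the real fields and write-client call:

```scala
import org.apache.flink.api.common.functions.AbstractRichFunction
import org.apache.flink.configuration.Configuration

// Hypothetical sketch of guarding the startup clean behind a config switch.
class GuardedCleanFunction(cleanAsyncEnabled: Boolean) extends AbstractRichFunction {
  override def open(parameters: Configuration): Unit = {
    super.open(parameters)
    // Only fire the eager clean on startup when async clean is enabled,
    // so jobs that delegate cleaning to an external service skip it.
    if (cleanAsyncEnabled) {
      triggerClean()
    }
  }

  // Placeholder for the write client's clean call.
  private def triggerClean(): Unit = ()
}
```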
thx @danny0405
Describe the problem you faced
The hudi sink job cannot restart normally from a checkpoint because of an InvalidAvroMagicException thrown in CleanFunction.
This sink job writes data and generates compaction plans. I set HoodieCleanConfig.AUTO_CLEAN.key() -> "false" and deploy a standalone table compaction server to execute the compaction plans and do the cleaning.
I also wonder why clean commit operations still show up when I disable AUTO_CLEAN:
Here is my hudi sink job config:
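(The exact config block was not preserved. Below is a minimal sketch of writer options matching the description above; the Flink keys compaction.schedule.enabled, compaction.async.enabled, and clean.async.enabled are assumptions to verify against your Hudi version.)

```scala
import org.apache.hudi.config.HoodieCleanConfig

// Illustrative stand-in for the sink job options described above.
val sinkOptions: Map[String, String] = Map(
  // Disable auto clean in the writer, as described above.
  HoodieCleanConfig.AUTO_CLEAN.key() -> "false",
  // Let the writer only schedule compaction plans; execution is offline.
  "compaction.schedule.enabled" -> "true",
  "compaction.async.enabled"    -> "false",
  // Assumed key for the Flink async clean switch.
  "clean.async.enabled"         -> "false"
)
```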
And this is my table compaction server config:
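(Likewise not preserved; here is a minimal sketch of clean-related options an offline compaction/clean service might set. The trigger-strategy keys and values are assumptions to verify against your Hudi version.)

```scala
// Illustrative stand-in for the compaction server options described above.
val compactorOptions: Map[String, String] = Map(
  "hoodie.clean.automatic"          -> "true",        // this service owns cleaning
  "hoodie.clean.trigger.strategy"   -> "NUM_COMMITS", // assumed strategy
  "hoodie.clean.max.commits"        -> "4",
  "hoodie.cleaner.commits.retained" -> "10"
)
```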
Environment Description
Hudi version : 0.14.0
Hadoop version : 3.2.1
Storage (HDFS/S3/GCS..) : hdfs
Running on Docker? (yes/no) : no
Stacktrace