forked from apache/hudi
Disable bootstrap precombine #1
Open
a49a wants to merge 2,284 commits into DTStack:master from a49a:disable-bootstrap-precombine
Conversation
…x type (apache#6406) Co-authored-by: xiaoxingstack <[email protected]>
…instant (apache#6574) * Keep a clustering running at the same time * Simplify filtering logic Co-authored-by: dongsj <[email protected]>
…e#6550) As part of adding support for Spark 3.3 in Hudi 0.12, a lot of the logic from Spark 3.2 module has been simply copied over. This PR is rectifying that by: 1. Creating new module "hudi-spark3.2plus-common" (that is shared across Spark 3.2 and Spark 3.3) 2. Moving shared components under "hudi-spark3.2plus-common"
…apache#6270) Co-authored-by: Volodymyr Burenin <[email protected]> Co-authored-by: Y Ethan Guo <[email protected]>
…the log file to be too large (apache#6602) * hoodie.logfile.max.size does not take effect, causing the log file to be too large Co-authored-by: [email protected] <loukey_7821>
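For context on the config this fix concerns, here is a minimal write-side sketch (not from the PR; assuming a spark-shell session with the Hudi Spark bundle, and a hypothetical table name, path, and schema) showing where hoodie.logfile.max.size would be set on a MERGE_ON_READ write:
// Hypothetical spark-shell sketch: cap Hudi log files at ~256 MB on a MERGE_ON_READ table.
import spark.implicits._
val df = Seq((1, "a", 1000L)).toDF("id", "name", "ts") // toy data, hypothetical schema
df.write.format("hudi").
  option("hoodie.table.name", "example_mor_table").                 // hypothetical table name
  option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
  option("hoodie.datasource.write.recordkey.field", "id").
  option("hoodie.datasource.write.precombine.field", "ts").
  option("hoodie.logfile.max.size", (256L * 1024 * 1024).toString). // the config this fix makes effective
  mode("append").
  save("/tmp/example_mor_table")                                    // hypothetical path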
…ists for MergeOnReadInputFormat#getReader (apache#6678)
Co-authored-by: Y Ethan Guo <[email protected]>
…dd nest type (apache#6486) InternalSchemaChangeApplier#applyAddChange forgets to remove the parent name when calling ColumnAddChange#addColumns
…ayload to avoid schema mismatch (apache#6689)
…6634) * [HUDI-4813] Fix key generator inference not working on the Spark SQL side Co-authored-by: xiaoxingstack <[email protected]>
… MOR snapshot query after delete operations with test (apache#6688) Co-authored-by: Rahil Chertara <[email protected]>
…tinue when multiple cleans are not allowed (apache#6536)
… HoodieLogFileReader (apache#6031) Co-authored-by: Y Ethan Guo <[email protected]>
Co-authored-by: Y Ethan Guo <[email protected]>
…pache#6650) Co-authored-by: yangshuo3 <[email protected]> Co-authored-by: Y Ethan Guo <[email protected]>
…ache#6271) Co-authored-by: Volodymyr Burenin <[email protected]> Co-authored-by: Y Ethan Guo <[email protected]>
…nsert (apache#7396) This PR adjusts NONE sort mode for bulk insert so that, by default, coalesce is not applied, matching the default parquet write behavior. The NONE sort mode still applies coalesce for clustering as the clustering operation relies on the bulk insert and the specified number of output Spark partitions to write a specific number of files.
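As a hedged sketch of the write path this change affects (not from the PR; assuming a spark-shell session with the Hudi Spark bundle, with a hypothetical table name, path, and schema), a bulk insert with the NONE sort mode can be requested like this:
// Hypothetical spark-shell sketch: bulk_insert with the NONE sort mode.
// After this change, NONE no longer coalesces the output partitions by default.
import spark.implicits._
val df = Seq((1, "a", 1000L)).toDF("id", "name", "ts") // toy data, hypothetical schema
df.write.format("hudi").
  option("hoodie.table.name", "example_table").               // hypothetical table name
  option("hoodie.datasource.write.recordkey.field", "id").
  option("hoodie.datasource.write.precombine.field", "ts").
  option("hoodie.datasource.write.operation", "bulk_insert").
  option("hoodie.bulkinsert.sort.mode", "NONE").
  mode("append").
  save("/tmp/example_table")                                  // hypothetical path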
If someone has enabled schema on read by mistake and never actually renamed or dropped a column, it should be feasible to disable schema on read again. This patch fixes that: on both the read and write paths, if the "hoodie.schema.on.read.enable" config is not set, Hudi falls back to the regular code path. It might fail, or users might miss data, if they have performed any irrevocable changes like renames; for the rest, this should work.
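A minimal read-side sketch of the fallback described above (assuming a spark-shell session; the path is hypothetical): leaving hoodie.schema.on.read.enable unset, or setting it to false, keeps reads on the regular code path.
// Hypothetical spark-shell sketch: read without schema on read, falling back to the regular path.
val df = spark.read.format("hudi").
  option("hoodie.schema.on.read.enable", "false"). // per this patch, omitting the key has the same effect
  load("/tmp/example_table")                       // hypothetical path
df.show()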
Before the patch, when there was a partial failover within the write tasks, the write task's current instant was initialized as the latest inflight instant; the write task then waited for a new instant to write with, so it hung and failed over continuously. For a task recovered from failover (with an attempt number greater than 0), the latest inflight instant can actually be reused, and the intermediate data files can be cleaned up with MARKER files post commit.
Make the spark3.3 profile upgrade from Spark 3.3.0 to 3.3.1 (HUDI-4871). Make the spark3.2 profile upgrade from Spark 3.2.1 to 3.2.3 (HUDI-4411).
…ig (apache#7069) Revert to FSUtils.getAllPartitionPaths to load partitions properly. Details in apache#6016 (comment). Only applied to 0.12.2 to keep behavior consistent across patch releases.
* [HUDI-5007] Prevent Hudi from reading the entire timeline when performing a LATEST streaming read (apache#6920) (cherry picked from commit 6baf733) * [HUDI-5228] Flink table service job fs view conf overwrites the one of the writing job (apache#7214) (cherry picked from commit dc5cc08) Co-authored-by: voonhous <[email protected]>
…ata (apache#7320) (apache#7462) Co-authored-by: just-JL <[email protected]>
…pache#7464) * [HUDI-5366] Closing metadata writer from within writeClient (apache#7437) * Closing metadata writer from within writeClient * Close metadata writer in flink client Co-authored-by: Sagar Sumit <[email protected]> * Fixing build failure * Fixing flink metadata writer usages Co-authored-by: Sagar Sumit <[email protected]>
…ark (apache#7399) (apache#7465) (cherry picked from commit 86d1e39)
… to HoodieROTablePathFilter (apache#7088) * Add the feature flag back to disable HoodieFileIndex and fall back to HoodieROTablePathFilter * Turn off hoodie.file.index.enable by default to test CI * Add tests for Spark datasource with the fallback to HoodieROTablePathFilter
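A hedged read-side sketch of the feature flag named above (assuming a spark-shell session; the path is hypothetical): turning hoodie.file.index.enable off makes Spark fall back to HoodieROTablePathFilter instead of HoodieFileIndex.
// Hypothetical spark-shell sketch: disable HoodieFileIndex and use the HoodieROTablePathFilter fallback.
val df = spark.read.format("hudi").
  option("hoodie.file.index.enable", "false").
  load("/tmp/example_table") // hypothetical path
df.count()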
…ableFileIndex (apache#7488) Currently, on the reader or query engine side, direct file listing on the file system is used by default, as indicated by HoodieMetadataConfig.DEFAULT_METADATA_ENABLE_FOR_READERS (=false). Without an explicit hoodie.metadata.enable config, metadata-table-based file listing is disabled. However, BaseHoodieTableFileIndex, the common File Index implementation used by the Trino Hive connector, does not respect this default. This leads to a query-latency regression in the Trino Hive connector, due to the way the connector is integrated with the Input Format and the File Index with the metadata table enabled. This PR fixes BaseHoodieTableFileIndex to respect the behavior defined by HoodieMetadataConfig.DEFAULT_METADATA_ENABLE_FOR_READERS, i.e., metadata-table-based file listing is disabled by default; it is only enabled when hoodie.metadata.enable is set to true and the files partition of the metadata table is ready for read based on the Hudi table config. Impact: this mitigates the query-latency regression in the Trino Hive connector and fixes the read-side file-listing behavior. Tested that, with this PR, HoodieParquetInputFormat no longer reads the metadata table for file listing by default. Co-authored-by: Sagar Sumit <[email protected]> Co-authored-by: Alexey Kudinkin <[email protected]>
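To illustrate the reader-side default this fix restores (a sketch with a hypothetical path, assuming a spark-shell session; not from the PR): metadata-table-based listing stays off unless hoodie.metadata.enable is set explicitly.
// Hypothetical spark-shell sketch: opt in to metadata-table-based file listing on the read path.
val df = spark.read.format("hudi").
  option("hoodie.metadata.enable", "true"). // default is false for readers, per HoodieMetadataConfig
  load("/tmp/example_table")                // hypothetical path
df.count()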
apache#7493) - This PR falls back to the original code path using the fs view cache as in 0.10.1 or earlier, instead of creating a file index. - Query engines using the initial InputFormat-based integration will not use the file index; instead, they fetch file statuses directly from the fs view cache.
Tips
What is the purpose of the pull request
(For example: This pull request adds a quick-start document.)
Brief change log
(for example:)
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking them into sub-tasks under an umbrella JIRA.