Commit
[DOCS] Update Indexing and File Layout pages
Summary
- Update page to reflect all index types
- Update page to add configs and links
Showing 3 changed files with 83 additions and 16 deletions.
@@ -1,16 +1,20 @@
 ---
 title: File Layouts
-toc: true
+toc: false
 ---

-The following describes the general file layout structure for Apache Hudi
+The following describes the general file layout structure for Apache Hudi. Please refer to the **[tech spec](https://hudi.apache.org/tech-specs#file-layout-hierarchy)** for a more detailed description of the file layouts.
 * Hudi organizes data tables into a directory structure under a base path on a distributed file system
 * Tables are broken up into partitions
 * Within each partition, files are organized into file groups, uniquely identified by a file ID
 * Each file group contains several file slices
-* Each slice contains a base file (*.parquet) produced at a certain commit/compaction instant time, along with set of log files (*.log.*) that contain inserts/updates to the base file since the base file was produced.
+* Each slice contains a base file (*.parquet/*.orc) (defined by the config [hoodie.table.base.file.format](https://hudi.apache.org/docs/next/configurations/#hoodietablebasefileformat)) produced at a certain commit/compaction instant time, along with a set of log files (*.log.*) that contain inserts/updates to the base file since the base file was produced.

 Hudi adopts Multiversion Concurrency Control (MVCC), where [compaction](/docs/next/compaction) action merges logs and base files to produce new
 file slices and [cleaning](/docs/next/hoodie_cleaner) action gets rid of unused/older file slices to reclaim space on the file system.

-![Partition On HDFS](/assets/images/hudi_partitions_HDFS.png)
+![Partition On HDFS](/assets/images/hudi_partitions_HDFS.png)
+
+### Configs
+
+See the [layout configs](https://hudi.apache.org/docs/next/configurations/#Layout-Configs) for additional settings that control storage layout and data distribution, which define how the files are organized within a table.
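
For reviewers who want a concrete picture of the hierarchy the File Layouts page describes (partitions, file groups identified by a file ID, file slices made of a base file plus log files), here is a minimal Python sketch. It is a toy in-memory model, not Hudi's API: the class names, the `compact` and `clean` helpers, the file names, and the short instant strings are all hypothetical, and it assumes instant times are fixed-width strings so that plain string comparison orders them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FileSlice:
    """One version of a file group: a base file plus the log files written after it."""
    base_instant: str  # instant time of the commit/compaction that produced the base file
    base_file: str     # hypothetical name; a *.parquet or *.orc file in the real layout
    log_files: List[str] = field(default_factory=list)


@dataclass
class FileGroup:
    """All slices that share one file ID inside a partition."""
    file_id: str
    slices: List[FileSlice] = field(default_factory=list)

    def latest_slice(self) -> Optional[FileSlice]:
        # Assumes fixed-width instant strings, so string order equals time order.
        return max(self.slices, key=lambda s: s.base_instant, default=None)

    def compact(self, instant: str, base_format: str = "parquet") -> FileSlice:
        # Toy compaction: logs and base file are "merged" into a new slice
        # that starts with a fresh base file and no log files.
        new_slice = FileSlice(instant, f"{self.file_id}_{instant}.{base_format}")
        self.slices.append(new_slice)
        return new_slice

    def clean(self, retain: int = 1) -> List[FileSlice]:
        # Toy cleaning: keep only the newest `retain` slices, return the dropped ones.
        ordered = sorted(self.slices, key=lambda s: s.base_instant)
        cut = max(len(ordered) - retain, 0)
        removed, self.slices = ordered[:cut], ordered[cut:]
        return removed


# table (base path) -> partition path -> file ID -> file group
table: Dict[str, Dict[str, FileGroup]] = {"2023/01/01": {"fg-1": FileGroup("fg-1")}}
group = table["2023/01/01"]["fg-1"]
group.slices.append(
    FileSlice("001", "fg-1_001.parquet", ["fg-1_001.log.1", "fg-1_001.log.2"])
)
group.compact("002")
removed = group.clean(retain=1)
print(group.latest_slice().base_file)   # fg-1_002.parquet
print([s.base_file for s in removed])   # ['fg-1_001.parquet']
```

Because old slices stay in place until `clean` runs, a reader holding the slice at instant `001` is not disturbed when compaction adds the slice at `002`, which is the MVCC behaviour the page refers to.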