[DOCS] Update Indexing page with all index types and file layout page #9346

Merged: 1 commit, merged Aug 11, 2023

bhasudha (Contributor) commented Aug 2, 2023

Change Logs

Update indexing page and file layout page

Impact

Docs changes

Risk level (write none, low, medium, or high below)

Low

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default values of existing configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here, and follow the instructions to make changes to the website.

Contributor's checklist

  • Read through the contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

bhasudha (Contributor Author) commented Aug 2, 2023

[Screenshot 2023-08-02 at 11 37 31 AM]
Tested the page locally.
[Screenshot 2023-08-02 at 11 37 57 AM]
[Screenshot 2023-08-02 at 11 37 45 AM]

bhasudha changed the title from "[DOCS] Update Indexing page with all index types" to "[DOCS] Update Indexing page with all index types and file layout page" on Aug 2, 2023
Review thread on website/docs/indexing.md (outdated, resolved)
bhasudha force-pushed the asf-site-docs-1 branch 2 times, most recently from 89e23ec to 3bba237 on August 3, 2023 20:49
| hoodie.simple.index.update.partition.path | true (Optional) | Similar to key `hoodie.bloom.index.update.partition.path`. Only applies if index type is GLOBAL_SIMPLE. When set to true, an update that changes the partition path of an existing record inserts the incoming record into the new partition and deletes the original record from the old partition. When set to false, the original record is only updated in the old partition.<br /><br />`Config Param: SIMPLE_INDEX_UPDATE_PARTITION_PATH_ENABLE` |
| hoodie.hbase.index.update.partition.path | false (Optional) | Only applies if index type is HBASE. When an existing record is upserted to a partition different from the one it has in storage, setting this config to true deletes the old record from the old partition and inserts it as a new record in the new partition.<br /><br />`Config Param: UPDATE_PARTITION_PATH_ENABLE` |
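Both `*.update.partition.path` rows describe the same choice for a global index: when a known record key arrives with a different partition path, either move the record or update it where it already lives. The Python sketch below models that behaviour on a hypothetical in-memory table (a dict of partition path → records). The helpers `find_partition` and `global_upsert` are illustrative assumptions, not part of Hudi's API, and the model ignores everything except where the record ends up.

```python
from typing import Dict, Optional

# Hypothetical in-memory table: partition path -> {record key -> payload}.
Table = Dict[str, Dict[str, str]]


def find_partition(table: Table, key: str) -> Optional[str]:
    """Global lookup: return the partition currently holding `key`, or None."""
    for partition, records in table.items():
        if key in records:
            return partition
    return None


def global_upsert(table: Table, key: str, partition: str, payload: str,
                  update_partition_path: bool) -> None:
    """Apply one upsert under a global index (simplified sketch of the
    documented *.update.partition.path behaviour)."""
    existing = find_partition(table, key)
    if existing is None:
        # Unknown key: plain insert into the incoming partition.
        table.setdefault(partition, {})[key] = payload
    elif existing == partition or not update_partition_path:
        # Same partition, or flag disabled: update in the old partition only.
        table[existing][key] = payload
    else:
        # Flag enabled and partition changed: delete from old, insert into new.
        del table[existing][key]
        table.setdefault(partition, {})[key] = payload


table = {"2023/08/01": {"id1": "v1"}}
global_upsert(table, "id1", "2023/08/02", "v2", update_partition_path=True)
print(table)  # {'2023/08/01': {}, '2023/08/02': {'id1': 'v2'}}
```

With `update_partition_path=False`, the same call would leave `id1` in `2023/08/01` with payload `v2`, matching the "only updated in the old partition" wording in the table.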

#### Flink based configs
bhasudha (Contributor Author): @danny0405 can you help review this part?

bhasudha (Contributor Author) commented Aug 8, 2023

After splitting the configs into Spark-based and Flink-based ones, the page looks like this locally:
[Screenshot 2023-08-08 at 2 48 23 PM]

Seven review threads on website/docs/indexing.md (all resolved, six marked outdated)
This is based on our experience, and you should carefully evaluate whether the same strategies are best for your workloads.

## Indexing Strategies
Contributor:
I do not want to expand the scope here, so I will let you make the call. But we should add a streaming-writes workload type and call out that RLI is the best fit for it.
For example, if the table size is 1 TB but incremental ingestion brings in only a small fraction of the data (say 1%, roughly 10 GB or less), RLI will give better performance than any other global index option.
Also, for update-heavy workloads that need a global index, RLI will outperform the other indexes.
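The reasoning in this comment can be made concrete with a deliberately simplified cost model. This is an assumption for illustration, not a Hudi benchmark: a GLOBAL_SIMPLE-style index reads every existing record key in the table and joins the incoming keys against them, so its lookup work grows with table size, whereas a record-level index (RLI) looks up each incoming key directly, so its work grows only with batch size. The function names and record counts below are hypothetical.

```python
def keys_read_global_simple(table_records: int, batch_records: int) -> int:
    """Toy model: scan every existing key in the table, then join the batch against it."""
    return table_records + batch_records


def keys_read_record_level(table_records: int, batch_records: int) -> int:
    """Toy model: one index lookup per incoming key; table size is ignored."""
    return batch_records


# Hypothetical table of 1 billion records receiving a 1% incremental batch.
table = 1_000_000_000
batch = table // 100  # 10,000,000 records

ratio = keys_read_global_simple(table, batch) / keys_read_record_level(table, batch)
print(f"global-simple reads ~{ratio:.0f}x more keys than record-level lookups")
```

Real costs also depend on file sizes, caching, and how the index itself is laid out, so treat this only as intuition for why RLI suits small incremental batches on large tables.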

bhasudha (Contributor Author): @nsivabalan Agreed. We should consider a second docs PR for this part, since it needs more substantiation. I can collaborate with you next week to add it.

Contributor: Sure.

Summary
- Updated page to reflect all index types
- Updated page to add configs and links
bhasudha merged commit b8a0e75 into apache:asf-site on Aug 11, 2023
1 check passed