Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(storage): store compaction group id in per table info #17125

Merged
merged 59 commits into from
Jun 13, 2024

Conversation

wenym1
Copy link
Contributor

@wenym1 wenym1 commented Jun 5, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

After we introduce per state table info in hummock version, we can further store the corresponding compaction group id of a table inside the per table info. Member table ids in each Levels compaction group and related fields in version delta will be deprecated, so that we don't have to maintain the consistency between table info map and compaction group member table ids.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Base automatically changed from yiming/hummock-snapshot-group to main June 5, 2024 11:50
@wenym1 wenym1 marked this pull request as ready for review June 5, 2024 15:34
Copy link
Collaborator

@hzxa21 hzxa21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +86 to +88
option deprecated = true;
repeated uint32 table_ids_add = 1 [deprecated = true];
repeated uint32 table_ids_remove = 2 [deprecated = true];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean that we can completely deprecate GroupMetaChange in GroupDelta?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because we won't update the member_table_ids anymore.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add deprecated in L107 then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added deprecated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we miss [deprecated = true] in L107 for GroupMetaChange.

);
} else if let Some(group_change) = &summary.group_table_change {
// TODO: may deprecate this branch? This enum variant is not created anywhere
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed. @Li0k@Little-Wallace What do you think? DO we still need to keep this branch for future usage?

@@ -894,6 +927,7 @@ pub fn build_initial_compaction_group_levels(
vnode_partition_count: 0,
});
}
#[expect(deprecated)] // for backward-compatibility of previous hummock version delta
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we planning to deprecate the whole Levels struct? I thought only member_table_ids will be deprecated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, only the field is deprecated. The annotation is applied over the whole struct because we can't apply on a single fie.d

Comment on lines +86 to +88
option deprecated = true;
repeated uint32 table_ids_add = 1 [deprecated = true];
repeated uint32 table_ids_remove = 2 [deprecated = true];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add deprecated in L107 then?

@@ -395,7 +378,12 @@ impl HummockManager {
let group = CompactionGroupInfo {
id: levels.group_id,
parent_id: levels.parent_group_id,
member_table_ids: levels.member_table_ids.clone(),
member_table_ids: current_version
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: how about sorting it before returning to keep the output consistent with what we did prior to this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we may somehow depend on that member_table_ids is sorted, I have switched to use BtreeSet instead of HashSet so that we don't need to explicitly sort the collected vector.

@@ -337,7 +337,12 @@ pub(super) fn calc_new_write_limits(
new_write_limits.insert(
*id,
WriteLimit {
table_ids: levels.member_table_ids.clone(),
table_ids: version
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto: how about sorting the table_ids here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to btreeset already.

Comment on lines 41 to 42
compaction_group_member_tables: Arc<HashMap<CompactionGroupId, HashSet<TableId>>>,
table_compaction_group_id: Arc<HashMap<TableId, CompactionGroupId>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use Arc here? I don't see any caller of fn compaction_group_member_tables or table_compaction_group_id holding the Arc. Using Arc means we need to do a full in memory index rebuild instead of maintain the index incrementally on apply_delta.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I have remove maintaining table_compaction_group_id and change to rebuild it when used, which is the same as before this PR.

For compaction_group_member_tables , I have removed the Arc around it and changed to incrementally maintain it. A debug_assert_eq is applied to compare with the fully rebuilt version to ensure correctness.

Copy link
Contributor Author

@wenym1 wenym1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have addressed the comment. @hzxa21 PTAL

Comment on lines +86 to +88
option deprecated = true;
repeated uint32 table_ids_add = 1 [deprecated = true];
repeated uint32 table_ids_remove = 2 [deprecated = true];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added deprecated.

@@ -395,7 +378,12 @@ impl HummockManager {
let group = CompactionGroupInfo {
id: levels.group_id,
parent_id: levels.parent_group_id,
member_table_ids: levels.member_table_ids.clone(),
member_table_ids: current_version
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we may somehow depend on that member_table_ids is sorted, I have switched to use BtreeSet instead of HashSet so that we don't need to explicitly sort the collected vector.

@@ -337,7 +337,12 @@ pub(super) fn calc_new_write_limits(
new_write_limits.insert(
*id,
WriteLimit {
table_ids: levels.member_table_ids.clone(),
table_ids: version
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to btreeset already.

Comment on lines 41 to 42
compaction_group_member_tables: Arc<HashMap<CompactionGroupId, HashSet<TableId>>>,
table_compaction_group_id: Arc<HashMap<TableId, CompactionGroupId>>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I have remove maintaining table_compaction_group_id and change to rebuild it when used, which is the same as before this PR.

For compaction_group_member_tables , I have removed the Arc around it and changed to incrementally maintain it. A debug_assert_eq is applied to compare with the fully rebuilt version to ensure correctness.

Copy link
Collaborator

@hzxa21 hzxa21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the PR

@wenym1 wenym1 added this pull request to the merge queue Jun 13, 2024
Merged via the queue into main with commit cccf946 Jun 13, 2024
32 of 33 checks passed
@wenym1 wenym1 deleted the yiming/compaction-group-in-table-info branch June 13, 2024 10:14
@zwang28 zwang28 mentioned this pull request Jun 14, 2024
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants