Adds max_shard_size parameter to shrink API #2519

Merged: 8 commits, Feb 2, 2023

Conversation

kolchfa-aws
Collaborator

Adds max_shard_size parameter to shrink API

Expands #2352
Fixes #2044

Checklist

  • [x] By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developer Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

kolchfa-aws self-assigned this Jan 30, 2023
kolchfa-aws requested a review from a team as a code owner January 30, 2023 15:24
@kolchfa-aws
Collaborator Author

@gaobinlong: I've created a section for the max_shard_size parameter. Could you review it for technical accuracy? Thanks!

@gaobinlong
Contributor

@kolchfa-aws Here is my comment: "a total of 400 GB of memory" may not be accurate. Shard size means the storage size, that is, the disk usage, so can you change it to "a total of 400 GB of storage"?

Signed-off-by: Fanit Kolchina <[email protected]>
kolchfa-aws added the backport 2.0, backport 2.1, backport 2.2, backport 2.3, backport 2.4, and backport 2.5 labels Feb 2, 2023
@kolchfa-aws
Collaborator Author

@gaobinlong Done, thank you for the review!


The `max_shard_size` parameter specifies the maximum size of a primary shard in the target index. OpenSearch uses `max_shard_size` and the total storage for all primary shards in the source index to calculate the number of primary shards and their size for the target index.

The primary shard count of the target index is the lowest factor of the source index's primary shard count, for which the shard size does not exceed `max_shard_size`. Consider the following example. Let's say the source index has 8 primary shards and they occupy a total of 400 GB of storage. If `max_shard_size` is equal to 150 GB, OpenSearch calculates the number of primary shards in the target index using the following algorithm:
Contributor

@carolxob Feb 2, 2023

Suggested change
- The primary shard count of the target index is the lowest factor of the source index's primary shard count, for which the shard size does not exceed `max_shard_size`. Consider the following example. Let's say the source index has 8 primary shards and they occupy a total of 400 GB of storage. If `max_shard_size` is equal to 150 GB, OpenSearch calculates the number of primary shards in the target index using the following algorithm:
+ The primary shard count of the target index is the lowest factor of the source index's primary shard count, for which the shard size does not exceed `max_shard_size`. As an example, the source index has eight primary shards and they occupy a total of 400 GB of storage. If `max_shard_size` is equal to 150 GB, OpenSearch calculates the number of primary shards in the target index using the following algorithm:

Collaborator Author

@natebower Could you provide guidance on numerals vs spelled out numbers? I thought that in technical texts we use numerals, but please let me know if it's not true.

Collaborator

Spell out cardinal numbers from 1 to 9. For example, one NAT instance. Use numerals for cardinal numbers 10 and higher. Spell out ordinal numbers: first, second, and so on. In a series that includes numbers 10 or higher, use numerals for all. In this case, we should use 8 because 400 also appears in the sentence.

Signed-off-by: Fanit Kolchina <[email protected]>
1. Calculate the minimum number of primary shards as 400/150, rounded to the nearest whole integer. The minimum number of primary shards is 3.
1. Calculate the number of primary shards as the lowest factor of 8 that is greater than 3. The number of primary shards is 4.
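
The two steps above can be sketched in Python. This is a minimal illustration, not the OpenSearch implementation: the function name `target_primary_shards` and its parameters are hypothetical, sizes are plain numbers in GB, and the sketch assumes that the division is rounded up and that a factor equal to the minimum is acceptable, neither of which the text above spells out.

```python
import math


def target_primary_shards(source_shards: int,
                          total_storage_gb: float,
                          max_shard_size_gb: float) -> int:
    """Hypothetical helper: estimate the target index's primary shard count."""
    if source_shards < 1 or max_shard_size_gb <= 0:
        raise ValueError("source_shards must be >= 1 and max_shard_size_gb > 0")

    # Step 1: fewest shards that keep every shard at or under max_shard_size.
    # Assumption: the quotient is rounded up, so 400/150 becomes 3.
    min_shards = max(1, math.ceil(total_storage_gb / max_shard_size_gb))

    # A shrink never increases the shard count, so cap at the source count.
    if min_shards >= source_shards:
        return source_shards

    # Step 2: smallest factor of the source count that is at least min_shards.
    for candidate in range(min_shards, source_shards + 1):
        if source_shards % candidate == 0:
            return candidate
    return source_shards


# The example from the text: 8 source shards, 400 GB total, 150 GB maximum.
print(target_primary_shards(8, 400, 150))
```

For the example in the text, 3 is not a factor of 8, so the search moves on to 4, which matches the result stated in step 2.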

The maximum number of primary shards for the target index is equal to the number of primary shards in the source index because the shrink operation is used to reduce the primary shard count. As an example, consider the source index with 5 primary shards that occupy a total of 600 GB of memory. If `max_shard_size` is 100 GB, the minimum number of primary shards is 600/100, which is 6. However, because the number of primary shards in the source index is lower than 6, the number of primary shards in the target index is set to 5.
Contributor

@carolxob Feb 2, 2023

Suggested change
- The maximum number of primary shards for the target index is equal to the number of primary shards in the source index because the shrink operation is used to reduce the primary shard count. As an example, consider the source index with 5 primary shards that occupy a total of 600 GB of memory. If `max_shard_size` is 100 GB, the minimum number of primary shards is 600/100, which is 6. However, because the number of primary shards in the source index is lower than 6, the number of primary shards in the target index is set to 5.
+ The maximum number of primary shards for the target index is equal to the number of primary shards in the source index because the shrink operation is used to reduce the primary shard count. As an example, consider the source index with five primary shards that occupy a total of 600 GB of memory. If `max_shard_size` is 100 GB, the minimum number of primary shards is 600/100, which is six. However, because the number of primary shards in the source index is lower than six, the number of primary shards in the target index is set to five.

Contributor

@carolxob left a comment

LGTM with very minor suggestions.

Contributor

@ariamarble left a comment

LGTM

Contributor

@cwillum left a comment

Looks good, with a few questions/comments.


The `max_shard_size` parameter specifies the maximum size of a primary shard in the target index. OpenSearch uses `max_shard_size` and the total storage for all primary shards in the source index to calculate the number of primary shards and their size for the target index.

The primary shard count of the target index is the lowest factor of the source index's primary shard count, for which the shard size does not exceed `max_shard_size`. Consider the following example. Let's say the source index has 8 primary shards and they occupy a total of 400 GB of storage. If `max_shard_size` is equal to 150 GB, OpenSearch calculates the number of primary shards in the target index using the following algorithm:
Contributor

  1. Are these paragraphs under the heading indented? Extra space here?
  2. "Smallest" factor sounds more natural to me than "lowest" factor. But this might be a convention in mathematics and widely accepted. Just asking.
  3. "... of the source index's primary shard count, whose shard size should not [will not?] exceed max_shard_size." Does that mess up the meaning?

Collaborator Author

  1. Removed extra space, thank you. Does not affect the rendering, but still a good call.
  2. Agreed. Smallest is better.
  3. I feel like "whose" makes it less clear because it's not clear what it refers to.

1. Calculate the minimum number of primary shards as 400/150, rounded to the nearest whole integer. The minimum number of primary shards is 3.
1. Calculate the number of primary shards as the lowest factor of 8 that is greater than 3. The number of primary shards is 4.
Contributor

"Smallest" versus "lowest" for factor (again, I admit I may be out of the loop on this).
I like having this example here.

The maximum number of primary shards for the target index is equal to the number of primary shards in the source index because the shrink operation is used to reduce the primary shard count. As an example, consider the source index with 5 primary shards that occupy a total of 600 GB of storage. If `max_shard_size` is 100 GB, the minimum number of primary shards is 600/100, which is 6. However, because the number of primary shards in the source index is lower than 6, the number of primary shards in the target index is set to 5.
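
The cap in this example can be written out as a short worked calculation. The symbols are introduced here for illustration only and do not appear in the PR: $N_s$ is the source index's primary shard count, $S$ is the total storage of its primary shards, $M$ is `max_shard_size`, and $N_t$ is the target index's primary shard count. Rounding the quotient up is an assumption.

```latex
\[
N_{\min} = \left\lceil \frac{S}{M} \right\rceil
         = \left\lceil \frac{600\ \text{GB}}{100\ \text{GB}} \right\rceil
         = 6,
\qquad
N_{\min} = 6 > N_s = 5
\;\Longrightarrow\;
N_t = N_s = 5 .
\]
```

Under these assumptions each of the five target shards holds about 120 GB, which is above the 100 GB limit, so in this case `max_shard_size` cannot be satisfied and the source shard count acts as the upper bound.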
Contributor

"... because the number of primary shards in the source index is smaller than 6, ..."
Number being smaller.

Signed-off-by: Fanit Kolchina <[email protected]>
Collaborator

@natebower left a comment

@kolchfa-aws Just a few small changes and one comment. Thanks!

_api-reference/index-apis/shrink-index.md: 3 outdated review threads, resolved
kolchfa-aws merged commit 6e120ec into main Feb 2, 2023
@opensearch-trigger-bot

The backport to 2.0 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.0 2.0
# Navigate to the new working tree
cd .worktrees/backport-2.0
# Create a new branch
git switch --create backport/backport-2519-to-2.0
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 6e120ec4b8d6ff03b0706f56b0106a16f1ef9b42
# Push it to GitHub
git push --set-upstream origin backport/backport-2519-to-2.0
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.0

Then, create a pull request where the base branch is 2.0 and the compare/head branch is backport/backport-2519-to-2.0.

@opensearch-trigger-bot

The backport to 2.1 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.1 2.1
# Navigate to the new working tree
cd .worktrees/backport-2.1
# Create a new branch
git switch --create backport/backport-2519-to-2.1
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 6e120ec4b8d6ff03b0706f56b0106a16f1ef9b42
# Push it to GitHub
git push --set-upstream origin backport/backport-2519-to-2.1
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.1

Then, create a pull request where the base branch is 2.1 and the compare/head branch is backport/backport-2519-to-2.1.

@opensearch-trigger-bot

The backport to 2.2 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.2 2.2
# Navigate to the new working tree
cd .worktrees/backport-2.2
# Create a new branch
git switch --create backport/backport-2519-to-2.2
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 6e120ec4b8d6ff03b0706f56b0106a16f1ef9b42
# Push it to GitHub
git push --set-upstream origin backport/backport-2519-to-2.2
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.2

Then, create a pull request where the base branch is 2.2 and the compare/head branch is backport/backport-2519-to-2.2.

opensearch-trigger-bot bot pushed a commit that referenced this pull request Feb 2, 2023
* Adds max_shard_size parameter to shrink API

Signed-off-by: Fanit Kolchina <[email protected]>

* Implemented tech review comment

Signed-off-by: Fanit Kolchina <[email protected]>

* One more rewording

Signed-off-by: Fanit Kolchina <[email protected]>

* Implemented doc review comments

Signed-off-by: Fanit Kolchina <[email protected]>

* Update _api-reference/index-apis/shrink-index.md

Co-authored-by: Nathan Bower <[email protected]>

* Update _api-reference/index-apis/shrink-index.md

Co-authored-by: Nathan Bower <[email protected]>

* Update _api-reference/index-apis/shrink-index.md

Co-authored-by: Nathan Bower <[email protected]>

* Implemented editorial comments

Signed-off-by: Fanit Kolchina <[email protected]>

---------

Signed-off-by: Fanit Kolchina <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
(cherry picked from commit 6e120ec)
opensearch-trigger-bot bot pushed a commit that referenced this pull request Feb 2, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Feb 2, 2023
kolchfa-aws added a commit that referenced this pull request Feb 2, 2023
kolchfa-aws added a commit that referenced this pull request Feb 2, 2023
kolchfa-aws added a commit that referenced this pull request Feb 2, 2023
Naarcha-AWS pushed a commit that referenced this pull request Feb 2, 2023
kolchfa-aws deleted the max-shard-size branch March 28, 2024 21:53
Labels
backport 2.0, backport 2.1, backport 2.2, backport 2.3, backport 2.4, backport 2.5
Development

Successfully merging this pull request may close these issues.

[DOC] Add max_shard_size parameter for Shrink API
6 participants