Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Health api memory optimisation #15492

Merged

Conversation

SwethaGuptha
Copy link
Contributor

@SwethaGuptha SwethaGuptha commented Aug 29, 2024

Description

#11684 - on a large cluster the memory allocation of TransportClusterHealthAction is around 33%. Memory/compute requirement of TransportClusterHealthAction is proportional to the number of indices and shards on the cluster. Transport ClusterHealthResponse maintains health objects for individual indices and shards that can increase the memory allocation of this with the growth of indices and shards count. At rest layer, there is a room for optimization for _cluster/health API when request is at cluster/indices level by performing inline aggregations and not creating and maintaing shard health objects.

Changes:

  • Introducing a flag honourClusterHealthLevel in ClusterHealthRequest to identify if the request param level should be honored or not when computing health response in TransportClusterHealthAction - Currently level param is only used at REST layer for generating response from the transport response based on the level requested (usage) and has no meaning at transport layer. This flag will help TransportClusterHealthAction in differentiating if indices/shards level health information is required in the transport response or not. By default this flag is set to false to keep the change backward compatible.
  • Initializing ClusterIndexHealth without creating the ClusterShardHealth objects per shard when requested level for cluster health is not at shards level, health aggregation are made inline.
  • ClusterHealthResponse computes the delayed unassigned shards information by iterating over all shards of the cluster(reference).This is avoided by computing the delayed unassigned shards along with computation of other shard states in ClusterStateHealth.
  • Using index based iteration for iterating over ShardRoutings of a shard. For each loop iteration uses an iterator under the hood and since the ShardRouting list maintained in the IndexShardRoutingTable is un-modifiable, a new iterator object is created on the heap.

Memory allocation 5 min profile of an idle cluster before and after optimisation
Before Optimisation
Base Profile
After Optimisation
Memory Optimisation Profile

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

❌ Gradle check result for ec0ffc3: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 4ed3b18: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Sep 2, 2024

❌ Gradle check result for ebadf42: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@gbbafna
Copy link
Collaborator

gbbafna commented Sep 3, 2024

Introducing a flag honourClusterHealthLevel in ClusterHealthRequest to identify if the request param level should be honored or not when computing health response in TransportClusterHealthAction - Currently level param is only used at REST layer for generating response from the transport response based on the level requested (usage) and has no meaning at transport layer. This flag will help TransportClusterHealthAction in differentiating if indices/shards level health information is required in the transport response or not. By default this flag is set to false to keep the change backward compatible.

How are breaking backward compatibility when the flag is true ? Isn't this construct only used in Transport and not in Rest ?

Copy link
Member

@sumitasr sumitasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit tests are missing for all the changes in source code. Can we please cover those for files where we have added custom logic to iterate and compute health?

UTs covering corner cases will help i guess if you can check on coverage once.

Copy link
Contributor

github-actions bot commented Sep 3, 2024

❌ Gradle check result for 103f468: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@SwethaGuptha SwethaGuptha reopened this Sep 3, 2024
Copy link
Contributor

github-actions bot commented Sep 3, 2024

❌ Gradle check result for 19e5625: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Sep 3, 2024

❌ Gradle check result for 19e5625: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Sep 3, 2024

❌ Gradle check result for 3817fcd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@SwethaGuptha
Copy link
Contributor Author

Unit tests are missing for all the changes in source code. Can we please cover those for files where we have added custom logic to iterate and compute health?

UTs covering corner cases will help i guess if you can check on coverage once.

Added UTs.

Copy link
Contributor

github-actions bot commented Sep 4, 2024

❌ Gradle check result for d5c5869: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Sep 4, 2024

❌ Gradle check result for 77338be: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Sep 6, 2024

❌ Gradle check result for 75caa4a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@SwethaGuptha
Copy link
Contributor Author

❌ Gradle check result for 75caa4a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Flaky test #15795

Copy link
Contributor

github-actions bot commented Sep 6, 2024

✅ Gradle check result for 4c9c753: SUCCESS

@gbbafna gbbafna merged commit 89915b6 into opensearch-project:main Sep 6, 2024
33 of 34 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-15492-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 89915b65dc51126189dd42d74c82d2cbac0cb154
# Push it to GitHub
git push --set-upstream origin backport/backport-15492-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-15492-to-2.x.

@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.17 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.17 2.17
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.17
# Create a new branch
git switch --create backport/backport-15492-to-2.17
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 89915b65dc51126189dd42d74c82d2cbac0cb154
# Push it to GitHub
git push --set-upstream origin backport/backport-15492-to-2.17
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.17

Then, create a pull request where the base branch is 2.17 and the compare/head branch is backport/backport-15492-to-2.17.

SwethaGuptha added a commit to SwethaGuptha/OpenSearch that referenced this pull request Sep 6, 2024
…egations for levels cluster and indices. (opensearch-project#15492)

Signed-off-by: Swetha Guptha <[email protected]>
SwethaGuptha added a commit to SwethaGuptha/OpenSearch that referenced this pull request Sep 6, 2024
…egations for levels cluster and indices. (opensearch-project#15492)

Signed-off-by: Swetha Guptha <[email protected]>
SwethaGuptha added a commit to SwethaGuptha/OpenSearch that referenced this pull request Sep 6, 2024
…egations for levels cluster and indices. (opensearch-project#15492)

Signed-off-by: Swetha Guptha <[email protected]>
SwethaGuptha added a commit to SwethaGuptha/OpenSearch that referenced this pull request Sep 6, 2024
…egations for levels cluster and indices. (opensearch-project#15492)

Signed-off-by: Swetha Guptha <[email protected]>
SwethaGuptha added a commit to SwethaGuptha/OpenSearch that referenced this pull request Sep 6, 2024
…gations for levels cluster and indices. (opensearch-project#15492)

Signed-off-by: Swetha Guptha <[email protected]>
gbbafna pushed a commit that referenced this pull request Sep 6, 2024
…gations for levels cluster and indices. (#15492) (#15799)

Signed-off-by: Swetha Guptha <[email protected]>
jed326 pushed a commit to SwethaGuptha/OpenSearch that referenced this pull request Sep 6, 2024
…egations for levels cluster and indices. (opensearch-project#15492)

Signed-off-by: Swetha Guptha <[email protected]>
SwethaGuptha added a commit to SwethaGuptha/OpenSearch that referenced this pull request Sep 7, 2024
…egations for levels cluster and indices. (opensearch-project#15492)

Signed-off-by: SwethaGuptha <[email protected]>
gbbafna pushed a commit that referenced this pull request Sep 7, 2024
…egations for levels cluster and indices. (#15492) (#15802)

Signed-off-by: SwethaGuptha <[email protected]>
akolarkunnu pushed a commit to akolarkunnu/OpenSearch that referenced this pull request Sep 10, 2024
…egations for levels cluster and indices. (opensearch-project#15492)

Signed-off-by: Swetha Guptha <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants