Add setting to ignore throttling nodes for allocation of unassigned … #14991

Merged 2 commits on Jul 31, 2024

Collaborator

@gbbafna gbbafna commented Jul 29, 2024

…primaries in restore in order to speed up remote restore

Description

For remote-store-backed domains, if a node in the cluster is replaced by a new node and we trigger a restore for all the red indices, all the recoveries happen only on the new node.

Allocation goes to a single node because of the balancer weights as well as multiple allocation constraints, which together assign a weight to each node. Shards are assigned only to the node with the minimum weight, and the new node always has the minimum weight for all the unassigned shards. The new node then keeps throttling recoveries until the existing recoveries are done, which slows down the overall recovery. This PR adds a setting to ignore throttling nodes when allocating unassigned primaries during restore, so that those shards can be placed on other nodes instead of waiting.
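
To make the behaviour concrete, here is a minimal, self-contained sketch of the idea. It is not the code in BalancedShardsAllocator or LocalShardsBalancer: the class, the `Node` model, the `allocateRound` method, and the `ignoreThrottlingNodes` flag are all hypothetical names, and node weights are kept static for simplicity (the real balancer recomputes weights as shards are assigned).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; every name here is hypothetical, not an OpenSearch API.
class ThrottleAwareAllocationSketch {

    /** Simplified node: a fixed balancer weight plus a count of in-flight recoveries. */
    static final class Node {
        final String id;
        final double weight;
        int activeRecoveries = 0;

        Node(String id, double weight) {
            this.id = id;
            this.weight = weight;
        }
    }

    /** A node throttles once it has reached the per-node recovery limit. */
    static boolean isThrottling(Node node, int maxRecoveriesPerNode) {
        return node.activeRecoveries >= maxRecoveriesPerNode;
    }

    /**
     * Runs one allocation round over the unassigned primaries and returns
     * shard -> node id for every shard that starts recovering in this round.
     */
    static Map<String, String> allocateRound(List<String> shards, List<Node> nodes,
                                             int maxRecoveriesPerNode,
                                             boolean ignoreThrottlingNodes) {
        Map<String, String> assigned = new LinkedHashMap<>();
        for (String shard : shards) {
            Node best = null;
            for (Node node : nodes) {
                // With the flag on, throttling nodes are not candidates at all.
                if (ignoreThrottlingNodes && isThrottling(node, maxRecoveriesPerNode)) {
                    continue;
                }
                if (best == null || node.weight < best.weight) {
                    best = node;
                }
            }
            // With the flag off, the minimum-weight node may be throttling;
            // the shard then stays unassigned until a later round.
            if (best == null || isThrottling(best, maxRecoveriesPerNode)) {
                continue;
            }
            best.activeRecoveries++;
            assigned.put(shard, best.id);
        }
        return assigned;
    }

    /** Sample cluster: one empty replacement node and two existing nodes. */
    static List<Node> sampleNodes() {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node("new-node", 0.0));
        nodes.add(new Node("old-node-1", 1.0));
        nodes.add(new Node("old-node-2", 2.0));
        return nodes;
    }
}
```

With six unassigned primaries and a limit of two concurrent recoveries per node, `allocateRound(shards, sampleNodes(), 2, false)` starts only two recoveries, both on `new-node`, while the same call with `true` starts all six, two on each node.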

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@gbbafna gbbafna changed the title Add setting to ignore throttling nodes for allocation of unassgined … Add setting to ignore throttling nodes for allocation of unassigned … Jul 29, 2024
@gbbafna gbbafna marked this pull request as ready for review July 29, 2024 05:06
Contributor

❌ Gradle check result for 07c57be: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Member

@ashking94 ashking94 left a comment

lgtm

…rimaries in restore in order to speed up remote restore

Signed-off-by: Gaurav Bafna <[email protected]>
@gbbafna gbbafna added the backport 2.x Backport to 2.x branch label Jul 31, 2024
Member

@ashking94 ashking94 left a comment

LGTM

Contributor

❌ Gradle check result for 029c2ae: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

✅ Gradle check result for c27772f: SUCCESS

codecov bot commented Jul 31, 2024

Codecov Report

Attention: Patch coverage is 83.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 71.79%. Comparing base (eb306d2) to head (c27772f).

Files                                                    Patch %   Lines
.../allocation/allocator/BalancedShardsAllocator.java    83.33%    1 Missing ⚠️
...ting/allocation/allocator/LocalShardsBalancer.java    83.33%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #14991      +/-   ##
============================================
+ Coverage     71.77%   71.79%   +0.01%     
- Complexity    62689    62799     +110     
============================================
  Files          5163     5163              
  Lines        294412   294422      +10     
  Branches      42582    42586       +4     
============================================
+ Hits         211325   211389      +64     
+ Misses        65689    65648      -41     
+ Partials      17398    17385      -13     

@gbbafna gbbafna merged commit 5c19809 into opensearch-project:main Jul 31, 2024
34 of 36 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-14991-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 5c19809ec05d0a2cf03a5105c5333303bc21cb0d
# Push it to GitHub
git push --set-upstream origin backport/backport-14991-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-14991-to-2.x.

gbbafna added a commit to gbbafna/OpenSearch that referenced this pull request Jul 31, 2024
gbbafna added a commit that referenced this pull request Jul 31, 2024
harshavamsi pushed a commit to harshavamsi/OpenSearch that referenced this pull request Aug 20, 2024
wdongyu pushed a commit to wdongyu/OpenSearch that referenced this pull request Aug 22, 2024
Labels
backport 2.x Backport to 2.x branch backport-failed
4 participants