[Segment Replication + Remote Store] GA performance test #8874

Closed
tlfeng opened this issue Jul 25, 2023 · 2 comments
Labels: distributed framework, enhancement, Indexing:Replication, Storage, v2.10.0

Comments


tlfeng commented Jul 25, 2023

This issue tracks the performance testing defined in issue #8109, which must be completed before remote storage with segment replication becomes generally available.

Metrics that should be captured in addition to what OSB reports:

  • Replication lag - Add to OSB.
  • Network Throughput
  • IOPS
  • Thread pool stats - I think this may already be an optional return from OSB.
  • CPU Utilization

All clusters should have 3 dedicated cluster manager nodes.
Small cluster = ~3 nodes
Large cluster = ~10 nodes
Use m5.xlarge node type for consistency.

Test Scenario:
Based on the comment #8109 (comment)

| Scenario | Case # | Replication Mode | Workload | Cluster Size (num nodes) | Total primary shard count | Num Primary/node | Num Replicas | Size (GB) / Shard | Total shards (pri + replica) | Disk size (GB) / node | Load generator disk size (GB) | Total data ingestion (GB) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 nodes, 10 shards, 9 replicas, 5 GB shard size | 1 | Seg Rep | http_logs | 10 | 10 | 1 | 9 | 5 | 100 | 50 | 50 | 500 |
| | 2 | Doc Rep | http_logs | 10 | 10 | 1 | 9 | 5 | 100 | 50 | 50 | 500 |
| 10 nodes, 10 shards, 1 replica, 5 GB shard size | 1 | Seg Rep | http_logs | 10 | 10 | 1 | 1 | 5 | 20 | 10 | 50 | 100 |
| | 2 | Doc Rep | http_logs | 10 | 10 | 1 | 1 | 5 | 20 | 10 | 50 | 100 |
| 10 nodes, 40 shards, 1 replica, 2 GB shard size | 1 | Seg Rep | http_logs | 10 | 40 | 4 | 1 | 2 | 80 | 16 | 80 | 160 |
| | 2 | Doc Rep | http_logs | 10 | 40 | 4 | 1 | 2 | 80 | 16 | 80 | 160 |
| 10 nodes, 40 shards, 2 replicas, 2 GB shard size | 1 | Seg Rep | http_logs | 10 | 40 | 4 | 2 | 2 | 120 | 24 | 80 | 240 |
| | 2 | Doc Rep | http_logs | 10 | 40 | 4 | 2 | 2 | 120 | 24 | 80 | 240 |
| 3 nodes, 1 replica, 3 shards, 10 GB shard size | 1 | Seg Rep | http_logs | 3 | 3 | 1 | 1 | 10 | 6 | 20 | 30 | 60 |
| | 2 | Doc Rep | http_logs | 3 | 3 | 1 | 1 | 10 | 6 | 20 | 30 | 60 |
| 3 nodes, 1 replica, 3 shards, 50 GB shard size | 1 | Seg Rep | http_logs | 3 | 3 | 1 | 1 | 50 | 6 | 100 | 150 | 300 |
| | 2 | Doc Rep | http_logs | 3 | 3 | 1 | 1 | 50 | 6 | 100 | 150 | 300 |
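The derived columns in the scenario table appear to follow from four inputs (node count, primary shard count, replica count, shard size). The sketch below reproduces that arithmetic. The formulas are inferred from the table rows rather than stated in the issue, and the `Scenario` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Inputs of one test scenario (hypothetical helper, not part of OSB)."""
    nodes: int
    primaries: int
    replicas: int
    shard_size_gb: int

    @property
    def primaries_per_node(self) -> float:
        # Primary shards spread evenly across data nodes.
        return self.primaries / self.nodes

    @property
    def total_shards(self) -> int:
        # Each primary has `replicas` copies in addition to itself.
        return self.primaries * (1 + self.replicas)

    @property
    def total_ingestion_gb(self) -> int:
        # Every shard copy (primary or replica) holds shard_size_gb of data.
        return self.total_shards * self.shard_size_gb

    @property
    def disk_per_node_gb(self) -> float:
        # Total data divided evenly across data nodes.
        return self.total_ingestion_gb / self.nodes


# First scenario: 10 nodes, 10 shards, 9 replicas, 5 GB shard size.
first = Scenario(nodes=10, primaries=10, replicas=9, shard_size_gb=5)
print(first.total_shards, first.total_ingestion_gb, first.disk_per_node_gb)
```

The load generator disk size column is not covered here, because the table does not make its derivation obvious.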

The commands below were used to run the benchmark tests, taking the first test scenario as an example.
The command to deploy the CDK application for the OpenSearch cluster:
cdk deploy "*" --require-approval never \
  -c securityDisabled=true -c minDistribution=true -c region=us-west-2 \
  -c distributionUrl='https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.10.0/latest/linux/x64/tar/builds/opensearch/dist/opensearch-min-2.10.0-linux-x64.tar.gz' \
  -c cpuArch='x64' -c singleNodeCluster=false \
  -c dataNodeCount=10 -c dataNodeStorage=$((100+$(shuf -i 1-20 -n 1))) \
  -c distVersion=2.10.0 -c serverAccessType=prefixList -c restrictServerAccessTo=pl-f8a64391 \
  -c additionalConfig='{ "opensearch.experimental.feature.segment_replication_experimental.enabled": true, "cluster.indices.replication.strategy": "SEGMENT", "opensearch.experimental.feature.remote_store.enabled": true, "s3.client.default.endpoint": "s3.us-west-2.amazonaws.com" }' \
  -c vpcId='vpc-0648c0d077c3ea997' -c securityGroupId='sg-0d1ace406e4977a79' -c suffix='10nodes-500gb-1016' \
  -c use50PercentHeap=true -c enableRemoteStore=true -c dataInstanceType=m5.xlarge \
  -c storageVolumeType=gp3

The command to generate workload data:
expand-data-corpus.py --corpus-size 100 --output-file-suffix 100gb

The command to run the benchmark for the test scenario:
opensearch-benchmark execute-test --workload=http_logs \
  --pipeline=benchmark-only --target-hosts=opens-clust-FB9PYML03F8V-7a9f77cc6decb7eb.elb.us-west-2.amazonaws.com:80 \
  --workload-params='{"index_settings":{"number_of_shards": 10, "number_of_replicas": 9 }, "generated_corpus": "t"}' \
  --include-tasks=delete-index,create-index,check-cluster-health,index-append \
  --telemetry=node-stats,segment-replication-stats \
  --user-tag="replication_type:segment,remote_store:enabled,node_count:10,shard_count:10,replica_count:9,shard_size_in_gb:5"
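Since only the shard count, replica count, and tags change between scenarios, the benchmark command could be generated rather than edited by hand. The following is a minimal sketch that reuses only the flags shown in the command above. The function name, its parameters, and the `example-host:80` target are hypothetical, and whether the same telemetry and tag values were used for the document replication runs is an assumption.

```python
import json
import shlex


def build_benchmark_command(target_host, nodes, shards, replicas, shard_size_gb,
                            replication_type="segment", remote_store="enabled"):
    """Return the opensearch-benchmark argument list for one scenario.

    Hypothetical helper: flag names are copied from the command in this issue,
    everything else is illustrative.
    """
    workload_params = {
        "index_settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "generated_corpus": "t",
    }
    user_tag = ",".join([
        f"replication_type:{replication_type}",
        f"remote_store:{remote_store}",
        f"node_count:{nodes}",
        f"shard_count:{shards}",
        f"replica_count:{replicas}",
        f"shard_size_in_gb:{shard_size_gb}",
    ])
    return [
        "opensearch-benchmark", "execute-test",
        "--workload=http_logs",
        "--pipeline=benchmark-only",
        f"--target-hosts={target_host}",
        f"--workload-params={json.dumps(workload_params)}",
        "--include-tasks=delete-index,create-index,check-cluster-health,index-append",
        "--telemetry=node-stats,segment-replication-stats",
        f"--user-tag={user_tag}",
    ]


# First scenario; "example-host:80" is a placeholder, not the real load balancer.
cmd = build_benchmark_command("example-host:80", nodes=10, shards=10,
                              replicas=9, shard_size_gb=5)
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

Returning a list rather than a string keeps the JSON argument intact if the command is later passed to `subprocess.run` without a shell.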

@tlfeng tlfeng added enhancement Enhancement or improvement to existing feature or request distributed framework v2.10.0 labels Jul 25, 2023
@tlfeng tlfeng self-assigned this Jul 25, 2023
@Bukhtawar Bukhtawar added Indexing:Replication Issues and PRs related to core replication framework eg segrep Storage Issues and PRs relating to data and metadata storage labels Jul 27, 2023

tlfeng commented Aug 7, 2023

The benchmark results below were collected using OpenSearch 2.10.0, built from the 2.x branch on 09/06/2023.

Test scenario: 10 nodes, 10 shards, 9 replicas, 5 GB shard size

| Metric | Document Replication | Segment Replication | Diff % |
|---|---|---|---|
| Test duration (ms) | 11,230,989 | 12,282,943 | 9.37% |
| Index Throughput (req/s) | | | |
| p0 | 65,777.68 | 54,126.74 | -17.71% |
| p50 | 68,558.11 | 60,376.13 | -11.93% |
| p100 | 87,100.12 | 60,502.11 | -30.54% |
| average | 70,778.02 | 60,106.00 | -15.08% |
| Index Latency (ms) | | | |
| p50 | 480.64 | 634.44 | 32.00% |
| p90 | 720.88 | 738.94 | 2.51% |
| p99 | 5,655.47 | 983.81 | -82.60% |
| p100 | 12,368.92 | 10,402.32 | -15.90% |
| average | 592.29 | 643.87 | 8.71% |
| CPU (%) | | | |
| p50 | 38.00 | 7.00 | -81.58% |
| p90 | 94.00 | 30.00 | -68.09% |
| p99 | 100.00 | 60.00 | -40.00% |
| p100 | 100.00 | 100.00 | 0.00% |
| average | 50.12 | 15.12 | -69.83% |
| Memory (%) | | | |
| p50 | 39.00 | 31.00 | -20.51% |
| p90 | 62.77 | 55.00 | -12.37% |
| p99 | 70.938 | 61.00 | -14.01% |
| p100 | 75.00 | 63.00 | -16.00% |
| average | 39.65 | 31.15 | -21.44% |
| Error rate | 0.00 | 0.00 | |
| Test start time | Sep 6, 2023 @ 23:00:01.000 | Sep 6, 2023 @ 23:24:28.000 | |
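The "Diff %" column in these result tables is consistent with the relative change of the segment replication value against the document replication value. A small sketch of that calculation follows; the function name is hypothetical and the formula is inferred from the reported numbers.

```python
def diff_percent(doc_rep: float, seg_rep: float) -> float:
    """Relative change of segment replication vs. document replication, in percent.

    Positive means the segment replication value is higher. For throughput a
    negative value is a regression; for latency, CPU and memory a negative
    value is an improvement.
    """
    return (seg_rep - doc_rep) / doc_rep * 100.0


# Values taken from the first result table in this comment.
print(f"{diff_percent(11_230_989, 12_282_943):.2f}%")  # test duration
print(f"{diff_percent(5655.47, 983.81):.2f}%")         # p99 index latency
print(f"{diff_percent(38.00, 7.00):.2f}%")             # p50 CPU
```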

Test scenario: 10 nodes, 10 shards, 1 replica, 5 GB shard size

| Metric | Document Replication | Segment Replication | Diff % |
|---|---|---|---|
| Test duration (ms) | 3,043,307 | 12,318,028 | 304.76% |
| Index Throughput (docs/s) | | | |
| p0 | 242,679.52 | 45,522.21 | -81.24% |
| p50 | 256,202.73 | 59,549.96 | -77.01% |
| p100 | 277,587.16 | 60,037.73 | -78.37% |
| average | 258,979.06 | 58,561.05 | -77.39% |
| Index Latency (ms) | | | |
| p50 | 97.42 | 628.84 | 545.47% |
| p90 | 144.96 | 733.06 | 405.70% |
| p99 | 347.06 | 981.58 | 182.83% |
| p100 | 17,184.57 | 17,625.31 | 2.56% |
| average | 135.23 | 639.55 | 372.93% |
| CPU (%) | | | |
| p50 | 38.00 | 7.00 | -81.58% |
| p90 | 72.11 | 21.00 | -70.88% |
| p99 | 99.00 | 48.00 | -51.52% |
| p100 | 100.00 | 100.00 | 0.00% |
| average | 36.62 | 9.36 | -74.43% |
| Memory (%) | | | |
| p50 | 33.00 | 30.04 | -8.97% |
| p90 | 61.00 | 55.00 | -9.84% |
| p99 | 70.53 | 60.00 | -14.93% |
| p100 | 76.00 | 61.00 | -19.74% |
| average | 34.88 | 30.56 | -12.39% |
| Error rate | 0.00 | 0.00 | |
| Test start time | Sep 6, 2023 @ 23:00:02.000 | Sep 6, 2023 @ 23:22:22.000 | |

Test scenario: 10 nodes, 40 shards, 1 replica, 2 GB shard size

| Metric | Document Replication | Segment Replication | Diff % |
|---|---|---|---|
| Test duration (ms) | 5,037,483 | 20,822,042 | 313.34% |
| Index Throughput (docs/s) | | | |
| p0 | 234,632.06 | 47,826.29 | -79.62% |
| p50 | 249,637.05 | 57,028.65708 | -77.77% |
| p100 | 276,946.25 | 58,587.24 | -78.85% |
| average | 256,639.49 | 57,057.34 | -77.77% |
| Index Latency (ms) | | | |
| p50 | 108.66 | 645.26 | 493.81% |
| p90 | 174.49 | 867.71 | 397.28% |
| p99 | 508.97 | 1,206.42 | 137.03% |
| p100 | 12,443.91 | 8,969.50 | -27.92% |
| average | 148.79 | 679.46 | 356.65% |
| CPU (%) | | | |
| p50 | 38.00 | 7.00 | -81.58% |
| p90 | 74.00 | 22.00 | -70.27% |
| p99 | 95.00 | 45.07 | -52.56% |
| p100 | 100.00 | 100.00 | 0.00% |
| average | 39.55 | 10.23 | -74.13% |
| Memory (%) | | | |
| p50 | 39 | 31.06 | -20.62% |
| p90 | 63.80 | 55.00 | -13.79% |
| p99 | 71.156 | 61.00 | -14.27% |
| p100 | 75.00 | 63.00 | -16.00% |
| average | 39.13 | 31.41 | -19.73% |
| Error rate | 0.00 | 0.00 | |
| Test start time | Sep 6, 2023 @ 23:00:01.000 | Sep 7, 2023 @ 14:29:55.000 | |

Test scenario: 10 nodes, 40 shards, 2 replicas, 2 GB shard size

| Metric | Document Replication | Segment Replication | Diff % |
|---|---|---|---|
| Test duration (ms) | 7,176,880 | 20,823,739 | 190.15% |
| Index Throughput (docs/s) | | | |
| p0 | 164,705.19 | 49,031.13 | -70.23% |
| p50 | 177,700.63 | 56,930.22 | -67.96% |
| p100 | 199,468.36 | 57,319.02 | -71.26% |
| average | 179,831.77 | 56,406.69 | -68.63% |
| Index Latency (ms) | | | |
| p50 | 164.50 | 645.77 | 292.57% |
| p90 | 264.88 | 852.47 | 221.83% |
| p99 | 1,014.23 | 1,144.86 | 12.88% |
| p100 | 6,568.50 | 17,844.30 | 171.66% |
| average | 219.53 | 679.14 | 209.37% |
| CPU (%) | | | |
| p50 | 38.00 | 7.00 | -81.58% |
| p90 | 76.00 | 24.65 | -67.57% |
| p99 | 96.00 | 49.26 | -48.69% |
| p100 | 100.00 | 100.00 | 0.00% |
| average | 40.17 | 11.51 | -71.35% |
| Memory (%) | | | |
| p50 | 39 | 31.76 | -18.86% |
| p90 | 63.00 | 55.34 | -12.16% |
| p99 | 70 | 61.00 | -12.86% |
| p100 | 74.00 | 65.00 | -12.16% |
| average | 39.15 | 31.57 | -19.35% |
| Error rate | 0.00 | 0.00 | |
| Test start time | Sep 6, 2023 @ 23:00:02.000 | Sep 6, 2023 @ 23:00:01.000 | |

Test scenario: 3 nodes, 1 replica, 3 shards, 10 GB shard size

| Metric | Document Replication | Segment Replication | Diff % |
|---|---|---|---|
| Test duration (ms) | 4,224,159 | 7,344,664 | 73.87% |
| Index Throughput (docs/s) | | | |
| p0 | 104,873.45 | 57,466.02 | -45.20% |
| p50 | 111,992.48 | 60,380.32 | -46.09% |
| p100 | 121,818.61 | 60,606.93 | -50.25% |
| average | 111,432.88 | 60,302.82 | -45.88% |
| Index Latency (ms) | | | |
| p50 | 309.83 | 630.57 | 103.52% |
| p90 | 430.85 | 730.29 | 69.50% |
| p99 | 773.76 | 987.14 | 27.58% |
| p100 | 18,657.10 | 3,152.84 | -83.10% |
| average | 362.27 | 638.12 | 76.14% |
| CPU (%) | | | |
| p50 | 4.85 | 1.23 | -74.74% |
| p90 | 98.00 | 44.00 | -55.10% |
| p99 | 100.00 | 84.41 | -15.59% |
| p100 | 100.00 | 100.00 | 0.00% |
| average | 37.97 | 15.65 | -58.79% |
| Memory (%) | | | |
| p50 | 31.00 | 30.00 | -6.79% |
| p90 | 57.00 | 55.00 | -3.51% |
| p99 | 69 | 60.00 | -13.04% |
| p100 | 73.00 | 62.00 | -15.07% |
| average | 32.19 | 30.31 | -5.81% |
| Error rate | 0.00 | 0.00 | |
| Test start time | Sep 6, 2023 @ 23:00:01.000 | Sep 6, 2023 @ 18:48:12.000 | |

Test scenario: 3 nodes, 1 replica, 3 shards, 50 GB shard size

| Metric | Document Replication | Segment Replication | Diff % |
|---|---|---|---|
| Test duration (ms) | 21,462,822 | 36,624,857 | 70.64% |
| Index Throughput (docs/s) | | | |
| p0 | 103,149.20 | 54,008.05 | -47.64% |
| p50 | 104,760.59 | 60,641.77 | -42.11% |
| p100 | 124,010.62 | 60,733.37 | -51.03% |
| average | 106,878.99 | 60,543.96 | -43.35% |
| Index Latency (ms) | | | |
| p50 | 314.00 | 628.52 | 100.16% |
| p90 | 435.68 | 725.92 | 66.62% |
| p99 | 16,373.18 | 966.61 | -94.10% |
| p100 | 21,719.86 | 5,641.41 | -74.03% |
| average | 366.07 | 635.46 | 73.59% |
| CPU (%) | | | |
| p50 | 7.10 | 2.09 | -70.58% |
| p90 | 98.00 | 46.00 | -53.06% |
| p99 | 100.00 | 84.00 | -16.00% |
| p100 | 100.00 | 100.00 | 0.00% |
| average | 38.45 | 16.46 | -57.19% |
| Memory (%) | | | |
| p50 | 35.00 | 32.46 | -7.31% |
| p90 | 59.00 | 56.00 | -5.08% |
| p99 | 70.00 | 64.00 | -8.57% |
| p100 | 74.00 | 69.00 | -6.76% |
| average | 35.02 | 32.54 | -7.07% |
| Error rate | 0.00 | 0.00 | |
| Test start time | Sep 6, 2023 @ 23:00:02.000 | Sep 6, 2023 @ 23:14:26.000 | |

In this graph, the three numbers in each legend entry stand for the number of nodes, replicas, and shards. It shows that replication lag decreases as the shard count increases.
image


tlfeng commented Sep 1, 2023

Found three kinds of S3 throttling symptoms when running tests with a 10x larger shard size.
1

2023-08-22T22:58:44.022-07:00
[2023-08-23T05:58:43,836][ERROR][o.o.i.r.SegmentReplicationTargetService] [ip-10-0-5-4.us-west-2.compute.internal] Exception replicating Id:[2221614] Checkpoint [ReplicationCheckpoint{shardId=[logs-5000gb][1], primaryTerm=11, segmentsGen=29, version=298034, size=30981289070, codec=Lucene95}] Shard:[[logs-5000gb][1]] Source:[RemoteStoreReplicationSource] marking as failed.
software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: The target server failed to respond
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111) ~[?:?]
	at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:223) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:83) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56) ~[?:?]
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26) ~[?:?]
	at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193) ~[?:?]
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103) ~[?:?]
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:171) ~[?:?]
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$0(BaseSyncClientHandler.java:68) ~[?:?]
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:179) ~[?:?]
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:62) ~[?:?]
	at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:52) ~[?:?]
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:63) ~[?:?]
	at software.amazon.awssdk.services.s3.DefaultS3Client.getObject(DefaultS3Client.java:4481) ~[?:?]
	at software.amazon.awssdk.services.s3.S3Client.getObject(S3Client.java:7873) ~[?:?]
	at org.opensearch.repositories.s3.S3RetryingInputStream.lambda$openStream$1(S3RetryingInputStream.java:120) ~[?:?]
	at java.security.AccessController.doPrivileged(AccessController.java:318) ~[?:?]
	at org.opensearch.repositories.s3.SocketAccess.doPrivileged(SocketAccess.java:55) ~[?:?]
	at org.opensearch.repositories.s3.S3RetryingInputStream.openStream(S3RetryingInputStream.java:119) ~[?:?]
	at org.opensearch.repositories.s3.S3RetryingInputStream.<init>(S3RetryingInputStream.java:101) ~[?:?]
	at org.opensearch.repositories.s3.S3RetryingInputStream.<init>(S3RetryingInputStream.java:84) ~[?:?]
	at org.opensearch.repositories.s3.S3BlobContainer.readBlob(S3BlobContainer.java:138) ~[?:?]
	at org.opensearch.index.store.RemoteDirectory.openInput(RemoteDirectory.java:151) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.store.RemoteSegmentStoreDirectory.readMetadataFile(RemoteSegmentStoreDirectory.java:211) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.store.RemoteSegmentStoreDirectory.readLatestMetadataFile(RemoteSegmentStoreDirectory.java:202) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.store.RemoteSegmentStoreDirectory.init(RemoteSegmentStoreDirectory.java:150) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.indices.replication.RemoteStoreReplicationSource.getCheckpointMetadata(RemoteStoreReplicationSource.java:64) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.indices.replication.SegmentReplicationTarget.startReplication(SegmentReplicationTarget.java:162) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.indices.replication.SegmentReplicationTargetService.start(SegmentReplicationTargetService.java:494) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.indices.replication.SegmentReplicationTargetService$ReplicationRunner.doRun(SegmentReplicationTargetService.java:480) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.10.0.jar:2.10.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
	Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 1 failure: Unable to execute HTTP request: Timeout waiting for connection from pool
	Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 2 failure: Unable to execute HTTP request: Timeout waiting for connection from pool
	Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 3 failure: Unable to execute HTTP request: Timeout waiting for connection from pool
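The `Suppressed: ... Request attempt N failure: ...` lines at the end of a trace record why each retry failed. When triaging many such traces, it may help to tally those reasons. The sketch below is only an illustration: the function name and regular expression are assumptions based on the message format seen in this comment, not an OpenSearch or AWS SDK utility.

```python
import re
from collections import Counter

# Matches e.g. "Request attempt 2 failure: <reason>" and drops a trailing
# "(Service: S3, Status Code: ..., Request ID: ...)" suffix when present.
ATTEMPT_RE = re.compile(r"Request attempt (\d+) failure: (.+?)(?: \(Service:.*)?$")


def summarize_retry_failures(log_text: str) -> Counter:
    """Count failure reasons across all suppressed retry attempts in a log."""
    reasons = Counter()
    for line in log_text.splitlines():
        match = ATTEMPT_RE.search(line)
        if match:
            reasons[match.group(2).strip()] += 1
    return reasons


# Suppressed lines copied from the first trace above.
sample = """\
Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 1 failure: Unable to execute HTTP request: Timeout waiting for connection from pool
Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 2 failure: Unable to execute HTTP request: Timeout waiting for connection from pool
Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 3 failure: Unable to execute HTTP request: Timeout waiting for connection from pool
"""

for reason, count in summarize_retry_failures(sample).items():
    print(f"{count} x {reason}")
```

Stripping the `(Service: ...)` suffix groups failures that differ only by request ID under one reason.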

2

[2023-08-08T23:01:20,089][ERROR][o.o.i.t.t.BlobStoreTransferService] [ip-10-0-4-131.us-west-2.compute.internal] Failed to upload blob translog-9.tlog

java.util.concurrent.CompletionException: software.amazon.awssdk.services.s3.model.S3Exception: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: EG3GD838K05FGF67, Extended Request ID: zfiTacM9cSPHHCZXc62+/Likv/TVUOsiWAwBoSPq2zsYotMZuw1Q2XTOuu+RaXHJp0EeJsV0z60=)
	at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:65) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51) ~[?:?]
	at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) [?:?]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) [?:?]
	at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:79) [utils-2.20.55.jar:?]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) [?:?]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) [?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) [?:?]
	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) [?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeAttemptExecute(AsyncRetryableStage.java:103) [sdk-core-2.20.55.jar:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:184) [sdk-core-2.20.55.jar:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.lambda$attemptExecute$1(AsyncRetryableStage.java:170) [sdk-core-2.20.55.jar:?]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) [?:?]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) [?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) [?:?]
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) [?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$null$0(MakeAsyncHttpRequestStage.java:105) [sdk-core-2.20.55.jar:?]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) [?:?]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) [?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) [?:?]
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) [?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$executeHttpRequest$3(MakeAsyncHttpRequestStage.java:163) [sdk-core-2.20.55.jar:?]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) [?:?]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) [?:?]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482) [?:?]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849) [opensearch-2.10.0.jar:2.10.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]

Caused by: software.amazon.awssdk.services.s3.model.S3Exception: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: EG3GD838K05FGF67, Extended Request ID: zfiTacM9cSPHHCZXc62+/Likv/TVUOsiWAwBoSPq2zsYotMZuw1Q2XTOuu+RaXHJp0EeJsV0z60=)
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156) ~[?:?]
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108) ~[?:?]
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:85) ~[?:?]
	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:43) ~[?:?]
	at software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$7(BaseClientHandler.java:270) ~[?:?]
	at software.amazon.awssdk.core.internal.http.async.AsyncResponseHandler.lambda$prepare$0(AsyncResponseHandler.java:89) ~[?:?]
	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1150) ~[?:?]
	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) ~[?:?]
	at software.amazon.awssdk.core.internal.http.async.AsyncResponseHandler$BaosSubscriber.onComplete(AsyncResponseHandler.java:132) ~[?:?]
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$DataCountingPublisher$1.onComplete(ResponseHandler.java:515) ~[?:?]
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.runAndLogError(ResponseHandler.java:250) ~[?:?]
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.access$600(ResponseHandler.java:75) ~[?:?]
	at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$PublisherAdapter$1.onComplete(ResponseHandler.java:371) ~[?:?]
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.publishMessage(HandlerPublisher.java:402) ~[?:?]
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.flushBuffer(HandlerPublisher.java:338) ~[?:?]
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.receivedDemand(HandlerPublisher.java:291) ~[?:?]
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.access$200(HandlerPublisher.java:61) ~[?:?]
	at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher$ChannelSubscription$1.run(HandlerPublisher.java:495) ~[?:?]
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) ~[?:?]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167) ~[?:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:566) ~[?:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[?:?]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
	... 1 more
	Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 1 failure: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: RZ4RNTHM0QSTAWWQ, Extended Request ID: 8h9rsTc5q+4KUSnoVn+cBT6pAvCk0+HMr9vBsp7cVQYyilYnxZqp6Tv/p//HqdRK3Ngl1+kielg=)
	Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 2 failure: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: JT5XYKAP108PMEXD, Extended Request ID: AW5qbFw+26k9BUxgGjEmJOCmes7a8qqQJgztEs2TqanMPpcG9HYC6/F+Yx3QTn1QuzUi6P3OTRY=)
	Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 3 failure: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: 1CW33VES7K0PZET7, Extended Request ID: G/KTF5OFJXees8xgU1zssF8zpGakxApZPBAEeooG+/Lfp1fEEfzun7UGzOPMQB5nuNAUkpQM2l0=)


[2023-08-08T23:01:20,095][ERROR][o.o.i.t.t.TranslogTransferManager] [ip-10-0-4-131.us-west-2.compute.internal] [logs-191998][4] Transfer failed for snapshot TranslogTransferSnapshot [ primary term = 1, generation = 9 ]

java.io.IOException: Failed to upload 1 files during transfer
	at org.opensearch.index.translog.transfer.TranslogTransferManager.transferSnapshot(TranslogTransferManager.java:149) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.RemoteFsTranslog.upload(RemoteFsTranslog.java:268) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.RemoteFsTranslog.prepareAndUpload(RemoteFsTranslog.java:241) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.RemoteFsTranslog.ensureSynced(RemoteFsTranslog.java:191) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.Translog.ensureSynced(Translog.java:835) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.InternalTranslogManager.ensureTranslogSynced(InternalTranslogManager.java:178) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.engine.InternalEngine.ensureTranslogSynced(InternalEngine.java:605) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.shard.IndexShard.lambda$createTranslogSyncProcessor$43(IndexShard.java:4113) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.shard.IndexShard$6.write(IndexShard.java:4127) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.AsyncIOProcessor.processList(AsyncIOProcessor.java:129) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.AsyncIOProcessor.drainAndProcessAndRelease(AsyncIOProcessor.java:117) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.BufferedAsyncIOProcessor.process(BufferedAsyncIOProcessor.java:80) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849) [opensearch-2.10.0.jar:2.10.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]
	Suppressed: org.opensearch.index.translog.transfer.FileTransferException: java.util.concurrent.CompletionException: software.amazon.awssdk.services.s3.model.S3Exception: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: EG3GD838K05FGF67, Extended Request ID: zfiTacM9cSPHHCZXc62+/Likv/TVUOsiWAwBoSPq2zsYotMZuw1Q2XTOuu+RaXHJp0EeJsV0z60=)
		at org.opensearch.index.translog.transfer.BlobStoreTransferService.lambda$uploadBlob$6(BlobStoreTransferService.java:129) ~[opensearch-2.10.0.jar:2.10.0]
		at org.opensearch.action.ActionListener$1.onFailure(ActionListener.java:88) ~[opensearch-2.10.0.jar:2.10.0]
		at org.opensearch.action.ActionListener$6.onFailure(ActionListener.java:309) ~[opensearch-2.10.0.jar:2.10.0]
		at org.opensearch.repositories.s3.S3BlobContainer.lambda$asyncBlobUpload$3(S3BlobContainer.java:204) ~[?:?]
		at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
		at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[?:?]
		at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
		at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[?:?]
		at org.opensearch.repositories.s3.async.AsyncTransferManager.lambda$uploadInOneChunk$17(AsyncTransferManager.java:318) ~[?:?]
		at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934) ~[?:?]
		at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911) ~[?:?]
		at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
		at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[?:?]
		at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:79) ~[?:?]
		at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
		at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[?:?]
		at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
		at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[?:?]
		at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncApiCallMetricCollectionStage.lambda$execute$0(AsyncApiCallMetricCollectionStage.java:54) ~[?:?]
		at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
		at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[?:?]
		at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
		at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[?:?]
		at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncApiCallTimeoutTrackingStage.lambda$execute$2(AsyncApiCallTimeoutTrackingStage.java:67) ~[?:?]
		at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
		at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[?:?]
		at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
		at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[?:?]
		at software.amazon.awssdk.utils.CompletableFutureUtils.lambda$forwardExceptionTo$0(CompletableFutureUtils.java:79) ~[?:?]
		at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
		at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[?:?]
		at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
		at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) ~[?:?]
		at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeAttemptExecute(AsyncRetryableStage.java:103) ~[?:?]
		at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.maybeRetryExecute(AsyncRetryableStage.java:184) ~[?:?]
		at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncRetryableStage$RetryingExecutor.lambda$attemptExecute$1(AsyncRetryableStage.java:170) ~[?:?]
		at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
		at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[?:?]
		at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
		at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) ~[?:?]
		at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$null$0(MakeAsyncHttpRequestStage.java:105) ~[?:?]
		at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
		at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[?:?]
		at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
		at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) ~[?:?]
		at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeAsyncHttpRequestStage.lambda$executeHttpRequest$3(MakeAsyncHttpRequestStage.java:163) ~[?:?]
		at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
		at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[?:?]
		at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482) ~[?:?]
		at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849) [opensearch-2.10.0.jar:2.10.0]
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
		at java.lang.Thread.run(Thread.java:833) [?:?]
	Caused by: java.util.concurrent.CompletionException: software.amazon.awssdk.services.s3.model.S3Exception: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: EG3GD838K05FGF67, Extended Request ID: zfiTacM9cSPHHCZXc62+/Likv/TVUOsiWAwBoSPq2zsYotMZuw1Q2XTOuu+RaXHJp0EeJsV0z60=)
		at software.amazon.awssdk.utils.CompletableFutureUtils.errorAsCompletionException(CompletableFutureUtils.java:65) ~[?:?]
		at software.amazon.awssdk.core.internal.http.pipeline.stages.AsyncExecutionFailureExceptionReportingStage.lambda$execute$0(AsyncExecutionFailureExceptionReportingStage.java:51) ~[?:?]
		at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:934) ~[?:?]
		at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:911) ~[?:?]
		... 27 more
	Caused by: software.amazon.awssdk.services.s3.model.S3Exception: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: EG3GD838K05FGF67, Extended Request ID: zfiTacM9cSPHHCZXc62+/Likv/TVUOsiWAwBoSPq2zsYotMZuw1Q2XTOuu+RaXHJp0EeJsV0z60=)
		at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156) ~[?:?]
		at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108) ~[?:?]
		at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:85) ~[?:?]
		at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:43) ~[?:?]
		at software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$7(BaseClientHandler.java:270) ~[?:?]
		at software.amazon.awssdk.core.internal.http.async.AsyncResponseHandler.lambda$prepare$0(AsyncResponseHandler.java:89) ~[?:?]
		at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1150) ~[?:?]
		at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
		at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2147) ~[?:?]
		at software.amazon.awssdk.core.internal.http.async.AsyncResponseHandler$BaosSubscriber.onComplete(AsyncResponseHandler.java:132) ~[?:?]
		at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$DataCountingPublisher$1.onComplete(ResponseHandler.java:515) ~[?:?]
		at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.runAndLogError(ResponseHandler.java:250) ~[?:?]
		at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler.access$600(ResponseHandler.java:75) ~[?:?]
		at software.amazon.awssdk.http.nio.netty.internal.ResponseHandler$PublisherAdapter$1.onComplete(ResponseHandler.java:371) ~[?:?]
		at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.publishMessage(HandlerPublisher.java:402) ~[?:?]
		at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.flushBuffer(HandlerPublisher.java:338) ~[?:?]
		at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.receivedDemand(HandlerPublisher.java:291) ~[?:?]
		at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher.access$200(HandlerPublisher.java:61) ~[?:?]
		at software.amazon.awssdk.http.nio.netty.internal.nrs.HandlerPublisher$ChannelSubscription$1.run(HandlerPublisher.java:495) ~[?:?]
		at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) ~[?:?]
		at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167) ~[?:?]
		at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[?:?]
		at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:566) ~[?:?]
		at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[?:?]
		at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
		... 1 more
		Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 1 failure: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: RZ4RNTHM0QSTAWWQ, Extended Request ID: 8h9rsTc5q+4KUSnoVn+cBT6pAvCk0+HMr9vBsp7cVQYyilYnxZqp6Tv/p//HqdRK3Ngl1+kielg=)
		Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 2 failure: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: JT5XYKAP108PMEXD, Extended Request ID: AW5qbFw+26k9BUxgGjEmJOCmes7a8qqQJgztEs2TqanMPpcG9HYC6/F+Yx3QTn1QuzUi6P3OTRY=)
		Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 3 failure: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: 1CW33VES7K0PZET7, Extended Request ID: G/KTF5OFJXees8xgU1zssF8zpGakxApZPBAEeooG+/Lfp1fEEfzun7UGzOPMQB5nuNAUkpQM2l0=)

3

[2023-08-25T07:07:47,929][ERROR][o.o.i.t.t.TranslogTransferManager] [ip-10-0-3-162.us-west-2.compute.internal] [logs-1600gb][16] Transfer failed for snapshot TranslogTransferSnapshot [ primary term = 1, generation = 16893 ]

java.io.IOException: Unable to upload object [remote-store/qSnoT1VTSiuipllqbCJfTA/16/translog/metadata/metadata__9223372036854775806__9223372036854758914__9223370343907547759__1] using a single upload
	at org.opensearch.repositories.s3.S3BlobContainer.executeSingleUpload(S3BlobContainer.java:488) ~[?:?]
	at org.opensearch.repositories.s3.S3BlobContainer.lambda$writeBlob$1(S3BlobContainer.java:170) ~[?:?]
	at java.security.AccessController.doPrivileged(AccessController.java:569) ~[?:?]
	at org.opensearch.repositories.s3.SocketAccess.doPrivilegedIOException(SocketAccess.java:61) ~[?:?]
	at org.opensearch.repositories.s3.S3BlobContainer.writeBlob(S3BlobContainer.java:168) ~[?:?]
	at org.opensearch.repositories.s3.S3BlobContainer.writeBlobAtomic(S3BlobContainer.java:221) ~[?:?]
	at org.opensearch.index.translog.transfer.BlobStoreTransferService.uploadBlob(BlobStoreTransferService.java:82) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.transfer.TranslogTransferManager.transferSnapshot(TranslogTransferManager.java:145) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.RemoteFsTranslog.upload(RemoteFsTranslog.java:268) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.RemoteFsTranslog.prepareAndUpload(RemoteFsTranslog.java:241) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.RemoteFsTranslog.ensureSynced(RemoteFsTranslog.java:191) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.Translog.ensureSynced(Translog.java:835) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.translog.InternalTranslogManager.ensureTranslogSynced(InternalTranslogManager.java:178) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.engine.InternalEngine.ensureTranslogSynced(InternalEngine.java:605) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.shard.IndexShard.lambda$createTranslogSyncProcessor$43(IndexShard.java:4130) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.index.shard.IndexShard$6.write(IndexShard.java:4144) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.AsyncIOProcessor.processList(AsyncIOProcessor.java:129) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.AsyncIOProcessor.drainAndProcessAndRelease(AsyncIOProcessor.java:117) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.BufferedAsyncIOProcessor.process(BufferedAsyncIOProcessor.java:80) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849) [opensearch-2.10.0.jar:2.10.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:833) [?:?]

Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Connection or outbound has closed
	at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111) ~[?:?]
	at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:47) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:223) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:83) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56) ~[?:?]
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26) ~[?:?]
	at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193) ~[?:?]
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103) ~[?:?]
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:171) ~[?:?]
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82) ~[?:?]
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:179) ~[?:?]
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76) ~[?:?]
	at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45) ~[?:?]
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56) ~[?:?]
	at software.amazon.awssdk.services.s3.DefaultS3Client.putObject(DefaultS3Client.java:9324) ~[?:?]
	at org.opensearch.repositories.s3.S3BlobContainer.lambda$executeSingleUpload$24(S3BlobContainer.java:485) ~[?:?]
	at org.opensearch.repositories.s3.SocketAccess.lambda$doPrivilegedVoid$0(SocketAccess.java:70) ~[?:?]
	at java.security.AccessController.doPrivileged(AccessController.java:318) ~[?:?]
	at org.opensearch.repositories.s3.SocketAccess.doPrivilegedVoid(SocketAccess.java:69) ~[?:?]
	at org.opensearch.repositories.s3.S3BlobContainer.executeSingleUpload(S3BlobContainer.java:484) ~[?:?]
	... 22 more
	Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 1 failure: Unable to execute HTTP request: Timeout waiting for connection from pool
	Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 2 failure: Unable to execute HTTP request: Timeout waiting for connection from pool
	Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 3 failure: Unable to execute HTTP request: Timeout waiting for connection from pool

Caused by: java.net.SocketException: Connection or outbound has closed
	at sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:1301) ~[?:?]
	at org.apache.http.impl.io.SessionOutputBufferImpl.streamWrite(SessionOutputBufferImpl.java:124) ~[?:?]
	at org.apache.http.impl.io.SessionOutputBufferImpl.flushBuffer(SessionOutputBufferImpl.java:136) ~[?:?]
	at org.apache.http.impl.io.SessionOutputBufferImpl.flush(SessionOutputBufferImpl.java:144) ~[?:?]
	at org.apache.http.impl.io.ContentLengthOutputStream.close(ContentLengthOutputStream.java:93) ~[?:?]
	at org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:157) ~[?:?]
	at org.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:152) ~[?:?]
	at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:238) ~[?:?]
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123) ~[?:?]
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) ~[?:?]
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) ~[?:?]
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[?:?]
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[?:?]
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[?:?]
	at software.amazon.awssdk.http.apache.internal.impl.ApacheSdkHttpClient.execute(ApacheSdkHttpClient.java:72) ~[?:?]
	at software.amazon.awssdk.http.apache.ApacheHttpClient.execute(ApacheHttpClient.java:254) ~[?:?]
	at software.amazon.awssdk.http.apache.ApacheHttpClient.access$500(ApacheHttpClient.java:104) ~[?:?]
	at software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:231) ~[?:?]
	at software.amazon.awssdk.http.apache.ApacheHttpClient$1.call(ApacheHttpClient.java:228) ~[?:?]
	at software.amazon.awssdk.core.internal.util.MetricUtils.measureDurationUnsafe(MetricUtils.java:63) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.executeHttpRequest(MakeHttpRequestStage.java:77) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:56) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.MakeHttpRequestStage.execute(MakeHttpRequestStage.java:39) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:50) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:36) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56) ~[?:?]
	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37) ~[?:?]
	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26) ~[?:?]
	at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193) ~[?:?]
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103) ~[?:?]
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:171) ~[?:?]
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82) ~[?:?]
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:179) ~[?:?]
	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76) ~[?:?]
	at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45) ~[?:?]
	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56) ~[?:?]
	at software.amazon.awssdk.services.s3.DefaultS3Client.putObject(DefaultS3Client.java:9324) ~[?:?]
	at org.opensearch.repositories.s3.S3BlobContainer.lambda$executeSingleUpload$24(S3BlobContainer.java:485) ~[?:?]
	at org.opensearch.repositories.s3.SocketAccess.lambda$doPrivilegedVoid$0(SocketAccess.java:70) ~[?:?]
	at java.security.AccessController.doPrivileged(AccessController.java:318) ~[?:?]
	at org.opensearch.repositories.s3.SocketAccess.doPrivilegedVoid(SocketAccess.java:69) ~[?:?]
	at org.opensearch.repositories.s3.S3BlobContainer.executeSingleUpload(S3BlobContainer.java:484) ~[?:?]
	... 22 more

S3 throttling during translog upload is tracked in issue #7390, and throttling during segment upload is tracked in issue #7389.
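
For anyone triaging similar logs, here is a minimal sketch of how the two failure modes shown above (S3 503 throttling versus client-side connection problems) could be told apart in a node log. The category names, function names, and the `SAMPLE` text are hypothetical and exist only for this illustration; the matched substrings are copied from the log excerpts in this comment. It is not part of OpenSearch or OSB.

```python
import re
from collections import Counter

# Substrings copied from the log excerpts above. The category names on the
# left are made up for this sketch.
SIGNATURES = {
    "s3_throttled_503": "Please reduce your request rate",
    "connection_pool_timeout": "Timeout waiting for connection from pool",
    "connection_closed": "Connection or outbound has closed",
}

# Matches "Request ID: X" but skips "Extended Request ID: X".
REQUEST_ID = re.compile(r"(?<!Extended )Request ID: ([A-Z0-9]+)")


def count_signatures(log_text):
    """Count log lines that contain each known failure signature."""
    counts = Counter()
    for line in log_text.splitlines():
        for category, needle in SIGNATURES.items():
            if needle in line:
                counts[category] += 1
    return counts


def throttled_request_ids(log_text):
    """Collect the distinct S3 request IDs that were rejected with the 503 message."""
    ids = set()
    for line in log_text.splitlines():
        if SIGNATURES["s3_throttled_503"] in line:
            ids.update(REQUEST_ID.findall(line))
    return ids


# A few lines taken from the traces above, used as sample input.
SAMPLE = """\
Caused by: software.amazon.awssdk.services.s3.model.S3Exception: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: EG3GD838K05FGF67, Extended Request ID: zfiTacM9cSPHHCZXc62+/Likv/TVUOsiWAwBoSPq2zsYotMZuw1Q2XTOuu+RaXHJp0EeJsV0z60=)
Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 1 failure: Please reduce your request rate. (Service: S3, Status Code: 503, Request ID: RZ4RNTHM0QSTAWWQ, Extended Request ID: 8h9rsTc5q+4KUSnoVn+cBT6pAvCk0+HMr9vBsp7cVQYyilYnxZqp6Tv/p//HqdRK3Ngl1+kielg=)
Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 1 failure: Unable to execute HTTP request: Timeout waiting for connection from pool
Caused by: java.net.SocketException: Connection or outbound has closed
"""

if __name__ == "__main__":
    print(dict(count_signatures(SAMPLE)))
    print(sorted(throttled_request_ids(SAMPLE)))
```

Line counts overstate the number of failed uploads, because one failure prints the same message on its `Caused by` and `Suppressed` lines. Counting distinct request IDs gives a closer estimate of how many individual S3 requests were throttled.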
