[Remote Store] Add total upload and download time from remote store to nodes stats #9454

Merged: 12 commits merged into opensearch-project:main from the stats-total-time branch on Aug 25, 2023

shourya035 (Member) commented Aug 21, 2023

Description

Adds a cumulative total_time_spent field, for both segment uploads to and segment downloads from the remote store, to the Nodes Stats API output.

  • At the nodes stats level, this field is the sum of the time spent uploading segments to (or downloading segments from) the remote store across all shards residing on that node.
  • At the cluster stats level, this field is the total time spent uploading or downloading segments across all shards in the cluster.
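
The roll-up described in the two bullets above can be sketched as a plain sum over per-shard values. This is only an illustration: `ShardTransferTime` and `TransferTimeRollup` are hypothetical names invented for this example and are not the classes touched by this PR.

```java
import java.util.List;

// Hypothetical per-shard holder for illustration; not an OpenSearch class.
final class ShardTransferTime {
    final long uploadMillis;
    final long downloadMillis;

    ShardTransferTime(long uploadMillis, long downloadMillis) {
        this.uploadMillis = uploadMillis;
        this.downloadMillis = downloadMillis;
    }
}

// Hypothetical helper: sums per-shard times into a single cumulative value.
final class TransferTimeRollup {
    // Total upload time over the given shards.
    static long totalUploadMillis(List<ShardTransferTime> shards) {
        long total = 0L;
        for (ShardTransferTime shard : shards) {
            total += shard.uploadMillis;
        }
        return total;
    }

    // Total download time over the given shards.
    static long totalDownloadMillis(List<ShardTransferTime> shards) {
        long total = 0L;
        for (ShardTransferTime shard : shards) {
            total += shard.downloadMillis;
        }
        return total;
    }
}
```

Passing the shards of one node gives the node-level figure; passing every shard in the cluster gives the cluster-level figure.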

This follows the same pattern as the existing merges.total_time_in_millis field: https://opensearch.org/docs/latest/api-reference/nodes-apis/nodes-stats/#indices

{
    "_nodes": {
        "total": 2,
        "successful": 2,
        "failed": 0
    },
    "cluster_name": "opensearch-cluster",
    "nodes": {
        "yKBw3ByYSd6KZnx_qOb9NA": {
            "timestamp": 1692710635712,
            "name": "opensearch-node2",
            ...
            "indices": {
                "segments": {
                    "count": 12,
                    "memory": "0b",
                    ...
                    "fixed_bit_set_memory_in_bytes": 0,
                    "max_unsafe_auto_id_timestamp": -1,
                    "remote_store": {
                        "upload": {
                            "total_uploads": {
                                "started": "0b",
                                "started_bytes": 0,
                                "succeeded": "0b",
                                "succeeded_bytes": 0,
                                "failed": "0b",
                                "failed_bytes": 0
                            },
                            "refresh_size_lag": {
                                "total": "5.8mb",
                                "total_bytes": 6146125,
                                "max": "5.8mb",
                                "max_bytes": 6146125
                            },
                            "max_refresh_time_lag": "24.7s",
                            "max_refresh_time_lag_in_millis": 24739,
                            "total_time_spent": "0s",
                            "total_time_spent_in_millis": 0
                        },
                        "download": {
                            "total_downloads": {
                                "started": "7.4mb",
                                "started_bytes": 7767778,
                                "succeeded": "7.4mb",
                                "succeeded_bytes": 7767778,
                                "failed": "0b",
                                "failed_bytes": 0
                            },
                            "total_time_spent": "12.3s",
                            "total_time_spent_in_millis": 12307
                        }
                    },
                    "file_sizes": {}
                }
            }
        },
        "EDUazBdFTYK9-Wij2HtTiQ": {
            "timestamp": 1692710635711,
            "name": "opensearch-node1",
            ...
            "indices": {
                "segments": {
                    "count": 12,
                    ...
                    "index_writer_memory": "200.5kb",
                    "index_writer_memory_in_bytes": 205340,
                    "version_map_memory": "142b",
                    "version_map_memory_in_bytes": 142,
                    "fixed_bit_set": "0b",
                    "fixed_bit_set_memory_in_bytes": 0,
                    "max_unsafe_auto_id_timestamp": -1,
                    "remote_store": {
                        "upload": {
                            "total_uploads": {
                                "started": "7.4mb",
                                "started_bytes": 7786972,
                                "succeeded": "7.4mb",
                                "succeeded_bytes": 7786972,
                                "failed": "0b",
                                "failed_bytes": 0
                            },
                            "refresh_size_lag": {
                                "total": "0b",
                                "total_bytes": 0,
                                "max": "0b",
                                "max_bytes": 0
                            },
                            "max_refresh_time_lag": "0s",
                            "max_refresh_time_lag_in_millis": 0,
                            "total_time_spent": "10.8s",
                            "total_time_spent_in_millis": 10803
                        },
                        "download": {
                            "total_downloads": {
                                "started": "623.2kb",
                                "started_bytes": 638243,
                                "succeeded": "623.2kb",
                                "succeeded_bytes": 638243,
                                "failed": "0b",
                                "failed_bytes": 0
                            },
                            "total_time_spent": "475ms",
                            "total_time_spent_in_millis": 475
                        }
                    },
                    "file_sizes": {}
                }
            }
        }
    }
}

This PR also changes the fields that are tracked at the file level to atomic types, because of the upcoming parallel download logic and the existing parallel upload logic. This keeps the stats recording logic thread-safe.
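
A minimal sketch of why an atomic counter helps when several transfer threads record time concurrently. `TransferTimeTracker` and its methods are hypothetical names for illustration, not the tracker classes changed in this PR; only `java.util.concurrent` APIs are used.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical tracker for illustration; not an OpenSearch class.
final class TransferTimeTracker {
    // AtomicLong makes each increment an indivisible read-modify-write.
    private final AtomicLong totalTimeSpentInMillis = new AtomicLong();

    void addTimeSpent(long millis) {
        totalTimeSpentInMillis.addAndGet(millis);
    }

    long getTotalTimeSpentInMillis() {
        return totalTimeSpentInMillis.get();
    }

    // Simulates parallel file transfers that each record their elapsed time.
    static long recordConcurrently(int threads, int updatesPerThread, long millisPerUpdate) {
        TransferTimeTracker tracker = new TransferTimeTracker();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < updatesPerThread; i++) {
                    tracker.addTimeSpent(millisPerUpdate);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return tracker.getTotalTimeSpentInMillis();
    }
}
```

With a plain `long` field and `+=`, concurrent updates could overwrite each other and the total would come out too low; `addAndGet` avoids that without explicit locking.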

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@opensearch-trigger-bot
Contributor

Compatibility status:

Checks if related components are compatible with change 784a473

Incompatible components

  • https://github.com/opensearch-project/notifications.git
  • https://github.com/opensearch-project/index-management.git
  • https://github.com/opensearch-project/security-analytics.git
  • https://github.com/opensearch-project/asynchronous-search.git

Skipped components

Compatible components

  • https://github.com/opensearch-project/geospatial.git
  • https://github.com/opensearch-project/security.git
  • https://github.com/opensearch-project/neural-search.git
  • https://github.com/opensearch-project/sql.git
  • https://github.com/opensearch-project/job-scheduler.git
  • https://github.com/opensearch-project/opensearch-oci-object-storage.git
  • https://github.com/opensearch-project/observability.git
  • https://github.com/opensearch-project/k-nn.git
  • https://github.com/opensearch-project/alerting.git
  • https://github.com/opensearch-project/cross-cluster-replication.git
  • https://github.com/opensearch-project/anomaly-detection.git
  • https://github.com/opensearch-project/performance-analyzer.git
  • https://github.com/opensearch-project/common-utils.git
  • https://github.com/opensearch-project/ml-commons.git
  • https://github.com/opensearch-project/performance-analyzer-rca.git
  • https://github.com/opensearch-project/reporting.git

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

codecov bot commented Aug 21, 2023

Codecov Report

Merging #9454 (717ad11) into main (4e68808) will decrease coverage by 0.14%.
Report is 1 commit behind head on main.
The diff coverage is 91.00%.

@@             Coverage Diff              @@
##               main    #9454      +/-   ##
============================================
- Coverage     71.21%   71.07%   -0.14%     
+ Complexity    57495    57445      -50     
============================================
  Files          4778     4778              
  Lines        270912   270971      +59     
  Branches      39585    39585              
============================================
- Hits         192924   192600     -324     
- Misses        61813    62169     +356     
- Partials      16175    16202      +27     
| Files Changed | Coverage Δ |
|---|---|
| ...rc/main/java/org/opensearch/index/store/Store.java | 81.80% <33.33%> (+0.71%) ⬆️ |
| ...search/index/shard/RemoteStoreRefreshListener.java | 85.47% <78.57%> (-0.16%) ⬇️ |
| ...rg/opensearch/index/remote/RemoteSegmentStats.java | 96.55% <88.88%> (+0.55%) ⬆️ |
| ...rch/index/remote/RemoteSegmentTransferTracker.java | 81.25% <95.45%> (+0.70%) ⬆️ |
| ...arch/index/store/DirectoryFileTransferTracker.java | 85.56% <97.67%> (+3.13%) ⬆️ |

... and 470 files with indirect coverage changes

ashking94 (Member) left a comment

Please make the necessary changes.

Signed-off-by: Shourya Dutta Biswas <[email protected]>

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testReadBlobWithRetries

ashking94 (Member) left a comment

lgtm

@sachinpkale sachinpkale merged commit c90b6ea into opensearch-project:main Aug 25, 2023
11 of 12 checks passed
@shourya035 shourya035 deleted the stats-total-time branch August 25, 2023 06:54
shourya035 added a commit to shourya035/OpenSearch that referenced this pull request Aug 25, 2023
…o nodes stats (opensearch-project#9454)

---------

Signed-off-by: Shourya Dutta Biswas <[email protected]>
(cherry picked from commit c90b6ea)
Gaganjuneja pushed a commit to Gaganjuneja/OpenSearch that referenced this pull request Aug 28, 2023
…o nodes stats (opensearch-project#9454)

---------

Signed-off-by: Shourya Dutta Biswas <[email protected]>
Signed-off-by: Gagan Juneja <[email protected]>
kkmr pushed a commit to kkmr/OpenSearch that referenced this pull request Aug 28, 2023
…o nodes stats (opensearch-project#9454)

---------

Signed-off-by: Shourya Dutta Biswas <[email protected]>
Signed-off-by: Kiran Reddy <[email protected]>
kaushalmahi12 pushed a commit to kaushalmahi12/OpenSearch that referenced this pull request Sep 12, 2023
brusic pushed a commit to brusic/OpenSearch that referenced this pull request Sep 25, 2023
…o nodes stats (opensearch-project#9454)

---------

Signed-off-by: Shourya Dutta Biswas <[email protected]>
Signed-off-by: Ivan Brusic <[email protected]>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…o nodes stats (opensearch-project#9454)

---------

Signed-off-by: Shourya Dutta Biswas <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>