[RFC] [Remote Store] /_remotestore/stats API and _nodes/stats API enhancements for observability on Remote Translog Store upload operations #8311
Labels
enhancement - Enhancement or improvement to existing feature or request
Storage:Durability - Issues and PRs related to the durability framework
Storage - Issues and PRs relating to data and metadata storage
Context
Is your feature request related to a problem? Please describe.
Aligning with #6789, we should be able to query statistics for Remote Translog Store (RTS)-related upload operations.
Describe the solution you'd like
This RFC proposes the addition of new statistics for observability on the upload flow of RTS operations. To support this, changes to the existing `/_remotestore/stats` API contract are also proposed.
Changes in the existing `/_remotestore/stats` API contract
- Stats will be grouped under two keys: `segments` and `translog`.
- `upload` and `download` will be introduced under the `segments` and the `translog` keys. These will track the stats related to the upload and download flows respectively. Flow-agnostic stats, if any, pertaining to the Remote Segment Store (RSS) and RTS would be introduced directly under the `segments` and `translog` keys respectively.
- Stats for the RSS upload flow would be placed under the `segments.upload` level. New stats for the RSS download flow would be introduced under the `segments.download` level.
- New stats for the RTS upload flow would be introduced under the `translog.upload` level. New stats for the RTS download flow would be introduced under the `translog.download` level.
a. RSS upload flow
b. All RTS stats
a. RSS download flow
- For RTS-disabled but RSS-enabled indices, the `translog` object will not be returned. Only the `segments` object and the relevant metadata (i.e. the `shard_id`) will be returned.
Statistics to be introduced for RTS uploads
Visibility on local vs. RTS diff
lag
Represents the number of translog operations not persisted to RTS. This would be relevant for async translog durability.
last_upload_timestamp
Represents the epoch timestamp of the last successful RTS upload. If an upload fails, this value is left unchanged rather than being set to the time of the failed attempt.
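The two visibility stats above can be illustrated with a small sketch. It assumes `lag` is the difference between operations written locally and operations confirmed in RTS, and that the timestamp is kept in epoch milliseconds; the class and method names are hypothetical and do not come from the OpenSearch codebase.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranslogUploadVisibility:
    """Hypothetical tracker for `lag` and `last_upload_timestamp`."""

    local_ops: int = 0       # operations written to the local translog
    uploaded_ops: int = 0    # operations confirmed as persisted to RTS
    last_upload_timestamp: Optional[int] = None  # assumed epoch millis

    def record_local_write(self, ops: int) -> None:
        self.local_ops += ops

    def record_upload(self, up_to_ops: int, succeeded: bool,
                      now_ms: Optional[int] = None) -> None:
        if not succeeded:
            # A failed upload changes neither the lag nor the timestamp.
            return
        self.uploaded_ops = max(self.uploaded_ops, up_to_ops)
        self.last_upload_timestamp = (
            now_ms if now_ms is not None else int(time.time() * 1000)
        )

    @property
    def lag(self) -> int:
        # Number of translog operations not yet persisted to RTS.
        return self.local_ops - self.uploaded_ops
```

With asynchronous translog durability, `lag` would be expected to grow between uploads and drop after each successful one.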
Totals
total_uploads
Represents the total number of RTS uploads. Eligible sub-fields (based on operation status): `started`, `succeeded`, `failed`.
total_uploads_in_bytes
Represents the total number of bytes uploaded to the RTS. Eligible sub-fields (based on operation status): `started`, `succeeded`, `failed`.
total_upload_time_in_millis
Represents the total time spent on RTS uploads.
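A minimal sketch of how the totals and their status sub-fields might be accumulated. The class names and the `on_start`/`on_finish` hooks are hypothetical, and counting time for failed uploads as well as successful ones is an assumption; the RFC text does not say which attempts contribute to `total_upload_time_in_millis`.

```python
from dataclasses import dataclass, field


@dataclass
class StatusCounter:
    """Sub-fields keyed by operation status."""
    started: int = 0
    succeeded: int = 0
    failed: int = 0


@dataclass
class TranslogUploadTotals:
    """Hypothetical accumulator for the three total stats."""
    total_uploads: StatusCounter = field(default_factory=StatusCounter)
    total_uploads_in_bytes: StatusCounter = field(default_factory=StatusCounter)
    total_upload_time_in_millis: int = 0

    def on_start(self, size_bytes: int) -> None:
        self.total_uploads.started += 1
        self.total_uploads_in_bytes.started += size_bytes

    def on_finish(self, size_bytes: int, took_millis: int, succeeded: bool) -> None:
        if succeeded:
            self.total_uploads.succeeded += 1
            self.total_uploads_in_bytes.succeeded += size_bytes
        else:
            self.total_uploads.failed += 1
            self.total_uploads_in_bytes.failed += size_bytes
        # Assumption: time is counted for every finished attempt.
        self.total_upload_time_in_millis += took_millis
```

At any point, `started - (succeeded + failed)` gives the number of uploads still in flight under this model.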
Performance
upload_size_in_bytes
Represents the size of data to be uploaded to RTS. Eligible sub-fields: `moving_avg`.
upload_speed_in_bytes_per_sec
Represents the speed of RTS uploads in bytes per second. Eligible sub-fields: `moving_avg`.
upload_latency_in_millis
Represents the time taken by an RTS upload. Eligible sub-fields: `moving_avg`.
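The `moving_avg` sub-field could be computed in several ways; the sketch below assumes a simple average over the last N samples, which the RFC text does not specify. The window size, the `MovingAverage` class, and the `record_upload` helper are all hypothetical.

```python
from collections import deque


class MovingAverage:
    """Simple moving average over the most recent `window` samples."""

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self._samples = deque(maxlen=window)
        self._sum = 0.0

    def record(self, value: float) -> None:
        # Subtract the sample that is about to be evicted, if the window is full.
        if len(self._samples) == self._samples.maxlen:
            self._sum -= self._samples[0]
        self._samples.append(value)
        self._sum += value

    @property
    def average(self) -> float:
        return self._sum / len(self._samples) if self._samples else 0.0


def record_upload(size_avg: MovingAverage, speed_avg: MovingAverage,
                  latency_avg: MovingAverage, size_bytes: int,
                  took_millis: int) -> None:
    """Feed one finished upload into the three performance averages."""
    size_avg.record(size_bytes)
    latency_avg.record(took_millis)
    if took_millis > 0:  # skip the speed sample rather than divide by zero
        speed_avg.record(size_bytes * 1000 / took_millis)
```

A windowed average reacts to recent changes in upload behaviour, whereas the totals above only ever grow.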
API design
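As a rough illustration of the nesting proposed in the contract changes (not an actual response from the API), a per-shard entry might be assembled as follows. Placing `shard_id` next to the `segments` and `translog` objects is an assumption, `build_shard_stats` is a hypothetical helper, and the stat dictionaries are placeholders supplied by the caller.

```python
from typing import Any, Dict, Optional


def build_shard_stats(
    shard_id: str,
    segments_upload: Dict[str, Any],
    segments_download: Dict[str, Any],
    translog_upload: Optional[Dict[str, Any]] = None,
    translog_download: Optional[Dict[str, Any]] = None,
    rts_enabled: bool = True,
) -> Dict[str, Any]:
    """Assemble one shard entry using the proposed upload/download levels."""
    stats: Dict[str, Any] = {
        "shard_id": shard_id,  # assumed placement of the shard metadata
        "segments": {
            "upload": segments_upload,
            "download": segments_download,
        },
    }
    if rts_enabled:
        # The translog object is only present for RTS-enabled indices.
        stats["translog"] = {
            "upload": translog_upload or {},
            "download": translog_download or {},
        }
    return stats
```

For an RTS-disabled but RSS-enabled index, calling the helper with `rts_enabled=False` yields an entry with only `shard_id` and `segments`, matching the rule that the `translog` object is omitted.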
Base Path
Supported path parameters
Supported query parameters
`local` - Retrieves stats only for the shards on the coordinating node.
Shard-level stats for RTS-enabled index
Path:
Response:
Index-level stats for RTS-enabled index
Path:
Response:
Shard-level stats for RTS-disabled but RSS-enabled index
Path:
Response:
Index-level stats for RTS-disabled but RSS-enabled index
Path:
Response:
Related information