[BanyanDB] Add "sharding_key" to improve TopNAggregation performance #12526
Labels: database, enhancement, feature
Search before asking
Description
The current data distribution, which is based on the combination of 'name' and 'entity', can lead to performance issues when calculating the 'TopNAggregation'. Each shard holds only a subset of the top-n list, so the query process must aggregate those lists to obtain the final result. This adds overhead in both query performance and disk usage.
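To make the aggregation cost concrete, here is a minimal Go sketch of the merge step the query process performs today. The `Item` type and `mergeTopN` function are hypothetical names for illustration, not BanyanDB APIs, and the sketch assumes each entity lives in exactly one shard.

```go
package main

import (
	"fmt"
	"sort"
)

// Item is a hypothetical (entity, value) pair from a shard-local top-N list.
type Item struct {
	Entity string
	Value  int64
}

// mergeTopN combines per-shard top-N lists into one global top-N list.
// Assumption: each entity appears in only one shard, so no values are summed.
func mergeTopN(shardLists [][]Item, n int) []Item {
	var all []Item
	for _, list := range shardLists {
		all = append(all, list...)
	}
	// Sort by value descending; break ties by entity name for stable output.
	sort.Slice(all, func(i, j int) bool {
		if all[i].Value != all[j].Value {
			return all[i].Value > all[j].Value
		}
		return all[i].Entity < all[j].Entity
	})
	if len(all) > n {
		all = all[:n]
	}
	return all
}

func main() {
	// Instances of svc-a are spread over two shards, so neither shard
	// alone knows the service's top-2 instances.
	shards := [][]Item{
		{{"svc-a/inst-1", 90}, {"svc-a/inst-3", 40}},
		{{"svc-a/inst-2", 75}, {"svc-b/inst-1", 10}},
	}
	for _, it := range mergeTopN(shards, 2) {
		fmt.Println(it.Entity, it.Value)
	}
}
```

Every shard has to be consulted and its list shipped to the query process before the final ranking is known, which is the overhead described above.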
To address this issue, we propose adding a new optional field called `sharding_key` to both `Stream` and `Measure`. This field will be used to determine the data distribution, and it will default to `entity` if not specified.

For example, if we set the `sharding_key` as `service_id`, then the new route table should look like this:

This means that instances from the same service will be placed into the same shard, which should improve the performance of the 'TopNAggregation' query.
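The routing rule can be sketched in Go as follows. This is an illustration under stated assumptions, not the proposed implementation: the `Schema` type, the `routingTags` and `shardFor` functions, the list-valued `ShardingKey`, the `instance_cpm` name, and the FNV-1a hash are all hypothetical choices made for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// Schema is a hypothetical, simplified stand-in for a Stream or Measure definition.
type Schema struct {
	Entity      []string // tag names that form the entity
	ShardingKey []string // optional; empty means "fall back to Entity"
}

// routingTags returns the tag names that drive data distribution.
func (s Schema) routingTags() []string {
	if len(s.ShardingKey) > 0 {
		return s.ShardingKey
	}
	return s.Entity
}

// shardFor hashes the resource name plus the routing tag values.
// FNV-1a is an illustrative choice; shardNum must be greater than zero.
func shardFor(name string, s Schema, tags map[string]string, shardNum uint32) uint32 {
	parts := []string{name}
	for _, t := range s.routingTags() {
		parts = append(parts, tags[t])
	}
	h := fnv.New32a()
	h.Write([]byte(strings.Join(parts, "|")))
	return h.Sum32() % shardNum
}

func main() {
	s := Schema{
		Entity:      []string{"service_id", "instance_id"},
		ShardingKey: []string{"service_id"},
	}
	a := shardFor("instance_cpm", s, map[string]string{"service_id": "svc-a", "instance_id": "inst-1"}, 4)
	b := shardFor("instance_cpm", s, map[string]string{"service_id": "svc-a", "instance_id": "inst-2"}, 4)
	fmt.Println(a == b) // true: both instances hash on the same service_id
}
```

Because only `service_id` feeds the hash, two instances of the same service always produce the same shard number, while a schema with no `ShardingKey` keeps the entity-based routing.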
Task List
- Add the `sharding_key` field to the `Stream` and `Measure` models.
- Distribute data based on the `sharding_key` field, with `entity` as the default if `sharding_key` is not specified.
- … `sharding_key` field and its usage.

Use case
No response
Related issues
No response
Are you willing to submit a pull request to implement this on your own?
Code of Conduct