Replies: 4 comments
-
We are aiming to design and implement a benchmark that evaluates the analytical performance of time series databases. Here is my understanding of how to design for analytical workloads.

Dataset

In TSBS we have metrics data from DevOps or IoT devices (e.g. CPU/memory utilization); there are still other kinds of time series data we can take into consideration:
How to generate the dataset is one of the main concerns.
We should have an option to control

Query

I collected some scenarios that involve analytical performance in a time series database
Pseudo code for some queries:

SELECT time, id
FROM t
WHERE time > ts_start
  AND time < ts_stop
  AND a > value

2. Aggregation and Join

SELECT time, id, AVG(a), SUM(b)
FROM t
WHERE time > ts_start
  AND time < ts_stop

SELECT time, id, AVG(t1.a), SUM(t2.b)
FROM t1 JOIN t2 ON t1.a = t2.b
WHERE time > ts_start
  AND time < ts_stop

SELECT time, id, AVG(a), SUM(b)
FROM t
WHERE time > ts_start
  AND time < ts_stop
GROUP BY id, time
SAMPLE BY 1H

SELECT time, id
FROM t
WHERE time > ts_start
  AND time < ts_stop
SAMPLE BY 10s
FILL(LINEAR)

Test suite

Like TPC-DS, we can have the following tests:
Outputs

This is not something that can be fully designed at the very beginning. In a nutshell, a benchmark is a tool to test performance, so the most important thing is the time it takes to execute the queries. Once we have the import, single-threaded, and multi-threaded execution times, we can derive metrics like throughput and price/performance.

Reference:
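To make the output metrics concrete, here is a minimal sketch of how single-threaded and multi-threaded execution times could be turned into throughput and price/performance. The names `run_benchmark` and `price_performance` are hypothetical, the queries are stand-in callables rather than real database calls, and price/performance is assumed to mean total system cost divided by throughput, as in TPC-style reports.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed(fn):
    """Return the wall-clock seconds taken by calling fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def run_benchmark(queries, workers=4):
    """Time a list of zero-argument callables, each executing one query.

    Runs them once sequentially and once through a thread pool.
    """
    single = timed(lambda: [q() for q in queries])

    def run_concurrently():
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(lambda q: q(), queries))

    multi = timed(run_concurrently)
    return {
        "single_threaded_s": single,
        "multi_threaded_s": multi,
        # Queries completed per second in the concurrent run.
        "throughput_qps": len(queries) / multi if multi > 0 else float("inf"),
    }


def price_performance(system_cost, throughput_qps):
    """Assumed definition: cost per unit of throughput."""
    return system_cost / throughput_qps


# Stand-in workload: each "query" just sleeps for 10 ms.
result = run_benchmark([lambda: time.sleep(0.01)] * 8, workers=4)
print(result)
```

In a real harness each callable would send one query to the database under test, and the import time would be measured the same way with `timed`.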
-
A statistical method is easier to implement. We could also pre-generate some data and scale it to the factor we want.
For logs, the Web Server Access Log dataset is a well-known dataset. This is what simple-logging-benchmark uses.
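As a rough illustration of the statistical approach mentioned above, the sketch below generates a CPU-utilization-like metric as a bounded random walk per host. Everything here is an assumption for illustration: the function name `generate_series`, the `scale_factor` parameter (taken to mean the number of hosts), the host naming, and the walk parameters are not from TSBS or any existing tool.

```python
import random
from datetime import datetime, timedelta, timezone


def generate_series(scale_factor, points_per_host, interval_s=10, seed=42):
    """Yield (timestamp, host, cpu_util) rows for `scale_factor` hosts.

    Each host follows a random walk clamped to [0, 100]. A fixed seed
    makes the generated dataset reproducible across runs.
    """
    rng = random.Random(seed)
    start = datetime(2023, 1, 1, tzinfo=timezone.utc)
    for host_id in range(scale_factor):
        value = rng.uniform(0.0, 100.0)
        for i in range(points_per_host):
            # Take a small Gaussian step and clamp to the valid range.
            value = min(100.0, max(0.0, value + rng.gauss(0.0, 2.0)))
            ts = start + timedelta(seconds=i * interval_s)
            yield (ts, f"host_{host_id}", round(value, 2))


rows = list(generate_series(scale_factor=3, points_per_host=5))
for row in rows[:3]:
    print(row)
```

Scaling the dataset then only means raising `scale_factor` or `points_per_host`, and the same seed keeps results comparable between runs.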
-
https://github.com/xo/usql is a brilliant project that supports many database protocols. We can use it to run SQL. Because it is written in Go, its binary runs in a wide range of environments, which also makes it a good choice as a binary dependency.
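A minimal sketch of how a benchmark driver might shell out to the usql binary and time one query. It assumes usql takes a DSN followed by a psql-style `-c <command>` option, which should be checked against `usql --help` for the installed version. The DSN, the helper names, and the `runner` parameter are hypothetical.

```python
import subprocess
import time


def build_usql_command(dsn, sql, binary="usql"):
    """Build the argv for one query.

    Assumption: usql accepts `usql <dsn> -c <sql>`, mirroring psql.
    """
    return [binary, dsn, "-c", sql]


def time_command(argv, runner=subprocess.run):
    """Run argv and return elapsed wall-clock seconds.

    `runner` is injectable so the timing logic can be exercised
    without a usql binary or a live database.
    """
    start = time.perf_counter()
    runner(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


# Hypothetical DSN; replace with the database under test.
cmd = build_usql_command("postgres://localhost:5432/benchmark", "SELECT 1")
print(" ".join(cmd))
```

With usql installed and a reachable database, `time_command(cmd)` would return the end-to-end latency of that query, including process startup.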
-
BTW, the paper and the corresponding benchmark tool may also be useful.
-
We use TSBS and prometheus-benchmark to benchmark our database.
Both benchmarks are mainly designed for metrics workloads and lack the ability to benchmark analytical workloads. We might implement a new benchmark for GreptimeDB and other time series databases. The benchmark should cover more analytical workloads for time series data.
It should be easy to set up. It'd be helpful if it could simulate workloads of TSBS and prometheus-benchmark.
I created a repo, https://github.com/GreptimeTeam/greptime-bench, for this. We can continue the discussion in this thread or in the repo's issues.