Thanos load test / benchmark #346
Preferable test case description format:
Test name
Setup
Operations
What to measure, what is the goal?
What components/features it tests
Notes
Test name: Thanos query vs. vanilla Prometheus
Setup: Launch 2 different Prometheus instances connected via Thanos sidecar and Thanos query. Load a bunch of data.
Operations: Do a bunch of queries.
What to measure, what is the goal? Compare performance (is Prometheus faster than Thanos, and by how much?).
What components/features it tests:
Notes: I have a couple of people worrying about Thanos performance against federated Prometheus.
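A minimal sketch of how such a side-by-side timing could be run, assuming both a vanilla Prometheus and a Thanos query endpoint expose the standard /api/v1/query API. The endpoint addresses and the example PromQL expression are placeholders, not part of any existing tooling:

```go
package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
	"net/url"
	"time"
)

// timeQuery issues one instant query against a Prometheus-compatible
// /api/v1/query endpoint and returns how long the round trip took.
func timeQuery(baseURL, promQL string) (time.Duration, error) {
	u := fmt.Sprintf("%s/api/v1/query?query=%s", baseURL, url.QueryEscape(promQL))
	start := time.Now()
	resp, err := http.Get(u)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	// Drain the body so the full response transfer is included in the timing.
	if _, err := io.Copy(ioutil.Discard, resp.Body); err != nil {
		return 0, err
	}
	return time.Since(start), nil
}

func main() {
	// Placeholder endpoints for the two setups being compared.
	targets := map[string]string{
		"prometheus":   "http://localhost:9090",
		"thanos-query": "http://localhost:10902",
	}
	// Hypothetical example query; replace with whatever data was loaded.
	query := `sum(rate(node_cpu_seconds_total[5m]))`

	for name, base := range targets {
		d, err := timeQuery(base, query)
		if err != nil {
			fmt.Printf("%s: error: %v\n", name, err)
			continue
		}
		fmt.Printf("%s: %q took %v\n", name, query, d)
	}
}
```

Running the same expression many times against both endpoints and comparing latency distributions (rather than single samples) would make the comparison to federated Prometheus more robust.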
Historical data test with many time series over many windows
Setup: Generate artificially old data, up to a year should be fine. Create many time series with the same metric name (up to 100k) and many label permutations.
Operations: Query the data via query nodes, with various numbers of time series touched. Start with 1, work up through 100, 500, 1000, 10000, 100000 time series over periods of instant, 1h, 6h, 12h, 1d, 3d, 1w, 2w, 4w, 8w, 12w, 24w, 36w, 52w.
What to measure, what is the goal? Measure query times for the various combinations and see where things become problematic. Perhaps a nominally small query over a long time range takes longer because it loads a lot of data. There should be plenty of insights to be gained. It might be a good way to find bottlenecks in the query nodes, or to see what the effects of scaling them might be. You could also go against just the store gRPC API to see the difference in query times.
What components/features it tests: Historical data fetching, querying, high-cardinality queries.
Notes: Some of these queries might not work (it's unreasonable to expect they would), but the idea here is to simulate what people will do to Thanos in the real world. Deploying Thanos internally at companies will mean dealing with sets of metrics that are not well designed but "very important to the business", and they'll want Thanos to work that way. Finding out where the limits are and being able to give recommendations up front about what is possible should be useful for Thanos developers and users.
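A rough sketch of that sweep, assuming the data is reachable through a querier's standard /api/v1/query_range endpoint. The endpoint, the metric name, and the label scheme used to select 1 / ~1000 / all series are hypothetical:

```go
package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
	"net/url"
	"time"
)

func main() {
	base := "http://localhost:10902" // placeholder querier endpoint
	// Selectors touching progressively more series; metric and label scheme are hypothetical.
	selectors := []string{
		`test_metric{series_id="0"}`,           // 1 series
		`test_metric{series_id=~"[0-9]{1,3}"}`, // up to ~1000 series
		`test_metric`,                          // all series
	}
	// Query windows from 1h up to a full year.
	windows := []time.Duration{
		time.Hour, 6 * time.Hour, 12 * time.Hour, 24 * time.Hour,
		7 * 24 * time.Hour, 28 * 24 * time.Hour, 52 * 7 * 24 * time.Hour,
	}

	for _, sel := range selectors {
		for _, w := range windows {
			end := time.Now()
			start := end.Add(-w)
			step := w / 250 // keep the number of returned points roughly constant

			params := url.Values{}
			params.Set("query", sel)
			params.Set("start", fmt.Sprintf("%d", start.Unix()))
			params.Set("end", fmt.Sprintf("%d", end.Unix()))
			params.Set("step", fmt.Sprintf("%ds", int(step.Seconds())))

			began := time.Now()
			resp, err := http.Get(base + "/api/v1/query_range?" + params.Encode())
			if err != nil {
				fmt.Printf("%-40s %10v  error: %v\n", sel, w, err)
				continue
			}
			io.Copy(ioutil.Discard, resp.Body)
			resp.Body.Close()
			fmt.Printf("%-40s %10v  %v\n", sel, w, time.Since(began))
		}
	}
}
```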
Thanos Query performance test
Setup: Run a set of particularly heavy queries against the Thanos Query API and against the Prometheus API and compare load times.
Operations:
What to measure, what is the goal?
What components/features it tests:
Notes: This test would be handy to find out whether the number of Thanos Query nodes is helping performance or hurting it.
Compaction/downsampling performance
Setup: Artificially generate data from many Prometheus servers (200? 1000?). The volume of data should be similar to what one might get from node-exporter for 10-1000 scraped servers per Prometheus server. Data doesn't have to be very old; the workload should be what's expected between runs of thanos compact. Data should be uploaded to the long-term datastore.
Operations: Run thanos compact and measure the time to do one pass of compaction/downsampling. Preferably run it through some proxy that can add various amounts of latency/bandwidth throttling (a minimal example of such a proxy is sketched further down).
What to measure, what is the goal? Time to do compaction, and how network limitations affect it. This is relevant for organisations which run on their own hardware but use S3/GCS for long-term storage.
What components/features it tests: thanos compact
Notes:
Proxy between what things? (:
Well, you can always have a different bucket if that's an issue. (: But would be nice to know the answer, true.
Proxy between thanos compact and S3/GCS. Just to introduce a delay/throttle to simulate distance from the store.
True, and it would be good for users to be aware of this up front. I only realised this might be an issue for us once I was running the compactor on my workstation, which had 100ms latency to the store.
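For the delay/throttle part, existing tools (e.g. tc netem or toxiproxy) can be used, but a minimal TCP proxy that injects a fixed delay is also only a few lines of Go. The sketch below is an assumption-heavy illustration: the listen address, upstream object-store endpoint, and delay are placeholders, and it delays each write rather than shaping bandwidth, so it only approximates real network latency:

```go
package main

import (
	"io"
	"log"
	"net"
	"time"
)

const (
	listenAddr = "localhost:9999"          // point the object-store client at this address
	upstream   = "storage.example.com:443" // placeholder object storage endpoint
	delay      = 100 * time.Millisecond    // simulated one-way latency
)

// delayedCopy copies bytes from src to dst, sleeping before each write
// to roughly simulate network latency in that direction.
func delayedCopy(dst io.Writer, src io.Reader) {
	buf := make([]byte, 32*1024)
	for {
		n, err := src.Read(buf)
		if n > 0 {
			time.Sleep(delay)
			if _, werr := dst.Write(buf[:n]); werr != nil {
				return
			}
		}
		if err != nil {
			return
		}
	}
}

func handle(client net.Conn) {
	defer client.Close()
	server, err := net.Dial("tcp", upstream)
	if err != nil {
		log.Printf("dial upstream: %v", err)
		return
	}
	defer server.Close()
	go delayedCopy(server, client)
	delayedCopy(client, server)
}

func main() {
	ln, err := net.Listen("tcp", listenAddr)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("proxying %s -> %s with %v extra latency per write", listenAddr, upstream, delay)
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go handle(conn)
	}
}
```

Note that pointing an S3/GCS client at such a proxy usually means overriding the storage endpoint and may complicate TLS verification, which is part of why a kernel-level tool like tc netem is often the simpler choice.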
Hey all,
A small update on the progress we're making on the load tests. We've put together a few tools to help us run some tests. These include a tool to spin up a minimal Thanos installation (prometheus + sidecar, thanos-query & thanos-store), a tool to measure query performance of a Prometheus or Thanos endpoint, and a tool to generate historic TSDB blocks to simulate metrics in LTS.
Some early results are giving positive signs, showing we can query 1 year of metrics (a sum over 100 timeseries, taken at a 15s scrape interval, 210 million total samples) in about 30 seconds. This is using 2-week-long blocks with no downsampling.
We have noticed that Thanos does add some overhead to regular queries, causing most queries to take about twice as long to run on thanos-query compared to vanilla Prometheus. We have observed this on short-running queries (e.g. just fetching 45k metrics took 0.055s on Thanos vs 0.022s on Prometheus) as well as longer-running queries (a rate over 4.5 million metrics took 5.79s on Thanos compared to 3.13s on Prometheus). This is more or less expected, as we hit the network twice when using thanos-query.
We aim to get some further results on metric ingestion rate & performance of different queries soon, so keep an eye on this issue. We do plan on releasing the tooling & testing framework we have built for these tests soon, and I'll update this issue when progress has been made on that.
Hi,
We’ve completed our first round of benchmarking, check out the results -> https://github.com/improbable-eng/thanos/tree/master/benchmark#results. We’ve also released the tooling used to run the benchmarks, so if the results are not appropriate for your use case, feel free to clone & have a play. Enjoy 🙂
Quick idea: Deduplication benchmark might be interesting.
FYI: We removed the tool from the repo for now, as it was not very well maintained (it still used an old Thanos version) and it was a separate Go module repo that was causing trouble. We might start something like it again, so if you have any different ideas about how this tool should look, let us know (:
Hi All,
We are planning to start an initiative of Thanos load testing to check common metrics like query responsiveness and resource consumption during common operations on a heavily scaled setup.
We want your input before we start! Do you have any particular ideas?
We would like to focus on the Thanos features we want to test, for example:
Query test
Setup:
Operations:
Perform a certain query for fresh data (what range? which metric? with dedup?)
What to measure, what is the goal?
Measure the query latency and CPU/Mem resources (a rough sketch of sampling these is included after the notes below).
What components/features it tests
Single thanos-query capabilities (global view scalability, deduplication).
Notes:
Usually, you can have 50 Prometheus instances connected, but you will ask only for a metric filtered by some external labels (e.g. cluster or environment), i.e. coming from only some instances. This will limit fanout. For test purposes, we can mimic the case where the metric is present and available on all 50 instances, to test full fanout.
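For the CPU/Mem part, one simple option is to sample the component's own /metrics endpoint while the query load runs; process_cpu_seconds_total and process_resident_memory_bytes are standard process metrics exposed by Prometheus and Thanos components alike. A rough sketch, with the metrics endpoint as a placeholder:

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// scrapeMetric fetches a /metrics page and returns the value of the first
// sample whose name matches exactly (good enough for label-less process metrics).
func scrapeMetric(metricsURL, name string) (float64, error) {
	resp, err := http.Get(metricsURL)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	sc := bufio.NewScanner(resp.Body)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, name+" ") {
			return strconv.ParseFloat(strings.Fields(line)[1], 64)
		}
	}
	return 0, fmt.Errorf("metric %q not found", name)
}

func main() {
	metricsURL := "http://localhost:10902/metrics" // placeholder thanos-query metrics endpoint

	// Sample CPU seconds and resident memory every 5s while the query load runs.
	for {
		cpu, _ := scrapeMetric(metricsURL, "process_cpu_seconds_total")
		rss, _ := scrapeMetric(metricsURL, "process_resident_memory_bytes")
		fmt.Printf("%s cpu_seconds=%.1f rss_mb=%.1f\n",
			time.Now().Format(time.RFC3339), cpu, rss/1024/1024)
		time.Sleep(5 * time.Second)
	}
}
```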
Historical data test
Setup
Operations
Query a single store against the chosen provider (do we really need thanos-query here on top? we could use just the gRPC API; a rough sketch of such a direct gRPC query is included after this list). Query old data that is compacted/not-compacted/downsampled/not-downsampled for different time ranges.
What to measure, what is the goal?
Measure query latency and Mem/CPU consumption.
What components/features it tests
Historical data fetch, thanos store gateway.
Maybe some compactor tests as well?
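If we do query the store directly over gRPC, a Series request could look roughly like the sketch below. This assumes the generated storepb client from the Thanos repository; the import path, default port, and exact field names vary between Thanos versions, so treat it as an illustration of the call shape rather than copy-paste code:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"log"
	"time"

	"google.golang.org/grpc"

	// Package path as of the repo linked above; newer versions live under
	// github.com/thanos-io/thanos and field names may differ.
	"github.com/improbable-eng/thanos/pkg/store/storepb"
)

func main() {
	// Placeholder store gateway gRPC address.
	conn, err := grpc.Dial("localhost:10901", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	client := storepb.NewStoreClient(conn)

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	now := time.Now()
	req := &storepb.SeriesRequest{
		MinTime: now.Add(-365*24*time.Hour).Unix() * 1000, // milliseconds
		MaxTime: now.Unix() * 1000,
		Matchers: []storepb.LabelMatcher{
			// Hypothetical metric name; adjust to the generated data.
			{Type: storepb.LabelMatcher_EQ, Name: "__name__", Value: "test_metric"},
		},
	}

	start := time.Now()
	stream, err := client.Series(ctx, req)
	if err != nil {
		log.Fatal(err)
	}
	series, chunks := 0, 0
	for {
		resp, err := stream.Recv()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		if s := resp.GetSeries(); s != nil {
			series++
			chunks += len(s.Chunks)
		}
	}
	fmt.Printf("received %d series / %d chunks in %v\n", series, chunks, time.Since(start))
}
```

Comparing this timing against the same selector issued through thanos-query's HTTP API would show how much overhead the query layer itself adds.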
Also, are there any useful tools for benchmarks? I can see:
https://github.com/prometheus/prombench
Prometheus 2 benchmark results: https://coreos.com/blog/prometheus-2.0-storage-layer-optimization