
downsampling: Inconsistent query results on store data #376

Closed
felipejfc opened this issue Jun 14, 2018 · 8 comments

@felipejfc

Hi, I have a Prometheus setup with 5-day retention, and I'm seeing strange query results for intervals outside this range; see below.

Query for k8s CPU usage over the last 14 days:
[screenshot: Grafana graph of per-pod CPU usage, data present for the full 14-day range]

Everything is ok!

Now the same query over the last 15 days:
[screenshot: the same graph, now with previously present data missing]

That's not OK: data that was present in the previous query is missing from this one. What could be causing this?

Thanos store log for the 14-day query:

level=debug ts=2018-06-14T17:55:40.02736218Z caller=bucket.go:672 msg="Blocks source resolutions" blocks=23 mint=1527789280000 maxt=1528998940000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-0\"}" spans="Range: 1527724800000-1528992000000 Resolution: 0"
level=debug ts=2018-06-14T17:55:40.027477391Z caller=bucket.go:672 msg="Blocks source resolutions" blocks=23 mint=1527789280000 maxt=1528998940000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-1\"}" spans="Range: 1527724800000-1528992000000 Resolution: 0"
level=debug ts=2018-06-14T17:55:40.027344258Z caller=bucket.go:672 msg="Blocks source resolutions" blocks=23 mint=1527789280000 maxt=1528998940000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-0\"}" spans="Range: 1527724800000-1528992000000 Resolution: 0"
level=debug ts=2018-06-14T17:55:40.027779679Z caller=bucket.go:672 msg="Blocks source resolutions" blocks=23 mint=1527789280000 maxt=1528998940000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-1\"}" spans="Range: 1527724800000-1528992000000 Resolution: 0"
level=debug ts=2018-06-14T17:55:40.054760716Z caller=bucket.go:672 msg="Blocks source resolutions" blocks=23 mint=1527789280000 maxt=1528998940000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-0\"}" spans="Range: 1527724800000-1528992000000 Resolution: 0"
level=debug ts=2018-06-14T17:55:40.054883118Z caller=bucket.go:672 msg="Blocks source resolutions" blocks=23 mint=1527789280000 maxt=1528998940000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-1\"}" spans="Range: 1527724800000-1528992000000 Resolution: 0"
level=debug ts=2018-06-14T17:55:40.07814406Z caller=bucket.go:672 msg="Blocks source resolutions" blocks=23 mint=1527789040000 maxt=1528998940000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-0\"}" spans="Range: 1527724800000-1528992000000 Resolution: 0"
level=debug ts=2018-06-14T17:55:40.078350323Z caller=bucket.go:672 msg="Blocks source resolutions" blocks=23 mint=1527789040000 maxt=1528998940000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-1\"}" spans="Range: 1527724800000-1528992000000 Resolution: 0"
level=debug ts=2018-06-14T17:55:41.950475808Z caller=bucket.go:800 msg="series query processed" stats="&{blocksQueried:46 postingsTouched:552 postingsTouchedSizeSum:4070632 postingsFetched:0 postingsFetchedSizeSum:0 postingsFetchCount:0 postingsFetchDurationSum:0 seriesTouched:1258 seriesTouchedSizeSum:253447 seriesFetched:0 seriesFetchedSizeSum:0 seriesFetchCount:0 seriesFetchDurationSum:0 chunksTouched:17866 chunksTouchedSizeSum:1861778 chunksFetched:17866 chunksFetchedSizeSum:51450095 chunksFetchCount:184 chunksFetchDurationSum:149802503703 getAllDuration:1851983101 mergedSeriesCount:70 mergedChunksCount:17866 mergeDuration:19861396}"
level=debug ts=2018-06-14T17:55:41.954716263Z caller=bucket.go:800 msg="series query processed" stats="&{blocksQueried:46 postingsTouched:598 postingsTouchedSizeSum:9427568 postingsFetched:0 postingsFetchedSizeSum:0 postingsFetchCount:0 postingsFetchDurationSum:0 seriesTouched:464 seriesTouchedSizeSum:97787 seriesFetched:0 seriesFetchedSizeSum:0 seriesFetchCount:0 seriesFetchDurationSum:0 chunksTouched:6612 chunksTouchedSizeSum:2865925 chunksFetched:6612 chunksFetchedSizeSum:41982198 chunksFetchCount:298 chunksFetchDurationSum:186802588703 getAllDuration:1914146084 mergedSeriesCount:24 mergedChunksCount:6612 mergeDuration:12971602}"
level=debug ts=2018-06-14T17:55:41.989527874Z caller=bucket.go:800 msg="series query processed" stats="&{blocksQueried:46 postingsTouched:598 postingsTouchedSizeSum:9427568 postingsFetched:0 postingsFetchedSizeSum:0 postingsFetchCount:0 postingsFetchDurationSum:0 seriesTouched:464 seriesTouchedSizeSum:97787 seriesFetched:0 seriesFetchedSizeSum:0 seriesFetchCount:0 seriesFetchDurationSum:0 chunksTouched:6612 chunksTouchedSizeSum:2796196 chunksFetched:6612 chunksFetchedSizeSum:42536229 chunksFetchCount:297 chunksFetchDurationSum:208190432230 getAllDuration:1927619220 mergedSeriesCount:24 mergedChunksCount:6612 mergeDuration:6426206}"
level=debug ts=2018-06-14T17:55:42.947116671Z caller=bucket.go:800 msg="series query processed" stats="&{blocksQueried:46 postingsTouched:552 postingsTouchedSizeSum:17949124 postingsFetched:0 postingsFetchedSizeSum:0 postingsFetchCount:0 postingsFetchDurationSum:0 seriesTouched:9668 seriesTouchedSizeSum:2004888 seriesFetched:0 seriesFetchedSizeSum:0 seriesFetchCount:0 seriesFetchDurationSum:0 chunksTouched:136680 chunksTouchedSizeSum:59734295 chunksFetched:136680 chunksFetchedSizeSum:264297028 chunksFetchCount:689 chunksFetchDurationSum:608738627953 getAllDuration:2593597046 mergedSeriesCount:536 mergedChunksCount:136680 mergeDuration:325581043}"

Thanos store log for the 15-day query:

level=debug ts=2018-06-14T17:56:12.429287084Z caller=bucket.go:672 msg="Blocks source resolutions" blocks=24 mint=1527702673000 maxt=1528998973000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-0\"}" spans="Range: 1526997600000-1528992000000 Resolution: 0"
level=debug ts=2018-06-14T17:56:12.429393076Z caller=bucket.go:672 msg="Blocks source resolutions" blocks=24 mint=1527702673000 maxt=1528998973000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-1\"}" spans="Range: 1526997600000-1528992000000 Resolution: 0"
level=debug ts=2018-06-14T17:56:12.460069565Z caller=bucket.go:672 msg="Blocks source resolutions" blocks=24 mint=1527702913000 maxt=1528998973000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-0\"}" spans="Range: 1526997600000-1528761600000 Resolution: 300000\nRange: 1528761600000-1528992000000 Resolution: 0"
level=debug ts=2018-06-14T17:56:12.460256071Z caller=bucket.go:672 msg="Blocks source resolutions" blocks=24 mint=1527702913000 maxt=1528998973000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-1\"}" spans="Range: 1526997600000-1528761600000 Resolution: 300000\nRange: 1528761600000-1528992000000 Resolution: 0"
level=debug ts=2018-06-14T17:56:12.466264658Z caller=bucket.go:672 msg="Blocks source resolutions" blocks=24 mint=1527702913000 maxt=1528998973000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-0\"}" spans="Range: 1526997600000-1528761600000 Resolution: 300000\nRange: 1528761600000-1528992000000 Resolution: 0"
level=debug ts=2018-06-14T17:56:12.466457773Z caller=bucket.go:672 msg="Blocks source resolutions" blocks=24 mint=1527702913000 maxt=1528998973000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-1\"}" spans="Range: 1526997600000-1528761600000 Resolution: 300000\nRange: 1528761600000-1528992000000 Resolution: 0"
level=debug ts=2018-06-14T17:56:12.466554802Z caller=bucket.go:672 msg="Blocks source resolutions" blocks=24 mint=1527702913000 maxt=1528998973000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-0\"}" spans="Range: 1526997600000-1528761600000 Resolution: 300000\nRange: 1528761600000-1528992000000 Resolution: 0"
level=debug ts=2018-06-14T17:56:12.466676523Z caller=bucket.go:672 msg="Blocks source resolutions" blocks=24 mint=1527702913000 maxt=1528998973000 lset="{prometheus=\"monitoring/k8s\",prometheus_replica=\"prometheus-k8s-1\"}" spans="Range: 1526997600000-1528761600000 Resolution: 300000\nRange: 1528761600000-1528992000000 Resolution: 0"
level=debug ts=2018-06-14T17:56:14.115951935Z caller=bucket.go:800 msg="series query processed" stats="&{blocksQueried:48 postingsTouched:574 postingsTouchedSizeSum:5867160 postingsFetched:0 postingsFetchedSizeSum:0 postingsFetchCount:0 postingsFetchDurationSum:0 seriesTouched:1504 seriesTouchedSizeSum:346151 seriesFetched:0 seriesFetchedSizeSum:0 seriesFetchCount:0 seriesFetchDurationSum:0 chunksTouched:19066 chunksTouchedSizeSum:1986770 chunksFetched:19066 chunksFetchedSizeSum:53108993 chunksFetchCount:210 chunksFetchDurationSum:67354104479 getAllDuration:1654984398 mergedSeriesCount:70 mergedChunksCount:19066 mergeDuration:31396167}"
level=debug ts=2018-06-14T17:56:14.195766079Z caller=bucket.go:800 msg="series query processed" stats="&{blocksQueried:48 postingsTouched:622 postingsTouchedSizeSum:13175876 postingsFetched:0 postingsFetchedSizeSum:0 postingsFetchCount:0 postingsFetchDurationSum:0 seriesTouched:537 seriesTouchedSizeSum:60859 seriesFetched:0 seriesFetchedSizeSum:0 seriesFetchCount:0 seriesFetchDurationSum:0 chunksTouched:1896 chunksTouchedSizeSum:1734401 chunksFetched:1896 chunksFetchedSizeSum:40088789 chunksFetchCount:314 chunksFetchDurationSum:105507023747 getAllDuration:1725388093 mergedSeriesCount:24 mergedChunksCount:1896 mergeDuration:3540445}"
level=debug ts=2018-06-14T17:56:14.223252075Z caller=bucket.go:800 msg="series query processed" stats="&{blocksQueried:48 postingsTouched:622 postingsTouchedSizeSum:13175876 postingsFetched:0 postingsFetchedSizeSum:0 postingsFetchCount:0 postingsFetchDurationSum:0 seriesTouched:537 seriesTouchedSizeSum:60859 seriesFetched:0 seriesFetchedSizeSum:0 seriesFetchCount:0 seriesFetchDurationSum:0 chunksTouched:1896 chunksTouchedSizeSum:1691652 chunksFetched:1896 chunksFetchedSizeSum:40628795 chunksFetchCount:313 chunksFetchDurationSum:116399504980 getAllDuration:1748273088 mergedSeriesCount:24 mergedChunksCount:1896 mergeDuration:8378818}"
level=debug ts=2018-06-14T17:56:14.755423499Z caller=bucket.go:800 msg="series query processed" stats="&{blocksQueried:48 postingsTouched:574 postingsTouchedSizeSum:24836496 postingsFetched:0 postingsFetchedSizeSum:0 postingsFetchCount:0 postingsFetchDurationSum:0 seriesTouched:11551 seriesTouchedSizeSum:1296203 seriesFetched:0 seriesFetchedSizeSum:0 seriesFetchCount:0 seriesFetchDurationSum:0 chunksTouched:39372 chunksTouchedSizeSum:38324179 chunksFetched:39372 chunksFetchedSizeSum:358715428 chunksFetchCount:422 chunksFetchDurationSum:269549509562 getAllDuration:2144709645 mergedSeriesCount:536 mergedChunksCount:39372 mergeDuration:150213602}"

Are there any easy ways to debug this?

thanks!

@felipejfc felipejfc changed the title Inconsistency query results on store data Inconsistent query results on store data Jun 14, 2018
@bwplotka
Member

You already narrowed it down, nice! ((: From the logs, it looks like for 14d the store is using RAW data, and for 15d it is using downsampled data (Resolution: 300000), at least for some portion of the range. Downsampling is experimental, and it seems Thanos somehow chose to use the 5m-resolution block (300000) and... either the block does not exist or it was wrongly downsampled. Sounds like a valid downsampling bug.

How to debug it?

  • find the downsampled blocks that are actually used and investigate what's in them ):
  • use the Thanos UI instead of Grafana and control the downsampling resolution using the button.

Thanks for finding this!

NOTE: Downsampled data needs to be used carefully (you need _over_time functions to get correct results). More documentation about using downsampled data is in progress: #310
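
A minimal sketch of the pattern that note refers to (the metric and the window here are assumptions for illustration, not taken from this thread): 5m-downsampled blocks carry pre-aggregated samples, so a range selector narrower than 5m can end up with no samples inside it; wrapping the selector in an _over_time function whose window is comfortably wider than the resolution avoids that, for example:

max(max_over_time(container_memory_working_set_bytes{namespace="monitoring"}[15m])) by (pod_name)

With a 15m window, each evaluation step still covers at least one downsampled sample when 5m-resolution blocks are selected.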

@bwplotka bwplotka added the bug label Jun 15, 2018
@bwplotka bwplotka changed the title Inconsistent query results on store data downsampling: Inconsistent query results on store data Jun 15, 2018
@felipejfc
Author

@Bplotka you're right: using the thanos-query UI and forcing raw-only data, the queries work as expected.

My queries don't use _over_time; that must be why they aren't working. I haven't found a way to combine irate and sum_over_time for plotting CPU usage...

Maybe there's no bug then? Is it possible to make Thanos stop downsampling results for the queries made by Grafana?

thanks!

@bwplotka
Member

Not really, but it sounds like we could add a flag for it. TBH, though, using downsampling makes sense, because you cannot even plot all the raw samples anyway.

I will add a flag for choosing the default resolution mode for queries until we have a nice downsampling tutorial with examples.
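
A rough sanity check of that point (the 15s scrape interval is an assumption, not stated in the thread): a 15-day range scraped every 15s yields 15 * 24 * 3600 / 15 = 86,400 raw samples per series, far more points than a dashboard panel can draw, so such queries are evaluated at a coarse step anyway and 5m-resolution data loses little visually.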

@bwplotka
Member

But you should still see something, not empty results ):

@felipejfc
Author

felipejfc commented Jun 19, 2018 via email

@bwplotka
Member

mhmhm, maybe - depends on your query. (:

@felipejfc
Author

This is the query:

sum(irate(container_cpu_usage_seconds_total{namespace="$namespace",pod_name=~"$pod"}[1m])) by (pod_name)
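
That query pattern would explain the empty panels (an inference, not confirmed above): irate needs at least two samples inside the [1m] range selector, and 5m-downsampled blocks cannot provide two samples within one minute, so the expression returns nothing over the downsampled portion of the range. A hedged sketch of an alternative that still returns data over 5m-resolution blocks is to use rate with a window wider than the downsampled resolution, e.g.:

sum(rate(container_cpu_usage_seconds_total{namespace="$namespace",pod_name=~"$pod"}[10m])) by (pod_name)

The wider window trades some responsiveness for a series that stays populated when downsampled blocks are picked.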

bwplotka added a commit that referenced this issue Jul 4, 2018
…rimental phase.

(Details #376)

Signed-off-by: Bartek Plotka <[email protected]>
bwplotka added a commit that referenced this issue Jul 5, 2018
…rimental phase.

(Details #376)

Signed-off-by: Bartek Plotka <[email protected]>
bwplotka added a commit that referenced this issue Jul 5, 2018
(Details #376)

Signed-off-by: Bartek Plotka <[email protected]>
bwplotka added a commit that referenced this issue Jul 5, 2018
(Details #376)

Signed-off-by: Bartek Plotka <[email protected]>
bwplotka added a commit that referenced this issue Jul 5, 2018
* query: Disable downsampling by default.

(Details #376)

Signed-off-by: Bartek Plotka <[email protected]>

* docs: Updated component flags. (#411)

Signed-off-by: Bartek Plotka <[email protected]>
@bwplotka
Member

bwplotka commented Mar 7, 2019

Looks like an old dup of #396, which was fixed. (:

@bwplotka bwplotka closed this as completed Mar 7, 2019
fpetkovski added a commit to fpetkovski/thanos that referenced this issue Oct 17, 2024