
ruler: Added support for strict rule groups that do not allow partial_response #970

Merged 1 commit into master from ruler-partial-resp on Apr 2, 2019

Conversation

@bwplotka (Member) commented Mar 25, 2019

Changes:

  • Users might want an alert to fail if a partial response is returned by Query. We do that by adding a custom field to the RuleGroup YAML.
  • By default, partial response is enabled (although IMO it does not make that much sense).

CC @povilasv

CC @mjd95 @improbable-ludwik

Signed-off-by: Bartek Plotka [email protected]
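For readers skimming this later, here is a minimal sketch of what the extended rule group file and its Go mapping could look like. It assumes the field ends up named partial_response_strategy with warn/abort values, as settled further down this thread; the struct layout is illustrative, while the real code wraps Prometheus' rulefmt types.

```go
// Illustrative sketch only: a Thanos rule group file with the extra
// partial_response_strategy field, and a minimal Go struct to parse it.
package main

import (
	"fmt"

	yaml "gopkg.in/yaml.v2"
)

type RuleGroup struct {
	Name                    string          `yaml:"name"`
	Interval                string          `yaml:"interval,omitempty"`
	PartialResponseStrategy string          `yaml:"partial_response_strategy,omitempty"` // "warn" or "abort"
	Rules                   []yaml.MapSlice `yaml:"rules"`
}

type RuleGroups struct {
	Groups []RuleGroup `yaml:"groups"`
}

const exampleRuleFile = `
groups:
- name: thanos-example
  partial_response_strategy: "abort"
  rules:
  - record: job:up:sum
    expr: sum(up) by (job)
`

func main() {
	var rg RuleGroups
	if err := yaml.Unmarshal([]byte(exampleRuleFile), &rg); err != nil {
		panic(err)
	}
	fmt.Println(rg.Groups[0].Name, rg.Groups[0].PartialResponseStrategy)
}
```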

})
}

var refOpts *rules.ManagerOptions
bwplotka (Member Author):

this refOpts part will be improved. Essentially I will use prometheus.WrapRegistererWithPrefix or label.

@povilasv (Member) left a comment:

Really nice idea! Everything LGTM except for a few minor things.

cmd/thanos/rule.go (outdated, resolved)

strictMgr = rules.NewManager(&opts)
g.Add(func() error {
strictMgr.Run()
Member:

ditto

@bwplotka (Member Author):

@povilasv Before merging, I plan to change partial_response from a bool flag to partial_response_strategy with warning, fail or something like that instead.

@FUSAKLA (Member) left a comment:

Great, this is a really needed feature. Thanks for doing this! 🎉

Generally OK from me, just some minor comments/questions.

///
/// This is especially useful for any rule/alert evaluations on top of StoreAPI which usually does not tolerate partial
/// errors.
ABORT = 1;
Member:

Are there any other strategies planned? Just curious.

bwplotka (Member Author):

Not yet, but it leaves space for other options at some point - extensibility. And it is more verbose than just DISABLED (:
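As a rough illustration of the extensibility point (only the WARN/ABORT values come from the proto above; the handling shown here is assumed):

```go
// Sketch: branching on the strategy enum; a new strategy value would only
// need another case here, which is the extensibility argued for above.
package thanosrule

import "github.com/improbable-eng/thanos/pkg/store/storepb"

func describeStrategy(s storepb.PartialResponseStrategy) string {
	switch s {
	case storepb.PartialResponseStrategy_WARN:
		return "partial responses are accepted and surfaced as warnings"
	case storepb.PartialResponseStrategy_ABORT:
		return "any partial response fails the whole query"
	default:
		return "unknown strategy" // room for future strategies
	}
}
```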

pkg/rule/rule.go (resolved thread)
pkg/rule/rule.go (outdated):
return nil
}

// Update updates files into two different rule managers strict and non strict based on special field in RuleGroup file.
Member:

nit: actually they don't have to be just two since it's done generally for the strategy type enum

bwplotka (Member Author):

obsolete comment, good point, thx
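To make the exchange above concrete, here is a minimal sketch of routing rule files to per-strategy managers; the updater interface, field names and the strategyOfFile helper are illustrative assumptions, not the PR's exact code.

```go
// Hypothetical sketch of pkg/rule: one rule manager per partial response
// strategy, with Update distributing files to whichever manager matches the
// strategy declared in each file.
package thanosrule

import (
	"fmt"
	"time"

	"github.com/improbable-eng/thanos/pkg/store/storepb"
)

// updater abstracts a Prometheus rule manager; in the real code this is a
// wrapper around *rules.Manager from prometheus/prometheus.
type updater interface {
	Update(evalInterval time.Duration, files []string) error
}

// Managers holds one manager per strategy (e.g. WARN and ABORT).
type Managers map[storepb.PartialResponseStrategy]updater

// strategyOfFile is a hypothetical helper that would read the
// partial_response_strategy field out of a rule file.
func strategyOfFile(path string) (storepb.PartialResponseStrategy, error) {
	// Parsing elided in this sketch.
	return storepb.PartialResponseStrategy_WARN, nil
}

// Update groups the files by strategy and forwards each bucket to its manager,
// so it naturally handles however many strategies the enum defines.
func (m Managers) Update(evalInterval time.Duration, files []string) error {
	byStrategy := map[storepb.PartialResponseStrategy][]string{}
	for _, f := range files {
		s, err := strategyOfFile(f)
		if err != nil {
			return err
		}
		byStrategy[s] = append(byStrategy[s], f)
	}
	for s, fs := range byStrategy {
		mgr, ok := m[s]
		if !ok {
			return fmt.Errorf("no rule manager for strategy %s", s.String())
		}
		if err := mgr.Update(evalInterval, fs); err != nil {
			return err
		}
	}
	return nil
}
```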

ctx = tracing.ContextWithTracer(ctx, tracer)

opts := opts
opts.Registerer = extprom.WrapRegistererWith(prometheus.Labels{"type": "non-strict"}, reg)
Member:

Maybe it would be more consistent to have the labels match the strategies, so {"type": "warn"}?

bwplotka (Member Author):

Very good point, forgot to update this, awesome
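For reference, a small sketch of the per-strategy registerer wrapping being discussed; the label name strategy and the use of client_golang's prometheus.WrapRegistererWith (rather than Thanos' extprom helper shown in the diff) are assumptions for illustration.

```go
// Sketch: one wrapped registerer per strategy so both rule managers can
// register the same metric names, distinguished by a label whose values match
// the strategy names, as suggested above.
package thanosrule

import (
	"strings"

	"github.com/improbable-eng/thanos/pkg/store/storepb"
	"github.com/prometheus/client_golang/prometheus"
)

func registerersByStrategy(reg prometheus.Registerer) map[storepb.PartialResponseStrategy]prometheus.Registerer {
	out := map[storepb.PartialResponseStrategy]prometheus.Registerer{}
	for _, s := range []storepb.PartialResponseStrategy{
		storepb.PartialResponseStrategy_WARN,
		storepb.PartialResponseStrategy_ABORT,
	} {
		out[s] = prometheus.WrapRegistererWith(
			prometheus.Labels{"strategy": strings.ToLower(s.String())},
			reg,
		)
	}
	return out
}
```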

cmd/thanos/rule.go (outdated, resolved)

The data of each rule node can be labeled to satisfy the cluster's labeling scheme. High-availability pairs can be run in parallel and should be distinguished by the designated replica label, just like regular Prometheus servers.
You can think of Rule as a simplified Prometheus that does not require a sidecar and does not collect and do PromQL evaluation (no QueryAPI).
Member:

Might be more explicit

Suggested change
You can think of Rule as a simplified Prometheus that does not require a sidecar and does not collect and do PromQL evaluation (no QueryAPI).
You can think of Rule as a simplified Prometheus that does not require a sidecar and does not scrape and do PromQL evaluation (no QueryAPI).

unavailability is the key.

Note that Ruler will never trigger an alert on its own, so *no alert will be triggered if the query fails* without extra (mandatory)
meta-monitoring. This is exactly the same as in the Alertmanager project: even though there might be an internal error, Alertmanager will only produce a metric, will
Member:

I don't understand what you are trying to say with this sentence, but maybe it's just me.

bwplotka (Member Author):

well maybe it's too far, you are right. removed

// TODO(bwplotka) TO BE EXTENDED.

* `thanos_alert_sender_alerts_dropped_total` to check if our connection to alertmanager works.
* `prometheus_rule_group_last_duration_seconds < prometheus_rule_group_interval_seconds`
Member:

I'd add prometheus_rule_evaluation_failures_total right away; that's a pretty essential alert as well.

bwplotka (Member Author):

need to work on this and ensure that ruler has all of this


In case of Ruler in HA you need to make sure you have following labelling setup:

* Labels that identifies the HA group ruler is in and replica label with different value for each ruler instance, e.g:
Member:

Suggested change
* Labels that identifies the HA group ruler is in and replica label with different value for each ruler instance, e.g:
* Labels that identifies the HA group ruler and replica label with different value for each ruler instance, e.g:

docs/components/rule.md (resolved thread)
@FUSAKLA (Member) commented Mar 27, 2019

And from me it's +1 for disabling partial response by default, since it is a Thanos feature and it could cause issues after migration from Prometheus for users that don't know about it.

@bwplotka (Member Author):

And from me it's +1 for disabling partial response by default, since it is a Thanos feature and it could cause issues after migration from Prometheus for users that don't know about it.

I would leave this decision for later AND with wider discussion - it's not that straightforward

@bwplotka bwplotka force-pushed the ruler-partial-resp branch 2 times, most recently from 7c31aba to affe7ed Compare March 27, 2019 23:47
@bwplotka (Member Author):

@FUSAKLA @povilasv ready for the second review iteration.

I added some docs + I made the partial response strategy ABORT (so disabled) by default for Thanos Ruler. I think that makes more sense. Whether we want WARN or ABORT on the Query API is another story.

@bwplotka bwplotka force-pushed the ruler-partial-resp branch 2 times, most recently from d88a0d0 to dad1aa9 Compare March 28, 2019 00:01
@bwplotka (Member Author):

Nice moment for ruler docs: https://twitter.com/t_colgate/status/1110874738866438149?s=19

@bwplotka (Member Author) commented Mar 28, 2019

@tcolgate if you want to review / give us feedback on whether that is clear (the docs part).

@povilasv (Member) left a comment:

Some minor nits, but I really like this. Good work on the docs 🥇

cmd/thanos/rule.go (3 resolved threads)

_**NOTE:** The rule component is experimental since it has conceptual tradeoffs that might not be favorable for most use cases. It is recommended to keep deploying rules to the relevant Prometheus servers._
_**NOTE:** It is recommended to keep deploying rules inside the relevant Prometheus servers locally. Use ruler only in specific cases. Read details [below](rule.md#Risk) why._
Member:

👍

to evaluate the query, that is not fast enough to fill the rule. This might indicate other problems like slow StoreAPIs or
a too complex query expression in the rule.

* `thanos_rule_evaluation_with_warnings_total`. If you choose to use Rules and Alerts with [partial response strategy](rule.md#PartialResponse)
Member:

Nice I will add these metrics to our grafana dashboard :)

bwplotka (Member Author):

Thanks!

see WARN log level. This might suggest that those evaluations return a partial response and might or might not be accurate.

Those metrics are important for vanilla Prometheus as well, but even more important when we rely on the (sometimes WAN) network.

Member:

Maybe here we should also mention the new store.response-timeout, i.e. if you have a slow Store API it will be timed out and you will see a partial response, or log errors in the abort case?
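For what it's worth, a tiny sketch of how the thanos_rule_evaluation_with_warnings_total metric from the docs excerpt above could be wired; only the metric name comes from the excerpt, the surrounding plumbing is assumed.

```go
// Sketch: count rule evaluations whose underlying query came back with
// warnings, i.e. partial responses under the "warn" strategy.
package thanosrule

import "github.com/prometheus/client_golang/prometheus"

var evalWarnings = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "thanos_rule_evaluation_with_warnings_total",
	Help: "Total number of rule evaluations that returned warnings (partial response).",
})

func init() { prometheus.MustRegister(evalWarnings) }

// markEvaluation bumps the counter when the query behind a rule evaluation
// returned warnings; the caller and warnings source are hypothetical here.
func markEvaluation(warnings []string) {
	if len(warnings) > 0 {
		evalWarnings.Inc()
	}
}
```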

// QueryInstant performs instant query and returns results in model.Vector type.
func QueryInstant(ctx context.Context, logger log.Logger, base *url.URL, query string, t time.Time, dedup bool) (model.Vector, error) {
func QueryInstant(ctx context.Context, logger log.Logger, base *url.URL, query string, t time.Time, extra ThanosQueryParams) (model.Vector, []string, error) {
Member:

maybe t time.Time and query should go into ThanosQueryParams ?

bwplotka (Member Author):

Disagree, as this function does not make any sense without those -> I think it makes sense to separate out the custom "Thanos" params. What do you think?

Member:

Maybe ThanosQueryParams should be QueryOptions and then it's all good?
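A quick sketch of the options struct being discussed, keeping the query string and timestamp as positional arguments as argued above; the name QueryOptions, its fields and the URL parameter names are illustrative, not the final API.

```go
// Sketch: a small options bag for the Thanos-specific knobs of an instant
// query, separate from the query string and evaluation time.
package promclient

import (
	"net/url"
	"strconv"
	"strings"

	"github.com/improbable-eng/thanos/pkg/store/storepb"
)

// QueryOptions carries parameters that vanilla Prometheus' query API lacks.
type QueryOptions struct {
	Deduplicate             bool
	PartialResponseStrategy storepb.PartialResponseStrategy
}

// addTo encodes the options as URL query parameters; the parameter names here
// are placeholders, not necessarily what the Query API accepts.
func (o QueryOptions) addTo(v url.Values) {
	v.Set("dedup", strconv.FormatBool(o.Deduplicate))
	v.Set("partial_response_strategy", strings.ToLower(o.PartialResponseStrategy.String()))
}
```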

pkg/promclient/promclient.go (outdated, resolved)
@@ -5,32 +5,36 @@ import (
"net/http"
"time"

thanosrule "github.com/improbable-eng/thanos/pkg/rule"

Member:

goimports please :)

pkg/ui/rule.go (outdated, resolved)
@tcolgate (Contributor) commented Mar 28, 2019

Hi All,
I realise twitter is probably the worst place to discuss this :) Please bear with me, I'll give an outline of our setup.
We have a fairly large prom install (> 1M active time series on the largest instance, several hundred rules, several dozen alerts). There are two sets (staging/prod) of 4 prom installs (multiple cloud installs), and we are attempting to use thanos to federate queries across them (that is working well).
We have all that lot doubled up, one classic prom, and one thanos+prom, and we are attempting to migrate to the latter setup.
I have about 60 developers that contribute to our prom rule base and we have a mechanism that allows them to publish rules for their own applications to the clusters they deploy to. Rules are evaluated on the prom instances in that cluster (this is all in kubernetes).
The rule deployment is setup to minimize interaction with the infra team. We wanted to avoid the typical "PR into an infra repository" flow. Rules go with the code for the service, they are deployed when the app is deployed.
We want to move to thanos because the large prom instances aren't scaling well. Mixing query load with scraping a very dynamic set of workloads with a lot of time series means memory management is a real pain. Also, maintaining long-term storage in classic prom has been an unpleasant experience. Sharding classic prom is unsatisfactory; I have some hope that I can horizontally scale thanos, and dynamically scale for query load.

In relation to the ruler docs, the suggestion that the ruler should only be used for long-term rules, with other rules being left on the main prom servers, strikes me as poor advice, and one that would play very badly in our setup (and I think in setups with only a subset of our ridiculously contrived setup).
  • First off, and most importantly: deciding which rules require long-term query data is a non-trivial task. Rules and alerts often build on other rules and alerts. While well-named rules will usually convey the time window over which functions have been applied, rules are not always particularly well named. For a dev with enough PromQL to get by, requiring them to stipulate which rules go where based on criteria which don't come up all that often is going to be hit and miss at best.
  • Programmatically determining which rules go where is also going to be painful, requiring inspection of all the terms in the rule to determine if they are built on rules that in turn rely on a specific time frame.
  • Even if this is achieved, it means the load of rule evaluation is not shared across the queriers, and forces the prom server to keep more data live in memory.

It also feels like the advice is overly cautious.

  • If the ruler is deployed co-located with the store and prom instance that it is evaluating rules for, the query latency is unlikely to be significant (especially when compared to the evaluation interval and scrape intervals, of a typical prom install).
  • Removing the evaluation from the main prom instance should only help to increase the prom reliability further.
  • A single failed/poor evaluation in an interval will be unlikely to fire or clear down a well formed alert.
  • The advice casts the querier in a bad light, implying that it is somehow not trustworthy, or not performant. This is definitely not the case.

It feels like it might be better to suggest that most rules are safe to deploy to the ruler. If you have rules and alerts that are absolutely critical, and you know they fit with the historical retention of that instance, then those rules can/should be deployed directly to prometheus.

@bwplotka (Member Author) commented Mar 28, 2019

Thanks for your feedback, it is super useful @tcolgate! BTW, can you jump on our slack? (Link in the README.) It would be easier to discuss!

In relation to the ruler docs, the suggestion that the ruler should only be used for long-term rules, with other rules being left on the main prom servers, strikes me as poor advice, and one that would play very badly in our setup (and I think in setups with only a subset of our ridiculously contrived setup).

I disagree with this; the key here is just a good deployment model for splitting your rules into various pieces, which can be easily solved outside of Thanos (config management).

First off, and most importantly: deciding which rules require long-term query data is a non-trivial task. Rules and alerts often build on other rules and alerts. While well-named rules will usually convey the time window over which functions have been applied, rules are not always particularly well named. For a dev with enough PromQL to get by, requiring them to stipulate which rules go where based on criteria which don't come up all that often is going to be hit and miss at best.

So how many layers of nested rules do you have that it's hard to tell? I think doing more than 2 is an anti-pattern. This is because everything is tightly coupled and dependent on each other. And even if you have 100 nested rules on top of each other (100 layers), you HAVE to put them in the same rule group, otherwise you will get inconsistent results (groups can run concurrently, but everything within a single group is sequential). BTW, I think this is more a question for prometheus-dev, as the same recommendation applies to local Prometheus rules.

Anyway, if they are in the same group it is easy to tell what retention is needed. AND rules aggregating over more than 5m are not spotted that often, and more than 24h are super rare in my experience. However I cannot see your users' rules, so it might be that you have a different use case. Putting most rules locally is the RECOMMENDATION, not a blocker. However, from what I can see you really want to use Thanos Ruler MOSTLY to solve a lack of proper configuration management. If you are missing that part, speak to us on the #thanos-dev slack channel, we may have some nice ideas on that side as well. In the end, sure, you can use Thanos Ruler for that just fine - but you are introducing more risk, as written in those docs. Hence my advice, which I don't think is poor, as we learnt that the hard way after using Thanos in production for almost a year. (:

Programmatically determining which rules go where is also going to be painful, requiring inspection of all the terms in the rule to determine if they are built on rules that in turn rely on a specific time frame.

TBH it's not that bad as long as you use Golang and the existing libraries for parsing PromQL, but sure, it depends how much time you want to spend on it (:

Even if this is achieved, it means the load of rule evaluation is not shared across the queriers, and forces the prom server to keep more data live in memory.

I don't get that part, what do you mean by shared across the queriers?

If the ruler is deployed co-located with the store and prom instance that it is evaluating rules for, the query latency is unlikely to be significant (especially when compared to the evaluation interval and scrape intervals, of a typical prom install).

Agree 👍

Removing the evaluation from the main prom instance should only help to increase the prom reliability further.

It depends but I tend to disagree.

Evaluation in Prom has been there for 5+ years vs the 1-year-old Thanos project WITH distributed-system problems. I trust my code, but I still think the former is less risky (e.g. I don't trust the network). ^^

A single failed/poor evaluation in an interval will be unlikely to fire or clear down a well formed alert.

Agree, but what should be the flow for Ruler? Should it retry? Should it wait for another eval interval? (It does the latter.)

The advice casts the querier in a bad light, implying that it is somehow not trustworthy, or not performant. This is definitely not the case.

I can agree with this, but I would rather say it's the distributed model vs local Prometheus. Think about this as advice for typical users, who are sometimes not really familiar with high-scale problems yet. Again, I would treat it as good advice, but it is totally OK to ignore if you know what you are doing.

@tcolgate Can you help us rephrase this so it will sound better?

If you have rules and alerts that are absolutely critical, and you know they fit with the historical retention of that instance, then those rules can/should be deployed directly to prometheus.

Sure, I think the key part is to understand the risk which I described. Do you think the risk part is clear enough?

@tcolgate (Contributor):

I disagree with this; the key here is just a good deployment model for splitting your rules into various pieces, which can be easily solved outside of Thanos (config management).

The difficulty here is that my users are the ones that define the groups. So they have to understand the reasoning for the grouping. Rules which would need to be separated here are rare enough that such knowledge is likely to be lost within the lore of the wider user base.

So how many layers of nested rules do you have that it's hard to tell? I think doing more than 2 is an anti-pattern.

We have some pretty deep nestings; these are now properly split into groups. Even on prom1 we never had evals overrun, so whilst it took a while for deep rules to converge to the right value, concurrency tended not to be a problem.

Anyway, if they are in the same group it is easy to tell what retention is needed. AND rules aggregating over more than 5m are not spotted that often, and more than 24h are super rare in my experience.

We have quite a few that look back 1 day (usually users looking to track increases over a 24h period), and lots of >1h. We have a few SLO-related metrics that look back 30d! The difficulty is that we intentionally avoid "baby sitting" users writing queries and alerts, so, as far as is reasonable, stuff should Just Work.

Programmatically determining which rules go where is also going to be painful, requiring inspection of all the terms in the rule to determine if they are built on rules that in turn rely on a specific time frame.

TBH it's not that bad as long as you use Golang and the existing libraries for parsing PromQL, but sure, it depends how much time you want to spend on it (:

We already embed large chunks of prom to verify rules, so this has occurred to me, but it takes more effort than I wanted to get into right away (e.g. it only just occurred to me that offsets in queries will also need to be taken into account).

I don't get that part, what do you mean by shared across the queriers?

Unless I'm misunderstanding, ruler will spread the query load out over thanos queriers? That seems preferable to leaving it all up to the scraping prom instance.

Evaluation in Prom has been there for 5+ years vs the 1-year-old Thanos project WITH distributed-system problems. I trust my code, but I still think the former is less risky (e.g. I don't trust the network). ^^

Maybe I trust your code more than you do :) Failing that, I'm getting enough pain from prom instability to give it a shot (not that prom is unstable, but it is when running large memory-hungry instances in kube; as I said, I'm confident I can shard thanos).

Agree, but what should be the flow for Ruler? Should it retry? Should it wait for another eval interval? (It does the latter.)

Agree it's not ideal, but I'm happy with a failed eval, as long as I can know they are happening and can try to fix up the underlying issues.

@tcolgate Can you help us rephrase this so it will sound better?

I think my preference would be to remove the NOTE, and have a more verbose paragraph elaborating the tradeoffs. I'll have a think. I think overall my objection to the current phrasing is that for many use cases ruler (and even thanos) exists precisely to resolve a config management situation: the config management issue of running loads of standalone proms with separate config. (I'm already running 9; I'm using thanos to simplify things and make them more stable.)

If you have rules and alerts that are absolutely critical, and you know they fit with the historical retention of that instance, then those rules can/should be deployed directly to prometheus.

Sure, I think the key part is to understand the risk which I described. Do you think the risk part is clear enough?

Possibly

@bwplotka (Member Author) commented Mar 28, 2019

Unless I'm misunderstanding, ruler will spread the query load out over thanos queriers? That seems preferable to leaving it all up to the scraping prom instance.

Sure, but most of the data (the initial hours) is on Prom, so you would use Prom to fetch it anyway.

I think my preference would be to remove the NOTE, and have a more verbose paragraph elaborating the tradeoffs. I'll have a think. I think overall my objection to the current phrasing is that for many use cases ruler (and even thanos) exists precisely to resolve a config management situation: the config management issue of running loads of standalone proms with separate config. (I'm already running 9; I'm using thanos to simplify things and make them more stable.)

Let's think about this more. This PR is extremely huge - sorry for that, that is an anti-pattern - but it's rather the code I want to improve than the docs, so I would ship this and iterate on the live docs to improve them. I am not sure about the config management aspects. If we move the border and agree on that, we would end up getting (valid) requests to implement config distribution to all components, rotation, hot reloading, compression, signing XD everything. We need to put a border somewhere; maybe there is a place for another tool to fix config management. It might just be a side effect of Thanos that it helps to organize your config better. cc @GiedriusS @povilasv @domgreen on that.

@GiedriusS (Member) left a comment:

I don't use ruler myself, so it's a bit harder for me, but at first glance the code looks clean and the comments by @povilasv seem on point! 👍 I was most worried about backward compatibility, and the code currently maintains that. Will need to take a deeper look again before merging it.

@FUSAKLA (Member) commented Mar 28, 2019

I personally agree with @bwplotka on this.
It's not only about trusting the code but also trusting the infrastructure where it is running, the network etc.
It doesn't matter how safe it is, everything can go wrong eventually, and then the number of parts that can break matters. Having the alerts closest to the source of the data they evaluate is definitely best practice, and users should be discouraged from choosing the "easy path" of putting everything into the ruler.

But as mentioned above, I wouldn't block this PR on this discussion. Not saying we shouldn't discuss it further, but in a separate PR/thread/channel.

@FUSAKLA (Member) left a comment:

👍 I really like the feature and the work on the docs! Good job.

@bwplotka bwplotka force-pushed the ruler-partial-resp branch 2 times, most recently from ed6c039 to 42b6d6b Compare March 30, 2019 10:41
@bwplotka (Member Author) left a comment:

Thanks Team for awesome feedback!

CC @FUSAKLA @GiedriusS @povilasv addressed all comments. Added more info about store.response-timeout and about PartialResponseStrategy being a tradeoff between accuracy (consistency) and availability.

PTAL (:

}

// NOTE: For Rule default is abort as this is recommended for alerting.
p = storepb.PartialResponseStrategy_value[storepb.PartialResponseStrategy_ABORT.String()]
bwplotka (Member Author):

@GiedriusS in terms of backward compatibility, we break it here. By default you will have the abort strategy (it was "enabled"/"warning" only previously). Added an item to the CHANGELOG as well.
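A hedged sketch of the defaulting logic referenced in the snippet above, using the generated PartialResponseStrategy_value map; the helper name and error wording are illustrative.

```go
// Sketch: resolve a rule group's strategy string, defaulting to ABORT for the
// Ruler when the field is absent - the backward-compatibility change called
// out in this comment.
package thanosrule

import (
	"fmt"
	"strings"

	"github.com/improbable-eng/thanos/pkg/store/storepb"
)

func parseStrategy(s string) (storepb.PartialResponseStrategy, error) {
	if s == "" {
		// NOTE: for Rule the default is ABORT, as recommended for alerting.
		return storepb.PartialResponseStrategy_ABORT, nil
	}
	v, ok := storepb.PartialResponseStrategy_value[strings.ToUpper(s)]
	if !ok {
		return 0, fmt.Errorf("unknown partial response strategy %q", s)
	}
	return storepb.PartialResponseStrategy(v), nil
}
```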

@bwplotka bwplotka force-pushed the ruler-partial-resp branch 3 times, most recently from 609d78d to 2d67fa8 Compare April 1, 2019 23:45
pkg/promclient/promclient.go (outdated, resolved)
b := &bytes.Buffer{}
if resp.Header.Get("Content-Encoding") != "gzip" {
_, err = io.Copy(b, resp.Body)
if err != nil {
Member:

Don't we need to close resp.Body here?

bwplotka (Member Author):

funny fact, I copied it (:

bwplotka (Member Author):

from Prometheus

bwplotka (Member Author):

Line 433 is already deferring the close.

return err
}
_, err = io.Copy(b, gzipr)
_ = gzipr.Close()
Member:

Also don't we need to close resp.Body here?

bwplotka (Member Author):

same here, copied, but maybe badly, relooking

bwplotka (Member Author):

Line 433 is already deferring the close.
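For context on the close handling discussed in these two threads, a self-contained sketch where resp.Body is closed exactly once via defer (as the author notes the real code already does earlier in the function) and the gzip reader gets its own close; the helper name and structure are illustrative.

```go
// Sketch: read a possibly gzipped HTTP response body with a single deferred
// Body close.
package promclient

import (
	"bytes"
	"compress/gzip"
	"io"
	"net/http"
)

func readBody(resp *http.Response) ([]byte, error) {
	defer resp.Body.Close()

	b := &bytes.Buffer{}
	if resp.Header.Get("Content-Encoding") != "gzip" {
		if _, err := io.Copy(b, resp.Body); err != nil {
			return nil, err
		}
		return b.Bytes(), nil
	}

	gzipr, err := gzip.NewReader(resp.Body)
	if err != nil {
		return nil, err
	}
	defer gzipr.Close()

	if _, err := io.Copy(b, gzipr); err != nil {
		return nil, err
	}
	return b.Bytes(), nil
}
```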

pkg/rule/rule_test.go (outdated, resolved)
@bwplotka bwplotka requested a review from povilasv April 2, 2019 08:56
@bwplotka (Member Author) commented Apr 2, 2019

Addressed all @povilasv

@povilasv (Member) left a comment:

stamp

@bwplotka bwplotka merged commit 84cad51 into master Apr 2, 2019
@bwplotka bwplotka deleted the ruler-partial-resp branch April 2, 2019 10:09
@@ -227,13 +228,23 @@ func runRule(
Name: "thanos_rule_loaded_rules",
Help: "Loaded rules partitioned by file and group",
},
[]string{"file", "group"},
[]string{"part_resp_strategy", "file", "group"},
bwplotka (Member Author):

not consistent ):
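For illustration, a hedged sketch of the more consistent label set hinted at here, assuming the label is renamed to match the per-manager registerer labels (strategy) rather than part_resp_strategy:

```go
// Illustrative only: metric name and help text come from the diff above; the
// strategy label name is the assumed "consistent" alternative.
package thanosrule

import "github.com/prometheus/client_golang/prometheus"

var loadedRules = prometheus.NewGaugeVec(prometheus.GaugeOpts{
	Name: "thanos_rule_loaded_rules",
	Help: "Loaded rules partitioned by file and group",
}, []string{"strategy", "file", "group"})
```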

6 participants