Integration with Microprofile Metrics #257
Conversation
Force-pushed from 6d01ef3 to 0828150.

Rebased, conflicts resolved, squashed and signoffs added.
== Integration with Microprofile Metrics

When Microprofile Fault Tolerance and Microprofile Metrics are used together, metrics are automatically added for each of the methods annotated with a Fault Tolerance annotation.
`with a Fault Tolerance annotation` should be `with a Retry, Timeout, CircuitBreaker or Bulkhead annotation`
assertThat(m.getCircuitBreakerCallsSuccessDelta(), is(0L));
assertThat(m.getCircuitBreakerCallsFailureDelta(), is(2L));
assertThat(m.getCircuitBreakerCallsCircuitOpenDelta(), is(0L));
assertThat(m.getCircuitBreakerOpenedDelta(), is(1L));
I think the 3rd request will trigger the circuit to open. The circuit is only operated on when a request is made, though.
The automatically added metrics follow a consistent pattern which includes the fully qualified name of the annotated method.
In the tables below, the placeholder `<name>` should be replaced by the fully qualified method name with all `.` characters replaced with `_`.
=== Metrics added for `@Retry` |
I think a metric for the average number of retries per call would also be useful to monitor the stability of the call.
With the name `ft_<name>_retry_retries_average`.
Or the number of retries per call could be a histogram, with min, max and mean number of retries per call. In that case, it could be called `ft_<name>_retry_retries_per_call`.
You should be able to calculate the average retries per call from the other metrics since you can compute the total number of calls and we also expose the total number of retries.
Prometheus makes it fairly easy to do this sort of calculation and graph the results over time.
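As an illustration, a minimal sketch of that calculation against the MP Metrics API (the metric names follow the ones proposed in this thread and are not final; note that `MetricRegistry.counter(String)` creates the counter if none is registered yet, so this assumes the FT metrics already exist):

```java
import org.eclipse.microprofile.metrics.MetricRegistry;

public class RetryStats {

    // Average retries per call, derived from the proposed counters.
    // Total calls = succeeded without retry + succeeded after retry + failed.
    public static double averageRetriesPerCall(MetricRegistry registry, String name) {
        long retries = registry.counter("ft_" + name + "_retry_retries_total").getCount();
        long calls = registry.counter("ft_" + name + "_retry_calls_succeeded_not_retried_total").getCount()
                + registry.counter("ft_" + name + "_retry_calls_succeeded_retried_total").getCount()
                + registry.counter("ft_" + name + "_retry_calls_failed_total").getCount();
        return calls == 0 ? 0.0 : (double) retries / calls;
    }
}
```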
|===
| Name | Type | Unit | Description

|`ft_<name>_bulkhead_concurrent_executions`
Maybe rename to `ft_<name>_bulkhead_concurrent_executions_current` - suffix with `current` to indicate that it's an actual state and not a counter or total.
Decided not to do this.
| Counter | None
| The number of times the method was called and succeeded after retrying at least once

|`ft_<name>_retry_calls_failure_total`
To be in line with what was suggested for circuitbreaker, this should be renamed to `ft_<name>_retry_calls_failed_total`.
Should document that calls failed + calls succeeded immediately + calls succeeded with retry = total calls.
Yes, good point.
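To illustrate, a sketch of how a TCK check for that invariant might look; the getter names on the `MetricGetter` helper are hypothetical, and `assertThat`/`is` are the Hamcrest imports already used in the quoted test:

```java
// failed + succeeded immediately + succeeded with retry should equal total calls
private void assertRetryCallsAddUp(MetricGetter m, long expectedCalls) {
    long outcomes = m.getRetryCallsFailedTotal()
            + m.getRetryCallsSucceededNotRetriedTotal()
            + m.getRetryCallsSucceededRetriedTotal();
    assertThat("failed + succeeded immediately + succeeded with retry",
            outcomes, is(expectedCalls));
}
```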
=== Names

The automatically added metrics follow a consistent pattern which includes the fully qualified name of the annotated method.
A question - if multiple apps are deployed and some of them use the same class with the same FT annotations, would that mean there's only one combined metric for both annotated methods in both apps?

AFAIK, MP Metrics 1.0 doesn't allow exposing the same metric multiple times, once for each application. It's possible to add tags to a metric, but that wouldn't help if two metrics had the same name.

There's an issue open to add support for metrics coming from different applications: eclipse/microprofile-metrics#65.

We should at least specify that FT metrics should be exposed in the same way as application-defined metrics. Later, when the Metrics spec comes to a solution, FT metrics from different apps would be handled automatically.
Yes, this is a good point. I've already added a note to say that the metric names may change in response to enhancements and changes in the metrics spec.
It would also be good to call out that they'll appear in the same way as application-defined metrics.
As well as the problem of metrics from two applications clashing, we've also got a similar problem if a class has two methods with the same name and different arguments. That's on my list to bring up at tomorrow's meeting.
I've added this in the `Notes` section at the bottom of the file.
| Counter | None
| Number of calls prevented from running by an open circuit breaker

|`ft_<name>_circuitbreaker_accumulated_open`
I suggest renaming to `ft_<name>_circuitbreaker_open_time_total` so that "total" is at the end and "open" is closer to the beginning. Also, I think that "time total" is more descriptive than "accumulated".

Apply the same for half-open and closed.
`ft_<name>_circuitbreaker_time_open`: the Prometheus output will append `_seconds`.
|===
| Name | Type | Unit | Description

|`ft_<name>_circuitbreaker_calls_succeeded_total`
It should be documented that succeeded + failed + prevented = total number of calls.
Done
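For example, a check along these lines could go into the TCK, using the delta getters already present in the quoted test (`expectedCalls` being a value the test controls):

```java
// succeeded + failed + prevented should equal the number of calls made
private void assertCircuitBreakerCallsAddUp(MetricGetter m, long expectedCalls) {
    long total = m.getCircuitBreakerCallsSuccessDelta()
            + m.getCircuitBreakerCallsFailureDelta()
            + m.getCircuitBreakerCallsCircuitOpenDelta();
    assertThat("succeeded + failed + prevented", total, is(expectedCalls));
}
```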
| Histogram | Nanoseconds
| Histogram of execution times for the method

|`ft_<name>_timeout_calls_timed_out_total`
We should document that `timeout_calls_timed_out` + `timeout_calls_not_timed_out` = total number of calls.
This is not strictly true. If a method is annotated with `@Retry` and `@Timeout`, we may end up attempting to run the method multiple times. Each of those attempts can either time out or not, and that should be reflected in the metrics.
I'm not sure what the best way of wording this in the spec is. Maybe we should explicitly document that `timeout_calls_timed_out` + `timeout_calls_not_timed_out` will equal the number of timed executions, which may be greater than the number of calls if the method is annotated with `@Retry`.
Done
|===
| Name | Type | Unit | Description

|`ft_<name>_retry_calls_success_no_retry_total`
To be in line with what was suggested for circuitbreaker, this should be renamed to `ft_<name>_retry_calls_succeeded_not_retried_total`.
| Counter | None
| The number of times the method was called and succeeded without retrying

|`ft_<name>_retry_calls_success_retry_total`
To be in line with what was suggested for circuitbreaker, this should be renamed to `ft_<name>_retry_calls_succeeded_retried_total`.
I agree the metric names should be consistent and I'm fine with changing them all to the past tense, though I'll raise it at the meeting tomorrow before I go through and change them all.
Be careful with `_total`, as I wrote in a more general comment.
=== Configuring Metrics Integration

The integration with MicroProfile Metrics can be disabled by setting a config property named `MP_Fault_Tolerance_Metrics_Enabled` to the value `false`.
Since MP Config values can be changed at runtime, it should be specified what happens if the value changes. I guess any change during runtime wouldn't have any effect, but it's worth clarifying this.
Good point, I will add this.
Done
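For reference, a minimal sketch of the read-once behaviour described above, assuming MP Config (the class and field names are illustrative, not from the spec):

```java
import org.eclipse.microprofile.config.ConfigProvider;

public class MetricsIntegration {

    // Read once at startup; later changes to the config value have no effect.
    private static final boolean METRICS_ENABLED = ConfigProvider.getConfig()
            .getOptionalValue("MP_Fault_Tolerance_Metrics_Enabled", Boolean.class)
            .orElse(true);

    public static boolean metricsEnabled() {
        return METRICS_ENABLED;
    }
}
```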
public class BulkheadMetricBean {

    @Bulkhead(2)
    public void waitFor(long milliseconds) {
In tests, it's better to control the length of execution. Unnecessary sleeps make the tests unreliable and slow to run. So it's better to pass a future instead of a timeout, to have complete control over the timing, such as:

```java
public void waitFor(Future<?> condition) throws Exception {
    condition.get();
}
```

and call it with:

```java
CompletableFuture<Void> cf = new CompletableFuture<>();
waitFor(cf); // blocks until cf completes, so in the test this runs on another thread
// do something that should happen meanwhile
cf.complete(null);
```
Ah, that's a much better way of controlling the execution order. I'll update the test to use this technique.
 */
@Asynchronous
@Bulkhead(value = 2, waitingTaskQueue = 2)
public Future<Void> waitForAsync(long milliseconds) {
The same as above: better to use a future instead of a timeout.
assertThat("concurrent executions", m.getBulkheadConcurrentExecutions().get(), is(2L));

f1.get();
If `waitFor` is rewritten using a completable future, the future passed to `waitFor` should be completed before this point.
public void closeTheCircuit() throws Exception {
    // Assume the circuit is open
    // Wait for it to half close, then close it with passing work
    Thread.sleep(1500);
Could be improved to sleep only if the circuit is really open. The state can be retrieved from `MetricGetter`.
That would have been true, although we just removed that metric in the meeting this afternoon because it would be very easy to create a misleading graph unless it was sampled very rapidly.
We could possibly check the accumulated time metrics instead, as long as we were careful not to hang the test if those metrics were broken in the implementation under test.
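One way to check a metric without risking a hang is a bounded poll, e.g. this sketch (the metric accessor is passed in as a supplier, so any of the `MetricGetter` methods could be used):

```java
import java.time.Duration;
import java.util.function.LongSupplier;

// Polls a metric until it reaches the expected value, failing the test after
// a deadline instead of hanging if the implementation never updates it.
private static void waitForMetric(LongSupplier metric, long expected, Duration timeout)
        throws InterruptedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (metric.getAsLong() != expected) {
        if (System.nanoTime() - deadline > 0) {
            throw new AssertionError("Metric did not reach " + expected + " within " + timeout);
        }
        Thread.sleep(100);
    }
}
```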
Thanks for your comments @OndrejM, they were really helpful. I will work on the changes tomorrow.
General comment: all metrics that are a counter will have `_total` appended. Remove the `_total` here, as the metrics impl will add it for counters that are exposed via the Prometheus exporter.

Also @raymondlam just mentioned that in the internal data model we use dots to separate name components and not underscores. The Prometheus exporter will convert those to underscores (and a different exporter will do that differently again).
// Contributors:
// Andrew Rouse

== Integration with Microprofile Metrics
As we clarified on the hangout that all Counters are monotonic and we're going to remove the `_total` suffix, we need to state this also in the spec. The Metrics counters can be monotonic but can also be decremented, see `Counted`.

I suggest replacing "Counter" in every metric type with "Monotonic Counter".
mpMetrics doesn't currently add `_total` to the end of counters -- I'd suggest we indicate that counter names should end in `_total`. If mpMetrics adds `_total` to the end of counter names in Prometheus format, it should avoid doing so for metric names that already end in `_total` (as you've already commented in eclipse/microprofile-metrics#172).
There's already a note at the bottom that all the counters are monotonic. As we're now keeping the `_total` suffix, I think this is sufficient.

I'm reluctant to put "Monotonic Counter" in the type column of the table, as `Counter` is the interface in the Metrics API which must be used.
The only thing to add is that we should clarify that Counters are monotonic.
This issue suggests that adding `_total` is expected. There's another issue to add it, but it's unresolved: eclipse/microprofile-metrics#172.

So I would leave the `_total` suffix.
The `_total` suffix should be clarified.
I've pushed the latest changes to the spec but haven't updated the TCK yet. Things I know are outstanding:
I suggest merging the PR as-is now and continuing the discussion in issue #234, followed by another PR.
@@ -88,15 +107,15 @@ In the tables below, the placeholder `<name>` should be replaced by the fully qu
| Counter | None
| Number of calls prevented from running by an open circuit breaker

-|`ft_<name>_circuitbreaker_accumulated_open`
+|`ft_<name>_circuitbreaker_time_open`
Since we're going to leave the `_total` suffix for counters as Metrics doesn't add it, this should also have the `_total` suffix: `ft_<name>_circuitbreaker_time_open_total`.
| Gauge<Long> | Nanoseconds
| Amount of time the circuit breaker has spent in open state

-|`ft_<name>_circuitbreaker_accumulated_half_open`
+|`ft_<name>_circuitbreaker_time_half_open`
Add `_total` suffix.
| Gauge<Long> | Nanoseconds
| Amount of time the circuit breaker has spent in half-open state

-|`ft_<name>_circuitbreaker_accumulated_closed`
+|`ft_<name>_circuitbreaker_time_closed`
Add `_total` suffix.
The automatically added metrics follow a consistent pattern which includes the fully qualified name of the annotated method.
In the tables below, the placeholder `<name>` should be replaced by the fully qualified method name with all `.` characters replaced with `_`.

=== Metrics added for `@Retry`, `@Timeout`, `@CircuitBreaker` and `@Bulkhead`
You will run into naming collisions if you try to reuse the same metric names for different FT annotations. For example, if `@Retry` and `@Timeout` both declare a metric called `ft_<name>_invocations_total`, they will either fail at metric registration time (if reusable is set to false) or double count invocations (if reusable is set to true).
Understood. The intent here was to say that if any of these annotations are present on the method, then the following metrics are added. If two or more annotations are present, then the metrics should still only be added once.
I'll update the wording to clarify this.
That's good - if you're able to coordinate which metrics get created between FT annotations, that would address this.
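A sketch of one way that coordination could work: `MetricRegistry.counter(String)` has get-or-create semantics, so interceptors for different FT annotations on the same method can share a single counter, provided only one of them increments it per invocation (the helper and metric name here are illustrative):

```java
import org.eclipse.microprofile.metrics.Counter;
import org.eclipse.microprofile.metrics.MetricRegistry;

public class SharedFaultToleranceMetrics {

    // Get-or-create: the first FT interceptor to run registers the counter;
    // subsequent annotations receive the same instance instead of clashing.
    public static Counter invocations(MetricRegistry registry, String name) {
        return registry.counter("ft_" + name + "_invocations_total");
    }
}
```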
=== Names

The automatically added metrics follow a consistent pattern which includes the fully qualified name of the annotated method.
In the tables below, the placeholder `<name>` should be replaced by the fully qualified method name with all `.` characters replaced with `_`.
In the base metrics we use `.`s in the metric names. For the Prometheus format we automatically convert the `.` characters to `_`, but in the JSON format we leave the `.` characters as-is. mpMetrics is intended to be used by more than just Prometheus. The dot notation is preferred for Graphite, and Dropwizard Metrics natively uses dots between parts of a name. So, the net of that is we may want to specify here what the metric names should look like in the code (and JSON) as well as for Prometheus.
This spec should specify the values passed to the Metrics API. I'll update it to use `.` or camel case, as that appears to be the pattern.

Based on the predefined metrics, the pattern `ft.<name>.retry.callsSucceededNotRetried.total` looks the most appropriate.
Done
|`ft_<name>_timeout_execution_duration`
| Histogram | Nanoseconds
| Histogram of execution times for the method
Note that, in the Prometheus format, when a metric is in nanoseconds it will automatically be converted to seconds. You can still specify that it's in nanoseconds and record it in nanoseconds, but the Prometheus output will show seconds and the JSON format will show nanoseconds.
Yes, that's the plan. As method execution times can be very short and time deltas should be measured with `System.nanoTime()` anyway, nanoseconds seemed like the most appropriate unit to use.
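As a sketch of what recording in nanoseconds looks like with the MP Metrics 1.x API (the metric name is illustrative, and the registry would normally be CDI-injected):

```java
import org.eclipse.microprofile.metrics.Histogram;
import org.eclipse.microprofile.metrics.Metadata;
import org.eclipse.microprofile.metrics.MetricRegistry;
import org.eclipse.microprofile.metrics.MetricType;
import org.eclipse.microprofile.metrics.MetricUnits;

public class ExecutionTimer {

    // Records execution time in nanoseconds; the JSON output keeps nanoseconds
    // while the Prometheus exporter converts the values to seconds.
    public static void timeExecution(MetricRegistry registry, String name, Runnable task) {
        Histogram histogram = registry.histogram(new Metadata(
                "ft_" + name + "_timeout_execution_duration",
                MetricType.HISTOGRAM, MetricUnits.NANOSECONDS));
        long start = System.nanoTime();
        try {
            task.run();
        } finally {
            histogram.update(System.nanoTime() - start);
        }
    }
}
```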
|`ft_<name>_bulkhead_execution_duration`
| Histogram | Nanoseconds
| Histogram of executions times
executions -> execution

Also, it would be good to clarify whether this includes time in the queue or just service execution.
If you have a `@Timeout` and a `@Bulkhead` on the same method, you will end up with two histograms with basically the same contents. Histograms are expensive (they store the most recent 1000 samples to do computations), so it might be worth having one `ft_<name>_execution_duration` that is used for any FT annotation that requires execution timing.
|`ft_<name>_bulkhead_wait_time`*
| Histogram | Nanoseconds
| Histogram of the time executions spend waiting in the queue
Should this be `ft_<name>_bulkhead_wait_duration` to be consistent with `ft_<name>_bulkhead_execution_duration`?
Done
== Notes

Metrics added my this specification will appear as application metrics for the application which uses the Fault Tolerance annotations.
my -> by
Done
One of the attempts timed out.

`ft_com_example_MyClass_doWork_timeout_calls_not_timed_out_total = 2` +
Two of the attempts did not time out.
I really like this example. It helps tie everything together.
| Counter | None
| The number of times the method was called

|`ft_<name>_invocations_failed`
Should be `_invocations_failed_total`, as it's a monotonic counter metric.
Done
All looks good, I'm OK to merge. @pilhuhn, can you review and approve if OK?
I've fixed the last of the bugs I found in the TCK. I've rebased and squashed the commits and this is ready to merge. It still doesn't cover:

I'll add these in another pull request.
Unsquashed commits are here in case anyone wants to see what was recently changed: https://github.com/Azquelt/microprofile-fault-tolerance/tree/metrics-pre-squash. Sorry, should have left them unsquashed for inspection.
Just pushed one last change where I'd used
Emily asked me to do one more push with the TCKs. The unsquashed version is on the branch linked above.
| The total number of times the method was retried
|===

Note that the sum of `ft_<name>_retry_calls_succeeded_not_retried_total`, `ft_<name>_retry_calls_succeeded_retried_total` and `ft_<name>_retry_calls_failed_total` will give the total number of calls for which the Retry logic was run.
The `_` needs to be replaced by `.`.
Note that the sum of `ft.<name>.circuitbreaker.callsSucceeded.total`, `ft.<name>.circuitbreaker.callsFailed.total` and `ft.<name>.circuitbreaker.callsPrevented.total` will give the total number of calls for which the CircuitBreaker logic was run.

Similarly, the sum of `ft.<name>.circuitbreaker.timeOpen.total`, `ft.<name>.circuitbreaker.timeHalfOpen.total` and `ft.<name>.circuitbreaker.timeClosed.total` will give the total time that the circuit breaker has been active for.
I would say `ft.<name>.circuitbreaker.open.total` and `ft.<name>.circuitbreaker.halfOpen.total`.
Done
| Histogram | Nanoseconds
| Histogram of method execution times. This does not include any time spent waiting in the bulkhead queue.

|`ft.<name>.bulkhead.queuePopulation`*
`ft.<name>.bulkhead.queuePopulation` => `ft.<name>.bulkhead.waitingQueuePopulation`
Maybe separate waitingQueue and population: `ft.<name>.bulkhead.waitingQueue.population`
Done
| Gauge<Long> | None
| Number of executions currently waiting in the queue

|`ft.<name>.bulkhead.waitDuration`*
`ft.<name>.bulkhead.waitDuration` => `ft.<name>.bulkhead.atWaitingQueueDuration`
I suggest rather `ft.<name>.bulkhead.waitingQueue.duration` - it has the same prefix as `ft.<name>.bulkhead.waitingQueue.population`, so the two would appear next to each other.
In that case, would you want to change `ft.<name>.bulkhead.executionDuration` and `ft.<name>.timeout.executionDuration` as well?
Done (did not change the `executionDuration` metrics).
| Gauge<Long> | None
| Number of executions currently waiting in the queue

-|`ft.<name>.bulkhead.waitDuration`*
+|`ft.<name>.bulkhead.waitingQueue.duration`*
`ft.<name>.bulkhead.waitingQueue.duration` => `ft.<name>.bulkhead.waiting.duration`, as it describes the execution and is not measuring the waiting queue.
Done.
LGTM! Thanks @Azquelt!
Signed-off-by: Andrew Rouse <[email protected]>
Signed-off-by: Andrew Rouse <[email protected]>
Squashed and ready to merge.
@pilhuhn I believe your question was discussed and it was agreed to still add `_total`, as the Metrics spec does not mandate that `_total` be appended automatically.
For #234