
Integration with Microprofile Metrics #257

Merged: 2 commits, May 3, 2018

Conversation

Azquelt
Member

@Azquelt Azquelt commented Apr 24, 2018

For #234

@Azquelt Azquelt force-pushed the metrics branch 4 times, most recently from 6d01ef3 to 0828150 on April 25, 2018 10:45
@Azquelt
Member Author

Azquelt commented Apr 25, 2018

Rebased, conflict resolved, squashed and signoffs added.

== Integration with Microprofile Metrics

When Microprofile Fault Tolerance and Microprofile Metrics are used together, metrics are automatically added for each of
the methods annotated with a Fault Tolerance annotation.
Member
@Emily-Jiang Emily-Jiang Apr 25, 2018

with a Fault Tolerance annotation should be with a Retry, Timeout, CircuitBreaker or Bulkhead annotation

assertThat(m.getCircuitBreakerCallsSuccessDelta(), is(0L));
assertThat(m.getCircuitBreakerCallsFailureDelta(), is(2L));
assertThat(m.getCircuitBreakerCallsCircuitOpenDelta(), is(0L));
assertThat(m.getCircuitBreakerOpenedDelta(), is(1L));
Member

I think the 3rd request will trigger the circuit to open. The circuit state is only updated when a request is made, though.

The automatically added metrics follow a consistent pattern which includes the fully qualified name of the annotated method.
In the tables below, the placeholder `<name>` should be replaced by the fully qualified method name with all `.` characters replaced with `_`.

=== Metrics added for `@Retry`
Contributor

I think that also a metric for average number of retries per call would be useful to monitor stability of the call.

Contributor

With name ft_<name>_retry_retries_average

Contributor
@OndroMih OndroMih Apr 25, 2018

Or the number of retries per call could be a histogram, with min, max and mean number of retries per call. In that case, it could be called ft_<name>_retry_retries_per_call

Member Author

You should be able to calculate the average retries per call from the other metrics since you can compute the total number of calls and we also expose the total number of retries.

Prometheus makes it fairly easy to do this sort of calculation and graph the results over time.
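The calculation described here can be sketched as plain arithmetic over the exposed counters. A minimal sketch, with metric names following the draft's pattern and all values invented for illustration:

```java
public class AverageRetries {
    public static void main(String[] args) {
        // Hypothetical counter readings (names and values are illustrative only):
        long succeededNoRetry = 3;  // ft_<name>_retry_calls_success_no_retry_total
        long succeededRetry = 4;    // ft_<name>_retry_calls_success_retry_total
        long failed = 1;            // ft_<name>_retry_calls_failure_total
        long retriesTotal = 12;     // total number of retries across all calls

        // The three call counters together give the total number of calls,
        // so the average number of retries per call falls out directly:
        long callsTotal = succeededNoRetry + succeededRetry + failed;
        double avgRetriesPerCall = (double) retriesTotal / callsTotal;
        System.out.println(avgRetriesPerCall); // 12 retries over 8 calls
    }
}
```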

|===
| Name | Type | Unit | Description

|`ft_<name>_bulkhead_concurrent_executions`
Contributor

Maybe rename to ft_<name>_bulkhead_concurrent_executions_current - suffix with current to indicate that it's an actual state and not a counter or total.

Member

Decided not to do this.

| Counter | None
| The number of times the method was called and succeeded after retrying at least once

|`ft_<name>_retry_calls_failure_total`
Contributor

To be in line with what was suggested for circuitbreaker, this should be renamed to ft_<name>_retry_calls_failed_total

| Counter | None
| The number of times the method was called and succeeded after retrying at least once

|`ft_<name>_retry_calls_failure_total`
Contributor

Should document that calls failed + calls succeeded immediately + calls succeeded with retry = total calls

Member Author

Yes, good point.


=== Names

The automatically added metrics follow a consistent pattern which includes the fully qualified name of the annotated method.
Contributor

A question - if multiple apps are deployed and some of them use the same class with the same FT annotations, would that mean there's only one combined metric for both annotated methods in both apps?

AFAIK, MP Metrics 1.0 doesn't allow exposing the same metric multiple times for each application. It's possible to add tags to a metric, but that wouldn't help if 2 metrics had the same name.

There's an issue open to add support for metrics coming from different applications: eclipse/microprofile-metrics#65.

We should at least specify that FT metrics should be exposed in the same way as application-defined metrics. Later, when the Metrics spec comes to a solution, FT metrics from different apps would be handled automatically.

Member Author

Yes, this is a good point. I've already added a note to say that the metric names may change in response to enhancements and changes in the metrics spec.

It would also be good to call out that they'll appear in the same way as application-defined metrics.

As well as the problem of metrics from two applications clashing, we've also got a similar problem if a class has two methods with the same name and different arguments. That's on my list to bring up at tomorrow's meeting.

Member Author

I've added this in the Notes section at the bottom of the file.

| Counter | None
| Number of calls prevented from running by an open circuit breaker

|`ft_<name>_circuitbreaker_accumulated_open`
Contributor

I suggest renaming to ft_<name>_circuitbreaker_open_time_total so that "total" is at the end and "open" is closer to the beginning. Also, I think that "time total" is more descriptive than "accumulated".

Apply the same for half open and closed

Member

Renamed to ft_<name>_circuitbreaker_time_open; the Prometheus output will append _seconds

|===
| Name | Type | Unit | Description

|`ft_<name>_circuitbreaker_calls_succeeded_total`
Contributor

It should be documented that calls succeeded + calls failed + calls prevented = total number of calls

Member Author

Done

| Histogram | Nanoseconds
| Histogram of execution times for the method

|`ft_<name>_timeout_calls_timed_out_total`
Contributor

We should document that timeout_calls_timed_out + timeout_calls_not_timed_out = total sum of calls

Member Author

This is not strictly true.

If a method is annotated with @Retry and @Timeout, we may end up attempting to run the method multiple times. Each of those attempts can either time out or not and that should be reflected in the metrics.

Member Author

I'm not sure what the best way of wording this in the spec is.

Maybe we should explicitly document that timeout_calls_timed_out + timeout_calls_not_timed_out will equal the number of timed executions, which may be greater than the number of calls if the method is annotated with @Retry.
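This distinction can be made concrete with a worked example. The scenario and numbers below are invented: one call to a method annotated with @Retry(maxRetries = 2) and @Timeout, where the first two attempts time out and the third succeeds.

```java
public class TimedExecutions {
    public static void main(String[] args) {
        // Hypothetical scenario: a single call makes three attempts because
        // the first two time out and @Retry allows up to two retries.
        int calls = 1;
        int timedOut = 2;    // ft_<name>_timeout_calls_timed_out_total
        int notTimedOut = 1; // ft_<name>_timeout_calls_not_timed_out_total

        // The two timeout counters sum to the number of timed executions,
        // which can exceed the number of calls when retries are in play.
        int timedExecutions = timedOut + notTimedOut;
        System.out.println(timedExecutions);
        System.out.println(timedExecutions >= calls);
    }
}
```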

Member Author

Done

|===
| Name | Type | Unit | Description

|`ft_<name>_retry_calls_success_no_retry_total`
Contributor

To be in line with what was suggested for circuitbreaker, this should be renamed to ft_<name>_retry_calls_succeeded_not_retried_total

| Counter | None
| The number of times the method was called and succeeded without retrying

|`ft_<name>_retry_calls_success_retry_total`
Contributor

To be in line with what was suggested for circuitbreaker, this should be renamed to ft_<name>_retry_calls_succeeded_retried_total

Member Author

I agree the metric names should be consistent and I'm fine with changing them all to the past tense, though I'll raise it at the meeting tomorrow before I go through and change them all.


Be careful with _total as I wrote in a more general comment.


=== Configuring Metrics Integration

The integration with MicroProfile Metrics can be disabled by setting a config property named `MP_Fault_Tolerance_Metrics_Enabled` to the value `false`.
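For example, in a MicroProfile Config source such as a microprofile-config.properties file (a sketch; any other config source would work the same way):

```properties
# Disable the automatic Fault Tolerance metrics
MP_Fault_Tolerance_Metrics_Enabled=false
```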
Contributor

Since MP configuration values can be changed at runtime, it should be specified what happens if the value changes. I guess any change at runtime wouldn't have any effect, but it's worth clarifying this.

Member Author

Good point, I will add this.

Member Author

Done

public class BulkheadMetricBean {

@Bulkhead(2)
public void waitFor(long milliseconds) {
Contributor

In tests, it's better to control the length of the execution. Unnecessary sleeps make the tests unreliable and slow to run, so it's better to pass a future instead of a timeout to have complete control over the timing, such as:

public void waitFor(Future<?> condition) throws Exception {
    condition.get(); // blocks until the future is completed
}

and call it with:

CompletableFuture<Void> cf = new CompletableFuture<>();
waitFor(cf); // invoked on another thread, since this call blocks
// do something that should happen meanwhile
cf.complete(null);
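A self-contained sketch of this technique, with the blocking call moved onto its own thread so the test controls exactly when it finishes (class and method names are invented for illustration):

```java
import java.util.concurrent.CompletableFuture;

public class WaitForDemo {
    // Blocks until the test completes the future; no sleeps involved.
    static void waitFor(CompletableFuture<Void> condition) throws Exception {
        condition.get();
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<Void> cf = new CompletableFuture<>();
        Thread worker = new Thread(() -> {
            try {
                waitFor(cf);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        worker.start();
        // ...assertions on the in-flight state (e.g. bulkhead metrics) go here...
        cf.complete(null); // deterministically release the worker
        worker.join(5000);
        System.out.println(worker.isAlive());
    }
}
```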

Member Author

Ah, that's a much better way of controlling the execution order. I'll update the test to use this technique.

*/
@Asynchronous
@Bulkhead(value = 2, waitingTaskQueue = 2)
public Future<Void> waitForAsync(long milliseconds) {
Contributor

The same as above, better to use future instead of timeout.


assertThat("concurrent executions", m.getBulkheadConcurrentExecutions().get(), is(2L));

f1.get();
Contributor

If waitFor is rewritten using a completable future, it should complete the future passed to waitFor earlier

public void closeTheCircuit() throws Exception {
// Assume the circuit is open
// Wait for it to half close, then close it with passing work
Thread.sleep(1500);
Contributor

Could be improved to sleep only if the circuit is really open. The state can be retrieved from MetricGetter.

Member Author

That would have been true, although we just removed that metric in the meeting this afternoon because it would be very easy to create a misleading graph unless it was sampled very rapidly.

We could possibly check the accumulated time metrics instead, as long as we were careful not to hang the test if those metrics were broken in the implementation under test.

@Azquelt
Member Author

Azquelt commented Apr 25, 2018

Thanks for your comments @OndrejM, they were really helpful, I will work on the changes tomorrow.

@pilhuhn

pilhuhn commented Apr 26, 2018

General comment: all metrics that are a counter will have _total appended when exposed via the Prometheus format. In that case, a metric name already ending in _total will end up as _total_total in the Prometheus output. So you may want to strip _total from all the metric names.

@pilhuhn pilhuhn left a comment

Remove the _total as the metrics impl will add that for counters that are exposed via Prometheus exporter.

Also, @raymondlam just mentioned that in the internal data model we use dots to separate name components, not underscores. The Prometheus exporter will convert those to underscores (and a different exporter will do that differently again).

// Contributors:
// Andrew Rouse

== Integration with Microprofile Metrics
Contributor

As we clarified on the hangout that all Counters are monotonic and we're going to remove the _total suffix, we need to state this in the spec as well. Metrics counters can be monotonic but can also be decremented, see Counted.

I suggest replacing "Counter" in every metric type with "Monotonic Counter"

Member
@donbourne donbourne Apr 30, 2018

mpMetrics doesn't currently add _total to the end of counters -- I'd suggest we indicate that counter names should end in _total. If mpMetrics adds _total to the end of counter names in Prometheus format, it should avoid doing so for metric names that already end in _total (as you've already commented in eclipse/microprofile-metrics#172).

Member Author

There's already a note at the bottom that all the counters are monotonic. As we're now keeping the total suffix, I think this is sufficient.

I'm reluctant to put "Monotonic Counter" in the type column of the table as Counter is the interface in the Metrics API which must be used.

Contributor
@OndroMih OndroMih left a comment

The only thing to add is we should clarify that Counters are monotonic

@OndroMih
Contributor

This issue suggests that adding _total suffix isn't part of MP Metrics: eclipse/microprofile-metrics#137

There's another issue to add it, but it's unresolved: eclipse/microprofile-metrics#172

@OndroMih
Contributor

So I would leave the _total suffix in all counter metrics, because MP Metrics doesn't add it in the Prometheus exporter. As the Prometheus convention is to end counters with _total, it's better to have it there.

Contributor
@OndroMih OndroMih left a comment

The _total suffix should be clarified

@Azquelt
Member Author

Azquelt commented Apr 26, 2018

I've pushed the latest changes to the spec but haven't updated the TCK yet.

Things I know are outstanding:

  • Should we be using dots to separate name components rather than underscores?
  • Should all metrics be registered at application startup?
  • How should we handle name clashes? (either between classes in different applications, or between methods on the same class)

@OndroMih
Contributor

I suggest merging the PR as it is now and continuing the discussion in issue #234, followed by another PR.

@@ -88,15 +107,15 @@ In the tables below, the placeholder `<name>` should be replaced by the fully qu
| Counter | None
| Number of calls prevented from running by an open circuit breaker

|`ft_<name>_circuitbreaker_accumulated_open`
|`ft_<name>_circuitbreaker_time_open`
Contributor

Since we're going to leave the _total suffix for counters as Metrics doesn't add it, this should also have the _total suffix: ft_<name>_circuitbreaker_time_open_total

| Gauge<Long> | Nanoseconds
| Amount of time the circuit breaker has spent in open state

|`ft_<name>_circuitbreaker_accumulated_half_open`
|`ft_<name>_circuitbreaker_time_half_open`
Contributor

Add _total suffix

| Gauge<Long> | Nanoseconds
| Amount of time the circuit breaker has spent in half-open state

|`ft_<name>_circuitbreaker_accumulated_closed`
|`ft_<name>_circuitbreaker_time_closed`
Contributor

Add _total suffix

@Azquelt
Member Author

Azquelt commented Apr 27, 2018

@OndrejM I've just pushed a couple of changes which were requested and updated the names of metrics in the TCK to match the spec. I'm happy for this to be merged and to continue further discussion in #234. (I didn't really want it merged while the TCK and spec said different things)

@Emily-Jiang
Member

@Azquelt

  • Should we be using dots to separate name components rather than underscores?
    use dot.
  • Should all metrics be registered at application startup?
    yes
  • How should we handle name clashes? (either between classes in different applications, or between methods on the same class)
    we should document this limitation and when tags are introduced, these limitations will be removed.

The automatically added metrics follow a consistent pattern which includes the fully qualified name of the annotated method.
In the tables below, the placeholder `<name>` should be replaced by the fully qualified method name with all `.` characters replaced with `_`.

=== Metrics added for `@Retry`, `@Timeout`, `@CircuitBreaker` and `@Bulkhead`
Member

You will run into naming collisions if you try to reuse the same metric names for different FT annotations. For example, if @Retry and @Timeout both declare a metric called ft_<name>_invocations_total they will either fail at metric registration time (if reusable is set to false) or double count invocations (if reusable is set to true).

Member Author

Understood. The intent here was to say that if any of these annotations are present on the method, then the following metrics are added. If two or more annotations are present, then the metrics should still only be added once.

I'll update the wording to clarify this.
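The "only added once" behaviour can be sketched with a toy registry that deduplicates registration by name. This is a stand-in for illustration, not the MP Metrics API; all names are invented:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

public class OnceOnlyRegistration {
    // computeIfAbsent returns the existing counter for a name, so the
    // @Retry and @Timeout interceptors end up sharing one counter instead
    // of registering (and counting) the invocation twice.
    static final ConcurrentMap<String, LongAdder> COUNTERS = new ConcurrentHashMap<>();

    static LongAdder counter(String name) {
        return COUNTERS.computeIfAbsent(name, n -> new LongAdder());
    }

    public static void main(String[] args) {
        LongAdder fromRetry = counter("ft.example.doWork.invocations.total");
        LongAdder fromTimeout = counter("ft.example.doWork.invocations.total");
        fromRetry.increment(); // one method invocation, counted once
        System.out.println(fromRetry == fromTimeout); // same counter instance
        System.out.println(fromTimeout.sum());
    }
}
```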

Member

That's good - if you're able to coordinate which metrics get created between FT annotations, that would address this.

=== Names

The automatically added metrics follow a consistent pattern which includes the fully qualified name of the annotated method.
In the tables below, the placeholder `<name>` should be replaced by the fully qualified method name with all `.` characters replaced with `_`.
Member

In the base metrics we use .s in the metric names. For prometheus format we automatically convert the . characters to _, but in json format we leave the . characters as is. mpMetrics is intended to be used by more than just prometheus. The dot notation is preferred for Graphite. dropwizard metrics natively uses dots between parts of a name. So, the net of that is we may want to specify here what the metric names should look like in the code (and JSON) as well as for Prometheus.

Member Author

This spec should specify the values passed to the Metrics API. I'll update it to use . or camel case as that appears to be the pattern.

Based on the predefined metrics, the pattern ft.<name>.retry.callsSucceededNotRetried.total looks the most appropriate.

Member Author

Done


|`ft_<name>_timeout_execution_duration`
| Histogram | Nanoseconds
| Histogram of execution times for the method
Member
@donbourne donbourne Apr 30, 2018

note that, in the prometheus format, when a metric is in nanoseconds it will automatically be converted to seconds. You can still specify that it's in nanoseconds and record it in nanoseconds, but the prometheus output will show in seconds and the json format will show in nanoseconds.

Member Author

Yes, that's the plan. As method execution times can be very short and time deltas should be measured with System.nanotime() anyway, nanoseconds seemed like the most appropriate unit to use.


|`ft_<name>_bulkhead_execution_duration`
| Histogram | Nanoseconds
| Histogram of executions times
Member

executions -> execution
Also, it would be good to clarify whether this includes time in the queue or just the service execution

Member

If you have @Timeout and @Bulkhead on the same method, you will end up with two histograms with basically the same contents. Histograms are expensive (they store the most recent 1000 samples to do computations), so it might be worth having one ft_<name>_execution_duration that is used for any FT annotation that requires execution timing.


|`ft_<name>_bulkhead_wait_time`*
| Histogram | Nanoseconds
| Histogram of the time executions spend waiting in the queue
Member

should this be ft_<name>_bulkhead_wait_duration to be consistent with ft_<name>_bulkhead_execution_duration?

Member Author

Done


== Notes

Metrics added my this specification will appear as application metrics for the application which uses the Fault Tolerance annotations.
Member

my -> by

Member Author

Done

One of the attempts timed out.

`ft_com_example_MyClass_doWork_timeout_calls_not_timed_out_total = 2` +
Two of the attempts did not time out.
Member

I really like this example. It helps tie everything together.

| Counter | None
| The number of times the method was called

|`ft_<name>_invocations_failed`
Member Author

Should be _invocations_failed_total as it's a monotonic counter metric

Member Author

Done

@OndroMih
Contributor

OndroMih commented May 1, 2018

All looks good, I'm OK to merge. @pilhuhn, can you review and approve if OK?

@Azquelt
Member Author

Azquelt commented May 2, 2018

I've fixed the last of the bugs I found in the TCK. I've rebased and squashed the commits and this is ready to merge.

It still doesn't cover:

  • What happens when two methods have the same name and class
  • TCKs for ft.<name>.invocations.total and ft.<name>.invocations.failed.total

I'll add these in another pull request.

@Azquelt
Member Author

Azquelt commented May 2, 2018

Unsquashed commits are here in case anyone wants to see what was recently changed: https://github.com/Azquelt/microprofile-fault-tolerance/tree/metrics-pre-squash

Sorry, should have left them unsquashed for inspection.

@Azquelt
Member Author

Azquelt commented May 2, 2018

Just pushed one last change where I'd used org.junit.Assert instead of org.hamcrest.MatcherAssert 🙄

@Azquelt
Member Author

Azquelt commented May 2, 2018

Emily asked me to do one more push with the TCKs for .invocation.total and .invocation.failed.total metrics.

Unsquashed version is on the branch linked above.

| The total number of times the method was retried
|===

Note that the sum of `ft_<name>_retry_calls_succeeded_not_retried_total`, `ft_<name>_retry_calls_succeeded_retried_total` and `ft_<name>_retry_calls_failed_total` will give the total number of calls for which the Retry logic was run.
Member

The _ needs to be replaced by '.'


Note that the sum of `ft.<name>.circuitbreaker.callsSucceeded.total`, `ft.<name>.circuitbreaker.callsFailed.total` and `ft.<name>.circuitbreaker.callsPrevented.total` will give the total number of calls for which the CircuitBreaker logic was run.

Similarly, the sum of `ft.<name>.circuitbreaker.timeOpen.total`, `ft.<name>.circuitbreaker.timeHalfOpen.total` and `ft.<name>.circuitbreaker.timeClosed.total` will give the total time that the circuit breaker has been active for.
Member

I would say ft.<name>.circuitbreaker.open.total and ft.<name>.circuitbreaker.halfOpen.total

Member Author

Done

| Histogram | Nanoseconds
| Histogram of method execution times. This does not include any time spent waiting in the bulkhead queue.

|`ft.<name>.bulkhead.queuePopulation`*
Member

ft.<name>.bulkhead.queuePopulation => ft.<name>.bulkhead.waitingQueuePopulation

Contributor
@OndroMih OndroMih May 3, 2018

maybe separate waitingQueue and population: ft.<name>.bulkhead.waitingQueue.population

Member Author

Done

| Gauge<Long> | None
| Number of executions currently waiting in the queue

|`ft.<name>.bulkhead.waitDuration`*
Member

ft.<name>.bulkhead.waitDuration => ft.<name>.bulkhead.atWaitingQueueDuration

Contributor

I would rather suggest ft.<name>.bulkhead.waitingQueue.duration - it has the same prefix as the gauge ft.<name>.bulkhead.waitingQueue.population, so the two would appear next to each other

Member Author

In that case would you want to change ft.<name>.bulkhead.executionDuration and ft.<name>.timeout.executionDuration as well?

Member Author

Done (did not change the executionDuration metrics).

| Gauge<Long> | None
| Number of executions currently waiting in the queue

|`ft.<name>.bulkhead.waitDuration`*
|`ft.<name>.bulkhead.waitingQueue.duration`*
Member

ft.<name>.bulkhead.waitingQueue.duration => ft.<name>.bulkhead.waiting.duration as it describes the execution and is not measuring the waiting queue

Member Author

Done.

Member
@Emily-Jiang Emily-Jiang left a comment

LGTM! Thanks @Azquelt !

@Azquelt
Member Author

Azquelt commented May 3, 2018

Squashed and ready to merge.

@Emily-Jiang
Member

@pilhuhn I believe your question was discussed and we agreed to still add _total, as the Metrics spec does not mandate that _total is appended automatically.
