Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exposing raw Command Execution Time metrics #333

Closed
asibs opened this issue Oct 30, 2014 · 9 comments
Closed

Exposing raw Command Execution Time metrics #333

asibs opened this issue Oct 30, 2014 · 9 comments

Comments

@asibs
Copy link

asibs commented Oct 30, 2014

I would like to expose the various Hystrix Metrics to an internal monitoring system, and looked at creating a custom HystrixMetricsPublisher. This would work for most of the information I'm interested in, but I could not find a way to expose the command latencies (both the HystrixCommand.run() execution time and the full end-to-end execution including any queuing time, etc).

At the moment it doesn't seem possible to expose these raw metrics, only rolling aggregates of mean and percentiles. The reason I'm after the raw metrics is because I have a monitoring system which will track metrics over days / weeks, allowing more fine grained historical analysis than could be achieved by exposing already aggregated data.

If this sounds like a useful addition to Hystrix I am happy to fork and implement. Below are some thoughts I had about how this functionality could be implemented, so if this is deemed a suitable & useful addition, any guidance on a preferred approach would be welcome.

I first thought I may be able to achieve this by extending HystrixCommandMetrics with my own implementation, but looking at HystrixCommand and AbstractCommand there appears to be no way to change the HystrixCommandMetrics instance used in AbstractCommand. HystrixCommandMetrics ultimately always gets passed in as null to the AbstractCommand constructor, resulting in a call to HystrixCommandMetrics.getInstance(...) in the metrics initialisation block.

So one approach would be to replace the call to HystrixCommandMetrics.getInstance(...) in AbstractCommand with a call to a factory, and allow others to set the factory they wish to use, thus allowing people to write their own implementations of HystrixCommandMetrics. This approach may not be desirable though, HystrixCommandMetrics is quite an important class in terms of providing HealthCounts to the CircuitBreaker, so allowing others to override the implementation may be deemed risky.

Another approach would be to add one or more methods to HystrixEventNotifier. Something along the lines of a "markCommandDuration(...)" method which would then be called from HystrixCommandMetrics.addCommandExecutionTime(...) and HystrixCommandMetrics.addUserThreadExecutionTime(...), allowing a custom HystrixEventNotifier to also see these execution time events (including the time taken) and pass them onto some external monitoring utility. This approach would make use of the existing ability to plugin a custom HystrixEventNotifier.

@benjchristensen
Copy link
Contributor

FYI that I've seen this but am swamped right now and intend on coming back to this.

@benjchristensen benjchristensen added this to the 1.4.x milestone Nov 11, 2014
@mattrjacobs mattrjacobs removed this from the 1.4.x milestone Dec 19, 2014
@asibs asibs removed this from the 1.4.x milestone Dec 19, 2014
@mattrjacobs mattrjacobs added this to the 1.4.0-RC7 milestone Dec 19, 2014
@mattrjacobs
Copy link
Contributor

@asibs I want to make sure I understand this properly. Your point is that all the HystrixCommandMetrics instance is wired into AbstractCommand in such a way that there's no way to provide a custom implementation. Moreover, HystrixCommandMetrics automatically does aggregation via the HystrixRollingNumber and HystrixRollingPercentile instances it contains.

Because of those 2 factors, you're not able to generate a metrics stream that contains non-aggregated command timings.

Did I get that right?

If so, that sounds reasonable and would likely require a pluggable HystrixCommandMetrics implementation. I'm pushing to get 1.4.0 out the door, and would like to defer this until I've finished that.

@mattrjacobs mattrjacobs modified the milestones: 1.4.x, 1.4.0-RC7 Feb 3, 2015
@asibs
Copy link
Author

asibs commented Feb 10, 2015

@mattrjacobs Apologies for late reply.
Yes, that's correct - although AbstractCommand does allow a custom implementation of HystrixCommandMetrics, all of the accessible constructors of HystrixCommand end up passing null for the HystrixCommandMetrics argument, resulting in a default implementation being used in AbstractCommand.

Allowing a custom implementation of HystrixCommandMetrics would allow me to write an implementation that exposes non-aggregated command timings.

There are alternative approaches though - allowing users to provide a custom implementation of HystrixCommandMetrics is potentially risky, as this class provides some pretty core Circuit Breaker functionality (specifically, it creates the HealthCounts object, which is used by the HystrixCircuitBreaker object to determine whether to break the circuit or not). If allowing users to provide custom implementations of HystrixCommandMetrics was deemed risky, the alternative is to add calls to HystrixEventNotifier from HystrixCommandMetrics.addCommandExecutionTime(...) and HystrixCommandMetrics.addUserThreadExecutionTime(...). Users can already provide a custom implementation of HystrixEventNotifier.

Hope that all makes sense.

Deferring this until after 1.4.0 is fine for me.

@nsowen
Copy link

nsowen commented Jul 24, 2015

Any updates or further plans on this?

My thoughts:
I looked through the source and it would be quite easy to implement by extending HystrixPlugins to provide a custom HystrixCommandMetrics implementation on the fly just as @asibs suggested.

HystrixCommandMetrics should be abstract then and moved into strategy/metrics with just enough implementation than can be easily overridden, e.g. the getInstance() methods and aggregation hashmap might be worth keeping. Also, the default implementation HystrixCommandMetricsDefault must be introduced if no custom implementation is given. Also it should be considered to introduce the same thing for the HystrixThreadPoolMetrics.

@mattrjacobs
Copy link
Contributor

Good timing - I have been getting ready to tackle this very problem. An issue is that HystrixCommandMetrics is public, and changing it in a backwards-incompatible way would require a 2.0.

From a first glance, the constructor of HystrixCommandMetrics is package-protected and all of the public methods seems sane, so we might actually be OK.

I am hoping to start work on this in the next 2 weeks. As I get into the details, I'll put my notes here. If you'd like to get started before me, be my guest :)

@mattrjacobs
Copy link
Contributor

@nsowen Would appreciate feedback on #843 if you have time.

I believe I've addressed one of @asibs's concerns by making HystrixCommandMetrics.getHealthCounts() final so that concrete implementations are not implementing this from scratch

@nsowen
Copy link

nsowen commented Aug 3, 2015

@mattrjacobs Please excuse the delay. I think your (quite impressive) changes will work for me! I also checked for any broken public methods, but couldn't find any as well. If I can assist with any of your TODOs regarding this topic, feel free to assign some tasks :-)

@mattrjacobs mattrjacobs removed this from the 1.4.x milestone Aug 3, 2015
@mattrjacobs
Copy link
Contributor

@nsowen Great, thanks for taking the time to review them. I think this will require a 1.5.0 release, as I did make some previously-public methods package-private or protected. I'll put those sort of comments directly on PR #843

@mattrjacobs
Copy link
Contributor

In master, (and 1.5.0 soon), you can now call HystrixCommandCompletionStream.getInstance(HystrixCommandKey).observe(). The return type is Observable<HystrixCommandCompletion>. Each of these objects in the stream contains event counts and latencies, so you're free to consume them how you wish.

@nsowen / @asibs Please take a look and see if this meets your needs. You can see how I retrofit HystrixCommandMetrics to use this stream (and others) if you need an example of usage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants