Support raw Prometheus metrics from models #1651

Closed

haykinson opened this issue Apr 1, 2020 · 8 comments

Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
Currently in our model code we gather metrics using OpenCensus, then have to marshal that data into the format that the /metrics endpoint uses. It would be great if we could, instead, just use a Prometheus exporter for OpenCensus and publish those metrics in its native format, to be scraped as per the implementation #1507.
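
For illustration, a rough sketch of the kind of setup this would enable (not current Seldon functionality; the opencensus-ext-prometheus exporter API is shown as I understand it, and the namespace and port are made up):

```python
# Illustrative sketch: register the OpenCensus Prometheus stats exporter so
# recorded views are served natively on their own port, instead of being
# re-marshalled into the wrapper's metrics() format.
from opencensus.ext.prometheus import stats_exporter as prometheus
from opencensus.stats import stats as stats_module

exporter = prometheus.new_stats_exporter(
    prometheus.Options(namespace="my_model", port=9090)  # namespace/port are made up
)
stats_module.stats.view_manager.register_exporter(exporter)
# Prometheus would then scrape <pod-ip>:9090/metrics directly.
```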

@haykinson added the triage label on Apr 1, 2020
@RafalSkolasinski (Contributor)

Hi, I'm not sure I understand what you want to achieve. Do you want to bypass the logic in the wrapper that transforms the output of the metrics method and exposes it, and instead provide the content for /metrics on your own?

@haykinson (Author)

Correct.

@ukclivecox removed the triage label on Apr 2, 2020
@ryandawsonuk (Contributor)

So basically you'd like to add your own custom endpoint for metrics and configure prometheus to scrape it? And you're using the python wrapper?

@haykinson (Author)

Correct, we're using the python wrapper (with 1.0.2 right now). However, we're trying to capture measures (internally within our model) using OpenCensus. We then try to expose them using the metrics method, but Seldon assumes that each COUNTER metric is actually additive to the previous one that was captured (i.e. on each call it performs effectively old += new). Since we're keeping the count measure ourselves, we're in effect already doing the counter aggregation, and it feels like it would be easier for us to just expose the metrics via the Prometheus exporter for OpenCensus, rather than proxy the metric via Seldon. The Prometheus exporter runs a separate server on a separate port, I think, but the key aspect is that we don't let Seldon manipulate what we emit.

As an aside, maybe there's an easier way of handling this issue? My goal is to measure failures / latencies internal to my code, classified with tags etc. Given that the model class is instantiated at service startup, and that predict() can be called separately from metrics(), we would need to keep our own aggregations that are then returned by metrics(). The way I see it:

  1. predict() called. Increment counter from 0 to 1
  2. metrics() called. Return counter value of 1, reset to 0.
  3. predict() called. Increment counter from 0 to 1.
  4. predict() called. Increment counter from 1 to 2.
  5. metrics() called. Return counter value of 2, reset to 0.
  6. metrics() called. Return counter value of 0 or don't return the counter.

This is part of the reason we're using OpenCensus to return aggregates. But OpenCensus doesn't typically provide a way to reset a count, so the value we return just keeps increasing every time metrics() is called. My workaround right now is to keep a counter metric internal to my code, but expose it within metrics() as a GAUGE instead.
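
A rough sketch of that workaround (class, helper, and metric names are made up):

```python
class MyModel:
    def __init__(self):
        self._failure_count = 0  # monotonically increasing, never reset

    def predict(self, X, features_names=None):
        try:
            return self._do_predict(X)  # made-up internal helper
        except Exception:
            self._failure_count += 1
            raise

    def metrics(self):
        # Exposed as GAUGE so the wrapper reports the value as-is,
        # rather than adding it to a running total as it does for COUNTER.
        return [{"type": "GAUGE", "key": "my_failure_count", "value": self._failure_count}]
```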

If I misunderstand the way this works and you suggest a better way of handling this than exposing our aggregates to prometheus directly, please advise.

@RafalSkolasinski (Contributor) commented on Apr 6, 2020

@haykinson Would it address your use case if, for example, the metrics method could return a RAW_COUNTER type that is exposed in the same format as COUNTER but is not auto-incremented (it just uses the value provided)?
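
Hypothetically (RAW_COUNTER does not exist today; this is only a sketch of the proposal), that might look like:

```python
def metrics(self):
    # Proposed RAW_COUNTER: rendered as a Prometheus counter, but the value
    # is taken verbatim instead of being added to the previously exposed total.
    return [{"type": "RAW_COUNTER", "key": "my_failures_total", "value": self._failure_count}]
```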

@haykinson (Author)

Yeah, I suppose that could work, as could a GAUGE in that case. Even better would be to let the metrics endpoint be scraped by Prometheus from a custom port / path (e.g. 0.0.0.0:9090/metrics), so that it doesn't have to be served from the model class; that way histograms would be servable too, without depending on any Seldon wrapper translation that may happen.
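
For reference, a sketch of what side-stepping the wrapper could look like today with the plain prometheus_client package (the names and port are made up, and the extra port would still need to be exposed in the pod spec and added to the Prometheus scrape config):

```python
from prometheus_client import Histogram, start_http_server

# Served directly by prometheus_client on its own port, independent of the
# Seldon wrapper's /metrics translation.
PREDICT_LATENCY = Histogram(
    "my_model_predict_latency_seconds", "Latency of internal predict calls"
)

class MyModel:
    def __init__(self):
        start_http_server(9090)  # port is illustrative

    def predict(self, X, features_names=None):
        with PREDICT_LATENCY.time():
            return X  # stand-in for the real prediction logic
```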

@seldondev (Collaborator)

Issues go stale after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale

@seldondev added the lifecycle/stale label on May 7, 2020
@axsaucedo changed the title from "Support raw Prometheus metrics from models" to "OSS-50: Support raw Prometheus metrics from models" on Apr 26, 2021
@axsaucedo changed the title back to "Support raw Prometheus metrics from models" on Apr 28, 2021
@ukclivecox (Contributor)

Will close this. Can you reopen if still a requirement?
