expose /metrics in python wrapper #1507

Merged
merged 46 commits into from
Mar 23, 2020
46 commits
0708c56
accessing shared_dict from some method
RafalSkolasinski Mar 6, 2020
0c036c5
example of shared dict updated on custom metrics and accessed on /met…
RafalSkolasinski Mar 6, 2020
529485f
WIP: first success of serving metrics
RafalSkolasinski Mar 6, 2020
f92b060
WIP: introduce SeldonMetrics class (to be shared between processes)
RafalSkolasinski Mar 9, 2020
d7e6cef
WIP: add prometheus_client to requirements
RafalSkolasinski Mar 9, 2020
ac4c86f
WIP: do not use global variable to store metrics
RafalSkolasinski Mar 9, 2020
2e57f56
WIP: run metrics endpoint as separated microservice
RafalSkolasinski Mar 10, 2020
dde19d7
WIP: gather metrics when GRPC endpoints are called (no worker separat…
RafalSkolasinski Mar 10, 2020
fba0796
WIP: use thread names to id workers in GPRC mode
RafalSkolasinski Mar 10, 2020
ff66476
WIP: fix tests
RafalSkolasinski Mar 10, 2020
90d9ec4
WIP: add support for TIMER type of metrics
RafalSkolasinski Mar 10, 2020
87e40af
run linter
RafalSkolasinski Mar 10, 2020
899cdfa
WIP: apply suggestions from Alejandro
RafalSkolasinski Mar 11, 2020
b593437
WIP: introduce usage of Lock for thread-safety
RafalSkolasinski Mar 11, 2020
74e6db5
WIP: read metrics labels from environmental variables
RafalSkolasinski Mar 11, 2020
6514d43
WIP: do not use dashes in prometheus labels
RafalSkolasinski Mar 12, 2020
9c3c898
WIP: add debug logger to metrics
RafalSkolasinski Mar 13, 2020
d67ef06
WIP: make wrapper prometheus endpoint configurable
RafalSkolasinski Mar 13, 2020
4750ee1
WIP: modify engine/executor annotations and name metrics port
RafalSkolasinski Mar 13, 2020
a7a1aeb
update to operator for metrics and test notebook
ukclivecox Mar 15, 2020
7221103
add tf serving metrics example
ukclivecox Mar 16, 2020
6415817
fix analytics docs
ukclivecox Mar 16, 2020
21696bc
merge/append metrics for request responses
RafalSkolasinski Mar 17, 2020
50feddd
add unit tests to new metrics microservice
RafalSkolasinski Mar 17, 2020
6fb999b
clarify: UNDEFINED -> NOT_IMPLEMENTED
RafalSkolasinski Mar 17, 2020
16965c7
run linter
RafalSkolasinski Mar 18, 2020
8774d30
fix imports in test_metrics
RafalSkolasinski Mar 18, 2020
1fd74d8
code fixes after rebase
RafalSkolasinski Mar 18, 2020
de4e268
remove redundant code from webhook
ukclivecox Mar 18, 2020
0bcd907
Fix defaulting of graph elements to model. Only do when impl or metho…
ukclivecox Mar 18, 2020
bee23cf
fix one of e2e tests that misteriously started to fail
RafalSkolasinski Mar 19, 2020
1ed9c7c
gather custom metrics from predict_raw
RafalSkolasinski Mar 19, 2020
3267107
run linter
RafalSkolasinski Mar 19, 2020
a186f6d
avoid catching exception
RafalSkolasinski Mar 19, 2020
ac1cb00
make metrics be always expected by microservice create functions
RafalSkolasinski Mar 19, 2020
6bb4a98
run linter
RafalSkolasinski Mar 19, 2020
d1ee74a
make seldon_metrics mandatory argument for SeldonModelGRPC
RafalSkolasinski Mar 19, 2020
4fb20e3
update SeldonMetrics after calling other raw methods and include GRPC…
RafalSkolasinski Mar 19, 2020
1edb88b
run linter
RafalSkolasinski Mar 19, 2020
54dcc0c
add better tests for metrics with predict_raw (include grpc)
RafalSkolasinski Mar 19, 2020
62e5ecc
combine different metrics test together
RafalSkolasinski Mar 19, 2020
81bdf06
extend metrics test to cover different microservice endpoints
RafalSkolasinski Mar 19, 2020
353c4b3
reduce logging by moving info to debug for non implemented methods, c…
RafalSkolasinski Mar 20, 2020
9c64e2a
allow to define custom prometheus labels
RafalSkolasinski Mar 23, 2020
d3194cf
update docs
RafalSkolasinski Mar 23, 2020
aec4c3f
simplify custom metrics docs + add ommited files in previous commit
RafalSkolasinski Mar 23, 2020
1 change: 1 addition & 0 deletions .gitignore
Expand Up @@ -208,4 +208,5 @@ examples/ambassador/custom/ambassador_custom.py
examples/ambassador/headers/ambassador_headers.py
examples/ambassador/shadow/ambassador_shadow.py
examples/models/metrics/metrics.py
examples/models/custom_metrics/customMetrics.py
examples/models/tracing/tracing.py
5 changes: 3 additions & 2 deletions README.md
Expand Up @@ -295,8 +295,8 @@ Below are some of the core components together with link to the logs that provid
* [Java Language Wrapper [Incubating] ](https://docs.seldon.io/projects/seldon-core/en/latest/java/README.html)
* [R Language Wrapper [ALPHA] ](https://docs.seldon.io/projects/seldon-core/en/latest/R/README.html)
* [NodeJS Language Wrapper [ALPHA] ](https://docs.seldon.io/projects/seldon-core/en/latest/nodejs/README.html)
* [Custom Language Wrapper [ALPHA] ](https://docs.seldon.io/projects/seldon-core/en/latest/analytics/custom_metrics.html)
* [Go Language Wrapper [ALPHA] ](https://docs.seldon.io/projects/seldon-core/en/latest/go/README.html)

### Ingress

* [Ambassador Ingress ](https://docs.seldon.io/projects/seldon-core/en/latest/ingress/ambassador.html)
Expand All @@ -307,6 +307,7 @@ Below are some of the core components together with link to the logs that provid
* [Supported API Protocols ](https://docs.seldon.io/projects/seldon-core/en/latest/graph/protocols.html)
* [CI/CD MLOps at Scale ](https://docs.seldon.io/projects/seldon-core/en/latest/analytics/cicd-mlops.html)
* [Metrics with Prometheus ](https://docs.seldon.io/projects/seldon-core/en/latest/analytics/analytics.html)
* [Custom Metrics ](https://docs.seldon.io/projects/seldon-core/en/latest/analytics/custom_metrics.html)
* [Payload Logging with ELK ](https://docs.seldon.io/projects/seldon-core/en/latest/analytics/logging.html)
* [Distributed Tracing with Jaeger ](https://docs.seldon.io/projects/seldon-core/en/latest/graph/distributed-tracing.html)
* [Autoscaling in Kubernetes ](https://docs.seldon.io/projects/seldon-core/en/latest/graph/autoscaling.html)
21 changes: 11 additions & 10 deletions doc/source/analytics/analytics.md
Expand Up @@ -28,27 +28,28 @@ Seldon Core provides an example Helm analytics chart that displays the above Pro
```bash
helm install seldon-core-analytics seldon-core-analytics \
--repo https://storage.googleapis.com/seldon-charts \
--set grafana_prom_admin_password=password \
--set persistence.enabled=false \
--namespace seldon-system
```

The available parameters are:

* ```grafana_prom_admin_password``` : The admin user Grafana password to use.
* ```persistence.enabled``` : Whether Prometheus persistence is enabled.

Once running you can expose the Grafana dashboard with:

```bash
kubectl port-forward $(kubectl get pods -n seldon-system -l app=grafana-prom-server -o jsonpath='{.items[0].metadata.name}') 3000:3000 -n seldon-system
kubectl port-forward svc/seldon-core-analytics-grafana 3000:80 -n seldon-system
```

You can then view the dashboard at http://localhost:3000/dashboard/db/prediction-analytics?refresh=5s&orgId=1
You can then view the dashboard at http://localhost:3000/dashboard/db/prediction-analytics

![dashboard](./dashboard.png)

It is also possible to expose Prometheus itself with:
```bash
kubectl port-forward svc/seldon-core-analytics-prometheus-seldon 3001:80 -n seldon-system
```

and then access it at http://localhost:3001/
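
With the port-forward above in place, Prometheus can also be queried programmatically. The snippet below is a minimal sketch, not part of this change: it only builds the URL for the instant-query endpoint of the Prometheus HTTP API (`/api/v1/query`). The helper name `build_query_url` is hypothetical, and `up` is used only because it is a metric Prometheus generates for every scrape target.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    # urlencode escapes the PromQL expression so it is safe in a query string.
    query_string = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


# Assumes Prometheus is reachable on localhost:3001 via the port-forward above.
url = build_query_url("http://localhost:3001/", "up")
print(url)

# To actually run the query (requires the port-forward to be active):
# import json, urllib.request
# with urllib.request.urlopen(url) as resp:
#     print(json.load(resp)["status"])
```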



## Example

There is [an example notebook you can use to test the metrics](../examples/metrics.html).

55 changes: 49 additions & 6 deletions doc/source/analytics/custom_metrics.md
@@ -1,6 +1,8 @@
# Custom Metrics

Seldon Core exposes basic metrics via Prometheus endpoints on its service orchestrator that include request count, request time percentiles and rolling accuracy for each running model. However, you may wish to expose custom metrics from your components which are automatically added to Prometheus. For this purpose you can supply extra fields in the returned meta data of the response object in the API calls to your components as illustrated below:
Seldon Core exposes basic metrics via Prometheus endpoints on its service orchestrator that include request count, request time percentiles and rolling accuracy for each running model, as described in the [metrics](./analytics.md) documentation.
However, you may wish to expose custom metrics from your components which are automatically added to Prometheus.
For this purpose you can supply extra fields in the returned meta data of the response object in the API calls to your components as illustrated below:

```json
{
Expand All @@ -10,7 +12,7 @@ Seldon Core exposes basic metrics via Prometheus endpoints on its service orches
"type": "COUNTER",
"key": "mycounter",
"value": 1.0,
"tags": {"mytag":"mytagvalue"}
"tags": {"mytag": "mytagvalue"}
},
{
"type": "GAUGE",
Expand Down Expand Up @@ -39,7 +41,7 @@ We provide three types of metric that can be returned in the meta.metrics list:

* COUNTER : a monotonically increasing value. It will be added to any existing value from the metric key.
* GAUGE : an absolute value showing a level, it will overwrite any existing value.
* TIMER : a time value (in msecs).
* TIMER : a time value (in msecs), it will be aggregated into Prometheus' HISTOGRAM.

Each metric, apart from the type, takes a key and a value. The proto buffer definition is shown below:

Expand All @@ -53,12 +55,53 @@ message Metric {
string key = 1;
MetricType type = 2;
float value = 3;
map<string,string> tags = 4;
map<string,string> tags = 4;
}
```
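
The three metric types differ in how repeated values are combined. The following is a minimal sketch of those semantics using a toy in-memory store; it is not the wrapper's actual implementation. The class name `MetricStore` and the keys `mygauge` and `mytimer` are hypothetical, and TIMER values are simply collected as raw observations rather than bucketed into a real Prometheus histogram.

```python
from collections import defaultdict


class MetricStore:
    """Toy store mirroring the COUNTER / GAUGE / TIMER semantics described above."""

    def __init__(self):
        self.counters = defaultdict(float)  # key -> running total
        self.gauges = {}                    # key -> latest value
        self.timers = defaultdict(list)     # key -> observations in msecs

    def update(self, metrics):
        for metric in metrics:
            mtype = metric["type"]
            key = metric["key"]
            value = float(metric["value"])
            if mtype == "COUNTER":
                # Added to any existing value for the same key.
                self.counters[key] += value
            elif mtype == "GAUGE":
                # Overwrites any existing value.
                self.gauges[key] = value
            elif mtype == "TIMER":
                # Recorded as one more observation (histogram bucketing omitted).
                self.timers[key].append(value)
            else:
                raise ValueError(f"unknown metric type: {mtype!r}")


store = MetricStore()
# Two hypothetical responses, each carrying a meta.metrics list.
store.update([
    {"type": "COUNTER", "key": "mycounter", "value": 1.0},
    {"type": "GAUGE", "key": "mygauge", "value": 100.0},
    {"type": "TIMER", "key": "mytimer", "value": 20.5},
])
store.update([
    {"type": "COUNTER", "key": "mycounter", "value": 1.0},
    {"type": "GAUGE", "key": "mygauge", "value": 22.0},
    {"type": "TIMER", "key": "mytimer", "value": 18.0},
])
print(store.counters["mycounter"], store.gauges["mygauge"], store.timers["mytimer"])
```

After both updates the counter has accumulated, the gauge holds only the most recent value, and the timer keeps every observation.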

## Metrics endpoints

As we expose the metrics via Prometheus, if ```tags``` are added they must appear in every metric response and always have the same set of keys since Prometheus does not allow metrics to have varying numbers of tags. This condition is enforced by the [micrometer](https://micrometer.io/) library we use to expose the metrics. Exceptions will happen if this condition is violated.
Custom metrics are exposed directly by the Python wrapper.
In order for `Prometheus` to scrape multiple endpoints from a single `Pod`, we use the name `metrics` for ports that expose `Prometheus` metrics:
```yaml
ports:
- containerPort: 6000
name: metrics
protocol: TCP
```

This requires us to use the following entry
```
- source_labels: [__meta_kubernetes_pod_container_port_name]
action: keep
regex: metrics(-.*)?
```
in the Prometheus [config](https://github.com/SeldonIO/seldon-core/blob/master/helm-charts/seldon-core-analytics/files/prometheus/prometheus-config.yaml) together with the following two annotations:
```
prometheus.io/scrape: "true"
prometheus.io/path: "/prometheus"
```

Note: we do not use `prometheus.io/port` annotation in this configuration.
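
To see which container ports the `keep` rule above selects, the pattern can be tried against a few port names. This is a minimal sketch: Prometheus anchors relabel regexes at both ends, so `re.fullmatch` is used as the closest standard-library equivalent. The helper `is_scraped` and every port name other than `metrics` are hypothetical.

```python
import re

# Same pattern as in the relabel rule; fullmatch mimics Prometheus' anchoring.
METRICS_PORT = re.compile(r"metrics(-.*)?")


def is_scraped(port_name: str) -> bool:
    """Return True if a port with this name would pass the keep rule."""
    return METRICS_PORT.fullmatch(port_name) is not None


port_names = ["metrics", "metrics-svc", "http", "my-metrics", "metricsx"]
kept = [name for name in port_names if is_scraped(name)]
print(kept)
```

Only names that are exactly `metrics`, or `metrics` followed by a dash and a suffix, are kept, which is what lets several containers in one `Pod` each expose their own metrics port.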


Before Seldon Core 1.1, custom metrics were returned to the orchestrator, which exposed them all together to `Prometheus` via a single endpoint.
At that time we used all three of the following annotations:
```yaml
prometheus.io/scrape: "true"
prometheus.io/path: "/prometheus"
prometheus.io/port: "8000"
```


## Labels

As we expose the metrics via `Prometheus`, if ```tags``` are added they must appear in every metric response; otherwise `Prometheus` will treat such metrics as a new time series, see official [documentation].

Before Seldon Core 1.1, the orchestrator enforced the presence of the same set of labels, using the [micrometer](https://micrometer.io/) library to expose metrics. Exceptions were raised if this condition was violated.
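
The effect of inconsistent tags can be illustrated without Prometheus at all. In the Prometheus data model a time series is identified by its metric name together with its set of labels, so the sketch below models that identity as a tuple. The function `series_id` and the sample responses are hypothetical.

```python
def series_id(key, tags):
    """Identity of a time series: metric name plus its sorted label pairs."""
    return (key, tuple(sorted(tags.items())))


# Three hypothetical metric responses for the same key; the last one omits the tag.
responses = [
    {"key": "mycounter", "tags": {"mytag": "mytagvalue"}},
    {"key": "mycounter", "tags": {"mytag": "mytagvalue"}},
    {"key": "mycounter", "tags": {}},
]

series = {series_id(r["key"], r["tags"]) for r in responses}
print(len(series))
```

The first two responses share one series, while the response without the tag starts a second, separate series for the same key.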


## Supported wrappers

At present the following Seldon Core wrappers provide integrations with custom metrics:

Expand All @@ -67,4 +110,4 @@ At present the following Seldon Core wrappers provide integrations with custom m

## Example

There is an [example notebook illustrating a model with custom metrics in python](../examples/tmpl_model_with_metrics.html).
There is an [example notebook illustrating a model with custom metrics in python](../examples/custom_metrics.html).
3 changes: 3 additions & 0 deletions doc/source/examples/custom_metrics.nblink
@@ -0,0 +1,3 @@
{
"path": "../../../examples/models/custom_metrics/customMetrics.ipynb"
}
1 change: 1 addition & 0 deletions doc/source/go/go_wrapper_link.rst
@@ -0,0 +1 @@
.. mdinclude:: ../../../incubating/wrappers/s2i/go/README.md
8 changes: 4 additions & 4 deletions doc/source/index.rst
Expand Up @@ -74,16 +74,16 @@ Documentation Index
Java Language Wrapper [Incubating] <java/README.md>
R Language Wrapper [ALPHA] <R/README.md>
NodeJS Language Wrapper [ALPHA] <nodejs/README.md>
Custom Language Wrapper [ALPHA] <analytics/custom_metrics.md>
Go Language Wrapper [ALPHA] <go/go_wrapper_link.rst>


.. toctree::
:maxdepth: 1
:caption: Ingress

Ambassador Ingress <ingress/ambassador.md>
Istio Ingress <ingress/istio.md>

.. toctree::
:maxdepth: 1
:caption: Production
80 changes: 37 additions & 43 deletions doc/source/python/python_wrapping_docker.md
Expand Up @@ -147,38 +147,37 @@ These arguments can be set when deploying in a Seldon Deployment. An example can

```
"graph": {
"name": "tfserving-proxy",
"endpoint": { "type" : "REST" },
"type": "MODEL",
"children": [],
"parameters":
[
{
"name":"grpc_endpoint",
"type":"STRING",
"value":"localhost:8000"
},
{
"name":"model_name",
"type":"STRING",
"value":"mnist-model"
},
{
"name":"model_output",
"type":"STRING",
"value":"scores"
},
{
"name":"model_input",
"type":"STRING",
"value":"images"
},
{
"name":"signature_name",
"type":"STRING",
"value":"predict_images"
}
]
"name": "tfserving-proxy",
"endpoint": {"type" : "REST"},
"type": "MODEL",
"children": [],
"parameters": [
{
"name":"grpc_endpoint",
"type":"STRING",
"value":"localhost:8000"
},
{
"name":"model_name",
"type":"STRING",
"value":"mnist-model"
},
{
"name":"model_output",
"type":"STRING",
"value":"scores"
},
{
"name":"model_input",
"type":"STRING",
"value":"images"
},
{
"name":"signature_name",
"type":"STRING",
"value":"predict_images"
}
]
},
```

Expand All @@ -191,14 +190,14 @@ The allowable ```type``` values for the parameters are defined in the [proto buf

To add custom metrics to your response you can define an optional method ```metrics``` in your class that returns a list of metric dicts. An example is shown below:

```
```python
class MyModel(object):

def predict(self,X,features_names):
def predict(self, X, features_names):
return X

def metrics(self):
return [{"type":"COUNTER","key":"mycounter","value":1}]
return [{"type": "COUNTER", "key": "mycounter", "value": 1}]
```

For more details on custom metrics and the format of the metric dict see [here](../custom_metrics.md).
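
As a slightly fuller illustration, a model might return one metric of each type. This is a sketch under stated assumptions rather than code from this change: the keys `mygauge` and `mytimer` and the attributes `_last_batch_size` and `_last_duration_ms` are hypothetical, and it assumes the wrapper calls `metrics` after `predict` for each request.

```python
import time


class MyModel(object):

    def __init__(self):
        # State recorded during predict() and reported by metrics().
        self._last_batch_size = 0
        self._last_duration_ms = 0.0

    def predict(self, X, features_names):
        start = time.perf_counter()
        result = X  # identity model, as in the example above
        self._last_batch_size = len(X)
        self._last_duration_ms = (time.perf_counter() - start) * 1000.0
        return result

    def metrics(self):
        return [
            # COUNTER: incremented by 1 for every request.
            {"type": "COUNTER", "key": "mycounter", "value": 1},
            # GAUGE: size of the most recent batch.
            {"type": "GAUGE", "key": "mygauge", "value": self._last_batch_size},
            # TIMER: duration of the most recent predict() call in msecs.
            {"type": "TIMER", "key": "mytimer", "value": self._last_duration_ms},
        ]


model = MyModel()
model.predict([[1, 2], [3, 4]], ["a", "b"])
reported = model.metrics()
print([m["type"] for m in reported])
```
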
Expand All @@ -210,17 +209,12 @@ There is an [example notebook illustrating a model with custom metrics in python

To add custom meta data you can add an optional method ```tags``` which can return a dict of custom meta tags as shown in the example below:

```
```python
class MyModel(object):

def predict(self,X,features_names):
def predict(self, X, features_names):
return X

def tags(self):
return {"mytag":1}
return {"mytag": 1}
```




