[Feature Request] Built-in load generator and metrics collection #684

sriumcp · 2021-05-20T21:55:36Z

Is your feature request related to a problem? Please describe.
Iter8 currently depends on a telemetry provider (for example, Prometheus) to collect basic data such as error rates and latency values for a service. If a built-in task could query the application and populate certain built-in metrics for the given version(s), the conformance and canary tests in Iter8 tutorials could be performed without depending on any telemetry provider.

Why is the feature useful for Iter8 users?
This feature will enable users to get started with Iter8 without setting up metrics databases for telemetry.

Describe the solution you'd like
A task that can generate load and collect some standard built-in metrics like latency and error rates.

Describe alternatives you've considered
Use built-in load generation without metrics collection. This would give up significant advantages of the above proposal, since metrics would still require a telemetry provider.

Does this issue require a design doc/discussion? If there is a link to the design document/discussions, please provide it below.

This task will make it possible to collect the following metrics for any version of any application.

Request count
Error count
Error rate
Mean latency
Median latency
75th percentile tail latency
90th percentile tail latency
95th percentile tail latency
99th percentile tail latency
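
To make the list above concrete, here is a minimal sketch of how the nine metrics could be derived from raw per-request observations. It is not the proposed task's implementation: the `Sample` type, the metric names in the returned dictionary, and the nearest-rank percentile rule are all assumptions chosen for illustration.

```python
import statistics
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    """One observed request (hypothetical type, for illustration only)."""
    latency_ms: float  # observed response time in milliseconds
    is_error: bool     # True if the request failed


def percentile(sorted_values: List[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100."""
    n = len(sorted_values)
    rank = max(1, (p * n + 99) // 100)  # integer ceiling of p * n / 100
    return sorted_values[rank - 1]


def summarize(samples: List[Sample]) -> Dict[str, float]:
    """Compute the nine metrics; the key names are assumptions."""
    if not samples:
        raise ValueError("at least one sample is required")
    latencies = sorted(s.latency_ms for s in samples)
    request_count = len(samples)
    error_count = sum(1 for s in samples if s.is_error)
    return {
        "request-count": request_count,
        "error-count": error_count,
        "error-rate": error_count / request_count,
        "mean-latency": statistics.fmean(latencies),
        "median-latency": percentile(latencies, 50),
        "p75-latency": percentile(latencies, 75),
        "p90-latency": percentile(latencies, 90),
        "p95-latency": percentile(latencies, 95),
        "p99-latency": percentile(latencies, 99),
    }


if __name__ == "__main__":
    demo = [Sample(12.0, False), Sample(15.5, False), Sample(240.0, True)]
    for name, value in summarize(demo).items():
        print(f"{name}: {value}")
```

Other percentile conventions (for example, linear interpolation between ranks) would give slightly different tail values; nearest-rank is used here only because it is easy to verify by hand.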

For example, embedding the following task will enable collection of the above nine metrics for the default and canary versions hosted at their respective URLs. Because a payload URL is specified, Iter8 will download the payload from that URL and send POST requests with this payload to the two versions.

- task: metrics/collect
  with:
    payloadURL: https://raw.githubusercontent.com/kubeflow/kfserving/master/docs/samples/v1beta1/rollout/input.json
    versions:
    - name: default
      url: http://default-version.default.svc.cluster.local
    - name: canary
      url: http://canary-version.default.svc.cluster.local

The task will also support a loadOnly option which, when set to true, will generate load without collecting any metrics. Here is a variation of the above task with loadOnly set to true. Since no payload is specified here, the requests will be GET requests rather than the POST requests above.

- task: metrics/collect
  with:
    loadOnly: true
    versions:
    - name: default
      url: http://default-version.default.svc.cluster.local
    - name: canary
      url: http://canary-version.default.svc.cluster.local
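
The GET-versus-POST rule described above (POST when a payload is present, GET otherwise) can be sketched as follows. This only illustrates the selection logic using the Python standard library; it is not how the task itself would issue requests. The `build_request` helper and the JSON content type are assumptions made for this example.

```python
import urllib.request
from typing import Optional


def build_request(url: str, payload: Optional[bytes]) -> urllib.request.Request:
    """Return a GET request when there is no payload, else a POST request.

    Hypothetical helper: the Content-Type header is an assumption that
    the payload is JSON, as in the payloadURL example.
    """
    if payload is None:
        return urllib.request.Request(url, method="GET")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    url = "http://default-version.default.svc.cluster.local"
    # No payload configured: the request is a GET.
    print(build_request(url, None).get_method())
    # Payload downloaded beforehand: the request is a POST.
    print(build_request(url, b'{"instances": []}').get_method())
```

The requests are only constructed here, not sent, so the sketch runs without a cluster.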

How will this feature be tested?

Unit tested in handler repo
CRD changes will be unit tested in etc3 repo. In particular, etc3 interactions with analytics should not overwrite existing metrics histograms in status.
Unit tested in analytics repo
The Docker image of the handler will be integration tested as part of at least one tutorial, which will be converted to use the built-in metrics collector (eventually most tutorials will shift to built-in metrics)

How will this feature be documented?

iter8.tools will have this task description documented
Knative conformance tutorial & experiment will use built-in metrics


kalantar · 2021-05-21T12:23:27Z

I don't understand how a handler can be used to implement this. The controller waits for handlers to complete before proceeding and this appears to be a task that must run throughout the duration of the experiment.

sriumcp · 2021-05-21T13:09:28Z

The task will run, collect the metrics using fortio and update the experiment status fields corresponding to metrics -- all in one synchronous step.

The task can be invoked multiple times throughout the course of an experiment (using loop actions), in which case, the task will not overwrite the older metrics, but aggregate newly collected values with the older ones.

This is possible because of the way fortio generates the metrics in the form of histograms.
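
A minimal sketch of why histograms make this aggregation possible: bucket counts from separate invocations can simply be added, and percentiles can then be recomputed from the combined counts. The representation below (a dictionary mapping each bucket's upper bound in milliseconds to a count, with the same bucket boundaries in every run) is an assumption for illustration and is not fortio's actual output format.

```python
from collections import Counter
from typing import Dict

# Assumed representation: bucket upper bound (ms) -> number of requests.
Histogram = Dict[float, int]


def merge(old: Histogram, new: Histogram) -> Histogram:
    """Aggregate two histograms by adding per-bucket counts."""
    merged = Counter(old)
    merged.update(new)  # Counter.update adds counts instead of replacing them
    return dict(merged)


def percentile_upper_bound(hist: Histogram, p: int) -> float:
    """Upper bound of the bucket containing the p-th percentile (integer p)."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("histogram is empty")
    target = max(1, (p * total + 99) // 100)  # integer ceiling of p * total / 100
    cumulative = 0
    for bound in sorted(hist):
        cumulative += hist[bound]
        if cumulative >= target:
            return bound
    raise AssertionError("unreachable: cumulative count always reaches total")


if __name__ == "__main__":
    first_run = {10.0: 80, 50.0: 15, 100.0: 5}
    second_run = {10.0: 20, 50.0: 60, 100.0: 20}
    combined = merge(first_run, second_run)
    print(combined)
    print(percentile_upper_bound(combined, 90))
```

Percentiles from two runs cannot be averaged into a correct overall percentile, but counts can always be summed, which is why storing the histogram (rather than only the summary values) lets later invocations refine earlier results.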

kalantar · 2021-05-21T15:11:06Z

If I understand correctly, we would typically define such an experiment with a very small duration.intervalSeconds; the majority of the time would be spent in this collection job instead.

sriumcp · 2021-05-21T15:57:11Z

Yes, the above setup makes sense to me.

sriumcp · 2021-06-01T14:12:29Z

@huang195 Slightly updated design discussion above.

sriumcp added the kind/enhancement New feature or request label May 20, 2021

sriumcp self-assigned this May 20, 2021

sriumcp mentioned this issue May 25, 2021

[Feature Request] Store intermediate results from built-in metrics collection in experiment yaml #686

Closed

sriumcp added the area/tasks (Iter8 tasks), area/analytics (Metrics, statistical estimation, and bandit algorithms for traffic shifting), and area/install (Iter8 installation and packaging) labels and removed the area/install (Iter8 installation and packaging) label May 28, 2021

This was referenced Jun 4, 2021

General improvements for iter8.tools text and Iter8 repo issue/PR templates #732

Merged

install and doc changes needed to introduce builtin metrics #750

Merged

[Feature Request] Make urlTemplate and jqExpression fields optional in metric definitions #751

Closed

sriumcp closed this as completed in #750 Jun 8, 2021
