[Feature Request] Built-in load generator and metrics collection #684

Closed
6 tasks done
sriumcp opened this issue May 20, 2021 · 5 comments · Fixed by #750
Assignees
Labels
area/analytics (Metrics, statistical estimation, and bandit algorithms for traffic shifting), area/tasks (Iter8 tasks), kind/enhancement (New feature or request)

Comments

@sriumcp
Member

sriumcp commented May 20, 2021

Is your feature request related to a problem? Please describe.
Iter8 currently depends on a telemetry provider (for example, Prometheus) to collect basic data such as error rates and latency values for a service. If a built-in task could query the application and populate certain built-in metrics for the given version(s), conformance and canary tests in Iter8 tutorials could be performed without depending on any telemetry provider.

Why is the feature useful for Iter8 users?
This feature will enable users to get started with Iter8 without setting up metrics databases for telemetry.

Describe the solution you'd like
A task that can generate load and collect some standard built-in metrics like latency and error rates.

Describe alternatives you've considered
Use built-in load generation without metrics collection. This would nullify significant advantages of the above proposal, since a telemetry provider would still be needed to obtain the metrics.

Does this issue require a design doc/discussion? If there is a link to the design document/discussions, please provide it below.

This task will make it possible to collect the following metrics for any version of any application.

  1. Request count
  2. Error count
  3. Error rate
  4. Mean latency
  5. Median latency
  6. 75th percentile tail latency
  7. 90th percentile tail latency
  8. 95th percentile tail latency
  9. 99th percentile tail latency
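
As a rough illustration of what these nine metrics mean, here is a minimal Python sketch that derives them from a list of raw request samples. It is not the proposed task's implementation: the `Sample` type, the metric key names, and the nearest-rank percentile method are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One observed request (hypothetical shape, for illustration only)."""
    latency_ms: float
    is_error: bool


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list; p is an integer in 1..100."""
    if not sorted_values:
        raise ValueError("no samples")
    # Integer ceiling of p * n / 100, which avoids floating-point rounding.
    rank = (p * len(sorted_values) + 99) // 100
    return sorted_values[rank - 1]


def summarize(samples):
    """Compute the nine metrics listed above for one version."""
    if not samples:
        raise ValueError("no samples")
    latencies = sorted(s.latency_ms for s in samples)
    request_count = len(samples)
    error_count = sum(1 for s in samples if s.is_error)
    return {
        "request-count": request_count,
        "error-count": error_count,
        "error-rate": error_count / request_count,
        "mean-latency": sum(latencies) / request_count,
        "median-latency": percentile(latencies, 50),
        "p75-latency": percentile(latencies, 75),
        "p90-latency": percentile(latencies, 90),
        "p95-latency": percentile(latencies, 95),
        "p99-latency": percentile(latencies, 99),
    }
```

Calling `summarize` once per version would yield one such dictionary for the default version and one for the canary.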

For example, embedding the following task will enable collection of the above nine metrics for the default and canary versions hosted at their respective URLs. Since a payload URL is specified, Iter8 will download the payload from that URL and send POST requests with this payload to the two versions.

- task: metrics/collect
  with:
    payloadURL: https://raw.githubusercontent.com/kubeflow/kfserving/master/docs/samples/v1beta1/rollout/input.json
    versions:
    - name: default
      url: http://default-version.default.svc.cluster.local
    - name: canary
      url: http://canary-version.default.svc.cluster.local

The task will also support a loadOnly option which, when set to true, will make the task generate load without collecting any metrics. Here is a variation of the above task with loadOnly set to true. Since there is no payload in this variation, the requests will be GET requests, as opposed to the POST requests above.

- task: metrics/collect
  with:
    loadOnly: true
    versions:
    - name: default
      url: http://default-version.default.svc.cluster.local
    - name: canary
      url: http://canary-version.default.svc.cluster.local
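
The GET-versus-POST rule described in the two examples above can be sketched in Python as follows. This is only an illustration of the rule, not how the task itself would be written; the `build_request` helper is hypothetical, and the JSON content type is an assumption based on the sample payload being a `.json` file.

```python
import urllib.request


def build_request(url, payload=None):
    """Build a request for one version (hypothetical helper).

    With no payload the request is a GET; with a payload it is a POST
    whose body is the payload bytes.
    """
    if payload is None:
        return urllib.request.Request(url, method="GET")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        # Assumption: the payload is JSON, as in the sample payloadURL above.
        headers={"Content-Type": "application/json"},
    )
```

The payload would be downloaded once from the payload URL and then reused for every request sent to each version.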

How will this feature be tested?

  • Unit tested in handler repo
  • CRD changes will be unit tested in etc3 repo. In particular, etc3 interactions with analytics should not overwrite existing metrics histograms in status.
  • Unit tested in analytics repo
  • Docker image of handler will be integration tested as part of at least one tutorial, which will be converted to use the built-in metrics collector (eventually most tutorials will shift to built-in metrics)

How will this feature be documented?

  • iter8.tools will have this task description documented
  • Knative conformance tutorial & experiment will use built-in metrics
@sriumcp sriumcp added the kind/enhancement New feature or request label May 20, 2021
@sriumcp sriumcp self-assigned this May 20, 2021
@kalantar
Member

I don't understand how a handler can be used to implement this. The controller waits for handlers to complete before proceeding, and this appears to be a task that must run throughout the duration of the experiment.

@sriumcp
Member Author

sriumcp commented May 21, 2021

The task will run, collect the metrics using Fortio, and update the experiment status fields corresponding to the metrics, all in one synchronous step.

The task can be invoked multiple times throughout the course of an experiment (using loop actions), in which case, the task will not overwrite the older metrics, but aggregate newly collected values with the older ones.

This is possible because Fortio reports the metrics in the form of histograms, and histogram counts from separate runs can be added together.
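
To illustrate why histograms make this aggregation possible, here is a minimal Python sketch. The histogram shape used here (a dictionary mapping each bucket's upper bound in milliseconds to a request count) is a simplification assumed for this example and is not Fortio's actual output format; the function names are likewise hypothetical.

```python
from collections import Counter


def merge_histograms(old, new):
    """Add the bucket counts of a new run to the previously stored counts."""
    merged = Counter(old)
    merged.update(new)  # Counter.update adds counts; it does not overwrite
    return dict(merged)


def percentile_upper_bound(hist, p):
    """Upper bound of the bucket that contains the p-th percentile (p in 1..100)."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    running = 0
    for upper in sorted(hist):
        running += hist[upper]
        # Integer comparison equivalent to running / total >= p / 100.
        if running * 100 >= p * total:
            return upper
```

Because only bucket counts are stored, each invocation of the task can fold its new counts into the existing ones, and percentiles can be recomputed from the merged histogram without keeping individual request latencies.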

@kalantar
Member

If I understand correctly, we would typically define such an experiment with a very small duration.intervalSecond; the majority of the time would be spent in this collection job instead.

@sriumcp
Member Author

sriumcp commented May 21, 2021

Yes, the above setup makes sense to me.

@sriumcp sriumcp added area/tasks Iter8 tasks area/analytics Metrics, statistical estimation, and bandit algorithms for traffic shifting area/install Iter8 installation and packaging and removed area/install Iter8 installation and packaging labels May 28, 2021
@sriumcp
Member Author

sriumcp commented Jun 1, 2021

@huang195 Slightly updated design discussion above.

2 participants