Enable canary analysis for worker nodes (no web traffic) #361

Closed
giggio opened this issue Nov 7, 2019 · 2 comments
Labels
question Further information is requested

giggio commented Nov 7, 2019

I have a worker pod that I want to take through a canary process with Flagger, but I haven't found a way to do it without skipping the analysis via spec.skipAnalysis: true. This is because the pod does not get any web traffic (besides scrapes from Prometheus).
The pod is producing metrics, and I have a custom metric set up, but the canary fails because the pod is not receiving traffic. This is the output of describe:

  Type     Reason  Age                     From     Message
  ----     ------  ----                    ----     -------
  Warning  Synced  8m54s                   flagger  Halt advancement temp-worker-primary.temp waiting for rollout to finish: 0 of 1 updated replicas are available
  Normal   Synced  8m36s                   flagger  Initialization done! temp-worker.temp
  Normal   Synced  7m37s                   flagger  New revision detected! Scaling up temp-worker.temp
  Normal   Synced  7m17s                   flagger  Starting canary analysis for temp-worker.temp
  Normal   Synced  7m17s                   flagger  Advance temp-worker.temp canary weight 10
  Warning  Synced  3m57s (x10 over 6m57s)  flagger  Halt advancement no values found for metric Server up percentage probably temp-worker.temp is not receiving traffic
  Warning  Synced  3m37s                   flagger  Rolling back temp-worker.temp failed checks threshold reached 10
  Warning  Synced  3m35s                   flagger  Canary failed! Scaling down temp-worker.temp

Is there a way to get this to work with Flagger today? If not, could this be implemented, perhaps as a spec.skipTrafficAnalysis setting?

@stefanprodan
Member

The query behind Server up percentage is not returning any values from Prometheus. The log message shouldn't mention anything about traffic; I'll remove that part of the log.
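
One way to confirm that the query itself returns nothing is to run it against the Prometheus HTTP API (`/api/v1/query`) and inspect the result array. The following is a minimal sketch, not Flagger's own check: the helper name `query_has_samples`, the Prometheus address, and the `up{...}` query in the usage comment are all assumptions and should be replaced with the PromQL behind the custom metric.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a Prometheus instant-query response contains samples.
# Reads the JSON body returned by /api/v1/query on stdin.
# query_has_samples is a hypothetical helper name, not part of any tool.
query_has_samples() {
  local body
  body="$(cat)"
  # The query must have succeeded, and the result array must not be empty.
  # A successful query that matches no series returns "result":[].
  if printf '%s' "$body" | grep -q '"status":"success"' &&
     ! printf '%s' "$body" | grep -Eq '"result":[[:space:]]*\[[[:space:]]*\]'; then
    return 0
  fi
  return 1
}

# Usage (address and query are placeholders, not taken from this issue):
# curl -sG http://prometheus:9090/api/v1/query \
#   --data-urlencode 'query=up{namespace="temp"}' | query_has_samples \
#   && echo "query returns samples" || echo "no values found"
```

If the helper reports no values for the canary's labels, the metric check will keep halting regardless of whether the pod receives traffic.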

stefanprodan added the question label Nov 7, 2019

stefanprodan commented Nov 7, 2019

For apps that don't receive traffic but expose an HTTP endpoint, such as metrics or health, a blue/green style canary can be used. Here is an example:

  canaryAnalysis:
    interval: 30s
    threshold: 2
    iterations: 10
    # run the curl check ten times at 30s interval
    webhooks:
      - name: health-test
        type: pre-rollout
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          type: bash
          cmd: "curl --fail http://temp-worker.temp:8080/health"
      - name: readiness-test
        type: rollout
        url: http://flagger-loadtester.test/
        timeout: 1s
        metadata:
          type: bash
          cmd: "curl -s http://temp-worker.temp:8080/metrics | grep some_metric_name"

Note that no metrics are used; with 10 iterations at a 30s interval, the analysis will take 300 seconds.
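
A plain `grep some_metric_name` also matches the `# HELP` and `# TYPE` comment lines in the Prometheus text format, as well as any longer metric name that contains the same string, so the rollout check may pass without an actual sample being present. A stricter variant could look like the sketch below; `has_sample` is a hypothetical helper name, and `some_metric_name` remains the placeholder from the example above.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if stdin (Prometheus text exposition format) contains
# at least one sample line for the exact metric name passed as $1.
has_sample() {
  local name="$1"
  # Sample lines start with the metric name, followed by '{' (labels)
  # or a space (value). Comment lines start with '#', so they never match.
  grep -Eq "^${name}(\{| )"
}

# Usage inside the webhook command (same endpoint as the example above):
# curl -s --fail http://temp-worker.temp:8080/metrics | has_sample some_metric_name
```

When used directly in the webhook `cmd`, the equivalent one-liner would be `curl -s --fail http://temp-worker.temp:8080/metrics | grep -Eq '^some_metric_name(\{| )'`.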
