Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ALB Target Group name is not consistent with CloudWatch metrics dimensions #2589

Closed
2 tasks done
danil-smirnov opened this issue Feb 15, 2023 · 8 comments · Fixed by #2604
Closed
2 tasks done

ALB Target Group name is not consistent with CloudWatch metrics dimensions #2589

danil-smirnov opened this issue Feb 15, 2023 · 8 comments · Fixed by #2604
Labels
bug Something isn't working

Comments

@danil-smirnov
Copy link
Contributor

danil-smirnov commented Feb 15, 2023

Checklist:

  • I've included steps to reproduce the bug.
  • I've inclued the version of argo rollouts.

Describe the bug

We are trying to leverage the status.alb.canaryTargetGroup.name value taken from Rollout object in our AnalysisTemplate as suggested in #1242.

We are going to use this value in a query to CloudWatch to filter metrics by a dimension representing the canary target group, e.g. like this:

        - {
            "id": "errors",
            "metricStat": {
              "metric": {
                "namespace": "AWS/ApplicationELB",
                "metricName": "HTTPCode_Target_5XX_Count",
                "dimensions": [
                  {
                      "name": "TargetGroup",
                      "value": "{{`targetgroup/{{args.targetGroupName }}`}}"
                  }
                ]
              },
              "period": 300,
              "stat": "Sum",
              "unit": "Count"
            }
          }

It should be possible as per #1241 that says:

In order to query AWS CloudWatch for metrics about a canary service, we need to know the canary TargetGroup. However, the target group name is not easily known to the user. By filling in the target group information under the rollout status, it would allow users to perform queries about metrics specific to the canary target group by passing the target group name to the cloudwatch query

However, when trying to use the value we discovered that it differs from the metrics dimension used by CloudWatch, for example:

status.alb.canaryTargetGroup.name = k8s-default-damagere-a867372dec

while CloudWatch metrics dimension = k8s-default-damagere-a867372dec/78abd55d69ba13b9

The latter value is a part of the status.alb.canaryTargetGroup.arn parameter but there seems no way to extract it from the value on the way from the Rollout object to the AnalysisTemplate query.

Thus, we don't see how to use the status.alb.canaryTargetGroup.name in real life for CloudWatch metrics-based analysis.

It would be great if we can fill in one more parameter status.alb.canaryTargetGroup.fullName with the full Target Group name that can be used in CW queries directly.

To Reproduce

Expected behavior

Screenshots

Version

We use Argo Rollout helm chart v.2.22.2

Logs

# Paste the logs from the rollout controller

# Logs for the entire controller:
kubectl logs -n argo-rollouts deployment/argo-rollouts

# Logs for a specific rollout:
kubectl logs -n argo-rollouts deployment/argo-rollouts | grep rollout=<ROLLOUTNAME

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

@danil-smirnov danil-smirnov added the bug Something isn't working label Feb 15, 2023
@danil-smirnov
Copy link
Contributor Author

Example fix:
danil-smirnov#1

@zachaller
Copy link
Collaborator

zachaller commented Feb 17, 2023

I took a quick peak at the AWS docs and this all seems to make sense would you mind opening a PR with the fix?

https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-cloudwatch-metrics.html

@danil-smirnov
Copy link
Contributor Author

Hi @zachaller , I've created the PR #2604 but the build step is failing. I've checked the error message and it seems irrelevant to my fix. Could you lookt at it, please?

@danil-smirnov
Copy link
Contributor Author

@zachaller I've rebased from the master but some tests are still failing with quite a vague error... Could you provide some hints, please?

@zachaller
Copy link
Collaborator

@danil-smirnov the TestWritebackInformer one you can ignore tha that is flakey but this test seems to be failing as well

=== FAIL: rollout/trafficrouting/alb TestVerifyWeight (0.41s)
time="2023-02-21T17:18:48Z" level=warning msg="Failed to find load balancer: verify-weight-test-abc-123.us-west-2.elb.amazonaws.com" event_reason=LoadBalancerNotFound namespace=default rollout=rollout
time="2023-02-21T17:18:48Z" level=warning msg="Failed to find load balancer: verify-weight-test-abc-123.us-west-2.elb.amazonaws.com" event_reason=LoadBalancerNotFound namespace=default rollout=rollout
panic: runtime error: slice bounds out of range [2:1] [recovered]
	panic: runtime error: slice bounds out of range [2:1]

goroutine 266 [running]:
testing.tRunner.func1.2({0x1db1020, 0xc000411c38})
	/opt/hostedtoolcache/go/1.19.6/x64/src/testing/testing.go:1396 +0x24e
testing.tRunner.func1()
	/opt/hostedtoolcache/go/1.19.6/x64/src/testing/testing.go:1399 +0x39f
panic({0x1db1020, 0xc000411c38})
	/opt/hostedtoolcache/go/1.19.6/x64/src/runtime/panic.go:884 +0x212
github.com/argoproj/argo-rollouts/rollout/trafficrouting/alb.(*Reconciler).VerifyWeight(0xc000677170, 0xa, {0x0, 0x0, 0x0?})
	/home/runner/work/argo-rollouts/argo-rollouts/rollout/trafficrouting/alb/alb.go:234 +0x1f12
github.com/argoproj/argo-rollouts/rollout/trafficrouting/alb.TestVerifyWeight(0xc0000d2680)
	/home/runner/work/argo-rollouts/argo-rollouts/rollout/trafficrouting/alb/alb_test.go:578 +0x711
testing.tRunner(0xc0000d2680, 0x2052a88)
	/opt/hostedtoolcache/go/1.19.6/x64/src/testing/testing.go:1446 +0x10b
created by testing.(*T).Run
	/opt/hostedtoolcache/go/1.19.6/x64/src/testing/testing.go:1493 +0x35f

@danil-smirnov
Copy link
Contributor Author

Thank you @zachaller ! The checks are green now.

@danil-smirnov
Copy link
Contributor Author

@zachaller I've added one more unit test and all checks are green now.

@zachaller
Copy link
Collaborator

@danil-smirnov thank you, I should be able to review it later today or tomorrow

zachaller added a commit that referenced this issue Feb 25, 2023
Fixes #2589 (#2604)

* Adding status.alb.canaryTargetGroup.fullName for ALB

Signed-off-by: Danil Smirnov <[email protected]>

* Fixes

Signed-off-by: Danil Smirnov <[email protected]>

* Formatting fix

Signed-off-by: Danil Smirnov <[email protected]>

* Fixes

Signed-off-by: Danil Smirnov <[email protected]>

* Fixes

Signed-off-by: Danil Smirnov <[email protected]>

* Fixes

Signed-off-by: Danil Smirnov <[email protected]>

* Formatting fix

Signed-off-by: Danil Smirnov <[email protected]>

* run make codegen

Signed-off-by: zachaller <[email protected]>

* run make codegen

Signed-off-by: zachaller <[email protected]>

* ALB tests fix

Signed-off-by: Danil Smirnov <[email protected]>

* ALB tests fix

Signed-off-by: Danil Smirnov <[email protected]>

* Fixes

Signed-off-by: Danil Smirnov <[email protected]>

* Adding unit test

Signed-off-by: Danil Smirnov <[email protected]>

---------

Signed-off-by: Danil Smirnov <[email protected]>
Signed-off-by: zachaller <[email protected]>
Co-authored-by: zachaller <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants