traefikservices.traefik.containo.us not found error when using traefik.io/v1alpha1 TraefikService #3615

Closed
2 tasks done
evega-ws opened this issue Jun 4, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@evega-ws

evega-ws commented Jun 4, 2024

Checklist:

  • I've included steps to reproduce the bug.
  • I've included the version of argo rollouts.

Describe the bug
Trying to use Traefik 3.0.0, which no longer includes the legacy traefik.containo.us CRDs, with Argo Rollouts.

Using a TraefikService object with apiVersion: traefik.io/v1alpha1 fails with the following error:

traefikservices.traefik.containo.us "my-service" not found

Additional logs

{"event_reason":"TrafficRoutingError","level":"warning","msg":"traefikservices.traefik.containo.us \"my-service\" not found","namespace":"default","rollout":"example-rollout-canary","time":"2024-06-04T21:34:52Z"}
{"generation":8,"level":"error","msg":"roCtx.reconcile err traefikservices.traefik.containo.us \"my-service\" not found","namespace":"default","resourceVersion":"559279197","rollout":"example-rollout-canary","time":"2024-06-04T21:34:52Z"}
{"generation":8,"level":"info","msg":"Reconciliation completed","namespace":"default","resourceVersion":"559279197","rollout":"example-rollout-canary","time":"2024-06-04T21:34:52Z","time_ms":4.942671}
{"level":"error","msg":"rollout syncHandler error: traefikservices.traefik.containo.us \"canary-service\" not found","namespace":"default","rollout":"example-rollout-canary","time":"2024-06-04T21:34:52Z"}
{"level":"info","msg":"rollout syncHandler queue retries: 132 : key \"default/example-rollout-canary\"","namespace":"default","rollout":"example-rollout-canary","time":"2024-06-04T21:34:52Z"}

To Reproduce
Argo Rollouts canary Rollout with trafficRouting configured to use Traefik

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-rollout-canary
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-rollout-canary
  template:
    metadata:
      labels:
        app: example-rollout-canary
    spec:
      containers:
      - name: example-rollout-canary
        image: argoproj/rollouts-demo:blue
        ports:
        - name: http
          containerPort: 8080
          protocol: TCP
  strategy:
    canary:
      canaryService: canary-preview
      stableService: canary-endpoint
      maxUnavailable: 1
      steps:
      - setWeight: 20 
      - pause: {duration: 5m}
      - setWeight: 40
      - pause: {duration: 5m}
      trafficRouting:
        traefik:
          weightedTraefikServiceName: my-service

TraefikService using the 2.x API traefik.io/v1alpha1

apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: my-service
spec:
  weighted:
    services:
    - name: canary-endpoint
      port: 80
    - name: canary-preview
      port: 80

Canary Service objects for Traefik to route to

apiVersion: v1
kind: Service
metadata:
  name: canary-endpoint
spec:
  selector:
    app: example-rollout-canary
  ports:
  - port: 80
    targetPort: http
    protocol: TCP
    name: http
--- 
apiVersion: v1
kind: Service
metadata:
  name: canary-preview
spec:
  selector:
    app: example-rollout-canary
  ports:
  - port: 80
    targetPort: http
    protocol: TCP
    name: http

Expected behavior
Argo Rollouts controller is able to look up and reference TraefikService resources using the newest API version.


Version
v1.6.6

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

@evega-ws evega-ws added the bug Something isn't working label Jun 4, 2024
@zachaller
Collaborator

Did you try setting these flags?

command.Flags().StringVar(&traefikAPIGroup, "traefik-api-group", defaults.DefaultTraefikAPIGroup, "Set the default Traerfik apiGroup that controller uses.")
command.Flags().StringVar(&traefikVersion, "traefik-api-version", defaults.DefaultTraefikVersion, "Set the default Traerfik apiVersion that controller uses.")
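
For readers who want to see what those two flags do, here is a minimal sketch. It uses Go's standard `flag` package as a stand-in for the cobra-style `command.Flags()` call quoted above, and the default values are an assumption inferred from the error message in this issue (the controller looks up `traefikservices.traefik.containo.us` when no flags are set). The names `traefikConfig` and `parseTraefikFlags` are hypothetical and do not come from the Argo Rollouts code base.

```go
package main

import (
	"flag"
	"fmt"
)

// Assumed defaults, inferred from the error message in this issue.
// They are not copied from the Argo Rollouts source.
const (
	defaultTraefikAPIGroup = "traefik.containo.us"
	defaultTraefikVersion  = "traefik.containo.us/v1alpha1"
)

// traefikConfig (hypothetical) holds the two values the flags control.
type traefikConfig struct {
	APIGroup   string
	APIVersion string
}

// parseTraefikFlags (hypothetical) parses controller-style arguments with
// the standard library flag package and falls back to the assumed defaults.
func parseTraefikFlags(args []string) (traefikConfig, error) {
	var cfg traefikConfig
	fs := flag.NewFlagSet("controller", flag.ContinueOnError)
	fs.StringVar(&cfg.APIGroup, "traefik-api-group", defaultTraefikAPIGroup,
		"Traefik apiGroup used by the controller")
	fs.StringVar(&cfg.APIVersion, "traefik-api-version", defaultTraefikVersion,
		"Traefik apiVersion used by the controller")
	if err := fs.Parse(args); err != nil {
		return traefikConfig{}, err
	}
	return cfg, nil
}

func main() {
	cfg, err := parseTraefikFlags([]string{
		"--traefik-api-group=traefik.io",
		"--traefik-api-version=traefik.io/v1alpha1",
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("group=%s version=%s\n", cfg.APIGroup, cfg.APIVersion)
}
```

Calling `parseTraefikFlags(nil)` returns the assumed defaults, which is consistent with the controller still querying the old API group when the arguments are not passed to the deployment.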

@smutoni2022

@zachaller I am having the same issue. Did setting the flags work for you? If so, how do we set the flags in the argo-rollouts deployment?

@smutoni2022

@evega-ws Have you made any progress on this bug? I am getting the same error you described above.

@evega-ws
Author

evega-ws commented Jul 5, 2024

@smutoni2022 Unfortunately, I have not been able to fix this. Since I am using Helm, I set the flags as follows:

controller:
  extraArgs:
  - "--traefik-api-group=traefik.io"
  - "--traefik-api-version=traefik.io/v1alpha1"

I used --traefik-api-version=traefik.io/v1alpha1, following the test file https://github.com/argoproj/argo-rollouts/blob/master/utils/defaults/defaults_test.go#L406. The code suggests this is the correct syntax:

	group := defaults.GetTraefikAPIGroup()
	parts := strings.Split(defaults.GetTraefikVersion(), "/")
...
	SetTraefikAPIGroup("traefik.containo.us")
	assert.Equal(t, "traefik.containo.us", GetTraefikAPIGroup())
	SetTraefikAPIGroup(DefaultTraefikAPIGroup)
	assert.Equal(t, DefaultTraefikAPIGroup, GetTraefikAPIGroup())

	SetTraefikVersion("traefik.containo.us/v1alpha1")
	assert.Equal(t, "traefik.containo.us/v1alpha1", GetTraefikVersion())
	SetTraefikVersion(DefaultTraefikVersion)
	assert.Equal(t, DefaultTraefikVersion, GetTraefikVersion())
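
The `strings.Split(..., "/")` call above suggests that the version flag is expected in `group/version` form and that only the part after the slash is used as the version. Below is a minimal sketch of that idea, under the assumption that the last path segment is taken as the version and that the plural resource name is `traefikservices` (as the log lines show). `groupVersionResource` and `traefikServiceGVR` are hypothetical names; the struct only mirrors the shape of the Kubernetes GroupVersionResource type so the example needs nothing beyond the standard library.

```go
package main

import (
	"fmt"
	"strings"
)

// groupVersionResource (hypothetical) mirrors the shape of the Kubernetes
// GroupVersionResource type without importing any Kubernetes packages.
type groupVersionResource struct {
	Group, Version, Resource string
}

// traefikServiceGVR (hypothetical) builds the resource identifier from the
// two flag values. Assumption: the version is the segment after the last "/".
func traefikServiceGVR(apiGroup, apiVersion string) groupVersionResource {
	parts := strings.Split(apiVersion, "/")
	return groupVersionResource{
		Group:    apiGroup,
		Version:  parts[len(parts)-1],
		Resource: "traefikservices",
	}
}

func main() {
	gvr := traefikServiceGVR("traefik.io", "traefik.io/v1alpha1")
	fmt.Printf("%s.%s (version %s)\n", gvr.Resource, gvr.Group, gvr.Version)
}
```

Under this assumption, both `traefik.io/v1alpha1` and a bare `v1alpha1` would resolve to version `v1alpha1`, while the group comes only from the separate group flag.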

The flags seem to be applied correctly, but the controller is still unable to pick up my TraefikService:

{"event_reason":"TrafficRoutingError","level":"warning","msg":"my-service.traefik.io is forbidden: User \"system:serviceaccount:argocd:argo-rollouts\" cannot list resource \"my-service\" in API group \"traefik.io\" in the namespace \"templates\"","namespace":"templates","rollout":"example-rollout-canary","time":"2024-07-05T23:21:28Z"}

The error itself makes sense: the controller is trying to list a resource type called my-service.traefik.io, which does not exist. It should be looking up a resource of type traefikservices.traefik.io with the name my-service. The previous error is a good example of how the lookup should look:

traefikservices.traefik.containo.us "my-service" not found
NOT
my-service.traefik.containo.us
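
To make the distinction concrete, here is a small sketch of how the two strings are composed. It only mimics the `<resource>.<group> "<name>" not found` layout seen in the controller logs in this thread; `qualifiedResource` and `notFoundMessage` are hypothetical helpers, and the sketch does not claim to identify why the controller produced the wrong resource type.

```go
package main

import "fmt"

// qualifiedResource (hypothetical) joins a plural resource name and an API
// group the way the log lines show them, e.g. "traefikservices.traefik.io".
func qualifiedResource(resource, group string) string {
	if group == "" {
		return resource
	}
	return resource + "." + group
}

// notFoundMessage (hypothetical) mimics the layout of the not-found errors
// quoted in this issue: the qualified resource first, then the quoted name.
func notFoundMessage(resource, group, name string) string {
	return fmt.Sprintf("%s %q not found", qualifiedResource(resource, group), name)
}

func main() {
	// Expected shape: resource type first, object name in quotes.
	fmt.Println(notFoundMessage("traefikservices", "traefik.containo.us", "my-service"))
	// Correct resource type for the new API group.
	fmt.Println(qualifiedResource("traefikservices", "traefik.io"))
	// What appears in the forbidden error: the object name sits in the resource slot.
	fmt.Println(qualifiedResource("my-service", "traefik.io"))
}
```

The last line shows that `my-service.traefik.io` is what you get when the object name ends up where the plural resource name belongs, which would also explain why adjusting RBAC for `traefikservices` does not change the outcome.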

Adding the list verb to the ClusterRole rule for apiGroups: traefik.io, in case the role was missing that permission, makes no difference.

I am unsure how to proceed. Because we are using Traefik v3.0 or later, we cannot fall back to the deprecated traefik.containo.us API group.

@smutoni2022

@evega-ws I tried the same arguments in my Helm chart and got the same error. Is it possible to reopen this issue so it gets more visibility?

@evega-ws
Author

evega-ws commented Jul 9, 2024

@smutoni2022 I am unable to reopen the issue. Perhaps @zachaller could reopen it if warranted? It doesn't look like the flags work in this case.

@smutoni2022

@zachaller @BrunoTarijon This fix is not working. I tested it by upgrading to the latest argo-rollouts Helm chart and switching the Traefik API group to traefik.io. I still get the same error about the service not being found. Can you explain what is needed to apply this fix beyond what I did?

@BrunoTarijon
Contributor

BrunoTarijon commented Aug 9, 2024

Hey, I don't think my changes are in the latest release (1.7.1); I built the image from the master branch. They may be in the 1.7.2 release.
The 1.7.1 release is from June 24.

@smutoni2022

@BrunoTarijon @zachaller I tested this again with the latest release, 1.7.2. No luck. It still shows the "service not found" error mentioned before. I am not sure whether I need any extra configuration in the chart beyond updating the chart version.

@BrunoTarijon
Contributor

@smutoni2022, I have just installed argo-rollouts (1.7.2 release) in a new local cluster and added the args to the deployment:

      args:
        - --traefik-api-group=traefik.io
        - --traefik-api-version=traefik.io/v1alpha1

Everything seems to work with the following manifests:

apiVersion: traefik.io/v1alpha1
kind: TraefikService
metadata:
  name: traefik-service
spec:
  weighted:
    services:
      - name: nginx-canary
        port: 80
      - name: nginx-stable
        port: 80
---

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  selector:
    matchLabels:
      run: nginx
  template:
    metadata:
      labels:
        run: nginx
    spec:
      containers:
      - name: guestbook
        image: argoproj/rollouts-demo:blue
  replicas: 5
  strategy:
    canary:
      canaryService: nginx-canary
      stableService: nginx-stable
      trafficRouting:
        traefik:
          weightedTraefikServiceName: traefik-service 
      steps:
      - setWeight: 40
      - pause: {duration: 10}
      - setWeight: 60
      - pause: {duration: 10}
      - setWeight: 80
      - pause: {duration: 10}

---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: nginx
  name: nginx-stable
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: nginx
---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: nginx
  name: nginx-canary
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: nginx

If you share more info, maybe I can help you.

@smutoni2022

@BrunoTarijon I was missing the arguments. I added the extra args in the values file and it works fine now. Thank you.
