
avp-helm-container spawns bash processes which do not get terminated #460

Closed
krausemi opened this issue Feb 1, 2023 · 3 comments



krausemi commented Feb 1, 2023

Bug description

After the generate command defined for Helm charts (avp-helm) has run, the bash sub-processes it spawned remain in state "defunct" (zombies) instead of being cleaned up.


The number of zombie processes increases rapidly, and after a few hours the process limit of the underlying node is reached. Once that limit is reached, the node itself becomes unusable.
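To make the "defunct" state concrete: a process that has exited stays in the process table as a zombie (state `Z`, shown as `<defunct>` by `ps`) until its parent collects its exit status with `wait()`. The following Python sketch reproduces this outside the container. It is Linux only, since it reads `/proc`, and the helper names are illustrative, not taken from Argo CD or AVP.

```python
import subprocess
import sys
import time
from typing import Optional, Tuple


def process_state(pid: int) -> Optional[str]:
    """Return the one-letter state field of /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Field 2 (comm) is wrapped in parentheses and may contain spaces,
    # so take the text after the last ")" and read its first field: the state.
    return stat.rpartition(")")[2].split()[0]


def demonstrate_zombie() -> Tuple[Optional[str], Optional[str]]:
    """Observe a short-lived child before and after the parent waits for it."""
    # The child exits immediately; the parent deliberately does not call wait() yet.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    before: Optional[str] = None
    for _ in range(500):  # poll for up to ~5 s until the exited child shows up as "Z"
        before = process_state(child.pid)
        if before == "Z":
            break
        time.sleep(0.01)
    # Reaping: wait() collects the exit status and the kernel frees the process entry.
    child.wait()
    after = process_state(child.pid)
    return before, after


if __name__ == "__main__":
    before, after = demonstrate_zombie()
    print(f"state before wait(): {before}")
    print(f"state after wait():  {after}")
```

On Linux the first value is `Z` while the child is unreaped, and the second is `None` because the `/proc` entry disappears once `wait()` has collected it.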

Logs

time="2023-02-01T14:35:34Z" level=info msg="sh -c find . -name 'Chart.yaml' && find . -name 'values.yaml'" dir=/tmp/_cmp_server/98b7b460-f3d6-46e0-af3c-8f1585a50f70 execID=4e02c
time="2023-02-01T14:35:34Z" level=info msg="finished streaming call with code OK" grpc.code=OK grpc.method=MatchRepository grpc.service=plugin.ConfigManagementPluginService grpc.start_time="2023-02-01T14:35:34Z" grpc.time_ms=10.849 span.kind=server system=grpc
time="2023-02-01T14:35:34Z" level=info msg="Generating manifests with no request-level timeout"
time="2023-02-01T14:35:34Z" level=info msg="bash -c helm template $ARGOCD_APP_NAME -n $ARGOCD_APP_NAMESPACE ${ARGOCD_ENV_HELM_ARGS} ${ARGOCD_ENV_HELM_OPTIONS} -f <(echo \"$ARGOCD_ENV_HELM_VALUES\") . |\nargocd-vault-plugin generate -s infrastructure:argocd-vault-plugin-credentials -\n" dir=/tmp/_cmp_server/7e16abb9-c649-49ef-af46-15e5e3d3a716 execID=562f5
time="2023-02-01T14:35:34Z" level=info msg="finished streaming call with code OK" grpc.code=OK grpc.method=GenerateManifest grpc.service=plugin.ConfigManagementPluginService grpc.start_time="2023-02-01T14:35:34Z" grpc.time_ms=78.883 span.kind=server system=grpc

Using the `--verbose-sensitive-output` parameter did not produce any more output than the logs above (or I did something wrong :D).

Installation setup

Dockerfile used to build the image

ARG ARGOCD_VERSION=2.5.7
ARG AVP_VERSION=1.13.1

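FROM registry.access.redhat.com/ubi8 as download

# ARGs declared before the first FROM are only in scope for FROM lines;
# re-declare AVP_VERSION so the RUN step below can expand it.
ARG AVP_VERSION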

RUN mkdir /custom-tools/ && \
    cd /custom-tools/ && \
    curl -L https://github.com/argoproj-labs/argocd-vault-plugin/releases/download/v${AVP_VERSION}/argocd-vault-plugin_${AVP_VERSION}_linux_amd64 -o argocd-vault-plugin && \
    chmod +x argocd-vault-plugin

FROM quay.io/argoproj/argocd:v${ARGOCD_VERSION} as target

COPY certs.crt /etc/ssl/certs/
COPY --from=download /custom-tools/argocd-vault-plugin /usr/local/bin/

Values used for the argocd-vault-plugin sidecar installation

extraContainers:
  - name: avp-helm
    command: [/var/run/argocd/argocd-cmp-server]
    image: <internal-registry>/path/to/image/argocd-vault-plugin-sidecar:<internal tag>
    securityContext:
      runAsNonRoot: true
      runAsUser: 999
    volumeMounts:
      - mountPath: /var/run/argocd
        name: var-files
      - mountPath: /home/argocd/cmp-server/plugins
        name: plugins
      - mountPath: /tmp
        name: tmp-dir
      - mountPath: /home/argocd/cmp-server/config/plugin.yaml
        subPath: avp-helm.yaml
        name: cmp-plugin

Config used for the avp-helm sidecar container

apiVersion: argoproj.io/v1alpha1
kind: ConfigManagementPlugin
metadata:
  name: argocd-vault-plugin-helm
spec:
  allowConcurrency: true
  # Note: this command is run _before_ any Helm templating is done, therefore the logic is to check
  # if this looks like a Helm chart
  discover:
    find:
      command:
        - sh
        - "-c"
        - "find . -name 'Chart.yaml' && find . -name 'values.yaml'"
  generate:
    command:
      - bash
      - "-c"
      - |
        helm template $ARGOCD_APP_NAME -n $ARGOCD_APP_NAMESPACE ${ARGOCD_ENV_HELM_ARGS} ${ARGOCD_ENV_HELM_OPTIONS} -f <(echo "$ARGOCD_ENV_HELM_VALUES") . |
        argocd-vault-plugin generate -s infrastructure:argocd-vault-plugin-credentials -
  lockRepo: false

Example Argo CD application

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: dex
spec:
  destination:
    namespace: infrastructure
    server: "https://kubernetes.default.svc"
  project: default
  source:
    repoURL: <internal-registry>/path/to/chart
    chart: dex
    targetRevision: 0.9.0
    plugin:
      env:
        - name: HELM_VALUES
          value: |
            replicaCount: 1
            image:
              repository: <internal-registry>/path/to/image/dex
              pullPolicy: IfNotPresent
              tag: ""

            imagePullSecrets:
              - name: container-pullsecret
            config:
              staticClients:
                - id: grafana-client
                  secret: "<path:cluster/data/testcluster/dex#grafana-client>"
                  name: 'Grafana'
                  redirectURIs:
                    - https://grafana.domain/login/generic_oauth
        - name: HELM_OPTIONS
          value: '--include-crds'
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - ApplyOutOfSyncOnly=true

How to reproduce

  1. Deploy Argo CD together with the argocd-vault-plugin using the configuration above.
  2. Deploy an application that contains AVP placeholders, like the example above.
  3. Connect to the avp-helm container and list the running processes by executing:
    kubectl exec -it -n <namespace> <pod name> -c <container name (in my case avp-helm)> -- bash -c "ps -ef | head"
  4. Deploy more Helm charts with AVP placeholders, or update existing applications, so that the avp-helm sidecar has something to do.
  5. Count the sub-processes in state "defunct", e.g. by piping the process list through grep defunct | wc -l.
  6. Wait a little and watch the number grow. The more apps are deployed, the faster zombie processes accumulate.
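Steps 3 and 5 can also be done by reading `/proc` directly instead of parsing `ps` output. The sketch below is a hypothetical helper (Linux only) that lists every zombie together with its parent PID; the PPID shows which process is failing to reap them. It would have to run inside the avp-helm container, for example via `kubectl exec`, and it assumes a Python interpreter is available there.

```python
import os
from typing import List, Tuple


def list_zombies(proc_root: str = "/proc") -> List[Tuple[int, int, str]]:
    """Return (pid, ppid, command) for every process in state Z (defunct)."""
    zombies: List[Tuple[int, int, str]] = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/self or /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process went away while we were scanning
        # Format: "pid (comm) state ppid ..."; comm may contain spaces or ")".
        head, _, rest = stat.rpartition(")")
        comm = head.split("(", 1)[1]
        fields = rest.split()
        state, ppid = fields[0], int(fields[1])
        if state == "Z":
            zombies.append((int(entry), ppid, comm))
    return zombies


if __name__ == "__main__":
    zombies = list_zombies()
    print(f"{len(zombies)} defunct processes")
    for pid, ppid, comm in zombies:
        print(f"pid={pid} ppid={ppid} comm={comm}")
```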

Expected behavior

I would expect the sub-processes that argocd-cmp-server spawns within the avp-helm sidecar container to be reaped once they finish, instead of remaining as zombies.
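Reaping is normally the parent's job. A child whose parent exits first is re-parented to PID 1 of its PID namespace, which then has to reap it. In this setup `argocd-cmp-server` is the container's `command`, so it runs as PID 1 of the sidecar (unless the pod shares its process namespace). Minimal init programs such as tini exist to do exactly this job. As a general illustration only, not the fix adopted in the linked PR and with hypothetical helper names, this is the non-blocking reaping loop such an init runs:

```python
import os
import time
from typing import Dict, List


def reap_children() -> Dict[int, int]:
    """Reap every child that has already exited, without blocking.

    Returns a mapping of pid -> raw wait status for the children collected.
    """
    reaped: Dict[int, int] = {}
    while True:
        try:
            # pid -1 = any child; WNOHANG = return (0, 0) at once if none has exited yet.
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left at all
        if pid == 0:
            break  # children exist, but none of them has exited yet
        reaped[pid] = status
    return reaped


def spawn_and_reap(n: int = 3, timeout: float = 5.0) -> Dict[int, int]:
    """Fork n children that exit immediately, then reap them with reap_children()."""
    pids: List[int] = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit at once; it is a zombie until the parent reaps it
        pids.append(pid)

    statuses: Dict[int, int] = {}
    deadline = time.monotonic() + timeout
    # A real init would call reap_children() whenever it receives SIGCHLD;
    # polling keeps this sketch short and deterministic.
    while len(statuses) < n and time.monotonic() < deadline:
        for pid, status in reap_children().items():
            if pid in pids:
                statuses[pid] = status
        time.sleep(0.01)
    return statuses


if __name__ == "__main__":
    for pid, status in spawn_and_reap().items():
        print(f"reaped pid={pid} exit code={os.WEXITSTATUS(status)}")
```

Looping until `waitpid` reports nothing left matters because several children can exit before a single `SIGCHLD` is handled.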

Workaround

As a temporary workaround, we implemented a CronJob that restarts the argocd-repo-server pod (which contains the avp-helm sidecar container) once a day. Restarting the pod clears out the accumulated zombie processes, so the node does not reach its process limit.
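A daily restart like this is essentially what `kubectl rollout restart` does: it patches a timestamp annotation into the Deployment's pod template, Kubernetes replaces the pods, and the zombies disappear together with the old containers. The sketch below only builds that patch body. It assumes the repo server runs as a Deployment named `argocd-repo-server` in namespace `argocd`; both names are hypothetical here, so adjust them to your installation. In practice the CronJob can simply run `kubectl rollout restart deployment/argocd-repo-server -n argocd`.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Annotation that `kubectl rollout restart` sets on the pod template.
RESTART_ANNOTATION = "kubectl.kubernetes.io/restartedAt"


def rollout_restart_patch(now: Optional[datetime] = None) -> str:
    """Return a patch body equivalent to `kubectl rollout restart`.

    Changing this pod-template annotation makes the Deployment roll out new pods,
    so the old containers, including any zombie processes, are removed.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")  # RFC 3339, UTC
    patch = {"spec": {"template": {"metadata": {"annotations": {RESTART_ANNOTATION: stamp}}}}}
    return json.dumps(patch)


if __name__ == "__main__":
    # Hypothetical deployment and namespace names; adjust to your Argo CD installation.
    print("kubectl patch deployment argocd-repo-server -n argocd "
          f"-p '{rollout_restart_patch()}'")
```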

werne2j (Member) commented Mar 27, 2023

@krausemi I think this would be a question for Argo CD. We do not control how the custom plugins get executed; we just tell Argo CD which command to run on init and generate. I imagine they would be responsible for those processes.

You can see a similar issue here: argoproj/argo-cd#8689

krausemi (Author) commented Mar 28, 2023

@werne2j: Thanks for the hint. I'll open up an issue there. Let's see how it goes. :)

I'll leave this issue open until I get some feedback from the Argo CD team.

Edit: link to the opened issue argoproj/argo-cd#13026

krausemi (Author) commented Mar 31, 2023

Hi again,

The issue could be fixed with the input I got on the upstream ticket I opened.
I'll create a pull request so that you can take over the results and other people won't run into the same error. :)

Edit: PR created - #485
