Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excessive requests to git repository on commit to monorepo w/ multi-source apps #14725

Closed
de-slalonde opened this issue Jul 26, 2023 · 17 comments · Fixed by #16410, #16501 or #17109
Closed
Labels
bug Something isn't working multi-source-apps Bugs or enhancements related to multi-source Applications.

Comments

@de-slalonde
Copy link

de-slalonde commented Jul 26, 2023

Checklist:

  • [x ] I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • [x ] I've included steps to reproduce the bug.
  • [x ] I've pasted the output of argocd version.

Describe the bug

With 71 multi-source applications in a monorepo (in Bitbucket), a single commit triggers 271 requests to the git repository (11 fetch, 260 ls-remote). Logs show 333 "getting manifest cache" entries, with 15 cache hits and 318 cache misses. Example application sources:

sources:
  - path: manifests/my-app/
    repoURL: [email protected]:my-company/argo.git
    targetRevision: HEAD
  - chart: my-chart
    helm:
      releaseName: my-release
      valueFiles:
      - $values/values/my-app-values.yaml
    repoURL: my-helm-repo/charts
    targetRevision: 0.1.0
  - ref: values
    repoURL: [email protected]:my-company/argo.git
    targetRevision: HEAD

To Reproduce

Make any commit to the repository.

Expected behavior

Far fewer requests to the git repository and more cache hits.

Screenshots

N/A

Version

argocd: v2.7.7+4650bb2.dirty
  BuildDate: 2023-07-05T19:42:32Z
  GitCommit: 4650bb2817c3c81405f40cf77e93ef2b5fb275fb
  GitTreeState: dirty
  GoVersion: go1.19.10
  Compiler: gc
  Platform: linux/amd64

Logs

msg="getting manifests cache" appName=tenant-1 appSrc="{\"appSrc\":{\"repoURL\":\"\",\"ref\":\"values\"},\"srcRefs\":{\"$values\":{\"repoURL\":\"[email protected]:my-company/argo-dev.git\",\"project\":\"\",\"targetRevision\":\"HEAD\",\"chart\":\"\"}}}

In depth discussion with @crenshaw-dev on Slack with more detail. https://cloud-native.slack.com/archives/C01TSERG0KZ/p1690317861587929

Discussed at Argo SIG Scalability meeting on July and @ishitasequeira volunteered to take a look.

@de-slalonde de-slalonde added the bug Something isn't working label Jul 26, 2023
@de-slalonde
Copy link
Author

de-slalonde commented Jul 26, 2023

I've created a public repo that also exhibits this issue: https://bitbucket.org/sam-lalonde/argocd-14725. I created the app of apps defined in app-of-apps.yaml at the root of the repo. I have 10 nginx apps configured, with a custom values.yaml in the same repo, and a directory with extra manifests (just a configmap) also in the same repo. Once everything is settled down, if I change a value for a single app, for example, change revisionHistoryLimit in one of the values.yaml and commit, the webhook informs argo and in the logs I get:

getting manifests cache: 83
cache hit: 23
cache miss: 60

From my monitoring I can see:
git fetch: 6
git ls-remote: 52

That's quite a few git requests for just 10 apps (and I'm only updating 1), I'm hoping to run 500+ apps.

I don't exactly know how the code behaves, but it seems like when Argo receives the webhook from Bitbucket it should:

  1. Pull the repo (once) so the cache is up to date (apps wait during this?)
  2. Update all the apps, which should now all get successful cache hits

Currently when the webhook is received all apps are simultaneously clobbering Bitbucket.

@de-slalonde de-slalonde changed the title Excessive requests to git repository on commit to monrepo Excessive requests to git repository on commit to monorepo Jul 26, 2023
@crenshaw-dev crenshaw-dev changed the title Excessive requests to git repository on commit to monorepo Excessive requests to git repository on commit to monorepo w/ multi-source apps Jul 26, 2023
@ishitasequeira
Copy link
Member

Thanks @de-slalonde for the detailed explanation and an example.

One thing I noticed is that we are using the same repository with same TargetRevision (HEAD) in every app within app-of-apps. The repo is being used as a source for app-of-apps, as one of the sources and as a ref source within each app. So, whenever there is a new commit, it tries updating the app-of-apps, each app inside app-of-apps, and each source inside each app.

The point to consider here - argocd should have used the cache, once updated by any app, because the Repo and the Revision(HEAD) used in every app and source are same.

Still investigating 🤔 🔍 🔬

@de-slalonde
Copy link
Author

In the logs I can see dozens of cache misses in the first 500ms after the change is detected. They couldn't possibly get an updated cache when it's all happening so fast. (This is just a small part of the logs).

image

Is the app of apps pattern making this issue even worse for me? Our plan was to have ALL our apps in a single app-of-apps, but we could consider breaking things up into groups of apps-of-apps, or I suppose (not ideal) move away from app-of-apps altogether and make API calls to create and delete apps instead of just dropping them into the repo.

@crenshaw-dev
Copy link
Member

I'd be very surprised if the issue were the app-of-apps. I think our investigation showed that most of the attempts to hit the cache are from the ref: values source of each app. I suspect that we're not effectively de-duping those requests, maybe because we're not holding a lock on the repo at the target revision like (I believe) we do for non-ref sources.

@de-slalonde
Copy link
Author

de-slalonde commented Jul 27, 2023

@crenshaw-dev Once the repo has been pulled, and the cache has been updated, does Argo then have the information that wasn't available in the Bitbucket webhook payload to help it know what has actually changed so that unaffected applications can be just "nudged to the next revision" (or whatever that was you mentioned in the meeting).

@crenshaw-dev
Copy link
Member

Even though it probably has access to the necessary info, the repo-server doesn't do git diff analysis to determine whether it should rebuild manifests. If there's a new commit, it rebuilds. There is an open issue to change this.

In other words, we should expect plenty of helm template calls. But not a bunch of git fetch or git ls-remote calls. Since the repo info is the same for every app, a single ls-remote and fetch should be fine. But since we don't currently have a way to know whether a new helm template is necessary for each app, we must do one for every app.

@s7an-it
Copy link

s7an-it commented Jul 31, 2023

Are you using webhooks by the way and path based computation to decide if cache should be used or not as last time (2 months I tested in Bitbucket it wasn't working at all).
Outside of that I had a talk with BB team regarding IP request from Argo and I am on quite low number (I was investigating some rate limits and turned out to not be related), I wonder if that changed in past few weeks and I am in the app of app of app of app of app of app patterns with thounds of apps. I run still 2.6.9 but wll switch in to 2.8.1 probably. But this thread made me examine my data as I am very heavy user on Bitbucket, I will try to post some info from my end as well to see if it can somehow be evaded. I am curious as I saw the big BB PR revamp so it might be related.
On a side note, from my end with a lot of dramatic experience on BB x Argo I highly suggest to evade mono repo, my life expectancy went up since I moved to multi one in terms of overall performance.

@phanama
Copy link
Contributor

phanama commented Aug 25, 2023

@de-slalonde @ishitasequeira @crenshaw-dev

I recently encountered a similar issue, using a monorepo w/multi-source apps generated by AppSet. The repo-server logs showed mostly cache miss, something like:

level=info msg="manifest cache miss: &ApplicationSource{RepoURL:https://REPO_URL.git,Path:,TargetRevision:master,Helm:nil,Kustomize:nil,Directory:nil,Plugin:nil,Chart:,Ref:values,}/SHA1_VALUE"

Looking at the cache miss logs, most have Ref:values, making me suspect the helm values ref source as the cause. I somehow fixed the cache miss by adding specific directory to the ref source:

sources:
  - directory:
      exclude: * #necessary so that argocd doesn't consume and try to manage the argocd-able files in the path
    path: my-app/
    repoURL: https://REPO_URL
    targetRevision: HEAD
    ref: values
  - chart: CHART_NAME
    repoURL: CHART_URL
    targetRevision: v0.0.1
    helm:
      version: v3
      releaseName: RELEASE_NAME
      valueFiles:
      - $values/my-app/values.yaml

After doing this, our repo-server now shows 100% cache hit:

level=info msg="manifest cache hit: &ApplicationSource{RepoURL:https://REPO_URL.git,Path:my-app,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:&ApplicationSourceDirectory{Recurse:false,Jsonnet:ApplicationSourceJsonnet{ExtVars:[]JsonnetVar{},TLAs:[]JsonnetVar{},Libs:[],},Exclude:*,Include:,},Plugin:nil,Chart:,Ref:values,}/SHA1"`
...

Our repo-server then shows an almost-total decrease of the git fetch metrics; though the git ls-remote number stays the same.

image

@crenshaw-dev
Copy link
Member

I'm sitting at my desk chuckling at this workaround. That's an incredible find. Makes me think there must be something wrong with how we populate the cache when the source is a values-only source (i.e. doesn't contribute any manifests to the Application). Adding a no-manifests manifest source must force it to be cached properly.

@ryan-dyer-sp
Copy link

We also have encountered this issue and the workaround provided here worked for us.

@tribu
Copy link

tribu commented Sep 4, 2023

+1 to this issue. In my case we have a mono git repo and multi apps/charts with a dependency chart. The above work around didnot fix the issue for us. We are still facing ~12k git requests per minute. also in our case the requests were"Cache Hit" and still causing rate limit details are here

@phanama
Copy link
Contributor

phanama commented Sep 4, 2023

@tribu Are you still seeing both git ls-remote and git fetch requests or just git ls-remote requests?

@de-slalonde
Copy link
Author

In my case I was able to use the concepts from that workaround to cut down on my git fetch, but my git ls-remote are still through the roof.

@ryan-dyer-sp
Copy link

ryan-dyer-sp commented Sep 5, 2023

I feel there is something similar in regards to the behavior of ls-remote. I have effectively 5 mono repos. Looking at the ls-remote metrics for each repo we see (per hour)

repo1 - 75
repo2 - 480
repo3 - 600
repo4 - 5300
repo5 - 4500

all of these mono repos, except repo4 have the same number of applications defined to pull values files from these repos. repo4 has 4x the amount of apps defined.

in the case of repo1 the file is a singular static file across all apps. eg.

valueFiles:
- $values/file.yaml

In the case of repo2 and repo3, they use a variable based on the cluster name. eg.

valueFiles:
- $values/{{ .Values.clusterName}}.yaml

In the case of repo4, they use a variable like repo2/3, but on 4 of the clusters, there is an additional variable

valueFiles:
- $values/{{ .Values.clusterName}}.yaml
{{- if special cluster }}
- $values/{{ .Values.environment }}.yaml
- $values/file.yaml
{{- end }}

and finally in the case of repo5, each of the 4 apps (per cluster) references a different static file. This one I find the most interesting as I would simply expect this to produce 4x the volume of ls-remotes as repo1, not 60x. eg
app1

- $values/app1.yaml

app2

- $values/app2.yaml

The other thing that may be of interest is that on the apps for repo4/5 which have the huge volume of requests, we utilize ignoreDifferences, which we do not do on any other applications.

@andrleite
Copy link

@phanama
+1 We tried the suggested workaround by adding the path to the ref source. We got the same result. Reduce the git fetch API calls but ls-remote is still the same. Similar behavior we mentioned #12878

@laurentiusoica
Copy link

laurentiusoica commented Sep 29, 2023

Same here. Multi source apps with an OCI registry as a source for helm charts and a single Git repository as a source for configuration values. In my case Argo is short polling every 3 minutes; no webhooks configured. I can see excessive number of git requests not just on a new commit, but for every 3 minutes.
After applying the work around fetch does not grow anymore. but ls-remote still growing. Although there is a single git repository serving config files, the repo server runs several ls-remote for each new reconciliation cycle.

@LiamMccafferty-bud
Copy link

LiamMccafferty-bud commented Oct 18, 2023

Can confirm we are also experiencing the same behaviour deploying from a mono-repo hosting lots of values files separate from the git repo we deploy the helm chart from.

We added all the files we care about to a section like below and can confirm we are still getting dozens of cache misses a second.

argo version: 2.8.4+c279299

sources:
- repoUrl:  https://repo-which-contains-only-yaml-files.github.com
   targetRevison: main 
   directory:
      includes: "{file1.yaml, path-to/file2.yaml}"
      excludes: "*"
    ref: values
 - repoURL: https://repo-which-contains-a-helm-chart.github.com
    targetRevision: myversion
    path: path-to-chart
     helm:
      valueFiles: 
        - $values/file1.yaml 
        - $values/path-to/file2.yaml 
      parameters: 
        -  name: oneofmanyparams
            value: avalueofparam       

gcp-cherry-pick-bot bot pushed a commit that referenced this issue Nov 30, 2023
…unManifestGenAsync not using cache (Issue #14725) (#16410)

* fix(repo-server): excess git requests part 1, resolveReferencedSources and runManifestGenAsync

Signed-off-by: nromriell <[email protected]>

* fix: remove unnecessary settings instantiation

Signed-off-by: nromriell <[email protected]>

---------

Signed-off-by: nromriell <[email protected]>
crenshaw-dev pushed a commit that referenced this issue Nov 30, 2023
…unManifestGenAsync not using cache (Issue #14725) (#16410) (#16494)

* fix(repo-server): excess git requests part 1, resolveReferencedSources and runManifestGenAsync



* fix: remove unnecessary settings instantiation



---------

Signed-off-by: nromriell <[email protected]>
Co-authored-by: Nathan Romriell <[email protected]>
irinam0992 pushed a commit to irinam0992/argo-cd that referenced this issue Nov 30, 2023
…unManifestGenAsync not using cache (Issue argoproj#14725) (argoproj#16410)

* fix(repo-server): excess git requests part 1, resolveReferencedSources and runManifestGenAsync

Signed-off-by: nromriell <[email protected]>

* fix: remove unnecessary settings instantiation

Signed-off-by: nromriell <[email protected]>

---------

Signed-off-by: nromriell <[email protected]>
Signed-off-by: irinam0992 <[email protected]>
vladfr pushed a commit to vladfr/argo-cd that referenced this issue Dec 13, 2023
…unManifestGenAsync not using cache (Issue argoproj#14725) (argoproj#16410)

* fix(repo-server): excess git requests part 1, resolveReferencedSources and runManifestGenAsync

Signed-off-by: nromriell <[email protected]>

* fix: remove unnecessary settings instantiation

Signed-off-by: nromriell <[email protected]>

---------

Signed-off-by: nromriell <[email protected]>
tesla59 pushed a commit to tesla59/argo-cd that referenced this issue Dec 16, 2023
…unManifestGenAsync not using cache (Issue argoproj#14725) (argoproj#16410)

* fix(repo-server): excess git requests part 1, resolveReferencedSources and runManifestGenAsync

Signed-off-by: nromriell <[email protected]>

* fix: remove unnecessary settings instantiation

Signed-off-by: nromriell <[email protected]>

---------

Signed-off-by: nromriell <[email protected]>
JulienFuix pushed a commit to JulienFuix/argo-cd that referenced this issue Feb 6, 2024
…unManifestGenAsync not using cache (Issue argoproj#14725) (argoproj#16410)

* fix(repo-server): excess git requests part 1, resolveReferencedSources and runManifestGenAsync

Signed-off-by: nromriell <[email protected]>

* fix: remove unnecessary settings instantiation

Signed-off-by: nromriell <[email protected]>

---------

Signed-off-by: nromriell <[email protected]>
ishitasequeira added a commit that referenced this issue Mar 25, 2024
…ions (Issue #14725) (#17109)

* fix(repo-server): excess git requests, cache lock on revisions

Signed-off-by: nromriell <[email protected]>

* fix: pr feedback, simplify, add configurable variable

Signed-off-by: nromriell <[email protected]>

* fix: codegen, lint

Signed-off-by: nromriell <[email protected]>

* fix: test print, no opts set, var type nit

Signed-off-by: nromriell <[email protected]>

* chore: add additional logging for unexpected cache error

Signed-off-by: nromriell <[email protected]>

---------

Signed-off-by: nromriell <[email protected]>
Co-authored-by: Ishita Sequeira <[email protected]>
lyda pushed a commit to lyda/argo-cd that referenced this issue Mar 28, 2024
…unManifestGenAsync not using cache (Issue argoproj#14725) (argoproj#16410)

* fix(repo-server): excess git requests part 1, resolveReferencedSources and runManifestGenAsync

Signed-off-by: nromriell <[email protected]>

* fix: remove unnecessary settings instantiation

Signed-off-by: nromriell <[email protected]>

---------

Signed-off-by: nromriell <[email protected]>
Signed-off-by: Kevin Lyda <[email protected]>
lyda pushed a commit to lyda/argo-cd that referenced this issue Mar 28, 2024
…ions (Issue argoproj#14725) (argoproj#17109)

* fix(repo-server): excess git requests, cache lock on revisions

Signed-off-by: nromriell <[email protected]>

* fix: pr feedback, simplify, add configurable variable

Signed-off-by: nromriell <[email protected]>

* fix: codegen, lint

Signed-off-by: nromriell <[email protected]>

* fix: test print, no opts set, var type nit

Signed-off-by: nromriell <[email protected]>

* chore: add additional logging for unexpected cache error

Signed-off-by: nromriell <[email protected]>

---------

Signed-off-by: nromriell <[email protected]>
Co-authored-by: Ishita Sequeira <[email protected]>
Signed-off-by: Kevin Lyda <[email protected]>
mkieweg pushed a commit to mkieweg/argo-cd that referenced this issue Jun 11, 2024
…ions (Issue argoproj#14725) (argoproj#17109)

* fix(repo-server): excess git requests, cache lock on revisions

Signed-off-by: nromriell <[email protected]>

* fix: pr feedback, simplify, add configurable variable

Signed-off-by: nromriell <[email protected]>

* fix: codegen, lint

Signed-off-by: nromriell <[email protected]>

* fix: test print, no opts set, var type nit

Signed-off-by: nromriell <[email protected]>

* chore: add additional logging for unexpected cache error

Signed-off-by: nromriell <[email protected]>

---------

Signed-off-by: nromriell <[email protected]>
Co-authored-by: Ishita Sequeira <[email protected]>
Hariharasuthan99 pushed a commit to AmadeusITGroup/argo-cd that referenced this issue Jun 16, 2024
…unManifestGenAsync not using cache (Issue argoproj#14725) (argoproj#16410)

* fix(repo-server): excess git requests part 1, resolveReferencedSources and runManifestGenAsync

Signed-off-by: nromriell <[email protected]>

* fix: remove unnecessary settings instantiation

Signed-off-by: nromriell <[email protected]>

---------

Signed-off-by: nromriell <[email protected]>
Hariharasuthan99 pushed a commit to AmadeusITGroup/argo-cd that referenced this issue Jun 16, 2024
…ions (Issue argoproj#14725) (argoproj#17109)

* fix(repo-server): excess git requests, cache lock on revisions

Signed-off-by: nromriell <[email protected]>

* fix: pr feedback, simplify, add configurable variable

Signed-off-by: nromriell <[email protected]>

* fix: codegen, lint

Signed-off-by: nromriell <[email protected]>

* fix: test print, no opts set, var type nit

Signed-off-by: nromriell <[email protected]>

* chore: add additional logging for unexpected cache error

Signed-off-by: nromriell <[email protected]>

---------

Signed-off-by: nromriell <[email protected]>
Co-authored-by: Ishita Sequeira <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment