source-controller OOM events #303
Comments
Changing the source-controller deployment resources stanza as follows:
addresses the issue |
I had the same issue, but this time increasing the memory limits to 2Gi did mitigate the issue |
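For readers who want to try the 2Gi limit mentioned above, here is a minimal sketch of a kustomize patch. It assumes a standard `flux bootstrap` layout in which `flux-system/kustomization.yaml` lists `gotk-components.yaml` and `gotk-sync.yaml`, and that the source-controller Deployment already defines a memory limit (the issue description mentions a 1GB limit). The file layout is an assumption, not something stated in this thread, and this is not necessarily the stanza the first commenter used.

```yaml
# flux-system/kustomization.yaml (assumed bootstrap layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - target:
      kind: Deployment
      name: source-controller
    patch: |
      # Raise the memory limit of the first (and only) container to 2Gi,
      # the value reported in the comment above.
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: 2Gi
```

The right value depends on the workload (index size, number of HelmRepository objects), so 2Gi is a starting point rather than a recommendation.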
I am seeing OOMs with |
Same here on flux2 version |
This issue seems to be linked to: Generally speaking it is strange that a service which just downloads some files from other repos consumes so much memory. |
I was able to trigger this issue by putting |
As with any workload on Kubernetes, the right resource limit configuration depends heavily on what you are making the source-controller do (and you may thus have to increase it).
Helm-related operations, for example, are resource intensive because at present we haven't found the right optimization path for working with repository index files without loading them into memory in full (due to certain constraints around the unmarshalling of YAML). Combined with the popularity of some solutions like Artifactory, which likes to stuff as much as possible into a single index (in some cases resulting in a file of >100MB), and the fact that the reconciliation of resources is isolated, resource usage exceeding the defaults can be expected.
Another task that can be resource intensive is the packaging of a Helm chart from a Git source, because Helm first loads all the chart data into an object in memory (including all files, and the files of the dependencies) before writing it to disk.
For a fun experiment: check the current resources your CI worker nodes have (or ask around), or monitor the resource usage of various
The controller does much more than just downloading files, and I think you are oversimplifying or underestimating the inner workings of the controller, and ignoring the fact that it has several features that perform composition tasks, etc. In addition, to ensure proper isolation of e.g. credentials, most Git things are done in memory as well.
Your Helm index is likely simply too big, or your resource limit settings are too low; see the explanation above. Lastly, we are continuously looking into ways to reduce the footprint of our controllers, and I can already tell you some paths have been identified (and are actively being worked on) to help reduce it. Do, however, always keep in mind that while the YAML creates simple-looking and composable abstractions, there will always be processes behind it that actually execute the task, and that the hardware of your local development machine often outperforms most containers. |
No, it appears
|
That is expected, as
|
Yes, sure, but it synchronized that change from the repository into the HelmRepository resource and then OOMed the source-controller trying to read the HelmRepository. I backed out the change in Git but then had to manually edit the HelmRepository object, since the source-controller was hung. I'm not saying it should support days, just that this is a footgun. If it's not supported, I would have expected the HelmRepository to fail validation on the sync |
@kav can you please move this into a separate issue? I did a small test yesterday evening and was indeed able to apply a resource with an invalid |
updated source-controller deployment according to this issue: fluxcd/source-controller#303
Having the same issue with OOMKilled. With the information from #192, I pinned it down to the large Bitnami Helm repository, whose index file alone is 13.4M |
For large Helm repository index files, you can enable caching to reduce the memory footprint of source-controller, docs here: https://fluxcd.io/docs/cheatsheets/bootstrap/#enable-helm-repositories-caching |
Thanks for the documentation link @stefanprodan. That was helpful. Removing the Bitnami Helm repositories in redundant namespaces brought the memory footprint down to 190M, yet it is still peaking every 10 minutes (the Helm repo update interval). I will check on enabling Helm caching. Thanks again, much appreciated. |
Needed to update 0.28 -> 0.30 so the helm-cache-arguments were available.
Thanks again. |
Yeap that's consistent with what I'm seeing on my test clusters, using source-controller cache brought the memory from 2GB down to 200MB. |
Enabling Helm caching doc is now here: https://fluxcd.io/flux/installation/configuration/vertical-scaling/#enable-helm-repositories-caching |
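As a sketch of what the linked caching documentation describes, the patch below appends the Helm cache flags to the source-controller container arguments. It assumes the same `flux bootstrap` kustomization layout as any other controller patch; the three values are illustrative and should be tuned to the number and size of your Helm repositories. As noted in the comments above, the flags require a source-controller version that ships them (one commenter had to update from 0.28 to 0.30).

```yaml
# flux-system/kustomization.yaml (assumed bootstrap layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - target:
      kind: Deployment
      name: source-controller
    patch: |
      # Maximum number of Helm repository indexes kept in the cache
      # (illustrative value).
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --helm-cache-max-size=10
      # How long a cached index stays valid (illustrative value).
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --helm-cache-ttl=60m
      # How often expired entries are purged (illustrative value).
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --helm-cache-purge-interval=5m
```

With caching enabled, an index shared by several objects does not have to be loaded again on every reconciliation, which is consistent with the drop from 2GB to 200MB reported above.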
Describe the bug
When registering FluxCD to a repository in GitLab Enterprise, I am seeing OOM activity on the source-controller pod. Removing the 1GB memory limit fixes the issue.
To Reproduce
Register fluxcd on a repo with some level of complexity, I believe.
Expected behavior
The source-controller pod should not be killed and restarted repeatedly.
Additional context
Below please provide the output of the following commands: