-
Notifications
You must be signed in to change notification settings - Fork 741
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IPAMD optimizations and makefile changes #1975
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Thanks Senthil. I am running few UTs, will update it and the merge. |
This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days |
/not stale. Review pending |
Ran a simple pod churn test - 1 node, replica 1 to 30 to 0 and repeated 10 runs. WARM IP = 4 and MIN IP = 10. Describe interface calls reduced quite significantly - With 1.11.2 released image -> With the fix -> API Calls covering both the above timelines -> I will be running few more tests to slow down the pod churn and measure the performance. Also will be checking the pod launch time since now we would rely on IMDS sync on increasing IP pool. |
This will introduce a bit of delay in the pod launch time during churn scenarios since we rely on IPs to be available in IMDS. So I will be splitting the PR, one with make file changes and reducing the describe call and another with just the dependency on IMDS for #1967 which might need more analysis. |
2d2ed8b
to
e67c394
Compare
a476fcf
to
82a9803
Compare
/lgtm |
* 1.10.3 release artifacts (#1962) * Stale PR and issue cleanup wrkflow (#1964) * fix image name during build (#1968) * add event recorder utils to raise aws-node pod events (#1536) * refactor uploader scripts (#1972) * Fix cni panic due to pod.Annotations is a nil map (#1974) Co-authored-by: Relk Li <[email protected]> * chart: Add extraVolumes and extraVolumeMounts (#1949) Co-authored-by: Jayanth Varavani <[email protected]> * Add the new command in the section of CNI Plugin Sequence (#1813) Co-authored-by: Jayanth Varavani <[email protected]> * Bump github.com/containernetworking/cni from 0.8.0 to 0.8.1 (#1966) Bumps [github.com/containernetworking/cni](https://github.com/containernetworking/cni) from 0.8.0 to 0.8.1. - [Release notes](https://github.com/containernetworking/cni/releases) - [Commits](containernetworking/cni@v0.8.0...v0.8.1) --- updated-dependencies: - dependency-name: github.com/containernetworking/cni dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jayanth Varavani <[email protected]> * Update README to highlight containerd.sock edge case with EKS AMI. (#1884) * Update README to highlight containerd.sock edge case with EKS AMI. * Updated Instructions as per review. * add cni release test script (#1971) * Multus release manifest (#1984) * release manifest for Multus v3.8.0-eksbuild.1 * minor change to Readme * Added Tests for validating Multus Installation (#1811) * Added Tests for validating Multus Installation Added missing files Refactored code Tried to make it modular and extensible. * Deleted redundant file * Fixed compilation issues * fixed minor error * Added script to trigger Multus tests (will be used by prow job) * remove multus installation logic from ginkgo * remove redundant changes * Cleaned up run-multus-tests helper script * Updated Readme for running multus tests Added few checks in canary helper script * revert changes to canary.sh * Pass tag as an argument * Updated Readme * Updated tag for multus tests to use latest image * Port new integration tests (#1928) * Minor changes to run-integration-tests Added integration-new framework tests * Modified run-integration-tests to use new integration tests * reverted redundant changes * Merge integration with integration-new * increase timeout (#1985) fix syntax for ginkgo-v2 * Added configurable flag to create test nodes with arm64 and containerd runtime (#1977) * Cleanup binary file (#1987) * log error in ipamd on api server timeout (#1988) * Refactored code and Added cni addon upgrade/downgrade regression test (#1861) * Refactored code Addon upgrade/downgrade test similar to #1795 Added tests for addon upgrade/downgrade Changed DEFAULT version Added addon status checks Fetch latest addon version for given K8s Cluster Update kops cluster config used in weekly tests (#1862) * Change to kops cluster creation scripts * Add logging for retry attempt * Switch kops cluster to use docker container runtime Co-authored-by: Jayanth Varavani <[email protected]> Renamed package name for adddon tests removed unnecessary changes Fixed replica count for MTU and Veth test in host networking Updated ENI/IP limits file for newly added instances (#1864) * Added new instances * Updated test readme * needed rebase * formatting * remove all references to integration-new migrate to ginkgo v2 in addon test files * fix maxIPPerInterface count on pod_networking_suite * Increase default deployment ready timeout Co-authored-by: Vikas Basavaraj <[email protected]> * Remove generation of calico manifests (#1905) * cni manifest upgrade downgrade test (#1863) * Added upgrade/downgrade script template Refactored code Addon upgrade/downgrade test similar to #1795 Added tests for addon upgrade/downgrade Changed DEFAULT version Added addon status checks Fetch latest addon version for given K8s Cluster Update kops cluster config used in weekly tests (#1862) * Change to kops cluster creation scripts * Add logging for retry attempt * Switch kops cluster to use docker container runtime Co-authored-by: Jayanth Varavani <[email protected]> Added upgrade/downgrade test for custom cni-manifest-file Added missing files remove upgrade-downgrade.sh * Add eks.go file , deleted by mistake * Extract apply manifest logic in common Remove redundant code * Add PD traffic test for cni upgrade downgrade test * Update golang to Go 1.18 (#1991) * Update CNI Plugins to v1.1.1 (#1997) * Update release manifests for VPC CNI v1.11.2 (#2001) (#2002) * Enable Calico on ARM64 and add configureable flags for Calico installation (#2004) * Enable Calico on ARM64 and add configureable flags for Calico installation * Add v to Calico version in release test script * fix integration test script (#1998) * Updated dependencies (#2012) * Fix readme (#2013) * Added upgrade/downgrade script template Refactored code Addon upgrade/downgrade test similar to #1795 Added tests for addon upgrade/downgrade Changed DEFAULT version Added addon status checks Fetch latest addon version for given K8s Cluster Update kops cluster config used in weekly tests (#1862) * Change to kops cluster creation scripts * Add logging for retry attempt * Switch kops cluster to use docker container runtime Co-authored-by: Jayanth Varavani <[email protected]> Added upgrade/downgrade test for custom cni-manifest-file Added missing files remove upgrade-downgrade.sh * Add eks.go file , deleted by mistake * Extract apply manifest logic in common Remove redundant code * Add PD traffic test for cni upgrade downgrade test * Updated Readme * Merge fix-ginkgo to master (#2014) * fix path failure * seperate makefile for test Co-authored-by: abhipth <[email protected]> * Multus manifest for release v3.9.0-eksbuild.1 (#2016) * Updating new instances - p4de (#2018) * Updating new instances * fix formatting * Fix go build failure with v6 networking suite. (#2020) * Update README.md (#2021) * Fix Go build for ipamd test package. (#2023) * Fix Go build for ipamd test package. * Fix format with make format * Fix go build for cni test package. (#2024) * Prevent allocate/free ENIs when node is marked noSchedule (#1927) * Prevent allocate/free ENIs when node is marked noSchedule * Update UTs * Re-use logger instance (#2029) * Re-use logger instance - Existing logger initialization constructed different logger instances upon call to Get() method. - Fixed the initailiation logic to re-use the logger instance. * Added unit tests for logger initialization fix * fix addOn version api for beta (#2034) * Update yaml.v3 package dependency (#2036) * Update yaml.v3 package dependency * Increase cpu requests limit (#2038) - Porting changes from release-1.10 branch made in PR #1749 * fix ipamd integration failures and cleanup (#2039) * fix integration test failures and cleanup * README update * cleanup info logs in event recorder and test script (#2043) * add nodeSelector to cni-metrics-helper test deployment and update image tag (#2047) * fix makefile path in canary test script (#2051) * disable arm build (#2052) * Updated changelog for 1.11.3 release (#2053) Co-authored-by: Vikas Basavaraj Mallapura <“[email protected]”> * Updating master branch config files to release 1.11.3 (#2055) * Updated changelog for 1.11.3 release * Image tag and chart version update for 1.11.3 release (#2050) Co-authored-by: Vikas Basavaraj Mallapura <“[email protected]”> (cherry picked from commit ff42a83) Co-authored-by: Vikas Basavaraj Mallapura <“[email protected]”> * update aws-node clusterrole permissions (#2058) * Fix minor typo on documentation (#2059) s/varibales/variables/ * multus manifest for release v3.9.0-eksbuild.2 (#2057) * Setting AWS_VPC_K8S_CNI_RANDOMIZESNAT to the default value (#2028) Co-authored-by: Jayanth Varavani <[email protected]> * Fixing prefixes per ENI value in example (#2060) Prefixes per ENI in row 6 should be 1 not 3. Co-authored-by: Jayanth Varavani <[email protected]> * IPAMD optimizations and makefile changes (#1975) * IPAMD optimizations and makefile changes * Minor comments * Removed IMDS dependency * fix test * fix test * fix test-format * Updated new instances (#2062) * Updated new instances * fix format Co-authored-by: M00nF1sh <[email protected]> Co-authored-by: Sushmitha Ravikumar <[email protected]> Co-authored-by: Relk Li <[email protected]> Co-authored-by: Relk Li <[email protected]> Co-authored-by: Jan-Otto Kröpke <[email protected]> Co-authored-by: Shuntaro Azuma <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Senthil Kumaran <[email protected]> Co-authored-by: cgchinmay <[email protected]> Co-authored-by: Vikas Basavaraj <[email protected]> Co-authored-by: Hao Zhou <[email protected]> Co-authored-by: abhipth <[email protected]> Co-authored-by: Prasad Jivane <[email protected]> Co-authored-by: Vikas Basavaraj Mallapura <“[email protected]”> Co-authored-by: Guillaume Delacour <[email protected]> Co-authored-by: Venkata Gunapati <[email protected]> Co-authored-by: Muhammed Karakas <[email protected]>
What type of PR is this?
bug
Which issue does this PR fix:
Fixes #1834
Fixes #1872
What does this PR do / Why do we need it:
During reconciling or even startup, IPAMD calls tryAssignCidr and adds the IPs/Prefixes to datastore once EC2 allocates the IPs and doesn't wait for IMDS to sync. In scenarios where IMDS is out of sync we will end up tagging those pods as stale during restart. Hence similar to Alloc path, we need to wait until IMDS sync is done.
While we wait for the IPs to be present in IMDS, we check if the requested CIDRs are available but this can be more than what the ENI supports and we end up waiting in the retry loop. Instead we need to request for min of {IPs/Prefixes needed, max supported on ENI}
Make file grouping.
If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:
#1834 #1872
Testing done on this change:
Automation added to e2e:
Yes
Will this PR introduce any new dependencies?:
No, EC2 calls will reduce since reconciler on adding IPs would make an ec2 call now instead it will wait for IMDS sync.
Will this break upgrades or downgrades. Has updating a running cluster been tested?:
No
Does this change require updates to the CNI daemonset config files to work?:
No
Does this PR introduce any user-facing change?:
No
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.