Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

record pod metadata and allocationTime in IP allocation state file #1958

Merged
merged 2 commits into from
Apr 11, 2022

Conversation

M00nF1sh
Copy link
Contributor

@M00nF1sh M00nF1sh commented Apr 8, 2022

What type of PR is this?
/kind feature

Which issue does this PR fix:
N/A

What does this PR do / Why do we need it:
This PR records additional metadata(pod's namespace & name) and allocationTimestamp about IP allocations into the state file.
Which improves the debug experience, and allow CNI to know Pod's info after CRI socket is removed(which can be leveraged for things like state file drift detection)

{
            "networkName": "_migrated-from-cri",
            "containerID": "69f891af2059cccc41678bae50d35e45cd31ef59dd59547d57d4754ff1a4511a",
            "ifName": "unknown",
            "ipv4": "192.168.5.118",
            "allocationTimestamp": 1647904840686935398,
            "metadata": {
                "k8sPodNamespace": "kube-system",
                "k8sPodName": "coredns-85d5b4454c-ml29q"
            }
},

If an issue # is not available please add repro steps and logs from IPAMD/CNI showing the issue:

Testing done on this change:

  • manually tested against dockershim & containerd:
    • ipam.json after recovery:
    {"version":"vpc-cni-ipam/1","allocations":[{"networkName":"_migrated-from-cri","containerID":"75ebe0da8b3f1e5ee3faab89e538fb32af7c038fd0cd0ebf39dee8e459d1788e","ifName":"unknown","ipv4":"192.168.12.15","allocationTimestamp":1647904840550836685,"metadata":{"k8sPodNamespace":"kube-system","k8sPodName":"coredns-85d5b4454c-l9pwn"}},{"networkName":"_migrated-from-cri","containerID":"b26913f3b6009fc8e554ae62f6c43e27c8414af04cc6b708c1d8f416386f6bcc","ifName":"unknown","ipv4":"192.168.16.112","allocationTimestamp":1647904840496172364,"metadata":{"k8sPodNamespace":"kube-system","k8sPodName":"aws-load-balancer-controller-588487d8fc-fcn4m"}},{"networkName":"_migrated-from-cri","containerID":"bb54610652bf15bf8fe02f0411fce05e3c132419990d7e87a277dbca2b3058a0","ifName":"unknown","ipv4":"192.168.8.32","allocationTimestamp":1647904840619460226,"metadata":{"k8sPodNamespace":"kube-system","k8sPodName":"aws-load-balancer-controller-588487d8fc-tptns"}},{"networkName":"_migrated-from-cri","containerID":"69f891af2059cccc41678bae50d35e45cd31ef59dd59547d57d4754ff1a4511a","ifName":"unknown","ipv4":"192.168.5.118","allocationTimestamp":1647904840686935398,"metadata":{"k8sPodNamespace":"kube-system","k8sPodName":"coredns-85d5b4454c-ml29q"}}]}
    
    • ipam.json after created 6 new pods:
    {"version":"vpc-cni-ipam/1","allocations":[{"networkName":"aws-cni","containerID":"86393b9f1df0e7685b5910f965e53189ca498677e70e25b0a9dcd2a2d4586d19","ifName":"eth0","ipv4":"192.168.23.207","allocationTimestamp":1649464716972801842,"metadata":{"k8sPodNamespace":"game-2048","k8sPodName":"deployment-2048-9796d449-z9px5"}},{"networkName":"_migrated-from-cri","containerID":"75ebe0da8b3f1e5ee3faab89e538fb32af7c038fd0cd0ebf39dee8e459d1788e","ifName":"unknown","ipv4":"192.168.12.15","allocationTimestamp":1647904840550836685,"metadata":{"k8sPodNamespace":"kube-system","k8sPodName":"coredns-85d5b4454c-l9pwn"}},{"networkName":"_migrated-from-cri","containerID":"b26913f3b6009fc8e554ae62f6c43e27c8414af04cc6b708c1d8f416386f6bcc","ifName":"unknown","ipv4":"192.168.16.112","allocationTimestamp":1647904840496172364,"metadata":{"k8sPodNamespace":"kube-system","k8sPodName":"aws-load-balancer-controller-588487d8fc-fcn4m"}},{"networkName":"aws-cni","containerID":"f01429a0aa8890cb5f3a23721367c02f0c71b5b8c5595df650f46f81a585faef","ifName":"eth0","ipv4":"192.168.6.38","allocationTimestamp":1649464717694547918,"metadata":{"k8sPodNamespace":"game-2048","k8sPodName":"deployment-2048-9796d449-n4tx2"}},{"networkName":"aws-cni","containerID":"603f5869a39c0d79b09743e010c8d88efcbfa91ed65dfbfbf6b9c4b4295dddef","ifName":"eth0","ipv4":"192.168.23.87","allocationTimestamp":1649464716882459247,"metadata":{"k8sPodNamespace":"game-2048","k8sPodName":"deployment-2048-9796d449-g4vcw"}},{"networkName":"_migrated-from-cri","containerID":"bb54610652bf15bf8fe02f0411fce05e3c132419990d7e87a277dbca2b3058a0","ifName":"unknown","ipv4":"192.168.8.32","allocationTimestamp":1647904840619460226,"metadata":{"k8sPodNamespace":"kube-system","k8sPodName":"aws-load-balancer-controller-588487d8fc-tptns"}},{"networkName":"aws-cni","containerID":"3d585e34ead299c28a1cedb6bb5c69db8795a815d56bcd91fbb30e17e41c6684","ifName":"eth0","ipv4":"192.168.15.30","allocationTimestamp":1649464717170318997,"metadata":{"k8sPodNamespace":"game-2048","k8sPodName":"deployment-2048-9796d449-2z8wl"}},{"networkName":"aws-cni","containerID":"558dd4ec5db25456514f2488d9b8b644e15d2b2f8e8593f3bc395c03befa8945","ifName":"eth0","ipv4":"192.168.6.115","allocationTimestamp":1649464717347900310,"metadata":{"k8sPodNamespace":"game-2048","k8sPodName":"deployment-2048-9796d449-gzg5b"}},{"networkName":"_migrated-from-cri","containerID":"69f891af2059cccc41678bae50d35e45cd31ef59dd59547d57d4754ff1a4511a","ifName":"unknown","ipv4":"192.168.5.118","allocationTimestamp":1647904840686935398,"metadata":{"k8sPodNamespace":"kube-system","k8sPodName":"coredns-85d5b4454c-ml29q"}},{"networkName":"aws-cni","containerID":"f0b1fd965ae54742b3ee5c5f76441a262a4d286e8d6a4ebd82d5b3111f1bd3d1","ifName":"eth0","ipv4":"192.168.14.165","allocationTimestamp":1649464716814346224,"metadata":{"k8sPodNamespace":"game-2048","k8sPodName":"deployment-2048-9796d449-drgxw"}}]}
    
    • ipam.json after deleted 6 pods:
    {"version":"vpc-cni-ipam/1","allocations":[{"networkName":"_migrated-from-cri","containerID":"75ebe0da8b3f1e5ee3faab89e538fb32af7c038fd0cd0ebf39dee8e459d1788e","ifName":"unknown","ipv4":"192.168.12.15","allocationTimestamp":1647904840550836685,"metadata":{"k8sPodNamespace":"kube-system","k8sPodName":"coredns-85d5b4454c-l9pwn"}},{"networkName":"_migrated-from-cri","containerID":"b26913f3b6009fc8e554ae62f6c43e27c8414af04cc6b708c1d8f416386f6bcc","ifName":"unknown","ipv4":"192.168.16.112","allocationTimestamp":1647904840496172364,"metadata":{"k8sPodNamespace":"kube-system","k8sPodName":"aws-load-balancer-controller-588487d8fc-fcn4m"}},{"networkName":"_migrated-from-cri","containerID":"bb54610652bf15bf8fe02f0411fce05e3c132419990d7e87a277dbca2b3058a0","ifName":"unknown","ipv4":"192.168.8.32","allocationTimestamp":1647904840619460226,"metadata":{"k8sPodNamespace":"kube-system","k8sPodName":"aws-load-balancer-controller-588487d8fc-tptns"}},{"networkName":"_migrated-from-cri","containerID":"69f891af2059cccc41678bae50d35e45cd31ef59dd59547d57d4754ff1a4511a","ifName":"unknown","ipv4":"192.168.5.118","allocationTimestamp":1647904840686935398,"metadata":{"k8sPodNamespace":"kube-system","k8sPodName":"coredns-85d5b4454c-ml29q"}}]}
    
    • The /v1/enis API is also updated to contain assignedTime in str time format due to this change
    "IPAddresses": {
        "192.168.12.15": {
            "Address": "192.168.12.15",
            "IPAMKey": {
                "networkName": "_migrated-from-cri",
                "containerID": "75ebe0da8b3f1e5ee3faab89e538fb32af7c038fd0cd0ebf39dee8e459d1788e",
                "ifName": "unknown"
            },
            "IPAMMetadata": {
                "k8sPodNamespace": "kube-system",
                "k8sPodName": "coredns-85d5b4454c-l9pwn"
            },
            "AssignedTime": "2022-03-21T23:20:40.550836685Z",
            "UnassignedTime": "0001-01-01T00:00:00Z"
        }
    },
    
  • Benchmarked with 500 pod entries on m5.xlarge(Kubernetes only support at most 150 pods per node). Since even with 500 pods per nodes, it took only 0.35ms(vs 0.21ms before) to checkpoint with metadata and timestamp. The performance impact is unnoticeable in foreseeable future.
    without_meta_time                                     215092 ns/op
    with_meta_only                                           312026 ns/op
    with_meta_time(str time format)               762831 ns/op
    with_meta_timestamp(int time format)    353913 ns/op
    

Automation added to e2e:

Test covered by unit test

Will this PR introduce any new dependencies?:

No

Will this break upgrades or downgrades. Has updating a running cluster been tested?:
This don't break upgrades/downgrades.
Tested with a running cluster.

Does this change require updates to the CNI daemonset config files to work?:

No.

Does this PR introduce any user-facing change?:

Yes, the ipam.json file will now contain additional metadata(pod's namespace & name) and allocationTimestamp about IP allocations

records additional metadata(pod's namespace & name) and allocationTime about IP allocations into IP pool state file.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@M00nF1sh M00nF1sh requested a review from a team as a code owner April 8, 2022 21:29
@M00nF1sh M00nF1sh force-pushed the enhance_state branch 2 times, most recently from 8666617 to b3a704e Compare April 8, 2022 21:40
Copy link
Contributor

@jayanthvn jayanthvn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@M00nF1sh M00nF1sh merged commit f6db4c1 into aws:master Apr 11, 2022
@@ -455,7 +480,7 @@ func (ds *DataStore) ReadBackingStore(isv6Enabled bool) error {
}
addr := &AddressInfo{Address: ipAddr.String()}
cidr.IPAddresses[ipAddr.String()] = addr
ds.assignPodIPAddressUnsafe(allocation.IPAMKey, eni, addr)
ds.assignPodIPAddressUnsafe(addr, allocation.IPAMKey, allocation.Metadata, time.Unix(0, allocation.AllocationTimestamp))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not convert everything to UTC instead? (i.e.,) time.Unix(0, allocation.AllocationTimestamp).UTC(). I'm not sure if it defaults to UTC. Did we check that? Otherwise might be confusing to have it converted to different time zones.

sushrk pushed a commit to sushrk/amazon-vpc-cni-k8s that referenced this pull request Jun 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants