Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

readiness reflector #748

Merged
merged 6 commits into from
May 16, 2019
Merged

Conversation

freehan
Copy link
Contributor

@freehan freehan commented May 3, 2019

  • initial commit of readiness reflector
    • the readiness reflector will check if the pods has NEG readiness gate.
    • if so, then will poll NEG health status and patch the pod NEG readiness condition when the endpoint became ready.
  • Refactoring:
    • modify common NEG types to accomendate readiness gate
    • move NegSyncerKey to pkg/neg/types
  • Adapt NEG controller for readiness reflector:
    • include non-terminating pods to add in NEG
      • This will make sure if a pod is not ready due to the NEG readiness gate will be added into NEG
    • adapt NEG controller and Syncer Manager to take readiness gate into account.
    • adapt transaction NEG syncer to feedback into readiness reflector

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 3, 2019
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels May 3, 2019
for _, hs := range r.Healths {
if hs.BackendService != nil {
// This assumes the ingress backend service uses the NEG naming scheme
if strings.Contains(hs.BackendService.BackendService, key.Name) {
Copy link
Contributor Author

@freehan freehan May 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently I am only checking the health status from the backend service managed by Ingress.

Other option is to loop thru all of them. If one is Healthy then ready. Or I can say all of them have to be Healthy. But this makes event reporting more complex. It has to report this endpoint in NEG is not ready in a BS.

@freehan freehan force-pushed the readiness-reflector4 branch 5 times, most recently from f78b057 to 2227c9a Compare May 4, 2019 00:07
negtypes "k8s.io/ingress-gce/pkg/neg/types"
)

// Reflector defines
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finish this comment


// Reflector defines
type Reflector interface {
// Run starts up the readiness reflector
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run starts the reflector.

Closing stopCh will signal the reflector to stop running.

type Reflector interface {
// Run starts up the readiness reflector
Run(stopCh <-chan struct{})
// SyncPod registers the pod with reflector
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this take an obj interface{}, also document what type obj is supposed to be if you are pushing the type checking into the routine.
also, syncpod does not return an error, so what happens if an object of the invalid type is passed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I use obj inteface{} mostly is because the informer can throw that into the interface directly.

I can also pass *v1.Pod, instead

// SyncPod registers the pod with reflector
SyncPod(obj interface{})
// CommitPods signals the reflector that pods has been added to one NEG
CommitPods(syncerKey negtypes.NegSyncerKey, negName string, zone string, endpointMap map[negtypes.NetworkEndpoint]types.NamespacedName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document the parameters.

negName is the name of the network endpoint group in the zone (e.g. xxx)
..

endpointMap seems like it should be typedef'd

type endpointMap map[...]...

CommitPods(syncerKey negtypes.NegSyncerKey, negName string, zone string, endpointMap map[negtypes.NetworkEndpoint]types.NamespacedName)
}

// NegLookup defines an interface for looking up pod membership
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NegGetter?

return map[negtypes.NetworkEndpoint]types.NamespacedName{}
}

// filterEndpoint will filter out the endpoints that does not need health polling from the input endpoint map
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should call this removeIrrelevantEndpoints, as filter usually means no side effects f(set) => newSet

for endpoint, namespacedName := range endpointMap {
pod, exists, err := getPodFromStore(podLister, namespacedName.Namespace, namespacedName.Name)
if err != nil {
klog.Warningf("Failed to retrieve pod %q from store: %v", namespacedName.String(), err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a warning or something that will happen from time to time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only happen if the cache returns error. I do not think this is a regular error.

// If pod has neg readiness gate and its condition is False, then return true.
func needToProcess(pod *v1.Pod) bool {
negConditionReady, readinessGateExists := evalNegReadinessGate(pod)
if !readinessGateExists {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return readinessGateExists && negConditionReady

push logging up one layer (don't log in functional predicate if possible, easy way to get tons of log spam)

@@ -286,31 +298,43 @@ func (s *transactionSyncer) commitTransaction(err error, networkEndpointMap map[

for networkEndpoint := range networkEndpointMap {
entry, ok := s.transactions.Get(networkEndpoint)
// clear transaction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: review this closer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can go over this with you.

@@ -265,3 +277,31 @@ func makeEndpointBatch(endpoints negtypes.NetworkEndpointSet) (map[negtypes.Netw
}
return endpointBatch, nil
}

func keyFunc(namespace, name string) string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicated code

@freehan freehan force-pushed the readiness-reflector4 branch 3 times, most recently from d9b6c92 to d677bba Compare May 14, 2019 18:31
@freehan freehan mentioned this pull request May 14, 2019
@freehan
Copy link
Contributor Author

freehan commented May 16, 2019

I have adjusted the commits. PTAL

@bowei
Copy link
Member

bowei commented May 16, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 16, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bowei, freehan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 249eae9 into kubernetes:master May 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants