Fix defaulting of legacy ClusterNetwork fields #16897

danwinship · 2017-10-16T22:29:15Z

3.6 nodes can't currently start up against a 3.7 master (eg, during upgrade) because the old ClusterNetwork fields aren't set. (Some absent-minded reviewer had even noticed that this was broken and then not fixed yet (#14558 (comment)) and then approved the PR anyway.)

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1502866

cc @eparis

eparis · 2017-10-17T03:04:43Z

/retest
/approve
once jake takes a look I think we want this on the stage branch too. Which requires a manual merge.

JacobTanenbaum

LGTM

knobunc

/lgtm

openshift-merge-robot · 2017-10-17T14:23:53Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, eparis, knobunc

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

~~pkg/network/OWNERS~~ [danwinship,eparis,knobunc]

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

openshift-merge-robot · 2017-10-17T15:50:40Z

/test all [submit-queue is verifying that this PR is safe to merge]

openshift-ci-robot · 2017-10-17T17:06:29Z

@danwinship: The following test failed, say /retest to rerun them all:

Test name	Commit	Details	Rerun command
ci/openshift-jenkins/extended_conformance_gce	`07049d4`	link	`/test extended_conformance_gce`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

eparis · 2017-10-17T17:49:16Z

merged by hand because we are literally breaking cluster upgrades.

…tage Automatic merge from submit-queue. Fix defaulting of legacy ClusterNetwork fields Backport of #16897

pravisankar · 2017-10-17T19:13:08Z

pkg/network/apis/network/validation/validation.go

+				allErrs = append(allErrs, field.Invalid(field.NewPath("hostsubnetlength"), clusterNet.HostSubnetLength, "hostsubnetlength must be identical to clusterNetworks[0].hostSubnetLength"))
+			}
+		} else if clusterNet.Network != "" || clusterNet.HostSubnetLength != 0 {
+			if clusterNet.Network != clusterNet.ClusterNetworks[0].CIDR || clusterNet.HostSubnetLength != clusterNet.ClusterNetworks[0].HostSubnetLength {


Do we need to enforce this condition for non default cluster network? May be we shouldn't care.

What would it mean to have Network set but not equal to ClusterNetworks[0]?

(I had suggested a long time back in the original PR that we should keep using Network/HostSubnetLength for the primary network, and only use ClusterNetworks for additional networks. But it's too late to change that now.)

Okay, OpenShift sdn only cares about default cluster network but based on what we described in swagger doc for Network/ClusterNetworks, now it seems reasonable to expect network to match clusterNetworks[0] for other use cases as part of api validation.

pravisankar · 2017-10-17T19:15:07Z

pkg/network/master/master.go

 	}
 	master.networkInfo, err = common.ParseNetworkInfo(clusterNetworkEntries, networkConfig.ServiceNetworkCIDR)
 	if err != nil {
 		return err
 	}
+	if len(clusterNetworkEntries) == 0 {


This condition could be moved before common.ParseNetworkInfo()

pravisankar · 2017-10-17T19:18:32Z

pkg/network/master/master.go

 	}
 	master.networkInfo, err = common.ParseNetworkInfo(clusterNetworkEntries, networkConfig.ServiceNetworkCIDR)
 	if err != nil {
 		return err
 	}
+	if len(clusterNetworkEntries) == 0 {
+		panic("No ClusterNetworks set in networkConfig; should have been defaulted in if not configured")


Do we want to panic here? returning an error will propagate to the SDN RunController() and will fail as expected.

It can only happen if someone breaks the config-reading code. It's basically a "can't happen", I just put the check+panic in to make it slightly clearer what was going than if we just let them get an out-of-bounds exception when we dereference clusterNetworkEntries[0] below

I wasn't suggesting allowing de-referencing clusterNetworkEntries[0] but to return a normal error instead of panic because it does the same job of stopping the sdn controller. I thought panic could be used when we are too deep in the code and just returning an error will not suffice (like stopping the controller).

I thought panic could be used when we are too deep in the code and just returning an error will not suffice (like stopping the controller).

Hm... I tend to think of it more as error is for runtime errors ("someone wrote bad values into the config file") and panic is for devel-time errors ("this code assumes that default values will have been filled in, but they weren't"). The Go spec is not especially clear about this, although "Effective Go" notes that panic can be "a way to indicate that something impossible has happened", which is what I was going for here.

pravisankar · 2017-10-17T19:21:14Z

Core logic looks good and fixes the issue.

danwinship added 2 commits October 16, 2017 17:27

Fix an error message

dea5d49

Fix defaulting of legacy ClusterNetwork fields

07049d4

danwinship added component/networking kind/bug Categorizes issue or PR as related to a bug. sig/networking labels Oct 16, 2017

danwinship assigned pravisankar and JacobTanenbaum Oct 16, 2017

openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 16, 2017

openshift-merge-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 16, 2017

eparis added the priority/P0 label Oct 17, 2017

JacobTanenbaum approved these changes Oct 17, 2017

View reviewed changes

knobunc approved these changes Oct 17, 2017

View reviewed changes

openshift-ci-robot assigned knobunc Oct 17, 2017

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 17, 2017

danwinship mentioned this pull request Oct 17, 2017

Fix defaulting of legacy ClusterNetwork fields #16905

Merged

eparis merged commit 84ec200 into openshift:master Oct 17, 2017

openshift-merge-robot added a commit that referenced this pull request Oct 17, 2017

Merge pull request #16905 from danwinship/clusternetwork-3.6-vs-3.7-s…

0b34b04

…tage Automatic merge from submit-queue. Fix defaulting of legacy ClusterNetwork fields Backport of #16897

pravisankar reviewed Oct 17, 2017

View reviewed changes

danwinship deleted the clusternetwork-3.6-vs-3.7 branch October 19, 2017 19:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix defaulting of legacy ClusterNetwork fields #16897

Fix defaulting of legacy ClusterNetwork fields #16897

danwinship commented Oct 16, 2017 •

edited

Loading

eparis commented Oct 17, 2017

JacobTanenbaum left a comment

knobunc left a comment

openshift-merge-robot commented Oct 17, 2017

openshift-merge-robot commented Oct 17, 2017

openshift-ci-robot commented Oct 17, 2017 •

edited

Loading

eparis commented Oct 17, 2017

pravisankar Oct 17, 2017

danwinship Oct 17, 2017

pravisankar Oct 17, 2017

pravisankar Oct 17, 2017

pravisankar Oct 17, 2017

danwinship Oct 17, 2017

pravisankar Oct 17, 2017

danwinship Oct 19, 2017

pravisankar commented Oct 17, 2017

Fix defaulting of legacy ClusterNetwork fields #16897

Fix defaulting of legacy ClusterNetwork fields #16897

Conversation

danwinship commented Oct 16, 2017 • edited Loading

eparis commented Oct 17, 2017

JacobTanenbaum left a comment

Choose a reason for hiding this comment

knobunc left a comment

Choose a reason for hiding this comment

openshift-merge-robot commented Oct 17, 2017

openshift-merge-robot commented Oct 17, 2017

openshift-ci-robot commented Oct 17, 2017 • edited Loading

eparis commented Oct 17, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pravisankar commented Oct 17, 2017

danwinship commented Oct 16, 2017 •

edited

Loading

openshift-ci-robot commented Oct 17, 2017 •

edited

Loading