Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix defaulting of legacy ClusterNetwork fields #16897

Merged
merged 2 commits into from
Oct 17, 2017

Conversation

danwinship
Copy link
Contributor

@danwinship danwinship commented Oct 16, 2017

3.6 nodes can't currently start up against a 3.7 master (eg, during upgrade) because the old ClusterNetwork fields aren't set. (Some absent-minded reviewer had even noticed that this was broken and then not fixed yet (#14558 (comment)) and then approved the PR anyway.)

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1502866

cc @eparis

@danwinship danwinship added component/networking kind/bug Categorizes issue or PR as related to a bug. sig/networking labels Oct 16, 2017
@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 16, 2017
@openshift-merge-robot openshift-merge-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 16, 2017
@eparis
Copy link
Member

eparis commented Oct 17, 2017

/retest
/approve
once jake takes a look I think we want this on the stage branch too. Which requires a manual merge.

Copy link
Contributor

@JacobTanenbaum JacobTanenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@knobunc knobunc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 17, 2017
@openshift-merge-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, eparis, knobunc

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@openshift-merge-robot
Copy link
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@openshift-ci-robot
Copy link

openshift-ci-robot commented Oct 17, 2017

@danwinship: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/openshift-jenkins/extended_conformance_gce 07049d4 link /test extended_conformance_gce

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@eparis eparis merged commit 84ec200 into openshift:master Oct 17, 2017
@eparis
Copy link
Member

eparis commented Oct 17, 2017

merged by hand because we are literally breaking cluster upgrades.

openshift-merge-robot added a commit that referenced this pull request Oct 17, 2017
…tage

Automatic merge from submit-queue.

Fix defaulting of legacy ClusterNetwork fields

Backport of #16897
allErrs = append(allErrs, field.Invalid(field.NewPath("hostsubnetlength"), clusterNet.HostSubnetLength, "hostsubnetlength must be identical to clusterNetworks[0].hostSubnetLength"))
}
} else if clusterNet.Network != "" || clusterNet.HostSubnetLength != 0 {
if clusterNet.Network != clusterNet.ClusterNetworks[0].CIDR || clusterNet.HostSubnetLength != clusterNet.ClusterNetworks[0].HostSubnetLength {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to enforce this condition for non default cluster network? May be we shouldn't care.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would it mean to have Network set but not equal to ClusterNetworks[0]?

(I had suggested a long time back in the original PR that we should keep using Network/HostSubnetLength for the primary network, and only use ClusterNetworks for additional networks. But it's too late to change that now.)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, OpenShift sdn only cares about default cluster network but based on what we described in swagger doc for Network/ClusterNetworks, now it seems reasonable to expect network to match clusterNetworks[0] for other use cases as part of api validation.

}
master.networkInfo, err = common.ParseNetworkInfo(clusterNetworkEntries, networkConfig.ServiceNetworkCIDR)
if err != nil {
return err
}
if len(clusterNetworkEntries) == 0 {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition could be moved before common.ParseNetworkInfo()

}
master.networkInfo, err = common.ParseNetworkInfo(clusterNetworkEntries, networkConfig.ServiceNetworkCIDR)
if err != nil {
return err
}
if len(clusterNetworkEntries) == 0 {
panic("No ClusterNetworks set in networkConfig; should have been defaulted in if not configured")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to panic here? returning an error will propagate to the SDN RunController() and will fail as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can only happen if someone breaks the config-reading code. It's basically a "can't happen", I just put the check+panic in to make it slightly clearer what was going than if we just let them get an out-of-bounds exception when we dereference clusterNetworkEntries[0] below

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't suggesting allowing de-referencing clusterNetworkEntries[0] but to return a normal error instead of panic because it does the same job of stopping the sdn controller. I thought panic could be used when we are too deep in the code and just returning an error will not suffice (like stopping the controller).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought panic could be used when we are too deep in the code and just returning an error will not suffice (like stopping the controller).

Hm... I tend to think of it more as error is for runtime errors ("someone wrote bad values into the config file") and panic is for devel-time errors ("this code assumes that default values will have been filled in, but they weren't"). The Go spec is not especially clear about this, although "Effective Go" notes that panic can be "a way to indicate that something impossible has happened", which is what I was going for here.

@pravisankar
Copy link

Core logic looks good and fixes the issue.

@danwinship danwinship deleted the clusternetwork-3.6-vs-3.7 branch October 19, 2017 19:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. component/networking kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. priority/P0 sig/networking size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants