rebase 1.10.0 #19137
Conversation
@sjenning seeing a kubelet startup failure in our e2e setup that seems caused by https://github.com/kubernetes/kubernetes/pull/59769/files#diff-bf28da68f62a8df6e99e447c4351122dR1331. Comparing origin.log from a good e2e run from master (kube 1.9.1) and a bad e2e run on this PR (kube 1.10.0), both appear to have hit the same condition. In 1.9, it was logged as an error repeatedly:
in 1.10, this error is now fatal:
questions:
@@ -169,6 +168,22 @@ func ClientMapperFromConfig(config *rest.Config) resource.ClientMapperFunc {
 	})
 }

+// setKubernetesDefaults sets default values on the provided client config for accessing the
+// Kubernetes API or returns an error if any of the defaults are impossible or invalid.
+func setKubernetesDefaults(config *rest.Config) error {
You think this is safer than exposing the function upstream?
yes, I expect further shenanigans in legacyscheme usage upstream
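For readers without the diff expanded: a minimal sketch of what a vendored setKubernetesDefaults along these lines typically looks like, modeled on the upstream kubectl helper of this era. The exact body in this PR may differ, and the package name and choice of legacyscheme.Codecs are assumptions:

```go
package clientcfg // hypothetical package name

import (
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/rest"
	"k8s.io/kubernetes/pkg/api/legacyscheme"
)

// setKubernetesDefaults pins the client config to the core ("" group) v1 API
// at the /api root path, fills in a serializer if none is set, then defers to
// client-go for the remaining defaults.
func setKubernetesDefaults(config *rest.Config) error {
	config.GroupVersion = &schema.GroupVersion{Group: "", Version: "v1"}
	if config.APIPath == "" {
		config.APIPath = "/api"
	}
	if config.NegotiatedSerializer == nil {
		// assumption: the vendored copy pins the legacy scheme's codec factory
		config.NegotiatedSerializer = legacyscheme.Codecs
	}
	return rest.SetKubernetesDefaults(config)
}
```

Keeping a local copy like this is what makes the "further shenanigans in legacyscheme usage upstream" concern moot: the defaulting behavior is frozen in-tree rather than tracking upstream changes.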
@@ -7,7 +7,7 @@ import (
 	"github.com/golang/glog"

-	controllerapp "k8s.io/kubernetes/cmd/kube-controller-manager/app"
+	_ "k8s.io/kubernetes/plugin/pkg/scheduler/algorithmprovider"
Wow, this is tragic and so symptomatic of the scheduler problems.
 	newFunc := func(protocol api.Protocol, ip net.IP, port int) (userspace.ProxySocket, error) {
 		return newUnidlerSocket(protocol, ip, port, signaler)
 	}
-	return userspace.NewCustomProxier(loadBalancer, listenIP, iptables, exec, pr, syncPeriod, minSyncPeriod, udpIdleTimeout, newFunc)
+	return userspace.NewCustomProxier(loadBalancer, listenIP, iptables, exec, pr, syncPeriod, minSyncPeriod, udpIdleTimeout, nodePortAddresses, newFunc)
@DirectXMan12 ptal
@liggitt https://github.com/liggitt/origin/commit/be4d28cd7f303d13c0dc95fb530f386739c5b306 (i was able to spawn a 1.10 cluster via cluster up with this; guess it will make some tests pass)
@@ -420,6 +421,7 @@ func (c *DeploymentController) makeDeployerPod(deployment *v1.ReplicationControl
 	RestartPolicy:                 v1.RestartPolicyNever,
 	ServiceAccountName:            c.serviceAccount,
 	TerminationGracePeriodSeconds: &gracePeriod,
+	ShareProcessNamespace:         &shareProcessNamespace,
Why is this a thing? Is the default bad somehow?
Also, the deployer pod is a single-container pod... From what I saw in kube, this is still an alpha feature that is disabled by default anyway?
This makes the default explicit; I had to either set something or explicitly ignore the field in the unit test, and I didn't want to do the latter.
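Roughly, the idea in Go (the field name is from the diff above; the surrounding spec and function are illustrative, not the actual deployer pod template):

```go
package deployer

import v1 "k8s.io/api/core/v1"

// deployerPodSpec sets ShareProcessNamespace to its default (false) explicitly,
// so a unit test can compare the full spec without ignoring the field.
func deployerPodSpec() v1.PodSpec {
	shareProcessNamespace := false
	return v1.PodSpec{
		RestartPolicy:         v1.RestartPolicyNever,
		ShareProcessNamespace: &shareProcessNamespace,
	}
}
```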
@@ -109,12 +109,9 @@ function os::build::version::kubernetes_vars() {
 	# Try to match the "git describe" output to a regex to try to extract
 	# the "major" and "minor" versions and whether this is the exact tagged
 	# version or whether the tree is between two tagged versions.
-	if [[ "${KUBE_GIT_VERSION}" =~ ^v([0-9]+)\.([0-9]+)(\.[0-9]+)*([-].*)?$ ]]; then
+	if [[ "${KUBE_GIT_VERSION}" =~ ^v([0-9]+)\.([0-9]+)\. ]]; then
Why?
Because it didn't actually work against the output of git describe, and all we care about are the major/minor bits.
Can you give the output of git describe? IIRC this will break stuff in OSE
> Can you give the output of git describe?

v1.10.0-46-g9070269

> IIRC this will break stuff in OSE

nope, this is setting the kube major/minor versions, which are unset today. this isn't touching anything openshift-related
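For the record, here is the new pattern applied to that output, shown in Go regexp syntax (which behaves the same as the bash [[ =~ ]] form for this pattern):

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Only the major/minor captures are consumed; the tail of the
	// git-describe output (patch level, commit count, sha) is ignored.
	re := regexp.MustCompile(`^v(\d+)\.(\d+)\.`)
	m := re.FindStringSubmatch("v1.10.0-46-g9070269")
	fmt.Printf("major=%s minor=%s\n", m[1], m[2]) // major=1 minor=10
}
```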
hack/test-cmd.sh
@@ -19,7 +19,7 @@ function find_tests() {
 	local full_test_list=()
 	local selected_tests=()

-	full_test_list=( $(find "${OS_ROOT}/test/cmd" -name '*.sh') )
+	full_test_list=( $(find "${OS_ROOT}/test/cmd" -name '*.sh' | sort) )
I prefer unsorted. Can you randomize instead please.
Why? I like knowing how much time I have left in my tests
We found a lot of bugs once we stopped sorting in the past; people were not cleaning up cluster resources in test X, not questioning the state of the universe in test X+N, and then writing test X+N to only work when run after test X...
randomizing without being reproducible leads to flakes that magically pass on a subsequent retry
Is that an issue today? You're arguing to change the status quo :)
> Is that an issue today?

yeah, hit it while running this suite.

> You're arguing to change the status quo :)

will push a revert with the next batch of changes. I don't think the current random+unreproducible state is helpful, but I don't care enough to argue.
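Not from this PR, just a sketch of the reproducibility point being argued: randomized ordering stays debuggable when the seed is logged, so a failing order can be replayed. The test names here are hypothetical stand-ins (the real harness is bash):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
	// hypothetical test list; illustrates the idea only
	tests := []string{"basicresources.sh", "router.sh", "setprobe.sh"}

	// log the seed so the exact failing order can be reproduced later
	seed := time.Now().UnixNano()
	fmt.Println("test shuffle seed:", seed)

	r := rand.New(rand.NewSource(seed))
	r.Shuffle(len(tests), func(i, j int) { tests[i], tests[j] = tests[j], tests[i] })
	fmt.Println(tests)
}
```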
@@ -129,7 +129,7 @@ func NewCmdDebug(fullName string, f *clientcmd.Factory, in io.Reader, out, errou
 	}

 	cmd := &cobra.Command{
-		Use: "debug RESOURCE/NAME [ENV1=VAL1 ...] [-c CONTAINER] [options] [-- COMMAND]",
+		Use: "debug RESOURCE/NAME [ENV1=VAL1 ...] [-c CONTAINER] [flags] [-- COMMAND]",
Not that I object, but why do I care about this change?
Because cobra changed to auto-append [flags] if you don't include it in your usage, which screws up usage for things like rsh
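A small demonstration of the cobra behavior in question (the command here is a stand-in, not the PR's code):

```go
package main

import (
	"fmt"

	"github.com/spf13/cobra"
)

func main() {
	cmd := &cobra.Command{
		Use: "debug RESOURCE/NAME [-c CONTAINER] [-- COMMAND]",
	}
	cmd.Flags().StringP("container", "c", "", "container name")

	// UseLine() appends " [flags]" when the command has flags and the Use
	// string doesn't already contain "[flags]", which lands it after the
	// trailing [-- COMMAND]:
	fmt.Println(cmd.UseLine())
	// Output: debug RESOURCE/NAME [-c CONTAINER] [-- COMMAND] [flags]
}
```

Spelling out [flags] in Use (as this diff does), or setting DisableFlagsInUseLine = true, keeps the usage line accurate for commands that take a trailing [-- COMMAND].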
@@ -117,7 +117,7 @@ echo "certs: ok"
 os::test::junit::declare_suite_end

 os::test::junit::declare_suite_start "cmd/admin/groups"
-os::cmd::expect_success_and_text 'oc adm groups new shortoutputgroup -o name' 'groups/shortoutputgroup'
+os::cmd::expect_success_and_text 'oc adm groups new shortoutputgroup -o name' 'group/shortoutputgroup'
oh, that's neat. The command needs updating. Printer looks off
test/cmd/basicresources.sh
@@ -92,7 +92,7 @@ echo "pods: ok"
 os::test::junit::declare_suite_end

 os::test::junit::declare_suite_start "cmd/basicresources/label"
-os::cmd::expect_success_and_text 'oc create -f examples/hello-openshift/hello-pod.json -o name' 'pod/hello-openshift'
+os::cmd::expect_success_and_text 'oc create -f examples/hello-openshift/hello-pod.json -o name' 'pod.*/hello-openshift'
Not obvious to me. What are the characters between pod and slash?
removed
 # Test that infos printer supports all outputFormat options
 os::cmd::expect_success_and_text 'oc new-app node -o yaml | oc set env -f - MYVAR=value' 'deploymentconfig "node" updated'
 os::cmd::expect_success 'oc new-app node -o yaml | oc set env -f - MYVAR=value -o custom-colums="NAME:.metadata.name"'
 os::cmd::expect_success_and_text 'oc new-app node -o yaml | oc set env -f - MYVAR=value -o yaml' 'apiVersion: v1'
 os::cmd::expect_success_and_text 'oc new-app node -o yaml | oc set env -f - MYVAR=value -o json' '"apiVersion": "v1"'
 os::cmd::expect_success_and_text 'oc new-app node -o yaml | oc set env -f - MYVAR=value -o wide' 'node'
-os::cmd::expect_success_and_text 'oc new-app node -o yaml | oc set env -f - MYVAR=value -o name' 'deploymentconfigs/node'
+os::cmd::expect_success_and_text 'oc new-app node -o yaml | oc set env -f - MYVAR=value -o name' 'deploymentconfig/node'
@juanvallejo @soltysh another command that needs updating.
test/cmd/router.sh
@@ -84,7 +84,7 @@ os::cmd::expect_success 'oc adm policy add-scc-to-user privileged -z ipfailover'
 os::cmd::expect_success_and_text 'oc adm ipfailover --virtual-ips="1.2.3.4" --dry-run' 'Creating IP failover'
 os::cmd::expect_success_and_text 'oc adm ipfailover --virtual-ips="1.2.3.4" --dry-run' 'Success \(dry run\)'
 os::cmd::expect_success_and_text 'oc adm ipfailover --virtual-ips="1.2.3.4" --dry-run -o yaml' 'name: ipfailover'
-os::cmd::expect_success_and_text 'oc adm ipfailover --virtual-ips="1.2.3.4" --dry-run -o name' 'deploymentconfig/ipfailover'
+os::cmd::expect_success_and_text 'oc adm ipfailover --virtual-ips="1.2.3.4" --dry-run -o name' 'deploymentconfig.*/ipfailover'
specificity please
done
@@ -19,7 +19,7 @@ os::cmd::expect_success_and_text 'oc status --suggest' 'dc/simple-deployment has
 os::cmd::expect_failure_and_text 'oc set probe dc/simple-deployment --liveness --get-url=http://google.com:80 --local' 'You must provide one or more resources by argument or filename'
 # test --dry-run flag with -o formats
 os::cmd::expect_success_and_text 'oc set probe dc/simple-deployment --liveness --get-url=http://google.com:80 --dry-run' 'simple-deployment'
-os::cmd::expect_success_and_text 'oc set probe dc/simple-deployment --liveness --get-url=http://google.com:80 --dry-run -o name' 'deploymentconfigs/simple-deployment'
+os::cmd::expect_success_and_text 'oc set probe dc/simple-deployment --liveness --get-url=http://google.com:80 --dry-run -o name' 'deploymentconfig/simple-deployment'
@juanvallejo @soltysh another command to update
@@ -117,7 +117,7 @@ echo "certs: ok"
 os::test::junit::declare_suite_end

 os::test::junit::declare_suite_start "cmd/admin/groups"
-os::cmd::expect_success_and_text 'oc adm groups new shortoutputgroup -o name' 'groups/shortoutputgroup'
+os::cmd::expect_success_and_text 'oc adm groups new shortoutputgroup -o name' 'group/shortoutputgroup'
@juanvallejo @soltysh another command to update
And I'm going to take a break. Someone is going to help fix the apiserver wiring, right.... right...?
 	scaler, _ := kubectl.ScalerFor(kapi.Kind("ReplicationController"), client)

 	// TODO: implement for RC?
 	var scalesGetter scaleclient.ScalesGetter
this is now panicking the deployer pods...
It's in a FIXME commit in a WIP PR… not quite ready for review :)
@mfojtik fixed
yeah, openshift/kubernetes@3ca675e effectively changed the default back to false. will do the same for the rebase and follow up with a long-term plan
no backport needed; realized this was fixing up an incorrect choice I made when picking replacements for the factory-provided decoder that went away. picked the things that can go into master now into #19200
re @liggitt: I don't think so. Upstream used to tolerate this error with retries. There is no indication on the PR that changed it as to why this was done. fyi @derekwaynecarr
@liggitt @sjenning - the issue described from the kubelet is tied to changes from the LocalStorageCapacityIsolation feature, which unfortunately, even when disabled, still has the kubelet do actions to figure out rootfs configuration. i am trying to figure out if we can change cAdvisor to handle /tmpfs as a --root-dir; will report back.
/retest
1 similar comment
/retest
new commits that need reviewing (in service of fixing the "no kind Namespace found in v1" error the cluster-up tests were hitting when /oapi was unavailable)
seeing flakes in gcp (1-2 different failures in successive runs)
/retest
@liggitt i believe these flakes are pre-existing
👍 from me to merging as-is
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, mfojtik

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
@Kargakis tide does not seem to be handling the merge of this PR; should we merge it by hand, or is there some special label that enables merging for this one?
By hand for now.
Green, merging this after talking to @Kargakis
@liggitt awesome job!
Indeed!
Go ahead and pause the origin->kubernetes publishing until we record the origin sha and adjust the bot to publish to the origin-3.10-kubernetes-1.10.0 branch.
@liggitt paused
based on:
- `openshift start`
- `openshift start --master-config=... --node-config=...`
- `openshift start master --config=...`
- `openshift start node --config=...`
- `openshift start master api --config=...`
- `openshift start master controllers --config=...`
- `oc cluster up`
- extended_networking

follow-ups:
- `-p` flag is removed
- `--runtime-config` `apis/` prefix deprecated (`--runtime-config=apis/...`)
- sweep commands outputting ungrouped objects with `-o name` in test/cmd tests and update to use grouped APIs #19947