Allow bootstrap configuration to be configured and reentrant #16571
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: smarterclayton. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS files:
- bootstrapping: @deads2k
You can indicate your approval by writing /approve in a comment.
b76d21a to 6704770 (compare)
// BootstrapConfigName is the name of a config map to read node-config.yaml from.
BootstrapConfigName string
// BootstrapConfigNamespace is the namespace the config map for bootstrap config is expected to load from.
BootstrapConfigNamespace string
Why another configurable namespace? Sounds like another thing we will end up removing in the future.
This is configurable from the node, not from the master. Nodes are allowed to point to different namespaces.
I think the "why" question still stands? Why not be prescriptive, at least until someone has a compelling subdivision use-case.
Because we don't as a general rule lock clients into namespaces. I don't see a trend away from clients owning their destiny, and I don't believe in being overly restrictive here.
summarizing in-person discussion: not configurable in the server, configurable in the client if someone really wants to wire it.
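To make that outcome concrete, here is a minimal sketch of how the two new NodeArgs fields might be wired to client-side flags. Only --bootstrap-config-name is mentioned in the PR itself; the namespace flag name and the defaults below are assumptions for illustration, not the PR's actual wiring.

package main

import (
	"fmt"

	"github.com/spf13/pflag"
)

// NodeArgs mirrors the two fields added in the diff above.
type NodeArgs struct {
	BootstrapConfigName      string
	BootstrapConfigNamespace string
}

func main() {
	// default namespace taken from the commit message ("By default it is openshift-node/node-config")
	args := &NodeArgs{BootstrapConfigNamespace: "openshift-node"}

	fs := pflag.NewFlagSet("node", pflag.ExitOnError)
	fs.StringVar(&args.BootstrapConfigName, "bootstrap-config-name", "",
		"name of the config map to read node-config.yaml from")
	// hypothetical flag: the thread concludes the namespace stays a client-side choice only
	fs.StringVar(&args.BootstrapConfigNamespace, "bootstrap-config-namespace", args.BootstrapConfigNamespace,
		"namespace holding the bootstrap config map")

	_ = fs.Parse([]string{"--bootstrap-config-name=node-config"})
	fmt.Printf("bootstrap config: %s/%s\n", args.BootstrapConfigNamespace, args.BootstrapConfigName)
}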
@@ -120,6 +122,7 @@ const (
	NodeProxierRoleBindingName = NodeProxierRoleName + "s"
	NodeAdminRoleBindingName = NodeAdminRoleName + "s"
	NodeReaderRoleBindingName = NodeReaderRoleName + "s"
	NodeConfigReaderRoleBindingName = NodeConfigReaderRoleName + "s"
No more + "s", please.
Why is it still there for the others?
Legacy. We stopped doing it upstream.
Where's the comment in the code saying that?
We should probably separate these into a section with a header and footer to that effect.
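A rough sketch of what that sectioning could look like; the constant values here are placeholders for illustration, not the project's actual strings.

package bootstrappolicy

// Placeholder role names for illustration only.
const (
	NodeProxierRoleName      = "system:node-proxier"
	NodeAdminRoleName        = "system:node-admin"
	NodeReaderRoleName       = "system:node-reader"
	NodeConfigReaderRoleName = "system:node-config-reader"
)

// --- legacy binding names: derived with the old RoleName + "s" convention ---
// Kept only for compatibility; we stopped doing this upstream.
const (
	NodeProxierRoleBindingName = NodeProxierRoleName + "s"
	NodeAdminRoleBindingName   = NodeAdminRoleName + "s"
	NodeReaderRoleBindingName  = NodeReaderRoleName + "s"
)
// --- end legacy binding names; nothing new goes in this section ---

// New bindings use an explicit name rather than appending "s".
const NodeConfigReaderRoleBindingName = NodeConfigReaderRoleName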
func GetBootstrapNodeConfigProvisioningRoleBindings(namespace string) []rbac.RoleBinding {
	return []rbac.RoleBinding{
		newOriginRoleBindingForClusterRole(NodeConfigReaderRoleBindingName, NodeConfigReaderRoleName, namespace).
Nothing new should use newOriginRoleBindingForClusterRole
@@ -1083,3 +1092,11 @@ var rolesToShow = sets.NewString(
	"system:image-pusher",
	"view",
)

func GetBootstrapNodeConfigProvisioningRoleBindings(namespace string) []rbac.RoleBinding {
No namespace configuration please.
6704770 to 9a6fa55 (compare)
@@ -1099,3 +1108,11 @@ func GetBootstrapNamespaceRoleBindings() map[string][]rbac.RoleBinding {
	}
	return ret
}

func GetBootstrapNodeConfigProvisioningRoleBindings(namespace string) []rbac.RoleBinding {
This namespace should be pinned. @enj did you pull merge yet?
@@ -666,6 +666,15 @@ func GetOpenshiftBootstrapClusterRoles() []rbac.ClusterRole {
	},
	{
		ObjectMeta: metav1.ObjectMeta{
			Name: NodeConfigReaderRoleName,
This looks like it should be namespace scoped. #16517 made it easily possible
Let's make this a role since we removed the namespace configuration.
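A sketch of what that could look like using the versioned RBAC types: a namespaced Role rather than a ClusterRole. The namespace, names, and verbs below are assumptions, not the PR's final values.

package main

import (
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// nodeConfigReaderRole grants read access to config maps, scoped to the node
// config namespace instead of cluster-wide.
func nodeConfigReaderRole() rbacv1.Role {
	return rbacv1.Role{
		ObjectMeta: metav1.ObjectMeta{
			Namespace: "openshift-node",            // assumed namespace
			Name:      "system:node-config-reader", // placeholder name
		},
		Rules: []rbacv1.PolicyRule{
			{
				APIGroups: []string{""},
				Resources: []string{"configmaps"},
				Verbs:     []string{"get", "list", "watch"},
			},
		},
	}
}

func main() {
	fmt.Println(nodeConfigReaderRole().Namespace + "/" + nodeConfigReaderRole().Name)
}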
pkg/cmd/server/origin/master.go (outdated)
@@ -251,6 +251,7 @@ func (c *MasterConfig) Run(kubeAPIServerConfig *kubeapiserver.Config, controller
	}

	// add post-start hooks
	aggregatedAPIServer.GenericAPIServer.AddPostStartHookOrDie("node.openshift.io-sharednamespace", c.ensureOpenShiftNodeNamespace)
after #16517 this won't be needed.
Why?
Because we auto init namespaces in the role bindings?
Switched to the shared mode.
Why are shared resource namespaces still their own hook?
cruft
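For context, an "ensure the shared node namespace exists" hook body can be very small. This is an illustrative sketch using modern client-go signatures, not the PR's ensureOpenShiftNodeNamespace implementation.

package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// ensureNamespace creates the namespace if it does not already exist.
// Called from a post-start hook, it must tolerate the namespace already being there.
func ensureNamespace(client kubernetes.Interface, name string) error {
	_, err := client.CoreV1().Namespaces().Create(
		context.TODO(),
		&corev1.Namespace{ObjectMeta: metav1.ObjectMeta{Name: name}},
		metav1.CreateOptions{},
	)
	if apierrors.IsAlreadyExists(err) {
		return nil
	}
	return err
}

func main() {} // the function above would be invoked from the server's post-start hook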
@@ -212,12 +212,20 @@ func (o NodeOptions) loadBootstrap(hostnames []string, nodeConfigDir string) err

	// try to refresh the latest node-config.yaml
	o.ConfigFile = filepath.Join(nodeConfigDir, "node-config.yaml")
-	config, err := c.Core().ConfigMaps("kube-system").Get("node-config", metav1.GetOptions{})
+	config, err := c.Core().ConfigMaps(o.NodeArgs.BootstrapConfigNamespace).Get(o.NodeArgs.BootstrapConfigName, metav1.GetOptions{})
doesn't seem like the namespace should be configurable. Name yes, but I don't see why we'd let people change the namespace.
@@ -212,12 +212,20 @@ func (o NodeOptions) loadBootstrap(hostnames []string, nodeConfigDir string) err

	// try to refresh the latest node-config.yaml
	o.ConfigFile = filepath.Join(nodeConfigDir, "node-config.yaml")
-	config, err := c.Core().ConfigMaps("kube-system").Get("node-config", metav1.GetOptions{})
+	config, err := c.Core().ConfigMaps(o.NodeArgs.BootstrapConfigNamespace).Get(o.NodeArgs.BootstrapConfigName, metav1.GetOptions{})
you're re-using this error down below. Please give it a special name.
error name still outstanding
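Pulling the hunk above together, here is a sketch of the refresh path with the requested distinct error name. It uses modern client-go signatures and illustrative function and key names; it is not the PR's exact code.

package bootstrap

import (
	"context"
	"fmt"
	"os"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// refreshNodeConfig fetches node-config.yaml from the bootstrap config map and
// writes it into the node config directory, returning the path it wrote.
// configMapGetErr keeps the lookup failure distinct from later errors.
func refreshNodeConfig(client kubernetes.Interface, namespace, name, nodeConfigDir string) (string, error) {
	configFile := filepath.Join(nodeConfigDir, "node-config.yaml")

	configMap, configMapGetErr := client.CoreV1().ConfigMaps(namespace).Get(context.TODO(), name, metav1.GetOptions{})
	if configMapGetErr != nil {
		return "", configMapGetErr
	}

	data, ok := configMap.Data["node-config.yaml"]
	if !ok {
		return "", fmt.Errorf("config map %s/%s has no node-config.yaml key", namespace, name)
	}
	if err := os.WriteFile(configFile, []byte(data), 0600); err != nil {
		return "", err
	}
	return configFile, nil
}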
	}

	return err
	// if there is no node-config.yaml and no server config map, generate one
	glog.V(2).Infof("Generating a local configuration since no server config available")
"since no server or local config available"
@@ -100,7 +100,7 @@ func newKubeControllerManager(kubeconfigFile, saPrivateKeyFile, saRootCAFile, po
		cmdLineArgs["cluster-signing-key-file"] = []string{""}
	}
	if _, ok := cmdLineArgs["experimental-cluster-signing-duration"]; !ok {
-		cmdLineArgs["experimental-cluster-signing-duration"] = []string{"0s"}
+		cmdLineArgs["experimental-cluster-signing-duration"] = []string{"720h"}
I'd prefer much shorter to flush out refreshing errors, O(12 hours).
It doesn't work in 3.7 for client certs so we're going to do it on the node level or backport the required changes. Also, the goal is to get to bootstrapping first, then tighten rotation, so I don't want to block on that.
This looks like a time bomb. Are you saying you're going to be manually refreshing certificates or something?
Uh so what happens after a month?
		return err
	}
	return nil
	switch {
if you are supposed to bootstrap and you can't, I think you should fail, not use a local config instead.
Fair. Although it should probably be based on whether you asked for a config or not.
I'm torn on this. An empty config map should result in a default. A missing config map or a mistyped map name seems like an error. But at the same time a 403 or other error (especially for the default config) seems borderline arbitrary. A node having a config you didn't expect is bad, a node being down because of a correlated auth mixup also seems bad.
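One way to read the trade-off being discussed, as a sketch: apierrors.IsForbidden and IsNotFound are real client-go helpers, but the policy expressed here is only one interpretation of the thread, not the PR's final behavior.

package bootstrap

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"

	"github.com/golang/glog"
)

// handleBootstrapLookup decides what to do with the result of fetching the
// bootstrap config map when bootstrapping was explicitly requested.
func handleBootstrapLookup(configMapGetErr error) error {
	switch {
	case configMapGetErr == nil:
		// server config retrieved: use it
		return nil
	case apierrors.IsForbidden(configMapGetErr):
		// a 403 may only mean the node-config-reader binding isn't in place yet;
		// falling back to a locally generated default is the lenient option
		glog.V(2).Infof("Generating a local configuration since no server config available")
		return nil
	case apierrors.IsNotFound(configMapGetErr):
		// a missing or mistyped config map name looks like an operator error: fail
		return configMapGetErr
	default:
		return configMapGetErr
	}
}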
This PR does a lot more than "correcting small issues"; for example, it permits configuring things that some people believe should be hard-coded (e.g. namespaces). Can you please give a better explanation of the aim of this PR, @smarterclayton?
@smarterclayton Could you please explain why you are proposing this? @enj is also not keen on a configurable config namespace.
Nodes configure themselves against masters. Nodes need to be able to be bulk-configured to a class of config (name). We don't hardcode namespaces for config lookup as a general principle. Admins are free to point configuration elsewhere. There is nothing special about the config source location, especially for nodes that may be owned by separate security partitions.
Bootstrapping is the replacement for static config and will be the default mode for online 3.7+ and future openshift deployments. The option to statically deliver config is preserved for bare metal environments.
9a6fa55 to 8b7da17 (compare)
Seeing multiple rebinds using the new way for namespace mappings: #16611
Note that, in general, I agree with "don't make things configurable that you don't want to be configurable". On the server side, the namespaces we create in bootstrap and our core setup definitely fall under that.

But compiling into our binaries that act as clients of the system a rule saying "I will not accept results that aren't from a precanned namespace" crosses the line a bit into being too forceful. If someone wants to bind their node-config with their kubelet config and set it in kube-system because that's the only place the new kubelet will look, I'm OK with that being an option for a client.
8b7da17 to 51410f3 (compare)
d5b2d8e to d6a7486 (compare)
  - RotateKubeletClientCertificate=true,RotateKubeletServerCertificate=true
masterClientConnectionOverrides:
  acceptContentTypes: application/vnd.kubernetes.protobuf,application/json
  burst: 40
qps looks ok, but burst looks small
Actually our default.
hostPID: true
containers:
- name: network
  image: openshift/node:v3.7.0-alpha.1
not a parameterized version/tag?
Just an example; I wanted to have the larger discussion about how static config flows from openshift/origin -> ansible first. Wanted to have something checked in that should be reproducible for a bit; then it will be made formal and moved out of here.
// Start up a metrics server if requested
if len(c.ProxyConfig.MetricsBindAddress) > 0 {
	mux := http.NewServeMux()
	mux.HandleFunc("/proxyMode", func(w http.ResponseWriter, r *http.Request) {
name it alpha or something? this is like config-lite.
This is consistent with upstream, so we can't rename it.
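For readers unfamiliar with the upstream pattern, here is a self-contained sketch of the metrics endpoint being discussed. The bind address and the prometheus handler are assumptions for illustration; the /proxyMode path name is the one fixed by upstream.

package main

import (
	"fmt"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	metricsBindAddress := "127.0.0.1:10249" // assumed; comes from MetricsBindAddress in the real config
	proxyMode := "iptables"

	if len(metricsBindAddress) > 0 {
		mux := http.NewServeMux()
		// /proxyMode reports which proxy implementation is active; the path name
		// matches upstream kube-proxy and therefore cannot be renamed
		mux.HandleFunc("/proxyMode", func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, proxyMode)
		})
		mux.Handle("/metrics", promhttp.Handler())
		if err := http.ListenAndServe(metricsBindAddress, mux); err != nil {
			fmt.Println(err)
		}
	}
}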
MasterCertDir string
ConfigDir flag.StringFlag

AllowDisabledDocker bool
// VolumeDir is the volume storage directory.
VolumeDir string
// VolumeDirProvided is set to true if the user has specified this flag.
VolumeDirProvided bool
blech
Hold your nose a month or so longer and most of this code dies and goes away and we're all happy.
I still think you should remove lgtm otherwise.
d6a7486 to d349f9b (compare)
/test all
Updated to remove the
The cert may have expired.
When we mount /var/run/openshift-sdn into the container, we need to be able to clear its contents but the directory itself cannot be removed as it is a mount point. Also clarify one error.
This makes it possible, when running the network components in a separate process, to have a health check and to report metrics.
CFSSL throws an opaque error, and bootstrapping requires user intervention to configure anyway.
This allows the kubelet to be configured to load default configuration out of a known namespace. By default it is openshift-node/node-config. Correct an error in bootstrapping where errors weren't logged, and properly ignore forbidden errors when trying to load the config map. Add a better description of bootstrapping to openshift start node. Ensure the volume directory is correctly loaded from node-config during bootstrapping instead of being overwritten into the config directory. Enable client and server rotation on the node automatically when bootstrapping, and only do a client certificate creation (server bootstrapping done by kubelet only). This unfortunately requires setting a fake value in the node config that will be cleared later - as we are moving towards a future where node-config does not exist this entire section will likely go away. Relax validation on node-config to allow cert-dir to be provided instead of explicit certificates.
bootstrap
Other config variants will be stored in this location. The new namespace ensures clean security isolation.
Need to be able to take node-config from bootstrap node. For openshift start network the --kubeconfig flag from the CLI overrides the value of masterKubeConfig in the provided node config. If the value is empty (like it is by default) the in-cluster-config is used. Reorganize the node startup slightly so there is even less overlap between kubelet and network. A future change will completely separate these two initialization paths.
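A sketch of the precedence that commit message describes for openshift start network, using standard client-go helpers; the function and variable names are illustrative, not the PR's code.

package main

import (
	"fmt"

	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// networkClientConfig picks the connection config for the network process:
// an explicit --kubeconfig wins over masterKubeConfig; an empty value means
// use the in-cluster (pod) configuration.
func networkClientConfig(kubeconfigPath string) (*rest.Config, error) {
	if len(kubeconfigPath) > 0 {
		return clientcmd.BuildConfigFromFlags("", kubeconfigPath)
	}
	return rest.InClusterConfig()
}

func main() {
	if _, err := networkClientConfig(""); err != nil {
		// outside a pod there is no in-cluster config, so this prints an error
		fmt.Println(err)
	}
}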
d349f9b to ae05ccd (compare)
@smarterclayton: The following tests failed, say /retest to rerun them all:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/retest
Applying label
Automatic merge from submit-queue.
Make bootstrapping a real production node possible.
- --bootstrap-config-name can be used to customize which config is looked up (one per node group)
- make openshift start network work podified

There is still one bug outstanding upstream that can be fixed separately - the kubelet client rotation can fail due to the cert expiring and be unable to get new certs, so it never exits.
Tested the following scenario extensively (requires a new openshift/node image tagged as v3.7.0-alpha.1):
1. oc create configmap -n openshift-node node-config --from-file=node-config.yaml=contrib/kubernetes/default-node-config.yaml
2. openshift start node --bootstrap-config-name=node-config --config=/etc/origin/node/node-config.yaml --enable=kubelet --loglevel=3 (which has it run only the kubelet)
3. oc observe csr -- oc adm certificate approve to approve both CSRs
4. oc create -f contrib/kubernetes/static/network-policy.yaml
5. oc create -f contrib/kubernetes/static/network-daemonset.yaml
6. oc run --restart=Never --attach -it --image=centos:7 -- /bin/bash and then yum install bind-utils -y && dig +search kubernetes.default.svc
Follow-up for the daemonset: openshift-sdn expects to have access to dockershim.sock, which this doesn't bind mount in.