Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[gcp-deployer] Setup oauth2 id & secret; insert service account keys via backend service #1255

Merged
merged 5 commits into from
Aug 7, 2018

Conversation

kunmingg
Copy link
Contributor

@kunmingg kunmingg commented Jul 20, 2018

[gcp-deployer]
Web app will integrate with a go backend service.
All future api calls dealing with GCP/k8s resources management show go through backend.
Backend service update will come in separate PRs.


This change is Reviewable

@kunmingg kunmingg requested review from jlewi and yebrahim July 20, 2018 23:00
@kunmingg kunmingg changed the title insert service account keys to GKE cluster [gcp-deployer] insert service account keys to GKE cluster Jul 20, 2018
@kunmingg
Copy link
Contributor Author

/hold

@jlewi
Copy link
Contributor

jlewi commented Jul 21, 2018

/lgtm
/assign @yebrahim

Looks good to me but lets wait for Yasser.

@kunmingg Why is there a hold on the PR?

Copy link
Contributor

@yebrahim yebrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking care of this. I do have a few comments though, especially when we can run the code to get the cluster endpoint.

const token = await Gapi.getToken();
if (!token) {
this.setState({
error: 'You are not signed in',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point the user must be signed in, and some other error must have occurred, so we should just say that, something like: "Error occurred getting Google ID token." This error shouldn't happen though.

@@ -402,8 +404,43 @@ export default class DeployForm extends React.Component<any, DeployFormState> {
});
});

// Step 4: In cluster resource set up
const endpoint = Gapi.getClusterEndpoint(project, this.state.zone, deploymentName);
k8s.Config.defaultClient();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need this line.

@@ -412,6 +449,8 @@ export default class DeployForm extends React.Component<any, DeployFormState> {

const servicesToEnable = new Set([
'deploymentmanager.googleapis.com',
'servicemanagement.googleapis.com',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this in the list initially, but I never saw it when listing services for the project, which is what I'm using to decide if a service is enabled, so the code ends up retrying to enable this service over and over until it quits. Are you sure this would work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we keeping this? I think it'll cause the service enable API to keep looping forever.

name: `projects/${projectId}/serviceAccounts/${serviceAccounts}`
}).then(response => response.result,
badResult => {
throw new Error('Errors creating key for serviceAccount ${serviceAccounts}');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add the error here (even by JSON stringifying it) in the error message, so that the user at least has an idea what might've gone wrong, even if it's a big ugly error object.

@@ -47,6 +47,26 @@ export default class Gapi {

}

public static iam = class {

public static async createKey(projectId: string, serviceAccounts: string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 2 space indentation all over.

return user ? user.getAuthResponse().id_token : null;
}

public static async getClusterEndpoint(projectId: string, zone: string, clusterId: string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation.

path: `https://container.googleapis.com/v1/projects/${projectId}/zones/${zone}/clusters/${clusterId}`
}).then(response => (response.result as any).endpoint,
badResult => {
throw new Error('Errors getting cluster endpoint: ' + flattenDeploymentOperationError(badResult.result));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this method would work on this error object structure, if you can verify that, we can use this method but let's rename it, maybe "flattenResponseError" or something.

@@ -402,8 +404,43 @@ export default class DeployForm extends React.Component<any, DeployFormState> {
});
});

// Step 4: In cluster resource set up
const endpoint = Gapi.getClusterEndpoint(project, this.state.zone, deploymentName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will work here. Cluster creation + kubeflow deployment will take several minutes, so we can't just issue this call right away, we'll need some experience for waiting.
If this code is here as a placeholder, can we either comment it out or gate it on a condition? I can work with it as I'm editing the waiting experience.

@kunmingg
Copy link
Contributor Author

@yebrahim @jlewi
Thanks for reviewing, tested locally, found one blocker:
chrome won't allow requests to self-signed-cert address (like our k8s cluster).
Haven't found neat workaround...

How about give master ip a DNS though cloud endpoint?
Thoughts?

@jlewi
Copy link
Contributor

jlewi commented Jul 27, 2018

@kunmingg That won't work; we need the secret in order to be able to create the endpoint and get the signed certificate. In chrome if you try to navigate to a page with a self-signed cert it will give a warning and you can click through to accept it. is it not possible to do that in JS code?

@jlewi
Copy link
Contributor

jlewi commented Jul 27, 2018

@kunmingg @yebrahim Could we prompt the user to install the self signed cert for the K8s master in their browser?

@kunmingg
Copy link
Contributor Author

Found an workaround, now need another service:
#1276

@yebrahim
Copy link
Contributor

What kind of auth is currently used for the kubeflow cluster by default? If it's just basic username/password auth, then you don't want to open CORS for anyone to talk to the k8s master.

@yebrahim
Copy link
Contributor

@jlewi I don't know if that's doable, never come across this before. This seems like just a CORS issue though? If we add localhost and kubeflow.org to CORS whitelist origins, and use Bearer token for auth, should this take care of it?

@kunmingg
Copy link
Contributor Author

@yebrahim
To add our hosts to CORS whitelist origins, I plan to host a CORS proxy on kubeflow.org (#1276).
(or let user use chrome extension)
Sounds good?

@yebrahim
Copy link
Contributor

If you're planning to add the hosts to the origins whitelist, why do you need the proxy? The browser should just be able talk directly to the k8s api server.

@jlewi
Copy link
Contributor

jlewi commented Jul 27, 2018

A proxy is highly undesirable. We don't want users to have to provide their credentials to a backend service they don't control.

@yebrahim We are connecting to the K8s master e.g.
https://XX.XXX.XXX

And we authenticate using a bearer token in the authorization header. So its not localhost; its the ip of the K8s master.

@kunmingg
Copy link
Contributor Author

@yebrahim
We don't own k8s api server, can not directly edit from there.

@yebrahim
Copy link
Contributor

@kunmingg I was under the assumption you can specify the origin whitelist while deploying the k8s cluster, is that not correct?

@kunmingg
Copy link
Contributor Author

@yebrahim
I think on GKE we do not have access to k8s master config.

@yebrahim
Copy link
Contributor

If this is the case, that's a huge bummer!
We should make sure of this before we write a server component to proxy requests. From brief googling, I can see there is a CORS annotation for nginx ingress controller, but I don't know if kubeflow is using that.

Otherwise, seems like a server piece is unavoidable, @jlewi. The API server just won't accept requests coming from the browser without the CORS annotation.

@jlewi
Copy link
Contributor

jlewi commented Jul 29, 2018

@kunmingg @yebrahim My suggestion is to not block this PR on the CORS issue. I believe its possible to disable CORS in the browser using various extensions.

I think here's one for chrome
https://chrome.google.com/webstore/detail/allow-control-allow-origi/nlfbmbojpeacfghkpbjhddihlkkiljbi?hl=en

So my suggestion is that we use such extensions as a workaround for now so we can continue to make progress.

Thoughts?

@kunmingg
Copy link
Contributor Author

kunmingg commented Jul 30, 2018

@jlewi @yebrahim
I can deploy k8s components through deployment manager:
we might have to deploy multi times per click.

In future if we change bootstrapper server to a k8s operator, we can deploy all components without accessing k8s API directly.

@jlewi
Copy link
Contributor

jlewi commented Jul 30, 2018

Which K8s resources? I think using DM to create a single resource e.g. the bootstrapper/click to deploy APP might be ok.

Overall though I think using DM to create K8s resources leads to bad UX because its unclear if a user wants to update those resources whether to do so using DM or by talking directly to K8s ApiServer.

@k8s-ci-robot k8s-ci-robot removed the lgtm label Jul 31, 2018
Copy link
Contributor

@yebrahim yebrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, just one or two critical things, the rest is nits.

this._appendLine("Cluster endpoint not available yet.")
});
if (!curStatus) {
wait(getTimeout);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to await this.

this._generateServiceAccountSecret(project, deploymentName, 'admin'));
k8sApi.createNamespacedSecret('kubeflow',
this._generateServiceAccountSecret(project, deploymentName, 'user'));
const saKey = await Gapi.iam.createKey(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use a more expressive name? maybe serviceAccountKey?

}
kubeflowUtil.properties.clusterType = this.state.deploymentName + '-type';
kubeflowUtil.properties.saKey = saKey.privateKeyData;
kubeflowUtil.properties.clientId = window.btoa(this.state.clientId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you don't need the window. bit, you can call btoa directly.

return gapi.client.request({
method: 'POST',
path: `https://iam.googleapis.com/v1/projects/${projectId}/serviceAccounts/${serviceAccounts}/keys`,
}).then(response => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can simplify this to one line: .then(response => response.result as CreateKeyResponse)

return response.result as CreateKeyResponse;
},
badResult => {
throw new Error('Errors creating key for serviceAccount ${serviceAccounts}: '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use backticks "`" for string formatting.

name: admin-gcp-sa
type: Opaque
data:
admin-gcp-sa.json: {{ properties["saKey"] }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a smart idea; using DM to insert the service account key into the cluster. Unfortunately this means the secret key would be stored in DM and viewable by anyone with access to the project. I don't think we want that.

An alternative approach would be to just use the service account as the service account for the GKE nodes.
The downside of that approach would be that it means all pods on that node would run with elevated privileges which would be undesirable.

Its possible that we could use RBAC to restrict what can run on a specific node pool so that other processes couldn't run on that node pool.

Copy link
Contributor Author

@kunmingg kunmingg Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently our code assume those keys will be stored in k8s cluster, so anyone with project access can view from cluster secrets?

Another way is maybe we can expire those keys once we finish deploy?
Then user need to rely on their own credentials to access GCS / GCR / big query, which should be better for access control.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By using VM account for nodes, we should be able to avoid exposing secrets.
Handled in separate PR: #1302

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think putting them in DM might be an enhanced risk because K8s secrets are namespace scoped resource so not everyone might have access to the namespace. I'm not even sure project viewers do.

@jlewi
Copy link
Contributor

jlewi commented Aug 2, 2018

@kunmingg it looks like this PR is doing two things

  1. Inserting Service accounts via DM
  2. Adding OAuthClientID and OAuthSecret for IAP

Per our discussion today I think the plan is to have a go backend. So I think we can just have the go backend create the secrets and insert them in the cluster. The go backend can talk directly to the K8s master.

@kunmingg
Copy link
Contributor Author

kunmingg commented Aug 3, 2018

Update:
per our discussion, will use backend to deal with k8s resource management.
Separate PRs will take care backend part.

@kunmingg kunmingg changed the title [gcp-deployer] insert service account keys to GKE cluster [gcp-deployer] Setup oauth2 id & secret; insert service account keys via backend service Aug 3, 2018
@kunmingg
Copy link
Contributor Author

kunmingg commented Aug 4, 2018

/retest

add client id & secret fields;
manage k8s resource through deployment manager;
From now on web app create k8s resources through backend service.
@@ -25,6 +26,9 @@ import clusterJinjaPath from './configs/cluster.jinja';
// selects auto domain then we should automatically supply the suffix
// <hostname>.endpoints.<Project>.cloud.goog

// Assume user access app via kubectl proxy
const BackendAddress = 'http://127.0.0.1:8001/api/v1/namespaces/default/services/kubeflow-controller:8080/proxy/';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to make this assumption? Can we parameterize this based on how the user accessed the click to deploy app?

What if we had our go backend serve the click to deploy app? i.e. can we just add a handler to our go server that returns the javascript click to deploy app?

Copy link
Contributor Author

@kunmingg kunmingg Aug 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will make it a parameter in yaml spec.
Serving from go side might be even better, I have other PRs for backend change, let's pick it up there later.

@jlewi
Copy link
Contributor

jlewi commented Aug 5, 2018

Thanks. Can you update the PR description as well please?

Copy link
Contributor

@jlewi jlewi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 20 unresolved discussions (waiting on @yebrahim, @jlewi, @kunmingg, @gaocegege, and @ankushagarwal)


components/gcp-click-to-deploy/kf_app.yaml, line 1 at r3 (raw file):

apiVersion: v1

Add a comment explaining what this manifest is for.


components/gcp-click-to-deploy/kf_app.yaml, line 40 at r3 (raw file):

        image: gcr.io/kubeflow-images-public/bootstrapper:latest
        workingDir: /opt/bootstrap
        command: [ "/opt/kubeflow/bootstrapper"]

nit put args into command; command should just be a list containing the binary to run and then the command line arguments.


components/gcp-click-to-deploy/kf_app.yaml, line 44 at r3 (raw file):

          "--in-cluster",
          "--namespace=kubeflow",
          "--apply",

What is the --apply argument doing shouldn't we be invoking it in daemon mode?


components/gcp-click-to-deploy/src/DeployForm.tsx, line 408 at r1 (raw file):

Previously, yebrahim (Yasser Elsayed) wrote…

I don't think this will work here. Cluster creation + kubeflow deployment will take several minutes, so we can't just issue this call right away, we'll need some experience for waiting.
If this code is here as a placeholder, can we either comment it out or gate it on a condition? I can work with it as I'm editing the waiting experience.

Is this fixed?


components/gcp-click-to-deploy/src/DeployForm.tsx, line 30 at r3 (raw file):

interface DeployFormState {

Using an argument and specifying the YAML seems good to me.

@jlewi
Copy link
Contributor

jlewi commented Aug 6, 2018

Per our discussion I think all the logic for deploying things should now move into the go backend; but we can do that in a follow on PR.

@jlewi
Copy link
Contributor

jlewi commented Aug 6, 2018

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlewi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ankushagarwal ankushagarwal removed their request for review August 7, 2018 05:16
@kunmingg
Copy link
Contributor Author

kunmingg commented Aug 7, 2018

/hold cancel

@k8s-ci-robot k8s-ci-robot merged commit 239dcc2 into kubeflow:master Aug 7, 2018
saffaalvi pushed a commit to StatCan/kubeflow that referenced this pull request Feb 11, 2021
…via backend service (kubeflow#1255)

* insert service account keys to GKE cluster

* handle review feedbacks;
add client id & secret fields;
manage k8s resource through deployment manager;

* Refactor:
From now on web app create k8s resources through backend service.

* rebase

* patch yaml spec missing pieces, make backend URL configurable through yaml per user's need
surajkota pushed a commit to surajkota/kubeflow that referenced this pull request Jun 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants