-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[gcp-deployer] Setup oauth2 id & secret; insert service account keys via backend service #1255
Conversation
/hold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking care of this. I do have a few comments though, especially when we can run the code to get the cluster endpoint.
const token = await Gapi.getToken(); | ||
if (!token) { | ||
this.setState({ | ||
error: 'You are not signed in', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point the user must be signed in, and some other error must have occurred, so we should just say that, something like: "Error occurred getting Google ID token." This error shouldn't happen though.
@@ -402,8 +404,43 @@ export default class DeployForm extends React.Component<any, DeployFormState> { | |||
}); | |||
}); | |||
|
|||
// Step 4: In cluster resource set up | |||
const endpoint = Gapi.getClusterEndpoint(project, this.state.zone, deploymentName); | |||
k8s.Config.defaultClient(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need this line.
@@ -412,6 +449,8 @@ export default class DeployForm extends React.Component<any, DeployFormState> { | |||
|
|||
const servicesToEnable = new Set([ | |||
'deploymentmanager.googleapis.com', | |||
'servicemanagement.googleapis.com', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had this in the list initially, but I never saw it when listing services for the project, which is what I'm using to decide if a service is enabled, so the code ends up retrying to enable this service over and over until it quits. Are you sure this would work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we keeping this? I think it'll cause the service enable API to keep looping forever.
name: `projects/${projectId}/serviceAccounts/${serviceAccounts}` | ||
}).then(response => response.result, | ||
badResult => { | ||
throw new Error('Errors creating key for serviceAccount ${serviceAccounts}'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add the error here (even by JSON stringifying it) in the error message, so that the user at least has an idea what might've gone wrong, even if it's a big ugly error object.
@@ -47,6 +47,26 @@ export default class Gapi { | |||
|
|||
} | |||
|
|||
public static iam = class { | |||
|
|||
public static async createKey(projectId: string, serviceAccounts: string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 2 space indentation all over.
return user ? user.getAuthResponse().id_token : null; | ||
} | ||
|
||
public static async getClusterEndpoint(projectId: string, zone: string, clusterId: string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation.
path: `https://container.googleapis.com/v1/projects/${projectId}/zones/${zone}/clusters/${clusterId}` | ||
}).then(response => (response.result as any).endpoint, | ||
badResult => { | ||
throw new Error('Errors getting cluster endpoint: ' + flattenDeploymentOperationError(badResult.result)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this method would work on this error object structure, if you can verify that, we can use this method but let's rename it, maybe "flattenResponseError
" or something.
@@ -402,8 +404,43 @@ export default class DeployForm extends React.Component<any, DeployFormState> { | |||
}); | |||
}); | |||
|
|||
// Step 4: In cluster resource set up | |||
const endpoint = Gapi.getClusterEndpoint(project, this.state.zone, deploymentName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will work here. Cluster creation + kubeflow deployment will take several minutes, so we can't just issue this call right away, we'll need some experience for waiting.
If this code is here as a placeholder, can we either comment it out or gate it on a condition? I can work with it as I'm editing the waiting experience.
@kunmingg That won't work; we need the secret in order to be able to create the endpoint and get the signed certificate. In chrome if you try to navigate to a page with a self-signed cert it will give a warning and you can click through to accept it. is it not possible to do that in JS code? |
Found an workaround, now need another service: |
What kind of auth is currently used for the kubeflow cluster by default? If it's just basic username/password auth, then you don't want to open CORS for anyone to talk to the k8s master. |
@jlewi I don't know if that's doable, never come across this before. This seems like just a CORS issue though? If we add localhost and kubeflow.org to CORS whitelist origins, and use Bearer token for auth, should this take care of it? |
If you're planning to add the hosts to the origins whitelist, why do you need the proxy? The browser should just be able talk directly to the k8s api server. |
A proxy is highly undesirable. We don't want users to have to provide their credentials to a backend service they don't control. @yebrahim We are connecting to the K8s master e.g. And we authenticate using a bearer token in the authorization header. So its not localhost; its the ip of the K8s master. |
@yebrahim |
@kunmingg I was under the assumption you can specify the origin whitelist while deploying the k8s cluster, is that not correct? |
@yebrahim |
If this is the case, that's a huge bummer! Otherwise, seems like a server piece is unavoidable, @jlewi. The API server just won't accept requests coming from the browser without the CORS annotation. |
@kunmingg @yebrahim My suggestion is to not block this PR on the CORS issue. I believe its possible to disable CORS in the browser using various extensions. I think here's one for chrome So my suggestion is that we use such extensions as a workaround for now so we can continue to make progress. Thoughts? |
Which K8s resources? I think using DM to create a single resource e.g. the bootstrapper/click to deploy APP might be ok. Overall though I think using DM to create K8s resources leads to bad UX because its unclear if a user wants to update those resources whether to do so using DM or by talking directly to K8s ApiServer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, just one or two critical things, the rest is nits.
this._appendLine("Cluster endpoint not available yet.") | ||
}); | ||
if (!curStatus) { | ||
wait(getTimeout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to await
this.
this._generateServiceAccountSecret(project, deploymentName, 'admin')); | ||
k8sApi.createNamespacedSecret('kubeflow', | ||
this._generateServiceAccountSecret(project, deploymentName, 'user')); | ||
const saKey = await Gapi.iam.createKey( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use a more expressive name? maybe serviceAccountKey
?
} | ||
kubeflowUtil.properties.clusterType = this.state.deploymentName + '-type'; | ||
kubeflowUtil.properties.saKey = saKey.privateKeyData; | ||
kubeflowUtil.properties.clientId = window.btoa(this.state.clientId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you don't need the window.
bit, you can call btoa
directly.
return gapi.client.request({ | ||
method: 'POST', | ||
path: `https://iam.googleapis.com/v1/projects/${projectId}/serviceAccounts/${serviceAccounts}/keys`, | ||
}).then(response => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you can simplify this to one line: .then(response => response.result as CreateKeyResponse)
return response.result as CreateKeyResponse; | ||
}, | ||
badResult => { | ||
throw new Error('Errors creating key for serviceAccount ${serviceAccounts}: ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use backticks "`" for string formatting.
name: admin-gcp-sa | ||
type: Opaque | ||
data: | ||
admin-gcp-sa.json: {{ properties["saKey"] }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a smart idea; using DM to insert the service account key into the cluster. Unfortunately this means the secret key would be stored in DM and viewable by anyone with access to the project. I don't think we want that.
An alternative approach would be to just use the service account as the service account for the GKE nodes.
The downside of that approach would be that it means all pods on that node would run with elevated privileges which would be undesirable.
Its possible that we could use RBAC to restrict what can run on a specific node pool so that other processes couldn't run on that node pool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently our code assume those keys will be stored in k8s cluster, so anyone with project access can view from cluster secrets?
Another way is maybe we can expire those keys once we finish deploy?
Then user need to rely on their own credentials to access GCS / GCR / big query, which should be better for access control.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By using VM account for nodes, we should be able to avoid exposing secrets.
Handled in separate PR: #1302
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think putting them in DM might be an enhanced risk because K8s secrets are namespace scoped resource so not everyone might have access to the namespace. I'm not even sure project viewers do.
@kunmingg it looks like this PR is doing two things
Per our discussion today I think the plan is to have a go backend. So I think we can just have the go backend create the secrets and insert them in the cluster. The go backend can talk directly to the K8s master. |
Update: |
/retest |
add client id & secret fields; manage k8s resource through deployment manager;
@@ -25,6 +26,9 @@ import clusterJinjaPath from './configs/cluster.jinja'; | |||
// selects auto domain then we should automatically supply the suffix | |||
// <hostname>.endpoints.<Project>.cloud.goog | |||
|
|||
// Assume user access app via kubectl proxy | |||
const BackendAddress = 'http://127.0.0.1:8001/api/v1/namespaces/default/services/kubeflow-controller:8080/proxy/'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to make this assumption? Can we parameterize this based on how the user accessed the click to deploy app?
What if we had our go backend serve the click to deploy app? i.e. can we just add a handler to our go server that returns the javascript click to deploy app?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will make it a parameter in yaml spec.
Serving from go side might be even better, I have other PRs for backend change, let's pick it up there later.
Thanks. Can you update the PR description as well please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 6 files reviewed, 20 unresolved discussions (waiting on @yebrahim, @jlewi, @kunmingg, @gaocegege, and @ankushagarwal)
components/gcp-click-to-deploy/kf_app.yaml, line 1 at r3 (raw file):
apiVersion: v1
Add a comment explaining what this manifest is for.
components/gcp-click-to-deploy/kf_app.yaml, line 40 at r3 (raw file):
image: gcr.io/kubeflow-images-public/bootstrapper:latest workingDir: /opt/bootstrap command: [ "/opt/kubeflow/bootstrapper"]
nit put args into command; command should just be a list containing the binary to run and then the command line arguments.
components/gcp-click-to-deploy/kf_app.yaml, line 44 at r3 (raw file):
"--in-cluster", "--namespace=kubeflow", "--apply",
What is the --apply argument doing shouldn't we be invoking it in daemon mode?
components/gcp-click-to-deploy/src/DeployForm.tsx, line 408 at r1 (raw file):
Previously, yebrahim (Yasser Elsayed) wrote…
I don't think this will work here. Cluster creation + kubeflow deployment will take several minutes, so we can't just issue this call right away, we'll need some experience for waiting.
If this code is here as a placeholder, can we either comment it out or gate it on a condition? I can work with it as I'm editing the waiting experience.
Is this fixed?
components/gcp-click-to-deploy/src/DeployForm.tsx, line 30 at r3 (raw file):
interface DeployFormState {
Using an argument and specifying the YAML seems good to me.
Per our discussion I think all the logic for deploying things should now move into the go backend; but we can do that in a follow on PR. |
… yaml per user's need
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jlewi The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cancel |
…via backend service (kubeflow#1255) * insert service account keys to GKE cluster * handle review feedbacks; add client id & secret fields; manage k8s resource through deployment manager; * Refactor: From now on web app create k8s resources through backend service. * rebase * patch yaml spec missing pieces, make backend URL configurable through yaml per user's need
[gcp-deployer]
Web app will integrate with a go backend service.
All future api calls dealing with GCP/k8s resources management show go through backend.
Backend service update will come in separate PRs.
This change is