RHOAIENG-7327: Set certs on both creation and update of notebooks #373

Merged

harshad16
Member

@harshad16 harshad16 commented Jul 30, 2024

Description

Set certs on both creation and update of notebooks
Fixes: https://issues.redhat.com/browse/RHOAIENG-7327

How Has This Been Tested?

For testing, use an RHOAI operator instance:

  • Create a workbench in RHOAI 2.9 or later.

  • Once the workbench is created, check its details:
    Screenshot from 2024-08-12 01-56-52

  • Then configure trustedCABundle in the DSCI instance:

    spec:
      trustedCABundle:
        customCABundle: ''
        managementState: Managed

  • Once that is set, update the DSC instance:

    spec:
      components:
        workbenches:
          devFlags:
            manifests:
              - contextDir: components/odh-notebook-controller/config
                sourcePath: ''
                uri: 'https://github.com/opendatahub-io/kubeflow/tarball/pull/373/head'
              - contextDir: components/notebook-controller/config
                sourcePath: ''
                uri: 'https://github.com/opendatahub-io/kubeflow/tarball/pull/373/head'
              - contextDir: ''
                sourcePath: base
                uri: 'https://github.com/opendatahub-io/notebooks/tarball/v1.21.0'
          managementState: Managed

  • After the setup, toggle the workbench so it picks up the changes, either by changing the Notebook CR or by restarting it (toggle stop/start).
  • Notice the change:
    Screenshot from 2024-08-12 01-56-52

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.

@harshad16
Member Author

/hold

Work in progress.

@jstourac
Member

jstourac commented Jul 31, 2024

I haven't tested this in any way yet, but LGTM in general.

I'm trying to recall what were the reasons we didn't do it this way from the start? 🤔
Also, do we have any tracking issue for this? --- update: https://issues.redhat.com/browse/RHOAIENG-7327

err = CheckAndMountCACertBundle(ctx, w.Client, notebook, log)
if err != nil {
	return admission.Errored(http.StatusInternalServerError, err)
}
Member

@jstourac jstourac Aug 1, 2024

My first impression is that this is triggered only when the Notebook CR is changed somehow. But the original issue (https://issues.redhat.com/browse/RHOAIENG-7327) also talks about a plain workbench restart. Does this change cover that case as well?

Member

@jiridanek jiridanek Aug 9, 2024

Looks OK to me. Starting the notebook updates the CR, because starting/stopping (and apparently restarting!) is done by setting annotations on the CR:

if metav1.HasAnnotation(instance.ObjectMeta, "kubeflow-resource-stopped") {

(I did not know the CR supports restarting; that is not exposed in the dashboard UI.)
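
To make the reasoning in this thread concrete, here is a minimal sketch using only the standard library. It is not the controller's actual code: the `Operation` type, `isStopped`, and `mountsCABundle` are hypothetical stand-ins, and only the annotation name comes from the discussion. It illustrates why a start/stop toggle reaches a webhook that handles updates: the CR already exists, so changing its annotations is an update rather than a creation.

```go
package main

import "fmt"

// Operation is a hypothetical stand-in for the admission operation
// that a mutating webhook receives.
type Operation string

const (
	Create Operation = "CREATE"
	Update Operation = "UPDATE"
	Delete Operation = "DELETE"
)

// stoppedAnnotation is the annotation named in this thread.
const stoppedAnnotation = "kubeflow-resource-stopped"

// isStopped reports whether the annotations mark the notebook as stopped.
func isStopped(annotations map[string]string) bool {
	_, ok := annotations[stoppedAnnotation]
	return ok
}

// mountsCABundle (hypothetical) reports whether the cert bundle would be
// (re)applied for op. onUpdate=false models a create-only webhook;
// onUpdate=true models handling both creation and update.
func mountsCABundle(op Operation, onUpdate bool) bool {
	switch op {
	case Create:
		return true
	case Update:
		return onUpdate
	default:
		return false
	}
}

func main() {
	annotations := map[string]string{stoppedAnnotation: "2024-08-12T20:00:00Z"}
	fmt.Println("stopped before start:", isStopped(annotations))

	// Starting the workbench removes the annotation. The CR already exists,
	// so this change arrives as an UPDATE, not a CREATE.
	delete(annotations, stoppedAnnotation)
	fmt.Println("stopped after start:", isStopped(annotations))

	fmt.Println("create-only webhook handles restart:", mountsCABundle(Update, false))
	fmt.Println("create+update webhook handles restart:", mountsCABundle(Update, true))
}
```

Under these assumptions, a create-only webhook never sees the restart, while one that also handles updates does.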

Copy link

Correct. Currently I have an ose-cli cronjob running every evening that scales down the StatefulSets / stops the notebooks in all namespaces (because users forget, and we don't have long-running ones):

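# Timestamp used as the annotation value; -u keeps the trailing Z (UTC) accurate.
currentdatetimenotebook=$(date -u '+%Y-%m-%dT%H:%M:%SZ')
# Stop the notebook by adding the kubeflow-resource-stopped annotation.
oc patch notebook "$notebook" -n "$ds_ns" --type="json" -p="[{\"op\": \"add\", \"path\": \"/metadata/annotations/kubeflow-resource-stopped\", \"value\":\"$currentdatetimenotebook\"}]"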

I got that hint from the dashboard workbench slider GUI.

And yes, restarting is currently not part of the dashboard logic for notebooks, only start / stop; i.e. omitting or deleting kubeflow-resource-stopped starts the notebook.

https://github.com/search?q=repo%3Aopendatahub-io%2Fodh-dashboard%20%22kubeflow-resource-stopped%22&type=code

Member Author

Yes, correct. The goal is to update certs in both cases: a Notebook CR change or a restart (which is a start/stop toggle).
This takes care of all the scenarios.

@harshad16
Member Author

/unhold

openshift-ci bot removed the do-not-merge/hold label Aug 12, 2024
harshad16 changed the title from "Set certs on both creation and update of notebooks" to "RHOAIENG-7327: Set certs on both creation and update of notebooks" Aug 12, 2024
@harshad16
Member Author

This is ready for review.

I'm trying to recall what were the reasons we didn't do it this way from the start?

The reason is that when we started with #252, the focus was only on including the RHOAI global certs.
We didn't want to disrupt long-running notebooks on clusters that were not set up with global certs.
Self-signed certs were not part of the logic, only global certs, so the change could have caused issues for long-running notebooks.

Later, in #270, we combined both self-signed and global certs.
With that in place, we can now implement this change for long-running notebooks as well, since it covers all cases and would not disrupt any flow.

@jiridanek
Member

It's missing tests, as specified on the Jira ticket; do you want to create a new follow-up ticket to add the tests later?

@atheo89
Member

atheo89 commented Aug 14, 2024

/lgtm

@harshad16
Member Author

It's missing tests, as specified on the Jira ticket; do you want to create a new follow-up ticket to add the tests later?

I totally missed it, and missed updating this PR.
The new test is now included, and I also updated the older test with some corrections.

Result:

Screenshot from 2024-08-22 04-08-05
Screenshot from 2024-08-22 04-07-54

Member

@jiridanek jiridanek left a comment

A few suggestions about the test documentation strings.

Signed-off-by: Harshad Reddy Nalla <[email protected]>
Co-authored-by: Jiri Daněk <[email protected]>
@harshad16
Member Author

Ready for review again.

@jiridanek
Member

/lgtm

@harshad16
Member Author

/approve

Thanks for the review.


openshift-ci bot commented Aug 26, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: harshad16

@openshift-merge-bot openshift-merge-bot bot merged commit 5d51c2b into opendatahub-io:v1.7-branch Aug 26, 2024
12 checks passed
@harshad16
Member Author

/cherrypick stable

@openshift-cherrypick-robot

@harshad16: #373 failed to apply on top of branch "stable":

Applying: Set certs on both creation and update of notebooks
Using index info to reconstruct a base tree...
M	components/odh-notebook-controller/controllers/notebook_webhook.go
Falling back to patching base and 3-way merge...
Auto-merging components/odh-notebook-controller/controllers/notebook_webhook.go
Applying: Included Test case, testing certs update on update of notebook
Using index info to reconstruct a base tree...
M	components/odh-notebook-controller/controllers/notebook_controller_test.go
Falling back to patching base and 3-way merge...
Auto-merging components/odh-notebook-controller/controllers/notebook_controller_test.go
CONFLICT (content): Merge conflict in components/odh-notebook-controller/controllers/notebook_controller_test.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 Included Test case, testing certs update on update of notebook
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherrypick stable

6 participants