Tekton tasks and pipelines to run notebooks and generate reports #613

Closed
jlewi opened this issue Feb 18, 2020 · 13 comments

@jlewi
Contributor

jlewi commented Feb 18, 2020

Kubeflow relies heavily on notebooks for reporting and e2e testing.

We'd like to make it easier to run notebooks and generate reports for them.

One way to do that would be to create reusable Tekton tasks to run notebooks (e.g. using papermill) and then upload an HTML version of the output to an object store like GCS or S3.

Then users could easily automate the running and generation of notebook reports just by adding tasks to a Tekton pipeline.

The task should be parameterized so that users only have to specify the parameters that define their notebook. Possible parameters:

Inputs:

  • Git repo containing the source notebook (using a Tekton Git resource)
  • Path within the repo to the notebook

Outputs:

  • Location where report should be published.
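
As a rough illustration of the parameters listed above, a Task manifest could be built as a plain Python dict and dumped to JSON, which `kubectl apply -f` accepts. This is only a sketch under assumptions: it uses the `tekton.dev/v1beta1` Task schema with a workspace standing in for the Git resource mentioned above, and the task name, parameter names, workspace name, and runner image are all hypothetical.

```python
import json


def notebook_task(name="run-notebook",
                  runner_image="example.com/notebook-runner:latest"):
    """Return a Tekton Task manifest (as a dict) that executes one notebook.

    All names here (task, params, workspace, image) are hypothetical.
    """
    return {
        "apiVersion": "tekton.dev/v1beta1",
        "kind": "Task",
        "metadata": {"name": name},
        "spec": {
            "params": [
                {"name": "notebook-path", "type": "string",
                 "description": "Path within the repo to the notebook"},
                {"name": "report-location", "type": "string",
                 "description": "Location where the report should be published"},
            ],
            # Assumption: the repo checkout is provided through a workspace.
            "workspaces": [
                {"name": "source",
                 "description": "Checkout of the Git repo containing the notebook"},
            ],
            "steps": [
                {
                    "name": "execute",
                    "image": runner_image,
                    "workingDir": "$(workspaces.source.path)",
                    # Tekton substitutes $(params.*) before the shell runs.
                    # Converting and publishing the report is left out here.
                    "script": (
                        "#!/bin/sh\n"
                        "set -e\n"
                        "papermill \"$(params.notebook-path)\" /tmp/output.ipynb\n"
                    ),
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(notebook_task(), indent=2))
```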

We should define a catalog of tasks inside kubeflow/testing, similar to https://github.com/tektoncd/catalog, and put the task there.

kubeflow/examples#735 provides an example of using papermill to execute a notebook, then running nbconvert to convert it to HTML, and then uploading the result to GCS. That could serve as a model of what a Tekton task might do.
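
The three steps described above (execute, convert, upload) can be sketched as a list of command lines. This is a minimal sketch and not the code from kubeflow/examples#735: the function names and file-naming scheme are hypothetical, and it assumes the `papermill`, `jupyter nbconvert`, and `gsutil` CLIs are on the path.

```python
import os
import subprocess


def report_commands(notebook, work_dir, report_url):
    """Build the commands to execute a notebook, render HTML, and upload it."""
    base = os.path.splitext(os.path.basename(notebook))[0]
    executed = os.path.join(work_dir, base + "-out.ipynb")
    html = os.path.join(work_dir, base + ".html")
    return [
        # 1. Execute the notebook and write the executed copy.
        ["papermill", notebook, executed],
        # 2. Render the executed notebook to <work_dir>/<base>.html.
        ["jupyter", "nbconvert", "--to", "html",
         "--output", base, "--output-dir", work_dir, executed],
        # 3. Upload the HTML report to the object store.
        ["gsutil", "cp", html, report_url],
    ]


def publish_report(notebook, work_dir, report_url):
    """Run each step in order, stopping at the first failure."""
    for cmd in report_commands(notebook, work_dir, report_url):
        subprocess.check_call(cmd)
```

Keeping command construction separate from execution makes the sequence easy to inspect or log before anything runs.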

As a follow-on issue, we might want to investigate using commuter to view these notebooks.

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
kind/feature 0.91


@jlewi
Contributor Author

jlewi commented Feb 18, 2020

/cc @gabrielwen

@jlewi
Contributor Author

jlewi commented Mar 12, 2020

I think @gabrielwen is working on this.

@gabrielwen any update on this?

@sarahmaddox
Contributor

/area gsoc

@StephennFernandes

I would like to work on this issue as part of GSoC 2020.

@11fenil11

@sarahmaddox I would like to contribute to this issue during my GSoC journey.

@jtfogarty

/priority p1

@jlewi
Contributor Author

jlewi commented Jun 9, 2020

#622 included some initial Tekton tasks and pipelines for running our E2E tests.
I'm currently trying to get those running against our auto-deployed blueprints per
GoogleCloudPlatform/kubeflow-distribution#42.

@issue-label-bot

Issue-Label Bot is automatically applying the labels:

Label Probability
area/engprod 0.88


@jlewi
Contributor Author

jlewi commented Jun 10, 2020

Here's the current notebook task

The use of init containers is a bit brittle and leads to passing around repo references and duplicating git logic best handled by Tekton.

I think a better approach is:

  • The Tekton pipeline (which runs on the test cluster) builds a Docker image that includes any source repos needed.

  • We then fire off the K8s job to run the notebook using this Docker image, so the K8s job doesn't need to use init containers or pull from git.
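
To illustrate the second point, here is a hedged sketch of what the resulting K8s job could look like once the source is baked into the image: a single container and no `initContainers`. The function name, job name, notebook path, and output path are hypothetical; only the `batch/v1` Job fields are standard Kubernetes.

```python
def notebook_job(name, image, notebook, namespace="default"):
    """Return a batch/v1 Job manifest that runs a notebook with papermill.

    Assumes `image` already contains the notebook source and papermill,
    so the pod needs no init containers and no git checkout.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Do not retry a failed notebook run.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "notebook",
                            "image": image,
                            "command": ["papermill", notebook, "/tmp/output.ipynb"],
                        }
                    ],
                }
            },
        },
    }
```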

jlewi pushed a commit to jlewi/examples that referenced this issue Jun 10, 2020
* Changes pulled in from kubeflow/examples#764

* Notebook tests should print a link to the stackdriver logs for
  the actual notebook job.

* Related to kubeflow/testing#613
k8s-ci-robot pushed a commit to kubeflow/examples that referenced this issue Jun 15, 2020
* Changes pulled in from kubeflow/examples#764

* Notebook tests should print a link to the stackdriver logs for
  the actual notebook job.

* Related to kubeflow/testing#613

Co-authored-by: Gabriel Wen
@jlewi
Contributor Author

jlewi commented Jun 18, 2020

/cc @rmgogogo @Bobgy

jlewi pushed a commit to jlewi/testing that referenced this issue Jun 19, 2020
* kubeflow#613 currently the way we run notebook tests is
  by firing off a K8s job on the KF cluster which runs the notebook.

  * The K8s job uses init containers to pull in source code and install
    dependencies like papermill.

  * This is a bit brittle.

* To fix this we will instead use Tekton to build a docker image that
  takes the notebook image and then adds the notebook code to it.

* Dockerfile.notebook_runner dockerfile to build the test image.

* Add tekton tasks to build the image.
jlewi pushed a commit to jlewi/testing that referenced this issue Jun 24, 2020
Notebook tests should build a docker image to run the notebook in.

* kubeflow#613 currently the way we run notebook tests is
  by firing off a K8s job on the KF cluster which runs the notebook.

  * The K8s job uses init containers to pull in source code and install
    dependencies like papermill.

  * This is a bit brittle.

* To fix this we will instead use Tekton to build a docker image that
  takes the notebook image and then adds the notebook code to it.

  * Dockerfile.notebook_runner dockerfile to build the test image.

The pipeline to run the notebook consists of two tasks

  1. A Tekton Task to build a docker image to run the notebook in

  1. A tekton task that fires off a K8s job to run the notebook on the Kubeflow cluster.

Here's a list of changes to make this work

* tekton_client should provide methods to upload artifacts but not parse
  junits

* Add a tekton_client method to construct the full image URL based on
  the digest returned from kaniko

* Copy over the code for running the notebook tests from kubeflow/examples
and start modifying it.

* Create a simple CLI to wait for nomos to sync resources to the cluster
  * This is used in some syntactic sugar make rules to aid the dev-test loop

The mnist test isn't completing successfully yet because GoogleCloudPlatform/kubeflow-distribution#61 means the KF
deployments don't have proper GSA's to write to GCS.

Related to: kubeflow#613
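
The commit message above mentions a tekton_client method that constructs the full image URL from the digest returned by kaniko. The actual method is not shown in this thread; the following is a minimal sketch of the idea, with a hypothetical function name, assuming the digest has the form `sha256:<hex>`. It uses `str.format` so that it also runs under Python 2.

```python
def image_with_digest(image, digest):
    """Return `repository@digest`, dropping any tag or digest on `image`."""
    if not digest.startswith("sha256:"):
        raise ValueError("unexpected digest: {0}".format(digest))
    # Drop an existing digest, if any.
    repo = image.split("@", 1)[0]
    # A tag is a colon after the last slash; a colon before it is a
    # registry port (e.g. localhost:5000/img) and must be kept.
    if repo.rfind(":") > repo.rfind("/"):
        repo = repo[:repo.rfind(":")]
    return "{0}@{1}".format(repo, digest)
```

Referencing the image by digest rather than by tag means the K8s job runs exactly the image that was just built, even if the tag is later moved.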
k8s-ci-robot pushed a commit that referenced this issue Jun 29, 2020
* Revamp how Tekton pipelines to run notebooks work.

Notebook tests should build a docker image to run the notebook in.

* #613 currently the way we run notebook tests is
  by firing off a K8s job on the KF cluster which runs the notebook.

  * The K8s job uses init containers to pull in source code and install
    dependencies like papermill.

  * This is a bit brittle.

* To fix this we will instead use Tekton to build a docker image that
  takes the notebook image and then adds the notebook code to it.

  * Dockerfile.notebook_runner dockerfile to build the test image.

The pipeline to run the notebook consists of two tasks

  1. A Tekton Task to build a docker image to run the notebook in

  1. A tekton task that fires off a K8s job to run the notebook on the Kubeflow cluster.

Here's a list of changes to make this work

* tekton_client should provide methods to upload artifacts but not parse
  junits

* Add a tekton_client method to construct the full image URL based on
  the digest returned from kaniko

* Copy over the code for running the notebook tests from kubeflow/examples
and start modifying it.

* Create a simple CLI to wait for nomos to sync resources to the cluster
  * This is used in some syntactic sugar make rules to aid the dev-test loop

The mnist test isn't completing successfully yet because GoogleCloudPlatform/kubeflow-distribution#61 means the KF
deployments don't have proper GSA's to write to GCS.

Related to: #613
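
The CLI that waits for nomos to sync, mentioned in the list above, is at its core a poll-until-true loop with a timeout. The sketch below (Python 3) shows only that loop; how sync status is actually checked is not described in this thread, so `is_synced` is a hypothetical callable supplied by the caller.

```python
import time


def wait_for(is_synced, timeout_seconds=300, poll_seconds=10,
             clock=time.monotonic, sleep=time.sleep):
    """Poll `is_synced()` until it returns True or the timeout expires.

    `clock` and `sleep` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout_seconds
    while True:
        if is_synced():
            return
        if clock() >= deadline:
            raise TimeoutError(
                "resources not synced after {0}s".format(timeout_seconds))
        sleep(poll_seconds)
```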

* tekton_client.py can't use format strings yet because we are still running under python2.

* Remove f-style strings.

* Fix typo.

* Address PR comments.

* copy-buckets should not abort on error, as this prevents artifacts
  from being copied and thus the results from showing up in testgrid;
  see #703
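
On the f-string removal noted in the list above: f-strings need Python 3.6 or later, while `str.format` works on both Python 2 and 3. A small illustration, with a hypothetical function name and message:

```python
def describe_upload(artifact, bucket):
    """Format a log message in a way that runs on both Python 2 and 3."""
    # Python 3.6+ only (a syntax error under Python 2):
    #   return f"Uploading {artifact} to {bucket}"
    return "Uploading {0} to {1}".format(artifact, bucket)
```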
@stale

stale bot commented Sep 17, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in one week if no further activity occurs. Thank you for your contributions.

@stale

stale bot commented Sep 26, 2020

This issue has been closed due to inactivity.

@stale stale bot closed this as completed Sep 26, 2020