[WIP] Kubeflow operator and API #52

Closed

inc0 commented Apr 2, 2018

There were multiple discussions regarding the CLI, and one solution to
address common use cases would be to create a full-fledged API endpoint.

This change is a rough first attempt to describe the API and architecture
of such a service.


ScorpioCPH (Member) left a comment

Thanks for this great work!

steps like uploading code or training a model. This control plane could then expose an API that can be leveraged by various programming languages, CLIs,
plugins, etc.

## Goals
Member

Any use case for this proposal?

Author

Added in the second PR.

gaocegege (Member) left a comment

Thanks for the proposal!

I have some questions about the implementation.

The central point of all this would be running an HTTP server + service that would provide an endpoint to the API. Spawning this service would be part of ksonnet apply.

* A storage backend will be required: whether it's PV-based or S3, the service will need a persistent file backend for things like code, logs, or trained model checkpoints.
* The user would specify the backend framework on spawn (multi-backend is TBD). Depending on which backend is chosen, the control-service would wrap the given framework operator and make decisions regarding tooling (like "if tensorflow then monitoring=tensorboard").
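
As a rough illustration of the backend-dependent tooling decision described in the excerpt above, here is a minimal Python sketch. It is not taken from the proposal: the names `BackendTooling`, `TOOLING_BY_BACKEND`, and `select_tooling` are hypothetical, the `"tf-operator"` value is only a placeholder label for the wrapped operator, and the single `tensorflow -> tensorboard` rule is the one example the text gives.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class BackendTooling:
    """Tooling choices the control-service would make for one backend (hypothetical)."""
    operator: str               # placeholder label for the wrapped framework operator
    monitoring: Optional[str]   # monitoring tool to spawn, if any


# Hypothetical registry. Only the tensorflow -> tensorboard rule comes from
# the proposal text; other backends would be added here once decided.
TOOLING_BY_BACKEND: Dict[str, BackendTooling] = {
    "tensorflow": BackendTooling(operator="tf-operator", monitoring="tensorboard"),
}


def select_tooling(backend: str) -> BackendTooling:
    """Return the tooling for the backend chosen on spawn, or fail loudly."""
    key = backend.strip().lower()
    if key not in TOOLING_BY_BACKEND:
        raise ValueError(f"unsupported backend: {backend!r}")
    return TOOLING_BY_BACKEND[key]


if __name__ == "__main__":
    print(select_tooling("tensorflow"))
```

Keeping the rules in a lookup table rather than in branching code would make it easier to see which decisions are framework-specific, which is also where a multi-backend setup would need to extend the design.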
Member

I have one question here:

  • As you know, we define different kinds of CRDs. How would the logic for specifying the framework be implemented? If we implement a new kf-operator, could you add its API to this proposal?

Author

Good question, I'll think about it. Also, since I failed at GitHub, let's use the new PR #62.

jlewi (Contributor) commented Apr 11, 2018

I know this is still WIP but I think making progress on defining common APIs that translate across different frameworks is going to be very difficult.

We already have an issue, kubeflow/kubeflow#102, that tries to define a common API for just a predict method, and that issue has basically gone nowhere. In particular, I don't see any consensus forming between the different frameworks (H2O.ai, CMLE, Pipeline.AI, Seldon) about the API for a predict method.

Getting those folks to agree on a common API just for Predict is already a major undertaking.

This proposal is going way beyond that. For example, arriving at a common understanding of what a model is is just as tricky. We already have a variety of formats, e.g.:

  • SavedModel format for TensorFlow
  • ONNX (used by a variety of frameworks)
  • Whatever scikit-learn models and Seldon use

I think it's too early to try to define common APIs because we don't have solid examples illustrating the different use cases. Most of our examples are TensorFlow. We don't even have an XGBoost example yet.

It's also not clear to me that the lack of common APIs is a pain point yet. For example, a common Predict API seems useful if you want to try different models without having to change your client.
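
To make the "try different models without changing your client" scenario concrete, here is a minimal Python sketch. It is purely illustrative and not an API proposed in this thread or in kubeflow/kubeflow#102: the `Predictor` protocol, the two toy model classes, and `run_client` are all hypothetical names, and the toy models stand in for real framework-specific implementations.

```python
from typing import List, Protocol, Sequence


class Predictor(Protocol):
    """Hypothetical common interface: one predict method shared by all models."""

    def predict(self, instances: Sequence[Sequence[float]]) -> List[float]:
        ...


class MeanModel:
    """Toy stand-in for a model served by one framework."""

    def predict(self, instances: Sequence[Sequence[float]]) -> List[float]:
        return [sum(row) / len(row) for row in instances]


class MaxModel:
    """Toy stand-in for a model served by a different framework."""

    def predict(self, instances: Sequence[Sequence[float]]) -> List[float]:
        return [max(row) for row in instances]


def run_client(model: Predictor, instances: Sequence[Sequence[float]]) -> List[float]:
    """Client code depends only on the shared interface, not on the model type."""
    return model.predict(instances)


if __name__ == "__main__":
    data = [[1.0, 3.0], [2.0, 6.0]]
    for model in (MeanModel(), MaxModel()):
        print(type(model).__name__, run_client(model, data))
```

The sketch only shows the client-side benefit. It says nothing about the harder part raised above, namely getting different frameworks to agree on the request and response shapes.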

My suggestion would be to create an E2E example that can be solved with a DL framework and a non-DL framework. For example, we could try predicting Zillow prices (kubeflow/examples#16) with XGBoost and with a wide model in TF. I think that would help us understand how much overlap there is between different frameworks.

Also I think an XGBoost example would be hugely impactful.

jlewi (Contributor) commented Apr 11, 2018

For model management, Katib / ModelDB look pretty sweet.

How much functionality does that give us? I'd love to see ModelDB incorporated into our E2E examples (kubeflow/examples#81) so that we can get first-hand experience of the value add and identify missing features.

inc0 (Author) commented Apr 11, 2018

Thanks, everyone, for the feedback! Let me incorporate it, add some more thoughts I got while creating a PoC for the CLI, and publish a new version of this.

