
API update #69

Open · geomodular opened this issue Jun 17, 2021 · 2 comments

@geomodular (Collaborator)

This issue reviews the current Wave ML API and proposes the next direction.

Motivation

ML libraries/APIs tend to take a different approach to creating and working with objects compared to Wave ML. Take H2O-3 as an example (to be clear, this is not specific to H2O-3):

from h2o.estimators import H2OSingularValueDecompositionEstimator

fit_h2o = H2OSingularValueDecompositionEstimator(export_checkpoints_dir=checkpoints_dir, seed=-5)
fit_h2o.train(x=list(range(4)), training_frame=arrests)

It instantiates an Estimator directly, i.e., it does not use a factory function. The idea is to follow a similar pattern so ML folks can stand on common ground.

Current status

Direct model instantiation is not available in Wave ML. To get a model, the build_model() function needs to be called, which returns a generic Model:

from h2o_wave_ml import build_model
...
m = build_model()
m.predict()

This function is great for several reasons:

  • The name implies functionality; it's clear what it is for.
  • It looks simple and readable in code.
  • The function can choose what model is appropriate for a given environment.

However, judging by how build_model() is actually used, users tend to specify DAI/H2O-3 anyway through the model_type argument, which largely defeats the purpose of the automatic model selection.
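
For illustration, a typical call today already spells out the backend explicitly. This is only a sketch: the train_file_path/target_column values are placeholders, and I'm assuming the ModelType enum and the parameter names as I recall them from the current API.

from h2o_wave_ml import ModelType, build_model

# The caller already knows which engine they want, so the factory's
# "pick the right engine for this environment" logic is rarely exercised.
model = build_model(
    train_file_path='./train.csv',  # placeholder path
    target_column='target',         # placeholder column name
    model_type=ModelType.DAI,       # backend chosen explicitly anyway
)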

Also, the function might not be easy to pick up anymore due to the large number of arguments (we have code examples to mitigate this).

Lastly, IMO, it packs too many operations behind the scenes. It might be better to split them up.

Solution

We can keep the build_model() function and refactor the code to support direct instantiation of models as well:

from h2o_wave_ml import DAIModel
...
m = DAIModel()
m.build()

This would also give us room to incorporate features such as direct client access (on the roadmap):

from h2o_wave_ml import DAIModel
...
m = DAIModel()
dai_client = m.client
dai_client.datasets.create()
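
A rough sketch of how such a client property could be wired internally, assuming the driverlessai package and hypothetical constructor arguments (this is for illustration only, not the actual implementation):

import driverlessai

class DAIModel:
    # Hypothetical shape of the proposed class, for illustration only.

    def __init__(self, address: str = '', username: str = '', password: str = ''):
        self._address = address
        self._username = username
        self._password = password
        self._client = None

    @property
    def client(self) -> driverlessai.Client:
        # Create the Driverless AI client lazily and cache it on first access.
        if self._client is None:
            self._client = driverlessai.Client(
                address=self._address,
                username=self._username,
                password=self._password,
            )
        return self._client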

Or to split up the training process and support predictions on a standalone DAI instance:

from h2o_wave_ml import DAIModel
...
m = DAIModel()
m.build()
predictions = m.predict()  # Predictions made on standalone DAI.
m.deploy()
predictions = m.predict()  # Predictions made on deployment.
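
To make the split explicit, the shared interface could look something like the sketch below. The method names come from the snippets above; the signatures and the MLOps remark are assumptions, not a spec:

from abc import ABC, abstractmethod
from typing import Any, List, Optional, Tuple


class Model(ABC):
    # Hypothetical common base for DAIModel and H2O3Model.

    @abstractmethod
    def build(self, train_file_path: str, target_column: str, **kwargs) -> None:
        """Train the model; for DAI this would launch an experiment."""

    @abstractmethod
    def deploy(self) -> None:
        """Deploy the trained model for serving (e.g., via MLOps)."""

    @abstractmethod
    def predict(self, data: Optional[List[List[Any]]] = None, **kwargs) -> List[Tuple]:
        """Score rows, either on the standalone engine or on the deployment."""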

This way, we might be able to integrate h2o_wave_ml.utils into the API more fluently. Since we are no longer working with a generic Model, we can pick and enable features for a particular model:

from h2o_wave_ml import DAIModel
...
m = DAIModel()
m.build()
file_path = m.save_autodoc()

Notes

  • This clearly is a refactoring task. It might take some time.
  • It would be good to do a quick POC to identify problems.
  • The refactor should not touch the existing API (see the sketch below).
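
A minimal sketch of how backwards compatibility could be preserved, assuming the new classes sketched above and a hypothetical _choose_model_for_environment() helper (names and parameters are illustrative, not the actual branch code):

from typing import Optional


def build_model(train_file_path: str, target_column: str,
                model_type: Optional[str] = None, **kwargs) -> 'Model':
    # Keep build_model() as a thin wrapper so existing code keeps working.
    if model_type == 'dai':
        model: 'Model' = DAIModel(**kwargs)
    elif model_type == 'h2o3':
        model = H2O3Model(**kwargs)
    else:
        # Fall back to the current behaviour: pick whatever the environment offers.
        model = _choose_model_for_environment(**kwargs)
    model.build(train_file_path, target_column)
    return model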

Thanks @vopani for the early discussion.

Comments are welcome.

cc @lo5 @mmalohlava @mtanco @tkmaker @AakashKumarNain

@geomodular (Collaborator, Author)

geomodular commented Jul 15, 2021

I have a working concept on a branch, refactor/issue-69, for anyone who wants to try this out. There is no API change that's incompatible with the current (v0.8.1) workflow.

What's added:

  • DAIModel and H2O3Model can be imported from h2o_wave_ml:

    from h2o_wave_ml import H2O3Model

  • It can be instantiated afterward and .build() can be invoked:

    m = H2O3Model()
    m.build()

  • If the model is DAI, the client is accessible and autodoc is available from within the object:

    m = DAIModel()
    client = m.get_dai_client()
    client.datasets.create()  # Do whatever.

    m = DAIModel()
    m.build()
    m.save_autodoc()

I hesitate to merge this for now, as I'm not convinced this is a step in the right direction.

@vopani (Contributor)

vopani commented Jul 21, 2021

I played a bit with this. I think it's great!
I find this structure far simpler and more intuitive to use, since it follows the basic structure of most ML modelling pipelines.

I also like this because it's very easy to pick up already existing code (say, code that uses an sklearn model) and just replace the model instantiation line with a Wave model.
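
A hedged illustration of that point, assuming the proposed H2O3Model and that .build()/.predict() accept a file path and target column (the exact arguments may differ):

from sklearn.ensemble import RandomForestClassifier
from h2o_wave_ml import H2O3Model

# Before: a typical sklearn snippet (X_train, y_train, X_test defined elsewhere).
clf = RandomForestClassifier()
clf.fit(X_train, y_train)
preds = clf.predict(X_test)

# After: only the model lines change; the rest of the pipeline stays the same.
m = H2O3Model()
m.build('./train.csv', target_column='target')  # assumed arguments
preds = m.predict('./test.csv')                 # assumed arguments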

It's also as easy and seamless to use across environments (cloud vs local) as before, so I see no issues with enabling this.
