
API update #69

Open · geomodular opened this issue Jun 17, 2021 · 2 comments

@geomodular (Collaborator)

This issue reviews the current Wave ML API and proposes the next direction.

Motivation

ML libraries/APIs tend to take a different approach to creating and working with objects compared to Wave ML. Take H2O-3 as an example (to be clear, this is not specific to H2O-3):

from h2o.estimators import H2OSingularValueDecompositionEstimator

fit_h2o = H2OSingularValueDecompositionEstimator(export_checkpoints_dir=checkpoints_dir, seed=-5)
fit_h2o.train(x=list(range(4)), training_frame=arrests)

It instantiates an Estimator directly, i.e., it does not use a factory function. The idea is to follow a similar pattern so ML folks can stand on common ground.

Current status

Direct model instantiation is not available in Wave ML. To get a model, the build_model() function needs to be called, which returns a generic Model:

from h2o_wave_ml import build_model
...
m = build_model()
m.predict()

This function is great for several reasons:

  • The name implies functionality; it's clear what it is for.
  • It looks simple and readable in code.
  • The function can choose what model is appropriate for a given environment.

However, judging by how build_model() is actually used, users tend to specify DAI/H2O-3 anyway through the model_type argument, which largely defeats the purpose of the automatic model selection.
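
For illustration, a typical call today already spells out the backend explicitly. This is only a sketch: the train_file_path/target_column values are placeholders, and I'm assuming the ModelType enum and the parameter names as I recall them from the current API.

from h2o_wave_ml import ModelType, build_model

# The caller already knows which engine they want, so the factory's
# "pick the right engine for this environment" logic is rarely exercised.
model = build_model(
    train_file_path='./train.csv',  # placeholder path
    target_column='target',         # placeholder column name
    model_type=ModelType.DAI,       # backend chosen explicitly anyway
)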

Also, the function might not be easy to pick up anymore due to the large number of arguments (we have code examples to mitigate this).

Lastly, IMO, it packs too many operations behind the scenes. It might be better to split them up.

Solution

We can keep the build_model() function and refactor the code to support direct instantiation of models as well:

from h2o_wave_ml import DAIModel
...
m = DAIModel()
m.build()

This would also give us room to incorporate features such as direct client access (on the roadmap):

from h2o_wave_ml import DAIModel
...
m = DAIModel()
dai_client = m.client
dai_client.datasets.create()
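
A rough sketch of how such a client property could be wired internally, assuming the driverlessai package and hypothetical constructor arguments (this is for illustration only, not the actual implementation):

import driverlessai

class DAIModel:
    # Hypothetical shape of the proposed class, for illustration only.

    def __init__(self, address: str = '', username: str = '', password: str = ''):
        self._address = address
        self._username = username
        self._password = password
        self._client = None

    @property
    def client(self) -> driverlessai.Client:
        # Create the Driverless AI client lazily and cache it on first access.
        if self._client is None:
            self._client = driverlessai.Client(
                address=self._address,
                username=self._username,
                password=self._password,
            )
        return self._client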

Or to split up the training process and support predictions on a standalone DAI instance:

from h2o_wave_ml import DAIModel
...
m = DAIModel()
m.build()
predictions = m.predict()  # Predictions made on standalone DAI.
m.deploy()
predictions = m.predict()  # Predictions made on deployment.
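
To make the split explicit, the shared interface could look something like the sketch below. The method names come from the snippets above; the signatures and the MLOps remark are assumptions, not a spec:

from abc import ABC, abstractmethod
from typing import Any, List, Optional, Tuple


class Model(ABC):
    # Hypothetical common base for DAIModel and H2O3Model.

    @abstractmethod
    def build(self, train_file_path: str, target_column: str, **kwargs) -> None:
        """Train the model; for DAI this would launch an experiment."""

    @abstractmethod
    def deploy(self) -> None:
        """Deploy the trained model for serving (e.g., via MLOps)."""

    @abstractmethod
    def predict(self, data: Optional[List[List[Any]]] = None, **kwargs) -> List[Tuple]:
        """Score rows, either on the standalone engine or on the deployment."""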

This way, we might be able to integrate h2o_wave_ml.utils into the API more fluently. Since we are no longer working with a generic Model, we can pick and enable features for a particular model:

from h2o_wave_ml import DAIModel
...
m = DAIModel()
m.build()
file_path = m.save_autodoc()

Notes

  • This clearly is a refactoring task. It might take some time.
  • It would be good to do a quick POC to identify problems.
  • The refactor should not touch the existing API (see the sketch below).
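
A minimal sketch of how backwards compatibility could be preserved, assuming the new classes sketched above and a hypothetical _choose_model_for_environment() helper (names and parameters are illustrative, not the actual branch code):

from typing import Optional


def build_model(train_file_path: str, target_column: str,
                model_type: Optional[str] = None, **kwargs) -> 'Model':
    # Keep build_model() as a thin wrapper so existing code keeps working.
    if model_type == 'dai':
        model: 'Model' = DAIModel(**kwargs)
    elif model_type == 'h2o3':
        model = H2O3Model(**kwargs)
    else:
        # Fall back to the current behaviour: pick whatever the environment offers.
        model = _choose_model_for_environment(**kwargs)
    model.build(train_file_path, target_column)
    return model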

Thanks @vopani for the early discussion.

Comments are welcome.

cc @lo5 @mmalohlava @mtanco @tkmaker @AakashKumarNain

@geomodular (Collaborator, Author)

geomodular commented Jul 15, 2021

I have a working concept on a branch, refactor/issue-69, for anyone who wants to try this out. There is no API change that's incompatible with the current (v0.8.1) workflow.

What's added:

  • DAIModel and H2O3Model can be imported from h2o_wave_ml:

    from h2o_wave_ml import H2O3Model

  • It can be instantiated afterward and .build() can be invoked:

    m = H2O3Model()
    m.build()

  • If the model is DAI, the client is accessible and autodoc is available from within the object:

    m = DAIModel()
    client = m.get_dai_client()
    client.datasets.create()  # Do whatever.

    m = DAIModel()
    m.build()
    m.save_autodoc()

I hesitate to merge this for now, as I'm not convinced this is a step in the right direction.

@vopani (Contributor)

vopani commented Jul 21, 2021

I played a bit with this. I think it's great!
I find this structure far simpler and more intuitive to use, since it follows the basic structure of most ML modelling pipelines.

I also like this because it's very easy to pick up already existing code (say, code that uses an sklearn model) and just replace the model instantiation line with a Wave model.
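
A hedged illustration of that point, assuming the proposed H2O3Model and that .build()/.predict() accept a file path and target column (the exact arguments may differ):

from sklearn.ensemble import RandomForestClassifier
from h2o_wave_ml import H2O3Model

# Before: a typical sklearn snippet (X_train, y_train, X_test defined elsewhere).
clf = RandomForestClassifier()
clf.fit(X_train, y_train)
preds = clf.predict(X_test)

# After: only the model lines change; the rest of the pipeline stays the same.
m = H2O3Model()
m.build('./train.csv', target_column='target')  # assumed arguments
preds = m.predict('./test.csv')                 # assumed arguments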

It's also as easy and seamless to use across environments (cloud vs local) as before, so I see no issues with enabling this.
