SageMaker on Flyte: TrainingJob for training with built-in algorithms and basic HPOJob support [Alpha] #120
Codecov Report
@@ Coverage Diff @@
## master #120 +/- ##
==========================================
+ Coverage 80.83% 80.86% +0.02%
==========================================
Files 219 225 +6
Lines 14313 14678 +365
Branches 1195 1205 +10
==========================================
+ Hits 11570 11869 +299
- Misses 2460 2526 +66
Partials 283 283
…hpojob's interface
+1 - will approve after the final version is in.
from flytekit.common.constants import SdkTaskType


class SdkSimpleTrainingJobTask(_sdk_task.SdkTask):
What does Simple mean?
Is this meant to be built-in?
Yes -- meaning it uses built-in algorithm mode, where users don't write their own decorated function.
This looks awesome, two major comments
Looks good to me, but we should get the plugin merged before this.
… and basic HPOJob support [Alpha] (flyteorg#120)
* adding trainingjob model and sagemaker task
* adding models for sagemaker proto messages
* add new line at eof
* adding common trainingjob task
* redo flytekit changes to comply with new interface and proto definition
* Fix a logic bug in training job model. Adding SdkSimpleTrainingJobTask type
* Add a comment
* Add SdkSimpleHPOJobTask
* Remove the embedding of the underlying trainingjob's output from the hpojob's interface
* fix a typo
* add new line at eof
* adding custom training job sdk type
* add code for tranlating an enum in hpo_job model; fix hpo_job_task sdk task
* missing a colon
* add the missing input stopping_condition for training job tasks
* bump flyteidl version
* bump to a beta version
* fixing unit tests
* fixing unit tests
* replacing interface types
* change
* fixed training job unit test
* fix hpo job task interface and hide task type from users
* fix hpo job task interface
* fix hpo models
* fix serialization of the underlying trainingjob of a hpo job
* Expose training job as a parameter
* Working!
* replacing hyphens with underscores
* updated
* bug fix
* Sagemaker nb
* Sagemaker HPO
* remove .demo directory
* register and launch standalone trainingjob task
* Merge
* adding unit test for SdkSimpleHPOJobTask
* fixing unit tests
* preventing installing numpy==1.19.0 which introduces a breaking change for unit tests
* fix semver
* make changes corresponding to flyteidl changes (renaming hpo to hyperparameter tuning)
* bump beta version
* Delete config.yaml
* make changes to reflect changes in flyteidl
* make task name consistent
* add missing properties for hyperparameter models
* add missing type hints and remove unused imports
* remove unused sdk sagemaker dir
* remove unused test file
* revert numpy semver
* remove type hints for self because CI is using python 3.6.3 while __future__.annotations requires python 3.7
* complete docstrings for hpo job task
* fix unit test
* adding input_file_type (wip)
* add input file type support
* add docs
* reflecting the renamed type and field
* reflecting remove of libsvm content type
* reflecting remove of libsvm content type
* Give metric_definitions a None as the default value because built-in algorithm does not allow custom metrics
* nix a print statement
* nix custom training job for the current release
* rename SdkSimpleTrainingJobTask to SdkBuiltinAlgorithmTrainingJobTask
* revert setup.py dependency
Co-authored-by: Yee Hing Tong <[email protected]>
Co-authored-by: Ketan Umare <[email protected]>
Co-authored-by: Haytham AbuelFutuh <[email protected]>
TL;DR
This PR adds the definitions needed for basic support of SageMaker TrainingJob and HPOJob (Hyperparameter Optimization).
Type
Are all requirements met?
Complete description
This PR adds basic support for users to invoke SageMaker TrainingJob (built-in algorithm mode) and SageMaker HPOJob from within Flyte. The following paragraphs demonstrate the supported use cases.
Defining a Simple Training Job
Users can leverage SageMaker's built-in algorithms without writing any function or logic of their own. They define an SdkBuiltinAlgorithmTrainingJobTask in Flytekit and supply the settings and the spec of the built-in algorithm.
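The original code sample for this section did not survive on this page. As a rough illustration only, the sketch below shows the shape of such a definition using hypothetical stand-in dataclasses. `AlgorithmSpec`, `ResourceConfig`, `BuiltinTrainingJobTask`, and all field names and values are assumptions made for this example, not the Flytekit API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlgorithmSpec:
    """Hypothetical stand-in: which built-in algorithm to run."""
    name: str
    version: str
    input_content_type: str


@dataclass(frozen=True)
class ResourceConfig:
    """Hypothetical stand-in: the instances the training job should use."""
    instance_type: str
    instance_count: int
    volume_size_in_gb: int


@dataclass(frozen=True)
class BuiltinTrainingJobTask:
    """Hypothetical stand-in for a built-in-algorithm training task.

    Note that no user-written function is attached: the task is fully
    described by its settings and its algorithm spec.
    """
    algorithm: AlgorithmSpec
    resources: ResourceConfig


# Illustrative values only; they are not taken from the PR.
xgboost_train_task = BuiltinTrainingJobTask(
    algorithm=AlgorithmSpec(
        name="xgboost", version="0.72", input_content_type="text/csv"
    ),
    resources=ResourceConfig(
        instance_type="ml.m4.xlarge", instance_count=1, volume_size_in_gb=25
    ),
)
```

The point of the sketch is that the task is pure configuration; the real SDK class may group or name these settings differently.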
Defining a simple Hyperparameter Tuning Job
SageMaker-on-Flyte also supports easy chaining between a TrainingJob task and an HPO job. After defining a TrainingJob task, users may want to apply hyperparameter tuning to the training while keeping the flexibility to run the TrainingJob task standalone. The SdkSimpleHPOJobTask in our SDK is meant to make this easy: it accepts the definition of a TrainingJob task as part of its spec.
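The composition described above can be sketched with hypothetical stand-ins. `TrainingJobTask`, `HPOJobTask`, and the two limit fields are assumptions for illustration, not the SDK's actual classes or parameters. The only idea taken from the description is that the tuning task holds a training task definition while the training task stays usable on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingJobTask:
    """Hypothetical stand-in for a training job task definition."""
    algorithm_name: str


@dataclass(frozen=True)
class HPOJobTask:
    """Hypothetical stand-in: a tuning task that wraps a training task."""
    training_job: TrainingJobTask
    max_number_of_training_jobs: int
    max_parallel_training_jobs: int

    def __post_init__(self):
        # A tuning job cannot run more jobs in parallel than it runs in total.
        if self.max_parallel_training_jobs > self.max_number_of_training_jobs:
            raise ValueError(
                "max_parallel_training_jobs exceeds max_number_of_training_jobs"
            )


# The training task is defined once ...
train_task = TrainingJobTask(algorithm_name="xgboost")

# ... and passed into the tuning task as part of its spec. `train_task`
# itself is unchanged and can still be launched standalone.
hpo_task = HPOJobTask(
    training_job=train_task,
    max_number_of_training_jobs=10,
    max_parallel_training_jobs=5,
)
```

Passing the training task in by reference, rather than redefining it inside the tuning task, is what keeps the standalone path available.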
Invoking Training Job Tasks and Hyperparameter Tuning Job Tasks
Invoking Training Job Tasks and HPO Job Tasks from inside a Flyte workflow is much the same as invoking other types of tasks: you supply the required inputs to the task definition. For Training Job Tasks and HPO Job Tasks, the required inputs are pre-defined inputs that we think are needed for every training job. That is why these inputs are not declared in the task definition; the SDK adds them for you.
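A minimal sketch of the "pre-defined inputs" idea follows, under stated assumptions. The commit log mentions a `stopping_condition` input for training job tasks; the other input names, the `REQUIRED_INPUTS` tuple, and the `check_inputs` helper are hypothetical and exist only for this example.

```python
# Inputs that the SDK, not the user, attaches to every training job task.
# Only `stopping_condition` is named in the PR's commit log; the rest are
# assumed names for illustration.
REQUIRED_INPUTS = (
    "train",
    "validation",
    "static_hyperparameters",
    "stopping_condition",
)


def check_inputs(supplied: dict) -> dict:
    """Return a copy of `supplied` after verifying no required input is missing."""
    missing = [name for name in REQUIRED_INPUTS if name not in supplied]
    if missing:
        raise ValueError("missing required inputs: " + ", ".join(missing))
    return dict(supplied)


# The caller supplies values at invocation time without ever having
# declared these inputs in the task definition.
inputs = check_inputs(
    {
        "train": "s3://example-bucket/train",
        "validation": "s3://example-bucket/validation",
        "static_hyperparameters": {"num_round": "100"},
        "stopping_condition": {"max_runtime_in_seconds": 3600},
    }
)
```

The bucket paths and hyperparameter values are placeholders, not values from the PR.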
Flyte's Single Task Execution capability also makes it easy to invoke a SimpleTrainingJobTask or a SimpleHPOJobTask. Users do not need a workflow to launch the SageMaker tasks; they can define the tasks, then register and launch them standalone, which enables fast iteration.
Tracking Issue
flyteorg/flyte#255
Follow-up issue
flyteorg/flyte#431