SageMaker on Flyte: TrainingJob for training with built-in algorithms and basic HPOJob support [Alpha] #120

Merged on Jul 31, 2020 (88 commits). Showing changes from 1 commit.
Commits
f80f9e7
adding trainingjob model and sagemaker task
bnsblue Jun 3, 2020
db31d1c
adding models for sagemaker proto messages
bnsblue Jun 4, 2020
55420b8
add new line at eof
bnsblue Jun 4, 2020
0edef9c
adding common trainingjob task
bnsblue Jun 9, 2020
0e98520
redo flytekit changes to comply with new interface and proto definition
bnsblue Jun 22, 2020
5fa5d40
Fix a logic bug in training job model. Adding SdkSimpleTrainingJobTas…
bnsblue Jun 22, 2020
d588e3e
Add a comment
bnsblue Jun 22, 2020
54c5d2e
Add SdkSimpleHPOJobTask
bnsblue Jun 22, 2020
27a9ba4
Remove the embedding of the underlying trainingjob's output from the …
bnsblue Jun 22, 2020
e40e033
fix a typo
bnsblue Jun 22, 2020
56ba534
add new line at eof
bnsblue Jun 22, 2020
a38657d
adding custom training job sdk type
bnsblue Jun 22, 2020
a081fa0
add code for translating an enum in hpo_job model; fix hpo_job_task sd…
bnsblue Jun 23, 2020
e38674c
missing a colon
bnsblue Jun 23, 2020
18172ea
add the missing input stopping_condition for training job tasks
bnsblue Jun 24, 2020
f08be91
bump flyteidl version
bnsblue Jun 24, 2020
1df9b63
bump to a beta version
bnsblue Jun 25, 2020
635f19c
merge with master and bump version accordingly
bnsblue Jun 25, 2020
7e8ff64
fixing unit tests
bnsblue Jun 25, 2020
643708a
fixing unit tests
bnsblue Jun 25, 2020
fcb3dee
replacing interface types
bnsblue Jun 25, 2020
669ff39
change
Jun 25, 2020
186e977
fixed training job unit test
bnsblue Jun 25, 2020
f65d97f
fix hpo job task interface and hide task type from users
bnsblue Jun 25, 2020
babefc1
Merge branch 'add-sagemaker-trainingjob-hpojob' of github.com:lyft/fl…
bnsblue Jun 25, 2020
f0d37cd
fix hpo job task interface
bnsblue Jun 25, 2020
20b9809
fix hpo models
bnsblue Jun 25, 2020
008cf38
fix serialization of the underlying trainingjob of a hpo job
bnsblue Jun 25, 2020
2f274d6
Expose training job as a parameter
Jun 25, 2020
ac15490
Working!
Jun 25, 2020
a7d4cdd
replacing hyphens with underscores
bnsblue Jun 26, 2020
43a42a0
updated
Jun 26, 2020
5204e9d
bug fix
Jun 26, 2020
aff9055
Sagemaker nb
Jun 26, 2020
4f3a3c1
Sagemaker HPO
Jun 29, 2020
edfd2a6
remove .demo directory
Jun 29, 2020
b0bbca2
Merge branch 'master' into add-sagemaker-trainingjob-hpojob
Jun 30, 2020
ba40a9b
Merge branch 'master' into add-sagemaker-trainingjob-hpojob
Jun 30, 2020
345e057
register and launch standalone trainingjob task
bnsblue Jul 9, 2020
9d5f243
Merge
EngHabu Jul 9, 2020
5ba0483
Complete the examples in sagemaker-hpo notebook and add text descript…
bnsblue Jul 14, 2020
acab1a9
update notebook
bnsblue Jul 14, 2020
70a4954
all hands demo notebook added
bnsblue Jul 15, 2020
7f28f26
update a notebook
bnsblue Jul 15, 2020
aaa2056
update the demo notebook
bnsblue Jul 16, 2020
044f7b5
adding unit test for SdkSimpleHPOJobTask
bnsblue Jul 21, 2020
e45ccf7
failing the unit test
bnsblue Jul 22, 2020
1aa3a7c
failing the unit test
bnsblue Jul 22, 2020
4e8f923
wip for custom training job
bnsblue Jul 23, 2020
e82e242
Revert "wip for custom training job"
bnsblue Jul 23, 2020
c725a1d
Revert "failing the unit test"
bnsblue Jul 23, 2020
dcc33cd
Revert "failing the unit test"
bnsblue Jul 23, 2020
9982aa7
fixing unit tests
bnsblue Jul 23, 2020
70472ac
bump minor version
bnsblue Jul 23, 2020
a081557
preventing installing numpy==1.19.0 which introduces a breaking chang…
bnsblue Jul 23, 2020
6f5003f
fix semver
bnsblue Jul 23, 2020
adcc3b5
Merge
EngHabu Jul 24, 2020
a463f3f
make changes corresponding to flyteidl changes (renaming hpo to hyper…
bnsblue Jul 24, 2020
58b1d5e
bump beta version
bnsblue Jul 24, 2020
c32045b
Merge branch 'add-sagemaker-trainingjob-hpojob' of github.com:lyft/fl…
bnsblue Jul 24, 2020
6d14fe6
Delete config.yaml
EngHabu Jul 24, 2020
6f29de6
sagemaker-hpo notebook update
bnsblue Jul 24, 2020
f659a47
Merge branch 'add-sagemaker-trainingjob-hpojob' of github.com:lyft/fl…
bnsblue Jul 24, 2020
06c254e
make changes to reflect changes in flyteidl
bnsblue Jul 27, 2020
da9fc56
make task name consistent
bnsblue Jul 28, 2020
cd70862
add missing properties for hyperparameter models
bnsblue Jul 28, 2020
998aac0
add missing type hints and remove unused imports
bnsblue Jul 28, 2020
585d5ab
remove unused sdk sagemaker dir
bnsblue Jul 28, 2020
f77d7f1
remove unused test file
bnsblue Jul 28, 2020
5b912bb
revert numpy semver
bnsblue Jul 28, 2020
02800df
removing notebooks
bnsblue Jul 28, 2020
16867fb
remove type hints for self because CI is using python 3.6.3 while __f…
bnsblue Jul 28, 2020
7c6461e
complete docstrings for hpo job task
bnsblue Jul 28, 2020
5429d3d
merging with master and resolve conflict
bnsblue Jul 28, 2020
e6345ae
fix unit test
bnsblue Jul 28, 2020
d4008d8
adding input_file_type (wip)
bnsblue Jul 29, 2020
af64b52
add input file type support
bnsblue Jul 29, 2020
363beff
add docs
bnsblue Jul 29, 2020
21f3a5d
reflecting the renamed type and field
bnsblue Jul 30, 2020
8b0c1f4
reflecting remove of libsvm content type
bnsblue Jul 30, 2020
d09f37f
reflecting remove of libsvm content type
bnsblue Jul 30, 2020
76b2f0f
Give metric_definitions a None as the default value because built-in …
bnsblue Jul 31, 2020
0eee0b3
nix a print statement
bnsblue Jul 31, 2020
84a6972
nix custom training job for the current release
bnsblue Jul 31, 2020
4cd76d3
rename SdkSimpleTrainingJobTask to SdkBuiltinAlgorithmTrainingJobTask
bnsblue Jul 31, 2020
050f50c
merge with master and bump minor version
bnsblue Jul 31, 2020
62e6af6
revert setup.py dependency
bnsblue Jul 31, 2020
7dbac57
add back existing notebooks
bnsblue Jul 31, 2020
11 changes: 8 additions & 3 deletions flytekit/models/sagemaker/training_job.py
@@ -163,15 +163,15 @@ def __init__(
         self,
         algorithm_name: int,
         algorithm_version: str,
-        metric_definitions: List[MetricDefinition],
         input_mode: int,
+        metric_definitions: List[MetricDefinition] = None,
         input_content_type: int = InputContentType.TEXT_CSV,
     ):
         self._input_mode = input_mode
         self._input_content_type = input_content_type
         self._algorithm_name = algorithm_name
         self._algorithm_version = algorithm_version
-        self._metric_definitions = metric_definitions
+        self._metric_definitions = metric_definitions or []

     @property
     def input_mode(self) -> int:
@@ -213,7 +213,12 @@ def metric_definitions(self) -> List[MetricDefinition]:
         """
         A list of metric definitions for SageMaker to evaluate/track on the progress of the training job
         See this: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AlgorithmSpecification.html
-        and this: https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-define-metrics.html
+
+        Note that, when you use one of the Amazon SageMaker built-in algorithms, you cannot define custom metrics.
+        If you are doing hyperparameter tuning, built-in algorithms automatically send metrics to hyperparameter tuning.
+        When using hyperparameter tuning, you do need to choose one of the metrics that the built-in algorithm emits as
+        the objective metric for the tuning job.
+        See this: https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-define-metrics.html
         :rtype: List[MetricDefinition]
         """
         return self._metric_definitions
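The constructor change above makes `metric_definitions` optional and normalizes a missing value to an empty list. A minimal sketch of that pattern follows. `MetricDefinition` and `AlgorithmSpec` here are simplified, hypothetical stand-ins written for illustration, not the flytekit classes themselves.

```python
from typing import List, Optional


class MetricDefinition:
    """Hypothetical stand-in for the MetricDefinition model."""

    def __init__(self, name: str, regex: str):
        self.name = name
        self.regex = regex


class AlgorithmSpec:
    """Simplified, hypothetical version of the model touched by this diff."""

    def __init__(
        self,
        algorithm_name: str,
        metric_definitions: Optional[List[MetricDefinition]] = None,
    ):
        self._algorithm_name = algorithm_name
        # None (the default) becomes a fresh empty list for each instance,
        # so callers using a built-in algorithm can omit the argument.
        self._metric_definitions = metric_definitions or []

    @property
    def metric_definitions(self) -> List[MetricDefinition]:
        return self._metric_definitions


# Built-in algorithm: no custom metrics are passed.
builtin = AlgorithmSpec("xgboost")
other = AlgorithmSpec("xgboost")

# A caller that does supply a metric definition.
custom = AlgorithmSpec(
    "xgboost",
    [MetricDefinition(name="Validation error", regex="validation:error")],
)
```

Using `None` as the default and building the list inside `__init__` avoids sharing one mutable default list across instances, which a literal `= []` default would do.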
8 changes: 6 additions & 2 deletions tests/flytekit/unit/sdk/tasks/test_sagemaker_tasks.py
@@ -53,7 +53,6 @@
         input_content_type=InputContentType.TEXT_CSV,
         algorithm_name=AlgorithmName.XGBOOST,
         algorithm_version="0.72",
-        metric_definitions=[MetricDefinition(name="Validation error", regex="validation:error")]
     ),
 )

@@ -96,6 +95,8 @@ def test_simple_training_job_task():
     assert simple_training_job_task.metadata.discoverable is False
     assert simple_training_job_task.metadata.discovery_version == ''
     assert simple_training_job_task.metadata.retries.retries == 0
+    assert "metricDefinitions" not in simple_training_job_task.custom["algorithmSpecification"].keys()
+
     ParseDict(simple_training_job_task.custom['trainingJobResourceConfig'],
               _pb2_TrainingJobResourceConfig)  # fails the test if it cannot be parsed

@@ -129,6 +130,8 @@ def test_simple_training_job_task():


 def test_simple_hpo_job_task():
+    print(simple_xgboost_hpo_job_task.custom["trainingJob"])
+
     assert isinstance(simple_xgboost_hpo_job_task, SdkSimpleHyperparameterTuningJobTask)
     assert isinstance(simple_xgboost_hpo_job_task, _sdk_task.SdkTask)
     # Checking if the input of the underlying SdkTrainingJobTask has been embedded
@@ -176,7 +179,8 @@ def test_simple_hpo_job_task():
     assert simple_xgboost_hpo_job_task.metadata.retries.retries == 2

     assert simple_xgboost_hpo_job_task.metadata.deprecated_error_message == ''
-
+    assert "metricDefinitions" in simple_xgboost_hpo_job_task.custom["trainingJob"]["algorithmSpecification"].keys()
+    assert len(simple_xgboost_hpo_job_task.custom["trainingJob"]["algorithmSpecification"]["metricDefinitions"]) == 1
     """ These are attributes for SdkRunnable. We will need these when supporting CustomTrainingJobTask and CustomHPOJobTask
     assert simple_xgboost_hpo_job_task.task_module == __name__
     assert simple_xgboost_hpo_job_task._get_container_definition().args[0] == 'pyflyte-execute'
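The new assertions check that `metricDefinitions` is absent from the serialized spec when no metrics were given and present with one entry when one was supplied. The sketch below imitates that behavior in plain Python, under the assumption that the `custom` dict comes from a proto3-style JSON conversion that emits camelCase keys and omits empty repeated fields. The helpers `to_camel` and `spec_to_dict` are hypothetical names for illustration and are not part of flytekit.

```python
from typing import Any, Dict, List


def to_camel(name: str) -> str:
    """Convert a snake_case field name to lowerCamelCase."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def spec_to_dict(
    algorithm_name: str,
    algorithm_version: str,
    metric_definitions: List[Dict[str, str]],
) -> Dict[str, Any]:
    """Serialize a spec, dropping empty lists the way proto3 JSON output does by default."""
    fields = {
        "algorithm_name": algorithm_name,
        "algorithm_version": algorithm_version,
        "metric_definitions": metric_definitions,
    }
    return {
        to_camel(key): value
        for key, value in fields.items()
        # Assumption: an empty repeated field is left out of the output entirely.
        if not (isinstance(value, list) and len(value) == 0)
    }


# No metric definitions: the key does not appear at all.
without_metrics = spec_to_dict("XGBOOST", "0.72", [])

# One metric definition: the key appears with a single entry.
with_metrics = spec_to_dict(
    "XGBOOST",
    "0.72",
    [{"name": "Validation error", "regex": "validation:error"}],
)
```

Under that assumption, a test can tell "no metrics configured" apart from "metrics configured" by key presence alone, which is what the added assertions rely on.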