SDK - Added support for raw input artifact argument values to ContainerOp #791

Ark-kun · 2019-02-07T07:30:11Z

Only constant string artifact arguments are supported for now. (no artifact passing yet.)

Usage:

def consumer_op(text):
    return dsl.ContainerOp(
        name='consumer',
        image='alpine',
        command=['cat', dsl.InputArtifactArgument(text)],
    )

or

def consumer_op(text):
    return dsl.ContainerOp(
        name='consumer',
        image='alpine',
        command=['cat', '/tmp/inputs/text/data'],
        input_artifact_arguments=[
            dsl.InputArtifactArgument(argument=text, path='/tmp/inputs/text/data'), # path is optional
        ],
    )

Full artifact passing DSL:

# OP producing artifacts (already available)
def producer_op(text):
    return dsl.ContainerOp(
        name='producer',
        image='alpine',
        command=['sh', '-c', 'echo ' + text + ' > /tmp/output.txt'],
        #output_artifact_paths={'text-artifact': '/tmp/output.txt'}, # Use file_outputs
        file_outputs={'text-artifact': '/tmp/output.txt'},
    )

# Op consuming artifacts (this PR)
def consumer_op(text_artifact):
    return dsl.ContainerOp(
        name='consumer',
        image='alpine',
        command=['cat', dsl.InputArtifactArgument(text)],
    )

# Pipeline with artifact passing (non in this PR)
@pipeline()
def my_pipeline():
    producer_task = producer_op('Hello world!')
    consumer_task = consumer_op(producer_task.outputs['text-artifact'])

This change is

neuromage · 2019-02-26T20:33:06Z

How does this relate to @hongye-sun's proposal btw?

Ark-kun · 2019-02-28T00:19:56Z

How does this relate to @hongye-sun's proposal btw?

This PR was added before @hongye-sun's doc was created. I created this PR to enable the "raw" artifact arguments (that Hongye wanted previously) in a way that's aligned with the future artifact passing support.

I think that this design follows the Argo way.

hongye-sun · 2019-02-28T19:48:17Z

Can we put this PR onhold as we haven't closed on the artifact interface yet.

hongye-sun · 2019-03-06T00:45:42Z

/hold

hongye-sun · 2019-05-16T23:34:14Z

sdk/python/tests/compiler/testdata/input_artifact_raw_value.py

+    text_input_path = '/inputs/text/data'
+    return dsl.ContainerOp(
+        name='component_with_input_artifact',
+        input_artifact_paths={'text': text_input_path},


Why are input_artifact_paths and input_artifact_arguments needed? Can we field for all input artifacts?

Why not expose full argo input_artifacts spec here?

Why not expose full argo input_artifacts spec here?

We are moving away from hard-coding any artifact storage information in the pipeline Argo YAML. This makes the pipelines more portable which is a common request from customers and will become more and more important.
Argo is moving in the same direction where all artifact repository information is configured in cluster-side configMaps.

Generally I do not want to expose more than we currently need, especially features that can be hard to support with different artifact passing systems (volumes, Airflow, Docker, etc)

Let's look at at Argo's spec for input artifacts: https://github.com/argoproj/argo/blob/master/pkg/apis/workflow/v1alpha1/types.go#L308

name - Supported

path - Supported

mode - Not desirable: Not requested at this moment. Hard to support on GCS, passive volumes etc.

from - N/A: Not supported for containers

ArtifactLocation - Not desirable to expose: We want to prevent hard-coding any artifact repository information at the component level. The same effect can be archived at the argument level instead.

globalName - Not desirable: Not requested at this moment. We do not want components to directly affect any global state.

ArchiveStrategy - Not desirable to expose: Too Argo-specific. Bad compatibility with other data storage systems.

optional - Might be desirable.

So, apart from name and path which I'm already including in this PR, there is only one additional field that can be desirable - optional.

Would you like me to replace
input_artifact_paths={'text': text_input_path},
with
input_artifacts=[InputArtifactSpec(name='text', path=text_input_path)],
?

I'll be happy to do so.

Could you explain why input_artifact_arguments is needed? Why not combine it with input_artifact_paths into one?

I still think that we should support full argo's artifact feature and ContainerOp is a low level API which is argo specific. I am fine to expose partial of the argo's feature for now, but I don't want to limit the interface to be too opinionated. Maybe we should call out a design meeting to discuss this.

Could you explain why input_artifact_arguments is needed? Why not combine it with input_artifact_paths into one?

As a side note: please try giving more actionable feedback. If you want something to be different, provide a mockup of how you want it to should look like and what is the behavior that you want.

They are two different concepts: "input declaration" and "argument passing".
Both of these concepts involve inputs/parameters, but these concepts are very different.
These two lines are not the same:

# This line has input declarations def my_func(param1: int = 3, param2 : str = 'bla'):

# This line has argument passing my_func(param1=10, param2='bar')

Also you can only declare function parameters once, but you can pass arguments to it many times.

Argo has the same distinction:
Please check the relevant Argo examples: https://github.com/argoproj/argo/blob/master/examples/artifact-passing.yaml

This is Argo input artifact definition:

inputs: artifacts: - name: message path: /tmp/message

This is Argo artifact argument passing:

template: print-message arguments: artifacts: - name: message from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"

These are two separate concepts, two separate parts and Argo does not mix them.

How can you even mix them when the same template is used multiple times? In this case there is a single input definition section and multiple argument passing sections. How can you merge them together?

P.S. Combining input definitions and artifact passing is invalid in Argo for valid and obvious reasons. This is invalid:

container: inputs: artifacts: - name: message path: /tmp/message from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"

Generally I prefer the feedback to be actionable.
Also It's nice to have small PRs, so it's good when acting upon feedback does not increase the scope and size of the PR. We had cases where reviewers' requests have changed a small and focused PR to a 5000+ lines monster. It would be really great if we can keep the PR scopes as small as possible.

I am confused by the interface, so I asked the question about the diff between input_artifact_arguments and input_artifact_paths and why do you need both? I also suggested to have a design discussion about the confusion here. I am not sure why you think it's non-actionable feedback.

Now I understand the diff from you last comment, but I still think it's unnecessary in DSL. Why do you need user to provide the name of the input artifact and duplicate it every time when they declare an input? The same reason applies why we don't ask user to declare each input parameters when they define the ContainerOp. For example, you can combine them into one field like following:

return dsl.ContainerOp( name='component_with_input_artifact', input_artifacts={ '/path/to/artifact1': artifact_value},

The name of the input artifact can be auto-generated. When converting to argo template, the compiler can take care of generating the template and argument parts separately.

For example, you can combine them into one field like following:

Thank you for actionable suggestion.

The name of the input artifact can be auto-generated.

Hmm. What about auto-generating the path instead?

input_artifacts={'artifact1': artifact_value},

How does that look to you?

Should the input_artifacts be a list of a dictionary?
Should we make some data type to describe the input artifact entry?
E.g.

input_artifacts=[InputArtifact(path='/path/to/artifact1', argument=artifact_value)],

The same reason applies why we don't ask user to declare each input parameters when they define the ContainerOp.

ContainerOp is a somewhat strange mixture of Argo's ContainerTemplate and DagTask. It has parts of both (although some parts are missing). This has already brought us some problems especially when Ning was implementing type checking. (To check argument types, you need to know the input types and for that you need inputs. Ning had to introduce new decorator and do some tricks just to recover the information about inputs.)

The name of the input artifact can be auto-generated. When converting to argo template, the compiler can take care of generating the template and argument parts separately.

This makes sense and I've implemented it. I even simplified it further (path and name can also be specified, but they're optional).

return dsl.ContainerOp( name='consumer', image='alpine', command=['cat', dsl.InputArtifactArgument("Hello world")], )

hongye-sun · 2019-05-16T23:37:17Z

Do you mind sharing your design about how the full artifact DSL API will look like?

Ark-kun · 2019-05-17T01:18:06Z

Do you mind sharing your design about how the full artifact DSL API will look like?

Sure. Here is the link demonstrating the whole DSL (producing, consuming, passing): #336 (comment) (Also included in this PR's description).
Passing artifacts uses the same syntax as passing parameters.

…of the constructor.

In some scenarios it's hard to provide the artifact arguments through the `command` list when it already has resolved artifact paths.

as opposed to default input values.

…usion

Also renamed input_artifact_arguments to artifact_argument_paths in the ContainerOp's constructor

getattr is too fragile and can be broken by renames.

Ark-kun · 2019-08-29T00:48:19Z

Talked with Hongye offline a while ago.
Our priorities have shifted and he's pretty busy having a lot of work on his hands, so it might not be the best use of his time to spend it on reviewing this PR. Fortunately, Hongye has agreed that someone else can review this PR if they have time.

I still consider this to be a step towards the important goal of supporting artifact passing, so maybe someone can help with the review.
/assign @numerology @ajchili
/hold cancel

Ark-kun · 2019-08-29T00:49:04Z

/cc @numerology @ajchili

Ark-kun · 2019-08-29T00:49:26Z

/assign @numerology @ajchili

ajchili

Small nit pick questions before an lgtm.

sdk/python/kfp/compiler/_op_to_template.py

sdk/python/kfp/dsl/_container_op.py

Added the test_input_path_placeholder_with_constant_argument test

ajchili · 2019-08-29T02:04:09Z

/lgtm

Ark-kun · 2019-08-29T03:38:51Z

/approve

k8s-ci-robot · 2019-08-29T03:38:56Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Ark-kun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~sdk/OWNERS~~ [Ark-kun]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…ubeflow#837) * handle DSL for loop item separator * handle compilation for separator * add tests for loop with separator * style: self -> cls in classmethods * fix: dsl+compile * update test results * style: remove unused import * style: blank lines * add license to tests * fix tests: no value passing of str loop src

* m: typo in a comment * remove separator param in loop spec * update test

Ark-kun added the area/sdk/dsl/compiler label Feb 7, 2019

Ark-kun assigned hongye-sun Feb 7, 2019

Ark-kun requested a review from hongye-sun February 7, 2019 07:30

k8s-ci-robot requested a review from gaoning777 February 7, 2019 07:30

k8s-ci-robot added size/L labels Feb 7, 2019

Ark-kun force-pushed the SDK---Added-support-for-raw-artifact-values-to-ContainerOp branch from f2db334 to 7a5c0d2 Compare February 7, 2019 07:34

Ark-kun removed the request for review from gaoning777 February 7, 2019 07:39

Ark-kun requested review from gaoning777, IronPan and neuromage February 26, 2019 19:42

k8s-ci-robot added the do-not-merge/hold label Mar 6, 2019

Ark-kun changed the title ~~SDK - Added support for raw artifact values to ContainerOp~~ SDK - Added support for raw input artifact argument values to ContainerOp Mar 22, 2019

Ark-kun mentioned this pull request May 14, 2019

Artifact representation in Pipeline #1003

Closed

Ark-kun force-pushed the SDK---Added-support-for-raw-artifact-values-to-ContainerOp branch 2 times, most recently from 0dd0c3e to 7bd74cf Compare May 14, 2019 22:55

Ark-kun assigned gaoning777 May 16, 2019

hongye-sun reviewed May 16, 2019

View reviewed changes

Ark-kun mentioned this pull request May 17, 2019

Support for Argo artifacts #336

Closed

Ark-kun mentioned this pull request Jul 25, 2019

Trouble Moving file from one pipeline to the next #1662

Closed

Ark-kun added 4 commits August 27, 2019 23:27

SDK - Added support for raw artifact values to ContainerOp

ea04348

ContainerOp now gets artifact artguments from command line instead …

6fa15fd

…of the constructor.

Added back input_artifact_arguments to the ContainerOp constructor.

c532b76

In some scenarios it's hard to provide the artifact arguments through the `command` list when it already has resolved artifact paths.

Exporting InputArtifactArgument from kfp.dsl

665aa20

Ark-kun added 3 commits August 27, 2019 23:56

Updated the sample

9aa5da7

Properly passing artifact arguments as task arguments

bc2d916

as opposed to default input values.

Renamed input_artifact_arguments to artifact_arguments to reduce conf…

6bd95bd

…usion

Ark-kun force-pushed the SDK---Added-support-for-raw-artifact-values-to-ContainerOp branch from 1af6b7c to 6bd95bd Compare August 28, 2019 07:55

Renamed InputArtifactArgument to InputArgumentPath

4ab43ec

Also renamed input_artifact_arguments to artifact_argument_paths in the ContainerOp's constructor

Ark-kun force-pushed the SDK---Added-support-for-raw-artifact-values-to-ContainerOp branch from 582b6cd to 4ab43ec Compare August 28, 2019 22:55

Replaced getattr with isinstance checks.

3d4829e

getattr is too fragile and can be broken by renames.

k8s-ci-robot removed the do-not-merge/hold label Aug 29, 2019

k8s-ci-robot requested review from ajchili and numerology August 29, 2019 00:49

k8s-ci-robot assigned ajchili and numerology Aug 29, 2019

ajchili reviewed Aug 29, 2019

View reviewed changes

sdk/python/kfp/compiler/_op_to_template.py Outdated Show resolved Hide resolved

sdk/python/kfp/compiler/_op_to_template.py Show resolved Hide resolved

sdk/python/kfp/dsl/_container_op.py Outdated Show resolved Hide resolved

Ark-kun added 2 commits August 28, 2019 18:22

Fixed the type annotations

16de4dc

Unlocked the input artifact support in components

4744e92

Added the test_input_path_placeholder_with_constant_argument test

k8s-ci-robot added the lgtm label Aug 29, 2019

k8s-ci-robot added the approved label Aug 29, 2019

k8s-ci-robot merged commit 0fc68bb into kubeflow:master Aug 29, 2019

HumairAK pushed a commit to red-hat-data-services/data-science-pipelines that referenced this pull request Mar 11, 2024

fix(sdk) separator param. Part of kubeflow#791 (kubeflow#842)

eab7b4c

* m: typo in a comment * remove separator param in loop spec * update test

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SDK - Added support for raw input artifact argument values to ContainerOp #791

SDK - Added support for raw input artifact argument values to ContainerOp #791

Ark-kun commented Feb 7, 2019 •

edited

Loading

neuromage commented Feb 26, 2019

Ark-kun commented Feb 28, 2019

hongye-sun commented Feb 28, 2019

hongye-sun commented Mar 6, 2019

hongye-sun May 16, 2019

hongye-sun May 16, 2019

Ark-kun May 17, 2019 •

edited

Loading

hongye-sun May 20, 2019

Ark-kun May 22, 2019 •

edited

Loading

Ark-kun May 22, 2019

hongye-sun May 22, 2019

Ark-kun May 23, 2019 •

edited

Loading

Ark-kun Jun 19, 2019

hongye-sun commented May 16, 2019

Ark-kun commented May 17, 2019 •

edited

Loading

Ark-kun commented Aug 29, 2019

Ark-kun commented Aug 29, 2019

Ark-kun commented Aug 29, 2019

ajchili left a comment

ajchili commented Aug 29, 2019

Ark-kun commented Aug 29, 2019

k8s-ci-robot commented Aug 29, 2019

SDK - Added support for raw input artifact argument values to ContainerOp #791

SDK - Added support for raw input artifact argument values to ContainerOp #791

Conversation

Ark-kun commented Feb 7, 2019 • edited Loading

neuromage commented Feb 26, 2019

Ark-kun commented Feb 28, 2019

hongye-sun commented Feb 28, 2019

hongye-sun commented Mar 6, 2019

hongye-sun May 16, 2019

Choose a reason for hiding this comment

hongye-sun May 16, 2019

Choose a reason for hiding this comment

Ark-kun May 17, 2019 • edited Loading

Choose a reason for hiding this comment

hongye-sun May 20, 2019

Choose a reason for hiding this comment

Ark-kun May 22, 2019 • edited Loading

Choose a reason for hiding this comment

Ark-kun May 22, 2019

Choose a reason for hiding this comment

hongye-sun May 22, 2019

Choose a reason for hiding this comment

Ark-kun May 23, 2019 • edited Loading

Choose a reason for hiding this comment

Ark-kun Jun 19, 2019

Choose a reason for hiding this comment

hongye-sun commented May 16, 2019

Ark-kun commented May 17, 2019 • edited Loading

Ark-kun commented Aug 29, 2019

Ark-kun commented Aug 29, 2019

Ark-kun commented Aug 29, 2019

ajchili left a comment

Choose a reason for hiding this comment

ajchili commented Aug 29, 2019

Ark-kun commented Aug 29, 2019

k8s-ci-robot commented Aug 29, 2019

Ark-kun commented Feb 7, 2019 •

edited

Loading

Ark-kun May 17, 2019 •

edited

Loading

Ark-kun May 22, 2019 •

edited

Loading

Ark-kun May 23, 2019 •

edited

Loading

Ark-kun commented May 17, 2019 •

edited

Loading