-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SDK - Added support for raw input artifact argument values to ContainerOp #791
SDK - Added support for raw input artifact argument values to ContainerOp #791
Conversation
f2db334
to
7a5c0d2
Compare
How does this relate to @hongye-sun's proposal btw? |
This PR was added before @hongye-sun's doc was created. I created this PR to enable the "raw" artifact arguments (that Hongye wanted previously) in a way that's aligned with the future artifact passing support. I think that this design follows the Argo way. |
Can we put this PR onhold as we haven't closed on the artifact interface yet. |
/hold |
0dd0c3e
to
7bd74cf
Compare
text_input_path = '/inputs/text/data' | ||
return dsl.ContainerOp( | ||
name='component_with_input_artifact', | ||
input_artifact_paths={'text': text_input_path}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are input_artifact_paths and input_artifact_arguments needed? Can we field for all input artifacts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not expose full argo input_artifacts spec here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not expose full argo input_artifacts spec here?
We are moving away from hard-coding any artifact storage information in the pipeline Argo YAML. This makes the pipelines more portable which is a common request from customers and will become more and more important.
Argo is moving in the same direction where all artifact repository information is configured in cluster-side configMaps.
Generally I do not want to expose more than we currently need, especially features that can be hard to support with different artifact passing systems (volumes, Airflow, Docker, etc)
Let's look at at Argo's spec for input artifacts: https://github.com/argoproj/argo/blob/master/pkg/apis/workflow/v1alpha1/types.go#L308
- name - Supported
- path - Supported
- mode - Not desirable: Not requested at this moment. Hard to support on GCS, passive volumes etc.
- from - N/A: Not supported for containers
- ArtifactLocation - Not desirable to expose: We want to prevent hard-coding any artifact repository information at the component level. The same effect can be archived at the argument level instead.
- globalName - Not desirable: Not requested at this moment. We do not want components to directly affect any global state.
- ArchiveStrategy - Not desirable to expose: Too Argo-specific. Bad compatibility with other data storage systems.
- optional - Might be desirable.
So, apart from name
and path
which I'm already including in this PR, there is only one additional field that can be desirable - optional
.
Would you like me to replace
input_artifact_paths={'text': text_input_path},
with
input_artifacts=[InputArtifactSpec(name='text', path=text_input_path)],
?
I'll be happy to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why input_artifact_arguments is needed? Why not combine it with input_artifact_paths into one?
I still think that we should support full argo's artifact feature and ContainerOp is a low level API which is argo specific. I am fine to expose partial of the argo's feature for now, but I don't want to limit the interface to be too opinionated. Maybe we should call out a design meeting to discuss this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why input_artifact_arguments is needed? Why not combine it with input_artifact_paths into one?
As a side note: please try giving more actionable feedback. If you want something to be different, provide a mockup of how you want it to should look like and what is the behavior that you want.
They are two different concepts: "input declaration" and "argument passing".
Both of these concepts involve inputs/parameters, but these concepts are very different.
These two lines are not the same:
# This line has input declarations
def my_func(param1: int = 3, param2 : str = 'bla'):
# This line has argument passing
my_func(param1=10, param2='bar')
Also you can only declare function parameters once, but you can pass arguments to it many times.
Argo has the same distinction:
Please check the relevant Argo examples: https://github.com/argoproj/argo/blob/master/examples/artifact-passing.yaml
This is Argo input artifact definition:
inputs:
artifacts:
- name: message
path: /tmp/message
This is Argo artifact argument passing:
template: print-message
arguments:
artifacts:
- name: message
from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"
These are two separate concepts, two separate parts and Argo does not mix them.
How can you even mix them when the same template is used multiple times? In this case there is a single input definition section and multiple argument passing sections. How can you merge them together?
P.S. Combining input definitions and artifact passing is invalid in Argo for valid and obvious reasons. This is invalid:
container:
inputs:
artifacts:
- name: message
path: /tmp/message
from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I prefer the feedback to be actionable.
Also It's nice to have small PRs, so it's good when acting upon feedback does not increase the scope and size of the PR. We had cases where reviewers' requests have changed a small and focused PR to a 5000+ lines monster. It would be really great if we can keep the PR scopes as small as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused by the interface, so I asked the question about the diff between input_artifact_arguments and input_artifact_paths and why do you need both? I also suggested to have a design discussion about the confusion here. I am not sure why you think it's non-actionable feedback.
Now I understand the diff from you last comment, but I still think it's unnecessary in DSL. Why do you need user to provide the name of the input artifact and duplicate it every time when they declare an input? The same reason applies why we don't ask user to declare each input parameters when they define the ContainerOp. For example, you can combine them into one field like following:
return dsl.ContainerOp(
name='component_with_input_artifact',
input_artifacts={ '/path/to/artifact1': artifact_value},
The name of the input artifact can be auto-generated. When converting to argo template, the compiler can take care of generating the template and argument parts separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, you can combine them into one field like following:
Thank you for actionable suggestion.
The name of the input artifact can be auto-generated.
Hmm. What about auto-generating the path instead?
input_artifacts={'artifact1': artifact_value},
How does that look to you?
Should the input_artifacts
be a list of a dictionary?
Should we make some data type to describe the input artifact entry?
E.g.
input_artifacts=[InputArtifact(path='/path/to/artifact1', argument=artifact_value)],
The same reason applies why we don't ask user to declare each input parameters when they define the ContainerOp.
ContainerOp
is a somewhat strange mixture of Argo's ContainerTemplate
and DagTask
. It has parts of both (although some parts are missing). This has already brought us some problems especially when Ning was implementing type checking. (To check argument types, you need to know the input types and for that you need inputs. Ning had to introduce new decorator and do some tricks just to recover the information about inputs.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of the input artifact can be auto-generated. When converting to argo template, the compiler can take care of generating the template and argument parts separately.
This makes sense and I've implemented it. I even simplified it further (path and name can also be specified, but they're optional).
return dsl.ContainerOp(
name='consumer',
image='alpine',
command=['cat', dsl.InputArtifactArgument("Hello world")],
)
Do you mind sharing your design about how the full artifact DSL API will look like? |
Sure. Here is the link demonstrating the whole DSL (producing, consuming, passing): #336 (comment) (Also included in this PR's description). |
…of the constructor.
In some scenarios it's hard to provide the artifact arguments through the `command` list when it already has resolved artifact paths.
as opposed to default input values.
1af6b7c
to
6bd95bd
Compare
Also renamed input_artifact_arguments to artifact_argument_paths in the ContainerOp's constructor
582b6cd
to
4ab43ec
Compare
getattr is too fragile and can be broken by renames.
Talked with Hongye offline a while ago. I still consider this to be a step towards the important goal of supporting artifact passing, so maybe someone can help with the review. |
/cc @numerology @ajchili |
/assign @numerology @ajchili |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit pick questions before an lgtm.
Added the test_input_path_placeholder_with_constant_argument test
/lgtm |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Ark-kun The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
…ubeflow#837) * handle DSL for loop item separator * handle compilation for separator * add tests for loop with separator * style: self -> cls in classmethods * fix: dsl+compile * update test results * style: remove unused import * style: blank lines * add license to tests * fix tests: no value passing of str loop src
* m: typo in a comment * remove separator param in loop spec * update test
Only constant string artifact arguments are supported for now. (no artifact passing yet.)
Usage:
or
Full artifact passing DSL:
This change is