-
Notifications
You must be signed in to change notification settings - Fork 307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Struct Construct to allow for Complex types #160
Conversation
This is awesome, and exactly what I was hoping for with structs! I'd like to raise for discussion (but happy if it gets pushed back for now) the idea of default values in structs, possibly based on other values. As a motivating case, here's a possible solution to the perennial "bam and index" problem: struct BamAndIndex {
File bam
File index = basename(bam, "bam") + ".bai"
}
task foo {
# inputs...
# command...
output {
BamAndIndex out = { bam = "out.bam" }
}
} |
@cjllanwarne this was something I was actually debating. I am not 100% certain whether expressions should be evaluated within a struct. I felt as though they should simply be a type definition with no business logic in them. Additionally adding "defaults" raises the question of how a struct would be defined within json. At the moment there is an open discussion of whether or not hardcoded values should be overridable by the json value. the current answer is they are not in other constructs. In your example above then, if I wanted to override the location of the index file, i would not be able to. Another concern that I have is that a Struct is a new Concrete type. If default values worked properly it would mean you could leave out memembers of the struct in favor of their default. This seems odd to me, for example you would never define a json for a {
"myWf.pair_input": {"left":"value"}
} This would constitute an incomplete definition of the type. similarily in a struct definition, leaving out any of its members seems like an incomplete definition to me. But I am open for opinions on what other people think |
@patmagee Is the intention here to replace |
@geoffjentry That would be the ultimate goal, but at the same time I think there should be a period of deprecation for the |
+1 for this proposal. Patrick's initial post exactly defines what I'd like, and is equivalent in functionality to CWL's records. As much as I also want an approach for BAM bai indices, I favor the simpler initial proposal and let the index support be a separate discussion. Thanks much for putting this together. |
@patmagee re deprecation, my $0.02 is that that's what |
@geoffjentry I tend to agree with you on that. I still think it might be worth Having others weigh in on the |
I agree re keeping them as static constraints on the type formerly known as Objects (i.e. no defaults!). I'm indifferent whether we remove or deprecate Some questions:
struct Tree {
Tree? lhs
Tree? rhs
}
MyStruct s = { "field1": 10, "field2": 20 }
MyBigStruct b = { "field1": 10, "field2": 20 }
MySmallStruct s = b # Only needs "field1" from MyBigStruct
|
I will try to answer your questions in order
#wdl 1
struct MyStruct {
....
}
#wdl 2
import wdl_1 as helpful_types
task a {
helpful_types.MyStruct b
}
struct MyStruct {
File a
File b
String name
Int age
}
###
MyStruct s = {"a": "gs://..."," b": "gs://...", "name": "name", "age": 25}
|
What if something like this happens:
Now from D's point of view, it might have some results of type Thinking about how other languages deal with this, I think it's perhaps neater to have something like: import MyStruct from useful_types.wdl And remove any namespace qualification on types as they appear further down the WDL. What do you think? |
@cjllanwarne I like how that deobfuscates the situation, but it would change the way how imports work. Maybe @geoffjentry can wiegh in on this |
@cjllanwarne this is probably not very solvable through a change in syntax, my feeling is that it will be up to the implementor to search the depedency / import tree to define equivalent Struct objects and view them as equivalent regardless of how they are referred. ie |
I think my personal preference would be to remove qualifications and namespacing from struct definitions altogether. If you import a file we add the set of structs in it to a current set of available structs, and if we import two clashing structs (ie two imported structs have the same name but different fields, then that's an invalid workflow) |
👍 +1 this proposal, I have been waiting for this for a while! Some specifics:
I am for removing the Object type from draft-3. It is hard to use, and is misleading users. |
@cjllanwarne I am not a fan of invalidating a workflow because of a clash in the naming of two Structs. I really hope that people publish libraries of common workflows/tasks/structs etc, and in this case, an end user would not necessarily have any control over the naming schema of two of their imported libraries. I also feel like this is not necessarily strictly an issue for the I wonder if for First pass its good enough to have structs a member of the imported wdl's namespace like it is currently is, but maybe propose in a separate PR a new way to define Import statements? |
versions/draft-3/SPEC.md
Outdated
@@ -1515,6 +1522,130 @@ workflow wf { | |||
|
|||
In this example, the fully-qualified names that would be exposed as workflow outputs would be `wf.task1.results`, `wf.altname.value`. | |||
|
|||
## Struct Definition | |||
A struct is a C-like construct which enables the user to create new compound types that consist of previously existing types. Structs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clean up use of which
and that
in the same sentance
versions/draft-3/SPEC.md
Outdated
} | ||
``` | ||
|
||
You can also use compound types within a struct to easily encapsulate them within a single object. For example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reword to avoid the word "you"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @katevoss might disagree with you! FWIW I personally much prefer examples in second-person rather than convoluted passive-voice sentences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cjllanwarne is right! We encourage documentarians to address their readers directly with the second-person in a friendly yet clear manner.
As our coffee mugs say,
It's more important to be nearly right and understandable than to be academically accurate and unintelligible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So votes on language? Should we keep the voice referring to the user or move to a passive one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the current form
versions/draft-3/SPEC.md
Outdated
|
||
```wdl | ||
struct name { | ||
Arrray[File]+ myFiles |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arrrrrr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only valid on the 19th September
### Using a Struct | ||
When using a struct in the declaration section of either a `workflow` or a `task` or `output` section you define them in the same way you would define any other type. | ||
|
||
For example, if I have a struct like the following: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reword to avoid "I"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I prefer this to passive-voicing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the "I" construct sounds super awkward in a spec document. Think more lawyer-ly, less folks-y.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can’t speak to what one should normally write in a spec document but just keep in mind that passive voice and other lawyerly constructs are a leading source of confusion and difficulty, especially for non-native English speakers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can’t speak to what one should normally write in a spec document but just keep in mind that passive voice and other lawyerly constructs are a leading source of confusion and difficulty, especially for non-native English speakers.
versions/draft-3/SPEC.md
Outdated
input: | ||
a = a | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra nl
versions/draft-3/SPEC.md
Outdated
In order to access members within a struct, use object notation; ie `myStruct.myName`. If the uderlying member is a complex type which supports member access, | ||
you can access its elements in the way defined by that specific type. | ||
|
||
for example if we have a struct like the following: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reword to avoid "we"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going back to the you/passive thing, I do find bouncing between "you" and "we" to be odd.
versions/draft-3/SPEC.md
Outdated
Map[String,String] myMap | ||
} | ||
``` | ||
ssing the nth element of myFiles would look like: `myStruct.myFiles[n]`, while indexing an element in myMap would look like `myStruct.myMap["entry"]` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bring the "Acce" from line 1647 back here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps add an example of access when one has an Array[myStruct] and the n-th element in myFiles in the m-th myStruct is being accessed (to point out the brackets before the .
)
versions/draft-3/SPEC.md
Outdated
Map[String,String] myMap | ||
} | ||
``` | ||
ssing the nth element of myFiles would look like: `myStruct.myFiles[n]`, while indexing an element in myMap would look like `myStruct.myMap["entry"]` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, you should probably have a myStruct variable that you are accessing, not the struct itself, right?
versions/draft-3/SPEC.md
Outdated
|
||
for example if we have a struct like the following: | ||
```wdl | ||
struct myStruct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use more different names to the example struct's, and perhaps keep to FirstCap in type names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "Types names must be TitleCase
" is a good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yfarjoun will do for examples.
@cjllanwarne I talked to @geoffjentry I think, UpperCase first letter should be convention > enforced. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to be in the "if it's a convention that everyone is expected to follow, it might as well be enforced" camp (eg Java class names might as well be enforced UpperCase
at this point IMO).
I don't see a big cost to enforcing UpperCase. Every other type name is UpperCase
so people are already keyed to the idea that that's what types look like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My $0.02 is that convention is fine unless it's problematic for defining in parsing/etc.
I get the allusions to other languages but at the end of the day WDL is being marketed as a language for people who don't want to be using programming languages. If it just starts taking every trope from popular prog langs, why would users bother using it when they could just go over to e.g. python?
``` | ||
|
||
#### Struct Assignment from Object Literal | ||
Structs can be assigned using an object literal. When Writing the object, all entries must conform or be coercible into the underlying type they are being assigned to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is that the only way to assign to a struct? Could one assign to it "slowly"?
For example:
MyStruct a
a.myName="John"
a.myAge=30
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say you cannot assign slowly, at least not at the moment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this. WDL values are immutable and I think they should stay that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. in that case, please also include an example of building a complex struct using other (previously defined) values:
Array[File] someFiles=["file1.txt","file2.txt"]
Int someAge=30
struct MyStruct{
Array[File] files
Int age
}
MyStruct a={"files"=someFiles "age"=someAge}
Also given that the key/value pair has quotes on the key, it is possible to parametrize the assignments:
Array[File] someFiles=["file1.txt","file2.txt"]
Int someAge=30
String key1="files"
String key2="age"
struct MyStruct{
Array[File] files
Int age
}
MyStruct a={key1=someFiles key2=someAge}
? if not, I'd advocate for removing the quotes from the keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea of removing quotes from the keys, not least because it makes sure keys are valid identifiers rather than arbitrary strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@geoffjentry I would be interested to know what is currently allowable syntax with Object Literal Notation. Does it need to be JSON like? Is it possible to remove the quotes from keys under the current spec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I also prefer immutability.
@patmagee That's a good question. I realized yesterday that object isn't actually defined anywhere, period. It also doesn't help that AFAIK Broad folks aren't using object so it's not something we've seen a lot. And of course there's the "goodness only knows what Cromwell chooses to do w/ objects" aspect :)
It looks like the isObjectLiteral
function in wdl4s
is used absolutely nowhere which makes me wonder if there's any support for it at all.
@patmagee I agree with you about failure, that's probably a bit too harsh! Ok, so here's what I'm trying to avoid: the difference from previous My objection is not so much about resolution (although I do think that would be very tricky to get right) as it is about readability. Eg. I don't think this is a good thing to aim for (especially if we have to use the long type name multiple times throughout): import foo.wdl
workflow a {
call foo.fooTask
output {
foo.subfoo.gatk.gatkHelpers.gatkTypes.MyStruct s = fooTask.out
}
} Now imagine someone importing I think something like this will be much more flexible, readable, and be a much better support for composability: import foo.wdl
import MyStruct from gatkTypes.wdl as GatkStruct
workflow a {
call foo.fooTask
output {
GatkStruct s = fooTask.out
}
} Yes, we still need to know that |
@cjllanwarne I don't think this necessarily solves the issue, it just adds a way to easily overcome it. We should still probably allow for the "Chained" method for completeness. if users really want to do that and not properly import a struct, then it's probably their prerogative to do so |
versions/draft-3/SPEC.md
Outdated
`Object` type in many circumstances and enables proper typing of its members. | ||
|
||
Structs are defined using the `struct` keyword and have a single section consiting of typed declarationns. They are evaluated first prior | ||
to workflows and tasks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's implied here, but I think we should be explicit (especially since @yfarjoun already wrote one inline in one of the examples...):
"Structs are declared separately, outside of any
workflow
ortask
definition". They are part of the namespace of the WDL file they are written in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I think making this explicit is quite important
versions/draft-3/SPEC.md
Outdated
can then be used within a `Task` or `Workflow` definition as a declaration in place of any other normal types. The struct takes the place of the | ||
`Object` type in many circumstances and enables proper typing of its members. | ||
|
||
Structs are defined using the `struct` keyword and have a single section consiting of typed declarationns. They are evaluated first prior |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consiting->consisting
@patmagee A shower thought I had this morning - I'm reversing course on my opinion on removal of object in this PR as that would be the first backwards incompatible change in |
@patmagee if enough naive WDL authors take up their prerogative to write long chains of imports, that creates a huge disincentive to the authors of the base WDL to refactor. Eg
FWIW this is the exact reason we made subworkflows pass through their inputs and declare their outputs explicitly - to avoid these long brittle chains of |
@geoffjentry I am okay with leaving the Object in for now. As I mentioned before, I feel like it is out of the scope of this PR and probably should be separately looked at and debated. @cjllanwarne I see what you are saying, and I definitely agree with you, I just don't know how to properly implement it? When I import another subworkflow or any WDL for that matter, would we need to explicitly declare All The other way to think about this is to go back to your original suggestion and import all structs to the global namespace, but try to figure out a way to better resolve conflicts (Which the farther down this rabbit hole we get might be the best way to go). Maybe the best way is actually to fail the workflow as you suggested and force the user to alias the failing Struct. If this was the case then we should have to only Alias the conflicting types, and not all types contained in the WDL. import "https://exmaple.com/exampl.wdl" as ns alias Bam as Bam2 alias Fastq as Fastq2 |
@patmagee that long list of imports was what I was proposing (just like scala) but I see your point... 😄 I like your idea of importing everything flat and aliasing only if required because it removes the burden of thinking about this completely until a collision occurs (and if base-WDL authors are conscientious and well-reviewed that should hopefully be rare). |
@cjllanwarne @geoffjentry if a few more people weigh in on the importing suggestion (maybe aliasing) I will update the PR |
I like the idea of flat import semantics, due to its conceptual simplicity. Aliasing is a good solution when collisions occur. |
Double +1 good. I can't wait to have these for CWL record interoperability. |
alias GrandChild as GrandChild2 | ||
``` | ||
|
||
Its important to note, that when importing from file 2, all structs from file 2's global namespace will be imported. This Includes structs from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Erroneous capital I
on Includes
I'm really excited to implement this, I think it's gonna be a huge QoL improvement for WDL authors. 👍 |
👍 |
1 similar comment
👍 |
👍 |
1 similar comment
👍 |
I submit you'd get a richer, coded and tested by other people, implementation for this feature if you adopt the protobuf definition for this type. The complex types are naturally represented in a .proto IDL definition, and so can be known types to the WDL script. So the complex structures can be be straightforwardly prepared by the workflow caller and accepted and used in the task, no matter their implementation languages, and neatly interpreted and used as output. Even the code of taking the binary format and creating a scala object is already written. --David |
I actually like the Struct proposal a lot, as I think it fits the existing WDL typing system very organically while enabling highly flexible hierarchical data types (and we need those desperately to support transition of our on-prem workflows into Workbench).
|
@mohawkTrail @dinvlad Note that now that voting is active, people are voting on the proposal as-is. If you have issues with the proposal which you think should block adoption, please vote against it. |
@patmagee A half hearted 👍 - I'd have still liked to see some of the edits requested in the PR, but since the options are to vote yay or nay as is I dont view them as blockers |
@geoffjentry I think this is likely to be a first pass at structs of you will. There are a lot of concern s which have been raised by @dinvlad and @mohawkTrail however i think some of those concerns (ie how o puts are defined) are questions for the engine implementation and not the language specification. |
I agree with @patmagee, and with @geoffjentry. The WDL definition of struct is and should be consonant with WDL syntax and semantics (immutability for example). My suggestion was on implementation, and should be invisible in WDL--the difference being in the serialization and deserialization, and ease-of-use in the caller's, tasks', and output consumers' languages. If we were to pursue this at another time, we might permit alternate input and output formats for the WDL struct, and include a tool for taking a .proto definition, verifying it corresponds to a valid WDL struct, and producing the struct's WDL declaration for import. |
👍 |
In a previous discussion in the Cromwell github page under this issue broadinstitute/cromwell#2283 the concept of revamping the
Object
type was brought up.An
Object
Is supposed to be a JSON like type, however its functionally very hard to interact with and use. Type coercions for nested members in an object are never reliable, Caching is next to impossible, and doing actions such as localizing or delocalizing files from within the object are not supported. There was a clear desire for users to have a more complex JSON like with some sort of typing which theObject
type did not support.Enter the
Struct
.A struct, similar to its C counterpart, is a construct which allows users to define their own Types dynamically, directly wihtin a WDL file by combining preexisting types into a single new named type.
For example, if I had a set of patients, and for each patient I had several FASTQ files, the patient id, patient age, and a list of siblings for that patient, there is no way to easily pass that information around from task to task. However with Structs this would be easy.
And then if I wanted to use the new type in a workflow or task I could do the following:
By using structs, file localization and delocalization should be supported, as well as being able to properly type each field present within the struct. This removes any sort of ambiguity that exists with the
Object
type