Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Enumerations #139

Open
katevoss opened this issue Sep 28, 2017 · 21 comments
Open

Support Enumerations #139

katevoss opened this issue Sep 28, 2017 · 21 comments

Comments

@katevoss
Copy link
Contributor

@danbills commented on Thu Jul 06 2017

It would be nice to restrict types to specific values so that users can only specify one of a limited number of a values.

Per this forum post


@cjllanwarne commented on Mon Jul 10 2017

Could be a nice bring-along with broadinstitute/cromwell#2283


@patmagee commented on Tue Aug 29 2017

Just wondering if there has been any discussion regarding this.


@katevoss commented on Tue Aug 29 2017

@patmagee not to my knowledge but I can start the ball rolling.
@geoffjentry & @mcovarr, any opinions on supporting enums?


@geoffjentry commented on Tue Aug 29 2017

I'm leery but not necessarily against it. I'm also generally the one with the most conservative opinion in terms of adding WDL syntax, so view that take as a lower bound of acceptance :)

@patmagee what were you thinking in terms of the syntax?

Pinging @vdauwera so she's abreast of this convo.


@vdauwera commented on Wed Aug 30 2017

FWIW Enum support would definitely be very valuable to us in GATK-world, and are likely to be useful in general.

My one caveat would be that they should be easier to work with than they were in Queue (friendly elbow poke at @kshakir).


@patmagee commented on Fri Sep 01 2017

@geoffjentry I share your concern with adding any sort of syntax to the spec, and my first inclination is to fake support enumeration. Im not exactly sure how to do that though, nor am I sure of whether its possible.

The cleanest way I can think of implementing would be maybe something like a Java style enum:

enum MyEnum {
     "A","B","C"
}

workflow wf {
    MyEnum thisIsMyEnum
}

Another way that we would be able to do it would be define an Enum type in a workflow like so:

workflow wf {
     #This would get overridden at run time, but the value would need to be validated
     Enum greeting = ["HELLO","GOODBYE"]
     or
     #Done override anything but validate it
     Enum["HELLO","GOODBYE"] greeting
     
}

@katevoss commented on Thu Sep 28 2017

@geoffjentry sounds like this is more of a WDL feature request, shall I move it to the WDL repo?


@geoffjentry commented on Thu Sep 28 2017

628b747f8ccdfb757062f36a27eedecfc2295f515c0586e05fbfb0620c0571a2

@patmagee
Copy link
Member

@geoffjentry any thoughts on this?

@SHuang-Broad
Copy link

Bump

@jdidion
Copy link
Collaborator

jdidion commented Aug 10, 2021

I'd like to propose an alternative syntax. There would be a single top-level enum section where all enums are defined:

enum {
  Int Foo = [1, 2, 3]
  String Status = ["Error", "Pending", "Success"]
}

An open question is how to handle importing of enums from one file to another. I'm guessing it would work like structs.

@patmagee
Copy link
Member

@jdidion this is an interesting suggestion. It would certainly cut down on the boilerplate, and limit how many new structures we add to the language.

I am starting to wonder if we are starting to conflate two similar but distinct concepts:

  • enumerated types
  • input validation.

An enum type should not necessarily have an actual wdl type assigned to its inner values. The values are simple enumerations, and an identifier can be used for them. In this case, the total set of values are clearly written out as easy to use identifier (like in most programming languages) and values like

enum Animal { CAT, DOG, MANITEE }

workflow {
  Input {
    Animal animal = Animal.MANITEE
  }
}

I think by saying that an enums value must be an identifier we greatly simplify the implementation of this and have an actual enum.

In the example you provided, I think you are proposing more so input validation (which could equally provide as much value). In this case, there is no first class Enum type, just a check at runtime that the value passed to a task or workflow is valid according to a simple set of rules (here it must be one of the values supplied). Because of this reassignment of the declaration completely looses its enum context

Workflow foo {
  enum {
     # here I know the allowable set of values
     Int my_val = [1, 2, 3,4]
  }
  # reassignment and I no longer know that this declaration is restricted. 
  Int my_val_2 = my_val
}

Since the "enum" quality of this declaration is only enforced within the enum block and not upon reassignment (it becomes just a normal int), this doesn't really feel like an enum to me.

We could alternatively think of adding a syntax for restricting the values or adding valdiation directly to a declaration

workflow foo {
   input {
     Int val {
       min: 100
       max: 20000
     }
     String s {
       values: ["foo", "fun"]
     }
}

@jdidion
Copy link
Collaborator

jdidion commented Aug 11, 2021

You're right - you should be able to refer to an enum value by its identifier. But typed enums are useful for many purposes. So let me propose the following:

enum Animal[String] {
  CAT = "meowser"
  DOG = "pupper"
  MANATEE = "seacow"
}

task pet {
  input {
    Animal animal = Animal.CAT
  }
  command {
    echo "I'm petting my ~{animal}"
  }
}

input.json

{
  "pet.animal": "CAT"
}

@patmagee
Copy link
Member

@jdidion in the task, would the value meowser be printed out or would CAT be printed out? Also what if you want to define an enum WITHOUT an inner type and just have the identifier?

@jdidion
Copy link
Collaborator

jdidion commented Aug 11, 2021

Yes, the former. To do the latter, you would do:

enum Animal[String] {
  CAT = "CAT"
  DOG = "DOG"
  MANATEE = "MANATEE"
}

but I guess we could provide a simplified form that would be equivalent to the above:

enum Animal { CAT, DOG, MANATEE }

i.e. the default type would be String and the default value would be the enum identifier.

@patmagee
Copy link
Member

@jdidion I wonder if we should start by introducing enum in the simplified syntax, and then see if there is the desire for the more complex (subtyped) value. It will be easy to make it more complex, but reigning in the complexity is a challenge

@jdidion
Copy link
Collaborator

jdidion commented Aug 12, 2021

Normally I'd agree, but there are already obvious use-cases for at least primitive-typed enums for enumerating/limiting the options available for some command line flags.

@patmagee
Copy link
Member

fair point. It seems at this point, there is adequate support to move forward with a potential implemetnation? Considering I was one of the original people requesting this, I would be happy to see it progress. If I have time next week I can come up with an RFC / PR @jdidion

@jdidion
Copy link
Collaborator

jdidion commented Aug 13, 2021

Sounds good! Feel free to tag me for review.

@geoffjentry
Copy link
Member

This might be where @jdidion was heading all along but I like his proposal as it makes a path towards more general ADTs easier if that ever was a desire

@jdidion
Copy link
Collaborator

jdidion commented Aug 13, 2021

It probably goes without saying (but I'll say it anyway) that enums can't be named the same as any other data type (or any reserved word for that matter).

@vortexing
Copy link

Can someone give me an example of what issue exists right now that would be solved by the addition of enum's in some form?

@jdidion
Copy link
Collaborator

jdidion commented Jan 3, 2022

@vortexing a common use case is a task parameter that only allows a limited set of values. For example, let's say my tool has a parameter --output-format that can have values sam and bam. Rather than my task having a String output_format parameter, it could have a FileType output_format parameter where FileType is an enum. This makes it more clear to the user what values are allowed.

@vortexing
Copy link

OK. So right now, someone could specify String output_format = 'dam' and the task would fail for some dam reason, rather than having the workflow fail earlier before the task was actually sent/run b/c 'dam' is not one of the FileType's allowed? That seems like a computational win to me when running someone else's workflow. Just as long as how it gets done is aimed at newbies b/c the intent is to be more clear to users who don't already know the enum values for such a param, I'm game.

@jdidion
Copy link
Collaborator

jdidion commented Jan 3, 2022

That's right - you'd have to explicitly validate the parameter value yourself rather than having the runtime do the validation for you.

@jvlasblom
Copy link

I'm also interested in having support for enums in WDL. It would help with consistent error reporting, and generating UIs based on declared WDL inputs.

@hkeward
Copy link

hkeward commented Dec 12, 2022

Agreed that enums would be very useful for input validation. It seems much cleaner and simpler to maintain than writing a validation task that parses the user's inputs.

@DavyCats
Copy link
Contributor

DavyCats commented Dec 13, 2022

Just as a suggestion for those who need an enum like construction, you could use a Map instead. The keys would be all the allowed inputs and the values could be the same or some translation (allows for some abstraction). If a user provides an incorrect input value it will cause an error, though maybe not as specific as a dedicated enum syntax would allow for.

eg.

workflow Test {
  input {
    #...
    String strand
  }
  # with translation:
  Map[String, String] htseqStrandOptions = {"FR": "yes", "RF": "reverse", "no": "no"}
  # or just to check for allowed values:
  Map[String, String] htseqStrandOptions = {"yes": "yes", "reverse": "reverse", "no": "no"}
  #...

  call htseqCount {
    input:
      #...
      strand = htseqStrandOptions[strand]
  }
  #...
}

Now if strand is defined as anything other than "FR", "RF" or "no" in the first case or "yes", "reverse" or "no" in the second case, the workflow will fail.

@jdidion
Copy link
Collaborator

jdidion commented Apr 5, 2023

Just re-reading this thread and @DavyCats suggestion gave me an idea. AFAIK, this is valid WDL:

struct MyEnum {
  String name,
  String value
}

workflow test {
  Object MyEnum = {
    FOO: MyEnum {
      name: "FOO",
      value: "FOO"
    },
    BAR: MyEnum {
      name: "BAR",
      value: "BAR"
    }
  }

  input {
    MyEnum e
  }

  if (e == MyEnum.FOO) {
    ...
  }
}

Using MyEnum as a variable name should be fine since MyEnum is a user-defined type and thus not a reserved word.

The above is not ideal since it requires the user to pass in a MyEnum instance rather than just an identifier (since the MyEnum declaration is private). And there is nothing stopping the user from creating a MyEnum instance that is not equal to any of the values in the MyEnum object. But I think it's pretty close to what we (or at least I) want.

So I think enum then just becomes a shorthand way of defining a struct and declaring a value of the same name in the global scope. We'd want the struct type to be non-instantiable. And we'd want "FOO" to deserialize to MyEnum.Foo when used in an input JSON (and vice-versa). The import/aliasing semantics would be the same for an enum as for a struct.

file1.wdl:

enum MyEnum {
  FOO,
  BAR
}

workflow {
  input {
    MyEnum e = MyEnum.FOO
  }

  String ename = e.name

  if (e == MyEnum.FOO) { ... }
}

file2.wdl:

# MyEnum is copied into file2's global namespace
import "file1.wdl"

# MyEnum is copied into file2's global namespace with name MyEnum2
import "file1.wdl" as file1a alias MyEnum as MyEnum2

The fact that an enum's value is struct-like means that we have the option to later support generic enums, e.g.

enum MyIntEnum[Int] {
  FOO: 1,
  BAR: 2
}

then the struct definition looks like:

struct MyIntEnum {
  String name,
  Int value
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants