Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support closure operand for pipe operator #3423

Closed
wants to merge 6 commits into from

Conversation

jemunro
Copy link

@jemunro jemunro commented Nov 22, 2022

Based on #3356

This PR enables a more flexible approach to process inputs when piping by using _ as a placeholder.
For example:

workflow {
    append_ch = Channel.value('!')
    upper_ch = Channel.of('foo', 'bar', 'baz') \
        | TO_UPPER
    result_ch = CONCATENATE(upper_ch, append_ch)
}

can be simplified to:

workflow {
    append_ch = Channel.value('!')
    result_ch = Channel.of('foo', 'bar', 'baz') \
        | TO_UPPER \
        | CONCATENATE(_, append_ch)
}

In addition, when the input is a multichannel, component channels can be extracted by name or index:
For example:

workflow {
  Channel.of('foo', 'bar', 'baz') \
          | branch { foo: it == 'foo'; other: true } \
          | TO_UPPER(_.other)
}

or

workflow {
  Channel.of('foo', 'bar', 'baz') \
          | branch { foo: it == 'foo'; other: true } \
          | TO_UPPER(_[0])
}

One concern is there is that users are not prevented from overriding the placeholder variable.
This is the intend of this section:
https://github.com/jemunro/nextflow/blob/flexible-process-piping/modules/nextflow/src/main/groovy/nextflow/script/ScriptBinding.groovy#L191-L204
But on testing it doesn't seem to work.

@pditommaso
Copy link
Member

Wow, this is great. Should TO_UPPER(_.other) be read TO_UPPER(_,other) ?

@jemunro
Copy link
Author

jemunro commented Nov 22, 2022

Thanks!

Should TO_UPPER(_.other) be read TO_UPPER(_,other) ?

No, it's meant to be equivalent the following:

workflow {
  multi_ch = Channel.of('foo', 'bar', 'baz') \
          | branch { foo: it == 'foo'; other: true }
  TO_UPPER(multi_ch.other)
}

@pditommaso
Copy link
Member

Understand now. Not sure we should allow _.something. I fear it will make the syntax too convoluted.

@jemunro
Copy link
Author

jemunro commented Nov 22, 2022

The given example of the _.name syntax above is not very useful.

However, consider the process modules offered by nf-core. In general these emit two named output channels, one for the actual process outputs and one for the versions. In this case this syntax becomes very useful for piping the process outputs. For example:
https://github.com/nf-core/modules/blob/master/modules/nf-core/bcftools/sort/main.nf#L13-L15

    output:
    tuple val(meta), path("*.gz"), emit: vcf
    path "versions.yml"           , emit: versions

Using the nf-core processes BCFTOOLS_SORT, BCFTOOLS_FILTER and BCFTOOLS_INDEX:

  1. without _.name syntax:
BCFTOOLS_SORT(input_vcf_ch)
BCFTOOLS_FILTER(BCFTOOLS_SORT.out.vcf)
BCFTOOLS_INDEX(BCFTOOLS_FILTER.out.vcf)
  1. with _.name:
input_vcf_ch \
  | BCFTOOLS_SORT \
  | BCFTOOLS_FILTER(_.vcf) \
  | BCFTOOLS_INDEX(_.vcf)

@pditommaso
Copy link
Member

I think it's better to be more explicit in the pipeline composition. The last snippet is elegant but it hides too much information.

@jemunro
Copy link
Author

jemunro commented Nov 24, 2022

It is a bit esoteric, but I don't think it hides any more information than piping already does. e.g.:

process TO_UPPER {
    executor 'local'
    input: val x
    output: stdout emit: result
    script: "echo -n $x | tr a-z A-Z"
}

process OTHER {...}

workflow WITH_PIPE {
  Channel.of('foo', 'bar', 'baz') \
    | TO_UPPER \
    | OTHER
}

workflow NO_PIPE {
  input_ch = Channel.of('foo', 'bar', 'baz')
  TO_UPPER(input_ch)
  OTHER(TO_UPPER.out.result)
}

Piping in example WITH_PIPE already hides TO_UPPER.out.result

I think it would be really useful to have a way to use processes that output multiple channels with the pipe operator. It wouldn't necessarily have to be the proposed _.name construct, but I can't think of anything better.

I'd be interested to hear the opinion of some of the nf-core folks. @ewels, @drpatelh, @mribeirodantas - I've noticed that nf-core pipelines in general don't use the pipe operator. Is that because the pipe operator doesn't really work with processes that have multiple output channels?

@ewels
Copy link
Member

ewels commented Nov 24, 2022

Ooh I like this!

Is that because the pipe operator doesn't really work with processes that have multiple output channels?

I don't write a lot of Nextflow code these days but yes, from what I have done I've basically never had a situation where I could get the pipe to work except for dummy minimal examples. Most real life processes I've written always have more than one output channel.

@jemunro
Copy link
Author

jemunro commented Dec 6, 2022

Another option, instead of the proposed _.name syntax, would be to add an operator that extracts channels from multi-channel outputs, say extract. E.g.:

input_vcf_ch \
  | BCFTOOLS_SORT \
  | extract { vcf } \
  | BCFTOOLS_FILTER \
  | extract { vcf } \
  | BCFTOOLS_INDEX

This ends up being more verbose though.

@jemunro
Copy link
Author

jemunro commented Feb 3, 2023

What is the current thought on this? I'm happy to make some tweaks or discuss further.

@pditommaso pditommaso force-pushed the master branch 2 times, most recently from 0d59b4c to b93634e Compare March 11, 2023 11:20
@pditommaso pditommaso force-pushed the master branch 2 times, most recently from 81f7cb7 to 8a43489 Compare August 20, 2023 20:13
@bentsherman bentsherman changed the title enable flexible process piping Extend pipe operator to support multiple input channels Nov 16, 2023
@bentsherman bentsherman changed the title Extend pipe operator to support multiple input channels Support arbitrary port mappings for pipe operator Nov 16, 2023
@bentsherman
Copy link
Member

bentsherman commented Nov 16, 2023

I stumbled across a Rust RFC for a pipeline operator and particular this comment about de Bruijn indices, which is a notation for specifying multiple anonymous parameters in the lambda calculus. Your notation seems to be doing a similar thing and makes a lot of sense.

I think the key enhancement here is the ability to define "port mappings" between two components other than the implicit one-to-one mapping. I especially like the ability to access channel elements by index or emit name.

I think one way we could make the syntax clearer is to wrap non-default port mappings in a closure. Here are some sketches:

// (1) overload <component> | <closure>
input_vcf_ch
  | BCFTOOLS_SORT
  | { out -> BCFTOOLS_FILTER(out.vcf) } // or use implicit it variable
  | { out -> BCFTOOLS_INDEX(out.vcf) }

// (2) overload process call with single closure arg
input_vcf_ch
  | BCFTOOLS_SORT
  | BCFTOOLS_FILTER { out -> out.vcf }
  | BCFTOOLS_INDEX { out -> out.vcf }

I don't really like (2) because many operators expect a single closure argument which would create some ambiguity. But I like (1) because it is a distinct syntax and it allows you to map outputs to inputs however you like.

The closure arg is the ChannelOut of the previous component, and the closure should return the output of the component it invokes. The pipe operator should evaluate the closure immediately with the output of the previous component and the workflow context, so that other variables in the workflow can be used.

@bentsherman bentsherman linked an issue Nov 16, 2023 that may be closed by this pull request
@bentsherman bentsherman self-requested a review November 16, 2023 16:25
Copy link

netlify bot commented Nov 16, 2023

Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit 61f539f
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/65567ef1fd52a40007e8fa8d
😎 Deploy Preview https://deploy-preview-3423--nextflow-docs-staging.netlify.app/workflow
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@bentsherman
Copy link
Member

The closure syntax was stupid simple to implement! Piping between closures actually feels very intuitive, it's basically the same paradigm as pipes in Bash. Instead of a command, you have a closure that defines the invocation of a component, and the closure argument and return value are like stdin and stdout.

Added some docs and a testable doc snippet based on @jemunro 's example.

I really want this feature now 😄

@bentsherman bentsherman changed the title Support arbitrary port mappings for pipe operator Support closure operand for pipe operator Nov 16, 2023
@bentsherman
Copy link
Member

I extended the and operator to also work on channels, so you can do something like:

// before
foo(ch1, ch2, ch3)

// after
(ch1 & ch2 & ch3) | foo

I figure you can do this with one channel so why not multiple 😄

@bentsherman
Copy link
Member

Thinking more about this, especially in the context of our efforts to develop a formal grammar #4613 , I think we can do much better than extending the existing pipe operator which is just overloading the bitwise-or operator for a few specific types.

The script parser will allow us to define new syntax outside of the Groovy grammar. I would like to use this opportunity to add a new pipe operator |> that works with any function call, not just processes and channels:

// x can be any value
// foo can be function, process, workflow, callable
x |> foo = foo(x)

// even a closure
x |> { x -> foo(x) } = foo(x)

Languages that support pipes tend to use |> instead of | (see OCaml, F#, Elm, Elixir, Gleam). It should replace the | operator which can go back to just being bitwise-or (not that you need it very often in Nextflow).

This fits into some broader ideas I have about making the dataflow syntax easier to read and write (but just as powerful if not more-so), likely as a proposal for DSL3. I hope to share some sketches soon but might be a while before it becomes real. Until then, we'll just have to live with the current syntax

Closing this PR for now. Thanks @jemunro for your initial draft. It will not be forgotten

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DSL2 pipe syntax enhancements
4 participants