
DSL2 migration #14

Open
cjfields opened this issue May 8, 2021 · 2 comments
Labels
DSL2 (Prioritize for DSL2 implementation), enhancement (New feature or request)

Comments

@cjfields
Contributor

cjfields commented May 8, 2021

We're seeing some fragmentation across workflows due to switches in technologies (PacBio, Shoreline, Loop, etc.), which a migration to DSL2 would help with tremendously. This is a simple tracker to plot a course forward and to note tickets that would benefit from this.

@wbazant
Contributor

wbazant commented Jul 8, 2021

A possible migration plan

Introduction

I've time-boxed a day for figuring out how I'd do it. Here is a plan with a few commands that might be helpful to someone if a migration is attempted.

Inspect overall structure

The logic is duplicated across a few files (main.nf, pacbio.nf, loop.nf), and it's not obvious how to merge them.
Handily, there are comments in the code for each step, and the execution flows from top to bottom.

This command searches for those comments and records the line each one is on, so that the files can be split into logical chunks:

cat <( grep -n 'env nextflow\| \* Step' main.nf pacbio.nf loop.nf | perl -pe 's{#!/usr/bin/env nextflow}{ * Step 0: Start}' ) <( wc -l main.nf pacbio.nf loop.nf | grep -v total | perl -pe 's{^\s*(\d+) (.*)}{$2:$1:  * Step 11: End}' )


main.nf:1: * Step 0: Start
main.nf:183: * Step 1: Filter and trim (run per sample?)
main.nf:467: * Step 2: Learn error rates (run on all samples)
main.nf:564: * Step 3: Dereplication, Sample Inference, Merge Pairs
main.nf:570: * Step 4: Construct sequence table
main.nf:755: * Step 8: Remove chimeras
main.nf:799: * Step 9: Taxonomic assignment
main.nf:992: * Step 8.5: Rename ASVs
main.nf:1181: * Step 10: Align and construct phylogenetic tree
main.nf:1188: * Step 10a: Alignment
main.nf:1259:     * Step 10b: Construct phylogenetic tree
main.nf:1386: * Step 10: Track reads
pacbio.nf:1: * Step 0: Start
pacbio.nf:173: * Step 1: Filter and trim (run per sample?)
pacbio.nf:331: * Step 2: Learn error rates (run on all samples)
pacbio.nf:377: * Step 3: Dereplication, Sample Inference, Merge Pairs
pacbio.nf:383: * Step 4: Construct sequence table
pacbio.nf:449: * Step 8: Remove chimeras
pacbio.nf:484: * Step 9: Taxonomic assignment
pacbio.nf:673: * Step 8.5: Rename ASVs
pacbio.nf:845: * Step 10: Align and construct phylogenetic tree
pacbio.nf:852: * Step 10a: Alignment
pacbio.nf:923:     * Step 10b: Construct phylogenetic tree
pacbio.nf:1050: * Step 10: Track reads
loop.nf:1: * Step 0: Start
loop.nf:291: * Step 2: Learn error rates (run on all samples)
loop.nf:336: * Step 3: Dereplication, Sample Inference, Merge Pairs
loop.nf:342: * Step 4: Construct sequence table
loop.nf:405: * Step 8: Remove chimeras
loop.nf:440: * Step 9: Taxonomic assignment
loop.nf:629: * Step 8.5: Rename ASVs
loop.nf:803: * Step 10: Align and construct phylogenetic tree
loop.nf:810: * Step 10a: Alignment
loop.nf:881:     * Step 10b: Construct phylogenetic tree
loop.nf:1008: * Step 10: Track reads
main.nf:1637:  * Step 11: End
pacbio.nf:1289:  * Step 11: End
loop.nf:1249:  * Step 11: End
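The listing above still has to be turned into line ranges by hand. A minimal sketch that does it automatically, assuming the listing has been saved to a file (steps.txt is a hypothetical name) and that every marker line has the form "file:line: * Step N: label". step_ranges is a hypothetical helper: it prints one tab-separated file/start/end/label row per step, where each step ends on the line before the next marker and the last step runs through the "Step 11: End" line, which is the file's last line.

```shell
# step_ranges LISTING: turn "file:line: * Step N: label" marker lines into
# tab-separated "file<TAB>start<TAB>end<TAB>label" rows, one per step.
step_ranges() {
  # Group markers by file and order them by line number.
  sort -t: -k1,1 -k2,2n "$1" |
  awk -F: '
    {
      file = $1; line = $2 + 0
      # The label is everything after "file:line:", minus the leading " * ".
      label = substr($0, length($1) + length($2) + 3)
      sub(/^[ *]+/, "", label)
      if (file == prev_file) {
        # A step ends just before the next marker; the "End" marker sits on
        # the last line of the file (from wc -l), so that line is included.
        end = (label ~ /: End$/) ? line : line - 1
        printf "%s\t%d\t%d\t%s\n", prev_file, prev_line, end, prev_label
      }
      prev_file = file; prev_line = line; prev_label = label
    }'
}
```

For example, step_ranges steps.txt | grep 'Filter and trim' would give the start/end pairs to feed into the extraction step.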

Split into modules

Here's how to extract the filtering-and-trimming section from each file, using the line numbers above (loop.nf has no "Step 1" comment, so lines 1-290 of it are taken, header included):

cat \
  <( perl -ne 'print if $. >=183 && $. < 467' main.nf   ) \
  <( perl -ne 'print if $. >=173 && $. < 331' pacbio.nf ) \
  <( perl -ne 'print if $. >=1 && $. < 291' loop.nf ) \
 > filterAndTrim.nf
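The same extraction can be wrapped in a small function so each module is built with one call. A sketch, assuming bash; build_module is a hypothetical helper that takes file:start:end specs with an exclusive end (matching the perl one-liners above) and writes a // comment before each chunk, so the origin of every chunk stays visible during clean-up.

```shell
# Requires bash (uses local and a here-string).
# build_module OUT FILE:START:END ...
# Writes lines START..END-1 of each FILE into OUT (END is exclusive, like
# the perl one-liners), each chunk preceded by a "// --- from ..." comment.
build_module() {
  local out=$1; shift
  : > "$out"                          # create or truncate the output file
  local spec file start end
  for spec in "$@"; do
    IFS=: read -r file start end <<< "$spec"
    printf '// --- from %s, lines %d-%d ---\n' "$file" "$start" "$((end - 1))" >> "$out"
    sed -n "${start},$((end - 1))p" "$file" >> "$out"
  done
}
```

With the numbers above: build_module filterAndTrim.nf main.nf:183:467 pacbio.nf:173:331 loop.nf:1:291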

Edit each module

Remove DSL1 bits

Remove all the "from" clauses from the process inputs. Replace every "into" with "emit:", naming the output after the first "into" channel (any further channels on that line are dropped).
In vim:

:%s/ from.*//

:%! perl -pe 'if(/ into/){s{ into}{, emit:}; s{(.*emit:.*?),.*}{$1};}'
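For batch use outside vim, the two substitutions can be applied with sed. A sketch, assuming sed -E (GNU or BSD); strip_dsl1 is a hypothetical name. Like the vim version, it truncates any line containing " from", comments included, so the manual clean-up is still needed.

```shell
# strip_dsl1 FILE: non-interactive version of the two vim commands above;
# prints the edited file on stdout.
#  1. delete " from" and everything after it on the line;
#  2. on lines with " into", turn " into" into ", emit:" and keep only the
#     first channel name after it.
strip_dsl1() {
  sed -E \
    -e 's/ from.*//' \
    -e '/ into/{ s/ into/, emit:/; s/(emit:[^,]*),.*/\1/; }' \
    "$1"
}
```

For example, strip_dsl1 filterAndTrim.nf > filterAndTrim.dsl2.nf (the output name is only an illustration).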

Clean up

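Go through the new file manually. Remove commented-out code and any if() blocks, leaving just the process definitions. The " from" substitution also truncates any other line containing " from" (comments, for instance), so check those too.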
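To make the manual pass quicker, grep can list the candidate lines first. A sketch; review_leftovers is a hypothetical helper, and its pattern is a heuristic that flags lines starting a // or /* comment and any if(...) (including "else if"). It only points at lines; deciding what to delete stays manual.

```shell
# review_leftovers FILE: list, with line numbers, lines the manual clean-up
# will probably touch. Heuristic: it can also flag if() inside script blocks.
review_leftovers() {
  grep -nE '^[[:space:]]*(//|/\*)|(^|[^A-Za-z0-9_])if[[:space:]]*\(' "$1"
}
```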

Try to recreate each flow through the code

Since DSL2 drops "from" and "into", the connections between processes are declared separately, in a workflow block. Write that block, just for this module.

I found it helpful to have two files open side by side (the new common module, and the original files that had the flow) and to search for process names, since those didn't change.
For the 16s workflow, I've ended up with:

workflow filterAndTrim16sPaired {
  take: reads
  main:

  readsPaired = Channel.fromFilePairs( reads )
  runFastQC(readsPaired) | runMultiQC
  filterAndTrim(readsPaired)
  runFastQC_postfilterandtrim(filterAndTrim.out.filteredReadsforQC) | runMultiQC_postfilterandtrim
  mergeTrimmedTable(filterAndTrim.out.trimTracking)

  emit:
  filterAndTrim.out
  mergeTrimmedTable.out.trimmedReadTracking
}
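The process-name search can also be scripted: list the processes defined in the new module, then print the DSL1 "from"/"into" lines of each one in an original file, which is the wiring the workflow block has to rebuild. A rough sketch; process_names and channels_of are hypothetical helpers. They assume each definition starts with a line like "process name {" and that a process extends until the next such line, so non-process code in between can produce stray matches.

```shell
# process_names MODULE: print the name of every process defined in MODULE,
# taken from lines of the form "process name {" (or "process name{").
process_names() {
  awk '$1 == "process" { p = $2; sub(/\{.*/, "", p); print p }' "$1"
}

# channels_of ORIGINAL NAME: print, with line numbers, the DSL1 "from"/"into"
# lines of process NAME in ORIGINAL.
channels_of() {
  awk -v name="$2" '
    $1 == "process" { p = $2; sub(/\{.*/, "", p); inside = (p == name) }
    inside && / (from|into) / { printf "%d:%s\n", NR, $0 }
  ' "$1"
}
```

For example: for p in $(process_names filterAndTrim.nf); do echo "== $p"; channels_of main.nf "$p"; done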

Test each module

Since nothing really changes in the individual processes, any breakages should be quite obvious when testing modules one at a time.
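Running the modules one at a time can be scripted with nextflow run and its -entry option. A sketch; test_modules is a hypothetical helper, and it assumes each module file has an entry workflow that needs no take: inputs (for example, a small test wrapper around filterAndTrim16sPaired that builds its own input channel). It stops at the first failing module; DRY_RUN=1 only prints the commands.

```shell
# test_modules MODULE.nf:ENTRY ...: run each module's entry workflow in turn,
# stopping at the first failure. With DRY_RUN set, only print the commands.
test_modules() {
  local spec module entry
  for spec in "$@"; do
    module=${spec%%:*}                # part before the first colon
    entry=${spec#*:}                  # part after the first colon
    if [ -n "${DRY_RUN:-}" ]; then
      echo nextflow run "$module" -entry "$entry"
    else
      nextflow run "$module" -entry "$entry" || { echo "FAILED: $spec" >&2; return 1; }
    fi
  done
}
```

For example, DRY_RUN=1 test_modules filterAndTrim.nf:testFilterAndTrim (the entry name is hypothetical).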

Put everything together

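The new main.nf should contain only the module imports, combined into workflows corresponding to usage, e.g. 16sPaired, ITSPaired, Loop, PacBio (as Nextflow identifiers, workflow names can't start with a digit, so the 16S one needs a name like paired16S). Each workflow would then be selected at run time with the -entry command-line option.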
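A starting point for the new main.nf can be generated from the module files: one DSL2 include line per named workflow. A sketch; make_includes is a hypothetical helper that assumes module paths are given relative to main.nf and that named workflows are declared as "workflow name {". The unnamed workflow block is skipped.

```shell
# make_includes MODULE...: print one DSL2 "include { name } from './MODULE'"
# line per named workflow found in each module file.
make_includes() {
  local module
  for module in "$@"; do
    awk -v path="./$module" '
      $1 == "workflow" {
        name = $2; sub(/\{.*/, "", name)
        if (name != "") printf "include { %s } from \047%s\047\n", name, path
      }' "$module"
  done
}
```

The generated lines would then be combined into the per-usage workflows, one of which is picked at run time, e.g. nextflow run main.nf -entry paired16S (a hypothetical workflow name).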

@cjfields
Contributor Author

I've set up a preliminary project to start organizing this. I'd like to start on something in the next few months (on a branch, of course), since we have several workflows with similar steps. We should also consider some of the newer conventions in the nf-core DSL2 base and configs, in particular process-specific configs that allow custom arguments.

@cjfields cjfields added the enhancement New feature or request label Nov 30, 2021
@cjfields cjfields added the DSL2 Prioritize for DSL2 implementation label Aug 23, 2024
Projects
None yet
Development

No branches or pull requests

2 participants