Individual Process Testing Where You can Use another process' output as input for the next one #70

Closed
vasquini opened this issue Dec 7, 2022 · 9 comments · Fixed by #127

@vasquini

vasquini commented Dec 7, 2022

I am grateful for the recent updates to nf-test: I can now run multiple process tests even when they are in the same file. I know there are pipeline tests, which I've run successfully, but my team specifically prefers multiple process tests. Is there a way to access the outputs of other processes, as you would in regular Nextflow? Example: flye(guppy_ch.fastq). Can tests emit outputs too, and can those outputs be passed as inputs to another test?

@nvnieuwk
Contributor

Couldn't you just write a test workflow that channels all the outputs between the processes and perform a test on this workflow? :)
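
A minimal sketch of that suggestion, assuming two hypothetical modules GUPPY and FLYE (named after the example in the question), a hypothetical wrapper file tests/guppy_flye.nf, and assumed output names fastq and assembly. None of these names come from a real module; they only illustrate the wiring. The wrapper connects the processes in plain Nextflow:

```groovy
// tests/guppy_flye.nf (hypothetical wrapper; module paths and output names are assumptions)
include { GUPPY } from '../modules/guppy/main.nf'
include { FLYE  } from '../modules/flye/main.nf'

workflow GUPPY_FLYE {
    take:
    reads                        // channel of raw input files for GUPPY

    main:
    GUPPY(reads)
    FLYE(GUPPY.out.fastq)        // feed the first process' output into the next one

    emit:
    assembly = FLYE.out.assembly // assumed output name
}
```

nf-test could then exercise the wrapper with a workflow test; the input file below is a placeholder:

```groovy
nextflow_workflow {

    name "Test workflow GUPPY_FLYE"
    script "tests/guppy_flye.nf"   // hypothetical wrapper from the sketch above
    workflow "GUPPY_FLYE"

    test("Should chain GUPPY into FLYE") {

        when {
            workflow {
                """
                // placeholder test data path
                input[0] = Channel.fromPath("tests/data/reads.fast5")
                """
            }
        }

        then {
            assert workflow.success
        }
    }
}
```

This checks the chain as a unit rather than FLYE on its own, and it needs an extra wrapper file per chain.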

@sateeshperi
Contributor

Thank you for your response, @nvnieuwk, but I would like to re-open this issue if possible to get more insights. Ping @lukfor @seppinho.

  • I'm currently interested in transitioning the test framework for nf-core modules (~900 and counting), which currently runs on pytest, to nf-test.

  • While nf-core modules are intended to be independent units, the nature of various bioinformatics tools often requires chaining, so many module tests run two or more processes to create the correct test data. I'm aware that this encroaches on the scope of workflows; however, the motivation remains fundamentally distinct.

Example of a dependent module test:

include { ABRICATE_RUN } from '../../../../../modules/nf-core/abricate/run/main.nf'
include { ABRICATE_SUMMARY } from '../../../../../modules/nf-core/abricate/summary/main.nf'

workflow test_abricate_summary {

    inputs = [
        tuple([ id:'test1', single_end:false ], // meta map
              file(params.test_data['bacteroides_fragilis']['genome']['genome_fna_gz'], checkIfExists: true)),
        tuple([ id:'test2', single_end:false ],
              file(params.test_data['haemophilus_influenzae']['genome']['genome_fna_gz'], checkIfExists: true))
    ]

    ABRICATE_RUN ( Channel.fromList(inputs) )
    ABRICATE_SUMMARY ( 
        ABRICATE_RUN.out.report.collect{ meta, report -> report }.map{ report -> [[ id: 'test_summary'], report]}
    )
}

Ideal solution:

An nf-test process test could support a depends field that runs the dependent module's test first and makes its output channels available as input to the new module.

nextflow_process {

    name "Test Process ABRICATE_SUMMARY"
    script "modules/nf-core/abricate/summary/main.nf"
    process "ABRICATE_SUMMARY"
    depends "ABRICATE_RUN" // requires the test file to be present & runs it before executing the current one

    test("ABRICATE_SUMMARY") {

        when {
            params {
                // define parameters here. Example:
                // outdir = "tests/results"
            }
            process {
                """
                // define inputs of the process here. Example:
                input[0] = ABRICATE_RUN.out.report.collect{ meta, report -> report }.map{ report -> [[ id: 'test_summary'], report]}
                """
            }
        }

        then {
            assert process.success
            assert snapshot(process.out).match()
        }

    }

}

I would be interested to hear your thoughts on the feasibility of the proposed idea, or on any other ways we could tackle dependent modules.

Thanks a ton for all your work ❤️

@seppinho
Collaborator

seppinho commented Apr 5, 2023

Good point, we're looking into that to find a solution that fits within the nf-test concept

seppinho reopened this Apr 5, 2023

@sateeshperi
Contributor

Hello @seppinho, checking in on the status of this issue. Is there any way I could help?

@sateeshperi
Contributor

hello @lukfor @seppinho

I wanted to follow up on this issue. It is critical for the adoption of nf-test in nf-core. We have a hackathon coming up in October, and it would greatly benefit from having this feature implemented.

Could we get an update on any progress or a tentative timeline? Your assistance will significantly impact our productivity during the hackathon.

Thanks for your hard work and understanding.

@drpatelh

Thanks @sateeshperi! Quite a few nf-core/modules use chained processes in this way to avoid having to generate intermediate test data files. It would be awesome to have this implemented natively in nf-test.

@sateeshperi
Contributor

@lukfor @seppinho I see this issue has been added to the 0.8.0 milestone, but I haven't seen any PRs for it. Is there a solution for this?

@lukfor
Collaborator

lukfor commented Oct 5, 2023

With PR #127, it is now possible to specify processes in the setup method that need to be executed before the when block. The outputs of these processes can then be used to define parameters or to map the inputs of processes/workflows. The run method takes the process or workflow name, the (relative) path to the script file, and the input mapping as arguments.

For example, the following test case for the ABRICATE_SUMMARY module executes the ABRICATE_RUN module to generate the input data:

nextflow_process {

    name "Test process data"

    script "../main.nf"
    process "ABRICATE_SUMMARY"
    config "./nextflow.config"

    test("Should use process ABRICATE_RUN to generate input data") {

        setup {
            
            run("ABRICATE_RUN") {
                script "../../run/main.nf"
                process {
                    """
                    input[0] = Channel.fromList([
                        tuple([ id:'test1', single_end:false ], // meta map
                            file(params.test_data['bacteroides_fragilis']['genome']['genome_fna_gz'], checkIfExists: true)),
                        tuple([ id:'test2', single_end:false ],
                            file(params.test_data['haemophilus_influenzae']['genome']['genome_fna_gz'], checkIfExists: true))
                    ])
                    """
                }
            }

        }

        when {
            process {
                """
                input[0] = ABRICATE_RUN.out.report.collect{ meta, report -> report }.map{ report -> [[ id: 'test_summary'], report]}
                """
            }
        }

        then {
            assert process.success
            assert snapshot(process.out).match()
        }
    }

}

The full example can be found here: https://github.com/askimed/nf-test/tree/features/dependencies/test-data/process/abricate/summary/tests and more details in PR #127
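
Applied to the question that opened this issue (flye(guppy_ch.fastq)), the same pattern might look like the sketch below. The GUPPY and FLYE modules, their script paths, the fastq output name, and the test data path are all hypothetical and only illustrate the shape of the test:

```groovy
nextflow_process {

    name "Test process FLYE"
    script "modules/flye/main.nf"    // hypothetical path
    process "FLYE"                   // hypothetical process under test

    test("Should assemble reads produced by GUPPY") {

        setup {
            // Executed before the when block; outputs are then reachable as GUPPY.out
            run("GUPPY") {
                script "modules/guppy/main.nf"    // hypothetical path
                process {
                    """
                    // placeholder test data path
                    input[0] = Channel.fromPath("tests/data/reads.fast5")
                    """
                }
            }
        }

        when {
            process {
                """
                // assumes GUPPY declares an output named fastq
                input[0] = GUPPY.out.fastq
                """
            }
        }

        then {
            assert process.success
            assert snapshot(process.out).match()
        }
    }
}
```

Because the setup step runs the upstream process for real, a change in GUPPY can also change the FLYE snapshot, so a failure here may originate in either module.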

Happy to hear your feedback! @sateeshperi @drpatelh @emiller88 @nvnieuwk @maxulysse 🚀

@sateeshperi
Contributor

sateeshperi commented Oct 5, 2023

Works like a charm, @lukfor. Thanks for the setup 😉

nf-test test --tag abricate --profile docker

🚀 nf-test 0.8.0
https://code.askimed.com/nf-test
(c) 2021 - 2023 Lukas Forer and Sebastian Schoenherr

Found 1 files in test directory.

Test Process ABRICATE_SUMMARY

  Test [21188a7f] 'Should run without failures' PASSED (9.064s)

Test Process ABRICATE_RUN

  Test [50895192] 'Bacteroides Fragilis Genome' PASSED (8.443s)


SUCCESS: Executed 2 tests in 17.509s


6 participants