Need the ability to skip md5 check for some files in snapshot #211

Open
stevekm opened this issue Apr 30, 2024 · 2 comments

stevekm commented Apr 30, 2024

I am trying to avoid taking the md5sum for some files in my snapshot.

I found an example of one potential method that was shared online here, and modified it as follows:

nextflow_pipeline {
    name "Test main Pipeline"
    script "main.nf"

    test("Should run without failures") {

        when {
            params {
                // NOTE: make sure 'outdir' is defined inside the JSON!
                load("$baseDir/examples/params.small.json")
            }
        }

        then {
            assert workflow.success

            // suffixes of files to leave out of the snapshot (duplicate entries removed)
            def exclude_suffix = [".html", "_complete", "_invocation",
            "_outs", "_vdrkill", "_args",
            "_jobinfo", "_log", "_stderr", "_stdout",
            "_chunk_defs", "_stage_defs", "_disabled",
            "_cmdline", "_filelist", "_finalstate", "_jobmode", "_mrosource", "_perf", "_sitecheck",
            "_tags", "_timestamp", "_uuid", "_versions"]

            assert snapshot(
                workflow,
                path("${params.outdir}")
                        .list()
                        .collect { getRecursiveFileNames(it, "${params.outdir}") }
                        .flatten()
                        // keep only entries that end with none of the excluded suffixes;
                        // any() stops at the first match, unlike a return inside each()
                        .findAll { name ->
                            !exclude_suffix.any { suffix -> name.toString().endsWith(suffix) }
                        }
            ).match()


        }

    }
}

def getRecursiveFileNames(fileOrDir, outputDir) {
    if(file(fileOrDir.toString()).isDirectory()) {
        return fileOrDir.list().collect { getRecursiveFileNames(it, outputDir) }
    }
    return fileOrDir.toString().replace("${outputDir}/", "")
}

It works to exclude the files with the listed suffixes, but the snapshot now contains only a list of file names, with no md5s for the remaining files. Also, I realized that what I really wanted was to exclude only the md5 for the files with inconsistent hashes, instead of removing those files entirely. I'm not sure how to implement that. Could this be built into nf-test directly?


stevekm commented Apr 30, 2024

I think this is related to issue #116; however, the main difference is that I still want to check for the existence of the files, just not their md5.


GallVp commented Jul 16, 2024

I was in a somewhat similar situation and resorted to the following logic for the orthofinder module:

import groovy.io.FileType

.
.
.

assert process.success

def all_files = []

file(process.out.orthofinder[0][1]).eachFileRecurse (FileType.FILES) { file ->
    all_files << file
}

def all_file_names = all_files.collect { it.name }.sort(false)

def stable_file_names = [
    'Statistics_PerSpecies.tsv',
    'SpeciesTree_Gene_Duplications_0.5_Support.txt',
    'SpeciesTree_rooted.txt'
]

def stable_files = all_files.findAll { it.name in stable_file_names }

assert snapshot(
    all_file_names,
    stable_files,
    process.out.versions[0]
).match()
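
The same idea can be combined with the suffix list from the original post. The following is only a sketch in plain Groovy, not an nf-test feature: `partitionOutputs` is a hypothetical helper name, and it assumes that unstable files can be recognised by suffix. It keeps stable files as `Path` objects and reduces unstable ones to relative path strings, so they can still be checked for existence.

```groovy
import groovy.io.FileType
import java.nio.file.Files
import java.nio.file.Path

// Hypothetical helper (not part of nf-test): split every file under outDir into
//   hashed   -> Path objects for files whose content is expected to be stable
//   nameOnly -> relative path strings for files whose content varies between runs
def partitionOutputs(Path outDir, List<String> unstableSuffixes) {
    def files = []
    outDir.toFile().eachFileRecurse(FileType.FILES) { files << it.toPath() }
    files = files.sort { it.toString() }   // deterministic order for the snapshot

    // split() returns [elements matching the closure, all remaining elements]
    def (unstable, stable) = files.split { f ->
        unstableSuffixes.any { f.toString().endsWith(it) }
    }
    return [
        hashed  : stable,
        nameOnly: unstable.collect { outDir.relativize(it).toString() }
    ]
}

// Small demonstration on a temporary directory
def tmp = Files.createTempDirectory("outs")
Files.write(tmp.resolve("counts.tsv"), "a\t1\n".bytes)
Files.write(tmp.resolve("report.html"), "<html/>".bytes)

def result = partitionOutputs(tmp, [".html"])
println "hashed:   ${result.hashed*.fileName}"
println "nameOnly: ${result.nameOnly}"
```

Inside a `then` block, something like `snapshot(workflow, result.hashed, result.nameOnly).match()` should then record md5s for the stable files and only names for the rest, assuming nf-test fingerprints `Path` objects and stores plain strings as they are. That assumption would also explain why the snapshot in the original post contained names only, since `getRecursiveFileNames` returns strings.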
