CAT_SUMMARY fails due to input filename collision #474

Closed
tillenglert opened this issue Jul 11, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@tillenglert
Contributor

Description of the bug

Hi there,

I'm running nf-core/mag on a metagenomics dataset and wanted to include a taxonomic classification via CAT. The main module CAT runs, but then runs into the following issue:

nf-core/mag execution completed unsuccessfully!

The exit status of the task that caused the workflow execution to fail was: null.

The full error message was:

Error executing process > 'NFCORE_MAG:MAG:CAT_SUMMARY'

Caused by:
  Process `NFCORE_MAG:MAG:CAT_SUMMARY` input file name collision -- There are multiple input files for each of the following file names: SPAdes-DASTool-group-0.ORF2LCA.names.txt.gz, SPAdes-DASTool-group-0.bin2classification.names.txt.gz, MEGAHIT-DASTool-group-0.bin2classification.names.txt.gz, MEGAHIT-DASTool-group-0.ORF2LCA.names.txt.gz


Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

As the names suggest, I'm using SPAdes and MEGAHIT for assembly, MaxBin2 and MetaBAT2 for binning, and DAS Tool for bin refinement (which seems to be the problem here), together with --postbinning_input "both".

I tracked down the problem to the naming convention within the CAT module:

gzip "raw/${meta.assembler}-${meta.binner}-${meta.id}.ORF2LCA.txt" \

This does not account for the naming of the unbinned DAS Tool file.
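
To illustrate the collision, here is a minimal Python sketch (not the pipeline's actual code). It assumes the output name is built only from the assembler, binner, and id fields, as in the pattern quoted above. The `unbinned` key and the `cat_output_name` helper are hypothetical names used only for this example.

```python
from collections import Counter


def cat_output_name(meta: dict) -> str:
    # Mirrors the quoted pattern: only assembler, binner and id are used,
    # so anything else in the metadata is ignored when naming the file.
    return f"{meta['assembler']}-{meta['binner']}-{meta['id']}.ORF2LCA.names.txt.gz"


# Hypothetical metadata for the binned and unbinned DAS Tool outputs of one group.
# The "unbinned" key is an assumption made for illustration.
records = [
    {"assembler": "SPAdes", "binner": "DASTool", "id": "group-0", "unbinned": False},
    {"assembler": "SPAdes", "binner": "DASTool", "id": "group-0", "unbinned": True},
]

names = [cat_output_name(m) for m in records]

# Any name produced more than once would be staged twice into the summary step.
collisions = [name for name, count in Counter(names).items() if count > 1]
print(collisions)
```

Both records map to the same file name, which matches the kind of duplicate reported in the error message.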

If you need any more information/files I can of course provide them.

Command used and terminal output

nextflow run nf-core/mag -r 2.3.0 -profile cfc --input "../samplesheet.csv" --host_genome "mm10" --save_hostremoved_reads --cat_db ../cat_db/CAT_prepare_20210107.tar.gz --coassemble_group --refine_bins_dastool --postbinning_input "both" --busco_auto_lineage_prok --save_busco_reference --busco_download_path "../busco-data.ezlab.org/v5/data" --skip_concoct --skip_prokka --outdir "results_with_cat" -c "../QMCOK_mag.config" --email [email protected] -resume

Relevant files

No response

System information

Nextflow version 23.04.1 build 5866
Hardware: HPC Cluster
Executor: slurm
Container engine: Singularity
nf-core/mag: 2.3.0

@tillenglert added the bug label Jul 11, 2023
@jfy133
Member

jfy133 commented Jul 11, 2023

A fix for this was incoming in #433; however, the PR was closed (@maxibor?)

@tillenglert
Contributor Author

Thanks @jfy133, I posted what I tracked down in the PR. 👍

@jfy133
Member

jfy133 commented Jul 11, 2023

To summarise:

  • The CAT module hard-codes file names in the commands themselves rather than using the now-standard prefix, so the 'unbinned' information is dropped when writing files, and the binned and unbinned outputs of a given sample overwrite each other...
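
As a rough illustration of the prefix idea in the summary above, the following Python sketch builds one prefix per record and derives every output name from it. This is only a sketch under assumptions, not the change made in the pipeline: the `unbinned` key, the `make_prefix` and `output_name` helpers, and the exact position of the "unbinned" tag in the name are all hypothetical.

```python
def make_prefix(meta: dict) -> str:
    # Build a single prefix that carries every distinguishing field,
    # including the (hypothetical) unbinned flag.
    parts = [meta["assembler"], meta["binner"]]
    if meta.get("unbinned", False):
        parts.append("unbinned")
    parts.append(meta["id"])
    return "-".join(parts)


def output_name(meta: dict, suffix: str = "ORF2LCA.names.txt.gz") -> str:
    # Every output file is named from the prefix, never from hard-coded fields.
    return f"{make_prefix(meta)}.{suffix}"


# Hypothetical metadata for the binned and unbinned outputs of one group.
binned = {"assembler": "SPAdes", "binner": "DASTool", "id": "group-0", "unbinned": False}
unbinned = {"assembler": "SPAdes", "binner": "DASTool", "id": "group-0", "unbinned": True}

names = {output_name(binned), output_name(unbinned)}
print(sorted(names))
```

Because the distinguishing information lives in the prefix, the two records no longer map to the same file name.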

@jfy133
Member

jfy133 commented Nov 3, 2023

This should in principle be fixed in #489 and is now in the dev branch, with a workaround until we start replacing modules with official nf-core ones!

I'll wait a week or so to see if we can get in a few more bug fixes, then release this.

@jfy133 closed this as completed Nov 3, 2023