Missing <prefix>.ar53.summary.tsv #509

Closed
1 task
rialc13 opened this issue Apr 20, 2023 · 3 comments
Labels
error Help required for a GTDB-Tk error.

Comments

rialc13 commented Apr 20, 2023

Hi. I am using GTDB-Tk v2.1.1 to assign taxonomy to metagenomic bins. For the bacterial genomes, both .bac120.filtered.tsv and .bac120.summary.tsv are generated. For the archaeal genomes, however, only .ar53.filtered.tsv is generated; .ar53.summary.tsv is missing. In the bac120 output, genomes with insufficient AAs in the MSA are reported in bac120.summary.tsv alongside the classified genomes, and they are also listed in bac120.filtered.tsv. I therefore expected ar53.summary.tsv to contain similar information, and I planned to merge bac120.summary.tsv and ar53.summary.tsv into a single gtdbtk_summary.tsv file. With ar53.summary.tsv missing, I cannot do that. Can you please suggest how to resolve this?
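As a possible stopgap for the merge step described above, here is a minimal Python sketch that concatenates whichever per-domain summary files exist and reports the ones that are absent, rather than failing when ar53.summary.tsv is not there. The function name `merge_summaries`, the assumption that the files sit directly in the output directory as `<prefix>.<domain>.summary.tsv`, and the assumption that both files share an identical header line are illustrative and not taken from GTDB-Tk itself.

```python
from pathlib import Path

# Marker-set names used in the output file names quoted in this issue.
DOMAINS = ("bac120", "ar53")


def merge_summaries(out_dir, prefix, merged_name="gtdbtk_summary.tsv"):
    """Merge the per-domain summary TSVs that exist.

    Returns (path_to_merged_file, list_of_missing_domains).
    Hypothetical helper: file layout and shared header are assumptions.
    """
    out_dir = Path(out_dir)
    header = None
    rows = []
    missing = []
    for domain in DOMAINS:
        path = out_dir / f"{prefix}.{domain}.summary.tsv"
        if not path.is_file():
            # Record the gap instead of aborting the whole merge.
            missing.append(domain)
            continue
        lines = [ln for ln in path.read_text().splitlines() if ln.strip()]
        if not lines:
            continue
        if header is None:
            header = lines[0]
        elif lines[0] != header:
            raise ValueError(f"Header mismatch in {path}")
        rows.extend(lines[1:])
    if header is None:
        raise FileNotFoundError(f"No summary files found for prefix {prefix!r}")
    merged_path = out_dir / merged_name
    merged_path.write_text("\n".join([header, *rows]) + "\n")
    return merged_path, missing
```

Calling `merge_summaries(out_dir, "gtdbtk.SPAdes-DASTool-SAMN04359828")` on the run in this issue would be expected to return `["ar53"]` as the missing list. Genomes that were filtered out would then still need to be looked up in the corresponding .filtered.tsv file.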

Command

gtdbtk classify_wf --extension fa --genome_dir bins --prefix gtdbtk.SPAdes-DASTool-SAMN04359828 --out_dir /hpc/projects/upt/Metagenomics2/PRJNA46333/work/SAMN04359828_work/04/00b8446ad22d9525acab88c64a2159 --cpus 10 --pplacer_cpus 1 --scratch_dir pplacer_tmp --min_perc_aa 10 --min_af 0.65

Environment

Server information

  • HPC
rialc13 added the error label Apr 20, 2023
@pchaumeil (Collaborator)

Hello,
Can you please provide the gtdbtk.log?
How many genomes did you submit to your pipeline, and how many are in the final summary file?

Thanks

rialc13 commented Apr 21, 2023

I am running a Nextflow metagenomics pipeline to process one sample, SAMN04359828. The sample is assembled and binned, and the resulting bin files (2 in this case: SPAdes-MetaBAT2Refined-SAMN04359828.4.fa and SPAdes-MaxBin2Refined-SAMN04359828.001.fa) are supplied to GTDB-Tk for classification. Here is the log:

[2023-04-20 03:10:17] INFO: GTDB-Tk v2.1.1

[2023-04-20 03:10:17] INFO: gtdbtk classify_wf --extension fa --genome_dir bins --prefix gtdbtk.SPAdes-DASTool-SAMN04359828 --out_dir /hpc/projects/upt/Metagenomics2/PRJNA46333/work/SAMN04359828_work/04/00b8446ad22d9525acab88c64a2159 --cpus 10 --pplacer_cpus 1 --scratch_dir pplacer_tmp --min_perc_aa 10 --min_af 0.65

[2023-04-20 03:10:17] INFO: Using GTDB-Tk reference data version r207: /hpc/projects/upt/Metagenomics2/PRJNA46333/work/SAMN04359828_work/04/00b8446ad22d9525acab88c64a2159/database

[2023-04-20 03:10:17] INFO: Identifying markers in 2 genomes with 10 threads.

[2023-04-20 03:10:17] TASK: Running Prodigal V2.6.3 to identify genes.

[2023-04-20 03:10:27] INFO: Completed 2 genomes in 9.94 seconds (4.97 seconds/genome).

[2023-04-20 03:10:27] TASK: Identifying TIGRFAM protein families.

[2023-04-20 03:10:31] INFO: Completed 2 genomes in 3.30 seconds (1.65 seconds/genome).

[2023-04-20 03:10:31] TASK: Identifying Pfam protein families.

[2023-04-20 03:10:31] INFO: Completed 2 genomes in 0.25 seconds (7.97 genomes/second).

[2023-04-20 03:10:31] INFO: Annotations done using HMMER 3.1b2 (February 2015).

[2023-04-20 03:10:31] TASK: Summarising identified marker genes.

[2023-04-20 03:10:31] INFO: Completed 2 genomes in 0.03 seconds (63.64 genomes/second).

[2023-04-20 03:10:31] INFO: Done.

[2023-04-20 03:10:31] INFO: Aligning markers in 2 genomes with 10 CPUs.

[2023-04-20 03:10:31] INFO: Processing 1 genomes identified as bacterial.

[2023-04-20 03:10:38] INFO: Read concatenated alignment for 62,291 GTDB genomes.

[2023-04-20 03:10:38] TASK: Generating concatenated alignment for each marker.

[2023-04-20 03:10:39] INFO: Completed 1 genome in 0.02 seconds (50.43 genomes/second).

[2023-04-20 03:10:39] TASK: Aligning 120 identified markers using hmmalign 3.1b2 (February 2015).

[2023-04-20 03:10:43] INFO: Completed 120 markers in 2.80 seconds (42.90 markers/second).

[2023-04-20 03:10:43] TASK: Masking columns of bacterial multiple sequence alignment using canonical mask.

[2023-04-20 03:12:41] INFO: Completed 62,292 sequences in 1.96 minutes (31,798.25 sequences/minute).

[2023-04-20 03:12:41] INFO: Masked bacterial alignment from 41,084 to 5,036 AAs.

[2023-04-20 03:12:41] INFO: 0 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA.

[2023-04-20 03:12:41] INFO: Creating concatenated alignment for 62,292 bacterial GTDB and user genomes.

[2023-04-20 03:12:59] INFO: Creating concatenated alignment for 1 bacterial user genomes.

[2023-04-20 03:12:59] INFO: Processing 1 genomes identified as archaeal.

[2023-04-20 03:12:59] INFO: Read concatenated alignment for 3,412 GTDB genomes.

[2023-04-20 03:13:00] TASK: Generating concatenated alignment for each marker.

[2023-04-20 03:13:00] INFO: Completed 1 genome in 0.01 seconds (76.27 genomes/second).

[2023-04-20 03:13:00] TASK: Aligning 2 identified markers using hmmalign 3.1b2 (February 2015).

[2023-04-20 03:13:01] INFO: Completed 2 markers in 0.15 seconds (13.56 markers/second).

[2023-04-20 03:13:01] TASK: Masking columns of archaeal multiple sequence alignment using canonical mask.

[2023-04-20 03:13:05] INFO: Completed 3,413 sequences in 3.60 seconds (948.78 sequences/second).

[2023-04-20 03:13:05] INFO: Masked archaeal alignment from 13,540 to 10,153 AAs.

[2023-04-20 03:13:05] INFO: 1 archaeal user genomes have amino acids in <10.0% of columns in filtered MSA.

[2023-04-20 03:13:05] INFO: Creating concatenated alignment for 3,412 archaeal GTDB and user genomes.

[2023-04-20 03:13:07] INFO: All archaeal user genomes have been filtered out.

[2023-04-20 03:13:07] INFO: Done.

[2023-04-20 03:13:07] INFO: Using a scratch file for pplacer allocations. This decreases memory usage and performance.

[2023-04-20 03:13:07] TASK: Placing 1 bacterial genomes into backbone reference tree with pplacer using 1 CPUs (be patient).

[2023-04-20 03:13:07] INFO: pplacer version: v1.1.alpha19-0-g807f6f3

[2023-04-20 03:15:39] INFO: Calculating RED values based on reference tree.

[2023-04-20 03:15:39] INFO: 1 out of 1 have an class assignments. Those genomes will be reclassified.

[2023-04-20 03:15:39] INFO: Using a scratch file for pplacer allocations. This decreases memory usage and performance.

[2023-04-20 03:15:39] TASK: Placing 1 bacterial genomes into class-level reference tree 5 (1/1) with pplacer using 1 CPUs (be patient).

[2023-04-20 03:20:49] INFO: Calculating RED values based on reference tree.

[2023-04-20 03:20:50] TASK: Traversing tree to determine classification method.

[2023-04-20 03:20:50] INFO: Completed 1 genome in 0.00 seconds (7,463.17 genomes/second).

[2023-04-20 03:20:50] TASK: Calculating average nucleotide identity using FastANI (v1.3).

[2023-04-20 03:20:51] INFO: Completed 14 comparisons in 1.05 seconds (13.37 comparisons/second).

[2023-04-20 03:20:52] INFO: 1 genome(s) have been classified using FastANI and pplacer.

[2023-04-20 03:20:52] INFO: Note that Tk classification mode is insufficient for publication of new taxonomic designations. New designations should be based on one or more de novo trees, an example of which can be produced by Tk in de novo mode.

[2023-04-20 03:20:52] INFO: Done.

[2023-04-20 03:20:52] INFO: Removing intermediate files.

[2023-04-20 03:20:52] INFO: Intermediate files removed.

[2023-04-20 03:20:52] INFO: Done.
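The log above reports that the single archaeal genome had amino acids in fewer than 10% of the MSA columns and that all archaeal user genomes were filtered out, which appears consistent with no ar53 summary being written in this run. A small sketch for pulling those two facts out of a gtdbtk.log is shown below. The regular expressions are modelled only on the message wording visible in this log; other GTDB-Tk versions may phrase these lines differently, and the function name `summarize_filtering` is hypothetical.

```python
import re

# Patterns copied from the wording of the log lines in this issue (assumption:
# the wording is the same for both domains and stable within this version).
FILTERED_COUNT = re.compile(
    r"INFO: (\d+) (\w+) user genomes have amino acids in "
    r"<([\d.]+)% of columns in filtered MSA\."
)
ALL_FILTERED = re.compile(r"INFO: All (\w+) user genomes have been filtered out\.")


def summarize_filtering(log_text):
    """Return (counts, emptied).

    counts  maps a domain name to the number of user genomes filtered from the MSA.
    emptied is the set of domains for which every user genome was filtered out.
    """
    counts = {}
    emptied = set()
    for line in log_text.splitlines():
        match = FILTERED_COUNT.search(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
            continue
        match = ALL_FILTERED.search(line)
        if match:
            emptied.add(match.group(1))
    return counts, emptied
```

A pipeline step could use the `emptied` set to decide whether a missing summary file for that domain is expected before attempting a merge.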

@pchaumeil (Collaborator)

Hello,
We have released GTDB-Tk v2.3.
This version should fix the problem of missing genomes.

pchaumeil reopened this May 10, 2023