Missing <prefix>.ar53.summary.tsv #509
Comments
Hello, Thanks
So I am running a nextflow metagenomics pipeline to process 1 sample SAMN04359828. The sample is assembled and binned to create bin files. The bin files (2 in this case - SPAdes-MetaBAT2Refined-SAMN04359828.4.fa, SPAdes-MaxBin2Refined-SAMN04359828.001.fa) are being supplied to gtdbtk for classification. Here is the log:

[2023-04-20 03:10:17] INFO: GTDB-Tk v2.1.1
[2023-04-20 03:10:17] INFO: gtdbtk classify_wf --extension fa --genome_dir bins --prefix gtdbtk.SPAdes-DASTool-SAMN04359828 --out_dir /hpc/projects/upt/Metagenomics2/PRJNA46333/work/SAMN04359828_work/04/00b8446ad22d9525acab88c64a2159 --cpus 10 --pplacer_cpus 1 --scratch_dir pplacer_tmp --min_perc_aa 10 --min_af 0.65
[2023-04-20 03:10:17] INFO: Using GTDB-Tk reference data version r207: /hpc/projects/upt/Metagenomics2/PRJNA46333/work/SAMN04359828_work/04/00b8446ad22d9525acab88c64a2159/database
[2023-04-20 03:10:17] INFO: Identifying markers in 2 genomes with 10 threads.
[2023-04-20 03:10:17] TASK: Running Prodigal V2.6.3 to identify genes.
[2023-04-20 03:10:27] INFO: Completed 2 genomes in 9.94 seconds (4.97 seconds/genome).
[2023-04-20 03:10:27] TASK: Identifying TIGRFAM protein families.
[2023-04-20 03:10:31] INFO: Completed 2 genomes in 3.30 seconds (1.65 seconds/genome).
[2023-04-20 03:10:31] TASK: Identifying Pfam protein families.
[2023-04-20 03:10:31] INFO: Completed 2 genomes in 0.25 seconds (7.97 genomes/second).
[2023-04-20 03:10:31] INFO: Annotations done using HMMER 3.1b2 (February 2015).
[2023-04-20 03:10:31] TASK: Summarising identified marker genes.
[2023-04-20 03:10:31] INFO: Completed 2 genomes in 0.03 seconds (63.64 genomes/second).
[2023-04-20 03:10:31] INFO: Done.
[2023-04-20 03:10:31] INFO: Aligning markers in 2 genomes with 10 CPUs.
[2023-04-20 03:10:31] INFO: Processing 1 genomes identified as bacterial.
[2023-04-20 03:10:38] INFO: Read concatenated alignment for 62,291 GTDB genomes.
[2023-04-20 03:10:38] TASK: Generating concatenated alignment for each marker.
[2023-04-20 03:10:39] INFO: Completed 1 genome in 0.02 seconds (50.43 genomes/second).
[2023-04-20 03:10:39] TASK: Aligning 120 identified markers using hmmalign 3.1b2 (February 2015).
[2023-04-20 03:10:43] INFO: Completed 120 markers in 2.80 seconds (42.90 markers/second).
[2023-04-20 03:10:43] TASK: Masking columns of bacterial multiple sequence alignment using canonical mask.
[2023-04-20 03:12:41] INFO: Completed 62,292 sequences in 1.96 minutes (31,798.25 sequences/minute).
[2023-04-20 03:12:41] INFO: Masked bacterial alignment from 41,084 to 5,036 AAs.
[2023-04-20 03:12:41] INFO: 0 bacterial user genomes have amino acids in <10.0% of columns in filtered MSA.
[2023-04-20 03:12:41] INFO: Creating concatenated alignment for 62,292 bacterial GTDB and user genomes.
[2023-04-20 03:12:59] INFO: Creating concatenated alignment for 1 bacterial user genomes.
[2023-04-20 03:12:59] INFO: Processing 1 genomes identified as archaeal.
[2023-04-20 03:12:59] INFO: Read concatenated alignment for 3,412 GTDB genomes.
[2023-04-20 03:13:00] TASK: Generating concatenated alignment for each marker.
[2023-04-20 03:13:00] INFO: Completed 1 genome in 0.01 seconds (76.27 genomes/second).
[2023-04-20 03:13:00] TASK: Aligning 2 identified markers using hmmalign 3.1b2 (February 2015).
[2023-04-20 03:13:01] INFO: Completed 2 markers in 0.15 seconds (13.56 markers/second).
[2023-04-20 03:13:01] TASK: Masking columns of archaeal multiple sequence alignment using canonical mask.
[2023-04-20 03:13:05] INFO: Completed 3,413 sequences in 3.60 seconds (948.78 sequences/second).
[2023-04-20 03:13:05] INFO: Masked archaeal alignment from 13,540 to 10,153 AAs.
[2023-04-20 03:13:05] INFO: 1 archaeal user genomes have amino acids in <10.0% of columns in filtered MSA.
[2023-04-20 03:13:05] INFO: Creating concatenated alignment for 3,412 archaeal GTDB and user genomes.
[2023-04-20 03:13:07] INFO: All archaeal user genomes have been filtered out.
[2023-04-20 03:13:07] INFO: Done.
[2023-04-20 03:13:07] INFO: Using a scratch file for pplacer allocations. This decreases memory usage and performance.
[2023-04-20 03:13:07] TASK: Placing 1 bacterial genomes into backbone reference tree with pplacer using 1 CPUs (be patient).
[2023-04-20 03:13:07] INFO: pplacer version: v1.1.alpha19-0-g807f6f3
[2023-04-20 03:15:39] INFO: Calculating RED values based on reference tree.
[2023-04-20 03:15:39] INFO: 1 out of 1 have an class assignments. Those genomes will be reclassified.
[2023-04-20 03:15:39] INFO: Using a scratch file for pplacer allocations. This decreases memory usage and performance.
[2023-04-20 03:15:39] TASK: Placing 1 bacterial genomes into class-level reference tree 5 (1/1) with pplacer using 1 CPUs (be patient).
[2023-04-20 03:20:49] INFO: Calculating RED values based on reference tree.
[2023-04-20 03:20:50] TASK: Traversing tree to determine classification method.
[2023-04-20 03:20:50] INFO: Completed 1 genome in 0.00 seconds (7,463.17 genomes/second).
[2023-04-20 03:20:50] TASK: Calculating average nucleotide identity using FastANI (v1.3).
[2023-04-20 03:20:51] INFO: Completed 14 comparisons in 1.05 seconds (13.37 comparisons/second).
[2023-04-20 03:20:52] INFO: 1 genome(s) have been classified using FastANI and pplacer.
[2023-04-20 03:20:52] INFO: Note that Tk classification mode is insufficient for publication of new taxonomic designations. New designations should be based on one or more de novo trees, an example of which can be produced by Tk in de novo mode.
[2023-04-20 03:20:52] INFO: Done.
[2023-04-20 03:20:52] INFO: Removing intermediate files.
[2023-04-20 03:20:52] INFO: Intermediate files removed.
[2023-04-20 03:20:52] INFO: Done.
Hello,
Hi. I am using gtdbtk v2.1.1 to perform taxonomy assignment on metagenomic bin files. For the bacterial genomes, both the .bac120.filtered.tsv and .bac120.summary.tsv files are generated. For the archaeal genomes, however, only the .ar53.filtered.tsv file is generated; the .ar53.summary.tsv file is missing.
In the bac120 output, genomes with insufficient AAs in the MSA are reported in bac120.summary.tsv alongside the classified genomes, and they are also listed in bac120.filtered.tsv. I therefore expected ar53.summary.tsv to contain similar information, and I planned to merge bac120.summary.tsv and ar53.summary.tsv into a single gtdbtk_summary.tsv file. With ar53.summary.tsv missing, I am unable to do so. Can you please suggest how to resolve this?
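For reference, the merge step I have in mind looks roughly like the sketch below. It is not part of gtdbtk; the function name `merge_summaries` and its arguments are hypothetical, and it assumes the summary files sit directly in the output directory as `<prefix>.bac120.summary.tsv` and `<prefix>.ar53.summary.tsv`, each tab-separated with a header row. It merges whichever of the two files exist and returns the names of the ones that were absent, so a missing ar53 summary does not break the merge.

```python
import csv
from pathlib import Path


def merge_summaries(out_dir, prefix, merged_name="gtdbtk_summary.tsv"):
    """Merge whichever per-domain summary files exist into one TSV.

    Assumption: files are named <prefix>.<domain>.summary.tsv and have a
    header row. Returns (path_to_merged_file, names_of_missing_files).
    """
    out_dir = Path(out_dir)
    candidates = [out_dir / f"{prefix}.{domain}.summary.tsv"
                  for domain in ("bac120", "ar53")]
    present = [p for p in candidates if p.is_file()]
    missing = [p.name for p in candidates if not p.is_file()]

    rows, fieldnames = [], []
    for path in present:
        with path.open(newline="") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            # Keep the union of columns, in first-seen order.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)

    merged = out_dir / merged_name
    with merged.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t",
                                restval="N/A", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
    return merged, missing
```

Called as `merge_summaries(out_dir, "gtdbtk.SPAdes-DASTool-SAMN04359828")`, this would write gtdbtk_summary.tsv from the bac120 summary alone and report the ar53 file as missing. The genome listed only in .ar53.filtered.tsv would still not appear in the merged table, which is the part I am asking about.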
Command
gtdbtk classify_wf --extension fa --genome_dir bins --prefix gtdbtk.SPAdes-DASTool-SAMN04359828 --out_dir /hpc/projects/upt/Metagenomics2/PRJNA46333/work/SAMN04359828_work/04/00b8446ad22d9525acab88c64a2159 --cpus 10 --pplacer_cpus 1 --scratch_dir pplacer_tmp --min_perc_aa 10 --min_af 0.65
Environment
Server information