KeyError: 'Target_Species' #12

Open
larsmoret opened this issue Nov 3, 2023 · 10 comments

@larsmoret

Dear all,
I must say, I am quite intrigued by Compleasm compared to BUSCO.

However, I came across an error while trying to run it, and I have no idea where to look.
While running, Compleasm suddenly stops and displays KeyError: 'Target_species'.

Has anyone had the same issue or any idea where the problem might be?

Thanks in advance,
Lars Moret

P.S.
This is my entire log. Please note that I have installed Compleasm using conda.

(checker) lmoret@ubuntudesktopc:/data/volume_2$ compleasm run -a finalassemblies/CBS1922.fasta -o compleasmoutput/CBS1922 -l fungi -t 14
Searching for miniprot in the path where compleasm.py is located
Searching for miniprot in the current execution path
Searching for hmmsearch in the path where compleasm.py is located
Searching for hmmsearch in the current execution path
miniprot execute command:
/data/volume_2/compleasm_kit/miniprot
lineage: fungi_odb10
hmmsearch execute command:
/data/volume_2/compleasm_kit/hmmsearch
Traceback (most recent call last):
File "/home/lmoret/miniconda3/envs/checker/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Target_species'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/lmoret/miniconda3/envs/checker/bin/compleasm", line 10, in <module>
sys.exit(main())
File "/home/lmoret/miniconda3/envs/checker/lib/python3.7/site-packages/compleasm.py", line 2534, in main
args.func(args)
File "/home/lmoret/miniconda3/envs/checker/lib/python3.7/site-packages/compleasm.py", line 2426, in run
mr.Run()
File "/home/lmoret/miniconda3/envs/checker/lib/python3.7/site-packages/compleasm.py", line 2142, in Run
miniprot_alignment_parser.Run()
File "/home/lmoret/miniconda3/envs/checker/lib/python3.7/site-packages/compleasm.py", line 1158, in Run
self.Run_busco_mode()
File "/home/lmoret/miniconda3/envs/checker/lib/python3.7/site-packages/compleasm.py", line 1234, in Run_busco_mode
filtered_species = records_df["Target_species"].unique()
File "/home/lmoret/miniconda3/envs/checker/lib/python3.7/site-packages/pandas/core/frame.py", line 3458, in __getitem__
indexer = self.columns.get_loc(key)
File "/home/lmoret/miniconda3/envs/checker/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 'Target_species'
(checker) 1 lmoret@ubuntudesktopc:/data/volume_2$

@huangnengCSU
Owner

Hi @larsmoret
Could you list the files (with their sizes) in the "fungi_odb10" directory of the output folder?

@larsmoret
Author

(checker) 130 lmoret@ubuntudesktopc:~/data/volume_2$ ls compleasmoutput/CBS1922/fungi_odb10/
hmmer_output hmmsearch.done miniprot.done miniprot_output.gff translated_protein.fasta

Total file size is:
25M compleasmoutput/CBS1922/fungi_odb10

with per file:
1.5M compleasmoutput/CBS1922/fungi_odb10/hmmer_output/
0 compleasmoutput/CBS1922/fungi_odb10/hmmsearch.done
0 compleasmoutput/CBS1922/fungi_odb10/miniprot.done
24M compleasmoutput/CBS1922/fungi_odb10/miniprot_output.gff
176K compleasmoutput/CBS1922/fungi_odb10/translated_protein.fasta

@katiecdillon

Hello,

I am running into the same issue as @larsmoret. Attached is my submission script.
SCRIPT_miniBUSCO_20231106_v1.txt

Here are the contents of the "arthropoda_odb10" directory:

-rw-r--r-- 1 kcd88651 tcglab 9676547 Nov 4 17:49 miniprot_output.gff
-rw-r--r-- 1 kcd88651 tcglab 0 Nov 4 17:49 miniprot.done
-rw-r--r-- 1 kcd88651 tcglab 0 Nov 4 17:49 hmmsearch.done
drwxr-xr-x 2 kcd88651 tcglab 4096 Nov 4 17:49 hmmer_output
-rw-r--r-- 1 kcd88651 tcglab 0 Nov 6 11:48 translated_protein.fasta

This is my error output:

Traceback (most recent call last):
File "/home/kcd88651/.conda/envs/compleasm/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Target_species'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/kcd88651/.conda/envs/compleasm/bin/compleasm", line 10, in <module>
sys.exit(main())
File "/home/kcd88651/.conda/envs/compleasm/lib/python3.7/site-packages/compleasm.py", line 2534, in main
args.func(args)
File "/home/kcd88651/.conda/envs/compleasm/lib/python3.7/site-packages/compleasm.py", line 2426, in run
mr.Run()
File "/home/kcd88651/.conda/envs/compleasm/lib/python3.7/site-packages/compleasm.py", line 2142, in Run
miniprot_alignment_parser.Run()
File "/home/kcd88651/.conda/envs/compleasm/lib/python3.7/site-packages/compleasm.py", line 1158, in Run
self.Run_busco_mode()
File "/home/kcd88651/.conda/envs/compleasm/lib/python3.7/site-packages/compleasm.py", line 1234, in Run_busco_mode
filtered_species = records_df["Target_species"].unique()
File "/home/kcd88651/.conda/envs/compleasm/lib/python3.7/site-packages/pandas/core/frame.py", line 3458, in __getitem__
indexer = self.columns.get_loc(key)
File "/home/kcd88651/.conda/envs/compleasm/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 'Target_species'
Traceback (most recent call last):
File "/home/kcd88651/.conda/envs/compleasm/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Target_species'

@huangnengCSU
Owner

huangnengCSU commented Nov 6, 2023

Hi @katiecdillon

Thanks for providing the script. Could you specify a different output folder name for each input assembly, instead of using "$D2" for all the assemblies?
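
One way to follow this suggestion is to derive the output folder from each assembly's file name, so that no two runs write into the same directory. The sketch below only builds the command lines. The `-a`, `-o`, `-l` and `-t` flags are the ones used earlier in this thread, while the helper name `build_commands`, the example paths, the lineage value and the thread count are made up for illustration.

```python
from pathlib import Path


def build_commands(assemblies, out_root, lineage, threads):
    """Return one compleasm command per assembly, each with its own output folder."""
    commands = []
    seen = set()
    for asm in assemblies:
        asm = Path(asm)
        # Use the file name without its last extension as the per-assembly folder.
        out_dir = Path(out_root) / asm.stem
        if out_dir in seen:
            raise ValueError(f"two assemblies map to the same output folder: {out_dir}")
        seen.add(out_dir)
        commands.append([
            "compleasm", "run",
            "-a", str(asm),
            "-o", str(out_dir),
            "-l", lineage,
            "-t", str(threads),
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in build_commands(["asm/sampleA.fasta", "asm/sampleB.fasta"],
                              "compleasm_out", "arthropoda", 8):
        print(" ".join(cmd))
```

Raising on a folder collision is a deliberate choice here: silently reusing a folder is what this comment asks to avoid.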

@huangnengCSU
Owner

huangnengCSU commented Nov 7, 2023

Hi @larsmoret @katiecdillon ,

I have added some checks to the code to help understand what went wrong. The KeyError for "Target_species" occurs because no candidate alignment hits satisfy the BUSCO threshold. Could you clone the source code and re-run the failed case in your existing compleasm environment?

e.g.

https://github.com/huangnengCSU/compleasm.git
python compleasm.py run -a $input_asm -l $lineage -o $output_folder -t $threads

Thanks!
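
To illustrate the failure mode described above: if the hit table is built from a list of per-hit records and that list ends up empty, pandas creates a DataFrame with no columns at all, so looking up `"Target_species"` raises `KeyError`. The sketch below shows one possible guard. It is not the code in compleasm.py; the function name `unique_target_species` and the record layout are assumptions made for this example.

```python
import pandas as pd


def unique_target_species(records):
    """Return the distinct Target_species values, or an empty list if there are no hits.

    `records` is a hypothetical list of dicts, one per alignment hit that passed filtering.
    """
    records_df = pd.DataFrame(records)
    # An empty record list yields a DataFrame without any columns, so
    # records_df["Target_species"] would raise KeyError: 'Target_species'.
    if "Target_species" not in records_df.columns:
        return []
    return records_df["Target_species"].unique().tolist()


if __name__ == "__main__":
    print(unique_target_species([]))
    print(unique_target_species([
        {"Target_species": "geneA", "Score": 0.9},
        {"Target_species": "geneB", "Score": 0.8},
        {"Target_species": "geneA", "Score": 0.7},
    ]))
```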

@larsmoret
Author

Hi @huangnengCSU

I've tried it, and now it loads fungi_odb10 but it cannot build the index.

Thanks in advance,

(checker) 2 lmoret@ubuntudesktopc:~/data/volume_2/compleasm$ compleasm run -a ~/finalassemblies/CBS1922.fasta -l fungi -o ~/compleasmoutput/ -t 14
Searching for miniprot in the path where compleasm.py is located
Searching for miniprot in the current execution path
Searching for miniprot in $PATH
Searching for hmmsearch in the path where compleasm.py is located
Searching for hmmsearch in the current execution path
Searching for hmmsearch in $PATH
miniprot execute command:
/home/lmoret/miniconda3/envs/checker/bin/miniprot
Success download from https://busco-data.ezlab.org/v5/data/file_versions.tsv
Success download from https://busco-data.ezlab.org/v5/data/placement_files/list_of_reference_markers.eukaryota_odb10.2019-12-16.txt.tar.gz
Placement file extraction path: mb_downloads/placement_files/list_of_reference_markers.eukaryota_odb10.2019-12-16.txt
Success download from https://busco-data.ezlab.org/v5/data/placement_files/mapping_taxid-lineage.eukaryota_odb10.2019-12-16.txt.tar.gz
Placement file extraction path: mb_downloads/placement_files/mapping_taxid-lineage.eukaryota_odb10.2019-12-16.txt
Success download from https://busco-data.ezlab.org/v5/data/placement_files/mapping_taxids-busco_dataset_name.eukaryota_odb10.2019-12-16.txt.tar.gz
Placement file extraction path: mb_downloads/placement_files/mapping_taxids-busco_dataset_name.eukaryota_odb10.2019-12-16.txt
Success download from https://busco-data.ezlab.org/v5/data/placement_files/supermatrix.aln.eukaryota_odb10.2019-12-16.faa.tar.gz
Placement file extraction path: mb_downloads/placement_files/supermatrix.aln.eukaryota_odb10.2019-12-16.faa
Success download from https://busco-data.ezlab.org/v5/data/placement_files/tree.eukaryota_odb10.2019-12-16.nwk.tar.gz
Placement file extraction path: mb_downloads/placement_files/tree.eukaryota_odb10.2019-12-16.nwk
Success download from https://busco-data.ezlab.org/v5/data/placement_files/tree_metadata.eukaryota_odb10.2019-12-16.txt.tar.gz
Placement file extraction path: mb_downloads/placement_files/tree_metadata.eukaryota_odb10.2019-12-16.txt
Success download from https://busco-data.ezlab.org/v5/data/lineages/eukaryota_odb10.2020-09-10.tar.gz
Lineage file extraction path: mb_downloads/eukaryota_odb10
Success download from https://busco-data.ezlab.org/v5/data/lineages/fungi_odb10.2021-06-28.tar.gz
Lineage file extraction path: mb_downloads/fungi_odb10
lineage: fungi_odb10
[ERROR] failed to open/build the index
Traceback (most recent call last):
File "/home/lmoret/miniconda3/envs/checker/bin/compleasm", line 10, in <module>
sys.exit(main())
File "/home/lmoret/miniconda3/envs/checker/lib/python3.7/site-packages/compleasm.py", line 2534, in main
args.func(args)
File "/home/lmoret/miniconda3/envs/checker/lib/python3.7/site-packages/compleasm.py", line 2426, in run
mr.Run()
File "/home/lmoret/miniconda3/envs/checker/lib/python3.7/site-packages/compleasm.py", line 2120, in Run
alignment_output_dir)
File "/home/lmoret/miniconda3/envs/checker/lib/python3.7/site-packages/compleasm.py", line 304, in run_miniprot
raise Exception("miniprot exited with non-zero exit code: {}".format(exitcode))
Exception: miniprot exited with non-zero exit code: 1

@huangnengCSU
Owner

huangnengCSU commented Nov 7, 2023

To @larsmoret

The error "failed to open/build the index" is reported by miniprot. You can test the alignment manually with "miniprot --trans -u -I --outs=0.95 -t 20 --gff ~/finalassemblies/CBS1922.fasta mb_downloads/fungi_odb10/refseq_db.faa.gz > out.gff". My guess is that the problem occurs while creating the genome index.
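
Before re-running miniprot by hand, it may be worth confirming that the assembly path actually resolves to a readable FASTA file, since an index cannot be built from a missing or empty input. The check below is a generic pre-flight sketch and not part of compleasm or miniprot. `check_fasta` is a hypothetical helper, and it only handles uncompressed FASTA.

```python
import os


def check_fasta(path):
    """Return a list of problems found with a plain-text FASTA file (empty if none)."""
    path = os.path.expanduser(path)  # resolve a leading "~" the way a shell would
    if not os.path.isfile(path):
        return [f"file not found: {path}"]
    if os.path.getsize(path) == 0:
        return [f"file is empty: {path}"]
    with open(path, "r", errors="replace") as handle:
        # Find the first line that is not blank.
        first = next((line for line in handle if line.strip()), None)
    if first is None:
        return [f"file contains only blank lines: {path}"]
    if not first.startswith(">"):
        return [f"first record does not start with '>': {path}"]
    return []


if __name__ == "__main__":
    # Path taken from the command earlier in this thread.
    problems = check_fasta("~/finalassemblies/CBS1922.fasta")
    print(problems if problems else "assembly file looks readable")
```

An empty result only means the file passes these basic checks; it does not guarantee that miniprot can index it.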

@katiecdillon

Hello @huangnengCSU, it looks like the output directory was in fact the issue. Thank you!

@larsmoret
Author

Hi @huangnengCSU,
I've tried it again and manually re-downloaded the dependencies, but I'm still facing difficulties.
The most interesting part of the log is shown below. Could it have to do with the quality of the assembly?

Kind regards,
Lars Moret

[M::main] CMD: /data/volume_2/compleasm_kit/miniprot --trans -u -I --outs=0.95 -t 14 --gff finalassemblies/CBS.fasta mb_downloads/eukaryota_odb10/refseq_db.faa.gz
[M::main] Real time: 72.284 sec; CPU: 957.367 sec; Peak RSS: 0.219 GB
hmmsearch execute command:
/data/volume_2/compleasm_kit/hmmsearch
Warning: no reliable mappings found. All candidates do not pass the cutoff of BUSCO gene.
Warning: No reliable hits found! Check the lineage file: eukaryota_odb10, alignment file: compleasmoutput/CBS/eukaryota_odb10/miniprot_output.gff, hmmsearch output folder: compleasmoutput/CBS/eukaryota_odb10/hmmer_output.

S:0.00%, 0
D:0.00%, 0
F:0.00%, 0
I:0.00%, 0
M:100.00%, 255
N:255

Download lineage: 0.00(s)

Run miniprot: 72.29(s)

Analyze miniprot: 46.34(s)

Total runtime: 118.63(s)

@huangnengCSU
Copy link
Owner

huangnengCSU commented Nov 21, 2023

Hi @larsmoret,

All BUSCO genes are reported as missing because no gene could be aligned to the assembly and pass BUSCO's threshold, which means the genes are quite different from the assembled sequence. The cause may be the quality of the assembly or the choice of the wrong lineage file. Also, if the assembly is highly divergent, miniprot may not align well. Did you try BUSCO, and if so, what was its assessment result?
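
The summary block printed in the log above can also be checked programmatically, for instance to flag runs where every gene is missing. The parser below is a sketch based only on the lines shown in this thread (`S`, `D`, `F`, `I` and `M` with a percentage and a count, then `N` with the total). The function names are hypothetical, and the format may differ in other compleasm versions.

```python
import re

# One class per line, e.g. "M:100.00%, 255"
_CLASS_LINE = re.compile(r"^([SDFIM]):\s*([\d.]+)%,\s*(\d+)\s*$")
# Total number of genes, e.g. "N:255"
_TOTAL_LINE = re.compile(r"^N:\s*(\d+)\s*$")


def parse_summary(text):
    """Return ({class_letter: (percent, count)}, total) parsed from summary text."""
    counts = {}
    total = None
    for line in text.splitlines():
        line = line.strip()
        match = _CLASS_LINE.match(line)
        if match:
            counts[match.group(1)] = (float(match.group(2)), int(match.group(3)))
            continue
        match = _TOTAL_LINE.match(line)
        if match:
            total = int(match.group(1))
    return counts, total


def all_missing(counts, total):
    """True when the M count equals the total, as in the log in this thread."""
    return total is not None and total > 0 and counts.get("M", (0.0, 0))[1] == total


if __name__ == "__main__":
    log = "S:0.00%, 0\nD:0.00%, 0\nF:0.00%, 0\nI:0.00%, 0\nM:100.00%, 255\nN:255\n"
    counts, total = parse_summary(log)
    print(counts, total, all_missing(counts, total))
```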
