instanovo.transformer.predict and instanovo.utils.convert_to_ipc #17

Closed
cguetot opened this issue Oct 11, 2023 · 4 comments · Fixed by #19

cguetot commented Oct 11, 2023

Hi again,

I tried the InstaNovo command-line tools, and I always get error messages.

For example, for a one-spectrum MGF file like this:

BEGIN IONS
TITLE=TestFile.2.2.3
PEPMASS=431.564880371094
CHARGE=3+
SCANS=2
RTINSECONDS=0.411807366
98.984690000 39484.098000000
114.388470000 5826.442000000
115.086220000 10740.066000000
END IONS

Converting it with instanovo.utils.convert_to_ipc, I got:
"""
Traceback (most recent call last):
  File "/srv/data1/home/cguetot/tools/miniconda3/envs/instanovo2/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/srv/data1/home/cguetot/tools/miniconda3/envs/instanovo2/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/srv/data1/home/cguetot/tools/miniconda3/envs/instanovo2/lib/python3.8/site-packages/instanovo/utils/convert_to_ipc.py", line 287, in <module>
    main()
  File "/srv/data1/home/cguetot/tools/miniconda3/envs/instanovo2/lib/python3.8/site-packages/instanovo/utils/convert_to_ipc.py", line 262, in main
    convert_mgf_ipc(
  File "/srv/data1/home/cguetot/tools/miniconda3/envs/instanovo2/lib/python3.8/site-packages/instanovo/utils/convert_to_ipc.py", line 97, in convert_mgf_ipc
    int(re.findall(r":(\d+)", meta["scans"])[-1]) if "scans" in meta else evidence_index
IndexError: list index out of range

"""
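
The last frame of the traceback above points at the cause: the pattern `:(\d+)` only matches digits that follow a colon, so a bare value such as `SCANS=2` produces an empty list and `[-1]` raises `IndexError`. The sketch below reproduces that and shows one more tolerant way to parse the field. `parse_scan_number` is a hypothetical helper written for illustration, not the code in InstaNovo or the fix that closed this issue, and the `F1:2478` value is a made-up example of a colon-separated form.

```python
import re
from typing import Optional


def parse_scan_number(scans: Optional[str], fallback: int) -> int:
    """Return the last integer in an MGF SCANS value, or `fallback` if none.

    Hypothetical helper for illustration; not part of InstaNovo.
    """
    if not scans:
        return fallback
    # Accept any run of digits, with or without a leading colon.
    numbers = re.findall(r"\d+", scans)
    return int(numbers[-1]) if numbers else fallback


# The pattern from the traceback finds nothing in the example file's value,
# which is why indexing the result with [-1] fails.
assert re.findall(r":(\d+)", "2") == []

print(parse_scan_number("2", fallback=0))        # 2
print(parse_scan_number("F1:2478", fallback=0))  # 2478
print(parse_scan_number(None, fallback=7))       # 7
```

Falling back to a caller-supplied index mirrors the `else evidence_index` branch visible in the traceback, so spectra without a usable scan number still get an identifier.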

If I try predicting with instanovo.transformer.predict, I get:
"""
INFO:root:Initializing inference.
INFO:root:Loading data from new0.t
Traceback (most recent call last):
  File "/srv/data1/home/cguetot/tools/miniconda3/envs/instanovo2/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/srv/data1/home/cguetot/tools/miniconda3/envs/instanovo2/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/srv/data1/home/cguetot/tools/miniconda3/envs/instanovo2/lib/python3.8/site-packages/instanovo/transformer/predict.py", line 192, in <module>
    main()
  File "/srv/data1/home/cguetot/tools/miniconda3/envs/instanovo2/lib/python3.8/site-packages/instanovo/transformer/predict.py", line 175, in main
    get_preds(data_path, model, config, denovo, output_path, knapsack_path)
  File "/srv/data1/home/cguetot/tools/miniconda3/envs/instanovo2/lib/python3.8/site-packages/instanovo/transformer/predict.py", line 42, in get_preds
    df = pl.read_ipc(data_path)
  File "/srv/data1/home/cguetot/tools/miniconda3/envs/instanovo2/lib/python3.8/site-packages/polars/io/ipc/functions.py", line 103, in read_ipc
    return pl.DataFrame._read_ipc(
  File "/srv/data1/home/cguetot/tools/miniconda3/envs/instanovo2/lib/python3.8/site-packages/polars/dataframe/frame.py", line 971, in _read_ipc
    self._df = PyDataFrame.read_ipc(
exceptions.ArrowErrorException: OutOfSpec("InvalidHeader")
"""

@KevinEloff KevinEloff self-assigned this Oct 11, 2023
@KevinEloff KevinEloff linked a pull request Oct 11, 2023 that will close this issue
@KevinEloff
Collaborator

We have made the MGF conversion script more robust, and it now works with the example above. When converting from MGF, any additional metadata will be saved in the .ipc output file.

Please note that instanovo.transformer.predict currently only works with Polars .ipc files.
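
The `OutOfSpec("InvalidHeader")` error in the prediction traceback is consistent with `pl.read_ipc` being given a file that is not in the Arrow IPC file format, which begins with the magic bytes `ARROW1`. As a rough sketch, a check like the following could be run on an input before passing it to the prediction script. `looks_like_arrow_ipc` is a hypothetical helper, not part of InstaNovo or Polars, and it only inspects the first bytes rather than validating the whole file.

```python
import tempfile
from pathlib import Path

ARROW_MAGIC = b"ARROW1"  # leading bytes of the Arrow IPC file format


def looks_like_arrow_ipc(path) -> bool:
    """Return True if the file starts with the Arrow IPC magic bytes.

    Hypothetical pre-flight check; it does not validate the rest of the file.
    """
    with open(path, "rb") as handle:
        return handle.read(len(ARROW_MAGIC)) == ARROW_MAGIC


with tempfile.TemporaryDirectory() as tmp:
    # A text MGF file, as in the report above.
    mgf = Path(tmp) / "example.mgf"
    mgf.write_text("BEGIN IONS\nTITLE=TestFile.2.2.3\nEND IONS\n")

    # A stand-in that carries only the magic header, for demonstration.
    fake_ipc = Path(tmp) / "example.ipc"
    fake_ipc.write_bytes(ARROW_MAGIC + b"\x00\x00")

    results = (looks_like_arrow_ipc(mgf), looks_like_arrow_ipc(fake_ipc))

print(results)  # (False, True)
```

If the check fails for an MGF input, converting it first with the conversion script mentioned above is the expected route.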

@BioGeek BioGeek reopened this Oct 11, 2023
@BioGeek
Collaborator

BioGeek commented Oct 11, 2023

@cguetot Can you confirm that the updated code now works for your real-world dataset? Thanks.

@cguetot
Author

cguetot commented Oct 12, 2023

Hi @KevinEloff and @BioGeek ,

Thanks for your quick reply. I can confirm that the MGF conversion and prediction both work.

I am wondering whether a feature for returning the top N best predictions could be added to the prediction script; this is related to issue #12.

best,

Carlos

@KevinEloff
Collaborator

Hi @cguetot, we do not currently plan to add this feature in the immediate future, as we found limited value in the additional predictions. If you would like multiple predictions per spectrum, I would look at the output of the diffusion model, InstaNovo+. By default it returns 5 predictions per spectrum in the output CSV file, sorted by model confidence. We will update the README with the usage for InstaNovo+ asap.

If you would still like this functionality from autoregressive InstaNovo, you can add it by adjusting the get_preds function of the prediction script to set return_all_beams=True, and then making sure the top N beams are saved to the output DataFrame pred_df.

Note: to alter the code it may be best to install InstaNovo from git in editable mode:

git clone https://github.com/instadeepai/InstaNovo.git
cd InstaNovo
pip install -e .
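
As a rough illustration of the `return_all_beams=True` suggestion above, the sketch below flattens per-spectrum beams into one row per kept prediction. The beam layout, a `(sequence, log_probability)` tuple, the `top_n_rows` helper, the column names, and the peptide strings are all assumptions made for this example; the actual structure returned by InstaNovo's decoder may differ and should be checked in `get_preds`.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed layout for one beam: (peptide sequence, log-probability).
Beam = Tuple[str, float]


def top_n_rows(
    beams_per_spectrum: Sequence[Sequence[Beam]], n: int
) -> List[Dict[str, object]]:
    """Keep the n highest-scoring beams per spectrum as flat rows.

    Hypothetical helper; column names are illustrative only.
    """
    rows: List[Dict[str, object]] = []
    for spectrum_index, beams in enumerate(beams_per_spectrum):
        # Sort by log-probability, best first, then truncate to n beams.
        ranked = sorted(beams, key=lambda beam: beam[1], reverse=True)[:n]
        for rank, (sequence, log_prob) in enumerate(ranked, start=1):
            rows.append(
                {
                    "spectrum_index": spectrum_index,
                    "rank": rank,
                    "prediction": sequence,
                    "log_probability": log_prob,
                }
            )
    return rows


# Made-up beams for two spectra.
example = [
    [("PEPTIDEK", -0.2), ("PEPTLDEK", -1.5), ("PEPTIDER", -0.9)],
    [("ACDEFGHK", -0.4), ("ACDEFGHR", -0.1)],
]

for row in top_n_rows(example, n=2):
    print(row)
```

The result is a list of dicts, which a DataFrame constructor such as `pandas.DataFrame(rows)` accepts, so the rows could be written out in place of the single best prediction per spectrum.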

@cguetot cguetot closed this as completed Oct 16, 2023