Sequences don't fully match themselves #132

Open
fluhus opened this issue May 27, 2024 · 1 comment

Comments

fluhus commented May 27, 2024

Hello, thanks for making this tool!

As a sanity test before incorporating it into my pipeline, I aligned a collection of viral genomes (~10K+ bases each) against themselves. To my surprise, 35% of the sequences did not have a perfect match.

For example, with the attached file below, running fastANI -q vir.fa -r vir.fa -o /dev/stdout gave:

vir.fa     vir.fa     100     3       4

I am seeing 100% base identity but 3 out of 4 chunks matched. Is that correct? Does that mean 100% * 3 / 4 = 75% match? How can I distinguish this case from a genome that's actually 25% shorter but matches 100%? Maybe I am misinterpreting the results?

I hope my question is clear :)

vir.fa.gz

valery-shap commented Aug 19, 2024

Hello, @fluhus,

This topic interests me too.
I see nearly the same behavior with bacterial genomes, especially when fraglen is changed from the default (3000) to 1020:
ANC_3681.fasta ANC_3681.fasta 99.9992 3432 3467 for fraglen=1020
ANC_3681.fasta ANC_3681.fasta 100 1169 1177 for fraglen=3000
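
For anyone comparing these numbers, here is a minimal sketch of the arithmetic. It assumes the three numeric columns are the identity, the number of matched fragments, and the total number of query fragments, which is how the "3 out of 4 chunks" reading above interprets them. The function name `parse_fastani_line` is hypothetical and not part of fastANI.

```python
def parse_fastani_line(line):
    """Split one whitespace-separated output row into named fields.

    Assumed column order: query, reference, identity,
    matched fragments, total query fragments.
    """
    query, ref, ani, matched, total = line.split()[:5]
    matched, total = int(matched), int(total)
    return {
        "query": query,
        "ref": ref,
        "ani": float(ani),
        "matched": matched,
        "total": total,
        # Share of query fragments that found a match.
        "fraction": matched / total,
    }


# The three rows reported in this thread.
rows = [
    "vir.fa vir.fa 100 3 4",
    "ANC_3681.fasta ANC_3681.fasta 99.9992 3432 3467",
    "ANC_3681.fasta ANC_3681.fasta 100 1169 1177",
]

results = [parse_fastani_line(r) for r in rows]
for r in results:
    print(f"{r['query']}: identity={r['ani']}, "
          f"{r['matched']}/{r['total']} fragments matched "
          f"({r['fraction']:.4f})")
```

This only restates the numbers as a matched-fragment fraction next to the identity value; it does not explain why some fragments fail to match their own genome.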
