Sequences don't fully match themselves #132

fluhus · 2024-05-27T09:40:29Z

Hello, thanks for making this tool!

As a sanity test before I incorporate it into my pipeline I aligned a collection of viral genomes (~10K+ bases each) against themselves. To my surprise, 35% of the sequences did not have a perfect match.

For example with the attached file below, running fastANI -q vir.fa -r vir.fa -o /dev/stdout gave:

vir.fa     vir.fa     100     3       4

I am seeing 100% base identity but 3 out of 4 chunks matched. Is that correct? Does that mean 100% * 3 / 4 = 75% match? How can I distinguish this case from a genome that's actually 25% shorter but matches 100%? Maybe I am misinterpreting the results?

I hope my question is clear :)

vir.fa.gz

valery-shap · 2024-08-19T17:32:42Z

Hello, @fluhus,

This topic is of interest to me too.
I see nearly the same behavior with bacterial genomes, especially when fraglen is changed from the default (3000) to 1020:
ANC_3681.fasta ANC_3681.fasta 99.9992 3432 3467 for fraglen=1020
ANC_3681.fasta ANC_3681.fasta 100 1169 1177 for fraglen=3000
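
For anyone comparing these rows: as far as I understand the fastANI README, the output columns are query, reference, ANI, count of bidirectional fragment mappings, and total query fragments, and the ratio of the last two is the alignment fraction with respect to the query. Below is a minimal Python sketch that computes that ratio for the rows quoted in this thread. The helper name `alignment_fraction` is made up for illustration and is not part of fastANI, and the sketch assumes whitespace-separated columns in the order above.

```python
def alignment_fraction(line):
    """Parse one fastANI output row (assumed column order: query, reference,
    ANI, matched fragments, total query fragments) and return
    (ani, matched, total, matched / total)."""
    fields = line.split()
    ani = float(fields[2])
    matched = int(fields[3])
    total = int(fields[4])
    return ani, matched, total, matched / total


# Rows copied from the comments in this thread.
rows = [
    "vir.fa vir.fa 100 3 4",
    "ANC_3681.fasta ANC_3681.fasta 99.9992 3432 3467",
    "ANC_3681.fasta ANC_3681.fasta 100 1169 1177",
]

for row in rows:
    ani, matched, total, frac = alignment_fraction(row)
    print(f"ANI={ani} matched={matched}/{total} alignment_fraction={frac:.4f}")
```

For the vir.fa row this gives an alignment fraction of 0.75. Reporting ANI and alignment fraction side by side, rather than multiplying them into a single number, keeps the two quantities separate. It does not by itself explain why a genome fails to map all of its own fragments.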
