Update to designations of lineage B.1.575 and B.1.575.1 #64

Closed
aineniamh opened this issue Apr 28, 2021 · 4 comments

@aineniamh

Lineage description

USA and Aruba lineage with spike mutations "S:P681H","S:S494P" and "S:T716I".

From all SARS-CoV-2 genomes on GISAID as of 2021-04-25, any sequences with at least 8 of the SNPs below were aligned with a selection of B.1 genomes to place them within B.1 diversity. These sequences include previously designated B.1.575 sequences and a couple of sequences that had previously been designated as B.1.1.335.

snps = [
"S:D614G",
"S:P681H",
"S:S494P",
"S:T716I",
"N:T205I",
"orf1ab:T3750I",
"orf1ab:T3255I",
"orf1ab:T265I",
"M:I82T",
"orf3a:Q57H"
]
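
The selection rule described above (keep any sequence that carries at least a threshold number of the listed SNPs) can be sketched roughly as follows. This is a minimal illustration, not the script used for this designation: the function names, the `min_hits` parameter, the sequence names, and the toy four-SNP panel are all hypothetical, and it assumes mutations have already been called per sequence in `gene:change` form. For the rule in this issue, the full list above would be passed with `min_hits=8`.

```python
# Hypothetical sketch: pick sequences carrying at least `min_hits` of a SNP panel.

def count_hits(mutations, snps):
    """Number of panel SNPs found among a sequence's called mutations."""
    return len(set(snps) & set(mutations))


def select_candidates(mutations_by_sequence, snps, min_hits):
    """Return names of sequences with at least `min_hits` panel SNPs."""
    return [
        name
        for name, mutations in mutations_by_sequence.items()
        if count_hits(mutations, snps) >= min_hits
    ]


# Toy panel (a subset of the spike SNPs listed above) and made-up sequences.
panel = ["S:D614G", "S:P681H", "S:S494P", "S:T716I"]
example = {
    "seq_a": ["S:D614G", "S:P681H", "S:S494P", "S:T716I"],  # 4 panel SNPs
    "seq_b": ["S:D614G", "S:P681H", "S:S494P"],             # 3 panel SNPs
    "seq_c": ["S:D614G"],                                   # 1 panel SNP
}

print(select_candidates(example, panel, min_hits=3))  # ['seq_a', 'seq_b']
```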

[Image: Screenshot 2021-04-28 at 08 42 53]

The above phylogeny shows B.1.575 in red and B.1.575.1 in blue, both as monophyletic clades. The annotated tree is available here: [sequences.aln.fasta.tree.zip](https://github.com/cov-lineages/pango-designation/files/6389986/sequences.aln.fasta.tree.zip)

New designations:

B.1.575 1,560
B.1.575.1 167

Now total designations:

B.1.575 1,571
B.1.575.1 168

Changes enacted in commit:
8a5d954
B.1.575.new_lineages.csv

@aineniamh aineniamh self-assigned this Apr 28, 2021
@aineniamh aineniamh added the correction Highlight an error in the description or definition label Apr 28, 2021
@aineniamh aineniamh modified the milestones: B.1.575, B.1.575.1 Apr 28, 2021

@oroak

oroak commented Apr 28, 2021

@aineniamh Thanks for your work on this. I spot-checked a few of the sequences I flagged manually, and they are still B.1 in the lineages.csv from the new release. Investigating the first three (below), I think the issue is that they have gaps (including our sequence) over "orf1ab:T265I". Because of this they were not included in your reassignment pull above, even though they have all of the other mutations. My suggestion would be to remove samples like these (those originally reassigned from B.1.617 to B.1 but not included above) from the training data, as I think they will just create trouble for both lineages.
hCoV-19/USA/OR-OHSU-10702/2021|EPI_ISL_1541883|2021-03-17
hCoV-19/USA/NJ-CDC-LC0035972/2021|EPI_ISL_1609515|2021-03-27
hCoV-19/USA/NJ-CDC-LC0036132/2021|EPI_ISL_1609321|2021-03-25
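
One way to make the missing-data case above explicit is to record, for each defining SNP, whether the mutation is present, the site is covered but unmutated, or the site is not callable (a gap or ambiguous base). The sketch below is only an illustration under that assumption; the `site_calls` mapping, the function names, and the example values are hypothetical and do not come from the designation pipeline.

```python
# Hypothetical sketch: separate "mutation absent" from "site not callable".
# site_calls maps SNP name -> True  (mutation present),
#                             False (site covered, mutation absent),
#                             None  (gap or ambiguous base at the site).

def summarise_snps(site_calls, snps):
    """Group panel SNPs into present / absent / missing for one sequence."""
    summary = {"present": [], "absent": [], "missing": []}
    for snp in snps:
        call = site_calls.get(snp)  # SNPs with no entry count as missing
        if call is True:
            summary["present"].append(snp)
        elif call is False:
            summary["absent"].append(snp)
        else:
            summary["missing"].append(snp)
    return summary


def consistent_with_panel(site_calls, snps):
    """True when no covered site contradicts the panel."""
    return not summarise_snps(site_calls, snps)["absent"]


# Made-up example: a gap over orf1ab:T265I, every other panel SNP present.
panel = ["S:P681H", "S:S494P", "S:T716I", "orf1ab:T265I"]
gapped = {"S:P681H": True, "S:S494P": True, "S:T716I": True, "orf1ab:T265I": None}

print(summarise_snps(gapped, panel)["missing"])  # ['orf1ab:T265I']
print(consistent_with_panel(gapped, panel))      # True
```

A sequence flagged this way could be reviewed or excluded rather than being treated as lacking the mutation.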

@oroak

oroak commented May 3, 2021

@aineniamh I know you closed this, but I believe the reassignment of samples from #49 to B.1 is still causing issues. I ran our custom Oregon build and included many of the sequences still assigned to B.1 (that is, not reassigned above). You can see from these screenshots that genotypes at identical places in the tree are getting misassigned.

[Image: Screen Shot 2021-05-03 at 2 02 13 PM]

[Image: Screen Shot 2021-05-03 at 2 01 48 PM]

nextstrain__metadata.txt

@aineniamh

Hi @oroak, thanks for providing the trees and IDs.

I've now removed the following from designations:

-USA/OR-OHSU-10702/2021,B.1
-USA/NJ-CDC-LC0035972/2021,B.1
-USA/NJ-CDC-LC0036132/2021,B.1
-USA/NH-CDC-LC0019242/2021,B.1
-USA/MA-CDC-LC0013337/2021,B.1
-USA/PA-CDC-LC0022743/2021,B.1

I'm not certain what these sequences will be assigned once trained, but it sounds like they should be B.1.575 and it would be straightforward to flag a misassignment as due to missing data.
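
A removal like the one listed above could be scripted along these lines. This is a hedged sketch only: it assumes the designation file is a two-column CSV with a `taxon,lineage` header, the function name is hypothetical, and the `Example/keep-1/2021` row is made up for illustration.

```python
import csv
import io


def drop_designations(csv_text, to_remove):
    """Return CSV text without rows whose (taxon, lineage) pair is in `to_remove`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if (row["taxon"], row["lineage"]) not in to_remove:
            writer.writerow(row)
    return out.getvalue()


# Assumed file layout; the second data row is invented for the example.
designations = (
    "taxon,lineage\n"
    "USA/OR-OHSU-10702/2021,B.1\n"
    "Example/keep-1/2021,B.1.575\n"
)
to_remove = {("USA/OR-OHSU-10702/2021", "B.1")}

print(drop_designations(designations, to_remove))
```

Matching on the taxon and lineage together, rather than on the taxon alone, avoids dropping a row that has since been given a different designation.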
