-
Notifications
You must be signed in to change notification settings - Fork 110
Variant annotation update (transcript effects endpoint) #706
Conversation
dc0c9ac
to
608b0c4
Compare
@david4096 - thanks for putting back the search by variant and feature options! Directly querying TranscriptEffects makes sense for feature id and location, but I’m not sure about searching by variant_annotation_id. The only way to extract a variant_annotation_id is by looking up the VariantAnnotation record which contains the TranscriptEffects anyway. Searching by variant id would be more useful, though we don’t have variant id in TranscriptEffect. Similarly, if you find all TranscriptEffects for a gene, you then need to look up all VariantAnnotation records to find the variant id, picking up all the TranscriptEffects again. The desire to not modify the variant annotation record may be better met by not having the record to group results by variant and leaving the client to handle this. What do you think? |
f37847e
to
bb2e69b
Compare
Thanks @sarahhunt excellent points! I have removed the |
Helps with ga4gh/ga4gh-server#308 |
|
||
// Only return variants with an allele frequency annotation over this value. | ||
// Return all variants by default. | ||
double frequency_threshold = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the rationale for a one-sided test? What if someone wants variants below a threshold? minimum_frequency
and maximum_frequency
would be more generally useful.
Are we planning notion of population specific frequencies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell that is a difficult domain to model well, as different data preparers will annotate their subpopulations differently. If this is a common enough use case we should consider it, as there is work on modeling groups of individuals here. We can make a nice access pattern through that endpoint.
This frequency, as I understand it, would have been calculated for the specific variant set. Adding a minimum threshold makes sense.
// less than reference length. Requests spanning the join of circular | ||
// genomes are represented as two requests one on each side of the join | ||
// (position 0). | ||
int64 start = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uint64? (Have we had this discussion already? Someone must have picked up on this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a sensible change to me! It would imply a change to how position is specified throughout the API, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaning towards keeping the sign based on the response to #737
Add ability to filter by allele frequency Provide ability to get annotations by variant id Remove transcript effects from variant annotation message
f49738c
to
26dbb64
Compare
@david4096 - I've been meaning to get back to this, sorry for the delay. Looking at the proposed changes, I am doubting the value of the variantAnnotation record and thinking it would be cleaner to just query for TranscriptEffects directly based on variant id, transcript id or location. What do you think? |
Thanks @sarahhunt ! The main reason for keeping the Can you tell me if I'm mistaken, but in practice if you have a Features API that allows you to get transcript effect IDs, can't you use that to construct a range request? I have been thinking that feature ID filtering of an endpoint might be fulfilled already by that access pattern. Like, if I want to find transcript effects for some ID that I have, I request that from the features API with a GetFeatureRequest, then use the returned location data to construct a range request against Thanks for taking a look! I'm open to deprecating the Variant Annotation message. I think that with something like #700 it might be possible to think about how to construct requests against typed annotations. For now, I think the most important point thing is to remove the modification at query time of transcript effect messages. |
Hi @david4096! We are currently using the VariantAnnotation record for allele frequency and ClinVar clinical significance info, but the first is a property of the variant and the second fits better as a seperate search for phenotype associations. It would be interesting to know how anyone else is using it. Searching by featureIds rather than just ranges is useful because many features overlap. If you are interested in a specific transcript, there could be a dozen in the same range, so only supporting range queries could mean more records than are required are returned and force the filtering onto the API user. There is also a nice consistency if you can throw a set of featureIds at the various endpoints and retrieve different types of data. Tightening up the info structure as in #700 makes a lot of sense. |
Thanks for the input regarding feature search. I'll leave it as is then. If there is a way I can push this PR forward without having to make modifications to the variants model, I'll take it. I believe we are going to need to switch to a variant model with one alternate base entry per variant and my hope is that by separating out the messages now we will be better prepared. |
The current variant annotation message structure faces an issue seen in other parts of the API (readgroupsets, variants). Messages returned from a search request may be modified during a query, causing the same identifier to point to two unique documents. This can cause problems in interchange. It also creates a complex nested document that can be difficult to work with in practice.
To solve this, another endpoint has been added for filtering transcript effects. This allows transcript effects to be filtered and operated on directly, as opposed to through the modification of Variant Annotation messages at query time. It also provides a more natural query interface by separating concerns from the
SearchVariantAnnotationsRequest
.Another problem facing the Variant Annotation protocol was the difficulty in finding an annotation for a given variant. It isn't possible to directly get an annotation for a given variant, which has caused clients to have to search for similar ranges for a given variant in an annotation set. The addition of the
variant_id
to theSearchVariantAnnotationsRequest
makes it possible to find annotations directly, when avariant_id
is provided the range is ignored.By request from the VATT, filtering annotations by allele frequency has been added. This was done by adding an
allele_frequency
field to theVariantAnnotation
message and afrequency_threshold
to theSearchVariantAnnotationsRequest
. @reece @sarahhunt @Kusdhill/transcripteffects/search
)allele_frequency