Variant annotation update (transcript effects endpoint) #706

david4096 · 2016-09-01T22:23:11Z

The current variant annotation message structure has an issue seen in other parts of the API (readgroupsets, variants). Messages returned from a search request may be modified at query time, causing the same identifier to point to two different documents. This can cause problems in interchange. It also creates a complex nested document that can be difficult to work with in practice.

To solve this, another endpoint has been added for filtering transcript effects. This allows transcript effects to be filtered and operated on directly, as opposed to through the modification of Variant Annotation messages at query time. It also provides a more natural query interface by separating concerns from the SearchVariantAnnotationsRequest.

Another problem with the Variant Annotation protocol was the difficulty of finding the annotation for a given variant. It isn't possible to get an annotation for a given variant directly, so clients have had to search an annotation set over a range around the variant. Adding variant_id to the SearchVariantAnnotationsRequest makes it possible to find annotations directly. When a variant_id is provided, the range is ignored.
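To illustrate the precedence rule described above (a variant_id, when present, overrides the range), here is a minimal in-memory sketch. It is not the server implementation: the `Annotation` and `SearchRequest` classes, their field names other than `variant_id`, and the half-open overlap test are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Annotation:
    # Hypothetical, simplified stand-in for a VariantAnnotation record.
    id: str
    variant_id: str
    start: int
    end: int


@dataclass
class SearchRequest:
    # Hypothetical stand-in for SearchVariantAnnotationsRequest.
    variant_id: Optional[str] = None
    start: int = 0
    end: int = 0


def search_annotations(annotations: List[Annotation],
                       request: SearchRequest) -> List[Annotation]:
    """Return matching annotations; variant_id takes precedence over the range."""
    if request.variant_id:
        # The range fields are ignored entirely when a variant_id is given.
        return [a for a in annotations if a.variant_id == request.variant_id]
    # Otherwise fall back to an overlap test on 0-based, half-open intervals
    # (the coordinate convention is an assumption of this sketch).
    return [a for a in annotations
            if a.start < request.end and a.end > request.start]
```

With this shape, a client that already holds a variant ID no longer has to guess a range that happens to cover the variant.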

By request from the VATT, filtering annotations by allele frequency has been added. This was done by adding an allele_frequency field to the VariantAnnotation message and a frequency_threshold to the SearchVariantAnnotationsRequest. @reece @sarahhunt @Kusdhill

Search transcript effects directly (/transcripteffects/search)
Filter variant annotations by allele_frequency
Search variant annotations by variant ID

sarahhunt · 2016-09-06T09:59:33Z

@david4096 - thanks for putting back the search by variant and feature options!

Directly querying TranscriptEffects makes sense for feature id and location, but I’m not sure about searching by variant_annotation_id. The only way to extract a variant_annotation_id is by looking up the VariantAnnotation record which contains the TranscriptEffects anyway. Searching by variant id would be more useful, though we don’t have variant id in TranscriptEffect.

Similarly, if you find all TranscriptEffects for a gene, you then need to look up all VariantAnnotation records to find the variant id, picking up all the TranscriptEffects again.

The desire to not modify the variant annotation record may be better met by dropping the record that groups results by variant and leaving the grouping to the client. What do you think?

david4096 · 2016-09-07T20:50:27Z

Thanks @sarahhunt, excellent points! I have removed the transcript_effects field from the VariantAnnotation message and added a variant_id to the TranscriptEffect message. This will allow one to easily move back and forth between the protocols.
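With a variant_id on each TranscriptEffect, grouping results by variant can be left to the client, as suggested earlier in the thread. A minimal sketch of that client-side grouping follows. The dictionary layout of each effect and the function name are assumptions for illustration, not part of the schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_effects_by_variant(effects: Iterable[dict]) -> Dict[str, List[str]]:
    """Group transcript effect IDs under the variant they annotate.

    Each effect is assumed to be a dict with at least "id" and "variant_id"
    keys (a hypothetical, flattened view of a TranscriptEffect message).
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for effect in effects:
        grouped[effect["variant_id"]].append(effect["id"])
    return dict(grouped)
```

The keys of the result can then be used to look up the corresponding variants or variant annotations, which is the "back and forth" movement described above.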

david4096 · 2016-10-27T20:58:51Z

Helps with ga4gh/ga4gh-server#308

reece · 2016-10-28T20:55:26Z

src/main/proto/ga4gh/allele_annotation_service.proto

+
+  // Only return variants with an allele frequency annotation over this value.
+  // Return all variants by default.
+  double frequency_threshold = 10;


What's the rationale for a one-sided test? What if someone wants variants below a threshold? minimum_frequency and maximum_frequency would be more generally useful.

Are we planning a notion of population-specific frequencies?

As far as I can tell that is a difficult domain to model well, as different data preparers will annotate their subpopulations differently. If this is a common enough use case we should consider it, as there is work on modeling groups of individuals here. We can make a nice access pattern through that endpoint.

This frequency, as I understand it, would have been calculated for the specific variant set. Adding a minimum threshold makes sense.
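The two-sided filter suggested in this thread (minimum_frequency and maximum_frequency) could behave roughly as sketched below. This is an illustration only: the function name, the use of `None` for an unset bound, and the inclusive comparison are assumptions, and the actual field semantics are whatever the proto comments end up specifying.

```python
from typing import Iterable, List, Optional, Tuple


def filter_by_frequency(
    annotations: Iterable[Tuple[str, float]],
    minimum_frequency: Optional[float] = None,
    maximum_frequency: Optional[float] = None,
) -> List[str]:
    """Return IDs of annotations whose allele frequency lies within the bounds.

    `annotations` is an iterable of (annotation_id, allele_frequency) pairs.
    A bound left as None is not applied, so with no bounds everything is kept.
    Bounds are treated as inclusive (an assumption of this sketch).
    """
    kept = []
    for annotation_id, allele_frequency in annotations:
        if minimum_frequency is not None and allele_frequency < minimum_frequency:
            continue
        if maximum_frequency is not None and allele_frequency > maximum_frequency:
            continue
        kept.append(annotation_id)
    return kept
```

Setting only `maximum_frequency` covers the rare-variant case raised in the review, and setting only `minimum_frequency` reproduces the original one-sided threshold.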

reece · 2016-10-28T20:57:21Z

src/main/proto/ga4gh/allele_annotation_service.proto

+  // less than reference length. Requests spanning the join of circular
+  // genomes are represented as two requests one on each side of the join
+  // (position 0).
+  int64 start = 5;


uint64? (Have we had this discussion already? Someone must have picked up on this)

That sounds like a sensible change to me! It would imply a change to how position is specified throughout the API, thanks!

Leaning towards keeping the sign based on the response to #737
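The proto comment in the hunk above says that a request spanning the join of a circular genome is represented as two requests, one on each side of position 0. A client helper for that split might look like the sketch below. The function name, the 0-based half-open coordinates, and the convention that `start >= end` signals a wrap are all assumptions for illustration and are not taken from the schema.

```python
from typing import List, Tuple


def split_circular_range(start: int, end: int,
                         reference_length: int) -> List[Tuple[int, int]]:
    """Split a range on a circular reference into at most two linear ranges.

    Coordinates are 0-based and half-open. A range with start >= end is
    treated as wrapping past the join at position 0 (hypothetical convention).
    """
    if not (0 <= start < reference_length and 0 < end <= reference_length):
        raise ValueError("start and end must fall within the reference")
    if start < end:
        # Ordinary range: a single request is enough.
        return [(start, end)]
    # Wrapping range: one request up to the join, one starting at position 0.
    return [(start, reference_length), (0, end)]
```

Each returned pair can be sent as its own range request, and none of the resulting positions are negative, which is relevant to the signed versus unsigned discussion in #737.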


sarahhunt · 2016-11-29T17:05:29Z

@david4096 - I've been meaning to get back to this, sorry for the delay. Looking at the proposed changes, I am doubting the value of the variantAnnotation record and thinking it would be cleaner to just query for TranscriptEffects directly based on variant id, transcript id or location. What do you think?

david4096 · 2016-12-02T04:28:11Z

Thanks @sarahhunt !

The main reason for keeping the VariantAnnotation record is "variant level" annotations, which for the time being means allele frequency. Perhaps we could move those features to the Variants API instead?

Tell me if I'm mistaken, but in practice, if you have a Features API that lets you look up a feature by ID, can't you use that to construct a range request? I have been thinking that feature ID filtering of an endpoint might already be fulfilled by that access pattern. For example, if I want to find transcript effects for some ID that I have, I request that feature from the Features API with a GetFeatureRequest, then use the returned location data to construct a range request against /transcripteffects.
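The two-step access pattern described in this comment can be sketched as follows. The in-memory `FEATURES` table stands in for the Features API, and the helper names and request field names are hypothetical. A real client would issue a GetFeatureRequest and then post the range to /transcripteffects/search.

```python
from typing import Dict

# Hypothetical stand-in for data served by the Features API.
FEATURES: Dict[str, dict] = {
    "transcript-1": {"reference_name": "chr1", "start": 1000, "end": 5000},
}


def get_feature(feature_id: str) -> dict:
    """Stand-in for a GetFeatureRequest; returns the feature's location data."""
    return FEATURES[feature_id]


def transcript_effect_range_request(feature: dict) -> dict:
    """Build a range request body for /transcripteffects/search from a feature."""
    return {
        "reference_name": feature["reference_name"],
        "start": feature["start"],
        "end": feature["end"],
    }


# Step 1: resolve the feature ID to a location. Step 2: build the range request.
request_body = transcript_effect_range_request(get_feature("transcript-1"))
```

The range request returns everything that overlaps the feature's coordinates, not only effects on that feature, so the client may still need to filter the results.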

Thanks for taking a look! I'm open to deprecating the Variant Annotation message. I think that with something like #700 it might be possible to think about how to construct requests against typed annotations. For now, I think the most important thing is to remove the modification of transcript effect messages at query time.

sarahhunt · 2016-12-02T08:44:51Z

Hi @david4096!

We are currently using the VariantAnnotation record for allele frequency and ClinVar clinical significance info, but the first is a property of the variant and the second fits better as a separate search for phenotype associations. It would be interesting to know how anyone else is using it.

Searching by featureIds rather than just ranges is useful because many features overlap. If you are interested in a specific transcript, there could be a dozen in the same range, so only supporting range queries could mean more records than are required are returned and force the filtering onto the API user. There is also a nice consistency if you can throw a set of featureIds at the various endpoints and retrieve different types of data.
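The difference between the two query styles can be shown with a small in-memory example in which two transcripts overlap the same region. The `TranscriptEffect` class here is a simplified, hypothetical stand-in for the message, and both search functions are sketches rather than the server's behaviour.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class TranscriptEffect:
    # Simplified, hypothetical view of a TranscriptEffect message.
    id: str
    feature_id: str
    start: int
    end: int


def search_by_range(effects: Iterable[TranscriptEffect],
                    start: int, end: int) -> List[str]:
    """Return IDs of effects overlapping a 0-based, half-open range."""
    return [e.id for e in effects if e.start < end and e.end > start]


def search_by_feature_ids(effects: Iterable[TranscriptEffect],
                          feature_ids: Set[str]) -> List[str]:
    """Return IDs of effects attached to any of the given features."""
    return [e.id for e in effects if e.feature_id in feature_ids]


# Two transcripts overlap the same position; only one is of interest.
EFFECTS = [
    TranscriptEffect("te1", "transcript-A", 100, 101),
    TranscriptEffect("te2", "transcript-B", 100, 101),
]
```

Here a range query over positions 100 to 101 returns both effects, while a feature ID query for `transcript-A` returns only `te1`, so the filtering is not pushed onto the API user.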

Tightening up the info structure as in #700 makes a lot of sense.

david4096 · 2016-12-05T18:05:21Z

Thanks for the input regarding feature search. I'll leave it as is then. If there is a way I can push this PR forward without having to make modifications to the variants model, I'll take it. I believe we are going to need to switch to a variant model with one alternate base entry per variant and my hope is that by separating out the messages now we will be better prepared.

david4096 added the in progress label Sep 1, 2016

david4096 force-pushed the transcript_effects branch from dc0c9ac to 608b0c4 Compare September 1, 2016 22:40

david4096 force-pushed the transcript_effects branch 2 times, most recently from f37847e to bb2e69b Compare September 7, 2016 20:34

david4096 added ready in progress and removed in progress labels Oct 26, 2016

reece reviewed Oct 28, 2016


david4096 mentioned this pull request Oct 28, 2016

Use unsigned integer for positions #737

Open

david4096 added 3 commits October 28, 2016 14:30

Add transcript effect endpoint

9c6d593

Add ability to filter by allele frequency Provide ability to get annotations by variant id Remove transcript effects from variant annotation message

Add maximum and minimum frequency

5ee6a74

Thanks @reece

Add HTTP annotations

26dbb64

david4096 force-pushed the transcript_effects branch from f49738c to 26dbb64 Compare October 28, 2016 21:33

This was referenced Mar 7, 2017

Add transcripteffects endpoint ga4gh/ga4gh-server#1603

Open

Implement updated transcript effects protocol #855

Open

reece closed this Oct 28, 2022
