TASK-2478 #2239

Open
wants to merge 27 commits into
base: release-2.2.x
cb9c9ef
analysis: properly count deletion overlap pairs, #TASK-2478
pfurio Jan 10, 2023
005ad36
analysis: exclude missense variants for DEL_OVERLAP, #TASK-2478
pfurio Jan 11, 2023
47b509c
analysis: add cts to exclude, #TASK-2478
pfurio Jan 18, 2023
99a25da
analysis: calculate missing chPairVariantStats, #TASK-2478
pfurio Jan 24, 2023
7aeddf7
analysis: fix HOM_ALT filters with 2 pop freqs, #TASK-2478
pfurio Jan 25, 2023
d0f5eee
analysis: ensure RGA always have the same behaviour, #TASK-2478
pfurio Jan 26, 2023
528b490
analysis: fix aux collection query parser, #TASK-2478
pfurio Jan 26, 2023
b15d641
analysis: fix facet queries, #TASK-2478
pfurio Jan 27, 2023
35ae29a
analysis: fix query parser options, #TASK-2478
pfurio Jan 30, 2023
269f1b0
analysis: fix ko filter, #TASK-2478
pfurio Jan 30, 2023
24381c4
analysis: improve summary performance, #TASK-2478
pfurio Jan 31, 2023
2ffd484
analysis: add cache for RGA, #TASK-1750
pfurio Aug 18, 2022
0567f68
analysis: make map thread-safe, #TASK-1750
pfurio Aug 18, 2022
21d1033
analysis: remove static modifier, #TASK-1750
pfurio Aug 19, 2022
cac9f02
analysis: cache only queries taking longer than 4 seconds, #TASK-1750
pfurio Aug 19, 2022
40d3c1d
analysis: put if absent, #TASK-2478
pfurio Jan 31, 2023
38fcd58
analysis: count CH variant stats properly, #TASK-2478
pfurio Feb 2, 2023
822d012
analysis: improve query parser with 2 pop freqs, #TASK-2478
pfurio Feb 3, 2023
054d763
analysis: store paired PFs to be able to filter, #TASK-2478
pfurio Feb 6, 2023
dbab096
analysis: remove useless CH pair combinations, #TASK-2478
pfurio Feb 6, 2023
09aaaff
analysis: fix queries with pop freq < 0.001
pfurio Mar 10, 2023
4527509
analysis: add collection suffix for RGA, #TASK-2478
pfurio Mar 20, 2023
cd68009
analysis: check suffix is null, #TASK-2478
pfurio Mar 20, 2023
7293adc
storage: Fix trio serialization in CompoundHeterozygous variant query.
j-coll Mar 21, 2023
37d809a
analysis: remove limit to variant query, #TASK-2478
pfurio Mar 22, 2023
2d95ee2
analysis: reduce number of parallel tasks, #TASK-2478
pfurio Apr 5, 2023
1a3c9a8
analysis: fix pop freq queries over CH variants, #TASK-2478
pfurio Apr 5, 2023
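Several commits above (the #TASK-1750 series and the later "put if absent" change) describe an RGA result cache: a thread-safe map that stores only queries taking longer than 4 seconds. That code is not rendered on this page, so the following is only a minimal sketch of the idea. The class name `SlowQueryCacheSketch`, its fields, and the size bound are assumptions made for illustration, not the OpenCGA implementation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical sketch; not the class added in this PR.
public final class SlowQueryCacheSketch<K, V> {
    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();
    private final long thresholdNanos;
    private final int maxSize;

    public SlowQueryCacheSketch(long thresholdMillis, int maxSize) {
        this.thresholdNanos = thresholdMillis * 1_000_000L;
        this.maxSize = maxSize;
    }

    // Returns the cached value if present; otherwise runs the query and
    // caches the result only when the query was slow enough.
    public V get(K key, Supplier<V> query) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        long start = System.nanoTime();
        V result = query.get();
        long elapsed = System.nanoTime() - start;
        // putIfAbsent keeps the first stored value if two threads race on the same key.
        if (result != null && elapsed >= thresholdNanos && cache.size() < maxSize) {
            cache.putIfAbsent(key, result);
        }
        return result;
    }

    public int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        // Threshold 0 ms: every query counts as slow and is cached.
        SlowQueryCacheSketch<String, Integer> always = new SlowQueryCacheSketch<>(0, 10);
        always.get("q", () -> 42);
        System.out.println(always.size()); // 1

        // Threshold 4000 ms: a fast query is returned but not cached.
        SlowQueryCacheSketch<String, Integer> slowOnly = new SlowQueryCacheSketch<>(4000, 10);
        slowOnly.get("q", () -> 42);
        System.out.println(slowOnly.size()); // 0
    }
}
```

The size check here is a soft bound (concurrent writers can exceed it slightly); a real cache would probably also need an eviction policy.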
@@ -19,7 +19,7 @@ public class GeneRgaConverter extends AbstractRgaConverter {
static {
CONVERTER_MAP = new HashMap<>();
// We always include individual id in the response because we always want to return the numIndividuals populated
-CONVERTER_MAP.put("id", Arrays.asList(RgaDataModel.GENE_ID, RgaDataModel.INDIVIDUAL_ID));
+CONVERTER_MAP.put("id", Arrays.asList(RgaDataModel.GENE_ID, RgaDataModel.INDIVIDUAL_ID, RgaDataModel.CH_PAIRS));
CONVERTER_MAP.put("name", Arrays.asList(RgaDataModel.GENE_ID, RgaDataModel.GENE_NAME, RgaDataModel.INDIVIDUAL_ID));
CONVERTER_MAP.put("chromosome", Arrays.asList(RgaDataModel.GENE_ID, RgaDataModel.CHROMOSOME, RgaDataModel.INDIVIDUAL_ID));
CONVERTER_MAP.put("start", Arrays.asList(RgaDataModel.GENE_ID, RgaDataModel.START, RgaDataModel.INDIVIDUAL_ID));
@@ -192,7 +192,7 @@ private void fixIndividualOptions(QueryOptions queryOptions, Query query, SolrQu
public RgaIterator geneQuery(String collection, Query query, QueryOptions queryOptions) throws RgaException {
SolrQuery solrQuery = parser.parseQuery(query);
fixGeneOptions(queryOptions, query, solrQuery);
-solrQuery.setRows(Integer.MAX_VALUE);
+solrQuery.setRows(queryOptions.getInt(QueryOptions.LIMIT, Integer.MAX_VALUE));
try {
return new RgaIterator(solrManager.getSolrClient(), collection, solrQuery);
} catch (SolrServerException e) {
@@ -283,13 +283,15 @@ public long count(String collection, Query query) throws RgaException, IOExcepti
public DataResult<FacetField> joinFacetQuery(String collection, String externalCollection, Query query, Query externalQuery,
QueryOptions queryOptions) throws RgaException, IOException {
SolrQuery mainSolrQuery = parser.parseAuxQuery(query);
-SolrQuery externalSolrQuery = parser.parseQuery(externalQuery);
-
-if (externalSolrQuery.getFilterQueries() != null && externalSolrQuery.getFilterQueries().length > 0) {
-String externalQueryStr = StringUtils.join(externalSolrQuery.getFilterQueries(), " AND ");
-mainSolrQuery.set("v1", externalQueryStr);
-mainSolrQuery.addFilterQuery("{!join from=" + RgaDataModel.VARIANTS + " to=" + AuxiliarRgaDataModel.ID
-+ " fromIndex=" + externalCollection + " v=$v1}");
+if (!externalQuery.isEmpty()) {
+SolrQuery externalSolrQuery = parser.parseQuery(externalQuery);
+
+if (externalSolrQuery.getFilterQueries() != null && externalSolrQuery.getFilterQueries().length > 0) {
+String externalQueryStr = StringUtils.join(externalSolrQuery.getFilterQueries(), " AND ");
+mainSolrQuery.set("v1", externalQueryStr);
+mainSolrQuery.addFilterQuery("{!join from=" + RgaDataModel.VARIANTS + " to=" + AuxiliarRgaDataModel.ID
++ " fromIndex=" + externalCollection + " v=$v1}");
+}
}

return facetedQuery(collection, mainSolrQuery, queryOptions);
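The hunk above only parses the external query when it is not empty, then joins its filter queries with `AND` and passes them through the `v1` parameter of a Solr `{!join ...}` filter. A minimal sketch of the string assembly, using plain strings instead of SolrJ; the class name, the field names `variants` and `id`, and the collection name `rga-main` are placeholders assumed for illustration:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the string building only; no Solr client involved.
public final class JoinFilterSketch {

    // Builds the join filter that dereferences its inner query from the v1 parameter.
    public static String joinClause(String fromField, String toField, String fromIndex) {
        return "{!join from=" + fromField + " to=" + toField
                + " fromIndex=" + fromIndex + " v=$v1}";
    }

    // Combines the external filter queries into the value stored under v1.
    public static String combine(List<String> filterQueries) {
        return String.join(" AND ", filterQueries);
    }

    public static void main(String[] args) {
        List<String> external = Arrays.asList("geneName:BRCA2", "knockoutTypes:COMP_HET");
        System.out.println(combine(external));
        // geneName:BRCA2 AND knockoutTypes:COMP_HET
        System.out.println(joinClause("variants", "id", "rga-main"));
        // {!join from=variants to=id fromIndex=rga-main v=$v1}
    }
}
```

Passing the inner query through `v=$v1` rather than inlining it avoids having to escape the combined filter string inside the local-params block.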

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

@@ -68,6 +68,9 @@ public SolrNativeIterator(SolrClient solrClient, String collection, SolrQuery so

@Override
public boolean hasNext() {
+if (listBuffer.isEmpty()) {
+fetchNextBatch();
+}
return !listBuffer.isEmpty();
}
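The change to `hasNext()` above refills the buffer before reporting whether elements remain, so an empty buffer no longer ends iteration while more batches are available. A self-contained sketch of the same pattern; `BatchIteratorSketch`, its `fetcher` function, and the `drain` helper are hypothetical names and do not reflect how `SolrNativeIterator` actually pages through Solr:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

// Hypothetical batch-backed iterator; fetcher maps an offset to the next batch.
public final class BatchIteratorSketch<T> implements Iterator<T> {
    private final IntFunction<List<T>> fetcher;
    private final Deque<T> buffer = new ArrayDeque<>();
    private int offset;
    private boolean exhausted;

    public BatchIteratorSketch(IntFunction<List<T>> fetcher) {
        this.fetcher = fetcher;
    }

    @Override
    public boolean hasNext() {
        // Refill lazily: an empty buffer alone does not mean the source is exhausted.
        if (buffer.isEmpty()) {
            fetchNextBatch();
        }
        return !buffer.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.poll();
    }

    private void fetchNextBatch() {
        if (exhausted) {
            return;
        }
        List<T> batch = fetcher.apply(offset);
        if (batch.isEmpty()) {
            exhausted = true;
            return;
        }
        buffer.addAll(batch);
        offset += batch.size();
    }

    // Iterates over an in-memory list in batches of batchSize and collects the result.
    public static <T> List<T> drain(List<T> data, int batchSize) {
        Iterator<T> it = new BatchIteratorSketch<T>(off ->
                data.subList(Math.min(off, data.size()), Math.min(off + batchSize, data.size())));
        List<T> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(Arrays.asList(1, 2, 3, 4, 5), 2)); // [1, 2, 3, 4, 5]
    }
}
```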

@@ -58,6 +58,7 @@
import org.opencb.opencga.core.response.OpenCGAResult;
import org.opencb.opencga.storage.core.exceptions.StorageEngineException;
import org.opencb.opencga.storage.core.metadata.VariantStorageMetadataManager;
+import org.opencb.opencga.storage.core.metadata.models.Trio;
import org.opencb.opencga.storage.core.utils.CellBaseUtils;
import org.opencb.opencga.storage.core.variant.adaptors.VariantField;
import org.opencb.opencga.storage.core.variant.adaptors.VariantQueryException;
@@ -549,7 +550,7 @@ public Query parseQuery(Query query, QueryOptions queryOptions, CellBaseUtils ce
"Require at least one parent to get compound heterozygous");
}

-query.append(SAMPLE_COMPOUND_HETEROZYGOUS.key(), Arrays.asList(childId, fatherId, motherId));
+query.append(SAMPLE_COMPOUND_HETEROZYGOUS.key(), new Trio(fatherId, motherId, childId));
} else {
if (family.getDisorders().isEmpty()) {
throw VariantQueryException.malformedParam(FAMILY, familyId, "Family doesn't have disorders");
@@ -1024,7 +1025,7 @@ private void processSampleFilter(Query query, String defaultStudyStr, String tok
String fatherId = member.getFather() != null ? member.getFather().getId() : MISSING_SAMPLE;
String motherId = member.getMother() != null ? member.getMother().getId() : MISSING_SAMPLE;

-query.put(SAMPLE_COMPOUND_HETEROZYGOUS.key(), Arrays.asList(member.getId(), fatherId, motherId));
+query.put(SAMPLE_COMPOUND_HETEROZYGOUS.key(), new Trio(fatherId, motherId, member.getId()));
query.remove(SAMPLE.key());
} else if (moi == ClinicalProperty.ModeOfInheritance.DE_NOVO) {
query.put(SAMPLE_DE_NOVO.key(), member.getId());
@@ -0,0 +1,58 @@
+package org.opencb.opencga.core.config;
+
+import java.util.List;
+
+public class RgaSearchConfiguration extends SearchConfiguration {
+
+private boolean cache;
+private int cacheSize;
+private String suffix;
+
+public RgaSearchConfiguration() {
+}
+
+public RgaSearchConfiguration(List<String> hosts, String configSet, String mode, String user, String password, String manager,
+boolean active, int timeout, int insertBatchSize, boolean cache, int cacheSize, String suffix) {
+super(hosts, configSet, mode, user, password, manager, active, timeout, insertBatchSize);
+this.cache = cache;
+this.cacheSize = cacheSize;
+this.suffix = suffix;
+}
+
+@Override
+public String toString() {
+final StringBuilder sb = new StringBuilder("RgaSearchConfiguration{");
+sb.append("cache=").append(cache);
+sb.append(", cacheSize=").append(cacheSize);
+sb.append(", suffix='").append(suffix).append('\'');
+sb.append('}');
+return sb.toString();
+}
+
+public boolean isCache() {
+return cache;
+}
+
+public RgaSearchConfiguration setCache(boolean cache) {
+this.cache = cache;
+return this;
+}
+
+public int getCacheSize() {
+return cacheSize;
+}
+
+public RgaSearchConfiguration setCacheSize(int cacheSize) {
+this.cacheSize = cacheSize;
+return this;
+}
+
+public String getSuffix() {
+return suffix;
+}
+
+public RgaSearchConfiguration setSuffix(String suffix) {
+this.suffix = suffix;
+return this;
+}
+}
@@ -23,6 +23,7 @@
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory;
import org.apache.commons.lang3.StringUtils;
import org.opencb.commons.datastore.core.ObjectMap;
+import org.opencb.opencga.core.config.RgaSearchConfiguration;
import org.opencb.opencga.core.config.SearchConfiguration;
import org.opencb.opencga.core.config.ServerConfiguration;
import org.slf4j.Logger;
@@ -43,7 +44,7 @@ public class StorageConfiguration {
private CacheConfiguration cache;
private SearchConfiguration search;
private SearchConfiguration clinical;
-private SearchConfiguration rga;
+private RgaSearchConfiguration rga;
private ObjectMap alignment;
private StorageEnginesConfiguration variant;
private IOConfiguration io;
@@ -61,7 +62,7 @@ public StorageConfiguration() {
this.cache = new CacheConfiguration();
this.search = new SearchConfiguration();
this.clinical = new SearchConfiguration();
-this.rga = new SearchConfiguration();
+this.rga = new RgaSearchConfiguration();
}


@@ -192,11 +193,11 @@ public StorageConfiguration setClinical(SearchConfiguration clinical) {
return this;
}

-public SearchConfiguration getRga() {
+public RgaSearchConfiguration getRga() {
return rga;
}

-public StorageConfiguration setRga(SearchConfiguration rga) {
+public StorageConfiguration setRga(RgaSearchConfiguration rga) {
this.rga = rga;
return this;
}
@@ -1,14 +1,32 @@
package org.opencb.opencga.storage.core.metadata.models;

+import org.apache.logging.log4j.util.Strings;
+
import java.util.ArrayList;
import java.util.List;
+import java.util.Objects;

public class Trio {
private final String id;
private final String father;
private final String mother;
private final String child;

+public Trio(List<String> trio) {
+this(null, trio);
+}
+
+public Trio(String id, List<String> trio) {
+this.id = id;
+this.father = trio.get(1);
+this.mother = trio.get(2);
+this.child = trio.get(0);
+}
+
+public Trio(String father, String mother, String child) {
+this(null, father, mother, child);
+}
+
public Trio(String id, String father, String mother, String child) {
this.id = id;
this.father = father;
@@ -43,4 +61,29 @@ public List<String> toList() {
}
return list;
}
+
+@Override
+public boolean equals(Object o) {
+if (this == o) {
+return true;
+}
+if (o == null || getClass() != o.getClass()) {
+return false;
+}
+Trio trio = (Trio) o;
+return Objects.equals(id, trio.id)
+&& Objects.equals(father, trio.father)
+&& Objects.equals(mother, trio.mother)
+&& Objects.equals(child, trio.child);
+}
+
+@Override
+public int hashCode() {
+return Objects.hash(id, father, mother, child);
+}
+
+@Override
+public String toString() {
+return Strings.join(toList(), ',');
+}
}
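Note that the two `Trio` construction styles above use different argument orders: the list form reads (child, father, mother), while the positional form takes (father, mother, child). The sketch below mirrors that with a simplified stand-in class, `TrioOrderSketch`. Its `toList()` order is an assumption inferred from the round-trip test in this PR, since the real `toList()` body is only partly visible here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Simplified stand-in for Trio (no id field); for illustration only.
public final class TrioOrderSketch {
    private final String father;
    private final String mother;
    private final String child;

    // Positional order: father, mother, child.
    public TrioOrderSketch(String father, String mother, String child) {
        this.father = father;
        this.mother = mother;
        this.child = child;
    }

    // List order: child, father, mother (same indices as the list constructor in the diff).
    public static TrioOrderSketch fromList(List<String> trio) {
        return new TrioOrderSketch(trio.get(1), trio.get(2), trio.get(0));
    }

    // Assumed to serialize in the same order the list constructor reads.
    public List<String> toList() {
        return Arrays.asList(child, father, mother);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof TrioOrderSketch)) {
            return false;
        }
        TrioOrderSketch t = (TrioOrderSketch) o;
        return Objects.equals(father, t.father)
                && Objects.equals(mother, t.mother)
                && Objects.equals(child, t.child);
    }

    @Override
    public int hashCode() {
        return Objects.hash(father, mother, child);
    }

    public static void main(String[] args) {
        TrioOrderSketch expected = new TrioOrderSketch("F", "M", "C");
        System.out.println(expected.toList());                            // [C, F, M]
        System.out.println(expected.equals(fromList(expected.toList()))); // true
    }
}
```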
@@ -1073,7 +1073,7 @@ public VariantQueryResult<Variant> getCompoundHeterozygous(String study, String
father = StringUtils.isEmpty(father) ? CompoundHeterozygousQueryExecutor.MISSING_SAMPLE : father;
mother = StringUtils.isEmpty(mother) ? CompoundHeterozygousQueryExecutor.MISSING_SAMPLE : mother;
query = new Query(query)
-.append(VariantQueryUtils.SAMPLE_COMPOUND_HETEROZYGOUS.key(), Arrays.asList(child, father, mother))
+.append(VariantQueryUtils.SAMPLE_COMPOUND_HETEROZYGOUS.key(), new Trio(father, mother, child))
.append(VariantQueryParam.STUDY.key(), study);

return get(query, options);
@@ -210,15 +210,15 @@ protected List<String> getAndCheckIncludeSample(Query query, String proband, Str
// Check it has all required members
if (!includeSamples.contains(proband)) {
throw VariantQueryException.malformedParam(VariantQueryParam.INCLUDE_SAMPLE, includeSamples.toString(),
-"Can not compute CompoundHeterozygous not including the proband in the query");
+"Can not compute CompoundHeterozygous not including the proband '" + proband + "' in the query");
}
if (!mother.equals(MISSING_SAMPLE) && !includeSamples.contains(mother)) {
throw VariantQueryException.malformedParam(VariantQueryParam.INCLUDE_SAMPLE, includeSamples.toString(),
-"Can not compute CompoundHeterozygous not including the mother in the query");
+"Can not compute CompoundHeterozygous not including the mother '" + mother + "' in the query");
}
if (!father.equals(MISSING_SAMPLE) && !includeSamples.contains(father)) {
throw VariantQueryException.malformedParam(VariantQueryParam.INCLUDE_SAMPLE, includeSamples.toString(),
-"Can not compute CompoundHeterozygous not including the father in the query");
+"Can not compute CompoundHeterozygous not including the father '" + father + "' in the query");
}
} else {
if (father.equals(MISSING_SAMPLE)) {
@@ -265,9 +265,13 @@ protected VariantDBIterator getRawIterator(String proband, String father, String
}

protected Trio getCompHetTrio(Query query) {
+Object o = query.get(SAMPLE_COMPOUND_HETEROZYGOUS.key());
+if (o instanceof Trio) {
+return ((Trio) o);
+}
List<String> samples = query.getAsStringList(VariantQueryUtils.SAMPLE_COMPOUND_HETEROZYGOUS.key());
if (samples.size() == 3) {
-return new Trio(null, samples.get(2), samples.get(0), samples.get(1));
+return new Trio(samples);
} else if (samples.size() == 1) {
int studyId = metadataManager.getStudyId(query.getString(VariantQueryParam.STUDY.key()));
String sample = samples.get(0);
@@ -5,6 +5,7 @@
import org.mockito.Mockito;
import org.opencb.commons.datastore.core.Query;
import org.opencb.commons.datastore.core.QueryOptions;
+import org.opencb.opencga.storage.core.metadata.models.Trio;
import org.opencb.opencga.storage.core.variant.adaptors.VariantField;
import org.opencb.opencga.storage.core.variant.adaptors.VariantIterable;
import org.opencb.opencga.storage.core.variant.adaptors.VariantQueryException;
@@ -19,6 +20,7 @@
import static org.junit.Assert.assertFalse;
import static org.opencb.opencga.storage.core.variant.adaptors.VariantField.*;
import static org.opencb.opencga.storage.core.variant.query.VariantQueryUtils.ALL;
+import static org.opencb.opencga.storage.core.variant.query.VariantQueryUtils.SAMPLE_COMPOUND_HETEROZYGOUS;

/**
* Created on 09/04/19.
@@ -65,6 +67,19 @@ public void testBuildQueryOptions() {
STUDIES, STUDIES_SAMPLES)), includeFields);
}

+@Test
+public void getCompHetTrio() {
+Trio expected = new Trio("F", "M", "C");
+Trio actual = ch.getCompHetTrio(new Query(SAMPLE_COMPOUND_HETEROZYGOUS.key(), expected.toList()));
+assertEquals(expected, actual);
+
+actual = ch.getCompHetTrio(new Query(SAMPLE_COMPOUND_HETEROZYGOUS.key(), expected.toString()));
+assertEquals(expected, actual);
+
+actual = ch.getCompHetTrio(new Query(SAMPLE_COMPOUND_HETEROZYGOUS.key(), expected));
+assertEquals(expected, actual);
+}
+
@Test
public void testGetAndCheckIncludeSample() {
