Refactor CoExpressionService to work with iterator over MolecularAlteration objects. #6302

n1zea144 · 2019-06-21T01:07:13Z

In an effort to reduce memory consumption when the Co-expression tab is selected, this PR introduces the use of a MyBatis cursor to iterate over Molecular Alterations rather than fetch them all in a single database call.

The PR needs to be profiled in beta to weigh the memory savings against the loss in runtime performance (it may be determined that a cursor should only be used when the cohort exceeds a certain sample size). Also, it's unclear what impact a large volume of concurrent Co-expression requests can have on the database connection pool when cursors are in use.
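
To illustrate the parenthetical above, here is a minimal sketch of switching between an eager fetch and a cursor-backed fetch based on cohort size. Everything in it is an assumption for illustration: the class name `FetchStrategy`, the method `choose`, and especially the threshold value, since this PR does not propose a number and profiling would have to supply one.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of this PR.
final class FetchStrategy {
    // Placeholder cutoff; the real value (if any) would come from profiling.
    static final int CURSOR_SAMPLE_THRESHOLD = 1000;

    // Returns the cursor-backed source only for cohorts above the threshold.
    // Suppliers are used so that only the chosen query is actually executed.
    static <T> Iterable<T> choose(int sampleCount,
                                  Supplier<Iterable<T>> eagerFetch,
                                  Supplier<Iterable<T>> cursorFetch) {
        return sampleCount > CURSOR_SAMPLE_THRESHOLD
                ? cursorFetch.get()
                : eagerFetch.get();
    }
}
```

Because both branches return an Iterable, the service layer would not need to know which strategy was picked.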

The PR limits the exposure of the MyBatis cursor to the persistence layer and returns an Iterable to the service layer. It refactors code as needed and makes only minor attempts to clean up the existing code, to avoid breaking functionality, but it does add comments where time was spent understanding the codebase.
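
A rough sketch of that layering, assuming nothing about the actual classes in this PR: `RowCursor`, `CursorBackedIterable` and `ListCursor` are hypothetical names, and `RowCursor` is only a stand-in for a database cursor (MyBatis's `Cursor<T>` is likewise both `Iterable` and `Closeable`). Closing the cursor once it is drained is one possible design choice, not necessarily what this PR does.

```java
import java.io.Closeable;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a database cursor.
interface RowCursor<T> extends Iterable<T>, Closeable {
    @Override
    void close(); // narrowed so this sketch needs no checked exceptions
}

// Persistence-layer wrapper: callers only ever see Iterable<T>.
final class CursorBackedIterable<T> implements Iterable<T> {
    private final RowCursor<T> cursor;

    CursorBackedIterable(RowCursor<T> cursor) {
        this.cursor = cursor;
    }

    @Override
    public Iterator<T> iterator() {
        final Iterator<T> rows = cursor.iterator();
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                boolean more = rows.hasNext();
                if (!more) {
                    cursor.close(); // release the underlying resource once drained
                }
                return more;
            }

            @Override
            public T next() {
                return rows.next();
            }
        };
    }
}

// In-memory cursor used only to exercise the sketch.
final class ListCursor<T> implements RowCursor<T> {
    private final List<T> rows;
    boolean closed = false;

    ListCursor(List<T> rows) {
        this.rows = rows;
    }

    @Override
    public Iterator<T> iterator() {
        return rows.iterator();
    }

    @Override
    public void close() {
        closed = true;
    }
}
```

Note that a caller who stops iterating early never triggers the close in this sketch, so a real implementation would also need transaction or try-with-resources handling in the persistence layer.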

Some JProfiler screenshots comparing runtime performance and memory usage of this PR against the current master branch will be provided, but as previously mentioned, profiling in the beta environment should still be performed before this PR is merged.

service/src/main/java/org/cbioportal/service/impl/CoExpressionServiceImpl.java

ao508

Looks good. All I found was a typo in a comment, and I had one question regarding a step in the coexpression impl class (see comment).

ao508 · 2019-06-21T16:55:05Z

service/src/main/java/org/cbioportal/service/impl/CoExpressionServiceImpl.java

+
+        // For each MolecularAlteration in the profile, compute a CoExpression to return.
+        // If the MolecularAlteration is for the query gene/geneset, skip it.  Otherwise,
+        // filter out genetic_alteration values from genetic_alteration values from


accidentally repeated some words in this comment

Thanks for catching, just updated.
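
For readers without the diff at hand, the loop described in the code comment under review can be sketched roughly as follows. This is not the code from `CoExpressionServiceImpl`: the classes `Alteration`, `Correlation` and `StreamingCoExpression` are hypothetical, a plain Pearson correlation stands in for whatever statistic the service actually computes, and the value-filtering step mentioned in the (truncated) comment is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a molecular alteration: one value per sample.
final class Alteration {
    final String entityId;
    final double[] values;

    Alteration(String entityId, double[] values) {
        this.entityId = entityId;
        this.values = values;
    }
}

// Hypothetical result type: correlation of one entity with the query entity.
final class Correlation {
    final String entityId;
    final double r;

    Correlation(String entityId, double r) {
        this.entityId = entityId;
        this.r = r;
    }
}

final class StreamingCoExpression {
    // Visits alterations one at a time, so the source can be cursor-backed.
    static List<Correlation> compute(Iterable<Alteration> alterations,
                                     String queryEntityId,
                                     double[] queryValues) {
        List<Correlation> results = new ArrayList<>();
        for (Alteration alteration : alterations) {
            if (alteration.entityId.equals(queryEntityId)) {
                continue; // skip the query gene/geneset itself
            }
            results.add(new Correlation(alteration.entityId,
                    pearson(queryValues, alteration.values)));
        }
        return results;
    }

    // Pearson correlation; assumes equal-length arrays with non-zero variance.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}
```

Only the results list grows with the number of entities; each alteration's values can be discarded after its correlation is computed.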

n1zea144 · 2019-06-21T20:39:26Z

Here are some screen captures comparing runtime performance between the precursor and cursor implementations. The good news is that the cursor implementation is in the same ballpark as the precursor implementation. Profiling is being performed against the brca_metabric cohort hosted on public_test (devdb.cbioportal.org/public_test). I'll follow this up with some memory usage profiling...

Current "precursor" codebase, ~37 seconds to compute coexpression of brca1 gene:

Cursor codebase, ~39 seconds to compute coexpression of brca1 gene:

jjgao · 2019-06-21T20:54:31Z

@n1zea144 I just logged an issue about using a cursor for the Enrichments endpoints if this works out well.

n1zea144 · 2019-07-02T18:57:22Z

Here is one of many screen captures comparing memory usage by the coexpression endpoint between the master branch and this PR (MyBatis cursor implementation). The cursor implementation uses roughly half as much memory. There is some variability in the savings, and much of it comes from the downstream computeCoExpressions method (which has me scratching my head), but I think the gain from using cursors will only improve and become clearer as the datasets become larger. The highlighted row shows that the current implementation uses 1,284 MB more memory to satisfy the fetchCoExpression request than the cursor implementation (two rows down):

Refactor CoExpressionService to work with iterator over MolecularAlteration objects.

n1zea144 added the "do not merge" and "performance" labels Jun 21, 2019

n1zea144 requested review from alisman, jjgao, inodb, sheridancbio, ao508 and averyniceday June 21, 2019 01:07

n1zea144 force-pushed the mybatis-cursor branch from 3dfa576 to b29c32e Compare June 21, 2019 12:48

ao508 reviewed Jun 21, 2019

ao508 approved these changes Jun 21, 2019

n1zea144 force-pushed the mybatis-cursor branch 2 times, most recently from 454d4df to a8da956 Compare June 21, 2019 17:21

jjgao mentioned this pull request Jun 21, 2019

using cursor for the Enrichments endpoints #6311

Closed

n1zea144 force-pushed the mybatis-cursor branch from a8da956 to 6b2da6f Compare July 2, 2019 15:39

Refactor CoExpressionService to work with Iterator of MolecularAltera…

a645fbe

…tion

n1zea144 force-pushed the mybatis-cursor branch from 6b2da6f to a645fbe Compare July 2, 2019 15:43

n1zea144 added ready to merge and removed do not merge ready to merge labels Jul 2, 2019

n1zea144 merged commit 52f7515 into cBioPortal:master Jul 2, 2019

ao508 pushed a commit to ao508/cbioportal that referenced this pull request Jul 3, 2019

Merge pull request cBioPortal#6302 from n1zea144/mybatis-cursor

f15d298

Refactor CoExpressionService to work with iterator over MolecularAlteration objects.
