Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor CoExpressionService to work with iterator over MolecularAlteration objects. #6302

Merged
merged 1 commit into from
Jul 2, 2019

Conversation

n1zea144
Copy link
Contributor

In an effort to reduce memory consumption when the Co-expression tab is selected, this PR introduces the use of a MyBatis cursor to iterate over Molecular Alterations rather than fetch them all in a single database call.

The PR needs to be profiled in beta to determine the gain in memory optimization against the loss in runtime performance (it may be determined that a cursor should only be used when the cohort exceeds a certain sample size). Also, its unclear what impact a large volume of concurrent Co-expression requests can have on the database connection pool when cursors are in use.

The PR limits the exposure of the MyBatis cursor to the persistence layer and returns an Iterable to the service layer. The PR refactors code as needed, makes minor attempts to cleanup the existing code to avoid breaking functionality, but does add comments where time was spent understanding the codebase.

Some JProfiler screenshots comparing runtime performance /memory usage of this PR against the current master branch will be provided, but as previously mentioned, profiling in the beta environment should still be performed before this PR is merged.

Copy link
Contributor

@ao508 ao508 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. All I found was a typo basically in a comment and I just had one question regarding a step in the coexpression impl class (see comment).


// For each MolecularAlteration in the profile, compute a CoExpression to return.
// If the MolecularAlteration is for the query gene/geneset, skip it. Otherwise,
// filter out genetic_alteration values from genetic_alteration values from
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accidentally repeated some words in this comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching, just updated.

@n1zea144 n1zea144 force-pushed the mybatis-cursor branch 2 times, most recently from 454d4df to a8da956 Compare June 21, 2019 17:21
@n1zea144
Copy link
Contributor Author

Here are some screen capture comparing runtime performance between precursor/cursor implementations. The good news is the cursor implementation is in the same ballpark as precursor implementation. Profiling is being performed against the brca_metabric cohort hosted on public_test (devdb.cbioportal.org/public_test). I'll follow this up with some memory usage profiling....

Current "precursor" codebase, ~37 seconds to compute coexpression of brca1 gene:

image

cursor codebase, ~39 seconds to compute coexpression of brca1 gene:

image

@jjgao
Copy link
Member

jjgao commented Jun 21, 2019

@n1zea144 just logged for using cursor for Enrichments endpoints if this works out well.

@n1zea144
Copy link
Contributor Author

n1zea144 commented Jul 2, 2019

Here is one of many screen captures comparing memory usage by the coexpression endpoint between the master branch and this PR (mybatis cursor implementation). The cursor implementation uses ~half as much memory. There is some variability in this savings and much of the savings is in the downstream computeCoExpressions method (which has me scratching my head), but I think the gain due to the use of cursors will only improve and become clearer as the datasets become larger. The highlighted row illustrates that the current implementation is using +1,284MB more memory to satisfy the fetchCoExpression request as compared to the cursor implementation (two rows down):

image

@n1zea144 n1zea144 merged commit 52f7515 into cBioPortal:master Jul 2, 2019
ao508 pushed a commit to ao508/cbioportal that referenced this pull request Jul 3, 2019
Refactor CoExpressionService to work with iterator over MolecularAlteration objects.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants