-
Notifications
You must be signed in to change notification settings - Fork 498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor CoExpressionService to work with iterator over MolecularAlteration objects. #6302
Conversation
service/src/main/java/org/cbioportal/service/impl/CoExpressionServiceImpl.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. All I found was a typo basically in a comment and I just had one question regarding a step in the coexpression impl class (see comment).
|
||
// For each MolecularAlteration in the profile, compute a CoExpression to return. | ||
// If the MolecularAlteration is for the query gene/geneset, skip it. Otherwise, | ||
// filter out genetic_alteration values from genetic_alteration values from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accidentally repeated some words in this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching, just updated.
454d4df
to
a8da956
Compare
Here are some screen capture comparing runtime performance between precursor/cursor implementations. The good news is the cursor implementation is in the same ballpark as precursor implementation. Profiling is being performed against the brca_metabric cohort hosted on public_test (devdb.cbioportal.org/public_test). I'll follow this up with some memory usage profiling.... Current "precursor" codebase, ~37 seconds to compute coexpression of brca1 gene: cursor codebase, ~39 seconds to compute coexpression of brca1 gene: |
@n1zea144 just logged for using cursor for Enrichments endpoints if this works out well. |
Here is one of many screen captures comparing memory usage by the coexpression endpoint between the master branch and this PR (mybatis cursor implementation). The cursor implementation uses ~half as much memory. There is some variability in this savings and much of the savings is in the downstream computeCoExpressions method (which has me scratching my head), but I think the gain due to the use of cursors will only improve and become clearer as the datasets become larger. The highlighted row illustrates that the current implementation is using +1,284MB more memory to satisfy the fetchCoExpression request as compared to the cursor implementation (two rows down): |
Refactor CoExpressionService to work with iterator over MolecularAlteration objects.
In an effort to reduce memory consumption when the Co-expression tab is selected, this PR introduces the use of a MyBatis cursor to iterate over Molecular Alterations rather than fetch them all in a single database call.
The PR needs to be profiled in beta to determine the gain in memory optimization against the loss in runtime performance (it may be determined that a cursor should only be used when the cohort exceeds a certain sample size). Also, its unclear what impact a large volume of concurrent Co-expression requests can have on the database connection pool when cursors are in use.
The PR limits the exposure of the MyBatis cursor to the persistence layer and returns an Iterable to the service layer. The PR refactors code as needed, makes minor attempts to cleanup the existing code to avoid breaking functionality, but does add comments where time was spent understanding the codebase.
Some JProfiler screenshots comparing runtime performance /memory usage of this PR against the current master branch will be provided, but as previously mentioned, profiling in the beta environment should still be performed before this PR is merged.