
reduce concordance_index_censored's memory use #362

Merged: 6 commits merged into sebp:master on May 12, 2023

Conversation

@cpoerschke (Contributor) commented May 10, 2023

Checklist

  • pytest passes
  • tests are included - please see note and code snippet below
  • code is well formatted

What does this implement/fix? Explain your changes

issue encountered: passing a relatively large number of data samples to concordance_index_censored results in a "kernel died" error in a notebook environment.

analysis

  • The concordance_index_censored computation calls _get_comparable, which returns up to N masks of length N, where N is the number of samples being scored, i.e. roughly O(N²) memory.
  • _get_comparable returns all of the masks at once, but the caller iterates over them and only uses one mask at a time (see the sketch below).
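
To make the shape of the problem concrete, here is a minimal sketch of that pattern (ties ignored, function name hypothetical; not the actual sksurv implementation). All length-N masks are built up front and stay referenced until the caller has finished, i.e. roughly O(N²) bytes at once:

import numpy as np

def get_comparable_all_at_once(event_indicator, event_time):
    order = np.argsort(event_time, kind="stable")
    n_samples = len(event_time)
    comparable = {}
    for i in range(n_samples):
        if event_indicator[order[i]]:
            mask = np.zeros(n_samples, dtype=bool)   # one length-N mask per event sample
            mask[order[i + 1:]] = True               # samples with later times are comparable
            comparable[i] = mask                     # every mask stays referenced in the dict
    return comparable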

proposed solution

  • Change _get_comparable to return (smaller) mask information instead of (potentially much larger) full mask content.
  • Change the _get_comparable caller to construct the full mask from the mask info, one mask at a time at the point of use (see the sketch below).
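
A minimal sketch of that direction (simplified, names hypothetical; the actual changes live in sksurv/metrics.py): keep only O(1) of information per mask and rebuild each full mask at the point of use, so only one length-N mask exists at a time.

import numpy as np

def get_comparable_info(event_indicator, order):
    comparable = {}
    for i in range(len(order)):
        if event_indicator[order[i]]:
            comparable[i] = i + 1                    # "mask info": where the comparable samples start
    return comparable

def build_mask(order, start):
    mask = np.zeros(len(order), dtype=bool)          # the full mask exists only while it is used
    mask[order[start:]] = True
    return mask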

testing

Existing tests continue to pass. In terms of additional test coverage, I tried to build a test_metrics.py case but wasn't able to reproduce the issue easily when running locally; in a more resource-constrained environment it is easily reproduced, e.g. via this code snippet:

import numpy as np
from sksurv.metrics import concordance_index_censored

step = 10000

for n_samples in range(10 * step, 20 * step, step):
    # alternating event indicators and slowly increasing event times
    event_indicator = np.zeros(n_samples, dtype=bool)
    event_time = np.zeros(n_samples, dtype=int)
    for idx in range(n_samples):
        event_indicator[idx] = idx % 2
        event_time[idx] = 100 + idx // 10
    estimate = np.zeros(n_samples, dtype=float)

    print(f"n_samples={n_samples} concordance_index_censored call before")
    concordance_index_censored(event_indicator, event_time, estimate)
    print(f"n_samples={n_samples} concordance_index_censored call after")

@codecov (bot) commented May 11, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.01 ⚠️

Comparison is base (5938554) 97.92% compared to head (0e693db) 97.92%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #362      +/-   ##
==========================================
- Coverage   97.92%   97.92%   -0.01%     
==========================================
  Files          37       37              
  Lines        3326     3323       -3     
  Branches      502      502              
==========================================
- Hits         3257     3254       -3     
  Misses         33       33              
  Partials       36       36              
Impacted Files Coverage Δ
sksurv/metrics.py 100.00% <100.00%> (ø)


@sebp (Owner) commented May 11, 2023

Indeed, the memory usage is quite excessive, great catch!

I believe you can optimize it even further by

  1. converting _get_comparable to a generator via yield,
  2. initializing an empty mask in _estimate_concordance_index, which you reset at the beginning of each loop with mask.fill(False) (both ideas are sketched below).
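
A minimal sketch of both suggestions combined (simplified, ties ignored, names hypothetical; not the exact PR code): the generator yields one mask at a time, and a single buffer is allocated once outside the loop and reset before each reuse.

import numpy as np

def get_comparable_gen(event_indicator, order):
    n_samples = len(order)
    mask = np.zeros(n_samples, dtype=bool)           # single buffer, allocated once (suggestion 2)
    for i in range(n_samples):
        if event_indicator[order[i]]:
            mask.fill(False)                         # reset the shared buffer before reuse
            mask[order[i + 1:]] = True
            yield i, mask                            # one mask at a time (suggestion 1);
                                                     # the caller must use it before advancing

As discussed below, the merged version kept per-iteration np.zeros allocation for readability, since profiling showed no measurable difference.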

@cpoerschke (Contributor, Author) commented:

Hi @sebp - thank you very much for reviewing and suggesting further optimisations!

Generators: I had to read up on those, so the changes I made are a first attempt; tests continue to pass.

Empty mask outside the loop with fill-resetting inside the loop: I tried that but didn't observe a reduced number of allocations; perhaps a mask allocated inside the loop can be efficiently 'recycled' if it has the same shape and dtype? The allocation observation was via memray's getting-started steps and naive mouse-hover-for-tooltip inspection of the flame graph (rough commands below). Creating the mask once and resetting it with mask.fill, instead of just calling np.zeros each time, also marginally adds code-reading complexity, so I've reverted that change again for now.
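
For reference, the profiling described above can be reproduced roughly along the lines of memray's getting-started steps (file names here are hypothetical placeholders):

# run the reproduction snippet under memray, then render the allocation flame graph
# python3 -m memray run -o snippet.bin snippet.py
# python3 -m memray flamegraph snippet.bin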

sksurv/metrics.py: outdated review comment (resolved)
@cpoerschke marked this pull request as draft on May 12, 2023 13:51
@cpoerschke marked this pull request as ready for review on May 12, 2023 14:14
@sebp (Owner) commented May 12, 2023

> Empty mask outside the loop with fill-resetting inside the loop: I tried that but didn't observe a reduced number of allocations [...] so I've reverted that change again for now.

I don't know how numpy handles memory allocations either. Since you actually used a memory profiler and did not observe a difference, I would prefer the more readable code as well.

sksurv/metrics.py: outdated review comment (resolved)
@sebp merged commit d5fcbc9 into sebp:master on May 12, 2023