Add possibility of limiting the prediction horizon #227

Open
christophme opened this issue Oct 11, 2021 · 2 comments

christophme commented Oct 11, 2021

I wanted to predict short-term survival curves from an RSF on a large customer dataset and ran out of memory, because the entire survival curves were predicted with their length equal to the training dataset.

It would be great to have the option of limiting the prediction, for example by passing an array of indices or time points, similar to what is implemented in the lifelines package, where predictions accept a 'times' parameter.

References and existing implementations
https://lifelines.readthedocs.io/en/latest/fitters/regression/CoxPHFitter.html#lifelines.fitters.coxph_fitter.SemiParametricPHFitter.predict_survival_function

sebp (Owner) commented Oct 11, 2021

Could you please clarify what you mean by "their length equal to the training dataset"?

If you call predict_survival_function it will return an array of StepFunction instances, which share the x attribute, but have different y values.

Do you have a huge number of unique training times such that x and y cannot be stored?

If not, you can evaluate each StepFunction at any time point(s) by calling it like a function. See, for instance, this example from the user guide:

rsf_chf_funcs = rsf.predict_cumulative_hazard_function(
    va_x_test, return_array=False)
rsf_risk_scores = np.row_stack([chf(va_times) for chf in rsf_chf_funcs])
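
To make concrete what evaluating a step function at a few time points does, here is a minimal sketch in plain Python. It does not use scikit-survival: `evaluate_step` is a hypothetical stand-in for calling a StepFunction, the numbers are made up, and it assumes the usual right-continuous convention (the value at t is the y that belongs to the last x not larger than t) and that times outside the range of x are rejected.

```python
from bisect import bisect_right


def evaluate_step(x, y, times):
    """Evaluate a right-continuous step function at the given times.

    x: sorted breakpoints (e.g. unique training times).
    y: value the function takes from each breakpoint onward.
    """
    values = []
    for t in times:
        # Assumption of this sketch: no extrapolation beyond [x[0], x[-1]].
        if t < x[0] or t > x[-1]:
            raise ValueError(f"time {t} is outside [{x[0]}, {x[-1]}]")
        # Index of the last breakpoint that is <= t.
        values.append(y[bisect_right(x, t) - 1])
    return values


# Toy survival curve with 6 unique event times (made-up numbers).
x = [1, 4, 9, 15, 30, 72]
y = [0.98, 0.91, 0.80, 0.66, 0.41, 0.12]

horizon = [1, 5, 10]  # only the short-term points of interest
short_term = evaluate_step(x, y, horizon)
print(short_term)  # [0.98, 0.91, 0.8]
```

The result has one value per requested time point, so its length depends on the chosen horizon rather than on the number of unique training times.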

christophme (Author) commented

With "their length equal to the training dataset" I meant that the length of the returned arrays or StepFunction instances is determined by the training times, and this number was so large that x and y couldn't even be stored. Here, I would have appreciated the possibility of limiting the prediction to the next n periods or to a list of predefined time periods.
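
Until such an option exists, one possible workaround is to predict in batches and keep only the values on the short horizon. The sketch below is an untested idea under stated assumptions, not a feature of the library: `predict_on_horizon` and `fake_predict` are hypothetical names, and `predict_fn` stands in for anything that returns one callable curve per sample (for example `rsf.predict_survival_function`). Peak memory is then bounded by the batch size times the number of unique training times, so it only helps if a single batch of full-length curves fits in memory.

```python
def predict_on_horizon(predict_fn, samples, times, batch_size=1000):
    """Evaluate predicted curves at `times`, one batch of samples at a time."""
    results = []
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        # Full-length curves exist only for this batch ...
        curves = predict_fn(batch)
        # ... and only their values on the short horizon are kept.
        results.extend([list(fn(times)) for fn in curves])
    return results


def fake_predict(batch):
    # Stand-in model for the demo: sample s gets the curve S(t) = s / (s + t).
    return [lambda ts, s=s: [s / (s + t) for t in ts] for s in batch]


print(predict_on_horizon(fake_predict, [1, 3], times=[1, 3], batch_size=1))
# [[0.5, 0.25], [0.75, 0.5]]
```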
