integrated_brier_score() - ValueError: expected estimate with ... columns, but got ... #317

ramm777 · 2022-10-28T12:51:37Z

Describe the bug

The integrated_brier_score() function does not accept float time points; it looks like they must be integers. This is not described in the documentation. Could you add it, please?

If you pass `times` as floats, the module converts them to integers, and because of rounding the number of integer time points may be smaller than the number of float time points you passed in.
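A minimal NumPy sketch of the mechanism described above. It assumes (as confirmed later in this thread) that `times` is cast to the integer dtype of the survival times, and it assumes the cast values are then deduplicated. The event times are illustrative stand-ins, not values from data.csv.

```python
import numpy as np

# Illustrative integer event times (a stand-in for rsf_surv_funcs[0].x)
event_times = np.arange(1, 11)

# Float time points built as in the report: 20 percentiles between 10 and 80
times = np.percentile(event_times, np.linspace(10, 80, 2 * len(event_times)))

# Casting float -> int truncates, so neighbouring floats map to the same integer;
# deduplicating afterwards leaves fewer time points than were passed in
cast_times = np.unique(times.astype(event_times.dtype))

print(len(times), len(cast_times))
```

The estimate, however, was still computed at all 20 float time points, which is how the column count and the number of time points end up disagreeing.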

Code Sample to Reproduce the Bug

# The data file 'data.csv' is attached to this issue.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sksurv.ensemble import RandomSurvivalForest
from sksurv.metrics import integrated_brier_score

data = pd.read_csv('data.csv')

y = data[['1', '0']].copy()
y = y.to_records(index=False)

x = data.loc[:, '2':].copy()


x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=20)

rsf = RandomSurvivalForest()
rsf.fit(x_train, y_train)

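rsf_surv_funcs = rsf.predict_survival_function(x_test)
# Float time points: percentiles of the (integer) event times of the first survival function
times = np.percentile(rsf_surv_funcs[0].x, np.linspace(10, 80, 2*len(rsf_surv_funcs[0].x)))
# One row per test sample, one column per time point
rsf_surv_prob = np.vstack([fn(times) for fn in rsf_surv_funcs])

# This raises the ValueError shown under Actual Results
integrated_brier_score(y_train, y_test, rsf_surv_prob, times)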


# However, if times is rounded and only unique values are kept, there is no error. I suspect
# the conversion of floats into integers in metrics.py. To check, print the array on line 4
# of the _check_estimate_2d() function.

times = np.unique(np.round(times))
rsf_surv_prob = np.vstack([fn(times) for fn in rsf_surv_funcs])
integrated_brier_score(y_train, y_test, rsf_surv_prob, times)

Expected Results
The integrated Brier score (ibs) is returned without an error.

Actual Results

    time_points.shape[0], estimate.shape[1]))
ValueError: expected estimate with 142 columns, but got 144

Versions

System:
    python: 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)]
executable: D:\concr_health_oucomes\projects\outcomes\venv\Scripts\python.exe
   machine: Windows-10-10.0.19041-SP0
Python dependencies:
          pip: 21.1.2
   setuptools: 57.0.0
      sklearn: 1.0.2
        numpy: 1.19.5
        scipy: 1.7.1
       Cython: None
       pandas: 1.3.2
   matplotlib: 3.4.3
       joblib: 1.0.1
threadpoolctl: 2.2.0
Built with OpenMP: True
sksurv: 0.17.2


sebp · 2022-11-01T07:47:40Z

The error indicates that the number of time points does not match the number of predictions (there needs to be one for each time point). I don't think it has anything to do with the data type.
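To illustrate the shape requirement described above: the estimate has one row per test sample and one column per time point. The helper below, `check_estimate_columns`, is hypothetical: a minimal NumPy sketch modeled on the error message in this issue, not scikit-survival's actual code.

```python
import numpy as np


def check_estimate_columns(estimate, time_points):
    """Hypothetical check: a 2D estimate needs exactly one column per time point."""
    estimate = np.asarray(estimate)
    time_points = np.atleast_1d(time_points)
    if estimate.ndim == 2 and estimate.shape[1] != time_points.shape[0]:
        raise ValueError(
            "expected estimate with {} columns, but got {}".format(
                time_points.shape[0], estimate.shape[1]))
    return estimate


# 3 samples evaluated at 4 time points -> shape (3, 4): passes
ok = check_estimate_columns(np.full((3, 4), 0.5), [1.0, 2.0, 3.0, 4.0])

# Same predictions, but only 3 time points remain after validation: fails
try:
    check_estimate_columns(np.full((3, 4), 0.5), [1, 2, 3])
except ValueError as err:
    print(err)  # expected estimate with 3 columns, but got 4
```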

ramm777 · 2022-11-01T09:37:43Z

I checked metrics.py, and it does look related to the data type.

If the time points are floats rather than integers, printing time_points inside metrics.py shows a shorter array, so the dimension changes.

I worked around this by passing integers, or floats that keep the same number of time points when converted to integers.
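A sketch of this workaround, using a hypothetical helper `int_cast_is_lossless` (not part of scikit-survival). It checks a slightly stricter condition than equal length: that every time point keeps its exact value after the cast, so the survival probabilities are evaluated at the same times the metric ends up using.

```python
import numpy as np


def int_cast_is_lossless(times, dtype=np.int64):
    """Hypothetical helper: True if casting `times` to `dtype` changes no value."""
    times = np.asarray(times)
    return bool(np.array_equal(times, times.astype(dtype)))


float_times = np.array([10.0, 10.4, 11.0])
print(int_cast_is_lossless(float_times))  # False: 10.4 would become 10

# Workaround from the report: round, then keep only unique values
rounded = np.unique(np.round(float_times))
print(rounded, int_cast_is_lossless(rounded))
```

After rounding, the survival functions still have to be re-evaluated at the new `times` so that the estimate has one column per remaining time point.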

sebp · 2022-11-02T09:51:28Z

Could you please provide a minimal working example to reproduce the problem (e.g. using randomly generated predictions)?

ramm777 · 2022-11-07T16:18:25Z

I have just updated the issue with a minimal working example and attached a data file. Thank you.

sebp · 2022-11-12T19:37:27Z

You are correct. `times` gets converted to the same dtype as the time field of y, which is int, and this creates duplicates.

scikit-survival/sksurv/metrics.py, line 66 in 7fd87e7:

    times = check_array(np.atleast_1d(times), ensure_2d=False, dtype=test_time.dtype, input_name="times")

This is not intended.
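A hypothetical sketch of the behaviour before and after the fix, assuming (per the commit message for this issue) that the fix simply stops casting `times` to the dtype of the observed survival times. `validate_times` is an illustrative name, not the scikit-survival function, and the real validation may perform additional checks (such as a follow-up range check) that are omitted here.

```python
import numpy as np


def validate_times(test_time, times, keep_dtype=True):
    """Hypothetical sketch of time-point validation.

    keep_dtype=False mimics the old behaviour (cast to the dtype of the
    observed survival times); keep_dtype=True keeps the dtype of `times`.
    """
    times = np.atleast_1d(times)
    if not keep_dtype:
        times = times.astype(test_time.dtype)  # float -> int truncates
    return np.unique(times)


test_time = np.array([5, 8, 12, 20])  # integer survival times
times = np.array([8.2, 8.7, 11.5, 12.4])

old = validate_times(test_time, times, keep_dtype=False)  # 8.2 and 8.7 collapse to 8
new = validate_times(test_time, times, keep_dtype=True)   # all 4 float time points kept
print(len(old), len(new))
```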


sebp added the bug label Nov 12, 2022

sebp self-assigned this Nov 12, 2022

sebp added a commit that referenced this issue Apr 2, 2023

FIX: Downcast time points passed to brier_score

93d0c24

If `times` is a float array and survival times are ints, a downcast of float to int can result in loss of information. Keep the original dtype instead. Closes #317

sebp mentioned this issue Apr 2, 2023

FIX: Downcast time points passed to brier_score #349

Merged


sebp closed this as completed in #349 Apr 2, 2023
