Survival Analysis

This project performs survival analysis with a deep learning method, as an alternative to the Kaplan-Meier estimator (https://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator). The model predicts survival functions that describe the time until a patient converts to Alzheimer's disease.

The data comes from the open ADNI dataset. The ADNI CSV file contains patient information (demographics, clinical data, conversion time). Detailed information can be found at http://adni.loni.usc.edu/methods/documents/.

Missing data problem - GAIN

The ADNI dataset has missing values in the features, but not in the labels. To address this, the missing values were imputed with a generative model, GAIN (Generative Adversarial Imputation Networks), which is designed for missing-data imputation.

Why the missing value problem happens

It is hard to perform every diagnostic procedure on every patient.
Some patients can't complete all clinical tests due to their health status.
Sometimes patients refuse tests such as a biopsy, a lumbar puncture, or even a blood test.
Standard medical test procedures can change over time.

Network architecture and details are described in the paper: Jinsung Yoon et al., 2018, GAIN: Missing Data Imputation using Generative Adversarial Nets, ICML (https://arxiv.org/pdf/1806.02920.pdf).

Implementation Details

Data preprocessing of ADNI

Min-max normalization of each feature to the range [0, 1]
Encode string-valued features as integer labels
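
The two preprocessing steps above could look roughly like the following NumPy sketch. The helper names (min_max_normalize, label_encode) and the toy columns are hypothetical and are not taken from the project code or from ADNI; missing entries are assumed to be NaN (or None for strings) and are left missing so they can be imputed later.

```python
import numpy as np

def min_max_normalize(x):
    """Scale each column to [0, 1], ignoring NaN entries (they stay NaN)."""
    x = np.asarray(x, dtype=float)
    col_min = np.nanmin(x, axis=0)
    col_max = np.nanmax(x, axis=0)
    span = col_max - col_min
    span[span == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (x - col_min) / span, col_min, col_max

def label_encode(values):
    """Map each distinct string to an integer code; None stays missing (NaN)."""
    classes = sorted({v for v in values if v is not None})
    lookup = {c: i for i, c in enumerate(classes)}
    codes = np.array(
        [lookup[v] if v is not None else np.nan for v in values], dtype=float
    )
    return codes, classes

# Hypothetical toy columns, not real ADNI fields.
age = [70.0, 80.0, np.nan, 75.0]
sex = ["F", "M", "F", None]

sex_codes, sex_classes = label_encode(sex)
features = np.column_stack([age, sex_codes])
normalized, col_min, col_max = min_max_normalize(features)
```

Returning col_min and col_max makes it possible to map imputed values back to the original scale before saving them.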

Generator

Input - (1) Real data matrix (2) Random vector at NaN positions (3) Mask matrix
Output - Imputed data
Loss - (1) Mean squared error (real data vs. imputed data) (2) Adversarial loss
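
As a rough illustration of the generator's inputs and loss, here is a minimal NumPy sketch. It assumes the formulation in the GAIN paper: noise is placed only at missing positions, the mean squared error is computed on observed entries, and the two loss terms are combined with a weight alpha. The single random linear layer standing in for the generator, the toy data, the noise range, and the value of alpha are assumptions for illustration, not the project's actual network or settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Toy normalized data with NaN marking missing entries (hypothetical values).
data = np.array([[0.2, np.nan, 0.9],
                 [np.nan, 0.5, 0.1],
                 [0.7, 0.3, np.nan]])

mask = (~np.isnan(data)).astype(float)       # 1 = observed, 0 = missing
x = np.nan_to_num(data, nan=0.0)             # zero-fill so arithmetic is defined
z = rng.uniform(0.0, 0.01, size=data.shape)  # random noise for the missing slots
x_tilde = mask * x + (1.0 - mask) * z        # real values kept, noise at NaN positions

# Stand-in generator (assumption): one random linear layer + sigmoid on [x_tilde, mask].
dim = data.shape[1]
w = rng.normal(scale=0.1, size=(2 * dim, dim))
g_out = sigmoid(np.concatenate([x_tilde, mask], axis=1) @ w)

# Imputed data: observed entries are copied through, missing ones come from the generator.
imputed = mask * x + (1.0 - mask) * g_out

def generator_loss(x, g_out, mask, d_prob, alpha=100.0, eps=1e-8):
    """Return (total, mse, adv); alpha is an assumed weighting between the terms."""
    # Reconstruction error on observed entries only.
    mse = np.sum((mask * x - mask * g_out) ** 2) / np.sum(mask)
    # Adversarial term: reward imputed entries that the discriminator rates as real.
    adv = -np.mean((1.0 - mask) * np.log(d_prob + eps))
    return adv + alpha * mse, mse, adv

d_prob = np.full(data.shape, 0.5)  # placeholder discriminator output
total, mse, adv = generator_loss(x, g_out, mask, d_prob)
```

In real training, d_prob would come from the discriminator and the generator weights would be updated by gradient descent on the total loss.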

Discriminator

Input - Imputed data from generator
Output - Predicted mask vector [ real (1), imputed (0), confusing data (0.5) ]
Hint mask - Partial information about the true mask, given to help train the discriminator
Loss - Cross entropy
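
The hint mask and the cross-entropy loss can be sketched as follows. This follows the hint definition in the GAIN paper, where revealed entries carry the true mask value and unrevealed entries are set to 0.5. Drawing the revealed entries independently with a hint_rate, as well as the function names and toy mask, are assumptions for illustration and may differ from the project's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy mask: 1 = real (observed), 0 = imputed (hypothetical values).
mask = np.array([[1.0, 0.0, 1.0],
                 [0.0, 1.0, 1.0]])

def make_hint(mask, hint_rate, rng):
    """Reveal each mask entry with probability hint_rate; hide the rest as 0.5."""
    reveal = (rng.uniform(size=mask.shape) < hint_rate).astype(float)
    return reveal * mask + 0.5 * (1.0 - reveal)

def discriminator_loss(d_prob, mask, eps=1e-8):
    """Binary cross entropy between the predicted mask d_prob and the true mask."""
    return -np.mean(mask * np.log(d_prob + eps)
                    + (1.0 - mask) * np.log(1.0 - d_prob + eps))

hint = make_hint(mask, hint_rate=0.9, rng=rng)

# A discriminator that recovers the mask exactly has near-zero loss;
# one that always answers 0.5 has loss log(2).
perfect = discriminator_loss(mask, mask)
unsure = discriminator_loss(np.full(mask.shape, 0.5), mask)
```

The discriminator would receive the imputed data together with the hint and be trained to minimize this loss, while the generator is trained to make imputed entries hard to tell apart from real ones.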

Imputation result

RMSE is the model evaluation metric.
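
One common way to compute RMSE for imputation is to hide known values, impute them, and compare only on the hidden entries. The sketch below assumes that setup; the function name imputation_rmse and the toy arrays are hypothetical, and the project may compute the metric differently.

```python
import numpy as np

def imputation_rmse(truth, imputed, mask):
    """RMSE over entries that were missing (mask == 0) but have known ground truth."""
    missing = 1.0 - mask
    squared_error = np.sum((missing * truth - missing * imputed) ** 2)
    return np.sqrt(squared_error / np.sum(missing))

# Hypothetical normalized ground truth, with two entries hidden for evaluation.
truth = np.array([[0.2, 0.4],
                  [0.6, 0.8]])
mask = np.array([[1.0, 0.0],
                 [0.0, 1.0]])
imputed = np.array([[0.2, 0.5],
                    [0.3, 0.8]])

rmse = imputation_rmse(truth, imputed, mask)
```

Because the features are normalized to [0, 1], the resulting RMSE is on that same scale.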

The imputed values are saved in a CSV file.