
Example for finding features with epistatic effects with scikit-mdr #26

Open
weixuanfu opened this issue Sep 25, 2018 · 1 comment


weixuanfu commented Sep 25, 2018

It seems that the utilities in mdr.utils are designed for this purpose, but there is no documentation on how to use them. I had a quick look at the code and made a demo that calculates scores for n-way feature combinations. I think it may be a way to find feature combinations with epistatic effects. Please let me know if this is the correct approach.

from mdr import MDRClassifier
import pandas as pd
from mdr.utils import n_way_models
import operator

genetic_data = pd.read_csv('https://github.com/EpistasisLab/scikit-mdr/raw/development/data/GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1.tsv.gz', sep='\t', compression='gzip')

features = genetic_data.drop('class', axis=1).values
labels = genetic_data['class'].values
# Names of the feature columns only (exclude the 'class' label column)
feature_names = list(genetic_data.drop('class', axis=1).columns)

my_mdr = MDRClassifier()
my_mdr.fit(features, labels)
print("Score for using all features", my_mdr.score(features, labels))

# n: list of combination sizes to evaluate (default: [2]).
# e.g., n = [3] generates all 3-way models; n = [2, 3] generates all 2-way and 3-way models.
n = [2]
mdr_score_list = []
# Note: n_way_models performs an exhaustive search through all feature combinations
# and can be computationally expensive.
for _, mdr_model_score, model_features in n_way_models(my_mdr, features, labels, n=n, feature_names=feature_names):
    mdr_score_list.append((model_features, mdr_model_score))
mdr_score_list.sort(key=operator.itemgetter(1), reverse=True)
print("The combination with highest score:", mdr_score_list[0])

Output:

Score for using all features 0.998125
The combination with highest score: (['P1', 'P2'], 0.793125)
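To make the idea behind the n_way_models loop concrete, here is a self-contained sketch that does not need scikit-mdr. It is a simplified illustration, not scikit-mdr's implementation: the helpers `mdr_fit_predict` and `n_way_scores` are hypothetical, the tie rule (ties count as high-risk) is an assumption, and the synthetic data (interacting features P1 and P2 plus noise features N0 to N2) is made up for the example. For every k-way combination, samples are pooled by genotype cell, each cell is labeled high-risk when its case fraction is at least the overall case fraction, and the combination is scored by training accuracy.

```python
from collections import defaultdict
from itertools import combinations
from math import comb
from operator import itemgetter

import numpy as np


def mdr_fit_predict(X, y):
    """Hypothetical toy MDR step: pool samples by multi-locus genotype cell,
    label a cell high-risk (1) if its case fraction is at least the overall
    case fraction, and return the resulting training-set predictions."""
    overall = y.mean()
    cells = defaultdict(list)
    for cell, label in zip(map(tuple, X), y):
        cells[cell].append(label)
    # Ties are assigned to high-risk here; this tie rule is an assumption.
    risk = {cell: int(np.mean(labels) >= overall) for cell, labels in cells.items()}
    return np.array([risk[tuple(row)] for row in X])


def n_way_scores(X, y, feature_names, n=(2,)):
    """Hypothetical stand-in for the n_way_models loop: exhaustively fit and
    score every k-way feature combination for each k in n."""
    for k in n:
        for idx in combinations(range(X.shape[1]), k):
            preds = mdr_fit_predict(X[:, list(idx)], y)
            yield [feature_names[i] for i in idx], float(np.mean(preds == y))


# Assumed synthetic data: the class is (P1 + P2) % 2, so it is determined by
# P1 and P2 together; N0 to N2 are random noise genotypes (0, 1 or 2).
rng = np.random.default_rng(0)
n_samples = 400
p1 = rng.integers(0, 3, n_samples)
p2 = rng.integers(0, 3, n_samples)
noise = rng.integers(0, 3, (n_samples, 3))
X = np.column_stack([p1, p2, noise])
y = (p1 + p2) % 2
names = ["P1", "P2", "N0", "N1", "N2"]

scores = list(n_way_scores(X, y, names, n=(2,)))
scores.sort(key=itemgetter(1), reverse=True)
print("2-way models evaluated:", len(scores), "=", comb(X.shape[1], 2))
print("The combination with highest score:", scores[0])
```

Here the top-ranked pair is (['P1', 'P2'], 1.0), since every P1/P2 genotype cell contains a single class. The number of fits grows combinatorially: with the 20 attributes of the GAMETES file, n = [2, 3] would mean C(20, 2) + C(20, 3) = 190 + 1140 = 1330 model fits, which is why the exhaustive search gets expensive quickly.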

amyxlu commented Sep 27, 2018

The test code worked for me and did what I had wanted to do. Thanks!
