Simplify the train pipeline responsibleaidashboard-census-classification-model-debugging.ipynb #1195

Merged
merged 44 commits into from
Feb 27, 2022
Changes from 1 commit
6b7a9a5
Simplify the train pipeline responsibleaidashboard-census-classificat…
gaugup Feb 2, 2022
1c9041e
Address code review comments
gaugup Feb 4, 2022
00c4a39
Update notebooks/responsibleaidashboard/responsibleaidashboard-census…
gaugup Feb 4, 2022
188c833
Merge branch 'main' into gaugup/CorrectTrainDataRAIInsights
gaugup Feb 4, 2022
71ecdb4
Merge branch 'main' into gaugup/CorrectTrainDataRAIInsights
gaugup Feb 4, 2022
3bedb89
Merge branch 'main' into gaugup/CorrectTrainDataRAIInsights
gaugup Feb 4, 2022
e1cdaae
Add multiclass classification dataset & set up basic model-assessment…
romanlutz Feb 4, 2022
40bb222
add more docs for error analysis surrogate tree (#1198)
imatiach-msft Feb 6, 2022
6ae3c50
Add cohort and filter definitions in raiwidgets SDK (#1186)
gaugup Feb 7, 2022
c10837f
fix flaky flask test (#1203)
imatiach-msft Feb 7, 2022
4250c66
export (#1204)
zhb000 Feb 8, 2022
a4a424d
Improve description for responsibleai and raiwidgets (#1205)
gaugup Feb 8, 2022
b6cacf7
Add docstrings for locale (#1202)
gaugup Feb 8, 2022
db0736d
Make causal manager add() and compute() behavior similar to other man…
gaugup Feb 9, 2022
11934dc
DOC Fix various documentation formatting inconsistencies (#1209)
romanlutz Feb 9, 2022
641c661
add tests (#1199)
vinuthakaranth Feb 9, 2022
5609a36
optimize tree traversal logic in error analysis to reduce execution t…
imatiach-msft Feb 10, 2022
6301b7c
release raiwidgets and responsibleai 0.17.0 (#1211)
imatiach-msft Feb 10, 2022
39ddb87
release rai-core-flask 0.2.5 (#1215)
imatiach-msft Feb 10, 2022
6eace92
Remove widget tests from CI-notebook pipeline (#1213)
vinuthakaranth Feb 10, 2022
0e59c26
Replace dependence plot with highchart lib (#1208)
zhb000 Feb 10, 2022
27cf1f5
add missing release steps causing rai-core-flask release errors (#1216)
imatiach-msft Feb 11, 2022
f91bed0
Add heterogeneity_model checks (#1210)
gaugup Feb 11, 2022
b0f024c
DOC add type annotations to responsibleai package (#1214)
romanlutz Feb 11, 2022
19eedee
update raiwidgets to rai-core-flask to 0.2.5 release (#1221)
imatiach-msft Feb 11, 2022
6d1ca37
Add e2e tests for Housing decision making and multiclass dnn notebook…
vinuthakaranth Feb 11, 2022
8805f22
fix release pipeline by adding pytorch packages for tests (#1222)
imatiach-msft Feb 11, 2022
9329fb7
refactor (#1220)
zhb000 Feb 12, 2022
15f1ef0
fix release error on unknown shell command when uploading to pypi (#1…
imatiach-msft Feb 14, 2022
7d1cda2
Correct falsey to falsely (#1228)
gaugup Feb 15, 2022
f6204fb
fix categorical what-if in RAI dashboard (#1225)
gaugup Feb 27, 2022
68ac41d
fix tree api being called twice on initial load due to uninitialized …
imatiach-msft Feb 16, 2022
a4cf7d6
scatter e2e (#1226)
zhb000 Feb 16, 2022
44ec4b2
update several required dependencies (#1219)
imatiach-msft Feb 16, 2022
020d8f0
Add data validations to SDK defined cohorts (#1227)
gaugup Feb 16, 2022
11c4443
fix total metric changing with different num bins when using quantile…
imatiach-msft Feb 18, 2022
3f93bc3
add ut for DashboardSettingDeleteButton (#1231)
xuke444 Feb 22, 2022
54ec869
Pin markupsafe and itsdangerous to unblock gates (#1238)
gaugup Feb 22, 2022
694eb7c
Create pytest fixtures raiwidgets tests (#1232)
gaugup Feb 23, 2022
1537b66
Refactor dependence plot (#1230)
zhb000 Feb 23, 2022
f7085c4
Add user defined cohort injection logic into raiwidgets (#1237)
gaugup Feb 24, 2022
5846b35
erroranalysis version bump in raiwidgets to 0.1.31 (#1245)
imatiach-msft Feb 24, 2022
edee29a
Make cohrtData empty list in case no pre-bdefined cohorts are injecte…
gaugup Feb 26, 2022
6244dc3
Merge branch 'main' into gaugup/CorrectTrainDataRAIInsights
gaugup Feb 27, 2022
@@ -79,7 +79,7 @@
"id": "clinical-henry",
"metadata": {},
"source": [
-"First, load the census dataset and specify the different types of features. Then, clean the target feature values to include only 0 and 1."
+"First, load the census dataset and specify the different types of features. Compose a pipeline which contains a preprocessor and estimator."
]
},
{
@@ -99,7 +99,7 @@
" y = dataset[[target_feature]]\n",
" return X, y\n",
"\n",
-"def clean_data(X, y, target_feature):\n",
+"def create_classification_pipeline(X, y, target_feature):\n",
" features = X.columns.values.tolist()\n",
" classes = y[target_feature].unique().tolist()\n",
" pipe_cfg = {\n",
@@ -118,9 +118,13 @@
" ('num_pipe', num_pipe, pipe_cfg['num_cols']),\n",
" ('cat_pipe', cat_pipe, pipe_cfg['cat_cols'])\n",
" ])\n",
-" X = feat_pipe.fit_transform(X)\n",
-" print(pipe_cfg['cat_cols'])\n",
-" return X, feat_pipe, features, classes\n",
+"\n",
+" # Append classifier to preprocessing pipeline.\n",
+" # Now we have a full prediction pipeline.\n",
+" pipeline = Pipeline(steps=[('preprocessor', feat_pipe),\n",
+" ('model', LGBMClassifier(n_estimators=5))])\n",
+"\n",
+" return pipeline\n",
"\n",
"outdirname = 'responsibleai.12.28.21'\n",
"try:\n",
@@ -140,30 +144,25 @@
"train_data = pd.read_csv('adult-train.csv')\n",
"test_data = pd.read_csv('adult-test.csv')\n",
"\n",
"\n",
"X_train_original, y_train = split_label(train_data, target_feature)\n",
"X_test_original, y_test = split_label(test_data, target_feature)\n",
"\n",
+"pipeline = create_classification_pipeline(X_train_original, y_train, target_feature)\n",
"\n",
-"X_train, feat_pipe, features, classes = clean_data(X_train_original, y_train, target_feature)\n",
"y_train = y_train[target_feature].to_numpy()\n",
"\n",
-"X_test = feat_pipe.transform(X_test_original)\n",
"y_test = y_test[target_feature].to_numpy()\n",
"\n",
"train_data[target_feature] = y_train\n",
"test_data[target_feature] = y_test\n",
"\n",
-"test_data_sample = test_data.sample(n=500, random_state=5)\n",
-"train_data_sample = train_data.sample(n=8000, random_state=5)"
+"# Take 500 samples from the test data\n",
+"test_data_sample = test_data.sample(n=500, random_state=5)"
]
},
{
"cell_type": "markdown",
"id": "potential-proportion",
"metadata": {},
"source": [
-"Train a LightGBM classifier on the training data."
+"Train a classification pipeline composed in the previous cell on the training data."
]
},
{
@@ -173,8 +172,7 @@
"metadata": {},
"outputs": [],
"source": [
-"clf = LGBMClassifier(n_estimators=5)\n",
-"model = clf.fit(X_train, y_train)"
+"model = pipeline.fit(X_train_original, y_train)"
]
},
{
@@ -213,10 +211,8 @@
"metadata": {},
"outputs": [],
"source": [
-"dashboard_pipeline = Pipeline(steps=[('preprocess', feat_pipe), ('model', model)])\n",
-"\n",
-"rai_insights = RAIInsights(dashboard_pipeline, train_data_sample, test_data_sample, target_feature, 'classification',\n",
-" categorical_features=categorical_features)"
+"rai_insights = RAIInsights(model, train_data, test_data_sample, target_feature, 'classification',\n",
+" categorical_features=categorical_features)"
]
},
{
@@ -519,7 +515,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.6.13"
+"version": "3.7.11"
}
},
"nbformat": 4,
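Note on the diff above: the notebook now composes preprocessing and the estimator into one pipeline and fits it on the raw training frame, so the fitted object can be handed to the dashboard directly. Below is a minimal standalone sketch of that pattern, not the notebook's actual code. It assumes scikit-learn and pandas are installed, swaps in `LogisticRegression` for `LGBMClassifier` to avoid the LightGBM dependency, takes the column lists as arguments (a hypothetical signature that differs from the notebook's), and uses a made-up toy frame in place of the census CSVs.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def create_classification_pipeline(num_cols, cat_cols):
    """Return an unfitted pipeline: column-wise preprocessing, then a classifier."""
    feat_pipe = ColumnTransformer(transformers=[
        ('num_pipe', StandardScaler(), num_cols),
        # Ignore categories that only appear at prediction time.
        ('cat_pipe', OneHotEncoder(handle_unknown='ignore'), cat_cols),
    ])
    # LogisticRegression is a stand-in for the notebook's LGBMClassifier.
    return Pipeline(steps=[
        ('preprocessor', feat_pipe),
        ('model', LogisticRegression(max_iter=1000)),
    ])


# Hypothetical toy data standing in for adult-train.csv / adult-test.csv.
train = pd.DataFrame({
    'age': [25, 38, 52, 46, 23, 61, 33, 58],
    'workclass': ['Private', 'Private', 'Self-emp', 'Gov',
                  'Private', 'Self-emp', 'Gov', 'Private'],
    'income': [0, 0, 1, 1, 0, 1, 0, 1],
})
test = pd.DataFrame({'age': [30, 55],
                     'workclass': ['Private', 'Never-worked']})

pipeline = create_classification_pipeline(['age'], ['workclass'])

# Fit on the raw (untransformed) features; the pipeline fits the
# preprocessor and the classifier together.
model = pipeline.fit(train[['age', 'workclass']], train['income'].to_numpy())

# Predict on raw test rows; no separate feat_pipe.transform() call is needed.
predictions = model.predict(test)
```

Because `Pipeline.fit` returns the pipeline itself, `model` here is the full preprocessing-plus-classifier object, which is why the separate `dashboard_pipeline` wrapper could be dropped in the diff.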