[REVIEW] Experimental versions of GPU accelerated Kernel and Permutation SHAP #3126

Merged: 55 commits, Dec 2, 2020 (changes shown from 44 commits)
fe99177
FEA Separate kernel shap from shared shap branch for PR
dantegd Nov 9, 2020
52bfcd8
FIX Typos
dantegd Nov 9, 2020
5021358
FIX Typos
dantegd Nov 9, 2020
26f21e6
FEA Add files to cmakelists
dantegd Nov 11, 2020
ee47596
ENH small corrections and started incorporating PR review feedback
dantegd Nov 11, 2020
2d51f8e
ENH progress on remaining todos
dantegd Nov 11, 2020
e3aa3af
[FIX] typo
dantegd Nov 11, 2020
14f0a8b
FIX Small function typo
dantegd Nov 11, 2020
18cec88
Merge branch '017-fea-kshap' of github.com:dantegd/cuml into 017-fea-…
dantegd Nov 11, 2020
dffb7b9
ENH Multiple small enhancements and fixes
dantegd Nov 11, 2020
267468f
ENH Use tags for device model detection
dantegd Nov 11, 2020
90f59c5
ENH data type changes
dantegd Nov 16, 2020
96baa48
ENH Add pytest files
dantegd Nov 16, 2020
5602f28
ENH multiple enhancements, completed todos and fixes
dantegd Nov 16, 2020
df5517f
ENH naming, comments and code enhancements to C++ code
dantegd Nov 16, 2020
ec8af23
ENH clang-format cleanup
dantegd Nov 16, 2020
204dbcf
ENH variable rename for clarity
dantegd Nov 16, 2020
9189efa
ENH Add explainer common pytests
dantegd Nov 17, 2020
13bf8e6
Merge branch 'branch-0.17' of https://github.com/rapidsai/cuml into 0…
dantegd Nov 18, 2020
0396424
ENH Use raft handle device properties
dantegd Nov 19, 2020
0e8e405
ENH Many more enhancements, better weighter linear regression
dantegd Nov 19, 2020
5349e47
ENH Add googletest and c++ improvements from PR feedback
dantegd Nov 22, 2020
c5e09da
ENH clang-format and comments about the tests
dantegd Nov 22, 2020
76fc081
FIX remove straggling prints
dantegd Nov 22, 2020
bf32ddc
FIX Uncomment all other c++ tests
dantegd Nov 22, 2020
d876f44
ENH Multiple small python enhancements and bugfixes
dantegd Nov 23, 2020
aceddec
ENH More python small improvements, rename class to match mainline
dantegd Nov 23, 2020
2dd1fa1
ENH Big python code cleanup and incorporating PR feedback. New SHAPBa…
dantegd Nov 24, 2020
347c51b
Merge branch 'branch-0.17' of https://github.com/rapidsai/cuml into 0…
dantegd Nov 24, 2020
64c60e9
ENH Incorporate rest of feedback of KernelSHAP and Base
dantegd Nov 24, 2020
d7516da
ENH Add full coverage to explainer common tests
dantegd Nov 24, 2020
b2ddd21
ENH Small numeric and other enhancements
dantegd Nov 24, 2020
67d8025
ENH Multiple enhancements including coalesced kernel, generating samp…
dantegd Nov 29, 2020
aae8dee
FIX clang format fixes
dantegd Nov 30, 2020
9af98d8
FEA Improvements to pytests
dantegd Nov 30, 2020
6ab7326
ENH More python enhancements and simplify perm SHAP to use SHAPBase c…
dantegd Nov 30, 2020
8d6d105
Merge branch 'branch-0.17' of https://github.com/rapidsai/cuml into 0…
dantegd Nov 30, 2020
a21f20a
DOC Added entry to changelog
dantegd Nov 30, 2020
5679e3d
ENH Various small style fixes, doc fixes, tidying up straggling comments
dantegd Nov 30, 2020
2170217
FIX PEP8 fixes
dantegd Nov 30, 2020
0566dfc
FIX test margins that I had forgotten to adjust, some might still be …
dantegd Nov 30, 2020
81a95ee
FIX add missing stream sync to test and print in case of failure
dantegd Nov 30, 2020
d132522
FIX always run clang-format I keep telling myself...
dantegd Nov 30, 2020
ecdee14
FIX Small type correction that seems to be the root of the googletest…
dantegd Nov 30, 2020
fe2dc74
FIX temporarily disable specific googletest for 0.17 burndown
dantegd Dec 1, 2020
9013548
FIX Had disabled the test in the incorrect place :(
dantegd Dec 1, 2020
5886407
FIX remove straggling prints
dantegd Dec 1, 2020
30242e6
Update cpp/src/explainer/kernel_shap.cu
dantegd Dec 1, 2020
6cfe98e
Update python/cuml/common/import_utils.py
dantegd Dec 1, 2020
2cd7d0e
Update python/cuml/experimental/explainer/base.py
dantegd Dec 1, 2020
7c50130
ENH incorporating PR review feedback
dantegd Dec 1, 2020
d6ebad4
FIX solve changelog conflict
dantegd Dec 1, 2020
2e0374f
FIX clang format fixes
dantegd Dec 1, 2020
3725127
FIX reduce test size and case matrix of test that was slow in CI
dantegd Dec 1, 2020
454180d
FIX c++ docstring fix
dantegd Dec 2, 2020
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -4,6 +4,7 @@
- PR #3164: Expose silhouette score in Python
- PR #2659: Add initial max inner product sparse knn
- PR #2836: Refactor UMAP to accept sparse inputs
- PR #3126: Experimental versions of GPU accelerated Kernel and Permutation SHAP

## Improvements
- PR #3077: Improve runtime for test_kmeans
2 changes: 2 additions & 0 deletions cpp/CMakeLists.txt
@@ -395,6 +395,8 @@ if(BUILD_CUML_CPP_LIBRARY)
src/datasets/make_regression.cu
src/dbscan/dbscan.cu
src/decisiontree/decisiontree.cu
src/explainer/kernel_shap.cu
src/explainer/permutation_shap.cu
src/fil/fil.cu
src/fil/infer.cu
src/glm/glm.cu
86 changes: 86 additions & 0 deletions cpp/include/cuml/explainer/kernel_shap.hpp
@@ -0,0 +1,86 @@
/*
* Copyright (c) 2020, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once

#include <cuml/cuml.hpp>

namespace ML {
namespace Explainer {

/**
* Generates samples of the dataset for the kernel SHAP algorithm.
*
* @param[in] handle cuML handle
* @param[inout] X binary 0-1 matrix of feature combinations [on device] (row major)
* @param[in] nrows_X number of rows in X
* @param[in] ncols number of columns in X, background and dataset
* @param[in] background background data [on device]
* @param[in] nrows_background number of rows in the background dataset
* @param[out] dataset generated data [on device], each entry taken from either observation or background (row major)
* @param[in] observation row to scatter
* @param[in] nsamples vector with the number of entries to randomly sample for each sampled row
* @param[in] len_nsamples number of entries in nsamples
* @param[in] maxsample size of the biggest sampled observation
* @param[in] seed seed for the random number generator
*
* The kernel first distributes the exact part of the kernel SHAP dataset:
* each block scatters a row of `X` over the `nrows_background` corresponding
* rows of `dataset`, taking each entry from `observation` where X is 1 and
* from `background` where X is 0. So, given:
* background = [[0, 1, 2],
*               [3, 4, 5]]
* observation = [100, 101, 102]
* X = [[1, 0, 1],
*      [0, 1, 1]]
*
* dataset (output):
* [[100, 1, 102],
*  [100, 4, 102],
*  [0, 101, 102],
*  [3, 101, 102]]
*
* For the sampled part, the first thread of each block samples `k` entries of
* `observation` to scatter into `dataset`, writing the resulting 0-1 row into
* `X`; afterwards the block scatters that row of `X` over the
* `nrows_background` corresponding rows of `dataset`. So, given:
* background = [[0, 1, 2, 3],
*               [5, 6, 7, 8]]
* observation = [100, 101, 102, 103]
* nsamples = [3, 2]
*
* X (output):
* [[1, 0, 1, 1],
*  [0, 1, 1, 0]]
*
* dataset (output):
* [[100, 1, 102, 103],
*  [100, 6, 102, 103],
*  [0, 101, 102, 3],
*  [5, 101, 102, 8]]
*/
void kernel_dataset(const raft::handle_t& handle, float* X, int nrows_X,
int ncols, float* background, int nrows_background,
float* dataset, float* observation, int* nsamples,
int len_nsamples, int maxsample, uint64_t seed = 0ULL);

void kernel_dataset(const raft::handle_t& handle, float* X, int nrows_X,
int ncols, double* background, int nrows_background,
double* dataset, double* observation, int* nsamples,
int len_nsamples, int maxsample, uint64_t seed = 0ULL);
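As a sketch of what these declarations compute, here is a minimal NumPy reference for the exact part of `kernel_dataset`, reproducing the first example in the docstring. The function name `kernel_shap_dataset_ref` and the host-side loop are illustrative only and not part of the cuML API.

```python
import numpy as np

def kernel_shap_dataset_ref(X, background, observation):
    """CPU reference for the exact part of kernel_dataset.

    For every binary row of X, emit len(background) rows in which columns
    with X == 1 take the observation value and the rest keep background.
    """
    nrows_bg, ncols = background.shape
    out = np.tile(background, (X.shape[0], 1))
    for i, mask in enumerate(X):
        block = out[i * nrows_bg:(i + 1) * nrows_bg]
        block[:, mask == 1] = observation[mask == 1]
    return out

# data from the first docstring example
background = np.array([[0, 1, 2], [3, 4, 5]], dtype=np.float32)
observation = np.array([100, 101, 102], dtype=np.float32)
X = np.array([[1, 0, 1], [0, 1, 1]])
# rows of the result match the docstring's "dataset (output)" example
print(kernel_shap_dataset_ref(X, background, observation))
```

The GPU kernel parallelizes this per-block scatter; the sampled part additionally draws the 0-1 rows of `X` at random before scattering.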

} // namespace Explainer
} // namespace ML
140 changes: 140 additions & 0 deletions cpp/include/cuml/explainer/permutation_shap.hpp
@@ -0,0 +1,140 @@
/*
* Copyright (c) 2020, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#pragma once

#include <cuml/cuml.hpp>

namespace ML {
namespace Explainer {

/**
* Generates a dataset by tiling the `background` matrix into `out`, while
* adding a forward and backward permutation pass of the observation `row`
* on the positions defined by `idx`. Example:
*
* background = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
* idx = [2, 0, 1]
* row = [100, 101, 102]
* output:
* [[ 0, 1, 2]
* [ 3, 4, 5]
* [ 6, 7, 8]
* [ 0, 1, 102]
* [ 3, 4, 102]
* [ 6, 7, 102]
* [100, 1, 102]
* [100, 4, 102]
* [100, 7, 102]
* [100, 101, 102]
* [100, 101, 102]
* [100, 101, 102]
* [100, 101, 2]
* [100, 101, 5]
* [100, 101, 8]
* [ 0, 101, 2]
* [ 3, 101, 5]
* [ 6, 101, 8]
* [ 0, 1, 2]
* [ 3, 4, 5]
* [ 6, 7, 8]]
*
*
* @param[in] handle cuML handle
* @param[out] out generated data [on device] [dim = (2 * ncols * nrows_bg + nrows_bg) * ncols]
* @param[in] background background data [on device] [dim = ncols * nrows_bg]
* @param[in] nrows_bg number of rows in background dataset
* @param[in] ncols number of columns
* @param[in] row row to scatter in a permuted fashion [dim = ncols]
* @param[in] idx permutation indexes [dim = ncols]
* @param[in] row_major boolean to generate either row or column major data
*
*/
void permutation_shap_dataset(const raft::handle_t& handle, float* out,
const float* background, int nrows_bg, int ncols,
const float* row, int* idx, bool row_major);

void permutation_shap_dataset(const raft::handle_t& handle, double* out,
const double* background, int nrows_bg, int ncols,
const double* row, int* idx, bool row_major);
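A minimal CPU sketch of the row-major behaviour described above, verified against the docstring example. This is a reference only, assuming NumPy; names are illustrative and not the cuML implementation.

```python
import numpy as np

def permutation_shap_dataset_ref(background, row, idx):
    """Tile background into 2 * ncols + 1 blocks: a forward pass substituting
    the observation's entries one at a time in idx order, then a backward
    pass restoring them in the same order."""
    blocks = [background.copy()]
    cur = background.copy()
    for j in idx:                        # forward pass: add features
        cur[:, j] = row[j]
        blocks.append(cur.copy())
    for j in idx:                        # backward pass: remove features
        cur[:, j] = background[:, j]
        blocks.append(cur.copy())
    return np.vstack(blocks)

background = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]], dtype=np.float64)
out = permutation_shap_dataset_ref(background, np.array([100., 101., 102.]), [2, 0, 1])
# out has (2 * 3 + 1) * 3 = 21 rows and matches the docstring's output example
```

The middle block (here rows 9-11) holds the fully substituted observation, and the first and last blocks are the untouched background.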

/**
* Generates a dataset by tiling the `background` matrix into `out`, while
* replacing one entry of the observation `row` at a time on the positions
* defined by `idx` (the main effect dataset). Example:
*
* background = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
* idx = [2, 0, 1]
* row = [100, 101, 102]
* output:
* [[ 0, 1, 2]
* [ 3, 4, 5]
* [ 6, 7, 8]
* [ 0, 1, 102]
* [ 3, 4, 102]
* [ 6, 7, 102]
* [100, 1, 2]
* [100, 4, 5]
* [100, 7, 8]
* [ 0, 101, 2]
* [ 3, 101, 5]
* [ 6, 101, 8]]
*
*
* @param[in] handle cuML handle
* @param[out] out generated data [on device] [dim = (ncols + 1) * nrows_bg * ncols]
* @param[in] background background data [on device] [dim = ncols * nrows_bg]
* @param[in] nrows_bg number of rows in background dataset
* @param[in] ncols number of columns
* @param[in] row row whose entries are scattered one at a time [dim = ncols]
* @param[in] idx permutation indexes [dim = ncols]
* @param[in] row_major boolean to generate either row or column major data
*
*/

void shap_main_effect_dataset(const raft::handle_t& handle, float* out,
const float* background, int nrows_bg, int ncols,
const float* row, int* idx, bool row_major);

void shap_main_effect_dataset(const raft::handle_t& handle, double* out,
const double* background, int nrows_bg, int ncols,
const double* row, int* idx, bool row_major);
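The main effect variant can be sketched the same way: one block per index with only that column replaced. Again a hedged NumPy reference, with illustrative names, checked against the docstring example.

```python
import numpy as np

def shap_main_effect_dataset_ref(background, row, idx):
    """Tile background into ncols + 1 blocks: the background itself, then
    one block per index in idx with only that column replaced by row."""
    blocks = [background.copy()]
    for j in idx:
        block = background.copy()
        block[:, j] = row[j]        # substitute a single feature
        blocks.append(block)
    return np.vstack(blocks)

background = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]], dtype=np.float64)
out = shap_main_effect_dataset_ref(background, np.array([100., 101., 102.]), [2, 0, 1])
# out has (3 + 1) * 3 = 12 rows and matches the docstring's output example
```

Evaluating the model on each block and differencing against the background block isolates each feature's individual (main) effect.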

/**
* Aggregates the differences of consecutive results of the model called with
* the permutation dataset, to estimate the SHAP values.
* It is equivalent to the Python code:
* for i, ind in enumerate(idx):
*   shap_values[ind] += y_hat[i + 1] - y_hat[i]
* for i, ind in enumerate(idx):
*   shap_values[ind] += y_hat[i + ncols] - y_hat[i + ncols + 1]
*
* @param[in] handle cuML handle
* @param[out] shap_values Array where the results are aggregated [dim = ncols]
* @param[in] y_hat Results to use for the aggregation [dim = 2 * ncols + 1]
* @param[in] ncols number of columns
* @param[in] idx permutation indexes [dim = ncols]
*/
void update_perm_shap_values(const raft::handle_t& handle, float* shap_values,
const float* y_hat, const int ncols,
const int* idx);

void update_perm_shap_values(const raft::handle_t& handle, double* shap_values,
const double* y_hat, const int ncols,
const int* idx);

} // namespace Explainer
} // namespace ML