forked from rapidsai/cuml
Sync with upstream #20
Merged
`#include <cuml/manifold/umap.hpp>` works now. Co-authored-by: Corey J. Nolet <[email protected]>
* Moving conftest.py files around and adding quick_run plugin * Adding PR to CHANGELOG * Incorporating feedback from code review
* Initial cython test commit * Update changelog * Style fixes Co-authored-by: Nanthini Balasubramanian <[email protected]> Co-authored-by: Dante Gama Dessavre <[email protected]>
…precation warnings (#3155) * Get rid of warnings in random projections test * Update changelog * Fix style * Update other deprecated make_blob imports
* FIX Force local install by specifying exact build string * DOC Update changelog * Update ci/gpu/build.sh Co-authored-by: AJ Schmidt <[email protected]> Co-authored-by: AJ Schmidt <[email protected]>
* Update flake8 config to join python/cython configuration and improve setup to check __init__.py files * Fixing linting issues in previously ignored __init__.py files * Update flake8 config to join python/cython configuration and improve setup to check __init__.py files * Fixing linting issues in previously ignored __init__.py files * Adding PR to CHANGELOG * Incorporating feedback from code review * Fixing style issues after merge with branch-0.17 Co-authored-by: Corey J. Nolet <[email protected]> Co-authored-by: Dante Gama Dessavre <[email protected]>
…kip-ci] (#3144) * Adding ability to set arbitrary cmake flags in ./build.sh via the $CUML_ADDL_CMAKE_ARGS variable * Adding PR to CHANGELOG * Adding more help info requested from code review. Co-authored-by: John Zedlewski <[email protected]>
* Adding brute force knn shell to sparse * Stubbing out algorithm flow * Adding initial headers to wrapper * Performing idx batching * Starting to fill in cusparse calls * Checking in * Beginning to add selection kernel * Finished header * Updates. Need to finish populating merge buffer * Using block select for selecting k and using 3-partition merge buffer * Logic is just about done. * Checking in changes. Need to swap out cuda 11 cusparse calls for cuda 10.2 version * Everything is building. Need end-to-end test * Running clang format * Updating changelog * Using raft's cusparse_wrappers.h instead of cuml * Removing cuda11-required GEMM calls (commenting them out for now, will swap them out shortly) * Fixing clang style * Separating distance computation from selection from general brute force algorithm to make pieces more reusable * Updating clang style * Adding batcher to help ease batch state management * Fixing clang style * More clang fixes * IP distance is computed using search * index.T. * Making type template for value_t all the way through knn_merge_parts * Adding simple googletest for sparse pairwise dists. The transpose conversion seems super expensive, but maybe it's necessary. * Completing test for basic inner product distances * Removing prints from test * Cleaning up batching for knn. Ready to gtest * KNN w/ max inner product is working * Adding guts of expanded l2 computation. * Cleaning up some debug prints * Fixing clang format * More cleanup and clang style fix * Fixing style for sparse knn prim test * Hoping I've captured all the clang updates * Updating per include_checker * I feel like I'm bouncing back and forth between clang and include checker * Refactoring sparse pairwise dists to return dense outputs * Beginning python layer * Adding python layer for sparse inputs to nearest neighbors * End to end sparse knn works. Need to finish norms for expanded euclidean and expose it. * Removing unused file * Adding gtest for expanded l2.
* Sparse l2 matches sklearn * Fixing clang format style * Fixing style in gtests * Lots of changes and cleanup. Still need to flip the batching * Progress on tiling. Still a failure when tile sizes don't match up. * Tiling w/ uneven batch sizes works! Now just need to figure out what to do when the leftover values are <k * Some further optimizations are necessary, but this works for now. * Ready for cleanup * Parametrizing sparse knn tests * More cleanup. * Fixing clang format * Fixing clang format style * Fixing flake8 for sparse nn tests * Fixing googletests * More cleanup of sparse knn * Adding sparse support to UMAP by abstracting the inputs * Everything's building. Have one template issue to fix in the sparse knn * Updates to API * Using a struct to manage the knn graph output state * C++ side is largely done. Still need to figure out what to do w/ the separate int64_t type in the sparse knn * Removing examples/comms, which seems to have gotten re-checked in by mistake * Fixing c++ style * Fixing include checks * This darn style checker is going to kill me..... * Adding template type params for output * UMAP is officially accepting sparse inputs * More cleanup * Cleaning up gtests and making them easier to write * Fixing up and parametrizing tests * Fixing style * Fixing python style * More clang format style fixes * Pulled umap inputs classes to more shared location so tsne can use them. Added kselection gtest * Updating clang format * Fixing bad ide refactor * Updating changelog * Fixing more clang format * Fixing flake8 style. Not sure why these didn't show up locally * Decomposing sparse knn into a class. * Review feedback * Better umap sparse test * More testing updates * Adding docs to some of the remaining prims in csr.cuh * Adding gtests for transpose and row slice.
Need to add one for todense * GTest for csr to dense * Fixing style * Removing debug logging from new gtests * Fixing flake8 style * Getting build to pass * Running clang-tidy * Fixing format for sparse gtests * Adding 'algo_params' to get_param_names() * Removing cumlarray output in kneighbors * Finishing review feedback * Fixing style * Fixing format * clang-format * Style changes * More review updates * Style updates * Running clang format on distance.cuh * Running clang format on tests * Fixing cython style * Updating RAFT commit * Updating neighbors from bad merge
…mples_leaf (#3132) * Enforce min_rows_per_node in experimental RF backend * Add min_samples_split hyperparameter * Use correct definition of min_samples_split * Rename range_len -> n_samples * Add min_samples_split to Dask docstring * Rename min_rows_per_node -> min_samples_leaf * Update docstring for min_samples_leaf * Correctly apply min_samples_split in new RF backend * Address reviewer's comment * Fix broken tests in BatchedLevelAlgo/DtRegTestF.Test * Adjust accuracy requirement in test RFBatchedRegTests/RFBatchedRegTestF.Fit/5 * Add unit tests for min_samples_split, min_samples_leaf * Add descriptive comments for compound literals * Fix formatting * Add changelog * Organize unit tests under prefix BatchedLevelAlgoUnitTest * Change default value for min_samples_leaf to 1 * Deprecate min_rows_per_node; guide users to use min_samples_leaf * Fix style error
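As a rough illustration of how the two hyperparameters named in this commit interact, the sketch below follows the scikit-learn definitions: a node may be split only if it holds at least `min_samples_split` samples, and a candidate split is kept only if each child receives at least `min_samples_leaf` samples. The function name `split_is_allowed` and the default `min_samples_split=2` are assumptions made for this example; this is not the code of the experimental RF backend.

```python
def split_is_allowed(n_samples, n_left, n_right,
                     min_samples_split=2, min_samples_leaf=1):
    """Return True if a candidate split satisfies both constraints.

    Hypothetical helper for illustration only. `min_samples_leaf=1`
    matches the default mentioned in the commit message; the default
    for `min_samples_split` is borrowed from scikit-learn.
    """
    if n_left + n_right != n_samples:
        raise ValueError("children must partition the parent's samples")
    # The parent node must be large enough to be considered for splitting.
    if n_samples < min_samples_split:
        return False
    # Each child must keep at least min_samples_leaf samples.
    return n_left >= min_samples_leaf and n_right >= min_samples_leaf


print(split_is_allowed(10, 5, 5))                      # balanced split
print(split_is_allowed(3, 2, 1, min_samples_split=4))  # parent too small
print(split_is_allowed(10, 9, 1, min_samples_leaf=2))  # right child too small
```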
…ors (#3113) * FEA Add preferred_order class parameter to linear models * ENH adopt tags from scikit-learn API to support preferred order attribute * DOC remove attribute docstrings * FIX Change straggling classes * FIX Change straggling classes * FIX Add missing self * FIX straggling attribute * ENH Add device data tag for proposal * FEA Add all scikit-learn API tags to base and improve gpu input types tag * FEA Add preferred_order tag to cluster models * FEA Add preferred_order tag to most models * ENH Improvements and PR review feedback * DOC add tag documentation to estimator guide * DOC add scikit link * Update wiki/python/ESTIMATOR_GUIDE.md Co-authored-by: Corey J. Nolet <[email protected]> * Update wiki/python/ESTIMATOR_GUIDE.md Co-authored-by: Corey J. Nolet <[email protected]> * Update wiki/python/ESTIMATOR_GUIDE.md Co-authored-by: Corey J. Nolet <[email protected]> * Update wiki/python/ESTIMATOR_GUIDE.md Co-authored-by: Corey J. Nolet <[email protected]> * Update wiki/python/ESTIMATOR_GUIDE.md Co-authored-by: Corey J. Nolet <[email protected]> * ENH Rename test_fit to test_api and add tags tests * FIX fixes from PR review * DOC Added entry to changelog * FIX PEP8 fixes Co-authored-by: Corey J. Nolet <[email protected]>
* Removing extra unneeded file * Updating changelog
…#3152) * FIX Access to attributes of individual NB objects in dask NB * DOC Added entry to changelog * ENH Add pytest * FIX PEP8 fixes Co-authored-by: John Zedlewski <[email protected]>
…the tiniest models (#3032) * just control block count * blocks_per_sm can now be passed through treelite_params_t or forest_params_t * changelog * made blocks_per_sm mandatory; added tests; fixed a bug * changelog * added tests, moved __syncthreads() to common for all acc's, removed most blockIdx.x uses * removed blocks_per_sm from python API, to avoid a longer discussion on best set * simplified output loops * addressed other review comments * fixed bad merge conflict resolution * comment for blocks_per_sm in fil.pyx * style
* binary reduction: half way there * quaternary reduction * changelog * remove accidental files * generalize the multireduction * adding dedicated tests for multireduction; style * change trap; into setting an atomic. * split into n tests, one per size * ? * tried thrust + rmm, no rmm dependency in tests it seems * no rmm, sync allocations * style * fixed some testing bugs; expanded test to all block sizes; better documentation * fixed wrong test * simplify comparison * member -> non-member function pointer as test template argument * style * replaced reduction with simpler code; tuned radix towards fewer classes * fixed compile dependency and runtime discrepancy * long comment line * fix build issues * Apply suggestions from code review Co-authored-by: Andy Adinets <[email protected]> * addressed review comments Co-authored-by: Andy Adinets <[email protected]>
* add dask-glm demo link * add to changelog Co-authored-by: Corey J. Nolet <[email protected]> Co-authored-by: Dante Gama Dessavre <[email protected]>
Updated with 0.15 and 0.16 release dates. Co-authored-by: Corey J. Nolet <[email protected]>
* Remove outdated, extraneous file * Update changelog
* Expose silhouette score in Python * Style fix * Correct dtypes used in silhouette_score * Update changelog * Fix style * Update linebreaks * Add copyright headers * Collapse Python silhouette_score to single file * Restructure silhouette_score for consistency * Fix style * Loosen silhouette score test tolerance
…[skip-ci] (#3175) * FIX Fix gtest pinned cmake version for build from source option * DOC Added entry to changelog
…3176) * Add probabilistic SVM tests with various input array types * DOC update changelog
* Fix a bug in MSE metric calculation * Style fix * Add changelog * Try smaller grid dimensions
* blocks_per_sm FIL parameter in Python. * Updated CHANGELOG.md. * Fixed style errors. * Reduced the number of parameter combinations in the Python test.
* Adding simple dask estimator notebook to demonstrate saving/loading * Renaming and updating cells * Updating source.rst * Updating changelog * Updating pickling notebook * Review updates * More review feedback Co-authored-by: John Zedlewski <[email protected]>
* Fix + multiple improvements * Update changelog * Update model output and testing * Check style update * Update comments * Test one query partition * Check style
…dically-failing FIL test [skip ci] (#3196) * Disable ascending=false path for sortColumnsPerRow * DOC Update changelog * Disable flaky FIL test Co-authored-by: John Zedlewski <[email protected]> Co-authored-by: John Zedlewski <[email protected]>
* FIX Fix EXITCODE override in test_notebooks script * DOC Changelog update * FIX Move bash trap to after the GTests so they fail immediately * FIX Move codecov block to gpu build
* Fix cuDF to cuPy conversion (missing value) * Changelog update * Introducing fail_on_nan parameter * Adding test with fail_on_nan=True * Updating conversion * Rename fail_on_nan into fail_on_null
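The behaviour suggested by the `fail_on_null` name can be sketched in plain Python. This is an assumption-laden illustration, not the cuML conversion code: the helper `to_float_list` is hypothetical, `None` stands in for a cuDF null, and mapping nulls to NaN when `fail_on_null` is off is a guess at the intended fallback.

```python
import math


def to_float_list(values, fail_on_null=False):
    """Convert a column-like list to floats.

    Hypothetical helper: None plays the role of a null entry.
    With fail_on_null=True the conversion refuses null input;
    otherwise nulls become NaN (an assumed fallback).
    """
    if fail_on_null and any(v is None for v in values):
        raise ValueError("input contains null values")
    return [math.nan if v is None else float(v) for v in values]


converted = to_float_list([1, None, 3])
print(converted)

try:
    to_float_list([1, None, 3], fail_on_null=True)
except ValueError as err:
    print("rejected:", err)
```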
This PR is fixing the attribute error of #3183, and additional bugs on the input type of PCA (`sparse_scipy_to_cp()` function call missed an argument) and on the shape of `self.singular_values_`. I am also adding additional tests on the bug fixed here. Authors: - Mickael Ide <[email protected]> - John Zedlewski <[email protected]> Approvers: - Divye Gala - John Zedlewski URL: #3190
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Authors: - divyegala <[email protected]> - John Zedlewski <[email protected]> Approvers: - Dante Gama Dessavre - Dante Gama Dessavre URL: #3241
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
…'s RF(#3245)

Rename rows_sample -> max_samples to be consistent with sklearn's RF. From https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html:

> **max_samples**: int or float, default=None
> If bootstrap is True, the number of samples to draw from X to train each base estimator.
> - If None (default), then draw X.shape[0] samples.
> - If int, then draw max_samples samples.
> - If float, then draw max_samples * X.shape[0] samples. Thus, max_samples should be in the interval (0, 1).
>
> New in version 0.22.

Authors: - Hyunsu Cho <[email protected]> Approvers: - John Zedlewski URL: #3245
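The three cases in the quoted scikit-learn description can be written out as a small function. This is only a sketch of the documented semantics: `resolve_max_samples` is a hypothetical name, the open interval (0, 1) for floats is taken from the quote above, and truncating the float product to an integer is an assumption about rounding.

```python
def resolve_max_samples(max_samples, n_rows):
    """Translate max_samples into a number of rows to draw.

    Hypothetical helper following the quoted description:
    None -> all rows, int -> that many rows, float -> a fraction of rows.
    """
    if max_samples is None:
        return n_rows
    if isinstance(max_samples, bool):
        raise TypeError("max_samples must be None, int, or float")
    if isinstance(max_samples, int):
        if not 1 <= max_samples <= n_rows:
            raise ValueError("int max_samples must be in [1, n_rows]")
        return max_samples
    if isinstance(max_samples, float):
        if not 0.0 < max_samples < 1.0:
            raise ValueError("float max_samples must be in (0, 1)")
        # Truncation is an assumption; always draw at least one row.
        return max(1, int(max_samples * n_rows))
    raise TypeError("max_samples must be None, int, or float")


print(resolve_max_samples(None, 100))
print(resolve_max_samples(30, 100))
print(resolve_max_samples(0.5, 100))
```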
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
…tical(#3243)

Closes #3231 Closes #3128 Partially addresses #3188

The degenerate case (all labels in a node identical) is now handled robustly by computing the MSE metric separately for each of the three nodes (the parent node, the left child node, and the right child node). Doing so ensures that the gain is 0 in the degenerate case. The degenerate case can occur in real-world regression problems, e.g. house price data where the price label is rounded up to the nearest 100k. As a result, the MSE gain is now computed in much the same way as the MAE gain.

Disadvantage: we now always make two passes over the data to compute the gain.

cc @teju85 @vinaydes @JohnZed

Authors: - Hyunsu Cho <[email protected]> - Philip Hyunsu Cho <[email protected]> Approvers: - Thejaswi Rao - John Zedlewski URL: #3243
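A minimal CPU sketch of the idea, assuming the usual weighted impurity-decrease formula for the gain (parent MSE minus the sample-weighted child MSEs); the function names are hypothetical and this is not the CUDA kernel from the PR. Each MSE is computed in two passes, one for the mean and one for the squared deviations, so identical labels give exactly zero.

```python
def mse(labels):
    """Mean squared deviation from the mean, computed in two passes."""
    n = len(labels)
    if n == 0:
        return 0.0
    mean = sum(labels) / n                          # pass 1: mean
    return sum((y - mean) ** 2 for y in labels) / n  # pass 2: deviations


def mse_gain(left, right):
    """Gain of a split, using the MSE of parent, left, and right nodes.

    The weighting by child size is an assumption for this sketch.
    """
    parent = left + right
    n = len(parent)
    return (mse(parent)
            - len(left) / n * mse(left)
            - len(right) / n * mse(right))


# Degenerate node: every label is identical, so no split can help.
print(mse_gain([300000.0] * 3, [300000.0] * 2))
# A split that separates two distinct label values.
print(mse_gain([1.0, 1.0], [3.0, 3.0]))
```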
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Authors: - Corey J. Nolet <[email protected]> - Corey J. Nolet <[email protected]> Approvers: - John Zedlewski URL: #3250
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
) * Hide silhouette_score Python binding Remove this feature due to memory issues in C++ implementation for anything but modest numbers of samples * Remove silhouette_score tests * Update changelog * Remove unused import * Remove silhouette_score from new features list * Add note on reason for hiding silhouette_score * Update docstrings with silhouette_score warning Also remove silhouette_score from api.rst docs * Update CHANGELOG to restore reference to reverted PR
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Answers #3232. Explicitly specify `batch_size` as parameter to MNMG KNN models in order to make it visible in the documentation. Authors: - viclafargue <[email protected]> - Corey J. Nolet <[email protected]> Approvers: - Corey J. Nolet - John Zedlewski URL: #3246
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
…#3282) * FIX Add secondary test to kernel explainer pytests for stability in Volta * DOC Added entry to changelog * FIX PR review feedback
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
* Correct pure virtual declaration in manifold_inputs_t * Update changelog
Remove keyword "stops" from call to cudf.core.column.string.slice, which no longer accepts arbitrary keywords. cuDF change introduced in rapidsai/cudf#6750. Authors: - William Hicks <[email protected]> Approvers: - John Zedlewski - Micka URL: #3289
Linear SVR has the coef_ attribute in the python layer. In the C++ unit test the same vector is denoted by _w_, and it is defined as a linear combination of the support vectors ![image](https://user-images.githubusercontent.com/3671106/101908077-ce3d9e80-3bbb-11eb-98ff-e7be90828dde.png) The number of elements in _w_ is n_cols. One of the SVR tests defined only 1 expected value for _w_ instead of the expected n_cols=2 values, which led to accessing an uninitialized value. This would fail the test unless the value happened to be zero-initialized. Surprisingly, the failure occurred extremely rarely. This PR fixes the expected value _w_exp_. Authors: - Tamas Bela Feher <[email protected]> Approvers: - Dante Gama Dessavre URL: #3294
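To make the size argument concrete, the sketch below forms _w_ as a linear combination of support vectors and shows that it has one entry per column. The function name, the coefficient values, and the vectors are made up for illustration; the exact formula in the linked image is not reproduced here.

```python
def primal_weights(dual_coefs, support_vectors):
    """Return w = sum_i coef_i * x_i for a linear kernel.

    Hypothetical helper; the result has n_cols entries regardless of
    how many support vectors there are.
    """
    n_cols = len(support_vectors[0])
    w = [0.0] * n_cols
    for coef, vec in zip(dual_coefs, support_vectors):
        for j in range(n_cols):
            w[j] += coef * vec[j]
    return w


# Two support vectors with n_cols = 2 features each (made-up values).
w = primal_weights([0.5, -0.5], [[1.0, 2.0], [3.0, 4.0]])
print(len(w), w)
```

A test that compares `w` against an expected vector therefore needs n_cols expected values, which is the mismatch this PR corrects.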
Closes #1780

Adding kNN graph input functionality to t-SNE, a request broken off of issue #1733. t-SNE gathers kNN indices and distances in the first stage of its computation; by allowing users to input their own kNN graph, it can skip this step. This should follow #1815 as closely as possible.

**Benefits of this**:
- allows the user to run a custom kNN algorithm
- can use a different distance function instead of the t-SNE Euclidean default
- allows for a speedup when performing grid search, by storing and reusing the kNN graph

**Includes:**
- [x] Abstracted `extract_knn_graph` so it can be used for both UMAP and t-SNE
- [x] Implemented kNN graph input to Python/Cython layer and C++/CUDA layer
- [x] C++/CUDA Barnes Hut and Exact t-SNE tests
- [x] Python t-SNE tests
- [x] General code cleanup wherever needed

Authors: - Aleksander Ficek <[email protected]> - Corey J. Nolet <[email protected]> - Ray Douglass <[email protected]> - Corey J. Nolet <[email protected]> Approvers: - Corey J. Nolet URL: #2592
* FEA Consolidate linear model gemm based predicts on one function on C++ * FEA Consolidate linear model gemm based predicts on one function on Python * DOC Added entry to changelog * FIX PEP8 fixes * FIX Forgot clang-format * FIX Remove C++ sync calls and unnecessary delete on Python based on PR feedback * DOC Remove changelog entry
…3292) * Refactoring: move internal FIL interface to a separate file. - move the functions not related to treelite import, prediction or freeing the model to a separate file * Fixed style errors.
This PR will enable the usage of multiple KNN strategies as alternatives to the current default bruteforce method. See #574 Authors: - wxbn <[email protected]> - viclafargue <[email protected]> - Corey J. Nolet <[email protected]> Approvers: - Corey J. Nolet URL: #2780
This PR fixes CI fails that happen on `test_naive_bayes` when the machine can't download the 20 newsgroup dataset. It closes #3260 Authors: - Mickael Ide <[email protected]> Approvers: - John Zedlewski URL: #3291
* Adding NotFittedError to PCA * Fixed typo in PCA import * Fixed check_is_fitted call * Fixed missing parenthesis * Added test on svd_flip * fix style ipca * Fixed whitespace style * Removed useless test
Ensure that the 100th quantile value returned by cupy.percentile is the maximum of the input array rather than (possibly) NaN due to cupy/cupy#4451. This eliminates an intermittent failure observed in tests of KBinsDiscretizer, which makes use of cupy.percentile. Note that this includes an alteration of the included sklearn code and should be reverted once the upstream cupy issue is resolved. Resolve failure due to ValueError described in #2933. Authors: - William Hicks <[email protected]> Approvers: - Dante Gama Dessavre - Victor Lafargue URL: #3315
This PR converts the confusion matrix to an integer type when possible, so that its values are not displayed in scientific notation. See this example: ![image](https://user-images.githubusercontent.com/9810050/101400035-9808d200-38d0-11eb-9f81-4d217a5ff202.png) Authors: - Mickael Ide <[email protected]> - Mickael Ide <[email protected]> Approvers: - John Zedlewski URL: #3275
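The "when possible" condition can be sketched as: cast to integers only if every entry is a whole number, otherwise leave the matrix untouched (for example when it has been normalized or weighted by fractional sample weights). The helper name and the use of nested Python lists are assumptions for this illustration; the PR itself operates on GPU arrays.

```python
def as_int_if_integral(matrix):
    """Cast a confusion matrix to ints if every entry is a whole number.

    Hypothetical helper for illustration; returns the input unchanged
    when any entry has a fractional part.
    """
    if all(float(v).is_integer() for row in matrix for v in row):
        return [[int(v) for v in row] for row in matrix]
    return matrix


counts = [[120000.0, 3.0], [0.0, 45.0]]
print(as_int_if_integral(counts))      # integer counts

weighted = [[0.75, 0.25], [0.1, 0.9]]
print(as_int_if_integral(weighted))    # left as floats
```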
…#3281) Replace "constexpr static" member variables in DecisionTree unit test fixture with "const" member variables for compliance with C++14, which otherwise requires that const static data members be separately defined in a namespace scope if they are ODR-used (See sections 3.2 and 9.4.2 of the C++11 standard, which remain relevant until C++17). Authors: - William Hicks <[email protected]> Approvers: - Dante Gama Dessavre URL: #3281
No description provided.