WIP: [CUDA] New CUDA version #4528

Closed
wants to merge 70 commits into from
Commits (70)
94aed50
new cuda framework
shiyu1994 Apr 20, 2021
18df6b2
add histogram construction kernel
Apr 22, 2021
9b21d2b
before removing multi-gpu
Apr 29, 2021
634a4f1
new cuda framework
Apr 29, 2021
23bcaa2
tree learner cuda kernels
May 6, 2021
6c14cd9
single tree framework ready
May 7, 2021
aa0b3de
single tree training framework
May 9, 2021
bc85ced
remove comments
May 9, 2021
18d957a
boosting with cuda
May 10, 2021
28186c0
optimize for best split find
May 13, 2021
60c7e4e
data split
May 14, 2021
57547fb
move boosting into cuda
May 17, 2021
608fd70
parallel synchronize best split point
May 27, 2021
277be8b
merge split data kernels
Jun 1, 2021
ffcf765
before code refactor
Jun 1, 2021
a58c1e1
use tasks instead of features as units for split finding
Jun 2, 2021
72d41c9
refactor cuda best split finder
Jun 2, 2021
f7a7658
fix configuration error with small leaves in data split
shiyu1994 Jun 2, 2021
b6efd10
skip histogram construction of too small leaf
shiyu1994 Jun 2, 2021
6f4e39d
skip split finding of invalid leaves
shiyu1994 Jun 3, 2021
4072bb8
support row wise with CUDA
shiyu1994 Jun 4, 2021
88ecde9
copy data for split by column
shiyu1994 Jun 4, 2021
dec7501
copy data from host to CPU by column for data partition
shiyu1994 Jun 8, 2021
2dccb7f
add synchronize best splits for one leaf from multiple blocks
shiyu1994 Jun 8, 2021
0168d2c
partition dense row data
shiyu1994 Jun 9, 2021
0570fe0
fix sync best split from task blocks
shiyu1994 Jun 9, 2021
374018c
add support for sparse row wise for CUDA
shiyu1994 Jun 9, 2021
40c49cc
remove useless code
shiyu1994 Jun 9, 2021
dc41a00
add l2 regression objective
shiyu1994 Jun 10, 2021
bd065b7
sparse multi value bin enabled for CUDA
shiyu1994 Jun 11, 2021
a5fadfb
fix cuda ranking objective
shiyu1994 Jun 16, 2021
3202b79
support for number of items <= 2048 per query
Jun 16, 2021
cd687c9
speedup histogram construction by interleaving global memory access
Jun 23, 2021
320c449
split optimization
Jun 28, 2021
eb1d7fa
add cuda tree predictor
Jul 2, 2021
dd177f5
remove comma
Jul 18, 2021
ee836d6
refactor objective and score updater
Jul 19, 2021
0467fce
before use struct
Jul 21, 2021
f05da3c
use structure for split information
Jul 21, 2021
400622a
use structure for leaf splits
Jul 21, 2021
d9d3aa9
return CUDASplitInfo directly after finding best split
Jul 21, 2021
45cf7a7
split with CUDATree directly
Jul 22, 2021
9dea18d
use cuda row data in cuda histogram constructor
Jul 26, 2021
572e2b0
clean src/treelearner/cuda
Jul 26, 2021
fe58d4c
gather shared cuda device functions
Jul 27, 2021
dc461dc
put shared CUDA functions into header file
Jul 27, 2021
ba565c1
change smaller leaf from <= back to < for consistent result with CPU
Jul 27, 2021
a781ef5
add tree predictor
Aug 3, 2021
c8a6fab
remove useless cuda_tree_predictor
Aug 3, 2021
a7504dc
predict on CUDA with pipeline
Aug 4, 2021
896d47b
add global sort algorithms
Aug 9, 2021
fe6ed74
add global argsort for queries with many items in ranking tasks
Aug 9, 2021
7808455
remove limitation of maximum number of items per query in ranking
Aug 11, 2021
7a0d218
add cuda metrics
Aug 16, 2021
ca42f3b
fix CUDA AUC
Aug 18, 2021
c681102
remove debug code
Aug 18, 2021
ea60566
add regression metrics
Aug 19, 2021
5c84788
remove useless file
Aug 19, 2021
c2c2407
don't use mask in shuffle reduce
Aug 19, 2021
b43d367
add more regression objectives
Sep 2, 2021
951aa37
fix cuda mape loss
Sep 3, 2021
b50ce5b
use template for different versions of BitonicArgSortDevice
Sep 3, 2021
f51fd70
add multiclass metrics
Sep 6, 2021
35c742d
add ndcg metric
Sep 7, 2021
510d878
fix cross entropy objectives and metrics
Sep 10, 2021
95f4612
fix cross entropy and ndcg metrics
Sep 10, 2021
bb997d0
add support for customized objective in CUDA
Sep 10, 2021
17b78d1
complete multiclass ova for CUDA
Sep 10, 2021
72aa863
merge master
Sep 13, 2021
c1b8f99
remove CUDA_ARCH 6.2 temporarily
shiyu1994 Oct 24, 2021
18 changes: 17 additions & 1 deletion CMakeLists.txt
@@ -170,7 +170,7 @@ if(USE_CUDA)
   include_directories(${CUDA_INCLUDE_DIRS})
   LIST(APPEND CMAKE_CUDA_FLAGS -Xcompiler=${OpenMP_CXX_FLAGS} -Xcompiler=-fPIC -Xcompiler=-Wall)

-  set(CUDA_ARCHS "6.0" "6.1" "6.2" "7.0")
+  set(CUDA_ARCHS "6.0" "6.1" "7.0")
   if(CUDA_VERSION VERSION_GREATER_EQUAL "10.0")
     list(APPEND CUDA_ARCHS "7.5")
   endif()
@@ -352,6 +352,20 @@ file(GLOB SOURCES
     src/treelearner/*.cpp
 if(USE_CUDA)
     src/treelearner/*.cu
+    src/treelearner/cuda/*.cpp
+    src/treelearner/cuda/*.cu
+    src/io/cuda/*.cu
+    src/io/cuda/*.cpp
+    src/cuda/*.cpp
+    src/cuda/*.cu
+    src/objective/cuda/*.cpp
+    src/objective/cuda/*.cu
+    src/boosting/cuda/*.cpp
+    src/boosting/cuda/*.cu
+    src/application/cuda/*.cpp
+    src/application/cuda/*.cu
+    src/metric/cuda/*.cpp
+    src/metric/cuda/*.cu
 endif(USE_CUDA)
 )

@@ -443,12 +457,14 @@ endif()

 if(USE_CUDA)
   set_target_properties(lightgbm PROPERTIES CUDA_RESOLVE_DEVICE_SYMBOLS ON)
+  set_target_properties(lightgbm PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
   set_target_properties(lightgbm PROPERTIES CUDA_ARCHITECTURES OFF)
   TARGET_LINK_LIBRARIES(
     lightgbm
     ${histograms}
   )
   set_target_properties(_lightgbm PROPERTIES CUDA_RESOLVE_DEVICE_SYMBOLS ON)
+  set_target_properties(_lightgbm PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
   set_target_properties(_lightgbm PROPERTIES CUDA_ARCHITECTURES OFF)
   TARGET_LINK_LIBRARIES(
     _lightgbm
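For context, CUDA_SEPARABLE_COMPILATION (enabled above) lets a __device__ function defined in one .cu translation unit be linked into a kernel compiled in another, which becomes relevant once device code is spread across the src/*/cuda/ directories added to the source glob. A minimal sketch, using hypothetical file and function names that are not part of this PR:

// math_utils.cu (hypothetical)
__device__ double SquareDevice(double x) { return x * x; }

// square_kernel.cu (hypothetical)
extern __device__ double SquareDevice(double x);

__global__ void SquareKernel(const double* in, double* out, int n) {
  const int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    // Cross-file device call; resolved at device link time when separable
    // compilation (equivalently -rdc=true) is on.
    out[i] = SquareDevice(in[i]);
  }
}

Without separable compilation, the cross-file device call would typically fail at the device link step.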
12 changes: 12 additions & 0 deletions include/LightGBM/bin.h
@@ -198,6 +198,8 @@ class BinMapper {
     }
   }

+  inline const std::vector<double>& bin_upper_bound() const { return bin_upper_bound_; }
+
 private:
   /*! \brief Number of bins */
   int num_bin_;
@@ -386,6 +388,10 @@ class Bin {
   * \brief Deep copy the bin
   */
  virtual Bin* Clone() = 0;
+
+  virtual const void* GetColWiseData(uint8_t* bit_type, bool* is_sparse, std::vector<BinIterator*>* bin_iterator, const int num_threads) const = 0;
+
+  virtual const void* GetColWiseData(uint8_t* bit_type, bool* is_sparse, BinIterator** bin_iterator) const = 0;
 };

@@ -459,6 +465,12 @@ class MultiValBin {
  static constexpr double multi_val_bin_sparse_threshold = 0.25f;

  virtual MultiValBin* Clone() = 0;
+
+  virtual const void* GetRowWiseData(uint8_t* bit_type,
+                                     size_t* total_size,
+                                     bool* is_sparse,
+                                     const void** out_data_ptr,
+                                     uint8_t* data_ptr_bit_type) const = 0;
 };

 inline uint32_t BinMapper::ValueToBin(double value) const {
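As a rough usage sketch for the accessors added to bin.h above (only the GetColWiseData/GetRowWiseData signatures come from this diff; the helper below, its variable names, and the comments about how callers use the outputs are assumptions), a CUDA data wrapper might query a Bin like this before copying a column to the device:

#include <cstdint>

#include <LightGBM/bin.h>

// Hypothetical helper, not part of this PR: fetch a Bin's column-wise storage
// so it can be staged on the GPU.
const void* StageColumnForDevice(const LightGBM::Bin* bin) {
  uint8_t bit_type = 0;                    // out: storage width of each bin value
  bool is_sparse = false;                  // out: whether the column is stored sparsely
  LightGBM::BinIterator* iter = nullptr;   // out: iterator, presumably used for sparse columns
  const void* host_data = bin->GetColWiseData(&bit_type, &is_sparse, &iter);
  // A caller would presumably copy host_data to the device, choosing the copy
  // width from bit_type, or walk `iter` when is_sparse is true.
  return host_data;
}

MultiValBin::GetRowWiseData presumably plays the analogous role for the row-wise (multi-value) representation consumed by the CUDA histogram construction.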
7 changes: 7 additions & 0 deletions include/LightGBM/boosting.h
@@ -7,6 +7,7 @@

 #include <LightGBM/config.h>
 #include <LightGBM/meta.h>
+#include <LightGBM/tree.h>

 #include <string>
 #include <map>
@@ -314,6 +315,12 @@ class LIGHTGBM_EXPORT Boosting {
  static Boosting* CreateBoosting(const std::string& type, const char* filename);

  virtual bool IsLinear() const { return false; }
+
+  virtual const std::vector<std::unique_ptr<Tree>>& models() const = 0;
+
+  virtual int num_tree_per_iteration() const = 0;
+
+  virtual std::function<void(data_size_t, const double*, double*)> GetCUDAConvertOutputFunc() const = 0;
 };

 class GBDTBase : public Boosting {
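Similarly, a minimal sketch of a caller for the three Boosting accessors added above (only the virtual signatures are from this diff; the function, its comments, and the interpretation of each accessor are assumptions):

#include <functional>
#include <memory>
#include <vector>

#include <LightGBM/boosting.h>

// Hypothetical illustration, not part of this PR: how CUDA prediction code
// might consume the new Boosting accessors.
void InspectBoosting(const LightGBM::Boosting* boosting) {
  // All trained trees; presumably num_iterations * num_tree_per_iteration entries.
  const std::vector<std::unique_ptr<LightGBM::Tree>>& trees = boosting->models();
  // Trees produced per boosting iteration (e.g. one per class in multiclass).
  const int trees_per_iter = boosting->num_tree_per_iteration();
  // Raw-score-to-output transform, presumably applied by a CUDA score updater
  // or predictor to device-resident scores.
  std::function<void(LightGBM::data_size_t, const double*, double*)> convert =
      boosting->GetCUDAConvertOutputFunc();
  (void)trees;
  (void)trees_per_iter;
  (void)convert;
}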