Commit
* Optimizing the zero key problem in the push phase
* Optimize CUDA thread parallelism in MergeGrad phase
* Optimize CUDA thread parallelism in MergeGrad phase
* Performance optimization, segment gradient merging
* Performance optimization, segment gradient merging
* Optimize pullsparse and increase keys aggregation
* sync gpugraph to gpugraph_v2 (xuewujiao#86)
  * change load node and edge from local to cpu (xuewujiao#83)
    * change load node and edge
    * remove useless code

    Co-authored-by: root <[email protected]>
  * extract pull sparse as single stage(xuewujiao#85)

    Co-authored-by: yangjunchao <[email protected]>

  Co-authored-by: miaoli06 <[email protected]>
  Co-authored-by: root <[email protected]>
  Co-authored-by: chao9527 <[email protected]>
  Co-authored-by: yangjunchao <[email protected]>
* [GPUGraph] graph sample v2 (xuewujiao#87)
  * change load node and edge from local to cpu (xuewujiao#83)
    * change load node and edge
    * remove useless code

    Co-authored-by: root <[email protected]>
  * extract pull sparse as single stage(xuewujiao#85)

    Co-authored-by: yangjunchao <[email protected]>
  * support ssdsparsetable;test=develop (xuewujiao#81)
  * graph sample v2
  * remove log

  Co-authored-by: miaoli06 <[email protected]>
  Co-authored-by: root <[email protected]>
  Co-authored-by: chao9527 <[email protected]>
  Co-authored-by: yangjunchao <[email protected]>
  Co-authored-by: danleifeng <[email protected]>
* Release cpu graph
* uniq nodeid (xuewujiao#89)
* compatible whole HBM mode (xuewujiao#91)

  Co-authored-by: yangjunchao <[email protected]>
* Gpugraph v2 (xuewujiao#93)
  * compatible whole HBM mode
  * unify flag for graph emd storage mode and graph struct storage mode
  * format

  Co-authored-by: yangjunchao <[email protected]>
* split generate batch into multi stage (xuewujiao#92)
  * split generate batch into multi stage
  * fix conflict

  Co-authored-by: root <[email protected]>
* [GpuGraph] Uniq feature (xuewujiao#95)
  * uniq feature
  * uniq feature
  * uniq feature
* [GpuGraph] global startid (xuewujiao#98)
  * uniq feature
  * uniq feature
  * uniq feature
  * global startid
* load node edge separately and release graph (xuewujiao#99)
  * load node edge separately and release graph
  * load node edge separately and release graph

  Co-authored-by: root <[email protected]>
* v2 infer (xuewujiao#102)
* optimize begin pass and end pass (xuewujiao#106)

  Co-authored-by: yangjunchao <[email protected]>
* fix ins no (xuewujiao#104)
* [GPUGraph] fix FillOneStep args (xuewujiao#107)
  * fix ins no
  * fix FillOnestep args
* fix bug for whole hbm mode (xuewujiao#110)

  Co-authored-by: yangjunchao <[email protected]>
* [GPUGraph] fix infer && add infer_table_cap (xuewujiao#108)
  * fix ins no
  * fix FillOnestep args
  * fix infer && add infer table cap
  * fix infer
* 【PSCORE】perform ssd sparse table (xuewujiao#111)
  * perform ssd sparsetable;test=develop

    Conflicts: paddle/fluid/framework/fleet/ps_gpu_wrapper.cc
  * perform ssd sparsetable;test=develop
  * remove debug code;
  * remove debug code;
  * add jemalloc cmake;test=develop
  * fix wrapper;test=develop
* fix sample core (xuewujiao#114)
* [GpuGraph] optimize shuffle batch (xuewujiao#115)
  * fix sample core
  * optimize shuffle batch
* release gpu mem when sample end (xuewujiao#116)

  Co-authored-by: root <[email protected]>
* fix class not found err (xuewujiao#118)

  Co-authored-by: root <[email protected]>
* optimize sample (xuewujiao#117)
  * optimize sample
  * optimize sample

  Co-authored-by: yangjunchao <[email protected]>
* fix clear gpu mem (xuewujiao#119)

  Co-authored-by: root <[email protected]>
* fix sample core (xuewujiao#121)

  Co-authored-by: yangjunchao <[email protected]>
* add ssd cache (xuewujiao#123)
  * add ssd cache;test=develop
  * add ssd cache;test=develop
  * add ssd cache;test=develop
* add multi epoch train & fix train table change ins & save infer embedding (xuewujiao#129)
  * add multi epoch train & fix train table change ins & save infer embedding
  * change epoch finish judge
  * change epoch finish change

  Co-authored-by: root <[email protected]>
* Add debug log (xuewujiao#131)
  * Add debug log
  * Add debug log

  Co-authored-by: root <[email protected]>
* optimize mem in uniq slot feature (xuewujiao#130)
* [GpuGraph] cherry pick var slot feature && fix load multi path node (xuewujiao#136)
  * optimize mem in uniq slot feature
  * cherry-pick var slot_feature

  Co-authored-by: huwei02 <[email protected]>
* [GpuGraph] fix kernel overflow (xuewujiao#138)
  * optimize mem in uniq slot feature
  * cherry-pick var slot_feature
  * fix kernel overflow && add max feature num flag

  Co-authored-by: huwei02 <[email protected]>
* fix ssd cache;test=develop (xuewujiao#139)
* slot feature secondary storage (xuewujiao#140)
  * slot feature secondary storage
  * slot feature secondary storage

  Co-authored-by: yangjunchao <[email protected]>

Co-authored-by: root <[email protected]>
Co-authored-by: xuewujiao <[email protected]>
Co-authored-by: miaoli06 <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: chao9527 <[email protected]>
Co-authored-by: yangjunchao <[email protected]>
Co-authored-by: Thunderbrook <[email protected]>
Co-authored-by: danleifeng <[email protected]>
Co-authored-by: huwei02 <[email protected]>