[GpuGraph] fix sample core #114

Thunderbrook · 2022-09-15T13:38:12Z

PR types

Bug fixes

PR changes

Others

Describe

Fix stream-unsafe GPU memory allocation in the sampling stage.
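The diff for this PR is not shown on this page, so the following is only a general illustration of what stream-safe scratch allocation can look like, not the change that was merged. It is a minimal sketch assuming CUDA 11.2 or newer and a device that supports the stream-ordered allocator (`cudaMallocAsync` / `cudaFreeAsync`). The kernel `fill_doubled` and the helper `sample_with_stream_ordered_scratch` are hypothetical names made up for this example. The point is that allocation, kernel launch, copy, and free are all enqueued on the same stream, so the buffer cannot be released or reused while work on that stream still reads or writes it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                          \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      std::fprintf(stderr, "CUDA error: %s at %s:%d\n",           \
                   cudaGetErrorString(err_), __FILE__, __LINE__); \
      std::exit(1);                                               \
    }                                                             \
  } while (0)

// Hypothetical stand-in for a sampling kernel: writes out[i] = 2 * i.
__global__ void fill_doubled(int* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = 2 * i;
}

// Allocates scratch memory, uses it, and frees it, all ordered on `stream`.
// Assumes n > 0. Returns the number of elements that differ from 2 * i.
int sample_with_stream_ordered_scratch(cudaStream_t stream, int n) {
  int* d_scratch = nullptr;
  size_t bytes = static_cast<size_t>(n) * sizeof(int);

  // The allocation is enqueued on `stream`, so it is ordered before the kernel.
  CUDA_CHECK(cudaMallocAsync(reinterpret_cast<void**>(&d_scratch), bytes, stream));

  int threads = 256;
  int blocks = (n + threads - 1) / threads;
  fill_doubled<<<blocks, threads, 0, stream>>>(d_scratch, n);
  CUDA_CHECK(cudaGetLastError());

  std::vector<int> host(n);
  CUDA_CHECK(cudaMemcpyAsync(host.data(), d_scratch, bytes,
                             cudaMemcpyDeviceToHost, stream));

  // The free is also enqueued on `stream`, so it takes effect only after
  // the kernel and the copy above have finished.
  CUDA_CHECK(cudaFreeAsync(d_scratch, stream));
  CUDA_CHECK(cudaStreamSynchronize(stream));

  int mismatches = 0;
  for (int i = 0; i < n; ++i) {
    if (host[i] != 2 * i) ++mismatches;
  }
  return mismatches;
}

int main() {
  cudaStream_t stream;
  CUDA_CHECK(cudaStreamCreate(&stream));
  int mismatches = sample_with_stream_ordered_scratch(stream, 1000);
  CUDA_CHECK(cudaStreamDestroy(stream));
  std::printf("mismatches: %d\n", mismatches);
  return mismatches == 0 ? 0 : 1;
}
```

A common way for this kind of code to become stream-unsafe is to obtain or release the scratch buffer on a different stream (or through an allocator that is unaware of the stream) while kernels on the sampling stream are still in flight. Keeping every operation on one stream, as above, is one way to avoid that.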

* Optimizing the zero key problem in the push phase
* Optimize CUDA thread parallelism in MergeGrad phase
* Optimize CUDA thread parallelism in MergeGrad phase
* Performance optimization, segment gradient merging
* Performance optimization, segment gradient merging
* Optimize pullsparse and increase keys aggregation
* sync gpugraph to gpugraph_v2 (#86)
* change load node and edge from local to cpu (#83)
* change load node and edge
* remove useless code
Co-authored-by: root <[email protected]>
* extract pull sparse as single stage(#85)
Co-authored-by: yangjunchao <[email protected]>
Co-authored-by: miaoli06 <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: chao9527 <[email protected]>
Co-authored-by: yangjunchao <[email protected]>
* [GPUGraph] graph sample v2 (#87)
* change load node and edge from local to cpu (#83)
* change load node and edge
* remove useless code
Co-authored-by: root <[email protected]>
* extract pull sparse as single stage(#85)
Co-authored-by: yangjunchao <[email protected]>
* support ssdsparsetable;test=develop (#81)
* graph sample v2
* remove log
Co-authored-by: miaoli06 <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: chao9527 <[email protected]>
Co-authored-by: yangjunchao <[email protected]>
Co-authored-by: danleifeng <[email protected]>
* Release cpu graph
* uniq nodeid (#89)
* compatible whole HBM mode (#91)
Co-authored-by: yangjunchao <[email protected]>
* Gpugraph v2 (#93)
* compatible whole HBM mode
* unify flag for graph emd storage mode and graph struct storage mode
* format
Co-authored-by: yangjunchao <[email protected]>
* split generate batch into multi stage (#92)
* split generate batch into multi stage
* fix conflict
Co-authored-by: root <[email protected]>
* [GpuGraph] Uniq feature (#95)
* uniq feature
* uniq feature
* uniq feature
* [GpuGraph] global startid (#98)
* uniq feature
* uniq feature
* uniq feature
* global startid
* load node edge seperately and release graph (#99)
* load node edge seperately and release graph
* load node edge seperately and release graph
Co-authored-by: root <[email protected]>
* v2 infer (#102)
* optimize begin pass and end pass (#106)
Co-authored-by: yangjunchao <[email protected]>
* fix ins no (#104)
* [GPUGraph] fix FillOneStep args (#107)
* fix ins no
* fix FillOnestep args
* fix bug for whole hbm mode (#110)
Co-authored-by: yangjunchao <[email protected]>
* [GPUGraph] fix infer && add infer_table_cap (#108)
* fix ins no
* fix FillOnestep args
* fix infer && add infer table cap
* fix infer
* 【PSCORE】perform ssd sparse table (#111)
* perform ssd sparsetable;test=develop
Conflicts: paddle/fluid/framework/fleet/ps_gpu_wrapper.cc
* perform ssd sparsetable;test=develop
* remove debug code;
* remove debug code;
* add jemalloc cmake;test=develop
* fix wrapper;test=develop
* fix sample core (#114)
* [GpuGraph] optimize shuffle batch (#115)
* fix sample core
* optimize shuffle batch
* release gpu mem when sample end (#116)
Co-authored-by: root <[email protected]>
* fix class not found err (#118)
Co-authored-by: root <[email protected]>
* optimize sample (#117)
* optimize sample
* optimize sample
Co-authored-by: yangjunchao <[email protected]>
* fix clear gpu mem (#119)
Co-authored-by: root <[email protected]>
* fix sample core (#121)
Co-authored-by: yangjunchao <[email protected]>
* add ssd cache (#123)
* add ssd cache;test=develop
* add ssd cache;test=develop
* add ssd cache;test=develop
* add multi epoch train & fix train table change ins & save infer embeding (#129)
* add multi epoch train & fix train table change ins & save infer embedding
* change epoch finish judge
* change epoch finish change
Co-authored-by: root <[email protected]>
* Add debug log (#131)
* Add debug log
* Add debug log
Co-authored-by: root <[email protected]>
* optimize mem in uniq slot feature (#130)
* [GpuGraph] cherry pick var slot feature && fix load multi path node (#136)
* optimize mem in uniq slot feature
* cherry-pick var slot_feature
Co-authored-by: huwei02 <[email protected]>
* [GpuGraph] fix kernel overflow (#138)
* optimize mem in uniq slot feature
* cherry-pick var slot_feature
* fix kernel overflow && add max feature num flag
Co-authored-by: huwei02 <[email protected]>
* fix ssd cache;test=develop (#139)
* slot feature secondary storage (#140)
* slot feature secondary storage
* slot feature secondary storage
Co-authored-by: yangjunchao <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: xuewujiao <[email protected]>
Co-authored-by: miaoli06 <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: chao9527 <[email protected]>
Co-authored-by: yangjunchao <[email protected]>
Co-authored-by: Thunderbrook <[email protected]>
Co-authored-by: danleifeng <[email protected]>
Co-authored-by: huwei02 <[email protected]>

fix sample core

a22e617

xuewujiao approved these changes Sep 15, 2022

Thunderbrook merged commit 2d61dde into xuewujiao:gpugraph_v2 Sep 15, 2022
