support shard embedding #70

Open
wants to merge 468 commits into
base: paddlebox
from

Conversation

qingshui

PR types

PR changes

Describe

qingshui and others added 30 commits October 22, 2021 15:52
fix ins bug, add mean logloss gpu op
add gpu sample memory pool
use diff thres during pull sparse
humingqing and others added 30 commits December 18, 2023 19:15
add fill zero in fused_seqpool_cvm
add fused seq tensor && support transpose batch fc weight
* fused_seqpool_cvm_with_conv support filter by threshold

* add fill zero in fused_seqpool_cvm

* add fused seq tensor && support transpose batch fc weight

---------

Co-authored-by: mojingcj <[email protected]>
Co-authored-by: jiaoxuewu <[email protected]>
Co-authored-by: yuandong1998 <[email protected]>
Co-authored-by: shangzhongbin <[email protected]>
* fused_seqpool_cvm_with_conv support filter by threshold

* add fill zero in fused_seqpool_cvm

* add fused seq tensor && support transpose batch fc weight

---------

Co-authored-by: mojingcj <[email protected]>
Co-authored-by: jiaoxuewu <[email protected]>
Co-authored-by: yuandong1998 <[email protected]>
Co-authored-by: shangzhongbin <[email protected]>
fix fused query seq tensor compare case
* fused_seqpool_cvm_with_conv support filter by threshold

* add fill zero in fused_seqpool_cvm

* add fused seq tensor && support transpose batch fc weight

* fix fused query seq tensor compare case

---------

Co-authored-by: mojingcj <[email protected]>
Co-authored-by: jiaoxuewu <[email protected]>
Co-authored-by: yuandong1998 <[email protected]>
Co-authored-by: shangzhongbin <[email protected]>
…u thread num, fused_seqpool_cvm gpu memory alloc optimize
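Fix the core dump issue; fix the memory overrun when writing large data to disk by switching to segmented writes; optimize fused_seqpool_cvm concat performance; optimize fused_seqpool_cvm GPU memory allocation and contiguous access for better performance (60x speedup for the single op on H800, 25% overall improvement)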