r1.15.5-deeprec2212
liutongxuan released this on 24 Jan, 11:25
Major Features and Improvements
Embedding
- Refactor GPU Embedding Variable storage layer.
- Remove TENSORFLOW_USE_GPU_EV macro from embedding storage layer.
- Refactor KvResourceGather GPU Op.
- Add embedding memory pool for HBM storage of EmbeddingVariable.
- Refine the HBM storage code of EmbeddingVariable.
- Reuse the embedding files on SSD generated by EmbeddingVariable when saving and restoring checkpoints.
- Integrate single HBM EV into multi_tier EmbeddingVariable.
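The multi-tier storage idea behind the items above can be illustrated with a toy sketch. It is only a conceptual model in plain Python: the class name, the LRU eviction policy, and the zero initialization are assumptions made for illustration and are not DeepRec's API or implementation. A small "fast" tier stands in for HBM and an unbounded "slow" tier stands in for DRAM or SSD.

```python
from collections import OrderedDict


class TwoTierEmbeddingStore:
    """Toy two-tier key -> vector store (hypothetical, not DeepRec's API)."""

    def __init__(self, dim, fast_capacity):
        self.dim = dim
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # stands in for the HBM tier, kept in LRU order
        self.slow = {}             # stands in for the DRAM/SSD tier

    def lookup(self, key):
        """Return the vector for `key`, creating or promoting it if needed."""
        if key in self.fast:
            self.fast.move_to_end(key)  # mark as most recently used
            return self.fast[key]
        # Promote from the slow tier, or create a zero-initialized row.
        vec = self.slow.pop(key, None)
        if vec is None:
            vec = [0.0] * self.dim
        self.fast[key] = vec
        # Demote the least recently used row when the fast tier overflows.
        if len(self.fast) > self.fast_capacity:
            old_key, old_vec = self.fast.popitem(last=False)
            self.slow[old_key] = old_vec
        return vec


store = TwoTierEmbeddingStore(dim=4, fast_capacity=2)
for key in (1, 2, 3):
    store.lookup(key)
```

After the three lookups, key 1 has been demoted to the slow tier; looking it up again promotes it back and demotes key 2.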
Graph & Grappler Optimization
- Filter out the 'stream_id' attribute in arithmetic optimizer.
- Add SimplifyEmbeddingLookupStage optimizer.
- Add ForwardBackwardJointOptimizationPass to eliminate duplicate hash in Gather and Apply ops for Embedding Variable.
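The release note does not describe how ForwardBackwardJointOptimizationPass works internally. The sketch below only illustrates the general idea of avoiding a second hash lookup: the forward gather returns the slots it resolved, and the backward update reuses them instead of hashing the keys again. All names here are hypothetical and the plain SGD update is an assumption chosen for brevity.

```python
class CountingTable:
    """Toy embedding table that counts hash lookups (hypothetical)."""

    def __init__(self, dim):
        self.dim = dim
        self.slots = {}        # key -> row index, i.e. the hash lookup
        self.rows = []         # row storage
        self.hash_lookups = 0

    def find_slot(self, key):
        self.hash_lookups += 1
        if key not in self.slots:
            self.slots[key] = len(self.rows)
            self.rows.append([0.0] * self.dim)
        return self.slots[key]


def gather(table, keys):
    """Forward pass: one hash lookup per key; also returns the slots."""
    slots = [table.find_slot(k) for k in keys]
    values = [list(table.rows[s]) for s in slots]
    return values, slots


def apply_sgd_by_key(table, keys, grads, lr):
    """Backward pass without the optimization: hashes every key again."""
    for k, g in zip(keys, grads):
        row = table.rows[table.find_slot(k)]
        for i in range(table.dim):
            row[i] -= lr * g[i]


def apply_sgd_by_slot(table, slots, grads, lr):
    """Backward pass reusing forward slots: no additional hash lookups."""
    for s, g in zip(slots, grads):
        row = table.rows[s]
        for i in range(table.dim):
            row[i] -= lr * g[i]


table = CountingTable(dim=2)
keys = [10, 20, 10]
grads = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
values, slots = gather(table, keys)
apply_sgd_by_slot(table, slots, grads, lr=0.5)
```

In this toy run the forward pass performs three hash lookups and the slot-based update adds none, whereas `apply_sgd_by_key` would perform three more.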
Runtime Optimization
- Add allocators for each stream_executor in multi-context mode.
- Set multi-gpu devices in session_group mode.
- Add blacklist and whitelist to JitCugraph.
- Optimize CPU EVAllocator to speed up EmbeddingVariable performance.
- Support independent GPU host allocator for each session.
- Add GPU EVAllocator to speed up EmbeddingVariable on GPU.
Ops & Hardware Acceleration
- Add GPU implementation for Unique.
- Support indices type with DT_INT64 in sparse segment ops.
- Add gradient implementations for the following ops: SplitV, ConcatV2, BroadcastTo, Tile, GatherV2, Cumsum, Cast.
- Add C++ gradient op for Select.
- Add gradient implementation for SelectV2.
- Add C++ gradient op for Atan2.
- Add C++ gradients for UnsortedSegmentMin/Max/Sum.
- Refactor KvSparseApplyAdagrad GPU Op.
- Merge NV-TF r1.15.5+22.12.
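As an illustration of what one of the gradient ops listed above has to compute, here is a minimal sketch of the Atan2 gradient in plain Python. It uses the standard partial derivatives of z = atan2(y, x), namely dz/dy = x / (x^2 + y^2) and dz/dx = -y / (x^2 + y^2), and checks them against a central-difference estimate. The function names are hypothetical and this is not the C++ code added in this release.

```python
import math


def atan2_grad(y, x, upstream=1.0):
    """Return (dL/dy, dL/dx) for z = atan2(y, x), given upstream dL/dz."""
    denom = x * x + y * y  # the gradient is undefined at the origin
    return upstream * x / denom, -upstream * y / denom


def numeric_atan2_grad(y, x, eps=1e-6):
    """Central-difference estimate of the same two partial derivatives."""
    d_y = (math.atan2(y + eps, x) - math.atan2(y - eps, x)) / (2 * eps)
    d_x = (math.atan2(y, x + eps) - math.atan2(y, x - eps)) / (2 * eps)
    return d_y, d_x


analytic = atan2_grad(1.0, 2.0)
numeric = numeric_atan2_grad(1.0, 2.0)
```

For y = 1 and x = 2 the analytic result is (0.4, -0.2), and the numeric estimate agrees to within the finite-difference error.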
Distributed
- Update seastar to control SDT by macro HAVE_SDT.
- Update the default values of WORKER_DEFAULT_CORE_NUM (8) and PS_DEFAULT_CORE_NUM (2).
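A minimal sketch of how a launcher script might resolve these two settings. It assumes that both are read from environment variables and that the second one is spelled PS_DEFAULT_CORE_NUM; neither assumption is confirmed by this release note. The helper name `core_num` is hypothetical, and the fallback values are the defaults listed above.

```python
import os

# Defaults as stated in this release note.
DEFAULTS = {"WORKER_DEFAULT_CORE_NUM": 8, "PS_DEFAULT_CORE_NUM": 2}


def core_num(name, env=None):
    """Return the configured core count for `name`, or its default.

    Assumption: the setting is exposed as an environment variable.
    """
    env = os.environ if env is None else env
    raw = env.get(name)
    return DEFAULTS[name] if raw is None else int(raw)


worker_cores = core_num("WORKER_DEFAULT_CORE_NUM", env={})
ps_cores = core_num("PS_DEFAULT_CORE_NUM", env={"PS_DEFAULT_CORE_NUM": "4"})
```

Passing an explicit `env` mapping keeps the helper easy to test; omitting it falls back to the process environment.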
Serving
- Support multi-model deployment in SessionGroup.
- Support user-configured cpu-sets for each session_group.
- Support loading multiple models in processor.
- Support GPU compilation in processor.
- Optimize independent GPU host allocator for each session.
Environment & Build
- Update systemtap to a valid source address.
- Support ABI compatibility between DeepRec and TensorFlow 1.15 by configuring TF_API_COMPATIBLE_1150.
- Upgrade base Docker images to Ubuntu 20.04 and Python 3.8.10.
- Update pcre-8.44 urls.
- Remove systemtap from third party and related dependency.
- Enable gcc optimization option -O3 by default.
BugFix
- Fix function definition issue in processor.
- Fix the hang when inserting an item into the lockless hash map.
- Fix EmbeddingVariable hang/coredump in GPU mode.
- Fix memory leak in CUDA multi-stream when merging compute and copy streams.
- Fix incorrect session device order.
- Fix hwloc build error on alinux3.
- Fix the double-clear bug in resource_mgr when using SessionGroup.
- Fix incorrect Shrink behavior that caused unit tests to fail randomly.
- Fix the conflict when EmbeddingVariable and embedding fusion are enabled simultaneously.
- Fix EmbeddingVarGPU coredump in destructor.
More details of features: https://deeprec.readthedocs.io/zh/latest/
Release Images
CPU Image
alideeprec/deeprec-release:deeprec2212-cpu-py38-ubuntu20.04
GPU Image
alideeprec/deeprec-release:deeprec2212-gpu-py38-cu116-ubuntu20.04
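
A small sketch of how the images above could be pulled and started from a Python script. The image tags are copied from this release; the helper name `docker_commands` is hypothetical, and the `--gpus all` flag assumes a Docker installation with the NVIDIA Container Toolkit. The function only builds the command lines and does not run them.

```python
CPU_IMAGE = "alideeprec/deeprec-release:deeprec2212-cpu-py38-ubuntu20.04"
GPU_IMAGE = "alideeprec/deeprec-release:deeprec2212-gpu-py38-cu116-ubuntu20.04"


def docker_commands(use_gpu):
    """Build (pull, run) argument lists for the CPU or GPU release image."""
    image = GPU_IMAGE if use_gpu else CPU_IMAGE
    pull = ["docker", "pull", image]
    run = ["docker", "run", "--rm", "-it"]
    if use_gpu:
        run += ["--gpus", "all"]  # assumes NVIDIA Container Toolkit is set up
    run += [image, "bash"]
    return pull, run


cpu_pull, cpu_run = docker_commands(use_gpu=False)
gpu_pull, gpu_run = docker_commands(use_gpu=True)
```

Each list can be passed to `subprocess.run(cmd, check=True)` to execute it.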