r1.15.5-deeprec2210

@liutongxuan liutongxuan released this 17 Nov 12:39
· 442 commits to main since this release
2e70144

Major Features and Improvements

Embedding

  • Support HBM-DRAM-SSD storage in EmbeddingVariable multi-tier storage.
  • Support frequency-based initialization of multi-tier EmbeddingVariable when restoring a model.
  • Support looking up the storage location of ids in EmbeddingVariable.
  • Support kv_initialized_op for GPU Embedding Variable.
  • Support restore compatibility of EmbeddingVariable using init_from_proto.
  • Improve performance of apply/gather ops for EmbeddingVariable.
  • Add Eviction Manager in EmbeddingVariable multi-tier storage.
  • Add a unified thread pool for the cache of multi-tier storage in EmbeddingVariable.
  • Save frequencies and versions of features in SSDHash and LevelDB storage of EmbeddingVariable.
  • Avoid invalid eviction when using HBM-DRAM storage of EmbeddingVariable.
  • Prevent access to uninitialized data when using EmbeddingVariable.
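
To illustrate the idea behind multi-tier storage with eviction and id location lookup, here is a minimal, self-contained sketch. It is not DeepRec's implementation: the class TwoTierStore, its methods, and the least-recently-used eviction policy are all hypothetical and chosen only for illustration. The "fast" tier stands in for a small, quick tier such as HBM or DRAM, and the "slow" tier for a larger one such as DRAM or SSD.

```python
from collections import OrderedDict


class TwoTierStore:
    """Toy two-tier key-value store (hypothetical, for illustration only)."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # bounded tier, ordered from oldest to newest access
        self.slow = {}             # unbounded backing tier
        self.freq = {}             # per-id access counts

    def put(self, key, value):
        self.slow.pop(key, None)   # a key lives in exactly one tier
        self.fast[key] = value
        self.fast.move_to_end(key)
        self.freq[key] = self.freq.get(key, 0) + 1
        self._evict()

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)
        elif key in self.slow:
            self.fast[key] = self.slow.pop(key)  # promote on access
        else:
            raise KeyError(key)
        self.freq[key] += 1
        value = self.fast[key]
        self._evict()
        return value

    def location(self, key):
        """Report which tier currently holds the key, or None if it is absent."""
        if key in self.fast:
            return "fast"
        if key in self.slow:
            return "slow"
        return None

    def _evict(self):
        # Demote least-recently-used entries until the fast tier fits its capacity.
        while len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value


store = TwoTierStore(fast_capacity=2)
for feature_id in (1, 2, 3):
    store.put(feature_id, [float(feature_id)] * 4)
print(store.location(1), store.location(3))
```

Keeping the access counts alongside the values is what makes it possible to persist frequencies and to decide, on restore, which ids belong in the faster tier.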

Graph & Grappler Optimization

  • Optimize Async EmbeddingLookup by placement optimization.
  • Place VarHandleOp in the compute main graph for SmartStage.
  • Support independent thread pool for stage subgraph to avoid thread contention.
  • Implement device placement optimization.

Runtime Optimization

  • Support CUDA Graph execution by adding CUDA Graph mode session.
  • Support CUDA Graph execution in JIT mode.
  • Support intra-task cost estimation in the Executor's CostModel.
  • Support tf.stream and tf.colocate python API for CUDA multi-stream.
  • Support an embedding subgraph partition policy when using CUDA multi-stream.
  • Optimize CUDA multi-stream by merging copy stream into compute stream.

Ops & Hardware Acceleration

  • Add a list of Quantized* and _MklQuantized* ops.
  • Implement GPU version of SparseFillEmptyRows.
  • Implement a C version of spin_lock to support multiple architectures.
  • Upgrade the OneDNN version to v2.7.
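
For readers unfamiliar with SparseFillEmptyRows, the following pure-Python sketch shows the semantics the op is documented to have in TensorFlow (tf.sparse.fill_empty_rows): every row of a 2-D sparse tensor that has no entries receives one entry at column 0 holding a default value. This is a reference illustration only, not the GPU kernel added in this release; the function name and list-based representation are assumptions, and the op's reverse index map output is omitted for brevity.

```python
def sparse_fill_empty_rows(indices, values, dense_shape, default_value):
    """Reference semantics for a 2-D sparse tensor given as (row, col) indices.

    Returns (out_indices, out_values, empty_row_indicator).
    """
    num_rows = dense_shape[0]
    occupied = {row for row, _col in indices}
    empty_row_indicator = [row not in occupied for row in range(num_rows)]

    # Start from the existing entries, then add (row, 0) for every empty row.
    entries = [(tuple(idx), val) for idx, val in zip(indices, values)]
    entries += [
        ((row, 0), default_value)
        for row, is_empty in enumerate(empty_row_indicator)
        if is_empty
    ]

    # The output is kept in row-major order.
    entries.sort(key=lambda entry: entry[0])
    out_indices = [idx for idx, _val in entries]
    out_values = [val for _idx, val in entries]
    return out_indices, out_values, empty_row_indicator


out_indices, out_values, empty_rows = sparse_fill_empty_rows(
    indices=[(0, 1), (0, 3), (2, 0), (3, 1)],
    values=["a", "b", "c", "d"],
    dense_shape=(5, 6),
    default_value="z",
)
print(out_indices, out_values, empty_rows)
```

In this example rows 1 and 4 are empty, so they are the rows that receive the default value.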

Distributed

  • Support distributed training using SOK based on EmbeddingVariable.
  • Add NETWORK_MAX_CONNECTION_TIMEOUT to make the connection timeout configurable in StarServer.
  • Upgrade the SOK version to v4.2.

IO

  • Add TF_NEED_PARQUET_DATASET to enable ParquetDataset.

Serving

  • Optimize embedding lookup performance by disabling the feature filter when serving.
  • Improve the error code returned to the user when parsing a request or response fails.
  • Support an independent thread pool for model updates to avoid performance jitter.

ModelZoo

  • Add MaskNet Model.
  • Add PLE Model.
  • Support variable type BF16 in DCN model.

BugFix

  • Fix tf.nn.embedding_lookup interface bug and session hang bug when enabling async embedding.
  • Fix warmup failure when the user sets a warmup file path.
  • Fix build failure in ev_allocator.cc and hash.cc on ARM.
  • Fix build failure in arrow when building on ARM.
  • Fix redefined error in NEON header file for ARM.
  • Fix _mm_malloc build failure in sparsehash on ARM.
  • Fix warmup failure when using session_group.
  • Fix the bug in building the save graph when creating partitioned EmbeddingVariable in the feature_column API.
  • Fix the colocation error when using EmbeddingVariable in distributed training.
  • Fix HostNameToIp failures by replacing gethostbyname with getaddrinfo in StarServer.
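
The last fix swaps one resolver call for another. As a rough analogue in Python (StarServer itself is not written in Python, and the helper name hostname_to_ips is hypothetical), the sketch below resolves a host through socket.getaddrinfo. A commonly cited reason for preferring getaddrinfo is that it handles both IPv4 and IPv6 and is reentrant, whereas gethostbyname is IPv4-only; the release notes do not state which of these motivated the change.

```python
import socket


def hostname_to_ips(host, family=socket.AF_UNSPEC):
    """Resolve a host name to a de-duplicated list of IP address strings.

    Raises socket.gaierror if the name cannot be resolved.
    """
    infos = socket.getaddrinfo(host, None, family=family, type=socket.SOCK_STREAM)
    ips = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # first element of the address tuple is the IP string
        if ip not in ips:
            ips.append(ip)
    return ips


# A numeric address resolves without any DNS query.
print(hostname_to_ips("127.0.0.1", socket.AF_INET))
```

Passing socket.AF_INET restricts the result to IPv4 addresses; leaving the default returns whatever address families the resolver reports.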

More details of features: https://deeprec.readthedocs.io/zh/latest/

Release Images

CPU Image

alideeprec/deeprec-release:deeprec2210-cpu-py36-ubuntu18.04

GPU Image

alideeprec/deeprec-release:deeprec2210-gpu-py36-cu116-ubuntu18.04

Thanks to our Contributors

Duyi-Wang, Locke, shijieliu, Honglin Zhu, chenxujun, GosTraight2020, LALBJ, Nanno