Ray-2.36.0

@GeneDer released this 17 Sep 18:30

Ray Libraries

Ray Data

💫 Enhancements:

  • Remove limit on number of tasks launched per scheduling step (#47393)
  • Allow user-defined exceptions to be caught (#47339)

🔨 Fixes:

  • Display pending actors separately in the progress bar and don't count them towards running resources (#46384)
  • Fix bug where arrow_parquet_args aren't used (#47161)
  • Skip empty JSON files in read_json() (#47378); see the sketch after this list
  • Remove remote call for initializing Datasource in read_datasource() (#47467)
  • Remove dead from_*_operator modules (#47457)
  • Release test fixes:
    • Add AWS ACCESS_DENIED as retryable exception for multi-node Data+Train benchmarks (#47232)
    • Get AWS credentials with boto (#47352)
    • Use worker node instead of head node for read_images_comparison_microbenchmark_single_node release test (#47228)
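
A minimal sketch of the read_json() behavior referenced above, assuming a hypothetical S3 prefix that may contain zero-byte files:

```python
import ray

# Hypothetical path; some JSON files under this prefix may be empty.
# With #47378, read_json() skips the empty files instead of failing.
ds = ray.data.read_json("s3://my-bucket/events/")
print(ds.schema())
```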

📖 Documentation:

  • Add docstring to explain Dataset.deserialize_lineage (#47203)
  • Add a comment explaining the bundling behavior for map_batches with default batch_size (#47433)
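
The note above concerns how rows are bundled when map_batches() is left at its default batch_size. A small sketch; the double function and the "value" column are illustrative, not from the release:

```python
import ray

def double(batch: dict) -> dict:
    # Under the default batch format, a batch is a dict of NumPy arrays.
    batch["value"] = batch["value"] * 2
    return batch

ds = ray.data.from_items([{"value": i} for i in range(8)])
# Leaving batch_size at its default lets Ray Data decide how many rows are
# bundled into each call; pass batch_size explicitly to control it.
print(ds.map_batches(double).take(3))
```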

Ray Train

💫 Enhancements:

  • Decouple device-related modules and add Huawei NPU support to Ray Train (#44086)

🔨 Fixes:

  • Update TORCH_NCCL_ASYNC_ERROR_HANDLING env var (#47292)

📖 Documentation:

  • Add missing Train public API reference (#47134)

Ray Tune

📖 Documentation:

  • Add missing Tune public API references (#47138)

Ray Serve

💫 Enhancements:

  • Mark proxy as unready when its routers are aware of zero replicas (#47002)
  • Set up a default Serve logger (#47229)
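
A minimal sketch of logging through the logger that Serve configures by default; the Echo deployment is hypothetical:

```python
import logging

from ray import serve

# Serve sets up the "ray.serve" logger on each replica by default (#47229).
logger = logging.getLogger("ray.serve")

@serve.deployment
class Echo:
    def __call__(self, request) -> str:
        logger.info("handling request")  # written to the per-replica Serve logs
        return "ok"

app = Echo.bind()
# serve.run(app)  # deploy the application
```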

🔨 Fixes:

  • Allow get_serve_logs_dir to run outside of Ray's context (#47224)
  • Use the Serve logger name for logs in Serve (#47205)

📖 Documentation:

  • [HPU] [Serve] [experimental] Add vLLM HPU support in the vLLM example (#45893)

🏗 Architecture refactoring:

  • Remove support for nested DeploymentResponses (#47209)

RLlib

🎉 New Features:

  • New API stack: Add CQL algorithm. (#47000, #47402)
  • New API stack: Enable GPU and multi-GPU support for DQN/SAC/CQL. (#47179)

💫 Enhancements:

  • New API stack: Offline RL enhancements: #47195, #47359
  • Enhance new API stack stability: #46324, #47196, #47245, #47279
  • Fix large batch size for synchronous algos (e.g. PPO) after EnvRunner failures. (#47356)
  • Add torch.compile config options to old API stack. (#47340)
  • Add kwargs to torch.nn.parallel.DistributedDataParallel (#47276)
  • Enhanced CI stability: #47197, #47249

📖 Documentation:

  • New API stack example scripts:
    • Float16 training example script (#47362)
    • Mixed precision training example script (#47116); a generic PyTorch sketch follows this list
    • ModelV2 -> RLModule wrapper for migrating to new API stack. (#47425)
  • Remove "new API stack experimental" hint from docs. (#47301)
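
The float16 and mixed precision bullets above refer to new RLlib example scripts; the snippet below is only a generic PyTorch sketch of the mixed-precision mechanics (autocast plus GradScaler), not the RLlib scripts themselves, and it assumes a CUDA device is available:

```python
import torch

model = torch.nn.Linear(8, 1).cuda()
optim = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()  # keeps float16 gradients numerically stable

x = torch.randn(32, 8, device="cuda")
y = torch.randn(32, 1, device="cuda")

with torch.autocast("cuda", dtype=torch.float16):
    # Forward pass runs in float16 where safe, float32 elsewhere.
    loss = torch.nn.functional.mse_loss(model(x), y)

scaler.scale(loss).backward()
scaler.step(optim)
scaler.update()
```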

🏗 Architecture refactoring:

  • Remove 2nd Learner ConnectorV2 pass from PPO (#47401)
  • Add separate learning rates for policy and alpha to SAC. (#47078)

Ray Core

🔨 Fixes:

  • Fix ray_unintentional_worker_failures_total to only count unintentional worker failures (#47368)
  • Fix runtime env race condition when uploading the same package concurrently (#47482)
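
The race-condition fix applies when several drivers upload the same runtime-env package at once; a minimal sketch, with a hypothetical project directory:

```python
import ray

# Multiple jobs started concurrently with the same working_dir upload the
# same package; #47482 removes the race between those uploads.
ray.init(runtime_env={"working_dir": "./my_project"})

@ray.remote
def ping() -> str:
    return "pong"

print(ray.get(ping.remote()))
```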

Docs

💫 Enhancements:

  • Add sphinx-autobuild and documentation for make local (#47275): speeds up local docs builds
  • Add Algolia search to docs (#46477)
  • Update PyTorch Mnist Training doc for KubeRay 1.2.0 (#47321)
  • Document the life-cycle policy of Ray APIs

Thanks

Many thanks to all those who contributed to this release!
@GeneDer, @Bye-legumes, @nikitavemuri, @kevin85421, @MortalHappiness, @LeoLiao123, @saihaj, @rmcsqrd, @bveeramani, @zcin, @matthewdeng, @raulchen, @mattip, @jjyao, @ruisearch42, @scottjlee, @can-anyscale, @khluu, @aslonnie, @rynewang, @edoakes, @zhanluxianshen, @venkatram-dev, @c21, @allenyin55, @alexeykudinkin, @snehakottapalli, @BitPhinix, @hongchaodeng, @dengwxn, @liuxsh9, @simonsays1980, @peytondmurray, @KepingYan, @bryant1410, @woshiyyya, @sven1977