23 Oct 21:57

aslonnie

385ee46

Ray-2.38.0 Latest

Ray Libraries

Ray Data

🎉 New Features:

Add Dataset.rename_columns (#47906)
Basic structured logging (#47210)

💫 Enhancements:

Add partitioning parameter to read_parquet (#47553)
Add SERVICE_UNAVAILABLE to list of retried transient errors (#47673)
Re-phrase the streaming executor current usage string (#47515)
Remove ray.kill in ActorPoolMapOperator (#47752)
Simplify and consolidate progress bar outputs (#47692)
Refactor OpRuntimeMetrics to support properties (#47800)
Refactor plan_write_op and Datasinks (#47942)
Link PhysicalOperator to its LogicalOperator (#47986)
Allow specifying both num_cpus and num_gpus for map APIs (#47995)
Allow specifying insertion index when registering custom plan optimization Rules (#48039)
Adding in better framework for substituting logging handlers (#48056)
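The SERVICE_UNAVAILABLE entry above concerns errors that are retried rather than surfaced to the user. The general pattern can be sketched in plain Python, assuming transient errors are recognized by matching fragments of the exception message. `RETRIED_ERRORS`, `call_with_retry`, `flaky_read`, and the backoff values are hypothetical names chosen for illustration; this is not Ray Data's implementation.

```python
import time

# Hypothetical list of message fragments treated as transient.
RETRIED_ERRORS = ["AWS Error SLOW_DOWN", "AWS Error SERVICE_UNAVAILABLE"]


def call_with_retry(fn, max_attempts=3, base_delay_s=0.0):
    """Call fn(), retrying only when the error message matches RETRIED_ERRORS."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            transient = any(fragment in str(exc) for fragment in RETRIED_ERRORS)
            if not transient or attempt == max_attempts:
                raise
            # Exponential backoff between attempts.
            time.sleep(base_delay_s * 2 ** (attempt - 1))


calls = {"n": 0}


def flaky_read():
    # Fails twice with a transient error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("AWS Error SERVICE_UNAVAILABLE during GetObject")
    return "ok"


result = call_with_retry(flaky_read)
```

Errors whose message matches no fragment are re-raised on the first attempt, so genuine failures are not hidden behind retries.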

🔨 Fixes:

Fix bug where Ray Data incorrectly emits progress bar warning (#47680)
Yield remaining results from async map_batches (#47696)
Fix event loop mismatch with async map (#47907)
Make sure num_gpus provided to Ray Data is appropriately passed to ray.remote call (#47768)
Fix unequal partitions when grouping by multiple keys (#47924)
Fix reading multiple parquet files with ragged ndarrays (#47961)
Removing unneeded test case (#48031)
Adding in better json checking in test logging (#48036)
Fix bug with inserting custom optimization rule at index 0 (#48051)
Fix logging output from write_xxx APIs (#48096)

📖 Documentation:

Add docs section for Ray Data progress bars (#47804)
Add reference to parquet predicate pushdown (#47881)
Add tip about how to understand map_batches format (#47394)

Ray Train

🏗 Architecture refactoring:

Remove deprecated mosaic and sklearn trainer code (#47901)

Ray Tune

🔨 Fixes:

Fix WandbLoggerCallback to reuse actors upon restore (#47985)

Ray Serve

🔨 Fixes:

Stop scheduling task early when requests have been canceled (#47847)

RLlib

🎉 New Features:

Enable cloud checkpointing. (#47682)

💫 Enhancements:

PPO on new API stack now shuffles batches properly before each epoch. (#47458)
Other enhancements: #47705, #47501, #47731, #47451, #47830, #47970, #47157
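The PPO entry above is about reshuffling the train batch before every epoch, so that minibatches differ from one pass to the next. A minimal sketch of that idea in plain Python follows; `minibatches_per_epoch` and its parameters are hypothetical, and this is not RLlib's code.

```python
import random


def minibatches_per_epoch(batch, minibatch_size, num_epochs, seed=0):
    """Yield (epoch, minibatch) pairs, reshuffling the batch before each epoch."""
    rng = random.Random(seed)
    indices = list(range(len(batch)))
    for epoch in range(num_epochs):
        rng.shuffle(indices)  # new sample order for every epoch
        for start in range(0, len(indices), minibatch_size):
            chunk = indices[start:start + minibatch_size]
            yield epoch, [batch[i] for i in chunk]


batch = list(range(10))
seen = {}
for epoch, minibatch in minibatches_per_epoch(batch, minibatch_size=4, num_epochs=2):
    seen.setdefault(epoch, []).extend(minibatch)
```

Each epoch still visits every sample exactly once; only the order, and therefore the minibatch composition, changes.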

🔨 Fixes:

Fix spot node preemption problem (RLlib now runs stably with EnvRunner workers on spot nodes) (#47940)
Fix action masking example. (#47817)
Various other fixes: #47973, #46721, #47914, #47880, #47304, #47686

🏗 Architecture refactoring:

Switch on new API stack by default for SAC and DQN. (#47217)
Remove Tf support on new API stack for PPO/IMPALA/APPO (only DreamerV3 on new API stack remains with tf now). (#47892)
Discontinue support for "hybrid" API stack (using RLModule + Learner, but still on RolloutWorker and Policy) (#46085)
RLModule (new API stack) refinements: #47884, #47885, #47889, #47908, #47915, #47965, #47775

📖 Documentation:

Add new API stack migration guide. (#47779)
New API stack example script: BC pre training, then PPO finetuning using same RLModule class. (#47838)
New API stack: Autoregressive actions example. (#47829)
Remove old API stack connector docs entirely. (#47778)

Ray Core and Ray Clusters

Ray Core

🎉 New Features:

CompiledGraphs: support multi readers in multi node when DAG is created from an actor (#47601)

💫 Enhancements:

Add a flag to raise exception for out of band serialization of ObjectRef (#47544)
Store each GCS table in its own Redis Hash (#46861)
Decouple create worker vs pop worker request. (#47694)
Add metrics for GCS jobs (#47793)

🔨 Fixes:

Fix broken dashboard cluster page when there are dead nodes (#47701)
Fix the ray_tasks{State="PENDING_ARGS_FETCH"} metric counting (#47770)
Separate the attempt_number from the task_status in memory summary and object list (#47818)
Fix object reconstruction hang on arguments pending creation (#47645)
Fix check failure: sync_reactors_.find(reactor->GetRemoteNodeID()) == sync_reactors_.end() (#47861)
Fix check failure RAY_CHECK(it != current_tasks_.end()); (#47659)

📖 Documentation:

KubeRay docs: Add docs for YuniKorn Gang scheduling (#47850)

Dashboard

💫 Enhancements:

Performance improvements for large scale clusters (#47617)

🔨 Fixes:

Placement group and required resources not showing correctly in dashboard (#47754)

Thanks

Many thanks to all those who contributed to this release!
@GeneDer, @rkooo567, @dayshah, @saihaj, @nikitavemuri, @bill-oconnor-anyscale, @WeichenXu123, @can-anyscale, @jjyao, @edoakes, @kekulai-fredchang, @bveeramani, @alexeykudinkin, @raulchen, @khluu, @sven1977, @ruisearch42, @dentiny, @MengjinYan, @Mark2000, @simonsays1980, @rynewang, @PatricYan, @zcin, @sofianhnaide, @matthewdeng, @dlwh, @scottjlee, @MortalHappiness, @kevin85421, @win5923, @aslonnie, @prithvi081099, @richardsliu, @milesvant, @omatthew98, @Superskyyy, @pcmoritz

24 Sep 23:37

khluu

ray-2.37.0

1b620f2

Ray-2.37.0

Ray Libraries

Ray Data

💫 Enhancements:

Simplify custom metadata provider API (#47575)
Change counts of metrics to rates of metrics (#47236)
Throw exception for non-streaming HF datasets with "override_num_blocks" argument (#47559)
Refactor custom optimizer rules (#47605)

🔨 Fixes:

Remove ineffective retry code in plan_read_op (#47456)
Fix incorrect pending task size if outputs are empty (#47604)

Ray Train

💫 Enhancements:

Update run status and add stack trace to TrainRunInfo (#46875)

Ray Serve

💫 Enhancements:

Allow control of some serve configuration via env vars (#47533)
[serve] Faster detection of dead replicas (#47237)

🔨 Fixes:

[Serve] fix component id logging field (#47609)

RLlib

💫 Enhancements:

New API stack:
- Add restart-failed-env option to EnvRunners. (#47608)
- Offline RL: Store episodes in state form. (#47294)
- Offline RL: Replace GAE in MARWILOfflinePreLearner with GeneralAdvantageEstimation connector in learner pipeline. (#47532)
- Off-policy algos: Add episode sampling to EpisodeReplayBuffer. (#47500)
- RLModule APIs: Add SelfSupervisedLossAPI for RLModules that bring their own loss and InferenceOnlyAPI. (#47581, #47572)

Ray Core

💫 Enhancements:

[aDAG] Allow custom NCCL group for aDAG (#47141)
[aDAG] support buffered input (#47272)
[aDAG] Support multi node multi reader (#47480)
[Core] Make is_gpu, is_actor, root_detached_id fields late bind to workers. (#47212)
[Core] Reconstruct actor to run lineage reconstruction triggered actor task (#47396)
[Core] Optimize GetAllJobInfo API for performance (#47530)

🔨 Fixes:

[aDAG] Fix ranks ordering for custom NCCL group (#47594)

Ray Clusters

📖 Documentation:

[KubeRay] add a guide for deploying vLLM with RayService (#47038)

Thanks

Many thanks to all those who contributed to this release!
@ruisearch42, @andrewsykim, @timkpaine, @rkooo567, @WeichenXu123, @GeneDer, @sword865, @simonsays1980, @angelinalg, @sven1977, @jjyao, @woshiyyya, @aslonnie, @zcin, @omatthew98, @rueian, @khluu, @justinvyu, @bveeramani, @nikitavemuri, @chris-ray-zhang, @liuxsh9, @xingyu-long, @peytondmurray, @rynewang

23 Sep 18:47

khluu

ray-2.36.1

999f766

Ray-2.36.1

Ray Core

🔨 Fixes:

Fix broken dashboard cluster page when there are dead nodes (#47701)
Fix broken dashboard worker page (#47714)

17 Sep 18:30

GeneDer

ray-2.36.0

85d98e1

Ray-2.36.0

Ray Libraries

Ray Data

💫 Enhancements:

Remove limit on number of tasks launched per scheduling step (#47393)
Allow user-defined Exception to be caught. (#47339)

🔨 Fixes:

Display pending actors separately in the progress bar and not count them towards running resources (#46384)
Fix bug where arrow_parquet_args aren't used (#47161)
Skip empty JSON files in read_json() (#47378)
Remove remote call for initializing Datasource in read_datasource() (#47467)
Remove dead from_*_operator modules (#47457)
Release test fixes:
- Add AWS ACCESS_DENIED as retryable exception for multi-node Data+Train benchmarks (#47232)
- Get AWS credentials with boto (#47352)
- Use worker node instead of head node for read_images_comparison_microbenchmark_single_node release test (#47228)

📖 Documentation:

Add docstring to explain Dataset.deserialize_lineage (#47203)
Add a comment explaining the bundling behavior for map_batches with default batch_size (#47433)

Ray Train

💫 Enhancements:

Decouple device-related modules and add Huawei NPU support to Ray Train (#44086)

🔨 Fixes:

Update TORCH_NCCL_ASYNC_ERROR_HANDLING env var (#47292)

📖 Documentation:

Add missing Train public API reference (#47134)

Ray Tune

📖 Documentation:

Add missing Tune public API references (#47138)

Ray Serve

💫 Enhancements:

Mark proxy as unready when its routers are aware of zero replicas (#47002)
Setup default serve logger (#47229)

🔨 Fixes:

Allow get_serve_logs_dir to run outside of Ray's context (#47224)
Use serve logger name for logs in serve (#47205)

📖 Documentation:

[HPU] [Serve] [experimental] Add vllm HPU support in vllm example (#45893)

🏗 Architecture refactoring:

Remove support for nested DeploymentResponses (#47209)

RLlib

🎉 New Features:

New API stack: Add CQL algorithm. (#47000, #47402)
New API stack: Enable GPU and multi-GPU support for DQN/SAC/CQL. (#47179)

💫 Enhancements:

New API stack: Offline RL enhancements: #47195, #47359
Enhance new API stack stability: #46324, #47196, #47245, #47279
Fix large batch size for synchronous algos (e.g. PPO) after EnvRunner failures. (#47356)
Add torch.compile config options to old API stack. (#47340)
Add kwargs to torch.nn.parallel.DistributedDataParallel (#47276)
Enhanced CI stability: #47197, #47249

📖 Documentation:

New API stack example scripts:
- Float16 training example script. (#47362)
- Mixed precision training example script (#47116)
- ModelV2 -> RLModule wrapper for migrating to new API stack. (#47425)
Remove "new API stack experimental" hint from docs. (#47301)

🏗 Architecture refactoring:

Remove 2nd Learner ConnectorV2 pass from PPO (#47401)
Add separate learning rates for policy and alpha to SAC. (#47078)

🔨 Fixes:

Various bug fixes: #47401, #47194, #47259, #47271, #47277, #47382

Ray Core

💫 Enhancements:

[ADAG] Raise proper error message for nccl within the same actor (#47250)
[ADAG] Support multi-read of the same shm channel (#47311)
Log why core worker is not idle during HandleExit (#47300)
Add PREPARED state for placement groups in GCS for better fault tolerance. (#46858)

🔨 Fixes:

Fix ray_unintentional_worker_failures_total to only count unintentional worker failures (#47368)
Fix runtime env race condition when uploading the same package concurrently (#47482)

Dashboard

🔨 Fixes:

Performance optimizations for dashboard backend logic (#47392) (#47367) (#47160) (#47213)
Refactor to simplify dashboard backend logic (#47324)

Docs

💫 Enhancements:

Add sphinx-autobuild and documentation for make local (#47275), which speeds up local docs builds.
Add Algolia search to docs (#46477)
Update PyTorch Mnist Training doc for KubeRay 1.2.0 (#47321)
Life-cycle of documentation policy of Ray APIs

Thanks

Many thanks to all those who contributed to this release!
@GeneDer, @Bye-legumes, @nikitavemuri, @kevin85421, @MortalHappiness, @LeoLiao123, @saihaj, @rmcsqrd, @bveeramani, @zcin, @matthewdeng, @raulchen, @mattip, @jjyao, @ruisearch42, @scottjlee, @can-anyscale, @khluu, @aslonnie, @rynewang, @edoakes, @zhanluxianshen, @venkatram-dev, @c21, @allenyin55, @alexeykudinkin, @snehakottapalli, @BitPhinix, @hongchaodeng, @dengwxn, @liuxsh9, @simonsays1980, @peytondmurray, @KepingYan, @bryant1410, @woshiyyya, @sven1977

28 Aug 00:11

khluu

ray-2.35.0

c5d536d

Ray-2.35.0

Notice: Starting from this release, pip install ray[all] will not include ray[cpp], and will not install the respective ray-cpp package. To install everything that includes ray-cpp, one can use pip install ray[cpp-all] instead.

Ray Libraries

Ray Data

🎉 New Features:

Upgrade supported Arrow version from 16 to 17 (#47034)
Add support for reading from Iceberg (#46889)

💫 Enhancements:

Various Progress Bar UX improvements (#46816, #46801, #46826, #46692, #46699, #46974, #46928, #47029, #46924, #47120, #47095, #47106)
Try get size_bytes from metadata and consolidate metadata methods (#46862)
Improve warning message when read task is large (#46942)
Extend API to enable passing sample weights via ray.dataset.to_tf (#45701)
Add a parameter to allow overriding LanceDB scanner options (#46975)
Add failure retry logic for read_lance (#46976)
Clarify warning for reading old Parquet data (#47049)
Move datasource implementations to _internal subpackage (#46825)
Handle logs from tensor extensions (#46943)

🔨 Fixes:

Change type of DataContext.retried_io_errors from tuple to list (#46884)
Make Parquet tests more robust and expose Parquet logic (#46944)
Change pickling log level from warning to debug (#47032)
Add validation for shuffle arg (#47055)
Fix validation bug when size=0 in ActorPoolStrategy (#47072)
Fix exception in async map (#47110)
Fix wrong metrics group for Object Store Memory metrics on Ray Data Dashboard (#47170)
Handle errors in SplitCoordinator when generating a new epoch (#47176)

📖 Documentation:

Auto-gen GroupedData api (#46925)
Fix signature of Rule.plan (#47094)

Ray Train

💫 Enhancements:

[train] Updates to support xgboost==2.1.0 (#46667)
[train] Add hardware stats (#46719)

Ray Tune

🔨 Fixes:

[RLlib; Tune] Fix WandB metric overlap after restore from checkpoint. (#46897)

Ray Serve

💫 Enhancements:

Improved handling of replica death and replica unavailability in deployment handle routers before controller restarts replica (#47008)
Eagerly create routers in proxy for better GCS fault tolerance (#47031)
Immediately send ping in router when receiving new replica set (#47053)

🏗 Architecture refactoring:

Deprecate passing arguments that contain DeploymentResponses in nested objects to downstream deployment handle calls (#46806)

RLlib

🎉 New Features:

Offline RL on the new API stack:
- Record offline data (#46818, #47046, #47133, #47155) and support to directly read from episodes. (#46865)
- RLUnplugged example. (#46792)
- Progress on BC/MARWIL migration: #44970, #47154, #46799
- Progress on CQL migration: #46969, #47105

💫 Enhancements:

Add ObservationPreprocessor (ConnectorV2). (#47077)

🔨 Fixes:

New API stack: Fix IMPALA/APPO + LSTM for single- and multi-GPU. (#47132, #47158)
Various bug fixes: #46898, #47047, #46963, #47021, #46897
Add more control to Algorithm.add_module/policy methods. (#46932, #46836)

📖 Documentation:

Example scripts for new API stack:
- Curiosity (inverse dynamics model-based) RLModule example. (#46841)
- Add example script for Env with protobuf observation space. (#47071)
New API stack documentation:
- Cleanup old API stack docs (rllib-dev.rst). (#47172)
- Episodes (SingleAgentEpisode). (#46985)
- Redo rllib-algorithms.rst page. (#46916)

🏗 Architecture refactoring:

Rename MultiAgent...RLModule... into MultiRL...Module for more generality. (#46840)
Add learner_only flag to RLModuleConfig/Spec and simplify creation of RLModule specs from algo-config. (#46900)

Ray Core

💫 Enhancements:

Emit total lineage bytes metrics (#46725)
Adding accelerator type H100 (#46823)
More structured logging in core worker (#46906)
Change all callbacks to move to save copies. (#46971)
Add ray[adag] option to pip install (#47009)

🔨 Fixes:

Fix dashboard process reporting on windows (#45578)
Fix Ray-on-Spark cluster crashing bug when user cancels cell execution (#46899)
Fix PinExistingReturnObject segfault by passing owner_address (#46973)
Fix raylet CHECK failure from runtime env creation failure. (#46991)
Fix typo in memray command (#47006)
[ADAG] Fix for asyncio outputs (#46845)

📖 Documentation:

Clarify behavior of placement_group_capture_child_tasks in docs (#46885)
Update ray.available_resources() docstring (#47018)

🏗 Architecture refactoring:

Async APIs for the New GcsClient. (#46788)
Replace GCS stubs in the dashboard to use NewGcsAioClient. (#46846)

Dashboard

💫 Enhancements:

Polish and minor improvements to the Serve page (#46811)

🔨 Fixes:

Fix CPU/GPU/RAM not being reported correctly on Windows (#44578)

Docs

💫 Enhancements:

Add more information about developer tooling for docs contributions (#46636), including esbonio section

🔨 Fixes:

Use PyData Sphinx theme version switcher (#46936)

Thanks

Many thanks to all those who contributed to this release!
@simonsays1980, @bveeramani, @tungh2, @zcin, @xingyu-long, @WeichenXu123, @aslonnie, @MaxVanDijck, @can-anyscale, @galenhwang, @omatthew98, @matthewdeng, @raulchen, @sven1977, @shrekris-anyscale, @deepyaman, @alexeykudinkin, @stephanie-wang, @kevin85421, @ruisearch42, @hongchaodeng, @khluu, @alanwguo, @hongpeng-guo, @saihaj, @Superskyyy, @tespent, @slfan1989, @justinvyu, @rynewang, @nikitavemuri, @amogkam, @mattip, @dev-goyal, @ryanaoleary, @peytondmurray, @edoakes, @venkatajagannath, @jjyao, @cristianjd, @scottjlee, @Bye-legumes

31 Jul 18:02

can-anyscale

ray-2.34.0

fc87217

Release 2.34.0 Notes

Ray Libraries

Ray Data

💫 Enhancements:

Add better support for UDF returns from list of datetime objects (#46762)

🔨 Fixes:

Remove read task warning if size bytes not set in metadata (#46765)

📖 Documentation:

Fix read_tfrecords() docstring to display tfx-bsl tip (#46717)
Update Dataset.zip() docs (#46757)

Ray Train

🔨 Fixes:

Sort workers by node ID rather than by node IP (#46163)

🏗 Architecture refactoring:

Remove dead RayDatasetSpec (#46764)

RLlib

🎉 New Features:

Offline RL support on new API stack:
- Initial design for Ray-Data based offline RL Algos (on new API stack). (#44969)
- Add user-defined schemas for data loading. (#46738)
- Make data pipeline better configurable and tuneable for users. (#46777)

💫 Enhancements:

Move DQN into the TargetNetworkAPI (and deprecate RLModuleWithTargetNetworksInterface). (#46752)

🔨 Fixes:

Numpy version fix: Rename all np.product usage to np.prod (#46317)

📖 Documentation:

Examples for new API stack: Add 2 (count-based) curiosity examples. (#46737)
Remove RLlib CLI from docs (soon to be deprecated and replaced by python API). (#46724)

🏗 Architecture refactoring:

Cleanup, rename, clarify: Algorithm.workers/evaluation_workers, local_worker(), etc. (#46726)

Ray Core

🏗 Architecture refactoring:

New python GcsClient binding (#46186)

Many thanks to all those who contributed to this release! @KyleKoon, @ruisearch42, @rynewang, @sven1977, @saihaj, @aslonnie, @bveeramani, @akshay-anyscale, @kevin85421, @omatthew98, @anyscalesam, @MaxVanDijck, @justinvyu, @simonsays1980, @can-anyscale, @peytondmurray, @scottjlee

25 Jul 20:28

jjyao

ray-2.33.0

914af09

Ray-2.33.0

Ray Libraries

Ray Core

💫 Enhancements:

Add "last exception" to error message when GCS connection fails in ray.init() (#46516)

🔨 Fixes:

Add object back to memory store when object recovery is skipped (#46460)
Task status should start with PENDING_ARGS_AVAIL when retry (#46494)
Fix ObjectFetchTimedOutError (#46562)
Make working_dir support files created before 1980 (#46634)
Allow full path in conda runtime env. (#45550)
Fix worker launch time formatting in state api (#43516)

Ray Data

🎉 New Features:

Deprecate Dataset.get_internal_block_refs() (#46455)
Add read API for reading Databricks table with Delta Sharing (#46072)
Add support for objects to Arrow blocks (#45272)

💫 Enhancements:

Change offsets to int64 and change to LargeList for ArrowTensorArray (#45352)
Prevent from_pandas from combining input blocks (#46363)
Update Dataset.count() to avoid unnecessarily keeping BlockRefs in-memory (#46369)
Use Set to fix inefficient iteration over Arrow table columns (#46541)
Add AWS Error UNKNOWN to list of retried write errors (#46646)
Always print traceback for internal exceptions (#46647)
Allow unknown estimate of operator output bundles and ProgressBar totals (#46601)
Improve filesystem retry coverage (#46685)

🔨 Fixes:

Replace lambda mutable default arguments (#46493)

📖 Documentation:

Auto-generate Dataset API documentation (#46557)
Update outdated ExecutionPlan docstring (#46638)

Ray Train

💫 Enhancements:

Update run status and actor status for train runs. (#46395)

🔨 Fixes:

Replace lambda default arguments (#46576)

📖 Documentation:

Add MNIST training using KubeRay doc page (#46123)
Add example of pre-training Llama model on Intel Gaudi (#45459)
Fix tensorflow example by using ScalingConfig (#46565)

Ray Tune

🔨 Fixes:

Replace lambda default arguments (#46596)

Ray Serve

🎉 New Features:

Fully deprecate target_num_ongoing_requests_per_replica and max_concurrent_queries, respectively replaced by target_ongoing_requests and max_ongoing_requests (#46392 and #46427)
Configure the task launched by the controller to build an application with Serve’s logging config (#46347)
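For configurations that still use the deprecated Serve names, the rename can be applied mechanically: max_concurrent_queries became max_ongoing_requests, and target_num_ongoing_requests_per_replica became target_ongoing_requests. The helper below is a sketch over plain dictionaries. `RENAMED_OPTIONS` and `migrate_options` are hypothetical names, not part of Ray Serve, and the sample options dict is illustrative.

```python
# Old Serve option name -> replacement name.
RENAMED_OPTIONS = {
    "max_concurrent_queries": "max_ongoing_requests",
    "target_num_ongoing_requests_per_replica": "target_ongoing_requests",
}


def migrate_options(options):
    """Return a copy of options with deprecated keys renamed, recursing into dicts."""
    migrated = {}
    for key, value in options.items():
        new_key = RENAMED_OPTIONS.get(key, key)
        # Refuse to guess when both the old and the new name are present.
        if new_key != key and new_key in options:
            raise ValueError(f"both {key!r} and {new_key!r} are set")
        migrated[new_key] = migrate_options(value) if isinstance(value, dict) else value
    return migrated


old_options = {
    "max_concurrent_queries": 10,
    "autoscaling_config": {
        "target_num_ongoing_requests_per_replica": 2,
        "min_replicas": 1,
    },
}
new_options = migrate_options(old_options)
```

The input dictionary is left unmodified, so the old and new forms can be compared before a config file is rewritten.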

RLlib

💫 Enhancements:

Moving sampling coordination for batch_mode=complete_episodes to synchronous_parallel_sample. (#46321)
Enable complex action spaces with stateful modules. (#46468)

🏗 Architecture refactoring:

Enable multi-learner setup for hybrid stack BC. (#46436)
Introduce Checkpointable API for RLlib components and subcomponents. (#46376)

🔨 Fixes:

Replace Mapping typehint with Dict: #46474

📖 Documentation:

More example scripts for new API stack: Two separate optimizers (w/ different learning rates). (#46540) and custom loss function. (#46445)

Dashboard

🔨 Fixes:

Task end time showing the incorrect time (#46439)
Events Table rows having really bad spacing (#46701)
UI bugs in the serve dashboard page (#46599)

Thanks

Many thanks to all those who contributed to this release!

@alanwguo, @hongchaodeng, @anyscalesam, @brucebismarck, @bt2513, @woshiyyya, @terraflops1048576, @lorenzoritter, @omrishiv, @davidxia, @cchen777, @nono-Sang, @jackhumphries, @aslonnie, @JoshKarpel, @zjregee, @bveeramani, @khluu, @Superskyyy, @liuxsh9, @jjyao, @ruisearch42, @sven1977, @harborn, @saihaj, @zcin, @can-anyscale, @veekaybee, @chungen04, @WeichenXu123, @GeneDer, @sergey-serebryakov, @Bye-legumes, @scottjlee, @rynewang, @kevin85421, @cristianjd, @peytondmurray, @MortalHappiness, @MaxVanDijck, @simonsays1980, @mjovanovic9999

10 Jul 16:40

aslonnie

ray-2.32.0

607f2f3

Ray-2.32.0

Highlight: aDAG Developer Preview

This is a new Ray Core-specific feature called Ray accelerated DAGs (aDAGs).

aDAGs give you a Ray Core-like API, extended so that execution paths can be pre-compiled across pre-allocated resources on a Ray cluster, which can improve throughput and latency. Some practical examples include:
- Up to 10x lower task execution time on single-node.
- Native support for GPU-GPU communication, via NCCL.
This is still very early, but please reach out on #ray-core on Ray Slack to learn more!

Ray Libraries

Ray Data

💫 Enhancements:

Support async callable classes in map_batches() (#46129)

🔨 Fixes:

Ensure InputDataBuffer doesn't free block references (#46191)
MapOperator.num_active_tasks should exclude pending actors (#46364)
Fix progress bars being displayed as partially completed in Jupyter notebooks (#46289)

📖 Documentation:

Fix docs: read_api.py docstring (#45690)
Correct API annotation for tfrecords_datasource (#46171)
Fix broken links in README and in ray.data.Dataset (#45345)

Ray Train

📖 Documentation:

Update PyTorch Data Ingestion User Guide (#45421)

Ray Serve

💫 Enhancements:

Optimize ServeController.get_app_config() (#45878)
Change default for max and target ongoing requests (#45943)
Integrate with Ray structured logging (#46215)
Allow configuring handle cache size and controller max concurrency (#46278)
Optimize DeploymentDetails.deployment_route_prefix_not_set() (#46305)

RLlib

🎉 New Features:

APPO on new API stack (w/ EnvRunners). (#46216)

💫 Enhancements:

Stability: APPO, SAC, and DQN activate multi-agent learning tests (#45542, #46299)
Make Tune trial ID available in EnvRunners (and callbacks). (#46294)
Add env- and agent_steps to custom evaluation function. (#45652)
Remove default-metrics from Algorithm (tune does NOT error anymore if any stop-metric is missing). (#46200)

🔨 Fixes:

Various bug fixes: #45542

📖 Documentation:

Example for new API stack: Offline RL (BC) training on single-agent, while evaluating w/ multi-agent setup. (#46251)
Example for new API stack: Custom RLModule with an LSTM. (#46276)

Ray Core

🎉 New Features:

aDAG Developer Preview.

💫 Enhancements:

Allow env setup logger encoding (#46242)
ray list tasks filter state and name on GCS side (#46270)
Log ray version and ray commit during GCS start (#46341)

🔨 Fixes:

Decrement lineage ref count of an actor when the actor task return object reference is deleted (#46230)
Fix negative ALIVE actors metric and introduce IDLE state (#45718)
psutil process attr num_fds is not available on Windows (#46329)

Dashboard

🎉 New Features:

Added customizable refresh frequency for metrics on Ray Dashboard (#44037)

💫 Enhancements:

Upgraded to MUIv5 and React 18 (#45789)

🔨 Fixes:

Fix for multi-line log items breaking log viewer rendering (#46391)
Fix for UI inconsistency when a job submission creates more than one Ray job. (#46267)
Fix filtering by job id for tasks API not filtering correctly. (#45017)

Docs

🔨 Fixes:

Re-enabled automatic cross-reference link checking for Ray documentation, with Sphinx nitpicky mode (#46279)
Enforced naming conventions for public and private APIs to maintain accuracy, starting with Ray Data API documentation (#46261)

📖 Documentation:

Upgrade Python 3.12 support to alpha: Ray wheels for Python 3.12 are released to PyPI, and the most critical tests have passed a sanity check.

Thanks

Many thanks to all those who contributed to this release!

@stephanie-wang, @MortalHappiness, @aslonnie, @ryanaoleary, @jjyao, @jackhumphries, @nikitavemuri, @woshiyyya, @JoshKarpel, @ruisearch42, @sven1977, @alanwguo, @GeneDer, @saihaj, @raulchen, @liuxsh9, @khluu, @cristianjd, @scottjlee, @bveeramani, @zcin, @simonsays1980, @SumanthRH, @davidxia, @can-anyscale, @peytondmurray, @kevin85421

26 Jun 22:06

khluu

ray-2.31.0

1240d3f

Ray-2.31.0

Ray Libraries

Ray Data

🔨 Fixes:

Fixed bug where preserve_order doesn’t work with file reads (#46135)

📖 Documentation:

Added documentation for dataset.Schema (#46170)

Ray Train

💫 Enhancements:

Add API for Ray Train run stats (#45711)

Ray Tune

💫 Enhancements:

Missing stopping criterion should not error (just warn). (#45613)

📖 Documentation:

Fix broken references in Ray Tune documentation (#45233)

Ray Serve

WARNING: the following default values will change in Ray 2.32:

Default for max_ongoing_requests will change from 100 to 5.
Default for target_ongoing_requests will change from 1 to 2.
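Deployments that depend on the current defaults can set both values explicitly, so that upgrading does not silently change their behavior. The fragment below is a sketch that assumes max_ongoing_requests is a deployment-level option and target_ongoing_requests lives inside autoscaling_config. `PINNED_OPTIONS` is an illustrative name.

```python
# Values matching the defaults in effect before the change announced above.
PINNED_OPTIONS = {
    "max_ongoing_requests": 100,
    "autoscaling_config": {"target_ongoing_requests": 1},
}

# With Ray Serve installed, the options could be applied as, for example:
#
#   from ray import serve
#
#   @serve.deployment(**PINNED_OPTIONS)
#   class MyDeployment:
#       ...
```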

💫 Enhancements:

Optimize DeploymentStateManager.get_deployment_statuses (#45872)

🔨 Fixes:

Fix logging error on passing traceback object into exc_info (#46105)
Run __del__ even if constructor is still in progress (#45882)
Spread replicas with custom resources in torch tune serve release test (#46093)
[1k release test] don't run replicas on head node (#46130)

📖 Documentation:

Remove todo since issue is fixed (#45941)

RLlib

🎉 New Features:

IMPALA runs on the new API stack (with EnvRunners and ConnectorV2s). (#42085)
SAC/DQN: Prioritized multi-agent episode replay buffer. (#45576)

💫 Enhancements:

New API stack stability: Add systematic CI learning tests for all possible combinations of: [PPO|IMPALA] + [1CPU|2CPU|1GPU|2GPU] + [single-agent|multi-agent]. (#46162, #46161)

📖 Documentation:

New API stack: Example script for action masking (#46146)
New API stack: PyFlyt example script cleanup (#45956)
Old API stack: Enhanced ONNX example (+LSTM). (#43592)

Ray Core and Ray Clusters

Ray Core

💫 Enhancements:

[runtime-env] automatically infer worker path when starting worker in container (#42304)

🔨 Fixes:

On GCS restart, destroy unused workers instead of forgetting them, which fixes PG leaks. (#45854)
Cancel lease requests before returning a PG bundle (#45919)
Fix boost fiber stack overflow (#46133)

Thanks

Many thanks to all those who contributed to this release!

@jjyao, @kevin85421, @vincent-pli, @khluu, @simonsays1980, @sven1977, @rynewang, @can-anyscale, @richardsliu, @jackhumphries, @alexeykudinkin, @bveeramani, @ruisearch42, @shrekris-anyscale, @stephanie-wang, @matthewdeng, @zcin, @hongchaodeng, @ryanaoleary, @liuxsh9, @GeneDer, @aslonnie, @peytondmurray, @Bye-legumes, @woshiyyya, @scottjlee, @JoshKarpel

20 Jun 23:08

can-anyscale

ray-2.30.0

97c3729

Ray-2.30.0

Ray Libraries

Ray Data

💫 Enhancements:

Improve fractional CPU/GPU formatting (#45673)
Use sampled fragments to estimate Parquet reader batch size (#45749)
Refactoring ParquetDatasource and metadata fetching logic (#45728, #45727, #45733, #45734, #45767)
Refactor planner.py (#45706)

Ray Tune

💫 Enhancements:

Change the behavior of a missing stopping criterion metric to warn instead of raising an error. This enables the use case of reporting different sets of metrics on different iterations (ex: a separate set of training and validation metrics). (#45613)
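The behavior described above can be sketched in plain Python: a stopping check that skips criteria whose metric is absent from the current result and warns instead of raising. `should_stop` and the metric names are hypothetical, and this is not Ray Tune's implementation. It assumes a trial stops once a reported metric reaches its threshold.

```python
import warnings


def should_stop(result, stop_criteria):
    """Return True once any stopping metric present in result reaches its threshold.

    Metrics missing from this result are skipped with a warning, not an error.
    """
    for metric, threshold in stop_criteria.items():
        if metric not in result:
            warnings.warn(f"Stopping criterion {metric!r} not in result; skipping it.")
            continue
        if result[metric] >= threshold:
            return True
    return False


stop = {"val_accuracy": 0.9, "training_iteration": 100}

# A training-only iteration: no validation metrics are reported.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    train_only = should_stop({"training_iteration": 3, "loss": 0.7}, stop)

# A validation iteration that reports the metric and crosses the threshold.
with_val = should_stop({"training_iteration": 4, "val_accuracy": 0.93}, stop)
```

The first call warns about the missing `val_accuracy` and keeps the trial running; the second stops it.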

Ray Serve

💫 Enhancements:

Create internal request id to track request objects (#45761)

RLlib

💫 Enhancements:

Stability: DreamerV3 weekly release test (#45654); Add "official" benchmark script for Atari PPO benchmarks. (#45697)
Enhance env-rendering callback (#45682)

🔨 Fixes:

Bug fix in new MetricsLogger API: EMA stats w/o window would lead to infinite list mem-leak. (#45752)
Various other bug fixes: (#45819, #45820, #45683, #45651, #45753)

📖 Documentation:

Re-do examples overview page (new API stack): #45382
- PyFlyt QuadX WayPoints example #44758, #45956
- RLModule inference on new API stack (#45831, #45845)
- How to resume a tune.Tuner.fit() experiment from checkpoint. (#45681)
- Custom RLModule (tiny CNN): #45774
- Connector examples docstrings (#45864)
Old API stack examples: #43592, #45829

Ray Core

🎉 New Features:

Alpha release of job-level logging configuration: users can now configure user logging to use the logfmt format, with logging context attached. (#45344)
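For readers unfamiliar with logfmt, it renders each log record as space-separated key=value pairs. The standard-library sketch below shows the shape of such output with context fields attached. `LogfmtFormatter` and the `job_id` and `worker_id` field names are hypothetical, and the exact fields Ray emits are not stated in these notes.

```python
import logging


class LogfmtFormatter(logging.Formatter):
    """Render a record as key=value pairs (logfmt), plus optional context fields."""

    CONTEXT_FIELDS = ("job_id", "worker_id")  # illustrative names only

    def format(self, record):
        pairs = [
            ("levelname", record.levelname),
            ("name", record.name),
            ("message", record.getMessage()),
        ]
        for field in self.CONTEXT_FIELDS:
            if hasattr(record, field):
                pairs.append((field, getattr(record, field)))
        return " ".join(f"{key}={self._quote(str(value))}" for key, value in pairs)

    @staticmethod
    def _quote(value):
        # Quote empty values and values containing spaces, quotes, or '='.
        if value == "" or any(ch in value for ch in ' "='):
            return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
        return value


record = logging.LogRecord(
    name="my_app", level=logging.INFO, pathname="example.py", lineno=1,
    msg="loaded %d rows", args=(42,), exc_info=None,
)
record.job_id = "01000000"  # context attached to the record
line = LogfmtFormatter().format(record)
# line == 'levelname=INFO name=my_app message="loaded 42 rows" job_id=01000000'
```

Because every field is a key=value token, such lines can be filtered by job or worker with ordinary text tools.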

💫 Enhancements:

Integrate amdsmi in AMDAcceleratorManager (#44572)

🔨 Fixes:

Fix the C++ GcsClient Del not respecting del_by_prefix (#45604)
Fix exit handling of FiberState threads (#45834)

Dashboard

💫 Enhancements:

Parse out json logs (#45853)

Many thanks to all those who contributed to this release: @liuxsh9, @peytondmurray, @pcmoritz, @GeneDer, @saihaj, @khluu, @aslonnie, @yucai, @vickytsang, @can-anyscale, @bthananjeyan, @raulchen, @hongchaodeng, @x13n, @simonsays1980, @peterghaddad, @kevin85421, @rynewang, @angelinalg, @jjyao, @BenWilson2, @jackhumphries, @zcin, @chris-ray-zhang, @c21, @shrekris-anyscale, @alanwguo, @stephanie-wang, @Bye-legumes, @sven1977, @WeichenXu123, @bveeramani, @nikitavemuri

Releases: ray-project/ray
