Ray-2.11.0
Release Highlights
- [data] Support reading Avro files with ray.data.read_avro
- [train] Added experimental support for AWS Trainium (Neuron) and Intel HPU.
Ray Libraries
Ray Data
🎉 New Features:
- Support reading Avro files with ray.data.read_avro (#43663)
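As a rough illustration of the new reader, the sketch below wraps ray.data.read_avro in a small helper that returns the first few records. The helper name preview_avro and its limit parameter are hypothetical, and the sketch assumes Ray 2.11 or later is installed together with whatever Avro dependency the reader needs.

```python
from typing import Any, Dict, List


def preview_avro(path: str, limit: int = 5) -> List[Dict[str, Any]]:
    """Return the first `limit` records of the Avro file(s) at `path`.

    `preview_avro` is a hypothetical helper, not part of Ray.
    """
    # Imported lazily so this module still loads where Ray is not installed.
    import ray

    # read_avro builds a Dataset from the given path(s).
    ds = ray.data.read_avro(path)
    # take() materializes up to `limit` rows as a list of dicts.
    return ds.take(limit)
```

Calling preview_avro("path/to/file.avro") on a real file would start a local Ray runtime if none is running.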
💫 Enhancements:
- Pin ipywidgets==7.7.2 to enable Data progress bars in VSCode Web (#44398)
- Change log level for ignored exceptions (#44408)
🔨 Fixes:
- Change Parquet encoding ratio lower bound from 2 to 1 (#44470)
- Fix throughput time calculations for metrics (#44138)
- Fix nested ragged numpy.ndarray (#44236)
- Fix Ray debugger incompatibility caused by trimmed error stack trace (#44496)
📖 Documentation:
- Update "Data Loading and Preprocessing" doc (#44165)
- Move imports into TFPRedictor in batch inference example (#44434)
Ray Train
🎉 New Features:
- Add experimental support for AWS Trainium (Neuron) (#39130)
- Add experimental support for Intel HPU (#43343)
💫 Enhancements:
- Log a deprecation warning for local_dir and related environment variables (#44029)
- Enforce xgboost>=1.7 for XGBoostTrainer usage (#44269)
🔨 Fixes:
- Fix ScalingConfig(accelerator_type) to request an appropriate resource amount (#44225)
- Fix maximum recursion issue when serializing exceptions (#43952)
- Remove base config deepcopy when initializing the trainer actor (#44611)
🏗 Architecture refactoring:
- Remove deprecated BatchPredictor (#43934)
Ray Tune
💫 Enhancements:
- Add support for new style lightning import (#44339)
- Log a deprecation warning for local_dir and related environment variables (#44029)
🏗 Architecture refactoring:
- Remove scikit-optimize search algorithm (#43969)
Ray Serve
🔨 Fixes:
- Dynamically-created applications will no longer be deleted when a config is PUT via the REST API (#44476).
- Fix _to_object_ref memory leak (#43763)
- Log warning to reconfigure max_ongoing_requests if max_batch_size is less than max_ongoing_requests (#43840)
- Deployment fails to start with ModuleNotFoundError in Ray 2.10 (#44329)
  - This was fixed by reverting the original core changes on the sys.path behavior. Revert "[core] If there's working_dir, don't set _py_driver_sys_path." (#44435)
- The batch_queue_cls parameter is removed from the @serve.batch decorator (#43935)
RLlib
🎉 New Features:
- New API stack: DQN Rainbow is now available for single-agent (#43196, #43198, #43199)
- PrioritizedEpisodeReplayBuffer is available for off-policy learning using the EnvRunner API (SingleAgentEnvRunner) and supports random n-step sampling (#42832, #43258, #43458, #43496, #44262)
💫 Enhancements:
- Restructured examples/ folder; started moving example scripts to the new API stack (#44559, #44067, #44603)
- Evaluation do-over: Deprecate enable_async_evaluation option (in favor of existing evaluation_parallel_to_training setting). (#43787)
- Add: module_for API to MultiAgentEpisode (analogous to policy_for API of the old Episode classes). (#44241)
- All rllib_contrib old stack algorithms have been removed from rllib/algorithms (#43656)
🔨 Fixes:
- New API stack: Multi-GPU + multi-agent has been fixed. This completes support for any combination of the following on the new API stack: [single-agent, multi-agent] vs [0 GPUs, 1 GPU, >1 GPUs] vs [any number of EnvRunners] (#44420, #44664, #44594, #44677, #44082, #44669, #44622)
- Various other bug fixes: #43906, #43871, #44000, #44340, #44491, #43959, #44043, #44446, #44040
Ray Core and Ray Clusters
🎉 New Features:
- Added the Ray check-open-ports CLI for detecting ports that are potentially open to the public (#44488)
💫 Enhancements:
- Support nodes sharing the same spilling directory without conflicts. (#44487)
- Create two subclasses of RayActorError to distinguish between actor died (ActorDiedError) and actor temporarily unavailable (ActorUnavailableError) cases.
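The split into two subclasses lets callers treat the two failure modes differently. The sketch below uses stand-in exception classes that only mirror the hierarchy described above, so it runs without Ray; in a real program the classes would presumably be imported from ray.exceptions instead. The call_with_retry helper and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


# Stand-ins mirroring the hierarchy in the note above (assumption: a real
# program would import these names from Ray rather than define them).
class RayActorError(Exception):
    pass


class ActorDiedError(RayActorError):
    pass


class ActorUnavailableError(RayActorError):
    pass


def call_with_retry(call: Callable[[], T], retries: int = 3, delay_s: float = 0.0) -> T:
    """Retry `call` while the actor is only temporarily unavailable."""
    for attempt in range(retries + 1):
        try:
            return call()
        except ActorUnavailableError:
            # Temporary condition: retry unless the attempts are used up.
            if attempt == retries:
                raise
            time.sleep(delay_s)
        except ActorDiedError:
            # Permanent condition: retrying the same actor cannot succeed.
            raise
    raise AssertionError("unreachable")
```

Whether a retry is actually safe depends on the actor method being idempotent, which this sketch does not check.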
🔨 Fixes:
- Fixed the ModuleNotFound issue introduced in 2.10 (#44435)
- Fixed an issue where the agent process used too much CPU (#44348)
- Fixed race condition in multi-threaded actor creation (#44232)
- Fixed several streaming generator bugs (#44079, #44257, #44197)
- Fixed an issue where user exception raised from tasks cannot be subclassed (#44379)
Dashboard
💫 Enhancements:
- Add serve controller metrics to serve system dashboard page (#43797)
- Add Serve Application rows to Serve top-level deployments details page (#43506)
- [Actor table page enhancements] Include "NodeId", "CPU", "Memory", "GPU", "GRAM" columns in the actor table page. Add sort functionality to resource utilization columns. Enable searching table by "Class" and "Repr". (#42588) (#42633) (#42788)
🔨 Fixes:
- Fix default sorting of nodes in Cluster table page to first be by "Alive" nodes, then head nodes, then alphabetical by node ID. (#42929)
- Fix bug where the Serve Deployment detail page fails to load if the deployment is in "Starting" state (#43279)
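The default node ordering described in the Cluster table fix (alive nodes first, then head nodes, then node ID alphabetically) can be expressed as a single composite sort key. The sketch below is an illustration only, not the dashboard's own code; the NodeRow type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeRow:
    # Hypothetical row shape; the field names are illustrative only.
    node_id: str
    alive: bool
    is_head: bool


def default_node_order(rows: List[NodeRow]) -> List[NodeRow]:
    """Sort alive nodes first, then head nodes, then by node ID."""
    # False sorts before True, so the flags are negated to put
    # alive nodes and head nodes at the front.
    return sorted(rows, key=lambda r: (not r.alive, not r.is_head, r.node_id))
```

Using one tuple key keeps the three criteria in strict priority order, so node ID only breaks ties among rows with the same alive and head status.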
Docs
💫 Enhancements:
- Refreshed the look and feel of the landing page. (#44251)
Thanks
Many thanks to all those who contributed to this release!
@aslonnie, @brycehuang30, @MortalHappiness, @astron8t-voyagerx, @edoakes, @sven1977, @anyscalesam, @scottjlee, @hongchaodeng, @slfan1989, @hebiao064, @fishbone, @zcin, @GeneDer, @shrekris-anyscale, @kira-lin, @chappidim, @raulchen, @c21, @WeichenXu123, @marian-code, @bveeramani, @can-anyscale, @mjd3, @justinvyu, @jackhumphries, @Bye-legumes, @ashione, @alanwguo, @Dreamsorcerer, @KamenShah, @jjyao, @omatthew98, @autolisis, @Superskyyy, @stephanie-wang, @simonsays1980, @davidxia, @angelinalg, @architkulkarni, @chris-ray-zhang, @kevin85421, @rynewang, @peytondmurray, @zhangyilun, @khluu, @matthewdeng, @ruisearch42, @pcmoritz, @mattip, @jerome-habana, @alexeykudinkin