Ray-2.11.0

@aslonnie aslonnie released this 17 Apr 23:31
· 1680 commits to master since this release

Release Highlights

  • [data] Support reading Avro files with ray.data.read_avro
  • [train] Added experimental support for AWS Trainium (Neuron) and Intel HPU.

Ray Libraries

Ray Data

🎉 New Features:

  • Support reading Avro files with ray.data.read_avro (#43663)
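A minimal sketch of how the new reader might be called. The helper name `load_avro` and the path in the usage comment are placeholders, not part of the release. Ray is imported inside the function so the snippet can be defined without Ray installed, and reading Avro files probably also needs an Avro parsing package (such as fastavro) in the environment.

```python
def load_avro(paths):
    """Return a Ray Dataset built from one or more Avro files.

    `paths` may be a single path or a list of paths.
    """
    import ray  # imported lazily; ray.data.read_avro requires Ray >= 2.11

    return ray.data.read_avro(paths)


# Usage (placeholder path, not run here):
#   ds = load_avro("example.avro")
#   print(ds.schema())
```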

💫 Enhancements:

  • Pin ipywidgets==7.7.2 to enable Data progress bars in VSCode Web (#44398)
  • Change log level for ignored exceptions (#44408)

🔨 Fixes:

  • Change Parquet encoding ratio lower bound from 2 to 1 (#44470)
  • Fix throughput time calculations for metrics (#44138)
  • Fix nested ragged numpy.ndarray (#44236)
  • Fix Ray debugger incompatibility caused by trimmed error stack trace (#44496)

📖 Documentation:

  • Update "Data Loading and Preprocessing" doc (#44165)
  • Move imports into TFPredictor in batch inference example (#44434)

Ray Train

🎉 New Features:

  • Add experimental support for AWS Trainium (Neuron) (#39130)
  • Add experimental support for Intel HPU (#43343)

💫 Enhancements:

  • Log a deprecation warning for local_dir and related environment variables (#44029)
  • Enforce xgboost>=1.7 for XGBoostTrainer usage (#44269)

🔨 Fixes:

  • Fix ScalingConfig(accelerator_type) to request an appropriate resource amount (#44225)
  • Fix maximum recursion issue when serializing exceptions (#43952)
  • Remove base config deepcopy when initializing the trainer actor (#44611)

🏗 Architecture refactoring:

  • Remove deprecated BatchPredictor (#43934)

Ray Tune

💫 Enhancements:

  • Add support for new style lightning import (#44339)
  • Log a deprecation warning for local_dir and related environment variables (#44029)

🏗 Architecture refactoring:

  • Remove scikit-optimize search algorithm (#43969)

Ray Serve

🔨 Fixes:

  • Dynamically-created applications will no longer be deleted when a config is PUT via the REST API (#44476)
  • Fix _to_object_ref memory leak (#43763)
  • Log warning to reconfigure max_ongoing_requests if max_batch_size is less than max_ongoing_requests (#43840)
  • Fix deployments failing to start with ModuleNotFoundError in Ray 2.10 (#44329)
    • Fixed by reverting the original core change to the sys.path behavior: Revert "[core] If there's working_dir, don't set _py_driver_sys_path." (#44435)
  • The batch_queue_cls parameter is removed from the @serve.batch decorator (#43935)

RLlib

🎉 New Features:

  • New API stack: DQN Rainbow is now available for single-agent (#43196, #43198, #43199)
  • PrioritizedEpisodeReplayBuffer is available for off-policy learning using the EnvRunner API (SingleAgentEnvRunner) and supports random n-step sampling (#42832, #43258, #43458, #43496, #44262)

💫 Enhancements:

  • Restructured examples/ folder; started moving example scripts to the new API stack (#44559, #44067, #44603)
  • Evaluation do-over: Deprecate enable_async_evaluation option (in favor of existing evaluation_parallel_to_training setting). (#43787)
  • Add: module_for API to MultiAgentEpisode (analogous to policy_for API of the old Episode classes). (#44241)
  • All rllib_contrib old stack algorithms have been removed from rllib/algorithms (#43656)

Ray Core and Ray Clusters

🎉 New Features:

  • Added the ray check-open-ports CLI command for checking ports that may be open to the public (#44488)

💫 Enhancements:

  • Support nodes sharing the same spilling directory without conflicts. (#44487)
  • Create two subclasses of RayActorError to distinguish an actor that has died (ActorDiedError) from an actor that is temporarily unavailable (ActorUnavailableError).
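The split between the two subclasses lets callers retry only when the actor may come back. The sketch below is one possible pattern, not an official Ray recipe: `call_with_retry` is a hypothetical helper, the retry count and backoff are arbitrary, and the fallback classes exist only so the snippet runs where Ray 2.11 is not installed. It assumes the wrapped call is safe to repeat.

```python
import time

try:
    from ray.exceptions import ActorDiedError, ActorUnavailableError, RayActorError
except ImportError:
    # Stand-ins mirroring the hierarchy described above, used only when
    # Ray >= 2.11 is not importable. They are not Ray's real classes.
    class RayActorError(Exception):
        pass

    class ActorDiedError(RayActorError):
        pass

    class ActorUnavailableError(RayActorError):
        pass


def call_with_retry(call, max_retries=3, backoff_s=0.5):
    """Run `call`, retrying only while the actor is temporarily unavailable.

    ActorDiedError is not caught, so it propagates to the caller at once.
    """
    for attempt in range(max_retries + 1):
        try:
            return call()
        except ActorUnavailableError:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            time.sleep(backoff_s * (attempt + 1))  # simple linear backoff
```

With Ray, `call` would typically wrap something like `lambda: ray.get(actor.method.remote())`, where `actor` and `method` are your own names.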

🔨 Fixes:

  • Fixed the ModuleNotFoundError issue introduced in 2.10 (#44435)
  • Fixed an issue where the agent process used too much CPU (#44348)
  • Fixed race condition in multi-threaded actor creation (#44232)
  • Fixed several streaming generator bugs (#44079, #44257, #44197)
  • Fixed an issue where user exceptions raised from tasks could not be subclassed (#44379)

Dashboard

💫 Enhancements:

  • Add serve controller metrics to serve system dashboard page (#43797)
  • Add Serve Application rows to Serve top-level deployments details page (#43506)
  • [Actor table page enhancements] Include "NodeId", "CPU", "Memory", "GPU", "GRAM" columns in the actor table page. Add sort functionality to resource utilization columns. Enable searching table by "Class" and "Repr". (#42588, #42633, #42788)

🔨 Fixes:

  • Fix default sorting of nodes in Cluster table page to first be by "Alive" nodes, then head nodes, then alphabetical by node ID. (#42929)
  • Fix bug where the Serve Deployment detail page fails to load if the deployment is in "Starting" state (#43279)
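The default node ordering described in the first fix above (alive nodes first, then head nodes, then node ID) can be expressed as a three-part sort key. This is an illustration in Python only, not the dashboard's actual code: the field names `state`, `is_head`, and `node_id` are hypothetical, and "alphabetical" is taken to mean plain string comparison.

```python
def node_sort_key(node):
    """Sort key: alive before dead, head before worker, then by node ID."""
    return (
        node["state"] != "ALIVE",  # False sorts before True, so alive nodes lead
        not node["is_head"],       # head nodes come before non-head nodes
        node["node_id"],           # ties broken by node ID string order
    )


nodes = [
    {"node_id": "c3", "state": "DEAD", "is_head": False},
    {"node_id": "b2", "state": "ALIVE", "is_head": False},
    {"node_id": "d4", "state": "ALIVE", "is_head": True},
    {"node_id": "a1", "state": "ALIVE", "is_head": False},
]
ordered_ids = [n["node_id"] for n in sorted(nodes, key=node_sort_key)]
```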

Docs

💫 Enhancements:

  • Refreshed the look and feel of the landing page (#44251)

Thanks

Many thanks to all those who contributed to this release!

@aslonnie, @brycehuang30, @MortalHappiness, @astron8t-voyagerx, @edoakes, @sven1977, @anyscalesam, @scottjlee, @hongchaodeng, @slfan1989, @hebiao064, @fishbone, @zcin, @GeneDer, @shrekris-anyscale, @kira-lin, @chappidim, @raulchen, @c21, @WeichenXu123, @marian-code, @bveeramani, @can-anyscale, @mjd3, @justinvyu, @jackhumphries, @Bye-legumes, @ashione, @alanwguo, @Dreamsorcerer, @KamenShah, @jjyao, @omatthew98, @autolisis, @Superskyyy, @stephanie-wang, @simonsays1980, @davidxia, @angelinalg, @architkulkarni, @chris-ray-zhang, @kevin85421, @rynewang, @peytondmurray, @zhangyilun, @khluu, @matthewdeng, @ruisearch42, @pcmoritz, @mattip, @jerome-habana, @alexeykudinkin