Releases: ray-project/ray
ray-0.7.5
Ray 0.7.5 Release Notes
Ray API
- Objects created with ray.put() are now reference counted. #5590
- Add internal pin_object_data() API. #5637
- Initial support for pickle5. #5611
- Warm up Ray on ray.init(). #5685
- redis_address passed to ray.init is now just address. #5602
Core
- Progress towards a common C++ core worker. #5516, #5272, #5566, #5664
- Fix log monitor stall with many log files. #5569
- Print warnings when tasks are unschedulable. #5555
- Take resource queue lengths into account when autoscaling. #5702, #5684
Tune
RLlib
Other Libraries
ray-0.7.4
Ray 0.7.4 Release Notes
Highlights
- There were many documentation improvements (#5391, #5389, #5175). As we continue to improve the documentation, we value your feedback through the “Doc suggestion?” link at the top of the documentation. Notable improvements:
- We’ve added guides for best practices using TensorFlow and PyTorch.
- We’ve revamped the Walkthrough page for Ray users, providing a better experience for beginners.
- We’ve revamped guides for using Actors and inspecting internal state.
- Ray now supports memory limits to ensure memory-intensive applications run predictably and reliably. You can activate them through the ray.remote decorator:

    import ray

    @ray.remote(
        memory=2000 * 1024 * 1024,
        object_store_memory=200 * 1024 * 1024)
    class SomeActor(object):
        def __init__(self, a, b):
            pass

  You can set limits for the heap and the object store; see the documentation.
- There is now preliminary support for projects; see the project documentation. Projects allow you to package your code and easily share it with others, ensuring a reproducible cluster setup. To get started, you can run:

    # Create a new project.
    ray project create <project-name>
    # Launch a session for the project in the current directory.
    ray session start
    # Open a console for the given session.
    ray session attach
    # Stop the given session and all of its worker nodes.
    ray session stop

  Check out the examples. This is an actively developed new feature, so we appreciate your feedback!
- Breaking change: The redis_address parameter was renamed to address (#5412, #5602); the former will be removed in the future.
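The memory and object_store_memory values in the ray.remote example above are plain byte counts written as products. A minimal sketch of that arithmetic; the MIB constant is a hypothetical helper for illustration, not part of Ray:

```python
# Hypothetical helper constant (not part of Ray): bytes in one mebibyte.
MIB = 1024 * 1024

# Values from the SomeActor example: heap limit and object store limit, in bytes.
heap_limit_bytes = 2000 * MIB
object_store_limit_bytes = 200 * MIB

print(heap_limit_bytes)          # 2097152000 (about 2 GB)
print(object_store_limit_bytes)  # 209715200 (about 200 MB)
```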
Core
- Move Java bindings on top of the core worker #5370
- Improve log file discoverability #5580
- Clean up and improve error messages #5368, #5351
RLlib
- Support custom action space distributions #5164
- Add TensorFlow eager support #5436
- Add autoregressive KL #5469
- Autoregressive Action Distributions #5304
- Implement MADDPG agent #5348
- Port Soft Actor-Critic on Model v2 API #5328
- More examples: Add CARLA community example #5333 and rock paper scissors multi-agent example #5336
- Moved RLlib to the top-level directory. #5324
Tune
- Experimental Implementation of the BOHB algorithm #5382
- Breaking change: Nested dictionary results are now flattened for CSV writing: {"a": {"b": 1}} => {"a/b": 1}. #5346
- Add Logger for MLflow. #5438
- TensorBoard support for TensorFlow 2.0 #5547
- Added examples for XGBoost and LightGBM #5500
- HyperOptSearch now has warmstarting #5372
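The CSV flattening rule from the Tune breaking change can be sketched as a small recursive function. This illustrates only the key-joining behavior, assuming "/" as the separator as in the {"a/b": 1} example; it is not Tune's own implementation, and flatten_result is a hypothetical name:

```python
def flatten_result(result, prefix="", sep="/"):
    """Flatten nested dicts into a single level, joining key paths with `sep`.

    Hypothetical sketch of the behavior described in the release note.
    """
    flat = {}
    for key, value in result.items():
        full_key = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested dictionaries, extending the key path.
            flat.update(flatten_result(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


print(flatten_result({"a": {"b": 1}}))  # {'a/b': 1}
print(flatten_result({"loss": 0.5, "info": {"lr": 1e-3, "grad": {"norm": 2.0}}}))
```

Joining with a separator keeps each column name unique as long as the original keys do not themselves contain "/".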
Other Libraries
- SGD: Tune interface for PyTorch MultiNode SGD #5350
- Serving: The old version of ray.serve was deprecated #5541
- Autoscaler: Fix ssh control path limit #5476
- Dev experience: Ray CI tracker online at https://ray-travis-tracker.herokuapp.com/
- Various fixes: fixed log monitor issues #4382, #5221, #5569; cleaned up the top-level ray directory #5404
Thanks
We thank the following contributors for their amazing contributions:
@jon-chuang, @lufol, @adamochayon, @idthanm, @RehanSD, @ericl, @michaelzhiluo, @nflu, @pengzhenghao, @hartikainen, @wsjeon, @raulchen, @TomVeniat, @layssi, @jovany-wang, @llan-ml, @ConeyLiu, @mitchellstern, @gregSchwartz18, @jiangzihao2009, @jichan3751, @mhgump, @zhijunfu, @micafan, @simon-mo, @richardliaw, @stephanie-wang, @edoakes, @akharitonov, @mawright, @robertnishihara, @lisadunlap, @flying-mojo, @pcmoritz, @jredondopizarro, @gehring, @holli, @kfstorm
ray-0.7.3
Ray 0.7.3 Release Note
Highlights
- The RLlib ModelV2 API is ready to use. It improves support for Keras and RNN models, and allows object-oriented reuse of variables. The ModelV1 API is deprecated. No migration is needed.
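- ray.experimental.sgd.pytorch.PyTorchTrainer is ready for early adopters. Check out the documentation here. We welcome your feedback!

    from ray.experimental.sgd.pytorch import PyTorchTrainer

    # YourPyTorchModel, YourTrainingSet, YourValidationSet and NUM_EPOCHS are
    # user-defined placeholders; utils and Resources come from Ray's SGD
    # package (see the documentation for the exact imports).
    model_creator = lambda config: YourPyTorchModel()
    # Return both datasets as a tuple; without the parentheses, data_creator
    # would be a tuple instead of a function.
    data_creator = lambda config: (YourTrainingSet(), YourValidationSet())
    trainer = PyTorchTrainer(
        model_creator,
        data_creator,
        optimizer_creator=utils.sgd_mse_optimizer,
        config={"lr": 1e-4},
        num_replicas=2,
        resources_per_replica=Resources(num_gpus=1),
        batch_size=16,
        backend="auto")
    for i in range(NUM_EPOCHS):
        trainer.train()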
- You can query all the clients that have performed ray.init to connect to the current cluster with ray.jobs(). #5076

    >>> ray.jobs()
    [{'JobID': '02000000', 'NodeManagerAddress': '10.99.88.77', 'DriverPid': 74949,
      'StartTime': 1564168784, 'StopTime': 1564168798},
     {'JobID': '01000000', 'NodeManagerAddress': '10.99.88.77', 'DriverPid': 74871,
      'StartTime': 1564168742}]
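The list returned by ray.jobs() can be post-processed with plain Python. The sketch below runs on the sample output copied from this note rather than on a live cluster. It assumes, which the note does not state, that an entry without a StopTime belongs to a driver that is still connected and that the times are Unix timestamps in seconds:

```python
# Sample ray.jobs() output copied from the release note.
jobs = [
    {'JobID': '02000000', 'NodeManagerAddress': '10.99.88.77', 'DriverPid': 74949,
     'StartTime': 1564168784, 'StopTime': 1564168798},
    {'JobID': '01000000', 'NodeManagerAddress': '10.99.88.77', 'DriverPid': 74871,
     'StartTime': 1564168742},
]

# Assumption: a missing 'StopTime' means the driver has not disconnected yet.
running = [job['JobID'] for job in jobs if 'StopTime' not in job]

# Assumption: times are seconds, so the difference is the driver's lifetime.
finished = {job['JobID']: job['StopTime'] - job['StartTime']
            for job in jobs if 'StopTime' in job}

print(running)   # ['01000000']
print(finished)  # {'02000000': 14}
```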
Core
RLlib
- Finished port of all major RLlib algorithms to the builder pattern. #5277, #5258, #5249
- learner_queue_timeout can be configured for the async sample optimizer. #5270
- reproducible_seed can be used for reproducible experiments. #5197
- Added entropy coefficient decay to IMPALA, APPO and PPO. #5043
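The note does not say how the entropy coefficient decay is parameterized. One common approach, assumed here, is a piecewise-linear schedule over environment timesteps. The sketch below illustrates that idea; piecewise_linear and the breakpoint values are hypothetical and do not show RLlib's configuration format:

```python
def piecewise_linear(schedule, t):
    """Interpolate a value at timestep t from sorted (timestep, value) breakpoints.

    Hypothetical helper: holds the first value before the first breakpoint
    and the last value after the final one.
    """
    if t <= schedule[0][0]:
        return schedule[0][1]
    for (t0, v0), (t1, v1) in zip(schedule, schedule[1:]):
        if t0 <= t < t1:
            frac = (t - t0) / (t1 - t0)
            return v0 + frac * (v1 - v0)
    return schedule[-1][1]


# Illustrative values: decay the coefficient from 0.01 to 0.001 over 1M timesteps.
entropy_schedule = [(0, 0.01), (1_000_000, 0.001)]
for t in (0, 250_000, 500_000, 1_000_000, 2_000_000):
    print(t, piecewise_linear(entropy_schedule, t))
```

Decaying the entropy bonus encourages exploration early in training and lets the policy become more deterministic later.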
Tune:
- Breaking: ExperimentAnalysis is now returned by default from tune.run. To obtain a list of trials, use analysis.trials. #5115
- Breaking: Syncing behavior between head and workers can now be customized (sync_to_driver). Syncing behavior (upload_dir) between cluster and cloud is now separately customizable (sync_to_cloud). This changes the structure of the uploaded directory: now local_dir is synced with upload_dir. #4450
- Introduce Analysis and ExperimentAnalysis objects. The Analysis object will now return all trials in a folder; ExperimentAnalysis is a subclass that returns all trials of an experiment. #5115
- Add missing argument tune.run(keep_checkpoints_num=...), which lets you keep only the last N checkpoints. #5117
- Trials on failed nodes will be prioritized in processing. #5053
- Trial checkpointing is now more flexible. #4728
- Add system performance tracking for GPU, RAM, VRAM, and CPU usage statistics; toggle it with tune.run(log_sys_usage=True). #4924
- Experiment checkpointing is now less frequent and can be controlled with tune.run(global_checkpoint_period=...). #4859
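The "keep only the last N checkpoints" behavior of keep_checkpoints_num can be pictured as a bounded queue: each new checkpoint is appended, and the oldest ones are evicted once the limit is exceeded. The sketch below illustrates only that retention rule; CheckpointKeeper is a hypothetical class, not Tune's implementation, and it does not touch the filesystem:

```python
from collections import deque


class CheckpointKeeper:
    """Retain only the most recent `keep_num` checkpoint paths (hypothetical sketch)."""

    def __init__(self, keep_num):
        self.keep_num = keep_num
        self.kept = deque()

    def on_checkpoint(self, path):
        """Record a new checkpoint and return the paths that should be deleted."""
        self.kept.append(path)
        evicted = []
        while len(self.kept) > self.keep_num:
            evicted.append(self.kept.popleft())
        return evicted


keeper = CheckpointKeeper(keep_num=2)
for step in range(1, 5):
    removed = keeper.on_checkpoint(f"checkpoint_{step}")
    print(step, list(keeper.kept), removed)
```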
Autoscaler
- Add a request_cores function for manual autoscaling. You can now manually request resources for the autoscaler. #4754
- Local cluster:
- Improved logging with AWS NodeProvider. The create_instance call will now be logged. #4998
Other Libraries:
- SGD:
- Kubernetes: Ray namespace added for k8s. #4111
- Dev experience: Add linting pre-push hook. #5154
Thanks:
We thank the following contributors for their amazing contributions:
@joneswong, @1beb, @richardliaw, @pcmoritz, @raulchen, @stephanie-wang, @jiangzihao2009, @LorenzoCevolani, @kfstorm, @pschafhalter, @micafan, @simon-mo, @vipulharsh, @haje01, @ls-daniel, @hartikainen, @stefanpantic, @edoakes, @llan-ml, @alex-petrenko, @ztangent, @gravitywp, @MQQ, @Dulex123, @morgangiraud, @antoine-galataud, @robertnishihara, @qxcv, @vakker, @jovany-wang, @zhijunfu, @ericl
ray-0.7.2
Core
- Improvements
- Python
- Java
- Allow users to set JVM options at actor creation time. #4970
- Internal
- Performance
Tune
- Add directional metrics for components. #4120, #4915
- Disallow setting resources_per_trial when it is already configured. #4880
- Make the PBT quantile fraction configurable. #4912
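In population-based training (PBT), the exploit step copies the state of well-performing trials into poorly performing ones; in Tune, the quantile fraction sets what share of trials counts as the bottom and top groups. The sketch below shows only that grouping step, assuming higher scores are better and at least one trial per group; pbt_quantiles and the scores are hypothetical, not Tune's code:

```python
def pbt_quantiles(scores, quantile_fraction=0.25):
    """Split trial names into (bottom, top) groups by score.

    Hypothetical sketch: higher score is better, and each group holds
    quantile_fraction of the trials (at least one).
    """
    if not 0 < quantile_fraction <= 0.5:
        raise ValueError("quantile_fraction must be in (0, 0.5]")
    ranked = sorted(scores, key=scores.get)  # worst to best
    n = max(1, int(len(ranked) * quantile_fraction))
    return ranked[:n], ranked[-n:]


scores = {"t1": 0.2, "t2": 0.9, "t3": 0.5, "t4": 0.7,
          "t5": 0.1, "t6": 0.8, "t7": 0.3, "t8": 0.6}
bottom, top = pbt_quantiles(scores, quantile_fraction=0.25)
print(bottom, top)  # ['t5', 't1'] ['t6', 't2']
```

Keeping the fraction at or below 0.5 guarantees that the bottom and top groups never overlap.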
RLlib
- Add QMIX mixer parameters to optimizer param list. #5014
- Allow Torch policies access to the full action input dict in extra_action_out_fn. #4894
- Allow access to batches prior to postprocessing. #4871
- Throw an error if sample_async is used with PyTorch for A3C. #5000
- Patterns & User Experience
- Documentation
Other Libraries
- Add support for distributed training with PyTorch. #4797, #4933
- Autoscaler will kill workers on exception. #4997
- Fix handling of non-integral timeout values in signal.receive. #5002
Thanks
We thank the following contributors for their amazing contributions: @jiangzihao2009, @raulchen, @ericl, @hershg, @kfstorm, @kiddyboots216, @jovany-wang, @pschafhalter, @richardliaw, @robertnishihara, @stephanie-wang, @simon-mo, @zhijunfu, @ls-daniel, @ajgokhale, @rueberger, @suquark, @guoyuhong, @pcmoritz, @hartikainen, @timonbimon, @TianhongDai
ray-0.7.1
Core
- Change global state API. #4857
  - ray.global_state.client_table() -> ray.nodes()
  - ray.global_state.task_table() -> ray.tasks()
  - ray.global_state.object_table() -> ray.objects()
  - ray.global_state.chrome_tracing_dump() -> ray.timeline()
  - ray.global_state.cluster_resources() -> ray.cluster_resources()
  - ray.global_state.available_resources() -> ray.available_resources()
- Export remote functions lazily. #4898
- Begin moving worker code to C++. #4875, #4899, #4898
- Upgrade arrow to latest master. #4858
- Upload wheels to S3 under <branch-name>/<commit-id>. #4949
- Add hash table to Redis-Module. #4911
- Initial support for distributed training with PyTorch. #4797
Tune
- Disallow setting resources_per_trial when it is already configured. #4880
- Initial experiment tracking support. #4362
RLlib
- Begin deprecating Python 2 support in RLlib. #4832
- TensorFlow 2 compatibility. #4802
- Allow Torch policies access to the full action input dict in extra_action_out_fn. #4894
- Allow access to batches prior to postprocessing. #4871
- Port algorithms to the build_trainer() pattern. #4823
- Rename PolicyEvaluator -> RolloutWorker. #4820
- Rename PolicyGraph -> Policy, and move it from evaluation/ to policy/. #4819
- Support continuous action distributions in IMPALA/APPO. #4771
(Revision: 6/23/2019 - Accidentally included commits that were not part of the release.)
ray-0.7.0
Core
- Backend bug fixes. #4766, #4763, #4605
- Add experimental API for creating resources at runtime. #3742
Tune
RLlib
- Remove dependency on TensorFlow. #4764
- TD3/DDPG improvements and MuJoCo benchmarks. #4694
- Evaluation mode implementation for rllib.Trainer class. #4647
- Replace ray.get() with ray_get_and_free() to automatically free object store memory. #4586
- RLlib bug fixes. #4736, #4735, #4652, #4630
Autoscaler
ray-0.6.6
Core
- Add delete_creating_tasks option for internal.free(). #4588
Tune
- Add filter flag for Tune CLI. #4337
- Better handling of tune.function in global checkpoint. #4519
- Add compatibility with nevergrad 0.2.0+. #4529
- Add --columns flag for CLI. #4564
- Add checkpoint eraser. #4490
- Fix checkpointing for Gym types. #4619
RLlib
- Report sampler performance metrics. #4427
- Ensure stats are consistently reported across all algos. #4445
- Clean up TFPolicyGraph. #4478
- Make batch timeout for remote workers tunable. #4435
- Fix inconsistent weight assignment operations in DQNPolicyGraph. #4504
- Add support for LR schedule to DQN/APEX. #4473
- Add option for RNN state and value estimates to span episodes. #4429
- Create a combination of ExternalEnv and MultiAgentEnv, called ExternalMultiAgentEnv. #4200
- Support prev_state/prev_action in rollout and fix multiagent. #4565
- Support torch device and distributions. #4553
Java
- TestNG outputs more verbose error messages. #4507
- Implement GcsClient. #4601
- Avoid unnecessary memory copy and add a benchmark. #4611
Autoscaler
ray-0.6.5
Core
- Build system fully converted to Bazel. #4284, #4280, #4281
- Introduce a set data structure in the GCS. #4199
- Make all arguments to _remote() optional. #4305
- Improve object transfer latency by setting TCP_NODELAY on all TCP connections. #4318
- Add beginning of experimental serving module. #4095
- Remove Jupyter notebook based UI. #4301
- Add the ray timeline command-line command for dumping a Chrome trace. #4239
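TCP_NODELAY disables Nagle's algorithm, which otherwise holds back small writes so they can be coalesced into larger segments; turning it off trades some bandwidth efficiency for lower latency on small messages such as control traffic. The sketch below is not Ray's code; it only shows the same socket option set on a plain socket with Python's standard library:

```python
import socket

# Create a TCP socket and disable Nagle's algorithm on it.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back; a nonzero value means it is enabled.
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print("TCP_NODELAY enabled:", bool(nodelay))
sock.close()
```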
Tune
- Add custom field for serializations. #4237
- Begin adding Tune CLI. #3983, #4321, #4322
- Add optimization to reuse actors. #4218
- Add warnings if the Tune event loop gets clogged. #4353
- Switch preferred API from tune.run_experiments to tune.run. #4234
- Make the logging from the function API consistent and predictable. #4011
RLlib
- Breaking: Flip sign of entropy coefficient in A2C and Impala. #4374
- Add option to continue training even if some workers crash. #4376
- Add asynchronous remote workers. #4253
- Add callback accessor for raw observations. #4212
Java
- Improve single-process mode. #4245, #4265
- Package native dependencies into jar. #4367
- Initial support for calling Python functions from Java. #4166
Autoscaler
- Restore error messages for setup errors. #4388
Known Issues
- Object broadcasts on large clusters are inefficient. #2945
ray-0.6.4
Breaking
- Removed redirect_output and redirect_worker_output from ray.init, and removed the deprecated _submit method. #4025
- Move TensorFlowVariables to ray.experimental.tf_utils. #4145
Core
- Stream worker logging statements to driver by default. #3892
- Added experimental ray signaling mechanism, see the documentation. #3624
- Make Bazel the default build system. #3898
- Preliminary experimental streaming API for Python. #4126
- Added web dashboard for monitoring node resource usage. #4066
- Improved propagation of backend errors to user. #4039
- Many improvements for the Java frontend. #3687, #3978, #4014, #3943, #3839, #4038, #4039, #4063, #4100, #4179, #4178
- Support for dataclass serialization. #3964
- Implement actor checkpointing. #3839
- First steps toward cross-language invocations. #3675
- Better defaults for Redis memory usage. #4152
Tune
- Breaking: Introduce ability to turn off default logging. Deprecates custom_loggers. #4104
- Support custom resources. #2979
- Add initial parameter suggestions for HyperOpt. #3944
- Add scipy-optimize to Tune. #3924
- Add Nevergrad. #3985
- Add number of trials to the trial runner logger. #4068
- Support RESTful API for the webserver. #4080
- Local mode support. #4138
- Dynamic resources for trials. #3974
RLlib
- Basic infrastructure for off-policy estimation. #3941
- Add simplex action space and Dirichlet action distribution. #4070
- Exploration with parameter space noise. #4048
- Custom supervised loss API. #4083
- Add torch policy gradient implementation. #3857
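A simplex action is a vector of non-negative components that sum to one, such as allocation weights or mixture proportions, and the Dirichlet distribution is a natural way to sample such actions. The sketch below uses the textbook construction (normalize independent Gamma(alpha_i, 1) draws) in pure Python; it is not RLlib's implementation, and sample_dirichlet and the alpha values are hypothetical:

```python
import random


def sample_dirichlet(alphas, rng=random):
    """Draw one point on the probability simplex from Dirichlet(alphas).

    Normalizes independent Gamma(alpha_i, 1) draws so they sum to one.
    """
    draws = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)  # seeded for reproducibility
action = sample_dirichlet([1.0, 2.0, 3.0], rng)
print(action, sum(action))
```

Larger alpha values concentrate samples around the mean alpha_i / sum(alphas), which is what lets a policy network control both the location and the spread of the action.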
Autoscaler and Cluster Setup
- Add docker run option (e.g. to support nvidia-docker). #3921
Modin
- Upgrade Modin to 0.3.1, see the release notes. #4058
Known Issues
ray-0.6.3
Core
- Initial work on porting the build system to Bazel. #3918, #3806, #3867, #3842
- Allow starting Ray processes inside valgrind, gdb, tmux. #3824, #3847
- Stability improvements and bug fixes. #3861, #3962, #3958, #3855, #3736, #3822, #3821, #3925
- Convert Python C extensions to Cython. #3541
- ray start can now be used to start Java workers. #3838, #3852
- Enable LZ4 compression in pyarrow build. #3931
- Update Redis to version 5.0.3. #3886
- Use one memory-mapped file for Plasma store. #3871
Tune
- Support for BayesOpt. #3864
- Support for SigOpt. #3844
- Support executing infinite recovery retries for a trial. #3901
- Support export_formats option to export policy graphs. #3868
- Cluster and logging improvements. #3906
RLlib
- Support for Asynchronous Proximal Policy Optimization (APPO). #3779
- Support for MARWIL. #3635
- Support for evaluation option in DQN. #3835
- Bug fixes. #3865, #3810, #3938
- Annotations for API stability. #3808
Autoscaler and Cluster Setup
- Faster cluster launch and update. #3720
- Bug fixes. #3916, #3860, #3937, #3782, #3969
- Kubernetes configuration improvements. #3875, #3909
Modin
- Update Modin to 0.3.0. #3936
Known Issues
- Object broadcasts on large clusters are inefficient. #2945