Releases: ray-project/ray

ray-0.7.5

25 Sep 00:07

Ray 0.7.5 Release Notes

Ray API

  • Objects created with ray.put() are now reference counted. #5590
  • Add internal pin_object_data() API. #5637
  • Initial support for pickle5. #5611
  • Warm up Ray on ray.init(). #5685
  • redis_address passed to ray.init is now just address. #5602

Core

  • Progress towards a common C++ core worker. #5516, #5272, #5566, #5664
  • Fix log monitor stall with many log files. #5569
  • Print warnings when tasks are unschedulable. #5555
  • Take into account resource queue lengths when autoscaling. #5702, #5684

Tune

  • TF2.0 TensorBoard support. #5547, #5631
  • tune.function() is now deprecated. #5601

RLlib

Other Libraries

  • Complete rewrite of experimental serving library. #5562
  • Progress toward Ray projects APIs. #5525, #5632, #5706
  • Add TF SGD implementation for training. #5440
  • Many documentation improvements and bugfixes.

ray-0.7.4

05 Sep 23:11

Ray 0.7.4 Release Notes

Highlights

  • There were many documentation improvements (#5391, #5389, #5175). As we continue to improve the documentation we value your feedback through the “Doc suggestion?” link at the top of the documentation. Notable improvements:

    • We’ve added guides for best practices using TensorFlow and PyTorch.
    • We’ve revamped the Walkthrough page for Ray users, providing a better experience for beginners.
    • We’ve revamped guides for using Actors and inspecting internal state.
  • Ray now supports memory limits so that memory-intensive applications run predictably and reliably. You
    can activate them through the ray.remote decorator:

    @ray.remote(
        memory=2000 * 1024 * 1024,
        object_store_memory=200 * 1024 * 1024)
    class SomeActor(object):
        def __init__(self, a, b):
            pass

    You can set limits for the heap and the object store; see the documentation.

  • There is now preliminary support for projects; see the project documentation. Projects allow you to
    package your code and easily share it with others, ensuring a reproducible cluster setup. To get started, you
    can run:

    # Create a new project.
    ray project create <project-name>
    # Launch a session for the project in the current directory.
    ray session start
    # Open a console for the given session.
    ray session attach
    # Stop the given session and all of its worker nodes.
    ray session stop

    Check out the examples. This is an actively developed new feature so we appreciate your feedback!
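
    The memory and object_store_memory values in the memory-limits highlight above are plain byte counts. A minimal sketch of a helper that keeps them readable; mib is a hypothetical name used only for illustration and is not part of Ray:

```python
def mib(n):
    """Convert a size in MiB to bytes (hypothetical helper, not a Ray API)."""
    return n * 1024 * 1024


# The same values as in the SomeActor example: 2000 MiB heap, 200 MiB object store.
heap_limit = mib(2000)
object_store_limit = mib(200)
print(heap_limit, object_store_limit)
```

    These integers could then be passed as memory=heap_limit and object_store_memory=object_store_limit.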

Breaking change: The redis_address parameter was renamed to address (#5412, #5602). The old name is deprecated and will be removed in a future release.
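
While call sites are being migrated, a wrapper can accept both keyword names and warn on the old one. This is only a sketch of that pattern: resolve_address is a hypothetical helper, not Ray's own code, and the address shown is a placeholder.

```python
import warnings


def resolve_address(address=None, redis_address=None):
    """Return the cluster address, accepting the old keyword with a warning.

    Hypothetical helper for illustration; this is not part of Ray.
    """
    if redis_address is not None:
        warnings.warn(
            "redis_address is deprecated; pass address instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Prefer the new keyword if both were supplied.
        if address is None:
            address = redis_address
    return address


# Placeholder address; the result would be passed on as ray.init(address=...).
print(resolve_address(address="10.0.0.1:6379"))
```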

Core

  • Move Java bindings on top of the core worker #5370
  • Improve log file discoverability #5580
  • Clean up and improve error messages #5368, #5351

RLlib

  • Support custom action space distributions #5164
  • Add TensorFlow eager support #5436
  • Add autoregressive KL #5469
  • Autoregressive Action Distributions #5304
  • Implement MADDPG agent #5348
  • Port Soft Actor-Critic on Model v2 API #5328
  • More examples: Add CARLA community example #5333 and rock paper scissors multi-agent example #5336
  • Moved RLlib to top level directory #5324

Tune

  • Experimental Implementation of the BOHB algorithm #5382
  • Breaking change: Nested dictionary results are now flattened for CSV writing: {"a": {"b": 1}} => {"a/b": 1} #5346
  • Add Logger for MLFlow #5438
  • TensorBoard support for TensorFlow 2.0 #5547
  • Added examples for XGBoost and LightGBM #5500
  • HyperOptSearch now has warmstarting #5372
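
The flattening of nested results mentioned in the list above can be sketched in a few lines. This is an illustrative re-implementation of the idea, not Tune's own code; flatten_dict and its sep parameter are names chosen for this example.

```python
def flatten_dict(nested, sep="/"):
    """Flatten nested dictionaries by joining keys with `sep`.

    Illustrative sketch only. Empty nested dictionaries produce no output keys.
    """
    flat = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            # Recurse, then prefix each inner key with the outer key.
            for sub_key, sub_value in flatten_dict(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


print(flatten_dict({"a": {"b": 1}}))  # {'a/b': 1}
```

Code that reads the CSV output would then look up the column "a/b" instead of indexing result["a"]["b"].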

Other Libraries

Various fixes: fixed log monitor issues (#4382, #5221, #5569) and cleaned up the top-level ray directory (#5404).

Thanks

We thank the following contributors for their amazing contributions:

@jon-chuang, @lufol, @adamochayon, @idthanm, @RehanSD, @ericl, @michaelzhiluo, @nflu, @pengzhenghao, @hartikainen, @wsjeon, @raulchen, @TomVeniat, @layssi, @jovany-wang, @llan-ml, @ConeyLiu, @mitchellstern, @gregSchwartz18, @jiangzihao2009, @jichan3751, @mhgump, @zhijunfu, @micafan, @simon-mo, @richardliaw, @stephanie-wang, @edoakes, @akharitonov, @mawright, @robertnishihara, @lisadunlap, @flying-mojo, @pcmoritz, @jredondopizarro, @gehring, @holli, @kfstorm

ray-0.7.3

04 Aug 02:37

Ray 0.7.3 Release Notes

Highlights

  • The RLlib ModelV2 API is ready to use. It improves support for Keras and RNN models and allows object-oriented reuse of variables. The ModelV1 API is deprecated. No migration is needed.
  • ray.experimental.sgd.pytorch.PyTorchTrainer is ready for early adopters. Check out the documentation. We welcome your feedback!
    model_creator = lambda config: YourPyTorchModel()
    data_creator = lambda config: (YourTrainingSet(), YourValidationSet())
    
    trainer = PyTorchTrainer(
        model_creator,
        data_creator,
        optimizer_creator=utils.sgd_mse_optimizer,
        config={"lr": 1e-4},
        num_replicas=2,
        resources_per_replica=Resources(num_gpus=1),
        batch_size=16,
        backend="auto")
    
    for i in range(NUM_EPOCHS):
        trainer.train()
  • You can query all the clients that have performed ray.init to connect to the current cluster with ray.jobs(). #5076
    >>> ray.jobs()
    [{'JobID': '02000000',
      'NodeManagerAddress': '10.99.88.77',
      'DriverPid': 74949,
      'StartTime': 1564168784,
      'StopTime': 1564168798},
     {'JobID': '01000000',
      'NodeManagerAddress': '10.99.88.77',
      'DriverPid': 74871,
      'StartTime': 1564168742}]
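
    The records returned by ray.jobs() are plain dictionaries, so they can be post-processed directly. The sketch below reuses the sample output above and assumes, without confirmation from these notes, that StartTime and StopTime are Unix timestamps in seconds and that a missing StopTime means the driver is still connected. job_duration is a hypothetical helper.

```python
# Sample records copied from the ray.jobs() output shown above.
jobs = [
    {"JobID": "02000000", "NodeManagerAddress": "10.99.88.77",
     "DriverPid": 74949, "StartTime": 1564168784, "StopTime": 1564168798},
    {"JobID": "01000000", "NodeManagerAddress": "10.99.88.77",
     "DriverPid": 74871, "StartTime": 1564168742},
]


def job_duration(job):
    """Return StopTime - StartTime, or None if the record has no StopTime."""
    if "StopTime" not in job:
        return None
    return job["StopTime"] - job["StartTime"]


durations = {job["JobID"]: job_duration(job) for job in jobs}
# Assumption: no StopTime means the job has not finished yet.
unfinished = [job["JobID"] for job in jobs if job_duration(job) is None]
print(durations)
print(unfinished)
```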

Core

  • Improved memory storage handling. #5143, #5216, #4893
  • Improved workflow:
    • Debugging tool local_mode now behaves more consistently. #5060
    • Improved KeyboardInterrupt Exception Handling, stack trace reduced from 115 lines to 22 lines. #5237
  • Ray core:
    • Experimental direct actor call. #5140, #5184
    • Improvements to the core worker, the module shared between Python and Java. #5079, #5034, #5062
    • GCS (global control store) was refactored. #5058, #5050

RLlib

  • Finished port of all major RLlib algorithms to builder pattern #5277, #5258, #5249
  • learner_queue_timeout can be configured for async sample optimizer. #5270
  • reproducible_seed can be used for reproducible experiments. #5197
  • Added entropy coefficient decay to IMPALA, APPO and PPO #5043

Tune

  • Breaking: ExperimentAnalysis is now returned by default from tune.run. To obtain a list of trials, use analysis.trials. #5115
  • Breaking: Syncing behavior between head and workers can now be customized (sync_to_driver). Syncing behavior (upload_dir) between cluster and cloud is now separately customizable (sync_to_cloud). This changes the structure of the uploaded directory - now local_dir is synced with upload_dir. #4450
  • Introduce Analysis and ExperimentAnalysis objects. Analysis object will now return all trials in a folder; ExperimentAnalysis is a subclass that returns all trials of an experiment. #5115
  • Add missing argument tune.run(keep_checkpoints_num=...). Enables only keeping the last N checkpoints. #5117
  • Trials on failed nodes will be prioritized in processing. #5053
  • Trial Checkpointing is now more flexible. #4728
  • Add system performance tracking for gpu, ram, vram, cpu usage statistics - toggle with tune.run(log_sys_usage=True). #4924
  • Experiment checkpointing is now less frequent and can be controlled with tune.run(global_checkpoint_period=...). #4859
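
The "keep only the last N checkpoints" behaviour behind keep_checkpoints_num in the list above can be pictured as a sliding window over checkpoint paths. The sketch below illustrates that idea only; it is not Tune's implementation, and CheckpointWindow and the checkpoint names are hypothetical.

```python
from collections import deque


class CheckpointWindow:
    """Track checkpoint paths and report which ones fall out of the window."""

    def __init__(self, keep_num):
        self._keep_num = keep_num
        self._kept = deque()

    def add(self, path):
        """Record a new checkpoint and return the paths that should be deleted."""
        self._kept.append(path)
        evicted = []
        while len(self._kept) > self._keep_num:
            # The oldest checkpoint leaves the window first.
            evicted.append(self._kept.popleft())
        return evicted

    def kept(self):
        return list(self._kept)


window = CheckpointWindow(keep_num=2)
removed = []
for step in range(1, 5):
    removed.extend(window.add(f"checkpoint_{step}"))
print(window.kept())
print(removed)
```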

Autoscaler

  • Add a request_cores function for manual autoscaling. You can now manually request resources for the autoscaler. #4754

  • Local cluster:

    • More readable example yaml with comments. #5290

    • Multiple cluster names are supported. #4864

  • Improved logging with AWS NodeProvider. The create_instance call is now logged. #4998

Other Libraries

  • SGD:
    • Example for Training. #5292
    • Deprecate old distributed SGD implementation. #5160
  • Kubernetes: Ray namespace added for k8s. #4111
  • Dev experience: Add linting pre-push hook. #5154

Thanks

We thank the following contributors for their amazing contributions:

@joneswong, @1beb, @richardliaw, @pcmoritz, @raulchen, @stephanie-wang, @jiangzihao2009, @LorenzoCevolani, @kfstorm, @pschafhalter, @micafan, @simon-mo, @vipulharsh, @haje01, @ls-daniel, @hartikainen, @stefanpantic, @edoakes, @llan-ml, @alex-petrenko, @ztangent, @gravitywp, @MQQ, @Dulex123, @morgangiraud, @antoine-galataud, @robertnishihara, @qxcv, @vakker, @jovany-wang, @zhijunfu, @ericl

ray-0.7.2

03 Jul 05:57

Core

  • Improvements
  • Python
    • @ray.remote now inherits the function docstring. #4985
    • Remove typing module from setup.py install_requirements. #4971
  • Java
    • Allow users to set JVM options at actor creation time. #4970
  • Internal
    • Refactor IDs: DriverID -> JobID, change all ID functions to camel case. #4964, #4896
    • Improve organization of directory structure. #4898
  • Performance
    • Get task object dependencies in parallel from object store. #4775
    • Flush lineage cache on task submission instead of execution. #4942
    • Remove debug check for uncommitted lineage. #5038

Tune

  • Add directional metrics for components. #4120, #4915
  • Disallow setting resources_per_trial when it is already configured. #4880
  • Make PBT Quantile fraction configurable. #4912

RLlib

  • Add QMIX mixer parameters to optimizer param list. #5014
  • Allow Torch policies access to full action input dict in extra_action_out_fn. #4894
  • Allow access to batches prior to postprocessing. #4871
  • Throw error if sample_async is used with pytorch for A3C. #5000
  • Patterns & User Experience
    • Rename PolicyEvaluator => RolloutWorker. #4820
    • Port remainder of algorithms to build_trainer() pattern. #4920
    • Port DQN to build_tf_policy() pattern. #4823
  • Documentation
    • Add docs on how to use TF eager execution. #4927
    • Add preprocessing example to offline documentation. #4950

Other Libraries

  • Add support for distributed training with PyTorch. #4797, #4933
  • Autoscaler will kill workers on exception. #4997
  • Fix handling of non-integral timeout values in signal.receive. #5002

Thanks

We thank the following contributors for their amazing contributions: @jiangzihao2009, @raulchen, @ericl, @hershg, @kfstorm, @kiddyboots216, @jovany-wang, @pschafhalter, @richardliaw, @robertnishihara, @stephanie-wang, @simon-mo, @zhijunfu, @ls-daniel, @ajgokhale, @rueberger, @suquark, @guoyuhong, @jovany-wang, @pcmoritz, @hartikainen, @timonbimon, @TianhongDai

ray-0.7.1

23 Jun 21:35

Core

  • Change global state API. #4857
    • ray.global_state.client_table() -> ray.nodes()
    • ray.global_state.task_table() -> ray.tasks()
    • ray.global_state.object_table() -> ray.objects()
    • ray.global_state.chrome_tracing_dump() -> ray.timeline()
    • ray.global_state.cluster_resources() -> ray.cluster_resources()
    • ray.global_state.available_resources() -> ray.available_resources()
  • Export remote functions lazily. #4898
  • Begin moving worker code to C++. #4875, #4899, #4898
  • Upgrade arrow to latest master. #4858
  • Upload wheels to S3 under <branch-name>/<commit-id>. #4949
  • Add hash table to Redis-Module. #4911
  • Initial support for distributed training with PyTorch. #4797
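
The global state renames listed above lend themselves to a mechanical rewrite of existing scripts. The following is a rough sketch under the assumption that a plain text substitution is sufficient; it does not handle aliased imports or any change in arguments or return values, and migrate_source is a hypothetical helper.

```python
import re

# Old ray.global_state calls and their replacements, taken from the list above.
RENAMES = {
    "ray.global_state.client_table": "ray.nodes",
    "ray.global_state.task_table": "ray.tasks",
    "ray.global_state.object_table": "ray.objects",
    "ray.global_state.chrome_tracing_dump": "ray.timeline",
    "ray.global_state.cluster_resources": "ray.cluster_resources",
    "ray.global_state.available_resources": "ray.available_resources",
}

# Word boundaries avoid rewriting longer names that merely contain an old one.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(old) for old in RENAMES) + r")\b"
)


def migrate_source(source):
    """Rewrite old global-state calls in a string of Python source."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(0)], source)


print(migrate_source("nodes = ray.global_state.client_table()"))
```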

Tune

  • Disallow setting resources_per_trial when it is already configured. #4880
  • Initial experiment tracking support. #4362

RLlib

  • Begin deprecating Python 2 support in RLlib. #4832
  • TensorFlow 2 compatibility. #4802
  • Allow Torch policies access to full action input dict in extra_action_out_fn. #4894
  • Allow access to batches prior to postprocessing. #4871
  • Port algorithms to build_trainer() pattern. #4823
  • Rename PolicyEvaluator -> RolloutWorker. #4820
  • Rename PolicyGraph -> Policy, move from evaluation/ to policy/. #4819
  • Support continuous action distributions in IMPALA/APPO. #4771

(Revision: 6/23/2019 - Accidentally included commits that were not part of the release.)

ray-0.7.0

18 May 22:13

Core

  • Backend bug fixes. #4766, #4763, #4605
  • Add experimental API for creating resources at runtime. #3742

Tune

RLlib

  • Remove dependency on TensorFlow. #4764
  • TD3/DDPG improvements and MuJoCo benchmarks. #4694
  • Evaluation mode implementation for rllib.Trainer class. #4647
  • Replace ray.get() with ray_get_and_free() to automatically free object store memory. #4586
  • RLlib bug fixes. #4736, #4735, #4652, #4630

Autoscaler

  • Add an aggressive autoscaling flag. #4285
  • Autoscaler bug fixes. #4782, #4653

ray-0.6.6

19 Apr 05:47

Core

  • Add delete_creating_tasks option for internal.free() #4588

Tune

  • Add filter flag for Tune CLI. #4337
  • Better handling of tune.function in global checkpoint. #4519
  • Add compatibility to nevergrad 0.2.0+. #4529
  • Add --columns flag for CLI. #4564
  • Add checkpoint eraser. #4490
  • Fix checkpointing for Gym types. #4619

RLlib

  • Report sampler performance metrics. #4427
  • Ensure stats are consistently reported across all algos. #4445
  • Cleanup TFPolicyGraph. #4478
  • Make batch timeout for remote workers tunable. #4435
  • Fix inconsistent weight assignment operations in DQNPolicyGraph. #4504
  • Add support for LR schedule to DQN/APEX. #4473
  • Add option for RNN state and value estimates to span episodes. #4429
  • Create a combination of ExternalEnv and MultiAgentEnv, called ExternalMultiAgentEnv. #4200
  • Support prev_state/prev_action in rollout and fix multiagent. #4565
  • Support torch device and distributions. #4553

Java

  • TestNG outputs more verbose error messages. #4507
  • Implement GcsClient. #4601
  • Avoid unnecessary memory copy and add a benchmark. #4611

Autoscaler

  • Add support for separate docker containers on head and worker nodes. #4537
  • Add an aggressive autoscaling flag. #4285

ray-0.6.5

25 Mar 21:18

Core

  • Build system fully converted to Bazel. #4284, #4280, #4281
  • Introduce a set data structure in the GCS. #4199
  • Make all arguments to _remote() optional. #4305
  • Improve object transfer latency by setting TCP_NODELAY on all TCP connections. #4318
  • Add beginning of experimental serving module. #4095
  • Remove Jupyter notebook based UI. #4301
  • Add ray timeline command-line command for dumping a Chrome trace. #4239

Tune

  • Add custom field for serializations. #4237
  • Begin adding Tune CLI. #3983, #4321, #4322
  • Add optimization to reuse actors. #4218
  • Add warnings if the Tune event loop gets clogged. #4353
  • Switch preferred API from tune.run_experiments to tune.run. #4234
  • Make the logging from the function API consistent and predictable. #4011

RLlib

  • Breaking: Flip sign of entropy coefficient in A2C and Impala. #4374
  • Add option to continue training even if some workers crash. #4376
  • Add asynchronous remote workers. #4253
  • Add callback accessor for raw observations. #4212

Java

  • Improve single-process mode. #4245, #4265
  • Package native dependencies into jar. #4367
  • Initial support for calling Python functions from Java. #4166

Autoscaler

  • Restore error messages for setup errors. #4388

Known Issues

  • Object broadcasts on large clusters are inefficient. #2945

ray-0.6.4

06 Mar 01:03

Breaking

  • Removed redirect_output and redirect_worker_output from ray.init, removed deprecated _submit method. #4025
  • Move TensorFlowVariables to ray.experimental.tf_utils. #4145

Core

  • Stream worker logging statements to driver by default. #3892
  • Added experimental ray signaling mechanism; see the documentation. #3624
  • Make Bazel the default build system. #3898
  • Preliminary experimental streaming API for Python. #4126
  • Added web dashboard for monitoring node resource usage. #4066
  • Improved propagation of backend errors to user. #4039
  • Many improvements for the Java frontend. #3687, #3978, #4014, #3943, #3839, #4038, #4039, #4063, #4100, #4179, #4178
  • Support for dataclass serialization. #3964
  • Implement actor checkpointing. #3839
  • First steps toward cross-language invocations. #3675
  • Better defaults for Redis memory usage. #4152

Tune

  • Breaking: Introduce ability to turn off default logging. Deprecates custom_loggers. #4104
  • Support custom resources. #2979
  • Add initial parameter suggestions for HyperOpt. #3944
  • Add scipy-optimize to Tune. #3924
  • Add Nevergrad. #3985
  • Add number of trials to the trial runner logger. #4068
  • Support RESTful API for the webserver. #4080
  • Local mode support. #4138
  • Dynamic resources for trials. #3974

RLlib

  • Basic infrastructure for off-policy estimation. #3941
  • Add simplex action space and Dirichlet action distribution. #4070
  • Exploration with parameter space noise. #4048
  • Custom supervised loss API. #4083
  • Add torch policy gradient implementation. #3857

Autoscaler and Cluster Setup

  • Add docker run option (e.g. to support nvidia-docker). #3921

Modin

Known Issues

  • Object broadcasts on large clusters are inefficient. #2945
  • IMPALA is broken. #4329

ray-0.6.3

06 Mar 00:11
d2b6db3

Core

Tune

  • Support for BayesOpt. #3864
  • Support for SigOpt. #3844
  • Support executing infinite recovery retries for a trial. #3901
  • Support export_formats option to export policy graphs. #3868
  • Cluster and logging improvements. #3906

RLlib

  • Support for Asynchronous Proximal Policy Optimization (APPO). #3779
  • Support for MARWIL. #3635
  • Support for evaluation option in DQN. #3835
  • Bug fixes. #3865, #3810, #3938
  • Annotations for API stability. #3808

Autoscaler and Cluster Setup

Modin

Known Issues

  • Object broadcasts on large clusters are inefficient. #2945