diff --git a/docs/release-notes.rst b/docs/release-notes.rst
index 7a3bedb5328..209c2ae4f43 100644
--- a/docs/release-notes.rst
+++ b/docs/release-notes.rst
@@ -10,6 +10,48 @@
 Version 0.26
 **************
 
+Version 0.26.4
+==============
+
+**Release Date:** November 17, 2023
+
+**Breaking Changes**
+
+- CLI: The CLI command to patch the master log config has been changed from ``det master config
+  --log --level --color `` to ``det master config set --log.level=
+  --log.color=``.
+
+**New Features**
+
+- Experiments: Add a ``log_policies`` configuration option to define actions when a trial's log
+  matches specified patterns.
+
+  - The ``exclude_node`` action prevents a failed trial's restart attempts (due to its
+    ``max_restarts`` policy) from being scheduled on nodes with matching error logs. This is
+    useful for bypassing nodes with hardware issues like uncorrectable GPU ECC errors.
+
+  - The ``cancel_retries`` action prevents a trial from restarting if a trial reports a log that
+    matches the pattern, even if it has remaining ``max_restarts``. This avoids using resources
+    for retrying a trial that encounters certain failures that won't be fixed by retrying the
+    trial, such as CUDA memory issues. For details, visit :ref:`experiment-config-reference` and
+    :ref:`master-config-reference`.
+
+  This option is also configurable at the cluster or resource pool level via task container
+  defaults.
+
+- CLI: Add a new CLI command ``det e delete-tb-files [Experiment ID]`` to delete local TensorBoard
+  files associated with a given experiment.
+
+**Improvements**
+
+- Update default environment images to Python 3.9 from Python 3.8.
+
+**Bug Fixes**
+
+- Users: Fix an issue where if a user's remote status was edited through ``det user edit
+  --remote=true``, that user could still log in using their username and password; they should only
+  be able to log in through IdP integrations.
+
 Version 0.26.3
 ==============
 
diff --git a/docs/release-notes/log-policies.rst b/docs/release-notes/log-policies.rst
deleted file mode 100644
index bd36df108ce..00000000000
--- a/docs/release-notes/log-policies.rst
+++ /dev/null
@@ -1,18 +0,0 @@
-:orphan:
-
-**New Features**
-
-- Experiments: Add a ``log_policies`` configuration option to define actions when a trial's log
-  matches specified patterns.
-
-  - The ``exclude_node`` action prevents a failed trial's restart attempts (due to its
-    max_restarts policy) from being scheduled on nodes with matched error logs. This is useful for
-    bypassing nodes with hardware issues like uncorrectable GPU ECC errors.
-
-  - The ``cancel_retries`` action prevents a trial from restarting if a trial reports a log that
-    matches the pattern, even if it has remaining max_restarts. This avoids using resources for
-    retrying a trial that encounters certain failures that won't be fixed by retrying the trial,
-    such as CUDA memory issues. For details, visit :ref:`experiment-config-reference` and
-    :ref:`master-config-reference`.
-
-This option is also configurable at the cluster or resource pool level via task container defaults.
diff --git a/docs/release-notes/patch_master_config_cli.rst b/docs/release-notes/patch_master_config_cli.rst
deleted file mode 100644
index 68c8ec6fd1e..00000000000
--- a/docs/release-notes/patch_master_config_cli.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-:orphan:
-
-**Breaking Change**
-
-- CLI: The old CLI command to patch master log config has been changed from ``det master config
-  --log --level --color `` to ``det master config set --log.level=
-  --log.color=``.
diff --git a/docs/release-notes/python-39-bump.rst b/docs/release-notes/python-39-bump.rst
deleted file mode 100644
index 14fe9d4deda..00000000000
--- a/docs/release-notes/python-39-bump.rst
+++ /dev/null
@@ -1,5 +0,0 @@
-:orphan:
-
-**Improvements**
-
-- Update default environment images to Python 3.9 from Python 3.8.
diff --git a/docs/release-notes/remote-was-able-to-login.rst b/docs/release-notes/remote-was-able-to-login.rst
deleted file mode 100644
index 2897c1d8ba7..00000000000
--- a/docs/release-notes/remote-was-able-to-login.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-:orphan:
-
-**Bug Fixes**
-
-- Users: Fix an issue where if a user's remote status was edited through ``det user edit
-  --remote=true`` that user could still login through their username and password while they were
-  expected to only be able to login through IDP integrations.
diff --git a/docs/release-notes/tensorboard-delete.rst b/docs/release-notes/tensorboard-delete.rst
deleted file mode 100644
index 46715d82dd8..00000000000
--- a/docs/release-notes/tensorboard-delete.rst
+++ /dev/null
@@ -1,6 +0,0 @@
-:orphan:
-
-**New Features**
-
-- CLI: Add a new CLI command ``det e delete-tb-files [Experiment ID]`` to delete local TensorBoard
-  files associated to a given experiment.
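
Reviewer note: the ``log_policies`` entry in the 0.26.4 notes describes pattern-to-action matching that may be easier to follow with a toy model. The sketch below is only an illustration of the described behavior, not Determined's implementation. The dictionary layout, the example patterns, and the helper names ``actions_for_log`` and ``should_retry`` are all hypothetical, and it assumes a pattern counts as matched if it is found anywhere in a log line.

```python
import re

# Hypothetical in-memory form of a ``log_policies`` list. The keys and the
# example patterns are assumptions for illustration, not a confirmed schema.
LOG_POLICIES = [
    {"pattern": r"uncorrectable ECC error", "action": "exclude_node"},
    {"pattern": r"CUDA out of memory", "action": "cancel_retries"},
]


def actions_for_log(line, policies=LOG_POLICIES):
    """Return the actions of every policy whose pattern is found in the line."""
    return [p["action"] for p in policies if re.search(p["pattern"], line)]


def should_retry(log_lines, restarts_used, max_restarts, policies=LOG_POLICIES):
    """Decide whether a failed trial may restart under this toy model."""
    for line in log_lines:
        if "cancel_retries" in actions_for_log(line, policies):
            # A matching log cancels retries even if restarts remain.
            return False
    # Otherwise fall back to the ordinary restart budget.
    return restarts_used < max_restarts
```

In this model a trial whose log contains ``CUDA out of memory`` is not retried even with restarts remaining, while an ECC error line only yields ``exclude_node``, leaving the retry itself to the restart budget.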