From 0a32f3031f4805c3f3e89ff6d645d02dacb5ff0c Mon Sep 17 00:00:00 2001 From: Justin Chen Date: Thu, 3 Sep 2020 10:35:21 -0700 Subject: [PATCH 1/2] docs: Release notes for 0.13.1 and 0.13.2. --- docs/release-notes.txt | 83 +++++++++++++++++++ .../1034-consolidate-main-nav.txt | 13 --- docs/release-notes/1112-fix-keras-wrap.txt | 6 -- .../1114-ensure-proper-wrapping.txt | 6 -- docs/release-notes/1116-sys-exit.txt | 5 -- .../1140-fix-archive-pagination.txt | 12 --- .../1147-dynamic-chart-range.txt | 8 -- docs/release-notes/1151-kubernetes.txt | 26 ------ .../release-notes/1161-custom-mult-select.txt | 9 -- .../1170-add-user-preferences.txt | 9 -- .../1171-has-validation-filter.txt | 11 --- .../1172-fix-dashboard-limit.txt | 5 -- .../1173-multiple-dtrain-single-node.txt | 12 --- .../1176-fix-adaptive-asha-progress.txt | 5 -- ...ix-prior-batches-processed-backfilling.txt | 7 -- .../1201-fix-loading-old-tensorboards.txt | 6 -- .../1213-set-default-session.txt | 5 -- .../1228-migrate-experiments-list-api.txt | 9 -- 18 files changed, 83 insertions(+), 154 deletions(-) delete mode 100644 docs/release-notes/1034-consolidate-main-nav.txt delete mode 100644 docs/release-notes/1112-fix-keras-wrap.txt delete mode 100644 docs/release-notes/1114-ensure-proper-wrapping.txt delete mode 100644 docs/release-notes/1116-sys-exit.txt delete mode 100644 docs/release-notes/1140-fix-archive-pagination.txt delete mode 100644 docs/release-notes/1147-dynamic-chart-range.txt delete mode 100644 docs/release-notes/1151-kubernetes.txt delete mode 100644 docs/release-notes/1161-custom-mult-select.txt delete mode 100644 docs/release-notes/1170-add-user-preferences.txt delete mode 100644 docs/release-notes/1171-has-validation-filter.txt delete mode 100644 docs/release-notes/1172-fix-dashboard-limit.txt delete mode 100644 docs/release-notes/1173-multiple-dtrain-single-node.txt delete mode 100644 docs/release-notes/1176-fix-adaptive-asha-progress.txt delete mode 100644 
docs/release-notes/1200-fix-prior-batches-processed-backfilling.txt delete mode 100644 docs/release-notes/1201-fix-loading-old-tensorboards.txt delete mode 100644 docs/release-notes/1213-set-default-session.txt delete mode 100644 docs/release-notes/1228-migrate-experiments-list-api.txt diff --git a/docs/release-notes.txt b/docs/release-notes.txt index 889f68fe5be..1202c753f0c 100644 --- a/docs/release-notes.txt +++ b/docs/release-notes.txt @@ -6,6 +6,89 @@ Release Notes Version 0.13 ------------ +Version 0.13.2 +^^^^^^^^^^^^^^ + +**Release Date:** September 3, 2020 + +**New Features** + +- Support deploying Determined on `Kubernetes `__. + + - Determined workloads run as a collection of pods, which allows standard Kubernetes tools for logging, metrics, and tracing to be used. Determined is compatible with Kubernetes >= 1.15, including managed Kubernetes services such as Google Kubernetes Engine (GKE) and AWS Elastic Kubernetes Service (EKS). + + - When using Determined with Kubernetes, we currently do not support + fair-share scheduling, priority scheduling, per-experiment weights, or + gang-scheduling for distributed training experiments; workloads will be + scheduled according to the behavior of the default Kubernetes scheduler. + + - Users can configure the behavior of the pods that are launched for + Determined workloads by specifying a :ref:`custom pod spec + `. A default pod spec can be configured when installing + Kubernetes, but a custom pod spec can also be specified on a per-task basis + (e.g., via the :ref:`environment.pod_spec ` field + in the experiment configuration file). + + - For more information on using Determined with Kubernetes, see the + :ref:`documentation `. + +- Support running multiple distributed training jobs on a single agent. + + - In previous versions of Determined, a distributed training job could only be + scheduled on an agent if it was configured to use all of the GPUs on that + agent. 
In this release, that restriction has been lifted: for example, an + agent with 8 GPUs can now be used to run two 4-GPU distributed training + jobs. This feature is particularly useful as a way to improve utilization + and fair resource allocation for smaller clusters. + +**Improvements** + +- WebUI: Update primary navigation. The primary navigation is now consolidated on one side and is collapsible to maximize content space. + +- WebUI: Trial details improvements: + + - Update metrics selector to show the number of metrics selected to improve readability. + + - Add the "Has Checkpoint or Validation" filter. + + - Persist the "Has Checkpoint or Validation" filter setting across all trials, and persist the "Metrics" filter on trials of the same experiment. + +- WebUI: Improve table pagination behavior. This will improve performance on Determined instances with many experiments. + +- WebUI: Persist the sort order and sort column for the experiments, tasks, and trials tables to local storage. + +- WebUI: Improve the default axis ranges for metrics charts. Also, update the range as new data points arrive. + +- Add a warning when the PyTorch LR scheduler incorrectly uses an unwrapped optimizer. When using PyTorch with Determined, LR schedulers should be constructed using an optimizer that has been wrapped via the :meth:`~determined.pytorch.PyTorchTrialContext.wrap_optimizer` method. + +- Add a reminder to remove ``sys.exit()`` if a ``SystemExit`` exception is caught. + +**Bug Fixes** + +- WebUI: Fix an issue where the recent task list did not apply the limit filter properly. + +- Fix Keras and Estimator wrapping functions not returning the original objects + when exporting checkpoints. + +- Fix progress reporting for ``adaptive_asha`` searches that contain failed trials. + +- Fix an issue that was causing OOM errors for some distributed ``EstimatorTrial`` experiments. 
+ +Version 0.13.1 +^^^^^^^^^^^^^^ + +**Release Date:** August 31, 2020 + +**Bug Fixes** + +- Database migration: Fix a bug with a database migration in Determined version 0.13.0 which caused it to run slowly and backfill incorrect values. Users on Determined versions 0.12.13 or earlier are advised to upgrade directly to version 0.13.1. Users already on version 0.13.0 should upgrade to version 0.13.1 as usual. + +- TensorBoard: Fix a bug that prevents TensorBoards from experiments with old experiment configuration versions from being loaded. + +- WebUI: Fix an API response decoding issue on React where a null checkpoint resource was unhandled and could prevent the trial detail page from rendering. + +- WebUI: Fix an issue where terminated TensorBoard and notebook tasks were rendered as openable. + Version 0.13.0 ^^^^^^^^^^^^^^ diff --git a/docs/release-notes/1034-consolidate-main-nav.txt b/docs/release-notes/1034-consolidate-main-nav.txt deleted file mode 100644 index 056bd2f06b7..00000000000 --- a/docs/release-notes/1034-consolidate-main-nav.txt +++ /dev/null @@ -1,13 +0,0 @@ -:orphan: - -**Improvements** - -- Update primary WebUI navigation. - - - Converge both the top navbar and the sidebar into a primary side navigation. - - Add support for collapsing the side navigation into a thin bar to maximize - real estate for content. - - Add animation to support smooth transition between the collapsed mode and - the expanded mode. - - Save the collapsed/expanded state of the navigation to be able to restore - the same state when revisting the app on the same browser. diff --git a/docs/release-notes/1112-fix-keras-wrap.txt b/docs/release-notes/1112-fix-keras-wrap.txt deleted file mode 100644 index 4d2d9023a9b..00000000000 --- a/docs/release-notes/1112-fix-keras-wrap.txt +++ /dev/null @@ -1,6 +0,0 @@ -:orphan: - -**Bug Fixes** - -- Fix Keras and Estimator wrapping functions not returning the original objects - when exporting checkpoints. 
diff --git a/docs/release-notes/1114-ensure-proper-wrapping.txt b/docs/release-notes/1114-ensure-proper-wrapping.txt deleted file mode 100644 index fcf623f2949..00000000000 --- a/docs/release-notes/1114-ensure-proper-wrapping.txt +++ /dev/null @@ -1,6 +0,0 @@ -:orphan: - -**Improvements** - -- Warn out if not properly refer Pytorch LR schedulers with a optimizer wrapped with - :meth:`determined.pytorch.PytorchTrialContext.wrap_optimizer`. diff --git a/docs/release-notes/1116-sys-exit.txt b/docs/release-notes/1116-sys-exit.txt deleted file mode 100644 index d759199d810..00000000000 --- a/docs/release-notes/1116-sys-exit.txt +++ /dev/null @@ -1,5 +0,0 @@ -:orphan: - -**Improvements** - -- Remind users to remove ``sys.exit()`` if ``SystemExit`` exception is caught. diff --git a/docs/release-notes/1140-fix-archive-pagination.txt b/docs/release-notes/1140-fix-archive-pagination.txt deleted file mode 100644 index e041b88cdba..00000000000 --- a/docs/release-notes/1140-fix-archive-pagination.txt +++ /dev/null @@ -1,12 +0,0 @@ -:orphan: - -**Bug Fixes** - -- Correct table pagination behavior. - - - Hide pagination when there are less than 10 items. - - Show pagination and page size picker when there are 10 or more items. - - Persist pagination and page size picker (both are strongly tied to each other - via Ant Design). This will ensure that the page size picker can stick around - to allow the user to change the page size even when the number of entries is - less than page size. diff --git a/docs/release-notes/1147-dynamic-chart-range.txt b/docs/release-notes/1147-dynamic-chart-range.txt deleted file mode 100644 index 157dd5d0629..00000000000 --- a/docs/release-notes/1147-dynamic-chart-range.txt +++ /dev/null @@ -1,8 +0,0 @@ -:orphan: - -**Improvements** - -- Upgrade metric charts to dynamically adjust x and y axis ranges to be max ranges dictated - by the data points plus 10% padding. 
-- Upgrade chart to change the range dynamically as new data points come in, except for when - the user has manually zoomed in. diff --git a/docs/release-notes/1151-kubernetes.txt b/docs/release-notes/1151-kubernetes.txt deleted file mode 100644 index f73bdb60c25..00000000000 --- a/docs/release-notes/1151-kubernetes.txt +++ /dev/null @@ -1,26 +0,0 @@ -:orphan: - -**New Features** - -- Support for `Kubernetes `__. - - - Determined can now be deployed on Kubernetes. Determined workloads run as a - collection of pods, which allows standard Kubernetes tools for logging, - metrics, and tracing to be used. Determined is compatible with Kubernetes >= - 1.15, including managed Kubernetes services such as Google Kubernetes Engine - (GKE) and AWS Elastic Kubernetes Service (EKS). - - - When using Determined with Kubernetes, we currently do not support - fair-share scheduling, priority scheduling, per-experiment weights, or - gang-scheduling for distributed training experiments; workloads will be - scheduled according the behavior of the default Kubernetes scheduler. - - - Users can configure the behavior of the pods that are launched for - Determined workloads by specifying a :ref:`custom pod spec - `. A default pod spec can be configured when installing - Kubernetes, but a custom pod spec can also be specified on a per-task basis - (e.g., via the :ref:`environment.pod_spec ` field - in the experiment configuration file). - - - For more information on using Determined with Kubernetes, see the - :ref:`documentation `. diff --git a/docs/release-notes/1161-custom-mult-select.txt b/docs/release-notes/1161-custom-mult-select.txt deleted file mode 100644 index f0884cca852..00000000000 --- a/docs/release-notes/1161-custom-mult-select.txt +++ /dev/null @@ -1,9 +0,0 @@ -:orphan: - -**Feature Upgrades** - -- Update trial details multiple metric picker to display # of metrics selected instead of a series of tags. 
- - - The list of tags can take up a significant amount screen space since the selector will grow to fit all - of the selected tags. Showing the total count of the selected metrics all the selector to stay small - and manageable. diff --git a/docs/release-notes/1170-add-user-preferences.txt b/docs/release-notes/1170-add-user-preferences.txt deleted file mode 100644 index 372e426e15c..00000000000 --- a/docs/release-notes/1170-add-user-preferences.txt +++ /dev/null @@ -1,9 +0,0 @@ -:orphan: - -**New Features** - -- Save more WebUI user preferences to local storage, for restoring UI states for future revisits. - - - Experiment List Page: Save selected table sort column and order. - - Task List Page: Save selected table sort column and order. - - Experiment Detail Page: Save selected table sort column and order. diff --git a/docs/release-notes/1171-has-validation-filter.txt b/docs/release-notes/1171-has-validation-filter.txt deleted file mode 100644 index 683db6c3850..00000000000 --- a/docs/release-notes/1171-has-validation-filter.txt +++ /dev/null @@ -1,11 +0,0 @@ -:orphan: - -**New Features** - -- Update the Has Checkpoint filter on the trial details table to Has Checkpoint Or Validation filter. - If the trial batches has a checkpoint or a validation, it will show up when the filter toggle is enabled. - -- Add the ability save filter states to user preferences. The Has Checkpoint Or Validation will get applied - across all trial detail pages, while the Metrics filter save on a per unique experiment trial. If the user - selects metrics X, Y and Z on trial 1 of experiment 1, when they visit trial 5 of experiment 1, the metrics - X, Y and Z will be selected by default. 
diff --git a/docs/release-notes/1172-fix-dashboard-limit.txt b/docs/release-notes/1172-fix-dashboard-limit.txt deleted file mode 100644 index 4edc7f00592..00000000000 --- a/docs/release-notes/1172-fix-dashboard-limit.txt +++ /dev/null @@ -1,5 +0,0 @@ -:orphan: - -**Bug Fixes** - -- Fix the issue where the Dashboard recent task list did not apply the limit filter properly diff --git a/docs/release-notes/1173-multiple-dtrain-single-node.txt b/docs/release-notes/1173-multiple-dtrain-single-node.txt deleted file mode 100644 index eef1e9879e8..00000000000 --- a/docs/release-notes/1173-multiple-dtrain-single-node.txt +++ /dev/null @@ -1,12 +0,0 @@ -:orphan: - -**New Features** - -- Support running multiple distributed training jobs on a single agent. - - - In previous versions of Determined, a distributed training job could only be - scheduled on an agent if it was configured to use all of the GPUs on that - agent. In this release, that restriction has been lifted: for example, an - agent with 8 GPUs can now be used to run two 4-GPU distributed training - jobs. This feature is particularly useful as a way to improve utilization - and fair resource allocation for smaller clusters. diff --git a/docs/release-notes/1176-fix-adaptive-asha-progress.txt b/docs/release-notes/1176-fix-adaptive-asha-progress.txt deleted file mode 100644 index 3cd49080d70..00000000000 --- a/docs/release-notes/1176-fix-adaptive-asha-progress.txt +++ /dev/null @@ -1,5 +0,0 @@ -:orphan: - -**Bug Fixes** - -- Fix ``adaptive_asha`` progress in the event of failed trials. 
diff --git a/docs/release-notes/1200-fix-prior-batches-processed-backfilling.txt b/docs/release-notes/1200-fix-prior-batches-processed-backfilling.txt deleted file mode 100644 index 16fac547353..00000000000 --- a/docs/release-notes/1200-fix-prior-batches-processed-backfilling.txt +++ /dev/null @@ -1,7 +0,0 @@ -:orphan: - -**Bug Fixes** - -- Fix a bug with a database migration in v0.13.0 which caused it to run slow and backfill - incorrect values. Users on versions v0.12.13 or earlier of Determined are recommended to upgrade - straight to v0.13.1. Users already on v0.13.0 should upgrade to v0.13.1 as usual. diff --git a/docs/release-notes/1201-fix-loading-old-tensorboards.txt b/docs/release-notes/1201-fix-loading-old-tensorboards.txt deleted file mode 100644 index 21c71508190..00000000000 --- a/docs/release-notes/1201-fix-loading-old-tensorboards.txt +++ /dev/null @@ -1,6 +0,0 @@ -:orphan: - -**Bug Fixes** - -- Fix a bug that prevents tensorboards from experiments with old experiment configurations versions - from being loaded. diff --git a/docs/release-notes/1213-set-default-session.txt b/docs/release-notes/1213-set-default-session.txt deleted file mode 100644 index 07b0f4c21a8..00000000000 --- a/docs/release-notes/1213-set-default-session.txt +++ /dev/null @@ -1,5 +0,0 @@ -:orphan: - -**Bug Fixes** - -- Fix issue that was causing OOM for some distributed EstimatorTrial experiments. diff --git a/docs/release-notes/1228-migrate-experiments-list-api.txt b/docs/release-notes/1228-migrate-experiments-list-api.txt deleted file mode 100644 index 791029dbd04..00000000000 --- a/docs/release-notes/1228-migrate-experiments-list-api.txt +++ /dev/null @@ -1,9 +0,0 @@ -:orphan: - -**BUG FIXES** - -- Migrate to new experiments list endpoint to support table pagination. - -**NEW FEATURES** - -- Add number of trials per experiment as a column in experiments list page. 
From 2c1e2b6c0b23aeb7e0c9ba1838b5f229c537b93d Mon Sep 17 00:00:00 2001 From: Justin Chen Date: Thu, 3 Sep 2020 10:47:43 -0700 Subject: [PATCH 2/2] Update docs/release-notes.txt Co-authored-by: Vishnu Mohan --- docs/release-notes.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/release-notes.txt b/docs/release-notes.txt index 1202c753f0c..2efa69ef001 100644 --- a/docs/release-notes.txt +++ b/docs/release-notes.txt @@ -57,7 +57,7 @@ Version 0.13.2 - WebUI: Persist the sort order and sort column for the experiments, tasks, and trials tables to local storage. -- WebUI: Improve the default axis ranges for metrics charts. Also, update the range as new data points arrive. +- WebUI: Improve the default axes' ranges for metrics charts. Also, update the range as new data points arrive. - Add a warning when the PyTorch LR scheduler incorrectly uses an unwrapped optimizer. When using PyTorch with Determined, LR schedulers should be constructed using an optimizer that has been wrapped via the :meth:`~determined.pytorch.PyTorchTrialContext.wrap_optimizer` method.