docs: Release notes for 0.13.0. (#1146)
* docs: Release notes for 0.13.0.

* docs: minor fixes for release notes.

* docs: fixes for release notes.

* Release note tweaks.

* docs: release note fixes.

* docs: work on release notes.

* docs: release note tweaks.

Co-authored-by: Neil Conway
(cherry picked from commit 35b99ee)
determined-dsw committed Aug 26, 2020
1 parent 80a4dc2 commit 818c22c
Showing 2 changed files with 143 additions and 4 deletions.
8 changes: 4 additions & 4 deletions docs/reference/experiment-config.txt
@@ -1149,11 +1149,11 @@ the :ref:`training units<experiment-configuration_training_units>` records, batc
instead of steps.

This migration guide describes the steps to migrate your experiment configurations from v0.12.13 to
v0.13.0 while maintaining nearly identical behavior.

.. warning ::

   Before migrating, make sure to cancel or kill all experiments in the ``ACTIVE`` or ``PAUSED``
   state, as they will not be able to resume on the new version of the Determined master. Also,
   we recommend taking a database snapshot and archiving any old experiments ahead of time.

@@ -1212,4 +1212,4 @@ value for ``min_validation_period`` to ``{batches: 100*1}``, and change the valu
max_length:
  batches: 1000

Finally, we also rename ``batches_per_step`` to ``scheduling_unit``, leaving the value the same.
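
The arithmetic behind this migration can be sketched in Python. Each step-based value is multiplied by ``batches_per_step`` to get a number of batches, and ``batches_per_step`` is carried over unchanged as ``scheduling_unit``. This is a minimal sketch, not a tool that ships with Determined. The function ``migrate_config``, the dict representation of the config, and the fallback of 100 batches per step are assumptions. Only the settings named in the docstring are handled; ASHA and PBT settings such as ``target_trial_steps`` are not.

```python
from typing import Any, Dict

# Assumption for this sketch: the value used when a v0.12.x config omits
# batches_per_step. Check the value your own experiments actually used.
ASSUMED_DEFAULT_BATCHES_PER_STEP = 100


def migrate_config(old: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: rewrite step-based settings in batches.

    Handles only searcher.max_steps, min_validation_period,
    min_checkpoint_period, and batches_per_step.
    """
    new = dict(old)  # shallow copy; nested dicts are copied before editing
    bps = new.pop("batches_per_step", None)
    if bps is None:
        bps = ASSUMED_DEFAULT_BATCHES_PER_STEP
    else:
        new["scheduling_unit"] = bps  # renamed, value unchanged

    # Periods were counted in steps; multiply by batches_per_step.
    for key in ("min_validation_period", "min_checkpoint_period"):
        if key in new:
            new[key] = {"batches": new[key] * bps}

    if "searcher" in new:
        searcher = dict(new["searcher"])
        if "max_steps" in searcher:
            searcher["max_length"] = {"batches": searcher.pop("max_steps") * bps}
        new["searcher"] = searcher
    return new


old_config = {
    "batches_per_step": 100,
    "min_validation_period": 1,
    "searcher": {"name": "single", "metric": "validation_loss", "max_steps": 10},
}
new_config = migrate_config(old_config)
print(new_config["searcher"]["max_length"])  # {'batches': 1000}
print(new_config["min_validation_period"])  # {'batches': 100}
```

Because the input is copied before editing, the original v0.12.13 config stays available for comparison.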
139 changes: 139 additions & 0 deletions docs/release-notes.txt
@@ -3,6 +3,145 @@
Release Notes
=============

Version 0.13
------------

Version 0.13.0
^^^^^^^^^^^^^^

**Release Date:** August 20, 2020

This release of Determined introduces several significant new features and modifications to existing features. When upgrading from a prior release of Determined, users should pay particular attention to the following changes:

- The concept of "steps" has been removed from the CLI, WebUI, APIs, and configuration files. Before upgrading, **terminate all active and paused experiments** (e.g., via ``det experiment cancel`` or ``det experiment kill``). The format of the experiment configuration file has changed, so configuration files that worked with previous versions of Determined must be updated to work with Determined 0.13.0 and later. For more details, see the notes below or the :ref:`migration guide <migration-guide_remove-steps>`.

- The WebUI has been partially rewritten: several components previously implemented in Elm are now written in React and TypeScript. As part of this change, many improvements have been made to the performance, appearance, and usability of the WebUI; see the list of changes below for details. Please notify the Determined team of any regressions in functionality.

- The usability of the ``det shell`` feature has been significantly enhanced. As part of this change, the way in which arguments to ``det shell`` are parsed has changed; see details below.

*We recommend taking a backup of the database before upgrading Determined.*

**New Features**

- Allow trial containers to connect to the master using TLS.

- Allow agents to skip TLS verification of the master, or to verify the master using a custom certificate.

- For :class:`~determined.keras.TFKerasTrial` and :class:`~determined.estimator.EstimatorTrial`, add support for disabling automatic sharding of the training dataset during distributed training. When wrapping a dataset via ``context.wrap_dataset``, users can now pass ``shard_dataset=False``. In that case, users are responsible for splitting their dataset so that every GPU (rank) sees unique data.
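
When ``shard_dataset=False`` is passed, one common way to give each rank unique data is a strided split. Under this scheme, rank ``r`` of ``n`` keeps every ``n``-th record starting at offset ``r``, which is the same selection rule as TensorFlow's ``tf.data.Dataset.shard(num_shards, index)``. The plain-Python sketch below only illustrates that rule. ``shard_records`` is a hypothetical helper. In a real trial, the rank and the number of ranks would need to come from the trial's distributed context rather than being passed by hand.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_records(records: Sequence[T], num_ranks: int, rank: int) -> List[T]:
    """Hypothetical helper: keep every num_ranks-th record, starting at rank."""
    if not 0 <= rank < num_ranks:
        raise ValueError(f"rank {rank} out of range for {num_ranks} ranks")
    return list(records[rank::num_ranks])


# Four GPUs (ranks) splitting ten records: each rank gets a disjoint subset.
records = list(range(10))
for rank in range(4):
    print(rank, shard_records(records, num_ranks=4, rank=rank))
# 0 [0, 4, 8]
# 1 [1, 5, 9]
# 2 [2, 6]
# 3 [3, 7]
```

Together, the shards cover every record exactly once, which is the property the release note asks users to guarantee.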

**Improvements**

- **Remove Steps from the UX:** Remove the concept of a "step" from the CLI, WebUI, and configuration files. Add new configuration settings so that quantities previously expressed in steps can instead be expressed in records, batches, or epochs. See the :ref:`migration guide <migration-guide_remove-steps>` for details on migrating from the old configuration to the new configuration.

  - Many configuration settings can now be set in terms of records, batches, or epochs. For example, the ``single`` searcher can be configured to run for 100 records by setting ``max_length: {records: 100}``, for 100 batches by setting ``max_length: {batches: 100}``, or for 100 epochs by setting ``records_per_epoch`` at the root of the config and ``max_length: {epochs: 100}``.
  - A new configuration setting, ``records_per_epoch``, must be specified whenever any quantity is configured in terms of epochs.
  - **Breaking Change:** For single, random, and grid searchers, ``searcher.max_steps`` has been replaced by ``searcher.max_length``.
  - **Breaking Change:** For ASHA-based searchers, ``searcher.target_trial_steps`` and ``searcher.step_budget`` have been replaced by ``searcher.max_length`` and ``searcher.budget``, respectively.
  - **Breaking Change:** For PBT, ``searcher.steps_per_round`` has been replaced by ``searcher.length_per_round``.
  - **Breaking Change:** For all experiments, the names ``min_validation_period`` and ``min_checkpoint_period`` are unchanged, but these settings are now configured in terms of records, batches, or epochs.

- **Shell Mode Improvements:** Determined supports launching GPU-attached terminal sessions via ``det shell``. This release includes several changes to improve the usability of this feature, including:

  - The ``determined`` and ``determined-cli`` Python packages are now automatically installed inside containers launched by ``det shell``. Any user-defined environment variables for the task image are now passed into the SSH sessions opened via ``det shell start`` or ``det shell open``.

  - ``det shell`` should now work correctly in "host" networking mode.

  - ``det shell`` should now work correctly with dynamic agents and in cloud environments.

  - **Breaking Change:** Change how additional arguments to ``ssh`` are passed through ``det shell start`` and ``det shell open``. Previously, they were passed as a single string, as in ``det shell open SHELL_ID --ssh-opt '-X -Y -o SomeSetting="some string"'``. The ``--ssh-opt`` option has been removed. Instead, all extra positional arguments are passed through without a second layer of quoting, as in ``det shell open SHELL_ID -- -X -Y -o SomeSetting="some string"``. The ``--`` indicates that all following arguments are positional arguments.

- **WebUI changes**

  - Tasks List: ``/det/tasks``

    - Consolidate notebooks, TensorBoards, shells, and commands into a single list page.
    - Add a type filter to control which task types are displayed. When no types are selected, all task types are shown.
    - Add a type column with icons to help users recognize task types visually.
    - Convert the State filter from multi-select to single-select.
    - Convert actions from expanded buttons to an overflow menu (triple vertical dots).
    - Move notebook launch buttons from the notebook list page to the task list.
    - Add pagination, which turns on automatically when a list has more than 10 entries.
    - List TensorBoard sources in a Source column of the table.

  - Experiment List: ``/det/experiments``

    - Convert the State filter from multi-select to single-select.
    - Convert actions from expanded buttons to an overflow menu (triple vertical dots).
    - Make batch operations available if the action can be applied to any of the selected experiments.
    - Add pagination, which turns on automatically when a list has more than 10 entries.

  - Experiment Detail: ``/det/experiments/<id>``

    - Implement charting with Plotly, with zooming support.
    - Paginate the trial table on the WebUI side, in preparation for API pagination in the near future.
    - Convert steps to batches in the trials table and metric chart.
    - Update the continue-trial flow to use batches, epochs, or records.
    - Use the Monaco editor, with YAML syntax highlighting, for the experiment config.
    - Add source links to the Checkpoint modal view, allowing users to navigate to the corresponding experiment or trial for the checkpoint.

  - Trial Detail: ``/det/trials/<id>``

    - Add a trial information table.
    - Add a trial metrics chart.
    - Implement charting with Plotly, with zooming support.
    - Paginate the trial info table on the WebUI side, in preparation for API pagination in the near future.
    - Add support for batches, records, and epochs in the experiment config.
    - Convert the metric chart to show batches.
    - Convert the steps table to a batches table.

  - Master Logs: ``/det/logs``, Trial Logs: ``/det/trials/<id>/logs``, Task Logs: ``/det/<tasktype>/<id>/logs``

    - Limit logs to 1000 lines on initial load, and load an additional 1000 lines on each subsequent fetch of older logs.
    - Use a new log viewer optimized for efficient rendering.
    - Introduce log line numbers.
    - Add ANSI color support.
    - Add visual icons and colors for error, warning, and debug messages.
    - Add a tailing button to enable log tailing.
    - Add a scroll-to-top button for loading older logs.
    - Fix back-and-forth scrolling behavior in the log viewer.

  - Cluster: ``/det/cluster``

    - Show GPU and CPU resources separately.
    - Show resource availability and resource count (per type).
    - Render each resource as a donut chart.

  - Navigation

    - Update sidebar navigation for the new task and experiment list pages.
    - Add a link to the new Swagger API documentation.
    - Hide pagination controls for tables with fewer than 10 entries.
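
To make the records/batches/epochs settings described under **Remove Steps from the UX** concrete, the sketch below converts a length expressed in any of the three units into batches. It also shows why ``records_per_epoch`` is required whenever epochs are used: without it, an epoch count cannot be turned into records. The helper ``length_in_batches`` and its round-up behavior for a partial final batch are assumptions for illustration. Determined's own conversion and rounding rules may differ.

```python
from typing import Dict, Optional


def length_in_batches(
    length: Dict[str, int],
    global_batch_size: int,
    records_per_epoch: Optional[int] = None,
) -> int:
    """Hypothetical helper: convert {"records"|"batches"|"epochs": N} to batches."""
    if len(length) != 1:
        raise ValueError("length must specify exactly one unit")
    [(unit, value)] = length.items()
    if unit == "batches":
        return value
    if unit == "epochs":
        if records_per_epoch is None:
            # Mirrors the release note: epoch-based lengths need records_per_epoch.
            raise ValueError("records_per_epoch is required for epoch-based lengths")
        unit, value = "records", value * records_per_epoch
    if unit == "records":
        # Assumption: a trailing partial batch is rounded up to a whole batch.
        return (value + global_batch_size - 1) // global_batch_size
    raise ValueError(f"unknown unit: {unit!r}")


print(length_in_batches({"batches": 100}, global_batch_size=32))  # 100
print(length_in_batches({"records": 100}, global_batch_size=32))  # 4
print(length_in_batches({"epochs": 2}, global_batch_size=32, records_per_epoch=1000))  # 63
```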
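
The change to how ``det shell start`` and ``det shell open`` pass extra arguments to ``ssh`` can be illustrated with Python's standard ``argparse`` and ``shlex`` modules. The parser below is a hypothetical stand-in, not Determined's CLI code. It shows that the old ``--ssh-opt`` string needed a second round of shell-style splitting. By contrast, arguments after ``--`` arrive already split by the user's shell, so they can be forwarded to ``ssh`` unchanged.

```python
import argparse
import shlex
from typing import List

# Hypothetical stand-in for a `det shell open` parser; not Determined's code.
parser = argparse.ArgumentParser(prog="det shell open")
parser.add_argument("shell_id")
# Everything after `--` is treated as positional, even strings like "-X".
parser.add_argument("ssh_args", nargs="*")


def new_style(argv: List[str]) -> List[str]:
    return parser.parse_args(argv).ssh_args


def old_style(ssh_opt: str) -> List[str]:
    # The old --ssh-opt value was one string that had to be split again.
    return shlex.split(ssh_opt)


# argv as a POSIX shell delivers it for:
#   det shell open SHELL_ID -- -X -Y -o SomeSetting="some string"
argv = ["SHELL_ID", "--", "-X", "-Y", "-o", "SomeSetting=some string"]
print(new_style(argv))
print(old_style('-X -Y -o SomeSetting="some string"'))
```

Both calls produce the same argument list. The difference is that the new form lets the user's shell do the quoting once, instead of requiring a quoted string inside another quoted string.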

**Bug Fixes**

- Configuration: Do not load the entire experiment configuration when checking whether an experiment can be archived or unarchived.

- Configuration: Improve the master's validation of hyperparameter configurations when experiments are submitted. Currently, the master checks whether ``global_batch_size`` has been specified and whether it is numeric.

- Logs: Fix an issue where newlines in log messages, particularly Kubernetes log messages, were not detected.

- Logs: Add an intermediate step to trial log download that alerts the user that the CLI is the recommended way to download logs, especially large ones.

- Searchers: Fix a bug in the SHA searcher caused by the promotion of already-exited trials.

- Security: Apply user authentication to streaming endpoints.

- Tasks: Make the master certificate file readable by tasks that do not run as root.

- TensorBoard: Fix an issue affecting TensorBoards on AWS in the us-east-1 region.

- TensorBoard: Recursively search for tfevents files in subdirectories, not just the top level log directory.

- WebUI: Fix scrolling issue that occurs when older logs are loaded, the tailing behavior is enabled, and the view is scrolled up.

- WebUI: Fix colors used for different states in the cluster resources chart.

- WebUI: Correct the numbers in the ``Batches`` column on the experiment list page.

- WebUI: Fix cluster and dashboard reporting for disabled slots.

- WebUI: Fix an issue where the archive/unarchive actions did not show up properly under the task actions.
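
The hyperparameter check described under **Bug Fixes** requires ``global_batch_size`` to be present and numeric. It can be sketched as follows. The function ``validate_hyperparameters`` and the flat dict it accepts are hypothetical; this is not the master's actual validation code. It only handles plain values, not hyperparameters given as search ranges.

```python
from typing import Any, Dict, List


def validate_hyperparameters(hparams: Dict[str, Any]) -> List[str]:
    """Hypothetical check: return a list of problems (empty means valid)."""
    errors = []
    if "global_batch_size" not in hparams:
        errors.append("global_batch_size must be specified")
    else:
        value = hparams["global_batch_size"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append("global_batch_size must be numeric")
    return errors


print(validate_hyperparameters({"global_batch_size": 64}))  # []
print(validate_hyperparameters({"learning_rate": 0.1}))
print(validate_hyperparameters({"global_batch_size": "64"}))
```

Returning a list of messages, rather than raising on the first problem, lets a caller report every issue in a submitted configuration at once.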
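
The TensorBoard fix that searches subdirectories for event files amounts to a recursive search rather than a top-level listing. The sketch below uses Python's ``pathlib`` to show the difference. The function names and the ``*tfevents*`` filename pattern are assumptions for illustration, not Determined's implementation.

```python
import tempfile
from pathlib import Path
from typing import List


def find_event_files_top_level(logdir: Path) -> List[Path]:
    # Behavior before the fix, as described: only files directly in logdir.
    return sorted(p for p in logdir.glob("*tfevents*") if p.is_file())


def find_event_files_recursive(logdir: Path) -> List[Path]:
    # Behavior after the fix, as described: also descend into subdirectories.
    return sorted(p for p in logdir.rglob("*tfevents*") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "train").mkdir()
    (root / "events.out.tfevents.1.host").touch()
    (root / "train" / "events.out.tfevents.2.host").touch()
    print(len(find_event_files_top_level(root)))  # 1
    print(len(find_event_files_recursive(root)))  # 2
```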

Version 0.12
------------
