docs: add log signal release note and update docs (#10126)
jgongd authored Oct 25, 2024
1 parent 02fcc74 commit c7e0fb5
Showing 6 changed files with 55 additions and 71 deletions.
47 changes: 30 additions & 17 deletions docs/reference/experiment-config-reference.rst
@@ -308,10 +308,11 @@ Optional. Defines actions and labels in response to trial logs matching specifie
 language syntax). For more information about the syntax, you can visit this `RE2 reference page
 <https://github.com/google/re2/wiki/Syntax>`__. Each log policy can have the following fields:

-- ``name``: Optional. A name for the log policy. If provided, this name will be displayed as a
-  label in the UI when the log policy matches.
+- ``name``: Required. The name of the log policy, displayed as a label in the WebUI when a log
+  policy match occurs.

-- ``pattern``: Required. The regex pattern to match in the logs.
+- ``pattern``: Optional. Defines a regex pattern to match log entries. If not specified, this
+  policy is disabled.

 - ``action``: Optional. The action to take when the pattern is matched. Actions include:

@@ -336,24 +337,36 @@ Example configuration:
 .. code:: yaml

    log_policies:
-     - name: "ECC Error"
-       pattern: ".*uncorrectable ECC error encountered.*"
-       action:
-         type: exclude_node
-     - name: "CUDA OOM"
-       pattern: ".*CUDA out of memory.*"
-       action:
-         type: cancel_retries
-
-When a log policy matches, its name (if provided) will be displayed as a label in the WebUI,
-allowing for easy identification of specific issues or events during a run. These labels will appear
-in both the run table and run detail views.
+     - name: ECC Error
+       pattern: ".*uncorrectable ECC error encountered.*"
+       action: exclude_node
+     - name: CUDA OOM
+       pattern: ".*CUDA out of memory.*"
+       action: cancel_retries
+
+When a log policy matches, its name appears as a label in the WebUI, making it easy to identify
+specific issues during a run. These labels are shown in both the run table and run detail views.

 These settings may also be specified at the cluster or resource pool level through task container
 defaults.

-To find out more about log management features like **Log Search** and **Log Signal**, visit
-:ref:`Log Management <log-management>`.
+Default policies:
+
+.. code:: yaml
+
+   log_policies:
+     - name: CUDA OOM
+       pattern: ".*CUDA out of memory.*"
+     - name: ECC Error
+       pattern: ".*uncorrectable ECC error encountered.*"
+
+To disable showing labels from the default policies:
+
+.. code:: yaml
+
+   log_policies:
+     - name: CUDA OOM
+     - name: ECC Error

 .. _log-retention-days:
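
Reviewer's illustration (not part of the commit): the semantics documented in this hunk can be sketched in a few lines of Python. This is only a sketch under stated assumptions. The helper names ``merge_policies`` and ``match_line`` are hypothetical, the idea that user policies override defaults by ``name`` is inferred from the "disable" example, and Python's ``re`` module stands in for RE2, which is equivalent only for simple patterns like the ones shown here.

```python
import re

# Default policies exactly as listed in the documentation above (label only, no action).
DEFAULT_POLICIES = [
    {"name": "CUDA OOM", "pattern": ".*CUDA out of memory.*"},
    {"name": "ECC Error", "pattern": ".*uncorrectable ECC error encountered.*"},
]


def merge_policies(defaults, overrides):
    """Overlay user policies on the defaults, keyed by name (assumed semantics)."""
    merged = {p["name"]: dict(p) for p in defaults}
    for policy in overrides:
        merged[policy["name"]] = dict(policy)
    return list(merged.values())


def match_line(policies, line):
    """Return (name, action) for each enabled policy whose pattern matches the line."""
    hits = []
    for policy in policies:
        pattern = policy.get("pattern")
        if not pattern:
            # Documented behavior: a policy without a pattern is disabled.
            continue
        if re.search(pattern, line):
            hits.append((policy["name"], policy.get("action")))
    return hits


# Hypothetical experiment config: keep the ECC label and add an action,
# and disable the default CUDA OOM label by giving only its name.
user_policies = [
    {
        "name": "ECC Error",
        "pattern": ".*uncorrectable ECC error encountered.*",
        "action": "exclude_node",
    },
    {"name": "CUDA OOM"},
]
policies = merge_policies(DEFAULT_POLICIES, user_policies)

print(match_line(policies, "GPU 0: uncorrectable ECC error encountered"))
print(match_line(policies, "RuntimeError: CUDA out of memory."))
```

With the overrides above, the first log line yields the ``ECC Error`` label together with the ``exclude_node`` action, while the second yields nothing because the name-only entry leaves ``CUDA OOM`` without a pattern.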

2 changes: 1 addition & 1 deletion docs/release-notes/log-search-improvement.rst
@@ -6,4 +6,4 @@
   search result will take users directly to the relevant position in the log, allowing them to
   easily view logs both before and after the matched entry. Additionally, add support for
   regex-based searches, providing more flexible log filtering. For more details, refer to
-  :ref:`log_policies <config-log-policies>`.
+  :ref:`WebUI <web-ui-if>`.
10 changes: 10 additions & 0 deletions docs/release-notes/log-signal.rst
@@ -0,0 +1,10 @@
+:orphan:
+
+**New Features**
+
+- Experiments: Add a ``name`` field to ``log_policies``. When a log policy matches, its name shows
+  as a label in the WebUI, making it easy to spot specific issues during a run. Labels appear in
+  both the run table and run detail views.
+
+  In addition, there is a new format: ``name`` is required, and ``action`` is now a plain string.
+  For more details, refer to :ref:`log_policies <config-log-policies>`.
14 changes: 14 additions & 0 deletions docs/tools/webui-if.rst
@@ -241,3 +241,17 @@ Clear the message with the following command:
 .. code:: bash

    det master cluster-message clear
+
+****************************
+ Viewing Log Search Results
+****************************
+
+To perform a log search:
+
+#. Navigate to your run in the WebUI.
+#. In the Logs tab, start typing in the search box to open the search pane.
+#. To use regex search, click the "Regex" checkbox in the search pane.
+#. Click on a search result to view it in context, with logs before and after visible.
+#. Scroll up and down to fetch new logs.
+
+Note: Search results are not auto-updating. You may need to refresh to see new logs.
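
Reviewer's illustration (not part of the commit): the search steps above combine two ideas, an optional regex mode and showing each hit with its neighboring log lines. The Python sketch below shows that concept only. ``search_logs`` and its parameters are hypothetical names, and nothing here reflects how the WebUI or the master actually implements search.

```python
import re


def search_logs(lines, query, use_regex=False, context=2):
    """Return one result per matching line, with `context` lines on each side."""
    # In plain-text mode the query is escaped so regex metacharacters match literally.
    pattern = re.compile(query if use_regex else re.escape(query))
    results = []
    for i, line in enumerate(lines):
        if pattern.search(line):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            results.append({"index": i, "context": lines[start:end]})
    return results


logs = [
    "step 1 loss=0.91",
    "step 2 loss=0.75",
    "RuntimeError: CUDA out of memory.",
    "trial exited with code 1",
]

# Regex mode: \w+ matches "memory"; one line of context on each side.
hits = search_logs(logs, r"CUDA out of \w+", use_regex=True, context=1)
print(hits[0]["index"], hits[0]["context"])
```

Because the result is computed from a snapshot of ``lines``, it does not change when new log lines arrive, which mirrors the note that search results are not auto-updating.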
1 change: 0 additions & 1 deletion docs/tutorials/_index.rst
@@ -46,7 +46,6 @@ Examples let you build off of an existing model that already runs on Determined.
    :hidden:

    Quickstart for Model Developers <quickstart-mdldev>
-   Managing Logs and Log Policies <log-management>
    Get Started with Detached Mode <detached-mode/_index>
    Viewing Epoch-Based Metrics in the WebUI <viewing-epoch-based-metrics>
    Using Pachyderm to Create a Batch Inferencing Pipeline <pachyderm-cat-dog>
52 changes: 0 additions & 52 deletions docs/tutorials/log-management.rst

This file was deleted.
