docs: improve docs for tensorboard_timeout. (#1124)
Document default value, update TensorBoard how-to.
neilconway authored Aug 20, 2020
1 parent d34f0be commit 03c6727
Showing 3 changed files with 17 additions and 9 deletions.
16 changes: 11 additions & 5 deletions docs/how-to/tensorboard.txt
@@ -142,9 +142,15 @@ add a :class:`~determined.tensorpack.TFEventWriter` callback to your trial:
Lifecycle Management
--------------------

-Once a new TensorBoard instance has been scheduled onto the cluster, it
-will remain running until you explicitly terminate it. This can be done
-with ``det tensorboard kill <tensorboard-id>``:
+Determined will automatically terminate idle TensorBoard instances. A
+TensorBoard instance is considered idle if it is does not receive HTTP
+traffic (a TensorBoard that is still being viewed by a web browser will not be
+considered idle). By default, idle TensorBoards will be terminated after 5 minutes; the
+timeout duration can be changed by editing ``tensorboard_timeout`` in the
+:ref:`master config file <master-configuration>`.
+
+You can also terminate TensorBoard instances by hand using ``det tensorboard
+kill <tensorboard-id>``:

.. code::

@@ -159,5 +165,5 @@ Implementation Details

Determined schedules TensorBoard instances in containers that run on agent
machines. The Determined master will proxy HTTP requests to and from the
-TensorBoard container. Although TensorBoard instances are hosted on
-agent machines, they do not occupy GPUs.
+TensorBoard container. TensorBoard instances are hosted on agent machines but
+they do not occupy GPUs.
8 changes: 5 additions & 3 deletions docs/reference/cluster-config.txt
@@ -233,8 +233,10 @@ The master supports the following configuration settings:
- ``root``: Specifies the root directory of the state files. Defaults to
``/usr/share/determined/master``.

-- ``tensorboard_timeout``: Specifies the duration in seconds a TensorBoard
-instance can be idle before it is automatically killed.
+- ``tensorboard_timeout``: Specifies the duration in seconds before idle
+TensorBoard instances are automatically terminated. A TensorBoard instance is
+considered to be idle if it does not receive any HTTP traffic. The default
+timeout is ``300`` (5 minutes).

- ``provisioner``: Specifies the configuration of dynamic agents.

@@ -246,7 +248,7 @@ The master supports the following configuration settings:
``public-ipv4``, ``local-hostname``, or ``public-hostname``. If
the master is deployed on GCP, rather than hardcoding the IP
address, we advise you use one of the following to set the host as
-an alias: ``internal-ip`` or\ ``external-ip``. Which one you
+an alias: ``internal-ip`` or ``external-ip``. Which one you
should select is based on your network configuration. On master
startup, we will replace the above alias host with its real value.
Defaults to ``http`` as scheme, local IP address as host, and
2 changes: 1 addition & 1 deletion master/internal/command/tensorboard_manager.go
@@ -90,7 +90,7 @@ func (t *tensorboardManager) Receive(ctx *actor.Context) error {
}

if time.Now().After(service.LastRequested.Add(t.timeout)) {
-ctx.Log().Infof("Killing %s due to inactivity", boardSummary.Config.Description)
+ctx.Log().Infof("killing %s due to inactivity", boardSummary.Config.Description)
ctx.Ask(boardRef, &apiv1.KillTensorboardRequest{})
}
}