docs: update experiment config reference for records_per_epoch in PyTorchTrials (#8185)

update docs to clarify records_per_epoch requirements
azhou-determined authored Oct 18, 2023
1 parent 1fb50cd commit bfd20f1
Showing 2 changed files with 27 additions and 24 deletions.
docs/reference/training/experiment-config-reference.rst (27 additions, 23 deletions)
```diff
@@ -26,13 +26,9 @@ Some configuration settings, such as searcher training lengths and budgets,
 training units: records, batches, or epochs.
 
 - ``records``: A *record* is a single labeled example (sometimes called a sample).
 
 - ``batches``: A *batch* is a group of records. The number of records in a batch is configured via
   the ``global_batch_size`` hyperparameter.
 
-- ``epoch``: An *epoch* is a single copy of the entire training data set; the number of records in
-  an epoch is configured via the :ref:`records_per_epoch <config-records-per-epoch>` configuration
-  field.
+- ``epoch``: An *epoch* is a single copy of the entire training data set.
 
 For example, to specify the ``max_length`` for a searcher in terms of batches, the configuration
 would read as shown below.
```
```diff
@@ -43,9 +39,10 @@ would read as shown below.
       batches: 900
 
 To express it in terms of records or epochs, ``records`` or ``epochs`` would be specified in place
-of ``batches``. In the case of epochs, :ref:`records_per_epoch <config-records-per-epoch>` must also
-be specified. Below is an example that configures a ``single`` searcher to train a model for 64
-epochs.
+of ``batches``. For :class:`~determined.pytorch.deepspeed.DeepSpeedTrial` and
+:class:`~determined.keras.TFKerasTrial`, :ref:`records_per_epoch <config-records-per-epoch>` must
+also be specified if using epochs. Below is an example that configures a ``single`` searcher to
+train a model for 64 epochs.
 
 .. code:: yaml
```
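The body of that YAML example is collapsed in this diff view. As a sketch only, a config matching the prose above might look like the following; the 64-epoch figure comes from the text, the `records_per_epoch` value is illustrative, and that field is only required for `DeepSpeedTrial` and `TFKerasTrial`:

```yaml
# Sketch; values other than epochs: 64 are illustrative.
records_per_epoch: 50000   # required only for DeepSpeedTrial / TFKerasTrial
searcher:
  name: single
  metric: validation_loss
  max_length:
    epochs: 64
```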
```diff
@@ -266,9 +263,10 @@ Optional. The number of records in the training data set. It must be configured
 specify ``min_validation_period``, ``min_checkpoint_period``, and ``searcher.max_length`` in units
 of ``epochs``.
 
-- The system does not attempt to determine the size of an epoch automatically, because the size of
-  the training set might vary based on data augmentation, changes to external storage, or other
-  factors.
+.. note::
+
+   For :class:`~determined.pytorch.PyTorchTrial`, epoch length is automatically determined using the
+   chief worker's dataset length, and this value will be ignored.
```
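A rough sketch of the bookkeeping this note implies, not Determined's actual implementation: for `PyTorchTrial` the epoch size comes from the chief worker's dataset length, while `DeepSpeedTrial` and `TFKerasTrial` need an explicit `records_per_epoch`; an epoch-based length then resolves to batches via `global_batch_size`. The helper name and signature here are hypothetical.

```python
from typing import Optional

def batches_per_epoch(global_batch_size: int,
                      records_per_epoch: Optional[int] = None,
                      chief_dataset_len: Optional[int] = None) -> int:
    """Resolve an epoch-based length to a batch count (hypothetical helper).

    chief_dataset_len models PyTorchTrial, where the epoch size is taken
    from the chief worker's dataset and any configured records_per_epoch
    is ignored; records_per_epoch models DeepSpeedTrial / TFKerasTrial,
    where it must be configured explicitly.
    """
    if chief_dataset_len is not None:
        records = chief_dataset_len      # PyTorchTrial path
    elif records_per_epoch is not None:
        records = records_per_epoch      # DeepSpeedTrial / TFKerasTrial path
    else:
        raise ValueError(
            "epoch-based lengths require records_per_epoch when the epoch "
            "size cannot be inferred from the dataset")
    # A partial final batch still counts as one batch (ceiling division).
    return -(-records // global_batch_size)
```

For example, 50,000 records at a global batch size of 64 resolve to 782 batches.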

.. _max-restarts:

```diff
@@ -301,8 +299,9 @@ Optional. Specifies the minimum frequency at which validation should be run for
    min_validation_period:
      epochs: 2
 
-- If this is in the unit of epochs, :ref:`records_per_epoch <config-records-per-epoch>` must be
-  specified.
+- :class:`~determined.pytorch.deepspeed.DeepSpeedTrial` and
+  :class:`~determined.keras.TFKerasTrial`: If this is in the unit of epochs,
+  :ref:`records_per_epoch <config-records-per-epoch>` must be specified.
```
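As a sketch, a `DeepSpeedTrial` or `TFKerasTrial` config that validates at least every 2 epochs would therefore pair the two fields; the `records_per_epoch` value here is illustrative:

```yaml
min_validation_period:
  epochs: 2

# Required because the period above is expressed in epochs
# (DeepSpeedTrial / TFKerasTrial only); illustrative value.
records_per_epoch: 10000
```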

.. _experiment-config-perform-initial-validation:

```diff
@@ -341,8 +340,9 @@ Optional. Specifies the minimum frequency for running checkpointing for each tri
    min_checkpoint_period:
      epochs: 2
 
-- If the unit is in epochs, you must also specify :ref:`records_per_epoch
-  <config-records-per-epoch>`.
+- :class:`~determined.pytorch.deepspeed.DeepSpeedTrial` and
+  :class:`~determined.keras.TFKerasTrial`: If the unit is in epochs, you must also specify
+  :ref:`records_per_epoch <config-records-per-epoch>`.
```

``checkpoint_policy``
=====================
```diff
@@ -799,8 +799,9 @@ Required. The length of the trial.
    max_length:
      epochs: 2
 
-- If this is in the unit of epochs, :ref:`records_per_epoch <config-records-per-epoch>` must be
-  specified.
+- :class:`~determined.pytorch.deepspeed.DeepSpeedTrial` and
+  :class:`~determined.keras.TFKerasTrial`: If this is in the unit of epochs,
+  :ref:`records_per_epoch <config-records-per-epoch>` must be specified.
```
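By contrast, for `PyTorchTrial` the same epoch-based `max_length` needs no `records_per_epoch`, since the epoch length is taken from the chief worker's dataset. A sketch, with the searcher fields illustrative:

```yaml
# PyTorchTrial: records_per_epoch is not needed (and would be ignored);
# the epoch length comes from the chief worker's dataset.
searcher:
  name: single
  metric: validation_loss
  max_length:
    epochs: 2
```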

**Optional Fields**

```diff
@@ -857,8 +858,9 @@ Required. The length of each trial.
    max_length:
      epochs: 2
 
-- If this is in the unit of epochs, :ref:`records_per_epoch <config-records-per-epoch>` must be
-  specified.
+- :class:`~determined.pytorch.deepspeed.DeepSpeedTrial` and
+  :class:`~determined.keras.TFKerasTrial`: If this is in the unit of epochs,
+  :ref:`records_per_epoch <config-records-per-epoch>` must be specified.
```

**Optional Fields**

```diff
@@ -913,8 +915,9 @@ Required. The length of each trial.
    max_length:
      epochs: 2
 
-- If this is in the unit of epochs, :ref:`records_per_epoch <config-records-per-epoch>` must be
-  specified.
+- :class:`~determined.pytorch.deepspeed.DeepSpeedTrial` and
+  :class:`~determined.keras.TFKerasTrial`: If this is in the unit of epochs,
+  :ref:`records_per_epoch <config-records-per-epoch>` must be specified.
```

**Optional Fields**

```diff
@@ -974,8 +977,9 @@ to converge on the data set.
    max_length:
      epochs: 2
 
-- If this is in the unit of epochs, :ref:`records_per_epoch <config-records-per-epoch>` must be
-  specified.
+- :class:`~determined.pytorch.deepspeed.DeepSpeedTrial` and
+  :class:`~determined.keras.TFKerasTrial`: If this is in the unit of epochs,
+  :ref:`records_per_epoch <config-records-per-epoch>` must be specified.
```

``max_trials``
--------------
docs/tutorials/pytorch-mnist-tutorial.rst (0 additions, 1 deletion)
```diff
@@ -276,7 +276,6 @@ fixed values for the model's hyperparameters:
   n_filters2: 64
   dropout1: 0.25
   dropout2: 0.5
-records_per_epoch: 50_000
 searcher:
   name: single
   metric: validation_loss
```
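After this change the tutorial config no longer sets `records_per_epoch` at all, consistent with the note that `PyTorchTrial` ignores it. A sketch of the resulting snippet, assuming the fixed values sit under a `hyperparameters` block as the surrounding text suggests (the full file contains more fields):

```yaml
hyperparameters:
  n_filters2: 64
  dropout1: 0.25
  dropout2: 0.5
searcher:
  name: single
  metric: validation_loss
```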
