docs: various fixes for tf.keras docs.

* Clarify docs on epoch boundaries
* Fix epoch size in Fashion MNIST example
* Make wrapper requirements more clear
* Better linking to API reference.
determined-ai · Aug 19, 2020 · d59c66b
1 parent 44fdf15

Showing 11 changed files with 87 additions and 65 deletions.
diff --git a/docs/reference/api/keras.txt b/docs/reference/api/keras.txt
@@ -8,7 +8,7 @@ determined.keras
 
 .. autoclass:: determined.keras.TFKerasTrial
     :members:
-    :exclude-members: trial_controller_class
+    :exclude-members: trial_controller_class, trial_context_class
     :inherited-members:
     :member-order: bysource
     :special-members: __init__
@@ -20,7 +20,7 @@ Data Loading
 
 There are five supported data types for loading data into ``tf.keras`` models:
 
-#. A tuple ``(x, y)`` of Numpy arrays. x must be a Numpy array (or array-like),
+#. A tuple ``(x, y)`` of NumPy arrays. x must be a NumPy array (or array-like),
    a list of arrays (in case the model has multiple inputs), or
    a dict mapping input names to the corresponding array, if the model has named inputs.
    y should be a numpy array.
@@ -34,12 +34,12 @@ There are five supported data types for loading data into ``tf.keras`` models:
 #. A ``keras.utils.Sequence`` returning a tuple of either (inputs, targets) or
    (inputs, targets, sample weights).
 
-#. A ``det.keras.SequenceAdapter`` returning a tuple of either (inputs, targets) or
+#. A :class:`determined.keras.SequenceAdapter` returning a tuple of either (inputs, targets) or
    (inputs, targets, sample weights).
 
-Loading data is done by defining ``build_training_data_loader`` and
-``build_validation_data_loader`` functions. Each should return one of the
-supported data types mentioned above.
+Loading data is done by defining :meth:`~determined.keras.TFKerasTrial.build_training_data_loader` and
+:meth:`~determined.keras.TFKerasTrial.build_validation_data_loader`
+methods. Each should return one of the supported data types mentioned above.
 
 
 Optimizing Keras Sequences
@@ -75,30 +75,25 @@ Determined provides ``determined.keras.SequenceAdapter``.
 Required Wrappers
 ~~~~~~~~~~~~~~~~~
 
-Users are required wrap their model prior to compiling it using the
-:func:`self.context.wrap_model <determined.keras.TFKerasTrialContext.wrap_model>`.
-This is typically done inside ``determined.keras.TFKerasTrial.build_model()``.
-
-.. autofunction:: determined.keras.TFKerasTrialContext.wrap_model
-    :noindex:
+Users are required to wrap their model prior to compiling it using
+:meth:`self.context.wrap_model <determined.keras.TFKerasTrialContext.wrap_model>`.
+This is typically done inside :meth:`~determined.keras.TFKerasTrial.build_model`.
 
 If using ``tf.data.Dataset``, users are required to wrap both their training and
-validation dataset in a Determined-provided wrapper. This wrapper is used to shard
-the dataset for :ref:`multi-gpu-training`. For optimal performance, users should
-wrap dataset immediately after creating it.
-
-.. autofunction:: determined.keras.TFKerasContext.wrap_dataset
+validation dataset using :meth:`self.context.wrap_dataset
+<determined.keras.TFKerasTrialContext.wrap_dataset>`. This wrapper is used to
+shard the dataset for :ref:`multi-gpu-training`. For optimal performance, users
+should wrap a dataset immediately after creating it.
 
 
 Trial Context
 ~~~~~~~~~~~~~
 
-``determined.keras.TFKerasTrialContext`` subclasses :class:`determined.TrialContext`.
-It provides useful methods for writing ``Trial`` subclasses. It also provides
-the model and dataset wrappers.
+``determined.keras.TFKerasTrialContext`` is a sub-class of
+:class:`determined.TrialContext` that provides useful methods for writing
+``tf.keras`` trial definitions, as well as functions to wrap the model and dataset.
 
 .. autoclass:: determined.keras.TFKerasTrialContext
-    :noindex:
     :members: wrap_model, wrap_dataset
     :member-order: bysource
 
@@ -117,14 +112,14 @@ Callbacks
 
 To execute arbitrary Python code during the lifecycle of a ``TFKerasTrial``, implement the standard
 Keras callback interface ``tf.keras.callbacks.Callbacks`` and supply them to the ``TFKerasTrial``
-by implementing ``TFKerasTrial.keras_callbacks``.
+by implementing :meth:`~determined.keras.TFKerasTrial.keras_callbacks`.
 
 .. autofunction:: determined.keras.TFKerasTrial.keras_callbacks
 
 Native
 ~~~~~~
 
-Disregard if using the trial API (subclassing ``determined.keras.TFKerasTrial``).
+Disregard if using the trial API (subclassing :class:`~determined.keras.TFKerasTrial`).
 
 .. _keras-init:
 

diff --git a/docs/reference/experiment-config.txt b/docs/reference/experiment-config.txt
@@ -902,6 +902,8 @@ customizing the trial environment, refer to :ref:`custom-env`.
 
 **Optional Fields**
 
+.. _exp-environment-image:
+
 ``image``
   The Docker image to use when executing the workload. This image must be
   accessible via ``docker pull`` to every Determined agent machine in the
@@ -1127,3 +1129,8 @@ model for 64 epochs.
     max_length:
       epochs: 64
     smaller_is_better: true
+
+The epoch size configured here is only used for interpreting configuration
+fields that are expressed in epochs. Actual epoch boundaries are still
+determined by the dataset itself (specifically, the end of an epoch occurs when
+the training data loader runs out of records).
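
The distinction added above can be illustrated with a little arithmetic. The following is a minimal sketch, not Determined's implementation: the helper names are hypothetical, the rounding (integer division) is an assumption, and the 60,000-record dataset size is an illustrative value. The other numbers (5 epochs, ``records_per_epoch: 50000``, ``global_batch_size: 32``) come from the Fashion MNIST tutorial config changed in this commit.

```python
def configured_batches(epochs: int, records_per_epoch: int, global_batch_size: int) -> int:
    """Batches implied by a length expressed in epochs.

    Assumption: the epoch count is converted to records via records_per_epoch
    and then floored to whole batches. The real rounding may differ.
    """
    return (epochs * records_per_epoch) // global_batch_size


def passes_over_dataset(batches: int, global_batch_size: int, dataset_records: int) -> float:
    """How many times the data loader is actually exhausted (true epoch boundaries)."""
    return batches * global_batch_size / dataset_records


# Values taken from the tutorial config in this commit.
batches = configured_batches(epochs=5, records_per_epoch=50000, global_batch_size=32)

# Hypothetical dataset size, chosen only to show a mismatch with records_per_epoch.
actual_passes = passes_over_dataset(batches, global_batch_size=32, dataset_records=60000)

print(batches)        # batches trained for "epochs: 5"
print(actual_passes)  # passes over the data as seen by epoch-based Keras callbacks
```

If the dataset length and ``records_per_epoch`` disagree, as in this sketch, callbacks such as ``on_epoch_end`` would fire on the data loader's boundaries rather than on the configured ones.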
diff --git a/docs/tutorials/pytorch-mnist-tutorial.txt b/docs/tutorials/pytorch-mnist-tutorial.txt
@@ -182,7 +182,7 @@ Training the Model
 
 Now that we have ported our model code to the trial API, we can use Determined to train a single instance of the model or to do a hyperparameter search. In Determined, a :ref:`trial <concept-trial>` is a training task that consists of a dataset, a deep learning model, and values for all of the model's hyperparameters. An :ref:`experiment <concept-experiment>` is a collection of one or more trials: an experiment can either train a single model (with a single trial), or can define a search over a user-defined hyperparameter space.
 
-To create an experiment, we start by writing a configuration file that defines the kind of experiment we want to run. In this case, we want to train a single model for a fixed number of epochs, using fixed values for the model's hyperparameters:
+To create an experiment, we start by writing a configuration file that defines the kind of experiment we want to run. In this case, we want to train a single model for a single epoch, using fixed values for the model's hyperparameters:
 
 .. code:: yaml
 

diff --git a/docs/tutorials/tf-mnist-tutorial.txt b/docs/tutorials/tf-mnist-tutorial.txt
@@ -79,7 +79,7 @@ As with any Python class, the ``__init__`` method is invoked to construct our tr
 Building the Model
 """"""""""""""""""
 
-The ``build_model`` method returns a compiled ``tf.keras.Model`` object. The Fashion MNIST model code uses the Keras Sequential API and we can continue to use that API in our implementation of ``build_model``. The only minor difference is that the model needs to be wrapped by calling :func:`self.context.wrap_model() <determined.keras.TFKerasTrialContext.wrap_model>` before it is compiled.
+The :meth:`~determined.keras.TFKerasTrial.build_model` method returns a compiled ``tf.keras.Model`` object. The Fashion MNIST model code uses the Keras Sequential API and we can continue to use that API in our implementation of ``build_model``. The only minor difference is that the model needs to be wrapped by calling :func:`self.context.wrap_model() <determined.keras.TFKerasTrialContext.wrap_model>` before it is compiled.
 
 .. code:: python
 
@@ -102,7 +102,7 @@ The ``build_model`` method returns a compiled ``tf.keras.Model`` object. The Fas
 Loading Data
 """"""""""""
 
-The last two methods we need to define are ``build_training_data_loader`` and ``build_validation_data_loader``. Determined uses these methods to load the training and validation datasets, respectively.
+The last two methods we need to define are :meth:`~determined.keras.TFKerasTrial.build_training_data_loader` and :meth:`~determined.keras.TFKerasTrial.build_validation_data_loader`. Determined uses these methods to load the training and validation datasets, respectively.
 
 Determined supports three ways of loading data into a ``tf.keras`` model: as a `tf.keras.utils.Sequence <https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence>`__, a `tf.data.Dataset <https://www.tensorflow.org/api_docs/python/tf/data/Dataset>`__, or as a pair of NumPy arrays. Because the dataset is small, the Fashion MNIST model represents the data using NumPy arrays.
 
@@ -129,25 +129,25 @@ For more information on loading data in Determined, refer to the tutorial on :re
 Training the Model
 ------------------
 
-Now that we have ported our model code to the trial API, we can use Determined to train a single instance of the model or to do a hyperparameter search. In Determined, a :ref:`trial <concept-trial>` is a training task that consists of a dataset, a deep learning model, and values for all of the model's hyperparameters. An :ref:`experiment <concept-experiment>` is a collection of one or more trials: an experiment can either train a single model (with a single trial), or can define a search over a user-defined hyperparameter space.
+Now that we have ported our model code to the trial API, we can use Determined to train a single instance of the model or to do a hyperparameter search. In Determined, a :ref:`trial <concept-trial>` is a training task that consists of a dataset, a deep learning model, and values for all of the model's hyperparameters. An :ref:`experiment <concept-experiment>` is a collection of one or more trials: an experiment can either train a single model (with a single trial), or it can perform a search over a user-defined hyperparameter space.
 
-To create an experiment, we start by writing a configuration file which defines the kind of experiment we want to run. In this case, we want to train a single model for a fixed number of batches, using fixed values for the model's hyperparameters:
+To create an experiment, we start by writing a configuration file which defines the kind of experiment we want to run. In this case, we want to train a single model for five epochs, using fixed values for the model's hyperparameters:
 
 .. code:: yaml
 
     description: fashion_mnist_keras_const
     hyperparameters:
         global_batch_size: 32
         dense1: 128
-    records_per_epoch: 50_000
+    records_per_epoch: 50000
     searcher:
         name: single
         metric: val_accuracy
         max_length:
           epochs: 5
     entrypoint: model_def:FashionMNISTTrial
 
-For this model, we have two hyperparameters: the size of the ``Dense`` layer and the batch size. We train the model on five epochs and should reach about 85% accuracy on the validation set, which mimics the original ``tf.keras`` implementation.
+For this model, we have chosen two hyperparameters: the size of the ``Dense`` layer and the batch size. Training the model for five epochs should reach about 85% accuracy on the validation set, which matches the original ``tf.keras`` implementation.
 
 The ``entrypoint`` specifies the name of the trial class to use. This is useful if our model code contains more than one trial class. In this case, we use an entrypoint of ``model_def:FashionMNISTTrial`` because our trial class is named ``FashionMNISTTrial`` and it is defined in a Python file named ``model_def.py``.
 

diff --git a/examples/official/trial/fashion_mnist_tf_keras/adaptive.yaml b/examples/official/trial/fashion_mnist_tf_keras/adaptive.yaml
@@ -5,7 +5,7 @@ hyperparameters:
     type: int
     minval: 32
     maxval: 256
-records_per_epoch: 28800
+records_per_epoch: 50000
 searcher:
   name: adaptive_simple
   metric: val_accuracy

diff --git a/examples/official/trial/fashion_mnist_tf_keras/const.yaml b/examples/official/trial/fashion_mnist_tf_keras/const.yaml
@@ -2,7 +2,7 @@ description: fashion_mnist_tf_keras_const
 hyperparameters:
   global_batch_size: 32
   dense1: 128
-records_per_epoch: 28800
+records_per_epoch: 50000
 searcher:
   name: single
   metric: val_accuracy

diff --git a/examples/official/trial/fashion_mnist_tf_keras/distributed.yaml b/examples/official/trial/fashion_mnist_tf_keras/distributed.yaml
@@ -3,9 +3,8 @@ hyperparameters:
   global_batch_size: 32
   dense1: 128
 resources:
-  # Use 16 GPUs to train the model.
-  slots_per_trial: 16
-records_per_epoch: 28800
+  slots_per_trial: 8
+records_per_epoch: 50000
 searcher:
   name: single
   metric: val_accuracy
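
As a rough sketch of what the change from 16 to 8 slots means for this config, the snippet below assumes the global batch is split evenly across slots and that the dataset is sharded evenly. Both are assumptions for illustration, and the variable names are hypothetical rather than part of the Determined API.

```python
# Values from distributed.yaml after this commit.
global_batch_size = 32
slots_per_trial = 8
records_per_epoch = 50000

# Assumption: each slot (GPU) processes an equal share of every global batch.
per_slot_batch_size = global_batch_size // slots_per_trial

# Assumption: sharding gives each slot an equal share of the records.
records_per_slot = records_per_epoch // slots_per_trial

print(per_slot_batch_size)  # records per batch on each slot
print(records_per_slot)     # records each slot sees per configured epoch

# For comparison, the previous setting of 16 slots under the same assumptions.
old_per_slot_batch_size = global_batch_size // 16
print(old_per_slot_batch_size)
```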

diff --git a/examples/official/trial/mnist_pytorch/distributed.yaml b/examples/official/trial/mnist_pytorch/distributed.yaml
@@ -9,8 +9,7 @@ hyperparameters:
   dropout1: 0.25
   dropout2: 0.5
 resources:
-  # Use 16 GPUs to train the model.
-  slots_per_trial: 16
+  slots_per_trial: 8
 records_per_epoch: 50000
 searcher:
   name: single

diff --git a/harness/determined/keras/_data.py b/harness/determined/keras/_data.py
@@ -138,8 +138,9 @@ def __getitem__(self, index):  # type: ignore
 
 class SequenceAdapter:
     """
-    A class to assist to optimize performance of tf.keras.sequence and help
-    with restoring and saving iterators for a dataset.
+    A class that helps optimize the performance of loading data with
+    ``tf.keras.utils.Sequence`` and helps with saving and restoring iterators
+    for a dataset.
     """
 
     def __init__(
@@ -154,7 +155,7 @@ def __init__(
         If you want these performance accelerations, please consider using a Sequence.
 
         Args:
-            sequence: A tf.keras.utils.Sequence that holds the data.
+            sequence: A ``tf.keras.utils.Sequence`` that holds the data.
 
             use_multiprocessing: If True, use process-based threading. If unspecified,
                 `use_multiprocessing` will default to False. Note that because this implementation

diff --git a/harness/determined/keras/_tf_keras_context.py b/harness/determined/keras/_tf_keras_context.py
@@ -168,7 +168,7 @@ def wrap_dataset(self, dataset: Any, shard_dataset: bool = True) -> Any:
         This should be used to wrap ``tf.data.Dataset`` objects immediately after
         they have been created. Users should use the output of this wrapper as the
         new instance of their dataset. If users create multiple datasets (e.g.,
-        one for training and one for testing), users should wrap each dataset
+        one for training and one for validation), users should wrap each dataset
         independently.
 
         Args:
@@ -206,7 +206,6 @@ def wrap_model(self, model: Any) -> Any:
 
         Args:
             model: tf.keras.Model
-
         """
         return self._wrap_model_with_train_fn(model, None)
 

diff --git a/harness/determined/keras/_tf_keras_trial.py b/harness/determined/keras/_tf_keras_trial.py
@@ -589,22 +589,25 @@ def compute_validation_metrics(self) -> workload.Response:
 
 class TFKerasTrial(det.Trial):
     """
-    ``tf.keras`` trials are created by subclassing this abstract class.
-
-    Users must define all the abstract methods to create the deep
-    learning model associated with a specific trial, and to subsequently
-    train and evaluate it.
-
-    By default, experiments run with TensorFlow 1.x. To configure your trial to
-    use TensorFlow 2.x, set a TF 2.x image in the experiment configuration
-    (e.g. ``determinedai/environments:cuda-10.1-pytorch-1.4-tf-2.2-gpu-0.5.0``).
-
-    By default, trials using TF 2.x use eager execution and trials using TF
-    1.x do not. If you want to override the default, you must call the
-    appropriate function in the ``__init__``. For example, if you want to
-    disable eager execution while running a TF 2.x trial, call
-    ``tf.compat.v1.disable_eager_execution`` at the top of your
-    ``__init__`` function.
+    To implement a new ``tf.keras`` trial, subclass this class and
+    implement the abstract methods described below (:meth:`build_model`,
+    :meth:`build_training_data_loader`, and :meth:`build_validation_data_loader`).
+    In most cases you should provide a custom :meth:`__init__` method as well.
+
+    By default, experiments use TensorFlow 1.x. To configure your trial to use
+    TensorFlow 2.x, specify a TensorFlow 2.x image in the
+    :ref:`environment.image <exp-environment-image>` field of the experiment
+    configuration (e.g.,
+    ``determinedai/environments:cuda-10.1-pytorch-1.4-tf-2.2-gpu-0.5.0``).
+
+    Trials default to using eager execution with TensorFlow 2.x but not with
+    TensorFlow 1.x. To override the default behavior, call the appropriate
+    function in your ``__init__`` method. For example, if you want to disable
+    eager execution while using TensorFlow 2.x, call
+    ``tf.compat.v1.disable_eager_execution`` at the top of your ``__init__`` method.
+
+    For more information on writing ``tf.keras`` trial classes, refer to the
+    :ref:`tutorial <tf-mnist-tutorial>`.
     """
 
     trial_controller_class = TFKerasTrialController
@@ -624,13 +627,19 @@ def __init__(self, context: keras.TFKerasTrialContext) -> None:
     @abstractmethod
     def build_model(self) -> tf.keras.models.Model:
         """
-        Defines the deep learning architecture associated with a trial.  The
+        Returns the deep learning architecture associated with a trial.  The
         architecture might depend on the current values of the model's
         hyperparameters, which can be accessed via :func:`context.get_hparam()
         <determined.TrialContext.get_hparam>`.  This function returns a
-        ``tf.keras.Model`` object. Users *must* compile this model by calling
-        ``model.compile()`` on the ``tf.keras.Model`` instance before it is
-        returned.
+        ``tf.keras.Model`` object.
+
+        After constructing the ``tf.keras.Model`` object, users **must** do two
+        things before returning it:
+
+          1. Wrap the model using :meth:`context.wrap_model()
+             <determined.keras.TFKerasTrialContext.wrap_model>`.
+
+          2. Compile the model using ``model.compile()``.
         """
         pass
 
@@ -659,6 +668,11 @@ def build_training_data_loader(self) -> keras.InputData:
             5) A :class:`determined.keras.SequenceAdapter` returning a tuple of either
             ``(inputs, targets)`` or ``(inputs, targets, sample weights)``.
 
+        When using ``tf.data.Dataset``, you must wrap the dataset using
+        :meth:`determined.keras.TFKerasTrialContext.wrap_dataset`. This wrapper is used
+        to shard the dataset for distributed training. For optimal performance, users
+        should wrap a dataset immediately after creating it.
+
         .. warning::
             If you are using ``tf.data.Dataset``, Determined’s support for
             automatically checkpointing the dataset does not currently work correctly.
@@ -691,6 +705,11 @@ def build_validation_data_loader(self) -> keras.InputData:
 
             5) A :class:`determined.keras.SequenceAdapter` returning a tuple of either
             (inputs, targets) or (inputs, targets, sample weights).
+
+        When using ``tf.data.Dataset``, you must wrap the dataset using
+        :meth:`determined.keras.TFKerasTrialContext.wrap_dataset`. This wrapper is used
+        to shard the dataset for distributed training. For optimal performance, users
+        should wrap a dataset immediately after creating it.
         """
         pass
 
@@ -713,9 +732,12 @@ def keras_callbacks(self) -> List[tf.keras.callbacks.Callback]:
         Determined training behavior.
 
         .. note::
-            If a callback is supplied that has implemented `keras.callbacks.Callback.on_epoch_end
+            If you specify a Keras callback that uses the `on_epoch_begin
+            <https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/Callback#on_epoch_begin>`__
+            or `on_epoch_end
             <https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/Callback#on_epoch_end>`__
-            , it will use the epoch length as determined by the length of the training dataset, not
-            the Determined configuration setting ``records_per_epoch`` of the associated experiment.
+            interfaces, epoch boundaries are determined by the length of the
+            training data set, not by the value of the Determined configuration
+            setting :ref:`records_per_epoch <config-records-per-epoch>`.
         """
         return []