docs: various fixes for tf.keras docs.
* Clarify docs on epoch boundaries
* Fix epoch size in Fashion MNIST example
* Make wrapper requirements more clear
* Better linking to API reference.
neilconway committed Aug 19, 2020
1 parent 44fdf15 commit d59c66b
Showing 11 changed files with 87 additions and 65 deletions.
41 changes: 18 additions & 23 deletions docs/reference/api/keras.txt
@@ -8,7 +8,7 @@ determined.keras

.. autoclass:: determined.keras.TFKerasTrial
:members:
:exclude-members: trial_controller_class
:exclude-members: trial_controller_class, trial_context_class
:inherited-members:
:member-order: bysource
:special-members: __init__
@@ -20,7 +20,7 @@ Data Loading

There are five supported data types for loading data into ``tf.keras`` models:

#. A tuple ``(x, y)`` of Numpy arrays. x must be a Numpy array (or array-like),
#. A tuple ``(x, y)`` of Numpy arrays. x must be a NumPy array (or array-like),
a list of arrays (in case the model has multiple inputs), or
a dict mapping input names to the corresponding array, if the model has named inputs.
y should be a NumPy array.
@@ -34,12 +34,12 @@ There are five supported data types for loading data into ``tf.keras`` models:
#. A ``keras.utils.Sequence`` returning a tuple of either (inputs, targets) or
(inputs, targets, sample weights).

#. A ``det.keras.SequenceAdapter`` returning a tuple of either (inputs, targets) or
#. A :class:`determined.keras.SequenceAdapter` returning a tuple of either (inputs, targets) or
(inputs, targets, sample weights).

Loading data is done by defining ``build_training_data_loader`` and
``build_validation_data_loader`` functions. Each should return one of the
supported data types mentioned above.
Loading data is done by defining :meth:`~determined.keras.TFKerasTrial.build_training_data_loader` and
:meth:`~determined.keras.TFKerasTrial.build_validation_data_loader`
methods. Each should return one of the supported data types mentioned above.


Optimizing Keras Sequences
@@ -75,30 +75,25 @@ Determined provides ``determined.keras.SequenceAdapter``.
Required Wrappers
~~~~~~~~~~~~~~~~~

Users are required wrap their model prior to compiling it using the
:func:`self.context.wrap_model <determined.keras.TFKerasTrialContext.wrap_model>`.
This is typically done inside ``determined.keras.TFKerasTrial.build_model()``.

.. autofunction:: determined.keras.TFKerasTrialContext.wrap_model
:noindex:
Users are required to wrap their model prior to compiling it using
:meth:`self.context.wrap_model <determined.keras.TFKerasTrialContext.wrap_model>`.
This is typically done inside :meth:`~determined.keras.TFKerasTrial.build_model`.
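
As a rough illustration of the wrap-then-compile order described above, the body of ``build_model`` might look like the following sketch. The layer sizes, optimizer, and loss are assumptions chosen for a Fashion MNIST-style classifier; ``dense1`` is the hyperparameter name used in the example configs in this commit, ``build_wrapped_model`` is a hypothetical name, and ``context`` stands for the trial's ``TFKerasTrialContext``. TensorFlow is imported inside the function only so that the snippet can be loaded without it.

```python
def build_wrapped_model(context):
    """Sketch of a TFKerasTrial.build_model body; `context` is self.context."""
    # Imported lazily so that defining this function does not require TensorFlow.
    import tensorflow as tf

    model = tf.keras.Sequential(
        [
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            # "dense1" is read from the experiment's hyperparameters.
            tf.keras.layers.Dense(context.get_hparam("dense1"), activation="relu"),
            tf.keras.layers.Dense(10),
        ]
    )

    # Step 1: wrap the model before compiling it.
    model = context.wrap_model(model)

    # Step 2: compile the wrapped model (optimizer and loss are assumptions).
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return model
```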

If using ``tf.data.Dataset``, users are required to wrap both their training and
validation dataset in a Determined-provided wrapper. This wrapper is used to shard
the dataset for :ref:`multi-gpu-training`. For optimal performance, users should
wrap dataset immediately after creating it.

.. autofunction:: determined.keras.TFKerasContext.wrap_dataset
validation dataset using :meth:`self.context.wrap_dataset
<determined.keras.TFKerasTrialContext.wrap_dataset>`. This wrapper is used to
shard the dataset for :ref:`multi-gpu-training`. For optimal performance, users
should wrap a dataset immediately after creating it.
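
To make the idea of sharding concrete, the sketch below splits a list of records across workers so that each worker sees a disjoint slice. This is only a conceptual illustration in plain Python, not a description of what ``wrap_dataset`` does internally; ``shard_records``, ``num_workers``, and ``rank`` are hypothetical names.

```python
from typing import List, Sequence


def shard_records(records: Sequence[int], num_workers: int, rank: int) -> List[int]:
    """Return the records assigned to worker `rank` out of `num_workers`."""
    if not 0 <= rank < num_workers:
        raise ValueError("rank must be in [0, num_workers)")
    # Worker `rank` takes every num_workers-th record, starting at index `rank`.
    return list(records[rank::num_workers])


records = list(range(10))
shards = [shard_records(records, num_workers=4, rank=r) for r in range(4)]
# Every record lands in exactly one shard.
assert sorted(x for shard in shards for x in shard) == records
```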


Trial Context
~~~~~~~~~~~~~

``determined.keras.TFKerasTrialContext`` subclasses :class:`determined.TrialContext`.
It provides useful methods for writing ``Trial`` subclasses. It also provides
the model and dataset wrappers.
``determined.keras.TFKerasTrialContext`` is a sub-class of
:class:`determined.TrialContext` that provides useful methods for writing
``tf.keras`` trial definitions, as well as functions to wrap the model and dataset.

.. autoclass:: determined.keras.TFKerasTrialContext
:noindex:
:members: wrap_model, wrap_dataset
:member-order: bysource

@@ -117,14 +112,14 @@ Callbacks

To execute arbitrary Python code during the lifecycle of a ``TFKerasTrial``, implement the standard
Keras callback interface ``tf.keras.callbacks.Callback`` and supply them to the ``TFKerasTrial``
by implementing ``TFKerasTrial.keras_callbacks``.
by implementing :meth:`~determined.keras.TFKerasTrial.keras_callbacks`.

.. autofunction:: determined.keras.TFKerasTrial.keras_callbacks

Native
~~~~~~

Disregard if using the trial API (subclassing ``determined.keras.TFKerasTrial``).
Disregard if using the trial API (subclassing :class:`~determined.keras.TFKerasTrial`).

.. _keras-init:

7 changes: 7 additions & 0 deletions docs/reference/experiment-config.txt
@@ -902,6 +902,8 @@ customizing the trial environment, refer to :ref:`custom-env`.

**Optional Fields**

.. _exp-environment-image:

``image``
The Docker image to use when executing the workload. This image must be
accessible via ``docker pull`` to every Determined agent machine in the
@@ -1127,3 +1129,8 @@ model for 64 epochs.
max_length:
epochs: 64
smaller_is_better: true

The epoch size configured here is only used for interpreting configuration
fields that are expressed in epochs. Actual epoch boundaries are still
determined by the dataset itself (specifically, the end of an epoch occurs when
the training data loader runs out of records).
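
The distinction can be made concrete with a little arithmetic. The sketch below borrows values from the Fashion MNIST example config elsewhere in this commit (``records_per_epoch: 50000``, ``global_batch_size: 32``, ``epochs: 5``) and pairs them with a hypothetical dataset of 60,000 training records. The helper names and the use of floor division are assumptions, not Determined's documented rounding behavior.

```python
def configured_training_records(records_per_epoch: int, epochs: int) -> int:
    """Records implied by a searcher max_length expressed in epochs."""
    return records_per_epoch * epochs


def full_batches(num_records: int, global_batch_size: int) -> int:
    """Number of full batches in num_records (floor division is an assumption)."""
    return num_records // global_batch_size


total_records = configured_training_records(records_per_epoch=50000, epochs=5)
# Hypothetical dataset size, deliberately different from records_per_epoch.
batches_until_loader_exhausted = full_batches(60000, 32)
batches_per_configured_epoch = full_batches(50000, 32)

print(total_records)                   # 250000
print(batches_until_loader_exhausted)  # 1875
print(batches_per_configured_epoch)    # 1562
```

Under these assumptions the configured training length is 250,000 records, while an actual epoch boundary would occur after 1875 batches (when the data loader is exhausted) rather than after 1562.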
2 changes: 1 addition & 1 deletion docs/tutorials/pytorch-mnist-tutorial.txt
@@ -182,7 +182,7 @@ Training the Model

Now that we have ported our model code to the trial API, we can use Determined to train a single instance of the model or to do a hyperparameter search. In Determined, a :ref:`trial <concept-trial>` is a training task that consists of a dataset, a deep learning model, and values for all of the model's hyperparameters. An :ref:`experiment <concept-experiment>` is a collection of one or more trials: an experiment can either train a single model (with a single trial), or can define a search over a user-defined hyperparameter space.

To create an experiment, we start by writing a configuration file that defines the kind of experiment we want to run. In this case, we want to train a single model for a fixed number of epochs, using fixed values for the model's hyperparameters:
To create an experiment, we start by writing a configuration file that defines the kind of experiment we want to run. In this case, we want to train a single model for a single epoch, using fixed values for the model's hyperparameters:

.. code:: yaml

12 changes: 6 additions & 6 deletions docs/tutorials/tf-mnist-tutorial.txt
@@ -79,7 +79,7 @@ As with any Python class, the ``__init__`` method is invoked to construct our tr
Building the Model
""""""""""""""""""

The ``build_model`` method returns a compiled ``tf.keras.Model`` object. The Fashion MNIST model code uses the Keras Sequential API and we can continue to use that API in our implementation of ``build_model``. The only minor difference is that the model needs to be wrapped by calling :func:`self.context.wrap_model() <determined.keras.TFKerasTrialContext.wrap_model>` before it is compiled.
The :meth:`~determined.keras.TFKerasTrial.build_model` method returns a compiled ``tf.keras.Model`` object. The Fashion MNIST model code uses the Keras Sequential API and we can continue to use that API in our implementation of ``build_model``. The only minor difference is that the model needs to be wrapped by calling :meth:`self.context.wrap_model() <determined.keras.TFKerasTrialContext.wrap_model>` before it is compiled.

.. code:: python

@@ -102,7 +102,7 @@ The ``build_model`` method returns a compiled ``tf.keras.Model`` object. The Fas
Loading Data
""""""""""""

The last two methods we need to define are ``build_training_data_loader`` and ``build_validation_data_loader``. Determined uses these methods to load the training and validation datasets, respectively.
The last two methods we need to define are :meth:`~determined.keras.TFKerasTrial.build_training_data_loader` and :meth:`~determined.keras.TFKerasTrial.build_validation_data_loader`. Determined uses these methods to load the training and validation datasets, respectively.

Determined supports three ways of loading data into a ``tf.keras`` model: as a `tf.keras.utils.Sequence <https://www.tensorflow.org/api_docs/python/tf/keras/utils/Sequence>`__, a `tf.data.Dataset <https://www.tensorflow.org/api_docs/python/tf/data/Dataset>`__, or as a pair of NumPy arrays. Because the dataset is small, the Fashion MNIST model represents the data using NumPy arrays.

@@ -129,25 +129,25 @@ For more information on loading data in Determined, refer to the tutorial on :re
Training the Model
------------------

Now that we have ported our model code to the trial API, we can use Determined to train a single instance of the model or to do a hyperparameter search. In Determined, a :ref:`trial <concept-trial>` is a training task that consists of a dataset, a deep learning model, and values for all of the model's hyperparameters. An :ref:`experiment <concept-experiment>` is a collection of one or more trials: an experiment can either train a single model (with a single trial), or can define a search over a user-defined hyperparameter space.
Now that we have ported our model code to the trial API, we can use Determined to train a single instance of the model or to do a hyperparameter search. In Determined, a :ref:`trial <concept-trial>` is a training task that consists of a dataset, a deep learning model, and values for all of the model's hyperparameters. An :ref:`experiment <concept-experiment>` is a collection of one or more trials: an experiment can either train a single model (with a single trial), or it can perform a search over a user-defined hyperparameter space.

To create an experiment, we start by writing a configuration file which defines the kind of experiment we want to run. In this case, we want to train a single model for a fixed number of batches, using fixed values for the model's hyperparameters:
To create an experiment, we start by writing a configuration file which defines the kind of experiment we want to run. In this case, we want to train a single model for five epochs, using fixed values for the model's hyperparameters:

.. code:: yaml

description: fashion_mnist_keras_const
hyperparameters:
global_batch_size: 32
dense1: 128
records_per_epoch: 50_000
records_per_epoch: 50000
searcher:
name: single
metric: val_accuracy
max_length:
epochs: 5
entrypoint: model_def:FashionMNISTTrial

For this model, we have two hyperparameters: the size of the ``Dense`` layer and the batch size. We train the model on five epochs and should reach about 85% accuracy on the validation set, which mimics the original ``tf.keras`` implementation.
For this model, we have chosen two hyperparameters: the size of the ``Dense`` layer and the batch size. Training the model for five epochs should reach about 85% accuracy on the validation set, which matches the original ``tf.keras`` implementation.

The ``entrypoint`` specifies the name of the trial class to use. This is useful if our model code contains more than one trial class. In this case, we use an entrypoint of ``model_def:FashionMNISTTrial`` because our trial class is named ``FashionMNISTTrial`` and it is defined in a Python file named ``model_def.py``.
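
The ``module:class`` convention can be illustrated with a few lines of Python. This is a sketch of how such a string could be split, not Determined's actual loader; ``parse_entrypoint`` is a hypothetical helper.

```python
from typing import Tuple


def parse_entrypoint(entrypoint: str) -> Tuple[str, str]:
    """Split "module:ClassName" into its module and class parts."""
    module_name, sep, class_name = entrypoint.partition(":")
    if not sep or not module_name or not class_name:
        raise ValueError(f"expected 'module:ClassName', got {entrypoint!r}")
    return module_name, class_name


module_name, class_name = parse_entrypoint("model_def:FashionMNISTTrial")
print(module_name)  # model_def
print(class_name)   # FashionMNISTTrial
```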

@@ -5,7 +5,7 @@ hyperparameters:
type: int
minval: 32
maxval: 256
records_per_epoch: 28800
records_per_epoch: 50000
searcher:
name: adaptive_simple
metric: val_accuracy
2 changes: 1 addition & 1 deletion examples/official/trial/fashion_mnist_tf_keras/const.yaml
@@ -2,7 +2,7 @@ description: fashion_mnist_tf_keras_const
hyperparameters:
global_batch_size: 32
dense1: 128
records_per_epoch: 28800
records_per_epoch: 50000
searcher:
name: single
metric: val_accuracy
@@ -3,9 +3,8 @@ hyperparameters:
global_batch_size: 32
dense1: 128
resources:
# Use 16 GPUs to train the model.
slots_per_trial: 16
records_per_epoch: 28800
slots_per_trial: 8
records_per_epoch: 50000
searcher:
name: single
metric: val_accuracy
3 changes: 1 addition & 2 deletions examples/official/trial/mnist_pytorch/distributed.yaml
@@ -9,8 +9,7 @@ hyperparameters:
dropout1: 0.25
dropout2: 0.5
resources:
# Use 16 GPUs to train the model.
slots_per_trial: 16
slots_per_trial: 8
records_per_epoch: 50000
searcher:
name: single
7 changes: 4 additions & 3 deletions harness/determined/keras/_data.py
@@ -138,8 +138,9 @@ def __getitem__(self, index): # type: ignore

class SequenceAdapter:
"""
A class to assist to optimize performance of tf.keras.sequence and help
with restoring and saving iterators for a dataset.
A class that helps optimize the performance of loading data with
``tf.keras.utils.Sequence`` and helps with restoring and saving iterators for
a dataset.
"""

def __init__(
@@ -154,7 +155,7 @@ def __init__(
If you want these performance accelerations, please consider using a Sequence.
Args:
sequence: A tf.keras.utils.Sequence that holds the data.
sequence: A ``tf.keras.utils.Sequence`` that holds the data.
use_multiprocessing: If True, use process-based threading. If unspecified,
`use_multiprocessing` will default to False. Note that because this implementation
3 changes: 1 addition & 2 deletions harness/determined/keras/_tf_keras_context.py
@@ -168,7 +168,7 @@ def wrap_dataset(self, dataset: Any, shard_dataset: bool = True) -> Any:
This should be used to wrap ``tf.data.Dataset`` objects immediately after
they have been created. Users should use the output of this wrapper as the
new instance of their dataset. If users create multiple datasets (e.g.,
one for training and one for testing), users should wrap each dataset
one for training and one for validation), users should wrap each dataset
independently.
Args:
@@ -206,7 +206,6 @@ def wrap_model(self, model: Any) -> Any:
Args:
model: tf.keras.Model
"""
return self._wrap_model_with_train_fn(model, None)

68 changes: 45 additions & 23 deletions harness/determined/keras/_tf_keras_trial.py
@@ -589,22 +589,25 @@ def compute_validation_metrics(self) -> workload.Response:

class TFKerasTrial(det.Trial):
"""
``tf.keras`` trials are created by subclassing this abstract class.
Users must define all the abstract methods to create the deep
learning model associated with a specific trial, and to subsequently
train and evaluate it.
By default, experiments run with TensorFlow 1.x. To configure your trial to
use TensorFlow 2.x, set a TF 2.x image in the experiment configuration
(e.g. ``determinedai/environments:cuda-10.1-pytorch-1.4-tf-2.2-gpu-0.5.0``).
By default, trials using TF 2.x use eager execution and trials using TF
1.x do not. If you want to override the default, you must call the
appropriate function in the ``__init__``. For example, if you want to
disable eager execution while running a TF 2.x trial, call
``tf.compat.v1.disable_eager_execution`` at the top of your
``__init__`` function.
To implement a new ``tf.keras`` trial, subclass this class and
implement the abstract methods described below (:meth:`build_model`,
:meth:`build_training_data_loader`, and :meth:`build_validation_data_loader`).
In most cases you should provide a custom :meth:`__init__` method as well.
By default, experiments use TensorFlow 1.x. To configure your trial to use
TensorFlow 2.x, specify a TensorFlow 2.x image in the
:ref:`environment.image <exp-environment-image>` field of the experiment
configuration (e.g.,
``determinedai/environments:cuda-10.1-pytorch-1.4-tf-2.2-gpu-0.5.0``).
Trials default to using eager execution with TensorFlow 2.x but not with
TensorFlow 1.x. To override the default behavior, call the appropriate
function in your ``__init__`` method. For example, if you want to disable
eager execution while using TensorFlow 2.x, call
``tf.compat.v1.disable_eager_execution`` at the top of your ``__init__`` method.
For more information on writing ``tf.keras`` trial classes, refer to the
:ref:`tutorial <tf-mnist-tutorial>`.
"""

trial_controller_class = TFKerasTrialController
@@ -624,13 +627,19 @@ def __init__(self, context: keras.TFKerasTrialContext) -> None:
@abstractmethod
def build_model(self) -> tf.keras.models.Model:
"""
Defines the deep learning architecture associated with a trial. The
Returns the deep learning architecture associated with a trial. The
architecture might depend on the current values of the model's
hyperparameters, which can be accessed via :func:`context.get_hparam()
<determined.TrialContext.get_hparam>`. This function returns a
``tf.keras.Model`` object. Users *must* compile this model by calling
``model.compile()`` on the ``tf.keras.Model`` instance before it is
returned.
``tf.keras.Model`` object.
After constructing the ``tf.keras.Model`` object, users **must** do two
things before returning it:
1. Wrap the model using :meth:`context.wrap_model()
<determined.keras.TFKerasTrialContext.wrap_model>`.
2. Compile the model using ``model.compile()``.
"""
pass

@@ -659,6 +668,11 @@ def build_training_data_loader(self) -> keras.InputData:
5) A :class:`determined.keras.SequenceAdapter` returning a tuple of either
``(inputs, targets)`` or ``(inputs, targets, sample weights)``.
When using ``tf.data.Dataset``, you must wrap the dataset using
:meth:`determined.keras.TFKerasTrialContext.wrap_dataset`. This wrapper is used
to shard the dataset for distributed training. For optimal performance, users
should wrap a dataset immediately after creating it.
.. warning::
If you are using ``tf.data.Dataset``, Determined’s support for
automatically checkpointing the dataset does not currently work correctly.
@@ -691,6 +705,11 @@ def build_validation_data_loader(self) -> keras.InputData:
5) A :class:`determined.keras.SequenceAdapter` returning a tuple of either
(inputs, targets) or (inputs, targets, sample weights).
When using ``tf.data.Dataset``, you must wrap the dataset using
:meth:`determined.keras.TFKerasTrialContext.wrap_dataset`. This wrapper is used
to shard the dataset for distributed training. For optimal performance, users
should wrap a dataset immediately after creating it.
"""
pass

@@ -713,9 +732,12 @@ def keras_callbacks(self) -> List[tf.keras.callbacks.Callback]:
Determined training behavior.
.. note::
If a callback is supplied that has implemented `keras.callbacks.Callback.on_epoch_end
If you specify a Keras callback that uses the `on_epoch_begin
<https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/Callback#on_epoch_begin>`__
or `on_epoch_end
<https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/Callback#on_epoch_end>`__
, it will use the epoch length as determined by the length of the training dataset, not
the Determined configuration setting ``records_per_epoch`` of the associated experiment.
interfaces, epoch boundaries are determined by the length of the
training data set, not by the value of the Determined configuration
setting :ref:`records_per_epoch <config-records-per-epoch>`.
"""
return []
