rST Fixes for Developer Docs (#8535)
* Update checkpoint.rst
* Update configs.rst
* Update intro.rst
* Update intro.rst
* Update neva.rst
* Update checkpoint.rst
* Update configs.rst
* Update controlnet.rst
* Update datasets.rst
* Update dreambooth.rst
* Update imagen.rst
* Update insp2p.rst
* Update sd.rst
* Update checkpoint.rst
* Update clip.rst
* Update mcore_customization.rst
* Update retro_model.rst
* Update migration-guide.rst
* Update nemo_forced_aligner.rst
* Update checkpoints.rst
* Update datasets.rst
* Update g2p.rst
* Update checkpoint.rst
* Update configs.rst
* Update datasets.rst
* Update vit.rst
* Update core.rst
* Update export.rst

---------

Signed-off-by: Andrew Schilling <[email protected]>
aschilling-nv authored Feb 28, 2024
1 parent 0796199 commit 5f95f50
Showing 27 changed files with 173 additions and 153 deletions.
3 changes: 1 addition & 2 deletions docs/source/core/core.rst
@@ -201,8 +201,7 @@ First, instantiate the model and trainer, then call ``.fit``:
# Or we can run the test loop on test data by calling
trainer.test(model=model)
-All `trainer flags <https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html#trainer-flags>`_ can be set from from the
-NeMo configuration.
+All `trainer flags <https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html#trainer-flags>`_ can be set from from the NeMo configuration.


Configuration
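As context for the hunk above: the trainer flags it mentions can be driven straight from a NeMo-style config. A minimal sketch, not part of this commit; the flag values are illustrative assumptions:

.. code-block:: python

    # Sketch: feeding Lightning trainer flags from an OmegaConf config.
    # The flag values below are illustrative, not recommendations.
    import pytorch_lightning as pl
    from omegaconf import OmegaConf

    cfg = OmegaConf.create(
        {
            "trainer": {
                "devices": 1,            # any documented trainer flag can live here
                "max_epochs": 10,
                "log_every_n_steps": 50,
            }
        }
    )

    trainer = pl.Trainer(**cfg.trainer)  # flags flow straight from the config
    # trainer.fit(model)                 # train, as in the docs above ...
    # trainer.test(model=model)          # ... or run the test loop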
3 changes: 3 additions & 0 deletions docs/source/core/export.rst
@@ -197,6 +197,7 @@ To facilitate that, the hooks below are provided. To export, for example, 'encod
Some nertworks may be exported differently according to user-settable options (like ragged batch support for TTS or cache support for ASR). To facilitate that - `set_export_config()` method is provided by Exportable to set key/value pairs to predefined model.export_config dictionary, to be used during the export:

.. code-block:: Python
+
    def set_export_config(self, args):
        """
        Sets/updates export_config dictionary
@@ -207,6 +208,7 @@ An example can be found in ``<NeMo_git_root>/nemo/collections/asr/models/rnnt_mo
Here is example on now `set_export_config()` call is being tied to command line arguments in ``<NeMo_git_root>/scripts/export.py`` :

.. code-block:: Python
+
    python scripts/export.py hybrid_conformer.nemo hybrid_conformer.onnx --export-config decoder_type=ctc
Exportable Model Code
@@ -217,6 +219,7 @@ Most importantly, the actual Torch code in your model should be ONNX or TorchScr
#. Create your model ``Exportable`` and add an export unit test, to catch any operation/construct not supported in ONNX/TorchScript, immediately.

For more information, refer to the PyTorch documentation:
+
- `List of supported operators <https://pytorch.org/docs/stable/onnx.html#supported-operators>`_
- `Tracing vs. scripting <https://pytorch.org/docs/stable/onnx.html#tracing-vs-scripting>`_
- `AlexNet example <https://pytorch.org/docs/stable/onnx.html#example-end-to-end-alexnet-from-pytorch-to-onnx>`_
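To make the ``set_export_config()`` flow above concrete: a hedged sketch tying the documented ``decoder_type=ctc`` option to the Python API. The checkpoint name comes from the example command line; restoring via ``ASRModel`` is an assumption:

.. code-block:: python

    # Sketch of the export-config flow described in export.rst.
    # Only "decoder_type=ctc" comes from the docs; the model class is assumed.
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.ASRModel.restore_from("hybrid_conformer.nemo")
    model.set_export_config({"decoder_type": "ctc"})  # updates model.export_config
    model.export("hybrid_conformer.onnx")             # export consults the config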
2 changes: 1 addition & 1 deletion docs/source/multimodal/mllm/checkpoint.rst
@@ -92,7 +92,7 @@ For conversion:
Model Parallelism Adjustment
--------------------------
+----------------------------

NeVA Checkpoints
^^^^^^^^^^^^^^^^
26 changes: 13 additions & 13 deletions docs/source/multimodal/mllm/configs.rst
@@ -123,19 +123,19 @@ Each configuration file should detail the model architecture used for the experi

The parameters commonly shared across most multimodal language models include:

-+---------------------------+--------------+---------------------------------------------------------------------------------------+
-| **Parameter** | **Datatype** | **Description** |
-+===========================+==============+=======================================================================================+
-| :code:`micro_batch_size` | int | micro batch size that fits on each GPU |
-+---------------------------+--------------+---------------------------------------------------------------------------------------+
-| :code:`global_batch_size` | int | global batch size that takes consideration of gradient accumulation, data parallelism |
-+---------------------------+--------------+---------------------------------------------------------------------------------------+
-| :code:`tensor_model_parallel_size` | int | intra-layer model parallelism |
-+---------------------------+--------------+---------------------------------------------------------------------------------------+
-| :code:`pipeline_model_parallel_size` | int | inter-layer model parallelism |
-+---------------------------+--------------+---------------------------------------------------------------------------------------+
-| :code:`seed` | int | seed used in training |
-+---------------------------+--------------+---------------------------------------------------------------------------------------+
++------------------------------------------+--------------+---------------------------------------------------------------------------------------+
+| **Parameter** | **Datatype** | **Description** |
++==========================================+==============+=======================================================================================+
+| :code:`micro_batch_size` | int | micro batch size that fits on each GPU |
++------------------------------------------+--------------+---------------------------------------------------------------------------------------+
+| :code:`global_batch_size` | int | global batch size that takes consideration of gradient accumulation, data parallelism |
++------------------------------------------+--------------+---------------------------------------------------------------------------------------+
+| :code:`tensor_model_parallel_size` | int | intra-layer model parallelism |
++------------------------------------------+--------------+---------------------------------------------------------------------------------------+
+| :code:`pipeline_model_parallel_size` | int | inter-layer model parallelism |
++------------------------------------------+--------------+---------------------------------------------------------------------------------------+
+| :code:`seed` | int | seed used in training |
++------------------------------------------+--------------+---------------------------------------------------------------------------------------+

NeVA
~~~~~~~~
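A companion sketch for the shared-parameter table above (not from the commit; every value is a placeholder):

.. code-block:: python

    # Sketch: the shared multimodal LM parameters from the table, as a config.
    from omegaconf import OmegaConf

    model_cfg = OmegaConf.create(
        {
            "micro_batch_size": 4,              # micro batch size that fits on each GPU
            "global_batch_size": 64,            # folds in grad accumulation and data parallelism
            "tensor_model_parallel_size": 2,    # intra-layer model parallelism
            "pipeline_model_parallel_size": 1,  # inter-layer model parallelism
            "seed": 1234,                       # seed used in training
        }
    )

    # Consistency implied by the table: the global batch is a multiple of
    # micro batch x data-parallel size (x gradient-accumulation steps).
    data_parallel_size = 8  # hypothetical: world_size / (TP x PP)
    assert model_cfg.global_batch_size % (
        model_cfg.micro_batch_size * data_parallel_size
    ) == 0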
6 changes: 3 additions & 3 deletions docs/source/multimodal/mllm/intro.rst
@@ -12,7 +12,7 @@ NeMo Multimodal currently supports the following models:
+-----------------------------------+----------+-------------+------+-------------------------+------------------+
| Model | Training | Fine-Tuning | PEFT | Evaluation | Inference |
+===================================+==========+=============+======+=========================+==================+
-| `NeVA (LLaVA) <./neva.html>`_ | | | - | - | |
+| `NeVA (LLaVA) <./neva.html>`_ | Yes | Yes | - | - | Yes |
+-----------------------------------+----------+-------------+------+-------------------------+------------------+
| Kosmos-2 | WIP | WIP | - | - | WIP |
+-----------------------------------+----------+-------------+------+-------------------------+------------------+
@@ -47,7 +47,7 @@ Flamingo :cite:`mm-models-flamingo` addresses inconsistent visual feature map si
- Dataset: Utilizes data from various datasets like M3W, ALIGN, LTIP, and VTP emphasizing multimodal in-context learning.

Kosmos-1: Language Is Not All You Need: Aligning Perception with Language Models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Kosmos-1 :cite:`mm-models-kosmos1` by Microsoft is a Multimodal Large Language Model (MLLM) aimed at melding language, perception, action, and world modeling.

@@ -108,4 +108,4 @@ References
:style: plain
:filter: docname in docnames
:labelprefix: MM-MODELS
-:keyprefix: mm-models-
\ No newline at end of file
+:keyprefix: mm-models-
11 changes: 6 additions & 5 deletions docs/source/multimodal/mllm/neva.rst
@@ -15,15 +15,15 @@ Building upon LLaVA's foundational principles, NeVA amplifies its training effic


Main Language Model
-^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^

The original LLaVA model incorporates the LLaMA architecture, renowned for its prowess in open-source, language-only instruction-tuning endeavors. LLaMA refines textual input through a process of tokenization and embedding. To these token embeddings, positional embeddings are integrated, and the combined representation is channeled through multiple transformer layers. The output from the concluding transformer layer, associated with the primary token, is designated as the text representation.

In NeMo, the text encoder is anchored in the :class:`~nemo.collections.nlp.models.language_modeling.megatron_gpt_model.MegatronGPTModel` class. This class is versatile, supporting not only NVGPT models but also LLaMA, LLaMA-2 and other community models, complete with a checkpoint conversion script. Concurrently, the vision model and projection layers enhance the primary language model's word embedding component. For a comprehensive understanding of the implementation, one can refer to the :class:`~nemo.collections.multimodal.models.multimodal_llm.neva.neva_model.MegatronNevaModel` class.


Vision Model
-^^^^^^^^^^
+^^^^^^^^^^^

For visual interpretation, NeVA harnesses the power of the pre-trained CLIP visual encoder, ViT-L/14, recognized for its visual comprehension acumen. Images are first partitioned into standardized patches, for instance, 16x16 pixels. These patches are linearly embedded, forming a flattened vector that subsequently feeds into the transformer. The culmination of the transformer's processing is a unified image representation. In the NeMo framework, the NeVA vision model, anchored on the CLIP visual encoder ViT-L/14, can either be instantiated via the :class:`~nemo.collections.multimodal.models.multimodal_llm.clip.megatron_clip_models.CLIPVisionTransformer` class or initiated through the `transformers` package from Hugging Face.

@@ -44,7 +44,7 @@ Architecture Table
+------------------+---------------+------------+--------------------+-----------------+------------+----------------+--------------------------+

Model Configuration
-------------------
+-------------------

Multimodal Configuration
^^^^^^^^^^^^^^^^^^^^^^^^
@@ -140,7 +140,8 @@ Optimizations
| BF16 O2 | Enables O2-level automatic mixed precision, optimizing Bfloat16 precision for better performance. | ``model.megatron_amp_O2=True`` |
+------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Flash Attention V2 | FlashAttention is a fast and memory-efficient algorithm to compute exact attention. It speeds up model training and reduces memory requirement by being IO-aware. This approach is particularly useful for large-scale models and is detailed further in the repository linked. [Reference](https://github.com/Dao-AILab/flash-attention) | ``model.use_flash_attention=True`` |
-+----------------------------------- +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
++------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+

NeVA Training
--------------
@@ -157,4 +158,4 @@ References
:style: plain
:filter: docname in docnames
:labelprefix: MM-MODELS
-:keyprefix: mm-models-
\ No newline at end of file
+:keyprefix: mm-models-
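The two optimization flags shown in the table above can be set together on a config; a minimal sketch (the YAML file names are hypothetical, only the flag paths come from the table):

.. code-block:: python

    # Sketch: enabling the documented NeVA optimizations on a loaded config.
    # "neva_config.yaml" is a hypothetical file name.
    from omegaconf import OmegaConf

    cfg = OmegaConf.load("neva_config.yaml")
    cfg.model.megatron_amp_O2 = True      # BF16 O2-level mixed precision
    cfg.model.use_flash_attention = True  # Flash Attention V2
    OmegaConf.save(cfg, "neva_config_tuned.yaml")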
8 changes: 4 additions & 4 deletions docs/source/multimodal/text2img/checkpoint.rst
@@ -12,7 +12,7 @@ Refer to the following sections for instructions and examples for each.
Note that these instructions are for loading fully trained checkpoints for evaluation or fine-tuning.

Loading ``.nemo`` Checkpoints
--------------------------
+-----------------------------

NeMo automatically saves checkpoints of a model that is trained in a ``.nemo`` format. Alternatively, to manually save the model at any
point, issue :code:`model.save_to(<checkpoint_path>.nemo)`.
@@ -27,7 +27,7 @@ If there is a local ``.nemo`` checkpoint that you'd like to load, use the :code:
Where the model base class is the MM model class of the original checkpoint.

Converting Intermediate Checkpoints
----------------------------
+-----------------------------------
To evaluate a partially trained checkpoint, you may need to convert it to ``.nemo`` format.
`script to convert the checkpoint <ADD convert_ckpt_to_nemo.py PATH>`.

@@ -43,7 +43,7 @@ To evaluate a partially trained checkpoint, you may need to convert it to ``.nem
Converting HuggingFace Checkpoints
----------------------------------
+----------------------------------

To fully utilize the optimized training pipeline and framework/TRT inference pipeline
of NeMo, we provide scripts to convert popular checkpoints on HuggingFace into NeMo format.
@@ -77,4 +77,4 @@ Imagen

We will provide conversion script if Imagen research team releases their checkpoint
in the future. Conversion script for DeepFloyd IF models will be provided in the
-next release.
\ No newline at end of file
+next release.
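A minimal sketch of the ``.nemo`` save/restore round trip this file describes; ``MyMMModel`` is a hypothetical stand-in for the concrete MM model class of the checkpoint, as the docs instruct:

.. code-block:: python

    # Sketch of the save_to()/restore_from() APIs named in checkpoint.rst.
    # "MyMMModel" is a hypothetical placeholder; substitute the MM model
    # class that produced your checkpoint.
    from nemo.collections.multimodal.models import MyMMModel  # hypothetical import

    model = MyMMModel.restore_from(restore_path="model.nemo")  # load a local .nemo file
    # ... evaluate or fine-tune ...
    model.save_to("model_updated.nemo")  # manually save at any point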
4 changes: 2 additions & 2 deletions docs/source/multimodal/text2img/configs.rst
@@ -90,7 +90,7 @@ for all possible arguments


Experiment Manager Configurations
----------------------------
+---------------------------------

NeMo Experiment Manager provides convenient way to configure logging, saving, resuming options and more.

@@ -145,7 +145,7 @@ By default we use ``fused_adam`` as the optimizer, refer to NeMo user guide for
Learning rate scheduler can be specified in ``optim.sched`` section.

Model Architecture Configurations
-------------------------
+---------------------------------

Each configuration file should describe the model architecture being used for the experiment.

4 changes: 2 additions & 2 deletions docs/source/multimodal/text2img/controlnet.rst
@@ -13,7 +13,7 @@ NeMo Multimodal provides a training pipeline and example implementation for gene


ControlNet Dataset
-____________________
+^^^^^^^^^^^^^^^^^^^^

ControlNet employs the WebDataset format for data ingestion. (See :doc:`Datasets<./datasets>`) Beyond the essential image-text pairs saved in tarfiles with matching names but distinct extensions (like 000001.jpg and 000001.txt), ControlNet also requires control input within the tarfiles, identifiable by their specific extension. By default, the control input should be stored as 000001.png for correct loading and identification in NeMo's implementation.

@@ -103,4 +103,4 @@ Reference
:style: plain
:filter: docname in docnames
:labelprefix: MM-MODELS
-:keyprefix: mm-models-
\ No newline at end of file
+:keyprefix: mm-models-
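The tarfile layout described in the ControlNet hunk (matching basenames, ``.png`` carrying the control input) can be produced with the ``webdataset`` package; a sketch with assumed input paths:

.. code-block:: python

    # Sketch: packing one ControlNet-style sample into a WebDataset shard.
    # The local paths are assumptions for illustration.
    import webdataset as wds

    with wds.TarWriter("controlnet-shard-000000.tar") as sink:
        sink.write(
            {
                "__key__": "000001",                          # shared basename
                "jpg": open("data/000001.jpg", "rb").read(),  # target image
                "txt": "an example caption",                  # text prompt
                "png": open("data/000001.png", "rb").read(),  # control input
            }
        )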
4 changes: 2 additions & 2 deletions docs/source/multimodal/text2img/datasets.rst
@@ -2,7 +2,7 @@ Datasets
========

Data pipeline overview
------------------
+----------------------

.. note:: It is the responsibility of each user to check the content of the dataset, review the applicable licenses, and determine if it is suitable for their intended use. Users should review any applicable links associated with the dataset before placing the data on their machine.

@@ -34,7 +34,7 @@ Instruction for configuring each sub-stage is provided as a comment next to each


Examples of Preparing a Dataset for Training Text2Img Model
------------------------
+-----------------------------------------------------------

Refer to the `Dataset Tutorial <http://TODOURL>`_` for details on how to prepare the training dataset for Training Text2Img models.

