Skip to content

Commit

Permalink
Update readme (NVIDIA#8440)
Browse files Browse the repository at this point in the history
* update

Signed-off-by: eharper <[email protected]>

* udpate

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* landing pages added

* landing page added for vision

* landing pages updated

* some minor changes to the main readme

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* update

Signed-off-by: eharper <[email protected]>

* typo fixed

* update

Signed-off-by: eharper <[email protected]>

---------

Signed-off-by: eharper <[email protected]>
Co-authored-by: ntajbakhsh <[email protected]>
  • Loading branch information
2 people authored and HuiyingLi committed Feb 21, 2024
1 parent b32090a commit 1effe30
Show file tree
Hide file tree
Showing 12 changed files with 196 additions and 147 deletions.
132 changes: 47 additions & 85 deletions README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@

.. _main-readme:

**NVIDIA NeMo**
**NVIDIA NeMo Framework**
===============

Latest News
Expand All @@ -57,92 +57,66 @@ such as FSDP, Mixture-of-Experts, and RLHF with TensorRT-LLM to provide speedups
Introduction
------------

NVIDIA NeMo is a conversational AI toolkit built for researchers working on automatic speech recognition (ASR),
text-to-speech synthesis (TTS), large language models (LLMs), and
natural language processing (NLP).
The primary objective of NeMo is to help researchers from industry and academia to reuse prior work (code and pretrained models)
and make it easier to create new `conversational AI models <https://developer.nvidia.com/conversational-ai#started>`_.
NVIDIA NeMo Framework is a generative AI framework built for researchers and pytorch developers
working on large language models (LLMs), multimodal models (MM), automatic speech recognition (ASR),
and text-to-speech synthesis (TTS).
The primary objective of NeMo is to provide a scalable framework for researchers and developers from industry and academia
to more easily implement and design new generative AI models by being able to leverage existing code and pretrained models.

For technical documentation, please see the `NeMo Framework User Guide <https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html>`_.

All NeMo models are trained with `Lightning <https://github.com/Lightning-AI/lightning>`_ and
training is automatically scalable to 1000s of GPUs.
Additionally, NeMo Megatron LLM models can be trained up to 1 trillion parameters using tensor and pipeline model parallelism.
NeMo models can be optimized for inference and deployed for production use-cases with `NVIDIA Riva <https://developer.nvidia.com/riva>`_.

When applicable, NeMo models take advantage of the latest possible distributed training techniques,
including parallelism strategies such as

* data parallelism
* tensor parallelism
* pipeline model parallelism
* fully sharded data parallelism (FSDP)
* sequence parallelism
* context parallelism
* mixture-of-experts (MoE)

and mixed precision training recipes with bfloat16 and FP8 training.

NeMo's Transformer based LLM and Multimodal models leverage `NVIDIA Transformer Engine <https://github.com/NVIDIA/TransformerEngine>`_ for FP8 training on NVIDIA Hopper GPUs
and leverages `NVIDIA Megatron Core <https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/core>`_ for scaling transformer model training.

NeMo LLMs can be aligned with state of the art methods such as SteerLM, DPO and Reinforcement Learning from Human Feedback (RLHF),
see `NVIDIA NeMo Aligner <https://github.com/NVIDIA/NeMo-Aligner>`_ for more details.

NeMo LLM and Multimodal models can be deployed and optimized with `NVIDIA Inference Microservices (Early Access) <https://developer.nvidia.com/nemo-microservices-early-access>`_.

NeMo ASR and TTS models can be optimized for inference and deployed for production use-cases with `NVIDIA Riva <https://developer.nvidia.com/riva>`_.

For scaling NeMo LLM and Multimodal training on Slurm clusters or public clouds, please see the `NVIDIA Framework Launcher <https://github.com/NVIDIA/NeMo-Megatron-Launcher>`_.
The NeMo Framework launcher has extensive recipes, scripts, utilities, and documentation for training NeMo LLMs and Multimodal models and also has an `Autoconfigurator <https://github.com/NVIDIA/NeMo-Megatron-Launcher#53-using-autoconfigurator-to-find-the-optimal-configuration>`_
which can be used to find the optimal model parallel configuration for training on a specific cluster.
To get started quickly with the NeMo Framework Launcher, please see the `NeMo Framework Playbooks <https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html>`_
The NeMo Framework Launcher does not currently support ASR and TTS training but will soon.

Getting started with NeMo is simple.
State of the Art pretrained NeMo models are freely available on `HuggingFace Hub <https://huggingface.co/models?library=nemo&sort=downloads&search=nvidia>`_ and
`NVIDIA NGC <https://catalog.ngc.nvidia.com/models?query=nemo&orderBy=weightPopularDESC>`_.
These models can be used to transcribe audio, synthesize speech, or translate text in just a few lines of code.
These models can be used to generate text or images, transcribe audio, and synthesize speech in just a few lines of code.

We have extensive `tutorials <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/starthere/tutorials.html>`_ that
can be run on `Google Colab <https://colab.research.google.com>`_.
can be run on `Google Colab <https://colab.research.google.com>`_ or with our `NGC NeMo Framework Container. <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo>`_
and we have `playbooks <https://docs.nvidia.com/nemo-framework/user-guide/latest/playbooks/index.html>`_ for users that want to train NeMo models with the NeMo Framework Launcher.

For advanced users that want to train NeMo models from scratch or finetune existing NeMo models
we have a full suite of `example scripts <https://github.com/NVIDIA/NeMo/tree/main/examples>`_ that support multi-GPU/multi-node training.

For scaling NeMo LLM training on Slurm clusters or public clouds, please see the `NVIDIA NeMo Megatron Launcher <https://github.com/NVIDIA/NeMo-Megatron-Launcher>`_.
The NM launcher has extensive recipes, scripts, utilities, and documentation for training NeMo LLMs and also has an `Autoconfigurator <https://github.com/NVIDIA/NeMo-Megatron-Launcher#53-using-autoconfigurator-to-find-the-optimal-configuration>`_
which can be used to find the optimal model parallel configuration for training on a specific cluster.

Key Features
------------

* Speech processing
* `HuggingFace Space for Audio Transcription (File, Microphone and YouTube) <https://huggingface.co/spaces/smajumdar/nemo_multilingual_language_id>`_
* `Pretrained models <https://ngc.nvidia.com/catalog/collections/nvidia:nemo_asr>`_ available in 14+ languages
* `Automatic Speech Recognition (ASR) <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/intro.html>`_
* Supported ASR `models <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/models.html>`_:
* Jasper, QuartzNet, CitriNet, ContextNet
* Conformer-CTC, Conformer-Transducer, FastConformer-CTC, FastConformer-Transducer
* Squeezeformer-CTC and Squeezeformer-Transducer
* LSTM-Transducer (RNNT) and LSTM-CTC
* Supports the following decoders/losses:
* CTC
* Transducer/RNNT
* Hybrid Transducer/CTC
* NeMo Original `Multi-blank Transducers <https://arxiv.org/abs/2211.03541>`_ and `Token-and-Duration Transducers (TDT) <https://arxiv.org/abs/2304.06795>`_
* Streaming/Buffered ASR (CTC/Transducer) - `Chunked Inference Examples <https://github.com/NVIDIA/NeMo/tree/stable/examples/asr/asr_chunked_inference>`_
* `Cache-aware Streaming Conformer <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/models.html#cache-aware-streaming-conformer>`_ with multiple lookaheads (including microphone streaming `tutorial <https://github.com/NVIDIA/NeMo/blob/main/tutorials/asr/Online_ASR_Microphone_Demo_Cache_Aware_Streaming.ipynb>`_).
* Beam Search decoding
* `Language Modelling for ASR (CTC and RNNT) <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/asr_language_modeling.html>`_: N-gram LM in fusion with Beam Search decoding, Neural Rescoring with Transformer
* `Support of long audios for Conformer with memory efficient local attention <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/results.html#inference-on-long-audio>`_
* `Speech Classification, Speech Command Recognition and Language Identification <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/speech_classification/intro.html>`_: MatchboxNet (Command Recognition), AmberNet (LangID)
* `Voice activity Detection (VAD) <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speech_classification/models.html#marblenet-vad>`_: MarbleNet
* ASR with VAD Inference - `Example <https://github.com/NVIDIA/NeMo/tree/stable/examples/asr/asr_vad>`_
* `Speaker Recognition <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/speaker_recognition/intro.html>`_: TitaNet, ECAPA_TDNN, SpeakerNet
* `Speaker Diarization <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/speaker_diarization/intro.html>`_
* Clustering Diarizer: TitaNet, ECAPA_TDNN, SpeakerNet
* Neural Diarizer: MSDD (Multi-scale Diarization Decoder)
* `Speech Intent Detection and Slot Filling <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/speech_intent_slot/intro.html>`_: Conformer-Transformer
* Natural Language Processing
* `NeMo Megatron pre-training of Large Language Models <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/intro.html>`_
* `Neural Machine Translation (NMT) <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/machine_translation/machine_translation.html>`_
* `Punctuation and Capitalization <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/punctuation_and_capitalization.html>`_
* `Token classification (named entity recognition) <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/token_classification.html>`_
* `Text classification <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/text_classification.html>`_
* `Joint Intent and Slot Classification <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/joint_intent_slot.html>`_
* `Question answering <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/question_answering.html>`_
* `GLUE benchmark <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/glue_benchmark.html>`_
* `Information retrieval <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/information_retrieval.html>`_
* `Entity Linking <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/entity_linking.html>`_
* `Dialogue State Tracking <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/dialogue.html>`_
* `Prompt Learning <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/nemo_megatron/prompt_learning.html>`_
* `NGC collection of pre-trained NLP models. <https://ngc.nvidia.com/catalog/collections/nvidia:nemo_nlp>`_
* `Synthetic Tabular Data Generation <https://developer.nvidia.com/blog/generating-synthetic-data-with-transformers-a-solution-for-enterprise-data-challenges/>`_
* Text-to-Speech Synthesis (TTS):
* `Documentation <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/tts/intro.html#>`_
* Mel-Spectrogram generators: FastPitch, SSL FastPitch, Mixer-TTS/Mixer-TTS-X, RAD-TTS, Tacotron2
* Vocoders: HiFiGAN, UnivNet, WaveGlow
* End-to-End Models: VITS
* `Pre-trained Model Checkpoints in NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com/catalog/collections/nvidia:nemo_tts>`_
* `Tools <https://github.com/NVIDIA/NeMo/tree/stable/tools>`_
* `Text Processing (text normalization and inverse text normalization) <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/text_normalization/intro.html>`_
* `NeMo Forced Aligner <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/tools/nemo_forced_aligner.html>`_
* `CTC-Segmentation tool <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/tools/ctc_segmentation.html>`_
* `Speech Data Explorer <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/tools/speech_data_explorer.html>`_: a dash-based tool for interactive exploration of ASR/TTS datasets
* `Speech Data Processor <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/tools/speech_data_processor.html>`_


Built for speed, NeMo can utilize NVIDIA's Tensor Cores and scale out training to multiple GPUs and multiple nodes.
* `Large Language Models <nemo/collections/nlp/README.md>`_
* `Multimodal <nemo/collections/multimodal/README.md>`_
* `Automatic Speech Recognition <nemo/collections/asr/README.md>`_
* `Text to Speech <nemo/collections/tts/README.md>`_
* `Computer Vision <nemo/collections/vision/README.md>`_

Requirements
------------
Expand All @@ -151,8 +125,8 @@ Requirements
2) Pytorch 1.13.1 or above
3) NVIDIA GPU, if you intend to do model training

Documentation
-------------
Developer Documentation
-----------------------

.. |main| image:: https://readthedocs.com/projects/nvidia-nemo/badge/?version=main
:alt: Documentation Status
Expand All @@ -172,18 +146,6 @@ Documentation
| Stable | |stable| | `Documentation of the stable (i.e. most recent release) branch. <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/>`_ |
+---------+-------------+------------------------------------------------------------------------------------------------------------------------------------------+

Tutorials
---------
A great way to start with NeMo is by checking `one of our tutorials <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/starthere/tutorials.html>`_.

You can also get a high-level overview of NeMo by watching the talk *NVIDIA NeMo: Toolkit for Conversational AI*, presented at PyData Yerevan 2022:

|pydata|

.. |pydata| image:: https://img.youtube.com/vi/J-P6Sczmas8/maxres3.jpg
:target: https://www.youtube.com/embed/J-P6Sczmas8?mute=0&start=14&autoplay=0
:width: 600
:alt: NeMo presentation at PyData@Yerevan 2022

Getting help with NeMo
----------------------
Expand Down
86 changes: 43 additions & 43 deletions docs/source/index.rst
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
NVIDIA NeMo User Guide
======================
NVIDIA NeMo Framework Developer Docs
====================================

.. toctree::
:maxdepth: 2
Expand All @@ -12,18 +12,28 @@ NVIDIA NeMo User Guide
starthere/migration-guide

.. toctree::
:maxdepth: 2
:caption: NeMo Core
:name: core
:maxdepth: 3
:caption: Multimodal (MM)
:name: Multimodal

core/core
core/exp_manager
core/neural_types
core/export
core/adapters/intro
core/api
multimodal/mllm/intro
multimodal/vlm/intro
multimodal/text2img/intro
multimodal/nerf/intro
multimodal/api


.. toctree::
:maxdepth: 3
:caption: Large Language Models (LLMs)
:name: Large Language Models

nlp/nemo_megatron/intro
nlp/models
nlp/machine_translation/machine_translation
nlp/megatron_onnx_export
nlp/api

.. toctree::
:maxdepth: 2
:caption: Speech Processing
Expand All @@ -36,26 +46,33 @@ NVIDIA NeMo User Guide
asr/ssl/intro
asr/speech_intent_slot/intro

.. toctree::
:maxdepth: 3
:caption: Natural Language Processing
:name: Natural Language Processing

nlp/nemo_megatron/intro
nlp/machine_translation/machine_translation
nlp/text_normalization/intro
nlp/api
nlp/megatron_onnx_export
nlp/models


.. toctree::
:maxdepth: 1
:caption: Text To Speech (TTS)
:name: Text To Speech

tts/intro

.. toctree::
:maxdepth: 2
:caption: Vision
:name: vision

vision/intro


.. toctree::
:maxdepth: 2
:caption: NeMo Core
:name: core

core/core
core/exp_manager
core/neural_types
core/export
core/adapters/intro
core/api

.. toctree::
:maxdepth: 2
:caption: Common
Expand All @@ -71,27 +88,10 @@ NVIDIA NeMo User Guide
text_processing/g2p/g2p
common/intro

.. toctree::
:maxdepth: 3
:caption: Multimodal (MM)
:name: Multimodal

multimodal/mllm/intro
multimodal/vlm/intro
multimodal/text2img/intro
multimodal/nerf/intro
multimodal/api

.. toctree::
:maxdepth: 2
:caption: Vision
:name: vision

vision/intro

.. toctree::
:maxdepth: 3
:caption: Tools
:name: Tools
:caption: Speech Tools
:name: Speech Tools

tools/intro
2 changes: 1 addition & 1 deletion docs/source/multimodal/api.rst
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
NeMo Megatron API
Multimodal API
=======================

Model Classes
Expand Down
4 changes: 2 additions & 2 deletions docs/source/nlp/api.rst
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
NeMo Megatron API
=======================
Large language Model API
========================

Pretraining Model Classes
-------------------------
Expand Down
2 changes: 1 addition & 1 deletion docs/source/nlp/information_retrieval.rst
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ The model architecture and pre-training process are detailed in the `Sentence-BE
Sentence-BERT utilizes a BERT-based architecture, but it is trained using a siamese and triplet network structure to derive fixed-sized sentence embeddings that capture semantic information.
Sentence-BERT is commonly used to generate high-quality sentence embeddings for various downstream natural language processing tasks, such as semantic textual similarity, clustering, and information retrieval

Data Input for the Senntence-BERT model
Data Input for the Sentence-BERT model
---------------------------------------

The fine-tuning data for the Sentence-BERT (SBERT) model should consist of data instances,
Expand Down
12 changes: 3 additions & 9 deletions docs/source/nlp/nemo_megatron/intro.rst
Original file line number Diff line number Diff line change
@@ -1,20 +1,14 @@
NeMo Megatron
=============
Large Language Models
=====================

Megatron :cite:`nlp-megatron-shoeybi2019megatron` is a large, powerful transformer developed by the Applied Deep Learning Research
team at NVIDIA. NeMo Megatron supports several types of models:
To learn more about using NeMo to train Large Language Models at scale, please refer to the `NeMo Framework User Guide! <https://docs.nvidia.com/nemo-framework/user-guide/latest/index.html>`_.

* GPT-style models (decoder only)
* T5/BART/UL2-style models (encoder-decoder)
* BERT-style models (encoder only)
* RETRO model (decoder only)



.. note::
NeMo Megatron has an Enterprise edition which contains tools for data preprocessing, hyperparameter tuning, container, scripts for various clouds and more. With Enterprise edition you also get deployment tools. Apply for `early access here <https://developer.nvidia.com/nemo-megatron-early-access>`_ .


.. toctree::
:maxdepth: 1

Expand Down
Loading

0 comments on commit 1effe30

Please sign in to comment.