[TTS] Fix audio codec type checks #7373

rlangman · 2023-09-05T23:11:46Z

What does this PR do?

Most classes in the audio codec code defined input_types and output_types but were missing the typecheck() decorator on the forward() method, so the declared types were never enforced.

This commit adds typecheck() to all relevant forward() methods and fixes their corresponding type annotations.
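
To illustrate why the missing decorator let mistakes in the declared types go unnoticed, here is a minimal, self-contained sketch. It is not NeMo's implementation: the `typecheck` decorator, the `rank` helper, and the `ToyEncoder` / `MistypedEncoder` classes are hypothetical stand-ins, and nested lists stand in for tensors. The only point is that declared types do nothing until a decorator on forward() compares them against the actual inputs and outputs.

```python
import functools


def rank(value):
    """Nesting depth of a list; a stand-in for a tensor's number of dimensions."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0] if value else None
    return depth


def check(kind, name, value, axes):
    """Raise if the value's rank does not match the declared axes."""
    if rank(value) != len(axes):
        raise TypeError(f"{kind} '{name}' should have axes {axes}, got rank {rank(value)}")


def typecheck(forward):
    """Hypothetical decorator: validate keyword inputs and the single output."""

    @functools.wraps(forward)
    def wrapper(self, **kwargs):
        # Inputs are passed by keyword so they can be matched to declared names.
        if set(kwargs) != set(self.input_types):
            raise TypeError(f"expected inputs {sorted(self.input_types)}, got {sorted(kwargs)}")
        for name, axes in self.input_types.items():
            check("input", name, kwargs[name], axes)
        output = forward(self, **kwargs)
        # This sketch assumes exactly one declared output.
        ((out_name, out_axes),) = self.output_types.items()
        check("output", out_name, output, out_axes)
        return output

    return wrapper


class ToyEncoder:
    input_types = {"audio": ("B", "T")}
    output_types = {"encoded": ("B", "D", "T")}

    @typecheck
    def forward(self, audio):
        # Add a channel axis of size 1: (B, T) -> (B, 1, T).
        return [[row] for row in audio]


class MistypedEncoder(ToyEncoder):
    # Wrong declaration: forward() actually returns a rank-3 value.
    output_types = {"encoded": ("B", "T")}


if __name__ == "__main__":
    print(ToyEncoder().forward(audio=[[0.1, 0.2]]))
    try:
        MistypedEncoder().forward(audio=[[0.1, 0.2]])
    except TypeError as err:
        print("caught:", err)
```

Without the decorator, `MistypedEncoder.forward` would run without complaint, which is the situation the PR description says the codec modules were in.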

It also includes a small change that adds the vector quantizer parameters to the optimizer in the model (though the existing RVQ has no trainable parameters).
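
A rough sketch of the optimizer change, under stated assumptions: `ToyModule` and `generator_parameters` are hypothetical names, and strings stand in for parameter tensors. The actual model code may collect parameters differently. The idea is that chaining in the quantizer's parameters is harmless when it has none and picks them up automatically if a trainable quantizer is used later.

```python
import itertools


class ToyModule:
    """Hypothetical stand-in for a module that exposes parameters()."""

    def __init__(self, params):
        self._params = list(params)

    def parameters(self):
        return iter(self._params)


def generator_parameters(encoder, vector_quantizer, decoder):
    # Chain every sub-module so a trainable quantizer is not silently skipped.
    return list(
        itertools.chain(
            encoder.parameters(),
            vector_quantizer.parameters(),
            decoder.parameters(),
        )
    )


encoder = ToyModule(["enc.weight"])
rvq = ToyModule([])  # like the existing RVQ: nothing trainable
decoder = ToyModule(["dec.weight"])

params = generator_parameters(encoder, rvq, decoder)
print(params)  # ['enc.weight', 'dec.weight']
```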

Collection: [TTS]

Changelog

Add typecheck to forward methods
Fix various typos in input/output types

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

nithinraok

LGTM

nithinraok · 2023-09-06T17:59:07Z

Are there no test cases for Codecs yet?

rlangman · 2023-09-06T18:18:11Z

> Are there no test cases for Codecs yet?

There are only a few. We haven't added CI tests that run the recipe itself, and I could not think of many tests that would not effectively just assert what the neural type checking is supposed to validate automatically at runtime, namely the types and dimensions of each module (though clearly it wasn't being validated).

titu1994 · 2023-09-06T18:32:17Z

Type checks work if the entire pipeline of types is set up, including the data loader and loss.

anteju

LGTM

Signed-off-by: Ryan <[email protected]>

* [TTS] Fix audio codec type checks
  Signed-off-by: Ryan <[email protected]>
* [TTS] Fix audio codec tests
  Signed-off-by: Ryan <[email protected]>

---------

Signed-off-by: Ryan <[email protected]>


Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email 
protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add strategy as 
ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix typos (#7581) Signed-off-by: Sasha Meister <[email protected]> * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * added per tests Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * add build os key (#7596) (#7599) * add build os 
key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * rate_punctuation.py Fixed output manifest saving Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Fix tests Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Function name fixing Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Moving PER to speech_to_text_eval.py 
Added: - "use_per": PER metric computing; - "scores_per_sample": metrics computation sample by sample for wer/cer/punctuation rates; - "output_with_scores_filename": saving manifest with metrics Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update test_metrics.py Updated "punctuation_error_rate" function name Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Added use_per description Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * guard extra dependencies Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Write metrics to "output_filename" if "scores_per_sample=True" Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * scores_per_sample description Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix import guards Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Stats printing when HAVE_TABLUATE_AND_PANDAS=False Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci 
Signed-off-by: Sasha Meister <[email protected]> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Delete examples/asr/rate_punctuation.py Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Added use_per description Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * metric and variables name fixing Signed-off-by: Sasha Meister <[email protected]> * Add else samples = None Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Sasha Meister <[email protected]> * bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha 
Meister <[email protected]> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change accelerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signe…
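
For context on the "Fix py3.11 dataclasses issue" entries: the log does not show the actual change, so the following is only general background, not the code from #7582 or #7616. Starting with Python 3.11, `dataclasses` rejects unhashable default values, which includes instances of ordinary (non-frozen) dataclasses, and the usual remedy is `field(default_factory=...)`. A minimal sketch, with hypothetical class names that are not taken from NeMo:

```python
from dataclasses import dataclass, field


@dataclass
class DecoderConfig:  # hypothetical nested config, for illustration only
    beam_size: int = 4


@dataclass
class ModelConfig:  # hypothetical top-level config, for illustration only
    # Writing `decoder: DecoderConfig = DecoderConfig()` raises ValueError on
    # Python 3.11+, because a plain dataclass instance is unhashable and is
    # therefore treated as a mutable default.
    decoder: DecoderConfig = field(default_factory=DecoderConfig)


first = ModelConfig()
second = ModelConfig()
first.decoder.beam_size = 8  # each instance owns its own DecoderConfig
```

With `default_factory`, every `ModelConfig` builds a fresh nested config, so the pattern works on older Python versions as well and avoids sharing state between instances.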

* Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * unused import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. 
Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references 
Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * 
Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <[email 
protected]> Co-authored-by: Jocelyn <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update per.py - if __name__ == "__main__" removed (now metric can be imported); - removed excessive classes (like "Sample" and 
"Statistics"); - transition from pandas df to dict of dicts; - removed unnecessary "return"; - notation fixing; - reduced calculation time Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Create punctuation_rates.py Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Format fixing Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * added nemo.logging, header, docstrings, how to use Signed-off-by: Sasha Meister <[email protected]> * Added assertions to rate_punctuation.py Signed-off-by: Sasha Meister <[email protected]> * fix typo Signed-off-by: Sasha Meister <[email protected]> * added function for import and call, docstrings Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenerated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot 
accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation 
test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible with old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix error and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email 
protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add strategy as 
ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix typos (#7581) Signed-off-by: Sasha Meister <[email protected]> * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * added per tests Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * add build os key (#7596) (#7599) * add build os 
key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * rate_punctuation.py Fixed output manifest saving Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Fix tests Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Function name fixing Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Moving PER to speech_to_text_eval.py 
Added: - "use_per": PER metric computing; - "scores_per_sample": metrics computation sample by sample for wer/cer/punctuation rates; - "output_with_scores_filename": saving manifest with metrics Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update test_metrics.py Updated "punctuation_error_rate" function name Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Added use_per description Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * guard extra dependencies Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Write metrics to "output_filename" if "scores_per_sample=True" Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * scores_per_sample description Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix import guards Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Stats printing when HAVE_TABLUATE_AND_PANDAS=False Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci 
Signed-off-by: Sasha Meister <[email protected]> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Delete examples/asr/rate_punctuation.py Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Added use_per description Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * metric and variables name fixing Signed-off-by: Sasha Meister <[email protected]> * Add else samples = None Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Sasha Meister <[email protected]> * bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha 
Meister <[email protected]> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <slym@n…

* Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. 
Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references 
Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * 
Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <[email 
protected]> Co-authored-by: Jocelyn <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update per.py - if __name__ == "__main__" removed (now metric can be imported); - removed excessive classes (like "Sample" and 
"Statistics"); - transition from pandas df to dict of dicts; - removed unnecessary "return"; - notation fixing; - reduced calculation time Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Create punctuation_rates.py Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Format fixing Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * added nemo.logging, header, docstrings, how to use Signed-off-by: Sasha Meister <[email protected]> * Added asserions to rate_punctuation.py Signed-off-by: Sasha Meister <[email protected]> * fix typo Signed-off-by: Sasha Meister <[email protected]> * added function for import and call, docstrings Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot 
accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation 
test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible with old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix error and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] remove curly braces from ${BRANCH} in Jupyter notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for multi-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email 
protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add strategy as 
ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix typos (#7581) Signed-off-by: Sasha Meister <[email protected]> * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * added per tests Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * add build os key (#7596) (#7599) * add build os 
key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * rate_punctuation.py Fixed output manifest saving Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Fix tests Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Function name fixing Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Moving PER to speech_to_text_eval.py 
Added: - "use_per": PER metric computing; - "scores_per_sample": metrics computation sample by sample for wer/cer/punctuation rates; - "output_with_scores_filename": saving manifest with metrics Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update test_metrics.py Updated "punctuation_error_rate" function name Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Added use_per description Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * guard extra dependencies Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Write metrics to "output_filename" if "scores_per_sample=True" Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * scores_per_sample description Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix import guards Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Stats printing when HAVE_TABLUATE_AND_PANDAS=False Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci 
Signed-off-by: Sasha Meister <[email protected]> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Delete examples/asr/rate_punctuation.py Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Added use_per description Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * metric and variables name fixing Signed-off-by: Sasha Meister <[email protected]> * Add else samples = None Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Sasha Meister <[email protected]> * bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha 
Meister <[email protected]> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <slym@…

* Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. 
Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references 
Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * 
Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <[email 
protected]> Co-authored-by: Jocelyn <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update per.py - if __name__ == "__main__" removed (now metric can be imported); - removed excessive classes (like "Sample" and 
"Statistics"); - transition from pandas df to dict of dicts; - removed unnecessary "return"; - notation fixing; - reduced calculation time Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Create punctuation_rates.py Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Format fixing Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * added nemo.logging, header, docstrings, how to use Signed-off-by: Sasha Meister <[email protected]> * Added asserions to rate_punctuation.py Signed-off-by: Sasha Meister <[email protected]> * fix typo Signed-off-by: Sasha Meister <[email protected]> * added function for import and call, docstrings Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot 
accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation 
test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible with old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix error and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow Adi's suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint during eval * break up add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since its functionality is all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] remove curly braces from ${BRANCH} in Jupyter notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for multi-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email
protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add strategy as 
Co-authored-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix typos (#7581) Signed-off-by: Elena Rastorgueva <[email protected]> * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Elena Rastorgueva <[email 
protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> 
Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix ssl models ptl monitor val through 
logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix 
Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * add outline of asr quickstart info to asr/intro.rst Signed-off-by: Elena Rastorgueva <[email protected]> * add CLI, LM and real-time transcription sections Signed-off-by: Elena Rastorgueva <[email protected]> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with func call Signed-off-by: Maanu 
Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-…

* Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> * Updating FlashAttention API to match FlashAttentionV2 * Multiple fixes for mm * Fix CI inductor issue and update to torch compile * Remove suppress error * Fix when conversion config uses fp16 and it complains about precision plugin * Fixing FAv2 API usage * Initial release of content filtering model * Added synthetic dataloader for precached and online mode * Mingyuanm/dreambooth opt * Add llama2 support in neva training * Fix sampler length * Fix all precision issues in nemo multimodal * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> 
* Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision 
Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix bug when loading 
dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * bugfix: trainer.accelerator=auto from None. (#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Jocelyn <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> * remove auto 
generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix 
according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Fix typos (#7581) * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: 
Cheng-Ping Hsieh <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: 
Dmytro Pykhtar <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> 
* bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change accelerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> * [Stable Diffusion/ControlNet] Enable O2 training for SD and Fix ControlNet CI failure * Mingyuanm/dreambooth fix * Fix NeMo CI Infer Issue * DreamFusion 
* Move neva export changes * Add Imagen Synthetic Dataloader * Add VITWrapper and export stuff to wrapper * Update neva with megatron-core support * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with 
func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * unused import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculating Punctuation Error Rate and related rates (correct rate, deletion rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references 
Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-comm…

* Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> * Updating FlashAttention API to match FlashAttentionV2 * Multiple fixes for mm * Fix CI inductor issue and update to torch compile * Remove suppress error * Fix when conversion config uses fp16 and it complains about precision plugin * Fixing FAv2 API usage * Initial release of content filtering model * Added synthetic dataloader for precached and online mode * Mingyuanm/dreambooth opt * Add llama2 support in neva training * Fix sampler length * Fix all precision issues in nemo multimodal * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well 
Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email 
protected]> Co-authored-by: Jason <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman 
<[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Jocelyn <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix (#7511) Signed-off-by: 
Abhinav Khattar <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. 
It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add 
Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Fix typos (#7581) * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: 
Cheng-Ping Hsieh <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: 
Dmytro Pykhtar <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> 
* bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> * [Stable Diffusion/ControlNet] Enable O2 training for SD and Fix ControlNet CI failure * Mingyuanm/dreambooth fix * Fix NeMo CI Infer Issue * DreamFusion 
* Move neva export changes * Add Imagen Synthetic Dataloader * Add VITWrapper and export stuff to wrapper * Update neva with megatron-core support * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with 
func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-of…

* [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. 
It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add 
Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Fix typos (#7581) * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: 
Cheng-Ping Hsieh <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: 
Dmytro Pykhtar <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> 
* bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> * [Stable Diffusion/ControlNet] Enable O2 training for SD and Fix ControlNet CI failure * Mingyuanm/dreambooth fix * Fix NeMo CI Infer Issue * DreamFusion 
* Move neva export changes * Add Imagen Synthetic Dataloader * Add VITWrapper and export stuff to wrapper * Update neva with megatron-core support * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with 
func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references 
Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * 
Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <[email 
protected]> Co-authored-by: Jocelyn <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update per.py - if __name__ == "__main__" removed (now metric can be imported); - removed excessive classes (like "Sample" and 
"Statistics"); - transition from pandas df to dict of dicts; - removed unnecessary "return"; - notation fixing; - reduced calculation time Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Create punctuation_rates.py Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Format fixing Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * added nemo.logging, header, docstrings, how to use Signed-off-by: Sasha Meister <[email protected]> * Added asserions to rate_punctuation.py Signed-off-by: Sasha Meister <[email protected]> * fix typo Signed-off-by: Sasha Meister <[email protected]> * added function for import and call, docstrings Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot 
accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithi…
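The #7264 commit message above argues for an explicit, keyword-only `strategy` argument over a `**strategy_args` catch-all. A minimal sketch of that trade-off follows. It is not the actual `MegatronGPTModel.generate()` signature: `TextGenerationStrategy`, `generate`, `generate_with_kwargs`, and `length_params` are hypothetical names used only for illustration.

```python
from typing import Any, Dict, List, Optional


class TextGenerationStrategy:
    """Hypothetical stand-in for a generation strategy object."""

    def __init__(self, name: str) -> None:
        self.name = name


def generate(
    inputs: List[str],
    length_params: Optional[Dict[str, Any]] = None,
    *,  # parameters after this marker can only be passed by keyword
    strategy: Optional[TextGenerationStrategy] = None,
) -> Dict[str, Any]:
    """Explicit keyword-only argument: unknown keywords raise TypeError."""
    chosen = strategy.name if strategy is not None else "default"
    return {"inputs": inputs, "strategy": chosen}


def generate_with_kwargs(inputs: List[str], **strategy_args: Any) -> Dict[str, Any]:
    """Catch-all variant: a misspelled keyword is accepted and silently ignored."""
    strategy = strategy_args.get("strategy")
    chosen = strategy.name if strategy is not None else "default"
    return {"inputs": inputs, "strategy": chosen}


greedy = TextGenerationStrategy("greedy")

# The explicit form rejects a misspelled keyword immediately.
try:
    generate(["hello"], stratgy=greedy)
    typo_rejected = False
except TypeError:
    typo_rejected = True

# The catch-all form accepts the same typo and falls back to the default.
typo_result = generate_with_kwargs(["hello"], stratgy=greedy)
```

Because callers of `generate` must already write `strategy=...`, a later switch of the signature to `**strategy_args` would leave those call sites valid, which is the compatibility point the commit message makes.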

…IA#7970) * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> * Updating FlashAttention API to match FlashAttentionV2 * Multiple fixes for mm * Fix CI inductor issue and update to torch compile * Remove suppress error * Fix when conversion config uses fp16 and it complains about precision plugin * Fixing FAv2 API usage * Initial release of content filtering model * Added synthetic dataloader for precached and online mode * Mingyuanm/dreambooth opt * Add llama2 support in neva training * Fix sampler length * Fix all precision issues in nemo multimodal * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and 
self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> 
Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> 
Co-authored-by: Igor Gitman <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Jocelyn <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix (#7511) Signed-off-by: 
Abhinav Khattar <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. 
It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add 
Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Fix typos (#7581) * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: 
Cheng-Ping Hsieh <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: 
Dmytro Pykhtar <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> 
* bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> * [Stable Diffusion/ControlNet] Enable O2 training for SD and Fix ControlNet CI failure * Mingyuanm/dreambooth fix * Fix NeMo CI Infer Issue * DreamFusion 
* Move neva export changes * Add Imagen Synthetic Dataloader * Add VITWrapper and export stuff to wrapper * Update neva with megatron-core support * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with 
func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-of…

* [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. 
It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add 
Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Fix typos (#7581) * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: 
Cheng-Ping Hsieh <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: 
Dmytro Pykhtar <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> 
* bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> * [Stable Diffusion/ControlNet] Enable O2 training for SD and Fix ControlNet CI failure * Mingyuanm/dreambooth fix * Fix NeMo CI Infer Issue * DreamFusion 
* Move neva export changes * Add Imagen Synthetic Dataloader * Add VITWrapper and export stuff to wrapper * Update neva with megatron-core support * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with 
func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references 
Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * 
Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <[email 
protected]> Co-authored-by: Jocelyn <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update per.py - if __name__ == "__main__" removed (now metric can be imported); - removed excessive classes (like "Sample" and 
"Statistics"); - transition from pandas df to dict of dicts; - removed unnecessary "return"; - notation fixing; - reduced calculation time Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Create punctuation_rates.py Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Format fixing Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * added nemo.logging, header, docstrings, how to use Signed-off-by: Sasha Meister <[email protected]> * Added asserions to rate_punctuation.py Signed-off-by: Sasha Meister <[email protected]> * fix typo Signed-off-by: Sasha Meister <[email protected]> * added function for import and call, docstrings Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot 
accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithi…
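The rationale given above for #7264 (an explicit, keyword-only `strategy` argument instead of `**strategy_args`) can be illustrated with a small sketch. This is not the `MegatronGPTModel.generate()` implementation: the standalone `generate` function below, its parameters other than `strategy`, and its return value are hypothetical stand-ins used only to show how Python treats a keyword-only parameter.

```python
from typing import Any, Callable, Optional, Sequence


def generate(
    inputs: Sequence[str],
    length_params: Optional[dict] = None,
    *,  # everything after this marker must be passed by keyword
    strategy: Optional[Any] = None,
) -> dict:
    """Hypothetical stand-in that only echoes its arguments."""
    return {"inputs": list(inputs), "length_params": length_params, "strategy": strategy}


def raises_type_error(call: Callable[[], Any]) -> bool:
    """Return True if calling `call` raises TypeError."""
    try:
        call()
    except TypeError:
        return True
    return False


# Passing the strategy by keyword works.
ok = generate(["hello"], {"max_length": 8}, strategy="custom")

# Passing it positionally is rejected, so callers cannot come to depend on
# its position in the signature.
positional_rejected = raises_type_error(lambda: generate(["hello"], None, "custom"))

# A misspelled or unsupported keyword is rejected instead of being silently
# swallowed, which is what a catch-all **strategy_args could do.
unknown_rejected = raises_type_error(lambda: generate(["hello"], strategy_typo="custom"))
```

Because every caller has to write `strategy=...`, a later change of the signature to `**strategy_args` would still accept those calls, which is the forward-compatibility point the commit message makes.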

* Hotfix (#7501) (#7568) Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: jbaczek <[email protected]> * Avoid duplicated checkpoint save (#7555) (#7566) Signed-off-by: Mikołaj Błaż <[email protected]> Co-authored-by: mikolajblaz <[email protected]> * Cache FP8 weight and transpose only at the first micro-batch in each validation and test routine (#7470) (#7483) * Cache weight and transpose only in the first batch in all training, val, and test runs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add an option to disable manual GC in validation (#7467) (#7476) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> * Remove PUBLICATIONS.md, point to github.io NeMo page instead (#7694) (#7695) * update publications section to point to blog website page * add hyphen * use double backquotes for code formatting --------- Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Fix multi rank finetune for ASR (#7684) (#7699) * Fix multi rank finetune for ASR * Actually add time * Actually add time --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * Update docs: readme, getting started, ASR intro (#7679) * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * move install info to INSTALLATION.md Signed-off-by: Elena Rastorgueva <[email 
protected]> * tidy up links Signed-off-by: Elena Rastorgueva <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn 
Huang <[email protected]> Co-authored-by: Jocelyn <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data 
parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * 
Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 
issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update 
supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email 
protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval 
config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: 
Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> 
Co-authored-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix typos (#7581) Signed-off-by: Elena Rastorgueva <[email protected]> * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Elena Rastorgueva <[email 
protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> 
Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix ssl models ptl monitor val through 
logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix 
Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * add outline of asr quickstart info to asr/intro.rst Signed-off-by: Elena Rastorgueva <[email protected]> * add CLI, LM and real-time transcription sections Signed-off-by: Elena Rastorgueva <[email protected]> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with func call Signed-off-by: Maanu 
Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * […

) * Hotfix (#7501) (#7568) Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: jbaczek <[email protected]> * Avoid duplicated checkpoint save (#7555) (#7566) Signed-off-by: Mikołaj Błaż <[email protected]> Co-authored-by: mikolajblaz <[email protected]> * Cache FP8 weight and transpose only at the first micro-batch in each validation and test routine (#7470) (#7483) * Cache weight and transpose only in the first batch in all training, val, and test runs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add an option to disable manual GC in validation (#7467) (#7476) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> * Remove PUBLICATIONS.md, point to github.io NeMo page instead (#7694) (#7695) * update publications section to point to blog website page * add hyphen * use double backquotes for code formatting --------- Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Fix multi rank finetune for ASR (#7684) (#7699) * Fix multi rank finetune for ASR * Actually add time * Actually add time --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * Update docs: readme, getting started, ASR intro (#7679) * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * move install info to INSTALLATION.md Signed-off-by: Elena Rastorgueva <[email 
protected]> * tidy up links Signed-off-by: Elena Rastorgueva <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn 
Huang <[email protected]> Co-authored-by: Jocelyn <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data 
parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * 
Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 
issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update 
supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email 
protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval 
config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: 
Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> 
Co-authored-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix typos (#7581) Signed-off-by: Elena Rastorgueva <[email protected]> * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Elena Rastorgueva <[email 
protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> 
Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix ssl models ptl monitor val through 
logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix 
Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * add outline of asr quickstart info to asr/intro.rst Signed-off-by: Elena Rastorgueva <[email protected]> * add CLI, LM and real-time transcription sections Signed-off-by: Elena Rastorgueva <[email protected]> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with func call Signed-off-by: Maanu 
Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-… * fix onnx (#7703) (#7704) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> * move core install to /workspace (#7706) Signed-off-by: Abhinav Khattar <[email protected]> * Fix typo in audio codec config, encoder target (#7697) Signed-off-by: Ante Jukić <[email protected]> * Replace strategy='dp'/None with 'auto' 
(#7681) (#7696) * Add strategy=auto for None and dp * Change strategy from None to auto --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * [ASR] Multichannel mask estimator with flex number of channels (#7317) * Adding a mask estimator which can …

* [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenerated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. 
It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add 
Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix error and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> * [TTS] remove curly braces from ${BRANCH} in Jupyter notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for multi-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Fix typos (#7581) * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: 
Cheng-Ping Hsieh <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: 
Dmytro Pykhtar <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> 
* bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change accelerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> * [Stable Diffusion/ControlNet] Enable O2 training for SD and Fix ControlNet CI failure * Mingyuanm/dreambooth fix * Fix NeMo CI Infer Issue * DreamFusion 
* Move neva export changes * Add Imagen Synthetic Dataloader * Add VITWrapper and export stuff to wrapper * Update neva with megatron-core support * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with 
func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * unused import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculating Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references 
Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * 
Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <[email 
protected]> Co-authored-by: Jocelyn <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update per.py - if __name__ == "__main__" removed (now metric can be imported); - removed excessive classes (like "Sample" and 
"Statistics"); - transition from pandas df to dict of dicts; - removed unnecessary "return"; - notation fixing; - reduced calculation time Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Create punctuation_rates.py Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Format fixing Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * added nemo.logging, header, docstrings, how to use Signed-off-by: Sasha Meister <[email protected]> * Added asserions to rate_punctuation.py Signed-off-by: Sasha Meister <[email protected]> * fix typo Signed-off-by: Sasha Meister <[email protected]> * added function for import and call, docstrings Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot 
accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithi…
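The reasoning in the `strategy` commit message above (#7264) can be illustrated with a small sketch. The functions below are hypothetical stand-ins written for this note, not the actual `MegatronGPTModel.generate()` signature. They only show why an explicit keyword-only argument rejects misspelled or stray arguments, while a `**strategy_args` catch-all would silently ignore them.

```python
# Hypothetical illustration only: these functions are NOT NeMo code.

def generate(inputs, *, strategy=None):
    """Explicit keyword-only argument, as described in the commit message."""
    chosen = strategy if strategy is not None else "default"
    return f"{chosen}:{inputs}"


def generate_loose(inputs, **strategy_args):
    """The catch-all alternative: unknown keys are accepted and then ignored."""
    chosen = strategy_args.get("strategy", "default")
    return f"{chosen}:{inputs}"


# Correct call: the strategy is picked up.
print(generate("hello", strategy="greedy"))

# A misspelled keyword is rejected immediately by the explicit signature.
try:
    generate("hello", stratgy="greedy")
except TypeError as err:
    print("rejected:", err)

# The catch-all version accepts the typo and quietly falls back to the default.
print(generate_loose("hello", stratgy="greedy"))
```

Because `strategy` can only be passed by name, every existing call of the form `generate(x, strategy=...)` would keep working if the signature were later widened to `**strategy_args`, which is the compatibility point the commit message makes.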

* Hotfix (#7501) (#7568) Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: jbaczek <[email protected]> * Avoid duplicated checkpoint save (#7555) (#7566) Signed-off-by: Mikołaj Błaż <[email protected]> Co-authored-by: mikolajblaz <[email protected]> * Cache FP8 weight and transpose only at the first micro-batch in each validation and test routine (#7470) (#7483) * Cache weight and transpose only in the first batch in all training, val, and test runs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add an option to disable manual GC in validation (#7467) (#7476) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> * Remove PUBLICATIONS.md, point to github.io NeMo page instead (#7694) (#7695) * update publications section to point to blog website page * add hyphen * use double backquotes for code formatting --------- Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Fix multi rank finetune for ASR (#7684) (#7699) * Fix multi rank finetune for ASR * Actually add time * Actually add time --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * Update docs: readme, getting started, ASR intro (#7679) * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * move install info to INSTALLATION.md Signed-off-by: Elena Rastorgueva <[email 
protected]> * tidy up links Signed-off-by: Elena Rastorgueva <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn 
Huang <[email protected]> Co-authored-by: Jocelyn <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data 
parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * 
Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 
issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update 
supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email 
protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval 
config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: 
Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> 
Co-authored-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix typos (#7581) Signed-off-by: Elena Rastorgueva <[email protected]> * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Elena Rastorgueva <[email 
protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> 
Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix ssl models ptl monitor val through 
logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix 
Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * add outline of asr quickstart info to asr/intro.rst Signed-off-by: Elena Rastorgueva <[email protected]> * add CLI, LM and real-time transcription sections Signed-off-by: Elena Rastorgueva <[email protected]> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with func call Signed-off-by: Maanu 
Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * […
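
For readers following the "[TTS] Fix audio codec type checks (#7373)" entry in the commit list above: the PR description says the codec classes declared input and output types but did not decorate forward() with typecheck(), so the declarations were never enforced. The sketch below illustrates that failure mode in plain Python. It is a toy under stated assumptions, not NeMo's implementation: the `typecheck` decorator here is a hypothetical stand-in that uses `isinstance`, and `UncheckedGain`, `CheckedGain`, and the dict-valued `input_types` are invented for illustration.

```python
import functools


def typecheck(forward):
    """Toy stand-in for a type-checking decorator (hypothetical, not NeMo's)."""

    @functools.wraps(forward)
    def wrapper(self, **kwargs):
        declared = self.input_types  # the class's declared input signature
        unknown = sorted(set(kwargs) - set(declared))
        if unknown:
            raise TypeError(f"undeclared inputs: {unknown}")
        for name, value in kwargs.items():
            if not isinstance(value, declared[name]):
                raise TypeError(
                    f"{name} should be {declared[name].__name__}, "
                    f"got {type(value).__name__}"
                )
        return forward(self, **kwargs)

    return wrapper


class UncheckedGain:
    """Declares input types but never enforces them (hypothetical module)."""

    input_types = {"audio": list, "gain": float}

    def forward(self, audio, gain):
        return [sample * gain for sample in audio]


class CheckedGain(UncheckedGain):
    """Same module, with the decorator applied to forward()."""

    @typecheck
    def forward(self, audio, gain):
        return super().forward(audio, gain)


# gain is an int here, which contradicts the declared float type.
print(UncheckedGain().forward(audio=[1.0, 2.0], gain=2))  # accepted silently

try:
    CheckedGain().forward(audio=[1.0, 2.0], gain=2)
except TypeError as err:
    print("rejected:", err)
```

Without the decorator the mismatched call goes through unnoticed, which matches the review comment that the types/dimensions "clearly" were not being validated before this change.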

* Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> * Updating FlashAttention API to match FlashAttentionV2 * Multiple fixes for mm * Fix CI inductor issue and update to torch compile * Remove suppress error * Fix when conversion config uses fp16 and it complains about precision plugin * Fixing FAv2 API usage * Initial release of content filtering model * Added synthetic dataloader for precached and online mode * Mingyuanm/dreambooth opt * Add llama2 support in neva training * Fix sampler length * Fix all precision issues in nemo multimodal * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> 
* Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision 
Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix bug when loading 
dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * bugfix: trainer.accelerator=auto from None. (#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Jocelyn <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> * remove auto 
generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix 
according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Fix typos (#7581) * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: 
Cheng-Ping Hsieh <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: 
Dmytro Pykhtar <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> 
* bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> * [Stable Diffusion/ControlNet] Enable O2 training for SD and Fix ControlNet CI failure * Mingyuanm/dreambooth fix * Fix NeMo CI Infer Issue * DreamFusion 
* Move neva export changes * Add Imagen Synthetic Dataloader * Add VITWrapper and export stuff to wrapper * Update neva with megatron-core support * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with 
func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signe…

…IA#7970) * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> * Updating FlashAttention API to match FlashAttentionV2 * Multiple fixes for mm * Fix CI inductor issue and update to torch compile * Remove suppress error * Fix when conversion config uses fp16 and it complains about precision plugin * Fixing FAv2 API usage * Initial release of content filtering model * Added synthetic dataloader for precached and online mode * Mingyuanm/dreambooth opt * Add llama2 support in neva training * Fix sampler length * Fix all precision issues in nemo multimodal * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and 
self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> 
Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> 
Co-authored-by: Igor Gitman <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Jocelyn <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix (#7511) Signed-off-by: 
Abhinav Khattar <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenerated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. 
It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdapterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add 
Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMo Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix error and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> * [TTS] remove curly braces from ${BRANCH} in Jupyter notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for multi-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Fix typos (#7581) * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: 
Cheng-Ping Hsieh <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: 
Dmytro Pykhtar <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> 
* bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change accelerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> * [Stable Diffusion/ControlNet] Enable O2 training for SD and Fix ControlNet CI failure * Mingyuanm/dreambooth fix * Fix NeMo CI Infer Issue * DreamFusion 
* Move neva export changes * Add Imagen Synthetic Dataloader * Add VITWrapper and export stuff to wrapper * Update neva with megatron-core support * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with 
func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * unused import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-of…

* [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenerated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. 
It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add 
Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix error and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Fix typos (#7581) * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: 
Cheng-Ping Hsieh <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: 
Dmytro Pykhtar <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> 
* bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> * [Stable Diffusion/ControlNet] Enable O2 training for SD and Fix ControlNet CI failure * Mingyuanm/dreambooth fix * Fix NeMo CI Infer Issue * DreamFusion 
* Move neva export changes * Add Imagen Synthetic Dataloader * Add VITWrapper and export stuff to wrapper * Update neva with megatron-core support * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with 
func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * unused import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references 
Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * 
Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <[email 
protected]> Co-authored-by: Jocelyn <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update per.py - if __name__ == "__main__" removed (now metric can be imported); - removed excessive classes (like "Sample" and 
"Statistics"); - transition from pandas df to dict of dicts; - removed unnecessary "return"; - notation fixing; - reduced calculation time Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Create punctuation_rates.py Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Format fixing Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * added nemo.logging, header, docstrings, how to use Signed-off-by: Sasha Meister <[email protected]> * Added asserions to rate_punctuation.py Signed-off-by: Sasha Meister <[email protected]> * fix typo Signed-off-by: Sasha Meister <[email protected]> * added function for import and call, docstrings Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot 
accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithi…
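The reasoning in the `strategy` commit message (#7264) above can be sketched in a few lines. This is a minimal illustration, not NeMo code: `ToyModel`, `call_fails`, and the use of a plain string for `strategy` are assumptions made only for the example. It shows why an explicit keyword-only parameter rejects both positional use and misspelled keywords, which a catch-all `**strategy_args` would silently accept.

```python
from typing import Any, Dict, List, Optional


class ToyModel:
    """Hypothetical stand-in used only for illustration; not the NeMo class."""

    def generate(
        self,
        inputs: List[str],
        *,  # everything after this marker must be passed by keyword
        strategy: Optional[str] = None,  # assumption: a string label, for simplicity
    ) -> Dict[str, Any]:
        # Fall back to a default label when no strategy is supplied.
        return {"inputs": inputs, "strategy": strategy or "default"}


def call_fails(fn, *args, **kwargs) -> bool:
    """Return True if calling fn with these arguments raises TypeError."""
    try:
        fn(*args, **kwargs)
    except TypeError:
        return True
    return False


model = ToyModel()

# Correct use: the strategy is named explicitly.
ok = model.generate(["hello"], strategy="greedy")

# Passing the strategy positionally is rejected by the keyword-only marker.
positional_rejected = call_fails(model.generate, ["hello"], "greedy")

# A misspelled keyword is rejected instead of being silently ignored.
unknown_rejected = call_fails(model.generate, ["hello"], stratgy="greedy")
```

Because callers can only ever write `strategy=...`, the signature could later be widened to `**strategy_args` without breaking existing call sites, which is the forward-compatibility point the commit message makes.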

) * Hotfix (#7501) (#7568) Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: jbaczek <[email protected]> * Avoid duplicated checkpoint save (#7555) (#7566) Signed-off-by: Mikołaj Błaż <[email protected]> Co-authored-by: mikolajblaz <[email protected]> * Cache FP8 weight and transpose only at the first micro-batch in each validation and test routine (#7470) (#7483) * Cache weight and transpose only in the first batch in all training, val, and test runs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add an option to disable manual GC in validation (#7467) (#7476) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> * Remove PUBLICATIONS.md, point to github.io NeMo page instead (#7694) (#7695) * update publications section to point to blog website page * add hyphen * use double backquotes for code formatting --------- Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Fix multi rank finetune for ASR (#7684) (#7699) * Fix multi rank finetune for ASR * Actually add time * Actually add time --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * Update docs: readme, getting started, ASR intro (#7679) * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * move install info to INSTALLATION.md Signed-off-by: Elena Rastorgueva <[email 
protected]> * tidy up links Signed-off-by: Elena Rastorgueva <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn 
Huang <[email protected]> Co-authored-by: Jocelyn <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data 
parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * 
Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 
issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update 
supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email 
protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval 
config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: 
Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> 
Co-authored-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix typos (#7581) Signed-off-by: Elena Rastorgueva <[email protected]> * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Elena Rastorgueva <[email 
protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> 
Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix ssl models ptl monitor val through 
logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix 
Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * add outline of asr quickstart info to asr/intro.rst Signed-off-by: Elena Rastorgueva <[email protected]> * add CLI, LM and real-time transcription sections Signed-off-by: Elena Rastorgueva <[email protected]> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with func call Signed-off-by: Maanu 
Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * […

* add pleasefixme marker for potential failed nightly tests. (#7678) Signed-off-by: Xuesong Yang <[email protected]> * Add new text segmentation library for better TTS quality (#7645) * Add new text segmentation library for better TTS quality * Update zh_cn_pinyin.py added detailed instruction on how to install pkuseg. Signed-off-by: Xuesong Yang <[email protected]> * Update requirements_tts.txt remove pkuseg as the default dependency of NeMo TTS, and instead, direct users to manually install pkuseg if they really need. Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer (#7767) (#7774) * Create PrecisionPlugin for megatron_ckpt_to_nemo.py trainer * Add ddp_find_unused_parameters_true for punctuation_capitalization_train_evaluate.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add '32-true' for precision values --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix(clustering_diarizer.py): fix typo (#7772) Signed-off-by: Jean-Louis Queguiner <[email protected]> * fix(diarization-README): typo (#7771) Signed-off-by: Jean-Louis Queguiner <[email protected]> * Fix bug wrt change decoding strategy for bpe models (#7762) (#7764) * Fix bug wrt change decoding strategy for bpe models * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Remove incorrect extra argument for load_from_checkpoint_dir() (#7500) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Add nemo to mcore GPT conversion script (#7730) * add conversion script Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove references to 'ckpt' Signed-off-by: Chen Cui <[email protected]> * add one more sanity check to make sure there is no unexpected keys in state dict Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * make cpu loading work Signed-off-by: Chen Cui <[email protected]> * make script work for llama2 models Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address code check Signed-off-by: Chen Cui <[email protected]> * remove trainer precision (was for old sanity check) Signed-off-by: Chen Cui <[email protected]> * fix script for llama2 model Signed-off-by: Chen Cui <[email protected]> * remove commented code Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * Fix bug in ConditionalInput: cat along the feature dim, not the batch dim (#7785) Signed-off-by: anferico <[email protected]> * Add some docs and update scripts for ASR (#7790) * Add some docs and update scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar 
<[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * set context for text memmap to fork (#7784) * set context for text memmap to fork Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add training with multiple audios Signed-off-by: stevehuang52 <[email protected]> * Support flash decoding (#7744) * Add flash-decoding Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> * Change accelerator to 'auto' in nlp_checkpoint_port.py (#7761) * Change accelerator to 'auto' in nlp_checkpoint_port.py (#7747) * Change accelerator to auto Signed-off-by: Abhishree <[email protected]> * Pass omegaconf object to trainer in nlp_checkpoint_port.py Signed-off-by: Abhishree <[email protected]> * Pass omegaconf object to trainer in export.py Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Abhishree <[email protected]> * docs: fix typos (#7758) Signed-off-by: shuoer86 <[email protected]> Co-authored-by: Xuesong 
Yang <[email protected]> Signed-off-by: Abhishree <[email protected]> * Snake act (#7736) Signed-off-by: Abhishree <[email protected]> * Update gpt_dataset.py (#6963) Signed-off-by: Xin Yao <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Abhishree <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: shuoer86 <[email protected]> Signed-off-by: Xin Yao <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: shuoer86 <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Xin Yao <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> * Add selection criteria for reference audios in the `GlobalStyleToken` submodule (#7788) * add selection criteria for reference audios Signed-off-by: anferico <[email protected]> * Update configuration files Signed-off-by: anferico <[email protected]> * add informative comment in config files Signed-off-by: anferico <[email protected]> * sample random index for reference audio selection Signed-off-by: anferico <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: anferico <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * update text server to support compute logprobs (#7733) * update text server to support compute logprobs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo --------- Signed-off-by: Zhilin Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * add multi-layer feat extract and fix random question 
insertion Signed-off-by: stevehuang52 <[email protected]> * Configure MCore logger (#7781) Signed-off-by: Mikołaj Błaż <[email protected]> * Revert "PEFT eval fix (#7626) (#7638)" (#7693) This reverts commit f03dd660bd26d88fd569e76c6f74b83a7c203ff9. * remove TN from ctc_segm tut (#7807) Signed-off-by: Evelina <[email protected]> * [TTS] Support audio offsets in TTS data loaders (#7156) * [TTS] Support audio offsets in TTS data loaders Signed-off-by: Ryan <[email protected]> * [TTS] Change docstring mentions of .pt to .npy Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * Update Apex install command in Dockerfile (#7794) (#7804) * move core install to /workspace (#7706) * update apex install in dockerfile * use fetch head --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: eharper <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Abhinav Khattar <[email protected]> * fix typo Signed-off-by: stevehuang52 <[email protected]> * Nemo to HF converter for LLaMA model (#7770) * Create config_llama_truncate.yaml Signed-off-by: Utkarsh <[email protected]> * Add files via upload Signed-off-by: Utkarsh <[email protected]> * Update convert_nemo_llama_to_hf.py Signed-off-by: Utkarsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update config_llama_truncate.yaml Signed-off-by: Utkarsh <[email protected]> * Update convert_nemo_llama_to_hf.py Signed-off-by: Utkarsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update convert_nemo_llama_to_hf.py Signed-off-by: Utkarsh <[email protected]> * clean up trainer * remove dependency on yaml config. load config from nemo file instead. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable ckpt saving into other precision formats * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * support 70b + cleanup qkv slice logic * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix bug * move hf model folder code from comment to function and add instruction to run * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Utkarsh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Chen Cui <[email protected]> * Save best NeMo model only when necessary (#7836) Signed-off-by: Ante Jukić <[email protected]> * add guard if its a distributed checkpoint (#7845) Signed-off-by: Gerald Shen <[email protected]> * Fix tn duplex (#7808) * fix duplex tn infer Signed-off-by: Evelina <[email protected]> * fix typo Signed-off-by: Evelina <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix TN docs Signed-off-by: Evelina <[email protected]> --------- Signed-off-by: Evelina <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update transformers cache on Jenkins (#7854) * update transformers cache Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * add cd Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> * Update README.rst for container update (#7844) Signed-off-by: fayejf <[email protected]> * Add support for finetuning with huggingface datasets (#7834) * add finetune with huggingface dataset Signed-off-by: 
stevehuang52 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update yaml Signed-off-by: stevehuang52 <[email protected]> * update Signed-off-by: stevehuang52 <[email protected]> * update and refactor Signed-off-by: stevehuang52 <[email protected]> * add extrac hf text and update Signed-off-by: stevehuang52 <[email protected]> * update and refactor Signed-off-by: stevehuang52 <[email protected]> * move dataset dependency to common Signed-off-by: stevehuang52 <[email protected]> * add docstring Signed-off-by: stevehuang52 <[email protected]> * Add to Dics Signed-off-by: Nithin Rao Koluguri <nithinraok> * add ci test Signed-off-by: Nithin Rao Koluguri <nithinraok> * add max steps in jenkins Signed-off-by: Nithin Rao Koluguri <nithinraok> * reduce max steps Signed-off-by: Nithin Rao Koluguri <nithinraok> * jenkins test Signed-off-by: Nithin Rao Koluguri <nithinraok> * add bs=2 Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: stevehuang52 <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Multimodal merge (#7728) * ControlNet TRT export * Final MR before release * SD2 update * Fixed export issue * Fix for instruct p2p and reformat * Fix SD export issue * Add nemo clip export for DB * Fix ins pix2pix * fix sd2 config * [Mingyuan Ma] BF16 and SD conversion script * [Imagen] NHWC Feature * Fix .nemo loading issue for NeMo CLIP in SD * NeMo r1.20.0 Multimodal Merge * fix the inductor issue in inference * Fix inductor loading .nemo issue * Add Neva Model Support * Imagen Optimizations * Neva inference code * NeMo TOT 1.21 to Internal/main * Update neva_inference.yaml * REBASING for latest code changes * Update internal/main to main tot * Parallel DDIM implementation * 1. 
Fixing indentation bug. (#7352) Signed-off-by: Micha Livne <[email protected]> * NeMo MCore llama2 support + MCore PEFT adapters (#7299) * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel config object Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * add TransformerConfig Signed-off-by: ericharper <[email protected]> * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * add todo Signed-off-by: ericharper <[email protected]> * revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove imports Signed-off-by: ericharper <[email protected]> * revert Signed-off-by: ericharper <[email protected]> * remove import Signed-off-by: ericharper <[email protected]> * small clean up Signed-off-by: ericharper <[email protected]> * update hidden size in peft base model, add mcore commit to jenkins Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update module args Signed-off-by: ericharper <[email protected]> * add config obj to flash attention tests Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove sequence parallel arg Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to test Signed-off-by: ericharper <[email protected]> * get hidden_size from config Signed-off-by: ericharper <[email protected]> * add try except Signed-off-by: ericharper <[email protected]> * use default Signed-off-by: ericharper <[email protected]> * update config with hidden size Signed-off-by: ericharper <[email protected]> * remove arg Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * comment out jenkins test Signed-off-by: ericharper <[email protected]> * revert import Signed-off-by: ericharper <[email protected]> * build transformer config Signed-off-by: ericharper <[email protected]> * add model to provider func Signed-off-by: ericharper <[email protected]> * update forward and float16 wrapper Signed-off-by: ericharper <[email protected]> * instantiate model parallel config after init model parallel Signed-off-by: ericharper <[email protected]> * set virtual rank Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add GQA config to megatron gpt model (#7096) * Add GQA config in gpt config file Signed-off-by: jasonwan <[email protected]> * Verify mcore is enabled when using GQA Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> * revert 
Signed-off-by: ericharper <[email protected]> * mcore llama2 ckpt conversion & small fix Signed-off-by: jasonwan <[email protected]> * Add inference & sft config by Hongbin Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: jasonwan <[email protected]> * fix config Signed-off-by: jasonwan <[email protected]> * add inference param. update TP/PP script to support mcore gpt Signed-off-by: jasonwan <[email protected]> * p-tuning Signed-off-by: jasonwan <[email protected]> * modify ckpt conversion script (adding model cast) Signed-off-by: jasonwan <[email protected]> * ckpt conversion use relative path for config Signed-off-by: jasonwan <[email protected]> * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel config object Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add TransformerConfig Signed-off-by: ericharper <[email protected]> * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * add todo Signed-off-by: ericharper <[email protected]> * revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * remove imports Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove import Signed-off-by: ericharper <[email protected]> * small clean up Signed-off-by: ericharper <[email 
protected]> * update hidden size in peft base model, add mcore commit to jenkins Signed-off-by: ericharper <[email protected]> * update module args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add config obj to flash attention tests Signed-off-by: ericharper <[email protected]> * remove args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove sequence parallel arg Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to test Signed-off-by: ericharper <[email protected]> * get hidden_size from config Signed-off-by: ericharper <[email protected]> * add try except Signed-off-by: ericharper <[email protected]> * use default Signed-off-by: ericharper <[email protected]> * update config with hidden size Signed-off-by: ericharper <[email protected]> * remove arg Signed-off-by: ericharper <[email protected]> * comment out jenkins test Signed-off-by: ericharper <[email protected]> * revert import Signed-off-by: ericharper <[email protected]> * remove optimizer_idx Signed-off-by: eharper <[email protected]> * prefetch num microbatches Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel 
config object Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * fix for p-tuning sequence parallel Signed-off-by: jasonwan <[email protected]> * support SFT/distOpt mcore (#7207) * add inference param. update TP/PP script to support mcore gpt * p-tuning Signed-off-by: jasonwan <[email protected]> * change layer names for SFT Signed-off-by: Hongbin Liu <[email protected]> * fix bug in SFT Signed-off-by: Hongbin Liu <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Co-authored-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * remove imports Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update module args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email protected]> * build transformer config Signed-off-by: ericharper <[email protected]> * add model to provider func Signed-off-by: ericharper <[email protected]> * update forward and float16 wrapper Signed-off-by: ericharper <[email protected]> * instantiate model parallel config after init model parallel Signed-off-by: ericharper <[email protected]> * set virtual rank Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add GQA config to megatron gpt model (#7096) * Add GQA config in gpt config file Signed-off-by: jasonwan <[email protected]> * Verify mcore is 
enabled when using GQA Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> * revert Signed-off-by: ericharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rollback model cast for p-tuning Signed-off-by: jasonwan <[email protected]> * update for dist adam Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use get_gpt_module_list Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update ckpt conversion script Signed-off-by: jasonwan <[email protected]> * ptl2.0 patch for llama config Signed-off-by: jasonwan <[email protected]> * add plugins to trainer in scripts Signed-off-by: jasonwan <[email protected]> * fix activation checkpointing mcore Signed-off-by: jasonwan <[email protected]> * fix variable names Signed-off-by: jasonwan <[email protected]> * overwrite normalization type for mcore/te Signed-off-by: jasonwan <[email protected]> * Update megatron_llama_sft.yaml Signed-off-by: Jason Wang <[email protected]> * add PEFT adapter support for mcore gpt path (#7276) * implementation for mcore adapter/mxins Signed-off-by: jasonwan <[email protected]> * small fix for lora and ptuning Signed-off-by: jasonwan <[email protected]> * support layerwise peft Signed-off-by: jasonwan <[email protected]> * support multiple target layers Signed-off-by: jasonwan <[email protected]> * support lora GQA Signed-off-by: jasonwan <[email protected]> * support amp O2 Signed-off-by: jasonwan <[email protected]> * revert & more O2 fix Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * lora inject to attention 
Signed-off-by: jasonwan <[email protected]> * support lora weight tying Signed-off-by: jasonwan <[email protected]> * add copyright header Signed-off-by: jasonwan <[email protected]> * rollback ptuning name change. full string match mcore target Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove comment Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * clean up config Signed-off-by: jasonwan <[email protected]> * Sync llama branch (#7297) * add inference param. update TP/PP script to support mcore gpt * p-tuning Signed-off-by: jasonwan <[email protected]> * change layer names for SFT Signed-off-by: Hongbin Liu <[email protected]> * fix bug in SFT Signed-off-by: Hongbin Liu <[email protected]> * fix bug: cpu initialization is not really enabled Signed-off-by: Hongbin Liu <[email protected]> * add use_cpu_initialization to TransformerConfig Signed-off-by: Hongbin Liu <[email protected]> * fix bug: wrong config path when using relative cjpt path Signed-off-by: Hongbin Liu <[email protected]> * revert mcore config change Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: Hongbin Liu <[email protected]> Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> * clean up ckpt conversion script Signed-off-by: jasonwan <[email protected]> * rollback git merge errors Signed-off-by: jasonwan <[email protected]> * update mcore, add check for mcore+te Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * formatting Signed-off-by: jasonwan <[email protected]> * make sft test dataset optional. 
fix indentation in config Signed-off-by: jasonwan <[email protected]> * one more fix for optional test set Signed-off-by: jasonwan <[email protected]> * support merging lora weights in mcore Signed-off-by: jasonwan <[email protected]> * update mcore for cpu init Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update ckpt conversion for code llama Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add seq_len_interpolation_factor support for long-context llama ckpts (#7312) * add inference param. update TP/PP script to support mcore gpt * p-tuning Signed-off-by: jasonwan <[email protected]> * add seq_len_interpolation_factor Signed-off-by: Hongbin Liu <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> * fix old ptuning model, update mcore to support seq_len_interpolation_factor Signed-off-by: jasonwan <[email protected]> * support fused layernorm linear, fix ptuning O2 Signed-off-by: jasonwan <[email protected]> * drop loss mask for mcore for now Signed-off-by: jasonwan <[email protected]> * disable dist ckpt in peft Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix loading non dist ckpt Signed-off-by: jasonwan <[email protected]> * add ckpt conversion to CI Signed-off-by: jasonwan <[email protected]> * update CI Signed-off-by: jasonwan <[email protected]> * mcore_mixin docstring Signed-off-by: jasonwan <[email protected]> * minor change in mcore peft error message Signed-off-by: jasonwan <[email protected]> * fix amp o2 in lora weight tying Signed-off-by: jasonwan <[email protected]> * correct mcore fp8 
config Signed-off-by: jasonwan <[email protected]> * add TE installation Signed-off-by: jasonwan <[email protected]> * support mcore adapter tuning Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * comment out new CI test. rollback docker image Signed-off-by: jasonwan <[email protected]> * ignore FA tests, try new CI on 23.08 Signed-off-by: jasonwan <[email protected]> * mark new CI as L2, put to beginning to test Signed-off-by: jasonwan <[email protected]> * minor fix for prompt learning Signed-off-by: jasonwan <[email protected]> * rollback to 23.06. comment out CI Signed-off-by: jasonwan <[email protected]> * minor fix ckpt conversion script Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor rollback gpt model change Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: ericharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Hongbin Liu <[email protected]> Signed-off-by: Jason Wang <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: eharper <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Co-authored-by: Kelvin Liu <[email protected]> * Hiddens modules documentation (#7303) * 1. Changed hiddens transformations module from `transformations` to `hiddens`. Signed-off-by: Micha Livne <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Finished doc. Signed-off-by: Micha Livne <[email protected]> * 1. 
Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> --------- Signed-off-by: Micha Livne <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * Support for flash attention 2.0 (#7063) * Add flash attn 2 Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add FA2 feature Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove debugging Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * lora merge fix for O2 names (#7325) * wip Signed-off-by: arendu <[email protected]> * adjust key names based on O2 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <[email protected]> * minor Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Load buffers in checkpoint (#7357) Signed-off-by: Jason Wang <[email protected]> * Add migration guide for lightning 2.0 upgrade (#7360) * Add lightning 2.0 migration guide in NeMo docs Signed-off-by: Abhishree <[email protected]> * Add remaining guide for 
lightning 2.0 upgrade Signed-off-by: Abhishree <[email protected]> * Remove line spill over and continue in next line Signed-off-by: Abhishree <[email protected]> * Add missing dataloader_iter in the guide Signed-off-by: Abhishree <[email protected]> * Fix minor typo Signed-off-by: Abhishree <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> * adding bias_dropout_add_fusion option for BERT (#7332) Signed-off-by: Alexander Jipa <[email protected]> Co-authored-by: Alexander Jipa <[email protected]> * [TTS] Change audio codec token type to TokenIndex (#7356) Signed-off-by: Ryan <[email protected]> * enable selective unfreeze (#7326) * wip Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * wip Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * avoid PTL method conflicts Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix typos (#7361) * fix typos Signed-off-by: omahs <[email protected]> * fix typo Signed-off-by: omahs <[email protected]> * fix typos Signed-off-by: omahs <[email protected]> * fix typos Signed-off-by: omahs <[email protected]> * fix typo Signed-off-by: omahs <[email protected]> * fix typos Signed-off-by: omahs <[email protected]> * fix typo Signed-off-by: omahs <[email protected]> * fix typo 
Signed-off-by: omahs <[email protected]> * fix typo Signed-off-by: omahs <[email protected]> --------- Signed-off-by: omahs <[email protected]> * pin numba=0.57.1 to fix reinstall.sh error (#7366) Signed-off-by: Xuesong Yang <[email protected]> * Update new conversion script for converting safetensors. * Upgrade pytorch container to 23.08 (#7353) * upgrade pytorch container Signed-off-by: eharper <[email protected]> * use mcore Signed-off-by: eharper <[email protected]> * revert test change Signed-off-by: eharper <[email protected]> * pleasefixme Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for ampere Signed-off-by: eharper <[email protected]> * comment test temporarily Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * enable fp32 optimizer for output_layer in mcore (#7355) Signed-off-by: lhb8125 <[email protected]> * revert comment (#7368) Signed-off-by: eharper <[email protected]> * Update to core 23.08 branch ToT (#7371) Signed-off-by: Abhinav Khattar <[email protected]> * upper bounding ptl (#7370) Signed-off-by: eharper <[email protected]> * fix pipeline parallel inference (#7367) * fix pp inference Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix for peft tied weights (#7372) Signed-off-by: arendu <[email protected]> * fixed trainer.strategy=auto from None. 
(#7369) Signed-off-by: Xuesong Yang <[email protected]> * add O2 option in gpt eval (#7358) * add O2 option in eval Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add doc for O2 config Signed-off-by: jasonwan <[email protected]> * add to llama inference config Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. 
Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * minor fix 
for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci --------- Signed-off-by: Abhinav Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> * layer selection for ia3 (#7417) * layer 
selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> * Updating FlashAttention API to match FlashAttentionV2 * Multiple fixes for mm * Fix CI inductor issue and update to torch compile * Remove suppress error * Fix when conversion config uses fp16 and it complains about precision plugin * Fixing FAv2 API usage * Initial release of content filtering model * Added synthetic dataloader for precached and online mode * Mingyuanm/dreambooth opt * Add llama2 support in neva training * Fix sampler length * Fix all precision issues in nemo multimodal * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> 
* Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, s…

* fix(clustering_diarizer.py): fix typo (#7772) Signed-off-by: Jean-Louis Queguiner <[email protected]> * fix(diarization-README): typo (#7771) Signed-off-by: Jean-Louis Queguiner <[email protected]> * Fix bug wrt change decoding strategy for bpe models (#7762) (#7764) * Fix bug wrt change decoding strategy for bpe models * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Remove incorrect extra argument for load_from_checkpoint_dir() (#7500) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Add nemo to mcore GPT conversion script (#7730) * add conversion script Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove references to 'ckpt' Signed-off-by: Chen Cui <[email protected]> * add one more sanity check to make sure there is no unexpected keys in state dict Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * make cpu loading work Signed-off-by: Chen Cui <[email protected]> * make script work for llama2 models Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address code check Signed-off-by: Chen Cui <[email protected]> * remove trainer precision (was for old sanity check) Signed-off-by: Chen Cui <[email protected]> * fix script for llama2 model Signed-off-by: Chen Cui <[email protected]> * remove commented code Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- 
Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * Fix bug in ConditionalInput: cat along the feature dim, not the batch dim (#7785) Signed-off-by: anferico <[email protected]> * Add some docs and update scripts for ASR (#7790) * Add some docs and update scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * set context for text memmap to fork (#7784) * set context for text memmap to fork Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add training with multiple audios Signed-off-by: stevehuang52 <[email protected]> * Support flash decoding (#7744) * Add flash-decoding Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> * Change accelerator to 'auto' in nlp_checkpoint_port.py (#7761) * Change accelerator to 'auto' in nlp_checkpoint_port.py (#7747) * Change accelerator to auto Signed-off-by: Abhishree <[email protected]> * Pass omegaconf object to trainer in 
nlp_checkpoint_port.py Signed-off-by: Abhishree <[email protected]> * Pass omegaconf object to trainer in export.py Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Abhishree <[email protected]> * docs: fix typos (#7758) Signed-off-by: shuoer86 <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Abhishree <[email protected]> * Snake act (#7736) Signed-off-by: Abhishree <[email protected]> * Update gpt_dataset.py (#6963) Signed-off-by: Xin Yao <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> Signed-off-by: Abhishree <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: shuoer86 <[email protected]> Signed-off-by: Xin Yao <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: shuoer86 <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Xin Yao <[email protected]> Co-authored-by: Sandeep Subramanian <[email protected]> * Add selection criteria for reference audios in the `GlobalStyleToken` submodule (#7788) * add selection criteria for reference audios Signed-off-by: anferico <[email protected]> * Update configuration files Signed-off-by: anferico <[email protected]> * add informative comment in config files Signed-off-by: anferico <[email protected]> * sample random index for reference audio selection Signed-off-by: anferico <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, 
see https://pre-commit.ci --------- Signed-off-by: anferico <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * update text server to support compute logprobs (#7733) * update text server to support compute logprobs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo --------- Signed-off-by: Zhilin Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * add multi-layer feat extract and fix random question insertion Signed-off-by: stevehuang52 <[email protected]> * Configure MCore logger (#7781) Signed-off-by: Mikołaj Błaż <[email protected]> * Revert "PEFT eval fix (#7626) (#7638)" (#7693) This reverts commit f03dd660bd26d88fd569e76c6f74b83a7c203ff9. * remove TN from ctc_segm tut (#7807) Signed-off-by: Evelina <[email protected]> * [TTS] Support audio offsets in TTS data loaders (#7156) * [TTS] Support audio offsets in TTS data loaders Signed-off-by: Ryan <[email protected]> * [TTS] Change docstring mentions of .pt to .npy Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * Update Apex install command in Dockerfile (#7794) (#7804) * move core install to /workspace (#7706) * update apex install in dockerfile * use fetch head --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: eharper <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Abhinav Khattar <[email protected]> * fix typo Signed-off-by: stevehuang52 <[email protected]> * Nemo to HF converter for LLaMA model (#7770) * Create config_llama_truncate.yaml Signed-off-by: Utkarsh <[email protected]> * Add files via upload Signed-off-by: Utkarsh <[email protected]> * Update convert_nemo_llama_to_hf.py Signed-off-by: Utkarsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * Update config_llama_truncate.yaml Signed-off-by: Utkarsh <[email protected]> * Update convert_nemo_llama_to_hf.py Signed-off-by: Utkarsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update convert_nemo_llama_to_hf.py Signed-off-by: Utkarsh <[email protected]> * clean up trainer * remove dependency on yaml config. load config from nemo file instead. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * enable ckpt saving into other precision formats * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * support 70b + cleanup qkv slice logic * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix bug * move hf model folder code from comment to function and add instruction to run * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Utkarsh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Chen Cui <[email protected]> * Save best NeMo model only when necessary (#7836) Signed-off-by: Ante Jukić <[email protected]> * add guard if its a distributed checkpoint (#7845) Signed-off-by: Gerald Shen <[email protected]> * Fix tn duplex (#7808) * fix duplex tn infer Signed-off-by: Evelina <[email protected]> * fix typo Signed-off-by: Evelina <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix TN docs Signed-off-by: Evelina <[email protected]> --------- Signed-off-by: Evelina <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update transformers cache 
on Jenkins (#7854) * update transformers cache Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * add cd Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> * Update README.rst for container update (#7844) Signed-off-by: fayejf <[email protected]> * Add support for finetuning with huggingface datasets (#7834) * add finetune with huggingface dataset Signed-off-by: stevehuang52 <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update yaml Signed-off-by: stevehuang52 <[email protected]> * update Signed-off-by: stevehuang52 <[email protected]> * update and refactor Signed-off-by: stevehuang52 <[email protected]> * add extrac hf text and update Signed-off-by: stevehuang52 <[email protected]> * update and refactor Signed-off-by: stevehuang52 <[email protected]> * move dataset dependency to common Signed-off-by: stevehuang52 <[email protected]> * add docstring Signed-off-by: stevehuang52 <[email protected]> * Add to Dics Signed-off-by: Nithin Rao Koluguri <nithinraok> * add ci test Signed-off-by: Nithin Rao Koluguri <nithinraok> * add max steps in jenkins Signed-off-by: Nithin Rao Koluguri <nithinraok> * reduce max steps Signed-off-by: Nithin Rao Koluguri <nithinraok> * jenkins test Signed-off-by: Nithin Rao Koluguri <nithinraok> * add bs=2 Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: stevehuang52 <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Multimodal merge (#7728) * ControlNet TRT export * Final MR before release * SD2 update * Fixed export issue * Fix for instruct p2p and reformat * Fix SD export issue * Add nemo clip export for DB * Fix ins pix2pix * fix sd2 config * 
[Mingyuan Ma] BF16 and SD conversion script * [Imagen] NHWC Feature * Fix .nemo loading issue for NeMo CLIP in SD * NeMo r1.20.0 Multimodal Merge * fix the inductor issue in inference * Fix inductor loading .nemo issue * Add Neva Model Support * Imagen Optimizations * Neva inference code * NeMo TOT 1.21 to Internal/main * Update neva_inference.yaml * REBASING for latest code changes * Update internal/main to main tot * Parallel DDIM implementation * 1. Fixing indentation bug. (#7352) Signed-off-by: Micha Livne <[email protected]> * NeMo MCore llama2 support + MCore PEFT adapters (#7299) * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel config object Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * add TransformerConfig Signed-off-by: ericharper <[email protected]> * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * add todo Signed-off-by: ericharper <[email protected]> * revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove imports Signed-off-by: ericharper <[email protected]> * revert Signed-off-by: ericharper <[email protected]> * remove import Signed-off-by: ericharper <[email protected]> * small clean up Signed-off-by: ericharper <[email protected]> * update 
hidden size in peft base model, add mcore commit to jenkins Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update module args Signed-off-by: ericharper <[email protected]> * add config obj to flash attention tests Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove sequence parallel arg Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to test Signed-off-by: ericharper <[email protected]> * get hidden_size from config Signed-off-by: ericharper <[email protected]> * add try except Signed-off-by: ericharper <[email protected]> * use default Signed-off-by: ericharper <[email protected]> * update config with hidden size Signed-off-by: ericharper <[email protected]> * remove arg Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * comment out jenkins test Signed-off-by: ericharper <[email protected]> * revert import Signed-off-by: ericharper <[email protected]> * build transformer config Signed-off-by: ericharper <[email protected]> * add model to provider func Signed-off-by: ericharper <[email protected]> * update forward and float16 wrapper Signed-off-by: ericharper <[email protected]> * instantiate model parallel config after init model parallel Signed-off-by: ericharper <[email protected]> * 
set virtual rank Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add GQA config to megatron gpt model (#7096) * Add GQA config in gpt config file Signed-off-by: jasonwan <[email protected]> * Verify mcore is enabled when using GQA Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> * revert Signed-off-by: ericharper <[email protected]> * mcore llama2 ckpt conversion & small fix Signed-off-by: jasonwan <[email protected]> * Add inference & sft config by Hongbin Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: jasonwan <[email protected]> * fix config Signed-off-by: jasonwan <[email protected]> * add inference param. update TP/PP script to support mcore gpt Signed-off-by: jasonwan <[email protected]> * p-tuning Signed-off-by: jasonwan <[email protected]> * modify ckpt conversion script (adding model cast) Signed-off-by: jasonwan <[email protected]> * ckpt conversion use relative path for config Signed-off-by: jasonwan <[email protected]> * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel config object Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add TransformerConfig Signed-off-by: ericharper <[email protected]> * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * add todo Signed-off-by: ericharper <[email protected]> * 
revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * remove imports Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove import Signed-off-by: ericharper <[email protected]> * small clean up Signed-off-by: ericharper <[email protected]> * update hidden size in peft base model, add mcore commit to jenkins Signed-off-by: ericharper <[email protected]> * update module args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add config obj to flash attention tests Signed-off-by: ericharper <[email protected]> * remove args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove sequence parallel arg Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to test Signed-off-by: ericharper <[email protected]> * get hidden_size from config Signed-off-by: ericharper <[email protected]> * add try except Signed-off-by: ericharper <[email protected]> * use default Signed-off-by: ericharper <[email protected]> * update config with hidden size Signed-off-by: ericharper <[email protected]> * remove arg Signed-off-by: ericharper <[email protected]> * comment out jenkins test Signed-off-by: ericharper <[email protected]> * revert import Signed-off-by: ericharper 
<[email protected]> * remove optimizer_idx Signed-off-by: eharper <[email protected]> * prefetch num microbatches Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel config object Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * fix for p-tuning sequence parallel Signed-off-by: jasonwan <[email protected]> * support SFT/distOpt mcore (#7207) * add inference param. update TP/PP script to support mcore gpt * p-tuning Signed-off-by: jasonwan <[email protected]> * change layer names for SFT Signed-off-by: Hongbin Liu <[email protected]> * fix bug in SFT Signed-off-by: Hongbin Liu <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Co-authored-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * remove imports Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update module args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email protected]> * build transformer config Signed-off-by: ericharper <[email protected]> * add model to provider func Signed-off-by: ericharper <[email protected]> * update forward and float16 wrapper Signed-off-by: 
ericharper <[email protected]> * instantiate model parallel config after init model parallel Signed-off-by: ericharper <[email protected]> * set virtual rank Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add GQA config to megatron gpt model (#7096) * Add GQA config in gpt config file Signed-off-by: jasonwan <[email protected]> * Verify mcore is enabled when using GQA Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> * revert Signed-off-by: ericharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rollback model cast for p-tuning Signed-off-by: jasonwan <[email protected]> * update for dist adam Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use get_gpt_module_list Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update ckpt conversion script Signed-off-by: jasonwan <[email protected]> * ptl2.0 patch for llama config Signed-off-by: jasonwan <[email protected]> * add plugins to trainer in scripts Signed-off-by: jasonwan <[email protected]> * fix activation checkpointing mcore Signed-off-by: jasonwan <[email protected]> * fix variable names Signed-off-by: jasonwan <[email protected]> * overwrite normalization type for mcore/te Signed-off-by: jasonwan <[email protected]> * Update megatron_llama_sft.yaml Signed-off-by: Jason Wang <[email protected]> * add PEFT adapter support for mcore gpt path (#7276) * implementation for mcore adapter/mxins Signed-off-by: jasonwan <[email protected]> * small fix for lora and ptuning Signed-off-by: jasonwan <[email protected]> * support layerwise peft 
Signed-off-by: jasonwan <[email protected]> * support multiple target layers Signed-off-by: jasonwan <[email protected]> * support lora GQA Signed-off-by: jasonwan <[email protected]> * support amp O2 Signed-off-by: jasonwan <[email protected]> * revert & more O2 fix Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * lora inject to attention Signed-off-by: jasonwan <[email protected]> * support lora weight tying Signed-off-by: jasonwan <[email protected]> * add copyright header Signed-off-by: jasonwan <[email protected]> * rollback ptuning name change. full string match mcore target Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove comment Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * clean up config Signed-off-by: jasonwan <[email protected]> * Sync llama branch (#7297) * add inference param. 
update TP/PP script to support mcore gpt * p-tuning Signed-off-by: jasonwan <[email protected]> * change layer names for SFT Signed-off-by: Hongbin Liu <[email protected]> * fix bug in SFT Signed-off-by: Hongbin Liu <[email protected]> * fix bug: cpu initialization is not really enabled Signed-off-by: Hongbin Liu <[email protected]> * add use_cpu_initialization to TransformerConfig Signed-off-by: Hongbin Liu <[email protected]> * fix bug: wrong config path when using relative cjpt path Signed-off-by: Hongbin Liu <[email protected]> * revert mcore config change Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: Hongbin Liu <[email protected]> Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> * clean up ckpt conversion script Signed-off-by: jasonwan <[email protected]> * rollback git merge errors Signed-off-by: jasonwan <[email protected]> * update mcore, add check for mcore+te Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * formatting Signed-off-by: jasonwan <[email protected]> * make sft test dataset optional. fix indentation in config Signed-off-by: jasonwan <[email protected]> * one more fix for optional test set Signed-off-by: jasonwan <[email protected]> * support merging lora weights in mcore Signed-off-by: jasonwan <[email protected]> * update mcore for cpu init Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update ckpt conversion for code llama Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add seq_len_interpolation_factor support for long-context llama ckpts (#7312) * add inference param. 
update TP/PP script to support mcore gpt * p-tuning Signed-off-by: jasonwan <[email protected]> * add seq_len_interpolation_factor Signed-off-by: Hongbin Liu <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> * fix old ptuning model, update mcore to support seq_len_interpolation_factor Signed-off-by: jasonwan <[email protected]> * support fused layernorm linear, fix ptuning O2 Signed-off-by: jasonwan <[email protected]> * drop loss mask for mcore for now Signed-off-by: jasonwan <[email protected]> * disable dist ckpt in peft Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix loading non dist ckpt Signed-off-by: jasonwan <[email protected]> * add ckpt conversion to CI Signed-off-by: jasonwan <[email protected]> * update CI Signed-off-by: jasonwan <[email protected]> * mcore_mixin docstring Signed-off-by: jasonwan <[email protected]> * minor change in mcore peft error message Signed-off-by: jasonwan <[email protected]> * fix amp o2 in lora weight tying Signed-off-by: jasonwan <[email protected]> * correct mcore fp8 config Signed-off-by: jasonwan <[email protected]> * add TE installation Signed-off-by: jasonwan <[email protected]> * support mcore adapter tuning Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * comment out new CI test. rollback docker image Signed-off-by: jasonwan <[email protected]> * ignore FA tests, try new CI on 23.08 Signed-off-by: jasonwan <[email protected]> * mark new CI as L2, put to beginning to test Signed-off-by: jasonwan <[email protected]> * minor fix for prompt learning Signed-off-by: jasonwan <[email protected]> * rollback to 23.06. 
comment out CI Signed-off-by: jasonwan <[email protected]> * minor fix ckpt conversion script Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor rollback gpt model change Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: ericharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Hongbin Liu <[email protected]> Signed-off-by: Jason Wang <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: eharper <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Co-authored-by: Kelvin Liu <[email protected]> * Hiddens modules documentation (#7303) * 1. Changed hiddens transformations module from `transformations` to `hiddens`. Signed-off-by: Micha Livne <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Finished doc. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. 
Signed-off-by: Micha Livne <[email protected]> --------- Signed-off-by: Micha Livne <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * Support for flash attention 2.0 (#7063) * Add flash attn 2 Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add FA2 feature Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove debugging Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * lora merge fix for O2 names (#7325) * wip Signed-off-by: arendu <[email protected]> * adjust key names based on O2 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <[email protected]> * minor Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * 
Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Load buffers in checkpoint (#7357) Signed-off-by: Jason Wang <[email protected]> * Add migration guide for lightning 2.0 upgrade (#7360) * Add lightning 2.0 migration guide in NeMo docs Signed-off-by: Abhishree <[email protected]> * Add remaining guide for lightning 2.0 upgrade Signed-off-by: Abhishree <[email protected]> * Remove line spill over and continue in next line Signed-off-by: Abhishree <[email protected]> * Add missing dataloader_iter in the guide Signed-off-by: Abhishree <[email protected]> * Fix minor typo Signed-off-by: Abhishree <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> * adding bias_dropout_add_fusion option for BERT (#7332) 
Signed-off-by: Alexander Jipa <[email protected]> Co-authored-by: Alexander Jipa <[email protected]> * [TTS] Change audio codec token type to TokenIndex (#7356) Signed-off-by: Ryan <[email protected]> * enable selective unfreeze (#7326) * wip Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * wip Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * avoid PTL method conflicts Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix typos (#7361) * fix typos Signed-off-by: omahs <[email protected]> * fix typo Signed-off-by: omahs <[email protected]> * fix typos Signed-off-by: omahs <[email protected]> * fix typos Signed-off-by: omahs <[email protected]> * fix typo Signed-off-by: omahs <[email protected]> * fix typos Signed-off-by: omahs <[email protected]> * fix typo Signed-off-by: omahs <[email protected]> * fix typo Signed-off-by: omahs <[email protected]> * fix typo Signed-off-by: omahs <[email protected]> --------- Signed-off-by: omahs <[email protected]> * pin numba=0.57.1 to fix reinstall.sh error (#7366) Signed-off-by: Xuesong Yang <[email protected]> * Update new conversion script for converting safetensors. 
* Upgrade pytorch container to 23.08 (#7353) * upgrade pytorch container Signed-off-by: eharper <[email protected]> * use mcore Signed-off-by: eharper <[email protected]> * revert test change Signed-off-by: eharper <[email protected]> * pleasefixme Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for ampere Signed-off-by: eharper <[email protected]> * comment test temporarily Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * enable fp32 optimizer for output_layer in mcore (#7355) Signed-off-by: lhb8125 <[email protected]> * revert comment (#7368) Signed-off-by: eharper <[email protected]> * Update to core 23.08 branch ToT (#7371) Signed-off-by: Abhinav Khattar <[email protected]> * upper bounding ptl (#7370) Signed-off-by: eharper <[email protected]> * fix pipeline parallel inference (#7367) * fix pp inference Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix for peft tied weights (#7372) Signed-off-by: arendu <[email protected]> * fixed trainer.strategy=auto from None. 
(#7369) Signed-off-by: Xuesong Yang <[email protected]> * add O2 option in gpt eval (#7358) * add O2 option in eval Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add doc for O2 config Signed-off-by: jasonwan <[email protected]> * add to llama inference config Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. 
Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * minor fix 
for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci --------- Signed-off-by: Abhinav Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> * layer selection for ia3 (#7417) * layer 
selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> * Updating FlashAttention API to match FlashAttentionV2 * Multiple fixes for mm * Fix CI inductor issue and update to torch compile * Remove suppress error * Fix when conversion config uses fp16 and it complains about precision plugin * Fixing FAv2 API usage * Initial release of content filtering model * Added synthetic dataloader for precached and online mode * Mingyuanm/dreambooth opt * Add llama2 support in neva training * Fix sampler length * Fix all precision issues in nemo multimodal * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> 
* Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision 
Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-aut…

* [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. 
It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add 
Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyter notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for multi-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot]
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Fix typos (#7581) * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: 
Cheng-Ping Hsieh <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: 
Dmytro Pykhtar <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> 
* bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change accelerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot]
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> * [Stable Diffusion/ControlNet] Enable O2 training for SD and Fix ControlNet CI failure * Mingyuanm/dreambooth fix * Fix NeMo CI Infer Issue * DreamFusion 
* Move neva export changes * Add Imagen Synthetic Dataloader * Add VITWrapper and export stuff to wrapper * Update neva with megatron-core support * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with 
func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * unused import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer.
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references 
Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * 
Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <[email 
protected]> Co-authored-by: Jocelyn <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update per.py - if __name__ == "__main__" removed (now metric can be imported); - removed excessive classes (like "Sample" and 
"Statistics"); - transition from pandas df to dict of dicts; - removed unnecessary "return"; - notation fixing; - reduced calculation time Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Create punctuation_rates.py Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Format fixing Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * added nemo.logging, header, docstrings, how to use Signed-off-by: Sasha Meister <[email protected]> * Added assertions to rate_punctuation.py Signed-off-by: Sasha Meister <[email protected]> * fix typo Signed-off-by: Sasha Meister <[email protected]> * added function for import and call, docstrings Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenerated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot
accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation 
test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead …

* Hotfix (#7501) (#7568) Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: jbaczek <[email protected]> * Avoid duplicated checkpoint save (#7555) (#7566) Signed-off-by: Mikołaj Błaż <[email protected]> Co-authored-by: mikolajblaz <[email protected]> * Cache FP8 weight and transpose only at the first micro-batch in each validation and test routine (#7470) (#7483) * Cache weight and transpose only in the first batch in all training, val, and test runs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add an option to disable manual GC in validation (#7467) (#7476) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> * Remove PUBLICATIONS.md, point to github.io NeMo page instead (#7694) (#7695) * update publications section to point to blog website page * add hyphen * use double backquotes for code formatting --------- Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Fix multi rank finetune for ASR (#7684) (#7699) * Fix multi rank finetune for ASR * Actually add time * Actually add time --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * Update docs: readme, getting started, ASR intro (#7679) * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * move install info to INSTALLATION.md Signed-off-by: Elena Rastorgueva <[email 
protected]> * tidy up links Signed-off-by: Elena Rastorgueva <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn 
Huang <[email protected]> Co-authored-by: Jocelyn <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data 
parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenerated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model *
Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 
issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update 
supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email 
protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval 
config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: 
Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> 
Co-authored-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix typos (#7581) Signed-off-by: Elena Rastorgueva <[email protected]> * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Elena Rastorgueva <[email 
protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> 
Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix ssl models ptl monitor val through 
logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix 
Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * add outline of asr quickstart info to asr/intro.rst Signed-off-by: Elena Rastorgueva <[email protected]> * add CLI, LM and real-time transcription sections Signed-off-by: Elena Rastorgueva <[email protected]> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with func call Signed-off-by: Maanu 
Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-… * fix onnx (#7703) (#7704) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> * move core install to /workspace (#7706) Signed-off-by: Abhinav Khattar <[email protected]> * Fix typo in audio codec config, encoder target (#7697) Signed-off-by: Ante Jukić <[email protected]> * Replace strategy='dp'/None with 'auto' 
(#7681) (#7696) * Add strategy=auto for None and dp * Change strategy from None to auto --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * [ASR] Multichannel mask estimator with flex number of channels (#7317) * Adding a mask estimator which can …

* Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> * Fix refiner issue on FID * Fix refiner seeding issue * Updating FlashAttention API to match FlashAttentionV2 * Multiple fixes for mm * Fix CI inductor issue and update to torch compile * Remove suppress error * Fix when conversion config uses fp16 and it complains about precision plugin * Add training config with no cropping and extra conditioning. * Fixing FAv2 API usage * Initial release of content filtering model * Added synthetic dataloader for precached and online mode * Mingyuanm/dreambooth opt * Add llama2 support in neva training * Fix sampler length * Fix all precision issues in nemo multimodal * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as 
well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> 
* Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision 
Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix bug when loading 
dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * bugfix: trainer.accelerator=auto from None. (#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Jocelyn <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> * remove auto 
generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix 
according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Fix typos (#7581) * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: 
Cheng-Ping Hsieh <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: 
Dmytro Pykhtar <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> 
* bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> * [Stable Diffusion/ControlNet] Enable O2 training for SD and Fix ControlNet CI failure * Mingyuanm/dreambooth fix * Fix NeMo CI Infer Issue * DreamFusion 
* Move neva export changes * Add Imagen Synthetic Dataloader * Add VITWrapper and export stuff to wrapper * Update neva with megatron-core support * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with 
func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci …

* Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. 
Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references 
Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * 
Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <[email 
protected]> Co-authored-by: Jocelyn <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update per.py - if __name__ == "__main__" removed (now metric can be imported); - removed excessive classes (like "Sample" and 
"Statistics"); - transition from pandas df to dict of dicts; - removed unnecessary "return"; - notation fixing; - reduced calculation time Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Create punctuation_rates.py Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Format fixing Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * added nemo.logging, header, docstrings, how to use Signed-off-by: Sasha Meister <[email protected]> * Added asserions to rate_punctuation.py Signed-off-by: Sasha Meister <[email protected]> * fix typo Signed-off-by: Sasha Meister <[email protected]> * added function for import and call, docstrings Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot 
accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation 
test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email 
protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add strategy as 
ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix typos (#7581) Signed-off-by: Sasha Meister <[email protected]> * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * added per tests Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * add build os key (#7596) (#7599) * add build os 
key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * rate_punctuation.py Fixed output manifest saving Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Fix tests Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Function name fixing Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Moving PER to speech_to_text_eval.py 
Added: - "use_per": PER metric computing; - "scores_per_sample": metrics computation sample by sample for wer/cer/punctuation rates; - "output_with_scores_filename": saving manifest with metrics Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update test_metrics.py Updated "punctuation_error_rate" function name Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Added use_per description Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * guard extra dependencies Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Write metrics to "output_filename" if "scores_per_sample=True" Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * scores_per_sample description Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix import guards Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Stats printing when HAVE_TABLUATE_AND_PANDAS=False Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci 
Signed-off-by: Sasha Meister <[email protected]> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Delete examples/asr/rate_punctuation.py Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Added use_per description Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * metric and variables name fixing Signed-off-by: Sasha Meister <[email protected]> * Add else samples = None Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Sasha Meister <[email protected]> * bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha 
Meister <[email protected]> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signe…

* [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * move install info to INSTALLATION.md Signed-off-by: Elena Rastorgueva <[email protected]> * tidy up links Signed-off-by: Elena Rastorgueva <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: 
mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek 
<[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn 
Huang <[email protected]> Co-authored-by: Jocelyn <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data 
parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * 
Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 
issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update 
supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email 
protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval 
config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: 
Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> 
Co-authored-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix typos (#7581) Signed-off-by: Elena Rastorgueva <[email protected]> * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Elena Rastorgueva <[email 
protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> 
Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix ssl models ptl monitor val through 
logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix 
Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * add outline of asr quickstart info to asr/intro.rst Signed-off-by: Elena Rastorgueva <[email protected]> * add CLI, LM and real-time transcription sections Signed-off-by: Elena Rastorgueva <[email protected]> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with func call Signed-off-by: Maanu 
Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * unused import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-…

* Create pos_emb.rst Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update pos_emb.rst Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update pos_emb.rst Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update pos_emb.rst Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update pos_emb.rst Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update pos_emb.rst Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update pos_emb.rst Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update and rename docs/source/nlp/pos_emb.rst to docs/source/nlp/nemo_megatron/positional_embeddings.rst Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Rename positional_embeddings.rst to positional_embeddings.rst Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Create flash_attention.rst Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Changed value for model.seq_len_interpolation_factor to 2 Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fixed flash_attention enabling for t5 Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email 
protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to 
initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email 
protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * 
update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix 
failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email 
protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email 
protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> 
Signed-off-by: Sasha Meister <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: trainer.accelerator=auto from None. (#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix TTS 
FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Jocelyn <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * unpin 
setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. 
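The keyword-only design described in the `strategy` commit message above can be illustrated with a minimal sketch. The `ToyModel` and `GreedyStrategy` classes and the exact parameter list below are hypothetical stand-ins for illustration only, not the real `MegatronGPTModel.generate()` signature; the sketch only shows how a bare `*` in a Python signature forces `strategy` to be passed by name and how unknown keywords are rejected rather than silently ignored.

```python
from typing import Any, List, Optional


class ToyModel:
    """Hypothetical stand-in for illustration; not the real MegatronGPTModel."""

    def generate(
        self,
        inputs: List[str],
        length_params: Optional[dict] = None,
        *,  # parameters after this marker can only be passed by keyword
        strategy: Optional[Any] = None,
    ) -> dict:
        # Report which strategy was received instead of running real generation.
        strategy_name = type(strategy).__name__ if strategy is not None else "default"
        return {"num_inputs": len(inputs), "strategy": strategy_name}


class GreedyStrategy:
    """Hypothetical strategy object used only in this sketch."""


model = ToyModel()

# Intended usage: the strategy is named explicitly at the call site.
result = model.generate(["hello"], strategy=GreedyStrategy())

# Passing the strategy positionally raises TypeError, so a later change of the
# signature (for example to **strategy_args) would not break existing callers.
try:
    model.generate(["hello"], None, GreedyStrategy())
    positional_rejected = False
except TypeError:
    positional_rejected = True

# A misspelled keyword also raises TypeError instead of being ignored, which is
# the failure mode an open-ended **kwargs catch-all could hide.
try:
    model.generate(["hello"], stratgy=GreedyStrategy())
    typo_rejected = False
except TypeError:
    typo_rejected = True
```

Because callers can only write `strategy=...`, the parameter's position in the signature never becomes part of the public contract.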
Signed-off-by: Olivier Delalleau <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * 
[pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, 
see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email 
protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add strategy as 
ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix typos (#7581) Signed-off-by: Sasha Meister <[email protected]> * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update 
Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email 
protected]> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Sasha Meister <[email protected]> * bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email 
protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the 
pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to 
model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculating Punctuation Error Rate and related rates (correct rate, deletion rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha…

* ControlNet TRT export * Final MR before release * SD2 update * Fixed export issue * Fix for instruct p2p and reformat * Fix SD export issue * Add nemo clip export for DB * Fix ins pix2pix * fix sd2 config * [Mingyuan Ma] BF16 and SD conversion script * [Imagen] NHWC Feature * Fix .nemo loading issue for NeMo CLIP in SD * NeMo r1.20.0 Multimodal Merge * fix the inductor issue in inference * Fix inductor loading .nemo issue * Add Neva Model Support * Imagen Optimizations * Neva inference code * NeMo TOT 1.21 to Internal/main * Update neva_inference.yaml * REBASING for latest code changes * Update internal/main to main tot * Parallel DDIM implementation * 1. Fixing indentation bug. (#7352) Signed-off-by: Micha Livne <[email protected]> * NeMo MCore llama2 support + MCore PEFT adapters (#7299) * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel config object Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * add TransformerConfig Signed-off-by: ericharper <[email protected]> * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * add todo Signed-off-by: ericharper <[email protected]> * revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove imports Signed-off-by: ericharper 
<[email protected]> * revert Signed-off-by: ericharper <[email protected]> * remove import Signed-off-by: ericharper <[email protected]> * small clean up Signed-off-by: ericharper <[email protected]> * update hidden size in peft base model, add mcore commit to jenkins Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update module args Signed-off-by: ericharper <[email protected]> * add config obj to flash attention tests Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove sequence parallel arg Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to test Signed-off-by: ericharper <[email protected]> * get hidden_size from config Signed-off-by: ericharper <[email protected]> * add try except Signed-off-by: ericharper <[email protected]> * use default Signed-off-by: ericharper <[email protected]> * update config with hidden size Signed-off-by: ericharper <[email protected]> * remove arg Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * comment out jenkins test Signed-off-by: ericharper <[email protected]> * revert import Signed-off-by: ericharper <[email protected]> * build transformer config Signed-off-by: ericharper <[email protected]> * add model to provider func Signed-off-by: ericharper 
<[email protected]> * update forward and float16 wrapper Signed-off-by: ericharper <[email protected]> * instantiate model parallel config after init model parallel Signed-off-by: ericharper <[email protected]> * set virtual rank Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add GQA config to megatron gpt model (#7096) * Add GQA config in gpt config file Signed-off-by: jasonwan <[email protected]> * Verify mcore is enabled when using GQA Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> * revert Signed-off-by: ericharper <[email protected]> * mcore llama2 ckpt conversion & small fix Signed-off-by: jasonwan <[email protected]> * Add inference & sft config by Hongbin Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: jasonwan <[email protected]> * fix config Signed-off-by: jasonwan <[email protected]> * add inference param. update TP/PP script to support mcore gpt Signed-off-by: jasonwan <[email protected]> * p-tuning Signed-off-by: jasonwan <[email protected]> * modify ckpt conversion script (adding model cast) Signed-off-by: jasonwan <[email protected]> * ckpt conversion use relative path for config Signed-off-by: jasonwan <[email protected]> * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel config object Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * set vp size to none if it is 1 Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add 
TransformerConfig Signed-off-by: ericharper <[email protected]> * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * add todo Signed-off-by: ericharper <[email protected]> * revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * remove imports Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove import Signed-off-by: ericharper <[email protected]> * small clean up Signed-off-by: ericharper <[email protected]> * update hidden size in peft base model, add mcore commit to jenkins Signed-off-by: ericharper <[email protected]> * update module args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add config obj to flash attention tests Signed-off-by: ericharper <[email protected]> * remove args Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove sequence parallel arg Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * add config to test Signed-off-by: ericharper <[email protected]> * get hidden_size from config Signed-off-by: ericharper <[email protected]> * add try except Signed-off-by: ericharper <[email protected]> * use default Signed-off-by: ericharper <[email protected]> * update config with hidden size 
Signed-off-by: ericharper <[email protected]> * remove arg Signed-off-by: ericharper <[email protected]> * comment out jenkins test Signed-off-by: ericharper <[email protected]> * revert import Signed-off-by: ericharper <[email protected]> * remove optimizer_idx Signed-off-by: eharper <[email protected]> * prefetch num microbatches Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * start adding gpt from megatron core path Signed-off-by: ericharper <[email protected]> * set model parallel config Signed-off-by: ericharper <[email protected]> * use model parallel config object Signed-off-by: ericharper <[email protected]> * update args Signed-off-by: ericharper <[email protected]> * fix for p-tuning sequence parallel Signed-off-by: jasonwan <[email protected]> * support SFT/distOpt mcore (#7207) * add inference param. update TP/PP script to support mcore gpt * p-tuning Signed-off-by: jasonwan <[email protected]> * change layer names for SFT Signed-off-by: Hongbin Liu <[email protected]> * fix bug in SFT Signed-off-by: Hongbin Liu <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Co-authored-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * start updating to TransformerConfig Signed-off-by: ericharper <[email protected]> * revert to model parallel config Signed-off-by: ericharper <[email protected]> * add hidden_size to model_parallel_config Signed-off-by: ericharper <[email protected]> * remove imports Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update module args Signed-off-by: ericharper <[email protected]> * add config to self Signed-off-by: ericharper <[email 
protected]> * build transformer config Signed-off-by: ericharper <[email protected]> * add model to provider func Signed-off-by: ericharper <[email protected]> * update forward and float16 wrapper Signed-off-by: ericharper <[email protected]> * instantiate model parallel config after init model parallel Signed-off-by: ericharper <[email protected]> * set virtual rank Signed-off-by: ericharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add GQA config to megatron gpt model (#7096) * Add GQA config in gpt config file Signed-off-by: jasonwan <[email protected]> * Verify mcore is enabled when using GQA Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> * revert Signed-off-by: ericharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rollback model cast for p-tuning Signed-off-by: jasonwan <[email protected]> * update for dist adam Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use get_gpt_module_list Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update ckpt conversion script Signed-off-by: jasonwan <[email protected]> * ptl2.0 patch for llama config Signed-off-by: jasonwan <[email protected]> * add plugins to trainer in scripts Signed-off-by: jasonwan <[email protected]> * fix activation checkpointing mcore Signed-off-by: jasonwan <[email protected]> * fix variable names Signed-off-by: jasonwan <[email protected]> * overwrite normalization type for mcore/te Signed-off-by: jasonwan <[email protected]> * Update megatron_llama_sft.yaml Signed-off-by: Jason Wang <[email protected]> * add PEFT adapter support for 
mcore gpt path (#7276) * implementation for mcore adapter/mxins Signed-off-by: jasonwan <[email protected]> * small fix for lora and ptuning Signed-off-by: jasonwan <[email protected]> * support layerwise peft Signed-off-by: jasonwan <[email protected]> * support multiple target layers Signed-off-by: jasonwan <[email protected]> * support lora GQA Signed-off-by: jasonwan <[email protected]> * support amp O2 Signed-off-by: jasonwan <[email protected]> * revert & more O2 fix Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * lora inject to attention Signed-off-by: jasonwan <[email protected]> * support lora weight tying Signed-off-by: jasonwan <[email protected]> * add copyright header Signed-off-by: jasonwan <[email protected]> * rollback ptuning name change. full string match mcore target Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove comment Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * clean up config Signed-off-by: jasonwan <[email protected]> * Sync llama branch (#7297) * add inference param. 
update TP/PP script to support mcore gpt * p-tuning Signed-off-by: jasonwan <[email protected]> * change layer names for SFT Signed-off-by: Hongbin Liu <[email protected]> * fix bug in SFT Signed-off-by: Hongbin Liu <[email protected]> * fix bug: cpu initialization is not really enabled Signed-off-by: Hongbin Liu <[email protected]> * add use_cpu_initialization to TransformerConfig Signed-off-by: Hongbin Liu <[email protected]> * fix bug: wrong config path when using relative cjpt path Signed-off-by: Hongbin Liu <[email protected]> * revert mcore config change Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: Hongbin Liu <[email protected]> Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> * clean up ckpt conversion script Signed-off-by: jasonwan <[email protected]> * rollback git merge errors Signed-off-by: jasonwan <[email protected]> * update mcore, add check for mcore+te Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * formatting Signed-off-by: jasonwan <[email protected]> * make sft test dataset optional. fix indentation in config Signed-off-by: jasonwan <[email protected]> * one more fix for optional test set Signed-off-by: jasonwan <[email protected]> * support merging lora weights in mcore Signed-off-by: jasonwan <[email protected]> * update mcore for cpu init Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update ckpt conversion for code llama Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add seq_len_interpolation_factor support for long-context llama ckpts (#7312) * add inference param. 
update TP/PP script to support mcore gpt * p-tuning Signed-off-by: jasonwan <[email protected]> * add seq_len_interpolation_factor Signed-off-by: Hongbin Liu <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> * fix old ptuning model, update mcore to support seq_len_interpolation_factor Signed-off-by: jasonwan <[email protected]> * support fused layernorm linear, fix ptuning O2 Signed-off-by: jasonwan <[email protected]> * drop loss mask for mcore for now Signed-off-by: jasonwan <[email protected]> * disable dist ckpt in peft Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix loading non dist ckpt Signed-off-by: jasonwan <[email protected]> * add ckpt conversion to CI Signed-off-by: jasonwan <[email protected]> * update CI Signed-off-by: jasonwan <[email protected]> * mcore_mixin docstring Signed-off-by: jasonwan <[email protected]> * minor change in mcore peft error message Signed-off-by: jasonwan <[email protected]> * fix amp o2 in lora weight tying Signed-off-by: jasonwan <[email protected]> * correct mcore fp8 config Signed-off-by: jasonwan <[email protected]> * add TE installation Signed-off-by: jasonwan <[email protected]> * support mcore adapter tuning Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * comment out new CI test. rollback docker image Signed-off-by: jasonwan <[email protected]> * ignore FA tests, try new CI on 23.08 Signed-off-by: jasonwan <[email protected]> * mark new CI as L2, put to beginning to test Signed-off-by: jasonwan <[email protected]> * minor fix for prompt learning Signed-off-by: jasonwan <[email protected]> * rollback to 23.06. 
comment out CI Signed-off-by: jasonwan <[email protected]> * minor fix ckpt conversion script Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor rollback gpt model change Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: ericharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Hongbin Liu <[email protected]> Signed-off-by: Jason Wang <[email protected]> Co-authored-by: ericharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: eharper <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Co-authored-by: Kelvin Liu <[email protected]> * Hiddens modules documentation (#7303) * 1. Changed hiddens transformations module from `transformations` to `hiddens`. Signed-off-by: Micha Livne <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * 1. Finished doc. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. Signed-off-by: Micha Livne <[email protected]> * 1. Debugging. 
Signed-off-by: Micha Livne <[email protected]> --------- Signed-off-by: Micha Livne <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * Support for flash attention 2.0 (#7063) * Add flash attn 2 Signed-off-by: MaximumEntropy <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add FA2 feature Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove debugging Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: MaximumEntropy <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Oleksii Kuchaiev <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * lora merge fix for O2 names (#7325) * wip Signed-off-by: arendu <[email protected]> * adjust key names based on O2 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <[email protected]> * minor Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * 
Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Load buffers in checkpoint (#7357) Signed-off-by: Jason Wang <[email protected]> * Add migration guide for lightning 2.0 upgrade (#7360) * Add lightning 2.0 migration guide in NeMo docs Signed-off-by: Abhishree <[email protected]> * Add remaining guide for lightning 2.0 upgrade Signed-off-by: Abhishree <[email protected]> * Remove line spill over and continue in next line Signed-off-by: Abhishree <[email protected]> * Add missing dataloader_iter in the guide Signed-off-by: Abhishree <[email protected]> * Fix minor typo Signed-off-by: Abhishree <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> * adding bias_dropout_add_fusion option for BERT (#7332) 
Signed-off-by: Alexander Jipa <[email protected]> Co-authored-by: Alexander Jipa <[email protected]> * [TTS] Change audio codec token type to TokenIndex (#7356) Signed-off-by: Ryan <[email protected]> * enable selective unfreeze (#7326) * wip Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * wip Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * avoid PTL method conflicts Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix typos (#7361) * fix typos Signed-off-by: omahs <[email protected]> * fix typo Signed-off-by: omahs <[email protected]> * fix typos Signed-off-by: omahs <[email protected]> * fix typos Signed-off-by: omahs <[email protected]> * fix typo Signed-off-by: omahs <[email protected]> * fix typos Signed-off-by: omahs <[email protected]> * fix typo Signed-off-by: omahs <[email protected]> * fix typo Signed-off-by: omahs <[email protected]> * fix typo Signed-off-by: omahs <[email protected]> --------- Signed-off-by: omahs <[email protected]> * pin numba=0.57.1 to fix reinstall.sh error (#7366) Signed-off-by: Xuesong Yang <[email protected]> * Update new conversion script for converting safetensors. 
* Upgrade pytorch container to 23.08 (#7353) * upgrade pytorch container Signed-off-by: eharper <[email protected]> * use mcore Signed-off-by: eharper <[email protected]> * revert test change Signed-off-by: eharper <[email protected]> * pleasefixme Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for ampere Signed-off-by: eharper <[email protected]> * comment test temporarily Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * enable fp32 optimizer for output_layer in mcore (#7355) Signed-off-by: lhb8125 <[email protected]> * revert comment (#7368) Signed-off-by: eharper <[email protected]> * Update to core 23.08 branch ToT (#7371) Signed-off-by: Abhinav Khattar <[email protected]> * upper bounding ptl (#7370) Signed-off-by: eharper <[email protected]> * fix pipeline parallel inference (#7367) * fix pp inference Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix for peft tied weights (#7372) Signed-off-by: arendu <[email protected]> * fixed trainer.strategy=auto from None. 
(#7369) Signed-off-by: Xuesong Yang <[email protected]> * add O2 option in gpt eval (#7358) * add O2 option in eval Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add doc for O2 config Signed-off-by: jasonwan <[email protected]> * add to llama inference config Signed-off-by: jasonwan <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. 
Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * minor fix 
for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci --------- Signed-off-by: Abhinav Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> * layer selection for ia3 (#7417) * layer 
selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> * Updating FlashAttention API to match FlashAttentionV2 * Multiple fixes for mm * Fix CI inductor issue and update to torch compile * Remove suppress error * Fix when conversion config uses fp16 and it complains about precision plugin * Fixing FAv2 API usage * Initial release of content filtering model * Added synthetic dataloader for precached and online mode * Mingyuanm/dreambooth opt * Add llama2 support in neva training * Fix sampler length * Fix all precision issues in nemo multimodal * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> 
* Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision 

* Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> * Updating FlashAttention API to match FlashAttentionV2 * Multiple fixes for mm * Fix CI inductor issue and update to torch compile * Remove suppress error * Fix when conversion config uses fp16 and it complains about precision plugin * Fixing FAv2 API usage * Initial release of content filtering model * Added synthetic dataloader for precached and online mode * Mingyuanm/dreambooth opt * Add llama2 support in neva training * Fix sampler length * Fix all precision issues in nemo multimodal * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> 
* Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision 
Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix bug when loading 
dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * bugfix: trainer.accelerator=auto from None. (#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Jocelyn <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> * remove auto 
generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix 
according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Fix typos (#7581) * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: 
Cheng-Ping Hsieh <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: 
Dmytro Pykhtar <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> 
* bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> * [Stable Diffusion/ControlNet] Enable O2 training for SD and Fix ControlNet CI failure * Mingyuanm/dreambooth fix * Fix NeMo CI Infer Issue * DreamFusion 
* Move neva export changes * Add Imagen Synthetic Dataloader * Add VITWrapper and export stuff to wrapper * Update neva with megatron-core support * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with 
func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signe…

…IA#7970) * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> * Updating FlashAttention API to match FlashAttentionV2 * Multiple fixes for mm * Fix CI inductor issue and update to torch compile * Remove suppress error * Fix when conversion config uses fp16 and it complains about precision plugin * Fixing FAv2 API usage * Initial release of content filtering model * Added synthetic dataloader for precached and online mode * Mingyuanm/dreambooth opt * Add llama2 support in neva training * Fix sampler length * Fix all precision issues in nemo multimodal * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and 
self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> 
Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> 
Co-authored-by: Igor Gitman <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Jocelyn <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix (#7511) Signed-off-by: 
Abhinav Khattar <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. 
It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add 
Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Fix typos (#7581) * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: 
Cheng-Ping Hsieh <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: 
Dmytro Pykhtar <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> 
* bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change accelerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> * [Stable Diffusion/ControlNet] Enable O2 training for SD and Fix ControlNet CI failure * Mingyuanm/dreambooth fix * Fix NeMo CI Infer Issue * DreamFusion 
* Move neva export changes * Add Imagen Synthetic Dataloader * Add VITWrapper and export stuff to wrapper * Update neva with megatron-core support * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with 
func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * unused import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-of…

* [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenerated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. 
It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add 
Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Fix typos (#7581) * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: 
Cheng-Ping Hsieh <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: 
Dmytro Pykhtar <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> 
* bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> * [Stable Diffusion/ControlNet] Enable O2 training for SD and Fix ControlNet CI failure * Mingyuanm/dreambooth fix * Fix NeMo CI Infer Issue * DreamFusion 
* Move neva export changes * Add Imagen Synthetic Dataloader * Add VITWrapper and export stuff to wrapper * Update neva with megatron-core support * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with 
func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references 
Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * 
Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <[email 
protected]> Co-authored-by: Jocelyn <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update per.py - if __name__ == "__main__" removed (now metric can be imported); - removed excessive classes (like "Sample" and 
"Statistics"); - transition from pandas df to dict of dicts; - removed unnecessary "return"; - notation fixing; - reduced calculation time Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Create punctuation_rates.py Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * Format fixing Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * added nemo.logging, header, docstrings, how to use Signed-off-by: Sasha Meister <[email protected]> * Added asserions to rate_punctuation.py Signed-off-by: Sasha Meister <[email protected]> * fix typo Signed-off-by: Sasha Meister <[email protected]> * added function for import and call, docstrings Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot 
accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithi…

) * Hotfix (#7501) (#7568) Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: jbaczek <[email protected]> * Avoid duplicated checkpoint save (#7555) (#7566) Signed-off-by: Mikołaj Błaż <[email protected]> Co-authored-by: mikolajblaz <[email protected]> * Cache FP8 weight and transpose only at the first micro-batch in each validation and test routine (#7470) (#7483) * Cache weight and transpose only in the first batch in all training, val, and test runs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add an option to disable manual GC in validation (#7467) (#7476) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> * Remove PUBLICATIONS.md, point to github.io NeMo page instead (#7694) (#7695) * update publications section to point to blog website page * add hyphen * use double backquotes for code formatting --------- Signed-off-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Fix multi rank finetune for ASR (#7684) (#7699) * Fix multi rank finetune for ASR * Actually add time * Actually add time --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * Update docs: readme, getting started, ASR intro (#7679) * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * move install info to INSTALLATION.md Signed-off-by: Elena Rastorgueva <[email 
protected]> * tidy up links Signed-off-by: Elena Rastorgueva <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix bug when loading dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: trainer.accelerator=auto from None. 
(#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn 
Huang <[email protected]> Co-authored-by: Jocelyn <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * remove auto generated examples (#7510) * explicitly remove autogenerated examples for data 
parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenrated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * 
Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 
issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update 
supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email 
protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible to old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix erorr and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval 
config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: 
Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> 
Co-authored-by: Elena Rastorgueva <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix typos (#7581) Signed-off-by: Elena Rastorgueva <[email protected]> * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> Signed-off-by: Elena Rastorgueva <[email 
protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> 
Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Elena Rastorgueva <[email protected]> * bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * fix ssl models ptl monitor val through 
logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change acclerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix 
Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> Signed-off-by: Elena Rastorgueva <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Elena Rastorgueva <[email protected]> * add outline of asr quickstart info to asr/intro.rst Signed-off-by: Elena Rastorgueva <[email protected]> * add CLI, LM and real-time transcription sections Signed-off-by: Elena Rastorgueva <[email protected]> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with func call Signed-off-by: Maanu 
Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * ununsed import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkensfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * […

* Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> * Fix refiner issue on FID * Fix refiner seeding issue * Updating FlashAttention API to match FlashAttentionV2 * Multiple fixes for mm * Fix CI inductor issue and update to torch compile * Remove suppress error * Fix when conversion config uses fp16 and it complains about precision plugin * Add training config with no cropping and extra conditioning. * Fixing FAv2 API usage * Initial release of content filtering model * Added synthetic dataloader for precached and online mode * Mingyuanm/dreambooth opt * Add llama2 support in neva training * Fix sampler length * Fix all precision issues in nemo multimodal * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as 
well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> 
* Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Automatic Lip Reading Recognition (ALR) - ASR/CV (Visual ASR) (#7330) * striding_conv1d_k5 and dw_striding_conv1d_k5 subsampling Signed-off-by: mburchi <[email protected]> * transpose conv1d inputs Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: mburchi <[email protected]> * Update subsampling.py change striding_conv1d_k5 to striding_conv1d Signed-off-by: Maxime Burchi <[email protected]> * cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * video manifest Signed-off-by: mburchi <[email protected]> * add collection classes Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test_step_outputs Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * correct manifest bug when having only audio or only videos Signed-off-by: mburchi <[email protected]> * clean references Signed-off-by: mburchi <[email protected]> * freeze unfreeze transcribe cv models Signed-off-by: mburchi <[email protected]> * correct manifest get_full_path bug Signed-off-by: mburchi <[email protected]> * update for PR Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * guard torchvision 
Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * _video_speech_collate_fn in cv/data/video_to_text.py Signed-off-by: mburchi <[email protected]> * add self.out = None to asr subsampling Signed-off-by: mburchi <[email protected]> * Update nemo/collections/cv/data/video_to_text_dataset.py Co-authored-by: Igor Gitman <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> * cv -> multimodal/speech_cv branch Signed-off-by: mburchi <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: mburchi <[email protected]> Signed-off-by: Maxime Burchi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Igor Gitman <[email protected]> * HF StarCoder to NeMo conversion script (#7421) * Script to convert HF StarCoder checkpoint to NeMo Signed-off-by: Jan Lasek <[email protected]> * StarCoder conversion test Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Lasek <[email protected]> * Fix test Signed-off-by: Jan Lasek <[email protected]> * Catch up with save_to changes Signed-off-by: Jan Lasek <[email protected]> * Don't abbreviate args for clarity Signed-off-by: Jan Lasek <[email protected]> * Configurable precision: BF16 vs FP32 Signed-off-by: Jan Lasek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jan Lasek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix bug when loading 
dist ckpt in peft (#7452) Signed-off-by: Hongbin Liu <[email protected]> Co-authored-by: Hongbin Liu <[email protected]> * Fix adding positional embeddings in-place in transformer module (#7440) Signed-off-by: Tamerlan Tabolov <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Fix (#7478) Signed-off-by: Cheng-Ping Hsieh <[email protected]> * add sleep (#7498) (#7499) * add sleep * add sleep onto config instead * add comment --------- Signed-off-by: Gerald Shen <[email protected]> Co-authored-by: Gerald Shen <[email protected]> * Fix exp manager check for sleep (#7503) (#7504) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * bugfix: trainer.accelerator=auto from None. (#7492) (#7493) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * [doc] fix broken link (#7481) Signed-off-by: Stas Bekman <[email protected]> * [TTS] Read audio as int32 to avoid flac read errors (#7477) * [TTS] Read audio as int32 to avoid flac read errors Signed-off-by: Ryan <[email protected]> * [TTS] Add comment about read failures Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS (#7409) * Add dataset 'AISHELL-3' from OpenSLR for training mandarin TTS * Train 'AISHELL-3' dataset with multi-speakers Signed-off-by: Robin Dong <[email protected]> * Update get_data.py update copyright header Signed-off-by: Xuesong Yang <[email protected]> * Update get_data.py added a disclaimer Signed-off-by: Xuesong Yang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add new configuration file for AISHELL3 with multispeaker of fastpitch Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * dllogger - log on rank 0 only (#7513) Signed-off-by: Stas Bekman <[email protected]> * Fix TTS FastPitch tutorial (#7494) (#7516) * Fix --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * Fix get_dist() tensor dimension (#7506) (#7515) Signed-off-by: Jocelyn Huang <[email protected]> Co-authored-by: Jocelyn <[email protected]> * bugfix: specify trainer.strategy=auto when devices=1 (#7509) (#7512) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix (#7511) Signed-off-by: Abhinav Khattar <[email protected]> * [TTS] Fix FastPitch data prep tutorial (#7524) Signed-off-by: Ryan <[email protected]> * add italian tokenization (#7486) * add italian tokenization Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more ipa lexicon it Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix error deletion Signed-off-by: GiacomoLeoneMaria <[email protected]> * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Replace None strategy with auto in tutorial notebooks (#7521) (#7527) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * unpin setuptools (#7534) (#7535) Signed-off-by: fayejf <[email protected]> Co-authored-by: fayejf <[email protected]> * remove auto 
generated examples (#7510) * explicitly remove autogenerated examples for data parallel evaluation Signed-off-by: arendu <[email protected]> * mark autogenerated and remove it for test Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Add the `strategy` argument to `MegatronGPTModel.generate()` (#7264) It is passed as an explicit argument rather than through `**strategy_args` so as to ensure someone cannot accidentally pass other arguments that would end up being ignored. It is a keyword-only argument to ensure that if in the future we want to update the signature to `**strategy_args`, we can do it without breaking code. Signed-off-by: Olivier Delalleau <[email protected]> * Fix PTL2.0 related ASR bugs in r1.21.0: Val metrics logging, None dataloader issue (#7531) (#7533) * fix none dataloader issue ptl2 * ptl2.0 logging fixes for rnnt_models --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Nithin Rao <[email protected]> * gpus -> devices (#7542) (#7545) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Update FFMPEG version to fix issue with torchaudio (#7551) (#7553) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * PEFT GPT & T5 Refactor (#7308) * initial implementation of add_adapters API * correct type hint * Add config in add_adapters for save and load (@author bobchen) * Remove AdapterConfig to avoid import error * Add AdaterConfig back and move adaptermixin to sft model * Add NLPSaveRestoreConnector as default in NLPModel.restore_from * Add restore_from_nemo_with_adapter and test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * rename t5 file and classes to be consistent with GPT * add t5 sft dataset * add support for single-file format with T5SFTDataset * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Various small changes to make T5 SFT work like GPT SFT * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add adapter evaluation test script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add MultiAdaterConfig for ia3 and fix builder issue * Make ptuning for T5SFTModel work using mixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add IA3_Adapter for AdapterName * Add adapter name for ptuning and attention adapter * Make test script GPT/T5 agnostic * Add layer selection feature * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Integrate adapter name and config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt peft tuning script to new API * add t5 peft tuning script with new API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix IA3 layer selection issue * Override state_dict on SFT model instead of mixin * Add load adapter by adapter config * move peft config map away from example script * auto get config from nemo adapter * Move PEFTConfig to new file * fix ckpt save/load for t5 * name change: add_adapters -> add_adapter * variable name change * update t5 script * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix t5 issues * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add weight tying * update gpt tuning script * PEFT-API proposal * Fix 
according to comments * update tuning scripts * move merge_cfg_with to mixin class since it applies to both gpt and t5 and requires the model class for restore * Add mcore_gpt support for NLPAdapterMixin * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * variable name change to distinguish "peft" and "adapter" * override `load_adapters` to support `add_adapter` name change * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tuning and eval script for adapter save/load * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Ptuning on first stage only * add lora tutorial for review * Fix layer selection for mcore * add landing page * fix resume training Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add mcore condition in sharded_state_dict to make sft work * Update lora_tutorial.md First edit of this file for PEFT documentation for NeMO Signed-off-by: hkelly33 <[email protected]> * rename Adapter to AttentionAdapter to avoid confusion in doc * Change load_adapters to load .nemo * add quick start guide * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add load_adapters with .ckpt * Remove setup_complete changes in load_adapters * update landing page * remove typo * Updated quick_start.md per Chen Cui Signed-off-by: hkelly33 <[email protected]> * Add inference config merger and tutorial * Add doc string for NLPAdapterModelMixin and deprecated warning on MegatronGPTPEFTModel * add supported_methods.md and update other documentations * Update supported_methods.md minor updates. Signed-off-by: Adi Renduchintala <[email protected]> * Update landing_page.md minor update. 
Signed-off-by: Adi Renduchintala <[email protected]> * Modify doc string for NLPAdapterModelMixin * Add doc string add_adapters in NLPAdapterModelMixin * rename canonical adapters * remove mcore hard dependency * [PATCH] move microbatch calculator to nemo from apex * remove apex dependency in gpt and t5 sft models * remove apex dependency in gpt model * render doc strings * fix * Add missing virtual_tokens on ptuning * fix docstrings * update gpt-style model coverage in docs * update docstring * Remove pdb * add lightning_fabric to make docstring rendering work * Add Ptuning missing key * try docstring rendering * Fix ptuning issue * update gpt t5 peft tuning and eval scripts * typos * update eval config * fix bug relating to apex dependency removal * typo * make predict step behave the same as test step * make lora tutorial work in notebook * cosmetics * update yaml scripts * mcore_gpt attribute optional * typo * update eval scripts and fix T5 eval bugs * add NLPDDPStrategyNotebook and trainer builder logic to use it * update lora notebook to use new trainer builder * fix microbatch calculator bug for inference after training * Convert markdown files to RST and incorporate with doc * typo * revise language * remove extra cell * remove unnecessary inheritance * remove old tests * move layer selection default so logging messages make sense * remove `save_adapters` as adapter weights are saved automatically during training * initialize weights from a checkpoint instead of randomly * multiple fields can form a context (#7147) * list of context fields and flexible prompt template Signed-off-by: arendu <[email protected]> * list of fields for context Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add multiple truncation fields and middle truncation 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Compatible with old ckpt Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix tokenize detokenize issue Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove detokenization, add truncation augmentation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve comments Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove unused import Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert eos Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Add tokenizer space_sensitive attribute Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix error Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Fix error and use re Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Change assert logic Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Follow adi suggestion Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove merge function Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto 
fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add example and comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove context_key and add comment Signed-off-by: Cheng-Ping Hsieh <[email protected]> * Remove random truncation Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix template none Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * revert config changes * remove accidental breakpoint * support TP>1 loading * infer adapter type from checkpoint in during eval * breakup add adapter * enable interpolation of train_ds and validation_ds * update metric calc script to conform to single-file eval format * remove extraneous print * update lora notebook for updated merge_inference_cfg * Update nlp_adapter_mixins.py variable name change Signed-off-by: Chen Cui <[email protected]> * turn off grad scaler for PP to match old scripts * remove PEFTSaveRestoreConnector since functionality all covered by the new mixin class * remove resume_from_checkpoint check since covered in #7335 * revert changes made in eval config interpolation * more interpolation * typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * remove dup line Signed-off-by: Chen Cui <[email protected]> * code style warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix config mistake Signed-off-by: Chen Cui <[email protected]> * add copyright header Signed-off-by: Chen Cui <[email protected]> * fix code check warnings Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert changes to remove apex dependency (mixed apex+nemo microbatch calculator broke some CI tests) Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * update deprecation notices Signed-off-by: Chen Cui <[email protected]> * consolidate peft and sft scripts Signed-off-by: Chen Cui <[email protected]> * update CI tests Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * notebook branch points to main to prepare for merge Signed-off-by: Chen Cui <[email protected]> * fix gpt and t5 validation with any metric other than loss Signed-off-by: Chen Cui <[email protected]> * support pre-extracted checkpoints Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: jasonwan <[email protected]> Signed-off-by: hkelly33 <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Cheng-Ping Hsieh <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marc Romeyn <[email protected]> Co-authored-by: jasonwan <[email protected]> Co-authored-by: hkelly33 <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Yuanzhe Dong <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: Cheng-Ping Hsieh <[email protected]> * fix a typo (#7496) Signed-off-by: BestJuly <[email protected]> * [TTS] remove curly braces from ${BRANCH} in jupyer notebook cell. (#7554) (#7560) * remove curly braces. * remove installation of pynini. --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * add youtube embed url (#7570) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Remap speakers to continuous range of speaker_id for dataset AISHELL3 (#7536) * Remap speakers to continuous range of speaker_id for dataset AISHELL3 * Add new key/value pair to record raw speaker for AISHELL3 dataset Signed-off-by: Robin Dong <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix validation_step_outputs initialization for multi-dataloader (#7546) (#7572) * added correct validation_step_outputs initialization for mutli-dataloader * changed kernel for display * Update logic for validation and test step outputs * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert multidataloader changes in multilang ASR notebook --------- Signed-off-by: KunalDhawan <[email protected]> Signed-off-by: smajumdar <[email protected]> Co-authored-by: Kunal Dhawan <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Append output of val step to self.validation_step_outputs (#7530) (#7532) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * [TTS] fixed trainer's accelerator and strategy. (#7569) (#7574) Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Append val/test output to instance variable in EncDecSpeakerLabelModel (#7562) (#7573) * Append val/test output to the instance variable in EncDecSpeakerLabelModel * Handle test case in evaluation_step * Replace type with isinstance --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix CustomProgressBar for resume (#7427) (#7522) * Fix CustomProgress Bar for resume and multiple epochs * Edit num_training_batches * Use max_steps as total for progress bar for resume * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix typos in nfa and speech enhancement tutorials (#7580) (#7583) Signed-off-by: Elena Rastorgueva <[email protected]> Co-authored-by: Elena Rastorgueva <[email protected]> * Add strategy as ddp_find_unused_parameters_true for glue_benchmark.py (#7454) (#7461) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * update strategy (#7577) (#7578) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * Fix typos (#7581) * Change hifigan finetune strategy to ddp_find_unused_parameters_true (#7579) (#7584) * Change strategy to auto --------- Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: 
Cheng-Ping Hsieh <[email protected]> * [BugFix] Add missing quotes for auto strategy in tutorial notebooks (#7541) (#7548) * Add missing quotes for auto strategy * Revert trainer.gpus to trainer.devices in Self_Supervised_Pre_Training.ipynb --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * add build os key (#7596) (#7599) * add build os key * add tools * update to stable version --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * StarCoder SFT test + bump PyT NGC image to 23.09 (#7540) * Add SFT StarCoder test Signed-off-by: Jan Lasek <[email protected]> * Remove _modify_config call as it is covered in load_from_nemo just below Signed-off-by: Jan Lasek <[email protected]> * Test with pyt:23.09 container Signed-off-by: Jan Lasek <[email protected]> --------- Signed-off-by: Jan Lasek <[email protected]> * defaults changed (#7600) * defaults changed Signed-off-by: arendu <[email protected]> * typo Signed-off-by: arendu <[email protected]> * update Signed-off-by: arendu <[email protected]> --------- Signed-off-by: arendu <[email protected]> * add ItalianPhonemesTokenizer (#7587) * add ItalianPhonemesTokenizer Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Italian phonemes Signed-off-by: GiacomoLeoneMaria <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add test Signed-off-by: GiacomoLeoneMaria <[email protected]> --------- Signed-off-by: GiacomoLeoneMaria <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Xuesong Yang <[email protected]> * best ckpt fix (#7564) (#7588) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: 
Dmytro Pykhtar <[email protected]> * Add files via upload (#7598) specifies the branch Signed-off-by: George <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix validation in G2PModel and ThutmoseTaggerModel (#7597) (#7606) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Bound transformers version in requirements (#7620) Signed-off-by: Abhishree <[email protected]> * fix llama2 70b lora tuning bug (#7622) * fix llama2 70b lora tuning bug Signed-off-by: Chen Cui <[email protected]> * Update peft_config.py brackets Signed-off-by: Adi Renduchintala <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * Fix import error no module name model_utils (#7629) Signed-off-by: Mehadi Hasan Menon <[email protected]> * add fc large ls models (#7641) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> 
* bugfix: trainer.gpus, trainer.strategy, trainer.accelerator (#7621) (#7642) * [TTS] bugfix for Tacotron2 tutorial due to PTL 2.0 * trainer.gpus -> trainer.devices * fixed related tutorial bugs --------- Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * fix ssl models ptl monitor val through logging (#7608) (#7614) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> * Fix metrics for SE tutorial (#7604) (#7612) Signed-off-by: Ante Jukić <[email protected]> Co-authored-by: anteju <[email protected]> * Add ddp_find_unused_parameters=True and change accelerator to auto (#7623) (#7644) * Add ddp_find_unused_parameters=True and change accelerator to auto * Add ddp_find_unused_parameters True for normalization_as_tagging_train.py --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> * Fix py3.11 dataclasses issue (#7616) * Fix py3.11 dataclasses issue (#7582) * Update ASR configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Update TTS configs to support Python 3.11 Signed-off-by: smajumdar <[email protected]> * Guard MeCab and Ipadic Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix remaining ASR dataclasses Signed-off-by: smajumdar <[email protected]> * Fix scripts Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update name to ConfidenceMethodConfig Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain (#7576) (#7586) * Broadcast loss only when using pipeline parallelism and within the pipeline parallel domain * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Safeguard nemo_text_processing installation on ARM (#7485) * safeguard nemo_text_processing installing Signed-off-by: Jason <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update check Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Jason <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix changes to confidence measure Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Jason <[email protected]> * [Stable Diffusion/ControlNet] Enable O2 training for SD and Fix ControlNet CI failure * Mingyuanm/dreambooth fix * Fix NeMo CI Infer Issue * DreamFusion 
* Move neva export changes * Add Imagen Synthetic Dataloader * Add VITWrapper and export stuff to wrapper * Update neva with megatron-core support * Fix issues with Dockerfile (#7650) (#7652) Signed-off-by: smajumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> * [ASR] RNN-T greedy decoding max_frames fix for alignment and confidence (#7635) * decoding and test fix Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [ASR] Fix type error in jasper (#7636) (#7653) Signed-off-by: Ryan <[email protected]> Co-authored-by: Ryan Langman <[email protected]> * [TTS] Add STFT and SI-SDR loss to audio codec recipe (#7468) * [TTS] Add STFT and SI-SDR loss to audio codec recipe Signed-off-by: Ryan <[email protected]> * [TTS] Fix STFT resolution Signed-off-by: Ryan <[email protected]> * [TTS] Fix training metric logging Signed-off-by: Ryan <[email protected]> * [TTS] Add docstring to mel and stft losses Signed-off-by: Ryan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Ryan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Create per.py (#7538) * Move model precision copy (#7336) * move cfg precision set to megatron base model Signed-off-by: Maanu Grover <[email protected]> * remove copy from other models Signed-off-by: Maanu Grover <[email protected]> * modify attribute not arg Signed-off-by: Maanu Grover <[email protected]> * fix gpt model test for ptl 2.0 Signed-off-by: Maanu Grover <[email protected]> * rename function and add docstring Signed-off-by: Maanu Grover <[email protected]> * replace precision to dtype conditionals with 
func call Signed-off-by: Maanu Grover <[email protected]> * unnecessary function and cfg reset Signed-off-by: Maanu Grover <[email protected]> * set default value Signed-off-by: Maanu Grover <[email protected]> * fix precision lookup in a few more places Signed-off-by: Maanu Grover <[email protected]> * rename mapping function Signed-off-by: Maanu Grover <[email protected]> * unused import Signed-off-by: Maanu Grover <[email protected]> * save torch datatype to model Signed-off-by: Maanu Grover <[email protected]> * set weights precision wrt amp o2 Signed-off-by: Maanu Grover <[email protected]> * Revert "set weights precision wrt amp o2" This reverts commit 313a4bfe5eb69d771a6d2433898c0685836aef5c. Signed-off-by: Maanu Grover <[email protected]> * revert half precision at inference attempt Signed-off-by: Maanu Grover <[email protected]> * move autocast dtype to base model Signed-off-by: Maanu Grover <[email protected]> * move params dtype to base model, enable fp16 O2 inf Signed-off-by: Maanu Grover <[email protected]> * unused imports Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix PEFT checkpoint loading (#7388) * Fix PEFT checkpoint loading Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Use distributed optimizer support for multiple dtypes (#7359) * Update distopt wrapper with multiple dtype support Remove manual handling of separate FP32 optimizer. 
Signed-off-by: Tim Moon <[email protected]> * Use distopt support for contiguous buffers with multiple dtypes Signed-off-by: Tim Moon <[email protected]> * Fix typo Signed-off-by: Tim Moon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Separate distopt buckets for first GPT layer and non-overlapped params Signed-off-by: Tim Moon <[email protected]> * Add distopt logic for int dtypes Signed-off-by: Tim Moon <[email protected]> * Update Apex commit Signed-off-by: Tim Moon <[email protected]> * Remove unused variables Signed-off-by: Tim Moon <[email protected]> * Update Apex commit in README and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> * Debug Dockerfile and Jenkinsfile Signed-off-by: Tim Moon <[email protected]> --------- Signed-off-by: Tim Moon <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * minor fix for llama ckpt conversion script (#7387) * minor fix for llama ckpt conversion script Signed-off-by: Jason Wang <[email protected]> * Update Jenkinsfile Signed-off-by: Jason Wang <[email protected]> * remove fast_swiglu configuration Signed-off-by: Jason Wang <[email protected]> --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix wrong calling of librosa.get_duration() in notebook (#7376) Signed-off-by: Robin Dong <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [PATCH] PEFT import mcore (#7393) * [PATCH] PEFT import mcore Signed-off-by: Jason Wang <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jason Wang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Create per.py Script for calculation Punctuation Error Rate and related rates (correct rate, deletions rate, etc.) Signed-off-by: Sasha Meister <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Sasha Meister <[email protected]> * [TTS] Added a callback for logging initial data (#7384) Signed-off-by: Ante Jukić <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update Core Commit (#7402) * Update Core Commit Signed-off-by: Abhinav Khattar <[email protected]> * update commit Signed-off-by: Abhinav Khattar <[email protected]> --------- Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Use cfg attribute in bert (#7394) * use cfg attribute instead of arg Signed-off-by: Maanu Grover <[email protected]> * use torch_dtype in place of cfg.precision Signed-off-by: Maanu Grover <[email protected]> * move precision copy before super constructor Signed-off-by: Maanu Grover <[email protected]> * use trainer arg Signed-off-by: Maanu Grover <[email protected]> --------- Signed-off-by: Maanu Grover <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add support for bias conversion in Swiglu models (#7386) * Add support for bias conversion in Swiglu models Signed-off-by: smajumdar <[email protected]> * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support for auto extracting tokenizer model Signed-off-by: smajumdar <[email protected]> * Fix issue with missing tokenizer Signed-off-by: smajumdar <[email protected]> * Refactor Signed-off-by: smajumdar <[email protected]> * Refactor 
Signed-off-by: smajumdar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: smajumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Update save_to and restore_from for dist checkpointing (#7343) * add dist ckpt to save to, in progress Signed-off-by: eharper <[email protected]> * move dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update restore from, need to figure out how to initialize distributed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * launch distrib if needed when restoring dist ckpt Signed-off-by: eharper <[email protected]> * when using mcore we can change tp pp on the fly Signed-off-by: eharper <[email protected]> * add load_from_checkpoint support for dist ckpt Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update llama convert script to save dist .nemo Signed-off-by: eharper <[email protected]> * fix load dist ckpt Signed-off-by: jasonwan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup TE TP groups if needed Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * setup te tp groups if needed Signed-off-by: eharper <[email protected]> * remove import Signed-off-by: eharper <[email protected]> --------- 
Signed-off-by: eharper <[email protected]> Signed-off-by: jasonwan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: jasonwan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * fix forward for with mcore=false (#7403) Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix logging to remove 's/it' from progress bar in Megatron models and add train_step_timing (#7374) * Add CustomProgressBar class to exp_manager and trainer callbacks Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the progress bar to reflect total microbatch cnt Signed-off-by: Abhishree <[email protected]> * Modify CustomProgressBar class 1) Modify CustomProgressBar class to update progress bar per global_step instead of per microbatch 2) Add the callback to other megatron training/finetuning files that are not using MegatronTrainerBuilder Signed-off-by: Abhishree <[email protected]> * Add CustomProgressBar callback to tuning files Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Set Activation Checkpointing Defaults (#7404) * Set Activation Checkpointing Defaults Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * check for None Signed-off-by: Abhinav Khattar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhinav 
Khattar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * make loss mask default to false (#7407) Signed-off-by: eharper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add dummy userbuffer config files (#7408) Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * add missing ubconf files (#7412) Signed-off-by: Abhinav Khattar <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * New tutorial on Speech Data Explorer (#7405) * Added Google Colab based tutorial on Speech Data Explorer Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update ptl training ckpt conversion script to work with dist ckpt (#7416) * update ptl convert script Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * don't break legacy Signed-off-by: eharper <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: eharper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Allow disabling sanity checking when num_sanity_val_steps=0 (#7413) * Allow disabling sanity checking when num_sanity_val_steps=0 Signed-off-by: Abhishree <[email protected]> * Update num_sanity_val_steps to be a multiple of num_microbatches Signed-off-by: Abhishree Thittenamane <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add comprehensive error messages (#7261) Signed-off-by: Anton Peganov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * check NEMO_PATH (#7418) Signed-off-by: Nikolay Karpov <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * layer selection for ia3 (#7417) * layer selection for ia3 Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: arendu <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Fix missing pip package 'einops' (#7397) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of pyaudio in Google Colab (#7396) Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Update README.md: output_path --> output_manifest_filepath (#7442) Signed-off-by: Samuele Cornell <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Add rope dynamic linear scaling (#7437) * Add dynamic linear scaling Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bug Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> --------- 
Signed-off-by: Cheng-Ping Hsieh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Yang Zhang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix None dataloader issue in PTL2.0 (#7455) * Fix None dataloader issue in PTL2.0 Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * updating values of self._validation_dl and self._test_dl as well Signed-off-by: KunalDhawan <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: KunalDhawan <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [ASR] Confidence measure -> method renames (#7434) * measure -> method Signed-off-by: Aleksandr Laptev <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aleksandr Laptev <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * Add steps for document of getting dataset 'SF Bilingual Speech' (#7378) * Add steps for document of getting dataset 'SF Bilingual Speech' Signed-off-by: Robin Dong <[email protected]> * Update datasets.rst added a link from a tutorial demonstrating detailed data prep steps. 
Signed-off-by: Xuesong Yang <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * RNN-T confidence and alignment bugfix (#7381) * new frame_confidence and alignments lists are now always created after the while loop Signed-off-by: Aleksandr Laptev <[email protected]> * tests added Signed-off-by: Aleksandr Laptev <[email protected]> --------- Signed-off-by: Aleksandr Laptev <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix resume from checkpoint in exp_manager (#7424) (#7426) Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix checking of cuda/cpu device for inputs of Decoder (#7444) * Fix checking of cuda/cpu device for inputs of Decoder Signed-off-by: Robin Dong <[email protected]> * Update tacotron2.py Signed-off-by: Jason <[email protected]> --------- Signed-off-by: Robin Dong <[email protected]> Signed-off-by: Jason <[email protected]> Co-authored-by: Jason <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix failure of ljspeech's get_data.py (#7430) * Fix failure of ljspeech's get_data.py Signed-off-by: Robin Dong <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Robin Dong <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Sasha Meister <[email protected]> * [TTS] Fix audio codec type checks (#7373) * [TTS] Fix audio codec type checks Signed-off-by: Ryan <[email protected]> * [TTS] Fix audio codec tests Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister 
<[email protected]> * [TTS] Add dataset to path of logged artifacts (#7462) * [TTS] Add dataset to path of logged artifacts Signed-off-by: Ryan <[email protected]> * [TTS] Revert axis name back to Audio Frames Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> Signed-off-by: Sasha Meister <[email protected]> * Fix sft dataset truncation (#7464) * Add fix Signed-off-by: Cheng-Ping Hsieh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci …

rlangman requested review from KunalDhawan, nithinraok and anteju September 5, 2023 23:11

github-actions bot added the TTS label Sep 5, 2023

nithinraok previously approved these changes Sep 6, 2023

anteju previously approved these changes Sep 7, 2023

rlangman force-pushed the codec_typecheck branch 2 times, most recently from 3402183 to 53e12b7 Compare September 8, 2023 23:50

rlangman added 2 commits September 11, 2023 07:07

[TTS] Fix audio codec type checks

8a389e6

Signed-off-by: Ryan <[email protected]>

[TTS] Fix audio codec tests

97ee70f

Signed-off-by: Ryan <[email protected]>

rlangman dismissed stale reviews from anteju and nithinraok via 97ee70f September 11, 2023 14:12

rlangman force-pushed the codec_typecheck branch from 53e12b7 to 97ee70f Compare September 11, 2023 14:12

rlangman requested a review from anteju September 19, 2023 15:47

anteju approved these changes Sep 19, 2023

rlangman merged commit 43c93d8 into main Sep 19, 2023
12 checks passed

rlangman deleted the codec_typecheck branch September 19, 2023 17:56
