Added memory logger #8395

sanandaraj5597 · 2024-02-10T03:18:48Z

This PR adds an environment variable that enables logging of peak memory usage at the end of each training step.
Set the environment variable "NEMO_LOG_MEMORY_USAGE" to 1 to enable the logging. Only the memory usage of rank 0 is printed.
This environment variable will be added to the perf CI for NeMo.
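
The gating described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `memory_logging_enabled` and `log_peak_memory` are hypothetical, the exact string comparison against "1" is an assumption, and `torch.cuda.max_memory_allocated()` is used here as one plausible way to read peak memory (the PR may use a different counter).

```python
import os


def memory_logging_enabled(env=os.environ):
    """Return True when NEMO_LOG_MEMORY_USAGE is set to "1" (assumed parsing)."""
    return env.get("NEMO_LOG_MEMORY_USAGE", "0") == "1"


def log_peak_memory(rank, step, env=os.environ):
    """Print peak GPU memory for rank 0 at the end of a step.

    Hypothetical helper: returns the printed line, or None when logging is
    disabled, the caller is not rank 0, or no CUDA device is available.
    """
    if not memory_logging_enabled(env) or rank != 0:
        return None
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # Peak bytes allocated by tensors on the current device since the start
    # of the program (or since the last reset of the peak statistics).
    peak_gib = torch.cuda.max_memory_allocated() / 2**30
    line = f"step {step}: peak memory allocated = {peak_gib:.2f} GiB"
    print(line)
    return line
```

A training loop would call `log_peak_memory(rank, step)` once per step after exporting `NEMO_LOG_MEMORY_USAGE=1` in the launch environment.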

Signed-off-by: Selvaraj Anandaraj <[email protected]>


ericharper

LGTM. Thanks!

ericharper · 2024-02-23T00:30:03Z

jenkins

sanandaraj5597 · 2024-02-23T18:19:31Z

jenkins

* Added memory logger
  Signed-off-by: Selvaraj Anandaraj <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>

* MoE parameter passing (#8255) * MoE parameter passing Signed-off-by: Alexandros Koumparoulis <[email protected]> * Pass EP/MoE params in consumer scripts. Signed-off-by: Alexandros Koumparoulis <[email protected]> * PR fixes Signed-off-by: Alexandros Koumparoulis <[email protected]> * Use latest commit of mcore-0.5 Signed-off-by: Alexandros Koumparoulis <[email protected]> * CI fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Jiaqiz/option to disable adapters & merge all lora layers (#8029) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. 
* use adapter only when it is enabled Signed-off-by: jiaqi zeng <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lora merge script (#8113) Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * add peft ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * merge lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * support/fix cpu initialization Signed-off-by: Chen Cui <[email protected]> * add example usage Signed-off-by: Chen Cui <[email protected]> * fix TP due to distributed checkpoint Signed-off-by: Chen Cui <[email protected]> * updating the logic of merging lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * merge in fp32 then cast back Signed-off-by: Jiaqi Zeng <[email protected]> * remove ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * fix import Signed-off-by: Jiaqi Zeng <[email protected]> --------- Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update k2 version (#8478) Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek 
<[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email 
protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile 
Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. 
Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> * Handle float limit_val_batches (#8426) * Handle float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Rectify reconfiguration of float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Remove unused imports Signed-off-by: Abhishree <[email protected]> * Scale len(val_dataloader) with float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Return len(dataloader) in microbatches Signed-off-by: Abhishree <[email protected]> * Add back resetting of num val samples Signed-off-by: Abhishree <[email protected]> * Fix to ensure float limit_val_batches is multiple of num_micro_batches Signed-off-by: Abhishree <[email protected]> * Remove forcing eval samples to 1 for float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Fix bug wrt 0 limiot_val_batches Signed-off-by: Abhishree <[email protected]> * Add missing mock_dataset line Signed-off-by: Abhishree <[email protected]> * Avoid ensuring limit_val_batches is a mutliple of microbatches for 1.0 Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore the hack forcing number of validation and test epochs to 1 Signed-off-by: Jan Baczek <[email protected]> * 
Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jan Baczek <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Fix tutorial links in user guide (#8497) Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Sequence Parallel for LoRA (#8369) * support lora + sequence parallel Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more comments Signed-off-by: Chen Cui <[email protected]> * add lora SP CI test Signed-off-by: Chen Cui <[email protected]> * support lora for all linear modules as in #7988 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Call proper method to replace (#8498) Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Added memory logger (#8395) * Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros 
Koumparoulis <[email protected]> * Canary refactor for Riva (#8363) * initial commit of bleu score tracking Signed-off-by: Travis Bartley <[email protected]> * initial commit, refactoring aed models for riva Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating Canary to support torch metrics Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fixes Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missed an empty batch conditional Signed-off-by: Travis Bartley <[email protected]> * Fixing dataloader issues Signed-off-by: Travis Bartley <[email protected]> * Finishing merge conflict with transcribe update Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: Travis Bartley <[email protected]> * copyright header fix Signed-off-by: Travis Bartley <[email protected]> * yet another merge conflict Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * making paired data management safer Signed-off-by: Travis Bartley <[email protected]> * sentencepiece needs bigger tokenizer... Signed-off-by: Travis Bartley <[email protected]> * sentencepiece tokenizer vocab needs to be +2 from vocab for canary Signed-off-by: Travis Bartley <[email protected]> * Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves. 
Signed-off-by: Travis Bartley <[email protected]> * merge conflit Signed-off-by: Travis Bartley <[email protected]> * Simplified tokenizer and corrected bug in dataloader Signed-off-by: Travis Bartley <[email protected]> * Cleaning up docstrings and fixing inference bug. Signed-off-by: Travis Bartley <[email protected]> * adding example scripts Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleaning up useless imports Signed-off-by: Travis Bartley <[email protected]> * adding unit tests Signed-off-by: Travis Bartley <[email protected]> * fixing unit tests Signed-off-by: Travis Bartley <[email protected]> * cfg name change Signed-off-by: Travis Bartley <[email protected]> * adding custom check to pass pytests Signed-off-by: Travis Bartley <[email protected]> * removing print script Signed-off-by: Travis Bartley <[email protected]> * catching bugs regarding tokens. Signed-off-by: Travis Bartley <[email protected]> * added docstrings and made examples scripts more generic Signed-off-by: Travis Bartley <[email protected]> * docstring deleted by accident Signed-off-by: Travis Bartley <[email protected]> * plurals in namespace Signed-off-by: Travis Bartley <[email protected]> * changing example script Signed-off-by: Travis Bartley <[email protected]> --------- Signed-off-by: Travis Bartley <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * add alpha scaling to lora (#8248) * removed pdeprecated eft model Signed-off-by: arendu <[email protected]> * add alpha Signed-off-by: arendu <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add alpha scaling to lora (#8483) * 
coldfix (#8412) Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixed errors in the CTM gen functions (#8416) (#8420) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367) * Add change_vocabulary and save_tokenizers() support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * fix path location and branch (#8314) * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <[email protected]> * updat ebranch in tutorial Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Michal Futrega <[email protected]> * Add TP comm overlap knobs to AutocastTransformerLayer (#8290) Signed-off-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add deallocate pipeline output optimization (#8279) (#8318) * 
add deallocate pipeline output optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * remove assertion (#8302) (#8321) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Enable megatron core loggers for GPT pretraining (#8354) (#8384) * Logging changes tested for gpt_pretraining * Additional args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fix dreambooth data sampler issue (#8400) (#8413) * Turn on drop last * Some neva fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * add ensemble decoding fix (#8427) (#8433) Signed-off-by: Nithin Rao 
Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeVA Tutorial Notebook (#8217) * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * add inference via script Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * add codeblocks to run torchrun in notebook Signed-off-by: Pratyush Muthukumar <[email protected]> --------- Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * mcore customization doc minor fix (#8421) (#8437) Signed-off-by: Huiying Li <[email protected]> Co-authored-by: Huiying <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add `loop_labels` algorithm for TDT greedy decoding (#8215) * Add `loop_labels` algorithm for TDT greedy decoding Signed-off-by: Vladimir Bataev <[email protected]> * Use `loop_labels` by default Signed-off-by: Vladimir Bataev <[email protected]> * Loop labels greedy decoding v2 Signed-off-by: Vladimir Bataev <[email protected]> * Add comments. 
Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched hypotheses Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched alignments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix test Signed-off-by: Vladimir Bataev <[email protected]> * Add computer for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Fix TDT decoding algorithm Signed-off-by: Vladimir Bataev <[email protected]> * Use loop frames by default for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Remove "loop frames" implementation for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix confidence. Use tensor for durations. Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add dist ckpt support for regular optimizers (#7749) (#8293) * Add dist ckpt support for regular optimizers * [tutorial] fixed missing RIR scripts file. 
(#8257) * fix imports * imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook * revert asr notebook --------- Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Multimodal r1.23.0 bug fix (#8315) (#8339) * Rename quick-gelu * ddpm config guard * Fix ddpm edit api * Fix insert_image_token cfg issue * neva updates * reformat * Add back jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs * Update default neva template --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * mcore ds fix (#8283) (#8385) * [tutorial] fixed missing RIR scripts file. 
(#8257) * add values to en tts dict (#7879) * mcore ds fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore * revert asr files * add comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset * update mcore version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg * update mcore commit * fix Bert unit tests * update bert tests * fix bert mcore test * fix gpt jenkins tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits * revert apex installation * turn off the fusion for jenkins --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * MCore dataset compatibility for tokenizers (#8390) (#8397) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. 
--------- Signed-off-by: Valerie Sarge <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432) * Improvements for Canary: - carry over custom keys when creatin tarred manifests - selectable text field in ASR eval - get rid of prompt slicing, create proper inference prompts Signed-off-by: Piotr Żelasko <[email protected]> * set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add sbert to IR (#8445) * add sbert to IR Signed-off-by: ataghibakhsh <[email protected]> * add doc Signed-off-by: ataghibakhsh <[email protected]> * fix the auto_tokenizer property method reset bug Signed-off-by: ataghibakhsh <[email protected]> * addressed bot comments Signed-off-by: ataghibakhsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ataghibakhsh <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Update readme (#8440) * update Signed-off-by: eharper <[email protected]> * udpate Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * landing pages added * landing page added for vision * landing pages updated * some minor changes to the main readme * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: 
eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * typo fixed * update Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeMo-Mistral to HF converter bugfix. (#8353) (#8442) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixing mcore bert for TP, PP and SP (#8336) (#8443) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile --------- Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add LoRA support to all linear layers (#7988) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MCoreMixin changes. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * using new commit of meg-LM Signed-off-by: arendu <[email protected]> * add cpu_offloading_num_layers to conversion script until bug in megatron is fixed Signed-off-by: Chen Cui <[email protected]> * fix peft mixin arguments to follow mcore 0.5 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update megatron commit to fix ci error Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * add cfg default Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add Neva Template for NV-DPO Models (#8358) * add/rename from nvgpt to nv_steerlm, add nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * add nv_dpo conversation to accommodate empty system message Signed-off-by: HuiyingLi <[email protected]> * handle nv_dpo template text generation Signed-off-by: HuiyingLi <[email protected]> * add prompt 
string to nvgpt Signed-off-by: HuiyingLi <[email protected]> * bugfix for inference prompt template Signed-off-by: HuiyingLi <[email protected]> * bug fix for grabbing clean text Signed-off-by: Huiying Li <[email protected]> * fix code format Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> --------- Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email 
protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email 
protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: 
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update PEFT Doc (#8501) * update peft doc Signed-off-by: Chen Cui <[email protected]> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <[email protected]> * fix table 
Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * release updates (#8394) * release updates (#8378) * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <[email protected]> * add values to en tts dict (#7879) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * mcore ds fix Signed-off-by: Dmytro Pykhtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <[email protected]> * revert asr files Signed-off-by: dimapihtar <[email protected]> * add comments Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <[email protected]> * update mcore version Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <[email protected]> * update mcore commit Signed-off-by: dimapihtar <[email protected]> * fix Bert unit tests Signed-off-by: dimapihtar <[email protected]> * update bert tests Signed-off-by: dimapihtar <[email protected]> * fix bert mcore test Signed-off-by: dimapihtar <[email protected]> * fix gpt jenkins tests Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for dict data input type Signed-off-by: dimapihtar <[email protected]> * add mock ds test Signed-off-by: 
dimapihtar <[email protected]> * add test for dict data input type Signed-off-by: dimapihtar <[email protected]> * mcore ds fix Signed-off-by: dimapihtar <[email protected]> * data input fix Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> * Update megatron_gpt_model.py Signed-off-by: Dmytro Pykhtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Signed-off-by: Abhishree <[email protected]> Signed-off-by: yaoyu-33 <[email 
protected]> Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Selvaraj Anandaraj <[email protected]> Signed-off-by: Travis Bartley <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email 
protected]> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: jbaczek <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Jan Baczek <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Naga Venkatesh Gavini <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: tbartley94 <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email 
protected]> Co-authored-by: Jiaqi Zeng <[email protected]>

* Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]>

* MoE parameter passing (#8255) * MoE parameter passing Signed-off-by: Alexandros Koumparoulis <[email protected]> * Pass EP/MoE params in consumer scripts. Signed-off-by: Alexandros Koumparoulis <[email protected]> * PR fixes Signed-off-by: Alexandros Koumparoulis <[email protected]> * Use latest commit of mcore-0.5 Signed-off-by: Alexandros Koumparoulis <[email protected]> * CI fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Jiaqiz/option to disable adapters & merge all lora layers (#8029) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. 
* use adapter only when it is enabled Signed-off-by: jiaqi zeng <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lora merge script (#8113) Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * add peft ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * merge lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * support/fix cpu initialization Signed-off-by: Chen Cui <[email protected]> * add example usage Signed-off-by: Chen Cui <[email protected]> * fix TP due to distributed checkpoint Signed-off-by: Chen Cui <[email protected]> * updating the logic of merging lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * MCoreMixin changes. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * merge in fp32 then cast back Signed-off-by: Jiaqi Zeng <[email protected]> * remove ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * fix import Signed-off-by: Jiaqi Zeng <[email protected]> --------- Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update k2 version (#8478) Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek 
<[email protected]> * remove try-catches, these dependencies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email 
protected]> * Add dummy parameter to accommodate the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runners don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Handle float limit_val_batches (#8426) * Handle float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Rectify reconfiguration of float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Remove unused imports Signed-off-by: Abhishree <[email protected]> * Scale len(val_dataloader) with float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Return len(dataloader) in microbatches Signed-off-by: Abhishree <[email protected]> * Add back resetting of num val samples Signed-off-by: Abhishree <[email protected]> * Fix to ensure float limit_val_batches is multiple of num_micro_batches Signed-off-by: Abhishree <[email protected]> * Remove forcing eval samples to 1 for float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Fix bug wrt 0 limit_val_batches Signed-off-by: Abhishree <[email protected]> * Add missing mock_dataset line Signed-off-by: Abhishree <[email protected]> * Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0 Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore the hack forcing number of validation and test epochs to 1 Signed-off-by: Jan Baczek <[email protected]> * 
Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jan Baczek <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Fix tutorial links in user guide (#8497) Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Sequence Parallel for LoRA (#8369) * support lora + sequence parallel Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more comments Signed-off-by: Chen Cui <[email protected]> * add lora SP CI test Signed-off-by: Chen Cui <[email protected]> * support lora for all linear modules as in #7988 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Call proper method to replace (#8498) Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Added memory logger (#8395) * Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros 
Koumparoulis <[email protected]> * Canary refactor for Riva (#8363) * initial commit of bleu score tracking Signed-off-by: Travis Bartley <[email protected]> * initial commit, refactoring aed models for riva Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating Canary to support torch metrics Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fixes Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missed an empty batch conditional Signed-off-by: Travis Bartley <[email protected]> * Fixing dataloader issues Signed-off-by: Travis Bartley <[email protected]> * Finishing merge conflict with transcribe update Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: Travis Bartley <[email protected]> * copyright header fix Signed-off-by: Travis Bartley <[email protected]> * yet another merge conflict Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * making paired data management safer Signed-off-by: Travis Bartley <[email protected]> * sentencepiece needs bigger tokenizer... Signed-off-by: Travis Bartley <[email protected]> * sentencepiece tokenizer vocab needs to be +2 from vocab for canary Signed-off-by: Travis Bartley <[email protected]> * Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves. 
Signed-off-by: Travis Bartley <[email protected]> * merge conflict Signed-off-by: Travis Bartley <[email protected]> * Simplified tokenizer and corrected bug in dataloader Signed-off-by: Travis Bartley <[email protected]> * Cleaning up docstrings and fixing inference bug. Signed-off-by: Travis Bartley <[email protected]> * adding example scripts Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleaning up useless imports Signed-off-by: Travis Bartley <[email protected]> * adding unit tests Signed-off-by: Travis Bartley <[email protected]> * fixing unit tests Signed-off-by: Travis Bartley <[email protected]> * cfg name change Signed-off-by: Travis Bartley <[email protected]> * adding custom check to pass pytests Signed-off-by: Travis Bartley <[email protected]> * removing print script Signed-off-by: Travis Bartley <[email protected]> * catching bugs regarding tokens. Signed-off-by: Travis Bartley <[email protected]> * added docstrings and made example scripts more generic Signed-off-by: Travis Bartley <[email protected]> * docstring deleted by accident Signed-off-by: Travis Bartley <[email protected]> * plurals in namespace Signed-off-by: Travis Bartley <[email protected]> * changing example script Signed-off-by: Travis Bartley <[email protected]> --------- Signed-off-by: Travis Bartley <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * add alpha scaling to lora (#8248) * removed deprecated peft model Signed-off-by: arendu <[email protected]> * add alpha Signed-off-by: arendu <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add alpha scaling to lora (#8483) *
coldfix (#8412) Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixed errors in the CTM gen functions (#8416) (#8420) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367) * Add change_vocabulary and save_tokenizers() support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * fix path location and branch (#8314) * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <[email protected]> * update branch in tutorial Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Michal Futrega <[email protected]> * Add TP comm overlap knobs to AutocastTransformerLayer (#8290) Signed-off-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add deallocate pipeline output optimization (#8279) (#8318) *
add deallocate pipeline output optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * remove assertion (#8302) (#8321) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Enable megatron core loggers for GPT pretraining (#8354) (#8384) * Logging changes tested for gpt_pretraining * Additional args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fix dreambooth data sampler issue (#8400) (#8413) * Turn on drop last * Some neva fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * add ensemble decoding fix (#8427) (#8433) Signed-off-by: Nithin Rao 
Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeVA Tutorial Notebook (#8217) * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * add inference via script Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * add codeblocks to run torchrun in notebook Signed-off-by: Pratyush Muthukumar <[email protected]> --------- Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * mcore customization doc minor fix (#8421) (#8437) Signed-off-by: Huiying Li <[email protected]> Co-authored-by: Huiying <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add `loop_labels` algorithm for TDT greedy decoding (#8215) * Add `loop_labels` algorithm for TDT greedy decoding Signed-off-by: Vladimir Bataev <[email protected]> * Use `loop_labels` by default Signed-off-by: Vladimir Bataev <[email protected]> * Loop labels greedy decoding v2 Signed-off-by: Vladimir Bataev <[email protected]> * Add comments. 
Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched hypotheses Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched alignments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix test Signed-off-by: Vladimir Bataev <[email protected]> * Add computer for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Fix TDT decoding algorithm Signed-off-by: Vladimir Bataev <[email protected]> * Use loop frames by default for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Remove "loop frames" implementation for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix confidence. Use tensor for durations. Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add dist ckpt support for regular optimizers (#7749) (#8293) * Add dist ckpt support for regular optimizers * [tutorial] fixed missing RIR scripts file. 
(#8257) * fix imports * imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook * revert asr notebook --------- Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Multimodal r1.23.0 bug fix (#8315) (#8339) * Rename quick-gelu * ddpm config guard * Fix ddpm edit api * Fix insert_image_token cfg issue * neva updates * reformat * Add back jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs * Update default neva template --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * mcore ds fix (#8283) (#8385) * [tutorial] fixed missing RIR scripts file. 
(#8257) * add values to en tts dict (#7879) * mcore ds fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore * revert asr files * add comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset * update mcore version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg * update mcore commit * fix Bert unit tests * update bert tests * fix bert mcore test * fix gpt jenkins tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits * revert apex installation * turn off the fusion for jenkins --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * MCore dataset compatibility for tokenizers (#8390) (#8397) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. 
--------- Signed-off-by: Valerie Sarge <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432) * Improvements for Canary: - carry over custom keys when creating tarred manifests - selectable text field in ASR eval - get rid of prompt slicing, create proper inference prompts Signed-off-by: Piotr Żelasko <[email protected]> * set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add sbert to IR (#8445) * add sbert to IR Signed-off-by: ataghibakhsh <[email protected]> * add doc Signed-off-by: ataghibakhsh <[email protected]> * fix the auto_tokenizer property method reset bug Signed-off-by: ataghibakhsh <[email protected]> * addressed bot comments Signed-off-by: ataghibakhsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ataghibakhsh <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Update readme (#8440) * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * landing pages added * landing page added for vision * landing pages updated * some minor changes to the main readme * update Signed-off-by: eharper <[email protected]> * update Signed-off-by:
eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * typo fixed * update Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeMo-Mistral to HF converter bugfix. (#8353) (#8442) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixing mcore bert for TP, PP and SP (#8336) (#8443) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile --------- Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add LoRA support to all linear layers (#7988) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MCoreMixin changes. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * using new commit of meg-LM Signed-off-by: arendu <[email protected]> * add cpu_offloading_num_layers to conversion script until bug in megatron is fixed Signed-off-by: Chen Cui <[email protected]> * fix peft mixin arguments to follow mcore 0.5 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update megatron commit to fix ci error Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * add cfg default Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add Neva Template for NV-DPO Models (#8358) * add/rename from nvgpt to nv_steerlm, add nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * add nv_dpo conversation to accommodate empty system message Signed-off-by: HuiyingLi <[email protected]> * handle nv_dpo template text generation Signed-off-by: HuiyingLi <[email protected]> * add prompt
string to nvgpt Signed-off-by: HuiyingLi <[email protected]> * bugfix for inference prompt template Signed-off-by: HuiyingLi <[email protected]> * bug fix for grabbing clean text Signed-off-by: Huiying Li <[email protected]> * fix code format Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> --------- Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email 
protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email 
protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: 
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update PEFT Doc (#8501) * update peft doc Signed-off-by: Chen Cui <[email protected]> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <[email protected]> * fix table 
Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * release updates (#8394) * release updates (#8378) * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <[email protected]> * add values to en tts dict (#7879) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * mcore ds fix Signed-off-by: Dmytro Pykhtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <[email protected]> * revert asr files Signed-off-by: dimapihtar <[email protected]> * add comments Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <[email protected]> * update mcore version Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <[email protected]> * update mcore commit Signed-off-by: dimapihtar <[email protected]> * fix Bert unit tests Signed-off-by: dimapihtar <[email protected]> * update bert tests Signed-off-by: dimapihtar <[email protected]> * fix bert mcore test Signed-off-by: dimapihtar <[email protected]> * fix gpt jenkins tests Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for dict data input type Signed-off-by: dimapihtar <[email protected]> * add mock ds test Signed-off-by: 
dimapihtar <[email protected]> * add test for dict data input type Signed-off-by: dimapihtar <[email protected]> * mcore ds fix Signed-off-by: dimapihtar <[email protected]> * data input fix Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> * Update megatron_gpt_model.py Signed-off-by: Dmytro Pykhtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Signed-off-by: Abhishree <[email protected]> Signed-off-by: yaoyu-33 <[email 
protected]> Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Selvaraj Anandaraj <[email protected]> Signed-off-by: Travis Bartley <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email 
protected]> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: jbaczek <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Jan Baczek <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Naga Venkatesh Gavini <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: tbartley94 <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email 
protected]> Co-authored-by: Jiaqi Zeng <[email protected]>

* Added memory logger

  Signed-off-by: Selvaraj Anandaraj <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Zeeshan Patel <[email protected]>
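
For readers skimming the log, here is a minimal sketch of the behavior the PR description states: peak memory is logged at the end of each training step, only on rank 0, and only when `NEMO_LOG_MEMORY_USAGE` is set to 1. The helper names and the message format are hypothetical and not taken from the NeMo source. Reading the peak with `torch.cuda.max_memory_allocated()` is also an assumption about how the value could be obtained.

```python
import os


def memory_logging_enabled(env=None):
    """True when NEMO_LOG_MEMORY_USAGE is "1", the value named in the PR description."""
    env = os.environ if env is None else env
    return env.get("NEMO_LOG_MEMORY_USAGE", "0") == "1"


def format_peak_memory(step, rank, peak_bytes, env=None):
    """Return a log line on rank 0 when logging is enabled, else None.

    Hypothetical helper: name, signature, and message format are illustrative only.
    """
    if rank != 0 or not memory_logging_enabled(env):
        return None
    return f"step {step}: peak memory {peak_bytes / 2**30:.2f} GiB"


def cuda_peak_bytes():
    """Peak allocated bytes on the current CUDA device, or 0 when CUDA is unavailable.

    Assumption: torch.cuda.max_memory_allocated() is used as the peak reading.
    """
    try:
        import torch
    except ImportError:
        return 0
    if not torch.cuda.is_available():
        return 0
    return torch.cuda.max_memory_allocated()
```

At the end of a training step, a caller could pass `cuda_peak_bytes()` and its global rank to `format_peak_memory` and print the result when it is not `None`.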

* MoE parameter passing (#8255) * MoE parameter passing Signed-off-by: Alexandros Koumparoulis <[email protected]> * Pass EP/MoE params in consumer scripts. Signed-off-by: Alexandros Koumparoulis <[email protected]> * PR fixes Signed-off-by: Alexandros Koumparoulis <[email protected]> * Use latest commit of mcore-0.5 Signed-off-by: Alexandros Koumparoulis <[email protected]> * CI fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Jiaqiz/option to disable adapters & merge all lora layers (#8029) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. 
* use adapter only when it is enabled Signed-off-by: jiaqi zeng <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lora merge script (#8113) Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * add peft ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * merge lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * support/fix cpu initialization Signed-off-by: Chen Cui <[email protected]> * add example usage Signed-off-by: Chen Cui <[email protected]> * fix TP due to distributed checkpoint Signed-off-by: Chen Cui <[email protected]> * updating the logic of merging lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * merge in fp32 then cast back Signed-off-by: Jiaqi Zeng <[email protected]> * remove ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * fix import Signed-off-by: Jiaqi Zeng <[email protected]> --------- Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update k2 version (#8478) Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek 
<[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email 
protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile 
Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. 
Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> * Handle float limit_val_batches (#8426) * Handle float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Rectify reconfiguration of float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Remove unused imports Signed-off-by: Abhishree <[email protected]> * Scale len(val_dataloader) with float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Return len(dataloader) in microbatches Signed-off-by: Abhishree <[email protected]> * Add back resetting of num val samples Signed-off-by: Abhishree <[email protected]> * Fix to ensure float limit_val_batches is multiple of num_micro_batches Signed-off-by: Abhishree <[email protected]> * Remove forcing eval samples to 1 for float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Fix bug wrt 0 limiot_val_batches Signed-off-by: Abhishree <[email protected]> * Add missing mock_dataset line Signed-off-by: Abhishree <[email protected]> * Avoid ensuring limit_val_batches is a mutliple of microbatches for 1.0 Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore the hack forcing number of validation and test epochs to 1 Signed-off-by: Jan Baczek <[email protected]> * 
Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jan Baczek <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Fix tutorial links in user guide (#8497) Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Sequence Parallel for LoRA (#8369) * support lora + sequence parallel Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more comments Signed-off-by: Chen Cui <[email protected]> * add lora SP CI test Signed-off-by: Chen Cui <[email protected]> * support lora for all linear modules as in #7988 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Call proper method to replace (#8498) Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Added memory logger (#8395) * Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros 
Koumparoulis <[email protected]> * Canary refactor for Riva (#8363) * initial commit of bleu score tracking Signed-off-by: Travis Bartley <[email protected]> * initial commit, refactoring aed models for riva Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating Canary to support torch metrics Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fixes Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missed an empty batch conditional Signed-off-by: Travis Bartley <[email protected]> * Fixing dataloader issues Signed-off-by: Travis Bartley <[email protected]> * Finishing merge conflict with transcribe update Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: Travis Bartley <[email protected]> * copyright header fix Signed-off-by: Travis Bartley <[email protected]> * yet another merge conflict Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * making paired data management safer Signed-off-by: Travis Bartley <[email protected]> * sentencepiece needs bigger tokenizer... Signed-off-by: Travis Bartley <[email protected]> * sentencepiece tokenizer vocab needs to be +2 from vocab for canary Signed-off-by: Travis Bartley <[email protected]> * Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves. 
Signed-off-by: Travis Bartley <[email protected]> * merge conflit Signed-off-by: Travis Bartley <[email protected]> * Simplified tokenizer and corrected bug in dataloader Signed-off-by: Travis Bartley <[email protected]> * Cleaning up docstrings and fixing inference bug. Signed-off-by: Travis Bartley <[email protected]> * adding example scripts Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleaning up useless imports Signed-off-by: Travis Bartley <[email protected]> * adding unit tests Signed-off-by: Travis Bartley <[email protected]> * fixing unit tests Signed-off-by: Travis Bartley <[email protected]> * cfg name change Signed-off-by: Travis Bartley <[email protected]> * adding custom check to pass pytests Signed-off-by: Travis Bartley <[email protected]> * removing print script Signed-off-by: Travis Bartley <[email protected]> * catching bugs regarding tokens. Signed-off-by: Travis Bartley <[email protected]> * added docstrings and made examples scripts more generic Signed-off-by: Travis Bartley <[email protected]> * docstring deleted by accident Signed-off-by: Travis Bartley <[email protected]> * plurals in namespace Signed-off-by: Travis Bartley <[email protected]> * changing example script Signed-off-by: Travis Bartley <[email protected]> --------- Signed-off-by: Travis Bartley <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * add alpha scaling to lora (#8248) * removed pdeprecated eft model Signed-off-by: arendu <[email protected]> * add alpha Signed-off-by: arendu <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add alpha scaling to lora (#8483) * 
coldfix (#8412) Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixed errors in the CTM gen functions (#8416) (#8420) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367) * Add change_vocabulary and save_tokenizers() support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * fix path location and branch (#8314) * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <[email protected]> * updat ebranch in tutorial Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Michal Futrega <[email protected]> * Add TP comm overlap knobs to AutocastTransformerLayer (#8290) Signed-off-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add deallocate pipeline output optimization (#8279) (#8318) * 
add deallocate pipeline output optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * remove assertion (#8302) (#8321) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Enable megatron core loggers for GPT pretraining (#8354) (#8384) * Logging changes tested for gpt_pretraining * Additional args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fix dreambooth data sampler issue (#8400) (#8413) * Turn on drop last * Some neva fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * add ensemble decoding fix (#8427) (#8433) Signed-off-by: Nithin Rao 
Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeVA Tutorial Notebook (#8217) * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * add inference via script Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * add codeblocks to run torchrun in notebook Signed-off-by: Pratyush Muthukumar <[email protected]> --------- Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * mcore customization doc minor fix (#8421) (#8437) Signed-off-by: Huiying Li <[email protected]> Co-authored-by: Huiying <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add `loop_labels` algorithm for TDT greedy decoding (#8215) * Add `loop_labels` algorithm for TDT greedy decoding Signed-off-by: Vladimir Bataev <[email protected]> * Use `loop_labels` by default Signed-off-by: Vladimir Bataev <[email protected]> * Loop labels greedy decoding v2 Signed-off-by: Vladimir Bataev <[email protected]> * Add comments. 
Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched hypotheses Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched alignments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix test Signed-off-by: Vladimir Bataev <[email protected]> * Add computer for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Fix TDT decoding algorithm Signed-off-by: Vladimir Bataev <[email protected]> * Use loop frames by default for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Remove "loop frames" implementation for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix confidence. Use tensor for durations. Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add dist ckpt support for regular optimizers (#7749) (#8293) * Add dist ckpt support for regular optimizers * [tutorial] fixed missing RIR scripts file. 
(#8257) * fix imports * imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook * revert asr notebook --------- Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Multimodal r1.23.0 bug fix (#8315) (#8339) * Rename quick-gelu * ddpm config guard * Fix ddpm edit api * Fix insert_image_token cfg issue * neva updates * reformat * Add back jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs * Update default neva template --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * mcore ds fix (#8283) (#8385) * [tutorial] fixed missing RIR scripts file. 
(#8257) * add values to en tts dict (#7879) * mcore ds fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore * revert asr files * add comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset * update mcore version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg * update mcore commit * fix Bert unit tests * update bert tests * fix bert mcore test * fix gpt jenkins tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits * revert apex installation * turn off the fusion for jenkins --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * MCore dataset compatibility for tokenizers (#8390) (#8397) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. 
--------- Signed-off-by: Valerie Sarge <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432) * Improvements for Canary: - carry over custom keys when creatin tarred manifests - selectable text field in ASR eval - get rid of prompt slicing, create proper inference prompts Signed-off-by: Piotr Żelasko <[email protected]> * set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add sbert to IR (#8445) * add sbert to IR Signed-off-by: ataghibakhsh <[email protected]> * add doc Signed-off-by: ataghibakhsh <[email protected]> * fix the auto_tokenizer property method reset bug Signed-off-by: ataghibakhsh <[email protected]> * addressed bot comments Signed-off-by: ataghibakhsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ataghibakhsh <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Update readme (#8440) * update Signed-off-by: eharper <[email protected]> * udpate Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * landing pages added * landing page added for vision * landing pages updated * some minor changes to the main readme * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: 
eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * typo fixed * update Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeMo-Mistral to HF converter bugfix. (#8353) (#8442) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixing mcore bert for TP, PP and SP (#8336) (#8443) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile --------- Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add LoRA support to all linear layers (#7988) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * using new commit of meg-LM Signed-off-by: arendu <[email protected]> * add cpu_offloading_num_layers to conversion script until bug in megatron is fixed Signed-off-by: Chen Cui <[email protected]> * fix peft mixin arguments to follow mcore 0.5 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update megatron commit to fix ci error Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * add cfg default Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add Neva Template for NV-DPO Models (#8358) * add/rename from nvgpt to nv_steerlm, add nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * add nv_dpo conversation to accomendate empty system message Signed-off-by: HuiyingLi <[email protected]> * handle nv_dpo template text generation Signed-off-by: HuiyingLi <[email protected]> * add prompt 
string to nvgpt Signed-off-by: HuiyingLi <[email protected]> * bugfix for inference prompt template Signed-off-by: HuiyingLi <[email protected]> * bug fix for grabbing clean text Signed-off-by: Huiying Li <[email protected]> * fix code format Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> --------- Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email 
protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email 
protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: 
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update PEFT Doc (#8501) * update peft doc Signed-off-by: Chen Cui <[email protected]> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <[email protected]> * fix table 
Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * release updates (#8394) * release updates (#8378) * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <[email protected]> * add values to en tts dict (#7879) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * mcore ds fix Signed-off-by: Dmytro Pykhtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <[email protected]> * revert asr files Signed-off-by: dimapihtar <[email protected]> * add comments Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <[email protected]> * update mcore version Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <[email protected]> * update mcore commit Signed-off-by: dimapihtar <[email protected]> * fix Bert unit tests Signed-off-by: dimapihtar <[email protected]> * update bert tests Signed-off-by: dimapihtar <[email protected]> * fix bert mcore test Signed-off-by: dimapihtar <[email protected]> * fix gpt jenkins tests Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for dict data input type Signed-off-by: dimapihtar <[email protected]> * add mock ds test Signed-off-by: 
dimapihtar <[email protected]> * add test for dict data input type Signed-off-by: dimapihtar <[email protected]> * mcore ds fix Signed-off-by: dimapihtar <[email protected]> * data input fix Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> * Update megatron_gpt_model.py Signed-off-by: Dmytro Pykhtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Signed-off-by: Abhishree <[email protected]> Signed-off-by: yaoyu-33 <[email 
protected]> Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Selvaraj Anandaraj <[email protected]> Signed-off-by: Travis Bartley <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email 
protected]> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: jbaczek <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Jan Baczek <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Naga Venkatesh Gavini <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: tbartley94 <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email 
protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Signed-off-by: Zeeshan Patel <[email protected]>

* Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: ataghibakhsh <[email protected]>

* MoE parameter passing (#8255) * MoE parameter passing Signed-off-by: Alexandros Koumparoulis <[email protected]> * Pass EP/MoE params in consumer scripts. Signed-off-by: Alexandros Koumparoulis <[email protected]> * PR fixes Signed-off-by: Alexandros Koumparoulis <[email protected]> * Use latest commit of mcore-0.5 Signed-off-by: Alexandros Koumparoulis <[email protected]> * CI fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Jiaqiz/option to disable adapters & merge all lora layers (#8029) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. 
* use adapter only when it is enabled Signed-off-by: jiaqi zeng <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lora merge script (#8113) Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * add peft ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * merge lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * support/fix cpu initialization Signed-off-by: Chen Cui <[email protected]> * add example usage Signed-off-by: Chen Cui <[email protected]> * fix TP due to distributed checkpoint Signed-off-by: Chen Cui <[email protected]> * updating the logic of merging lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * merge in fp32 then cast back Signed-off-by: Jiaqi Zeng <[email protected]> * remove ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * fix import Signed-off-by: Jiaqi Zeng <[email protected]> --------- Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update k2 version (#8478) Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek 
<[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email 
protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile 
Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. 
Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> * Handle float limit_val_batches (#8426) * Handle float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Rectify reconfiguration of float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Remove unused imports Signed-off-by: Abhishree <[email protected]> * Scale len(val_dataloader) with float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Return len(dataloader) in microbatches Signed-off-by: Abhishree <[email protected]> * Add back resetting of num val samples Signed-off-by: Abhishree <[email protected]> * Fix to ensure float limit_val_batches is multiple of num_micro_batches Signed-off-by: Abhishree <[email protected]> * Remove forcing eval samples to 1 for float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Fix bug wrt 0 limiot_val_batches Signed-off-by: Abhishree <[email protected]> * Add missing mock_dataset line Signed-off-by: Abhishree <[email protected]> * Avoid ensuring limit_val_batches is a mutliple of microbatches for 1.0 Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore the hack forcing number of validation and test epochs to 1 Signed-off-by: Jan Baczek <[email protected]> * 
Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jan Baczek <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Fix tutorial links in user guide (#8497) Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Sequence Parallel for LoRA (#8369) * support lora + sequence parallel Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more comments Signed-off-by: Chen Cui <[email protected]> * add lora SP CI test Signed-off-by: Chen Cui <[email protected]> * support lora for all linear modules as in #7988 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Call proper method to replace (#8498) Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Added memory logger (#8395) * Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros 
Koumparoulis <[email protected]> * Canary refactor for Riva (#8363) * initial commit of bleu score tracking Signed-off-by: Travis Bartley <[email protected]> * initial commit, refactoring aed models for riva Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating Canary to support torch metrics Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fixes Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missed an empty batch conditional Signed-off-by: Travis Bartley <[email protected]> * Fixing dataloader issues Signed-off-by: Travis Bartley <[email protected]> * Finishing merge conflict with transcribe update Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: Travis Bartley <[email protected]> * copyright header fix Signed-off-by: Travis Bartley <[email protected]> * yet another merge conflict Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * making paired data management safer Signed-off-by: Travis Bartley <[email protected]> * sentencepiece needs bigger tokenizer... Signed-off-by: Travis Bartley <[email protected]> * sentencepiece tokenizer vocab needs to be +2 from vocab for canary Signed-off-by: Travis Bartley <[email protected]> * Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves. 
Signed-off-by: Travis Bartley <[email protected]> * merge conflict Signed-off-by: Travis Bartley <[email protected]> * Simplified tokenizer and corrected bug in dataloader Signed-off-by: Travis Bartley <[email protected]> * Cleaning up docstrings and fixing inference bug. Signed-off-by: Travis Bartley <[email protected]> * adding example scripts Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleaning up useless imports Signed-off-by: Travis Bartley <[email protected]> * adding unit tests Signed-off-by: Travis Bartley <[email protected]> * fixing unit tests Signed-off-by: Travis Bartley <[email protected]> * cfg name change Signed-off-by: Travis Bartley <[email protected]> * adding custom check to pass pytests Signed-off-by: Travis Bartley <[email protected]> * removing print script Signed-off-by: Travis Bartley <[email protected]> * catching bugs regarding tokens. Signed-off-by: Travis Bartley <[email protected]> * added docstrings and made examples scripts more generic Signed-off-by: Travis Bartley <[email protected]> * docstring deleted by accident Signed-off-by: Travis Bartley <[email protected]> * plurals in namespace Signed-off-by: Travis Bartley <[email protected]> * changing example script Signed-off-by: Travis Bartley <[email protected]> --------- Signed-off-by: Travis Bartley <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * add alpha scaling to lora (#8248) * removed deprecated peft model Signed-off-by: arendu <[email protected]> * add alpha Signed-off-by: arendu <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add alpha scaling to lora (#8483) * 
coldfix (#8412) Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixed errors in the CTM gen functions (#8416) (#8420) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367) * Add change_vocabulary and save_tokenizers() support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * fix path location and branch (#8314) * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <[email protected]> * update branch in tutorial Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Michal Futrega <[email protected]> * Add TP comm overlap knobs to AutocastTransformerLayer (#8290) Signed-off-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add deallocate pipeline output optimization (#8279) (#8318) * 
add deallocate pipeline output optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * remove assertion (#8302) (#8321) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Enable megatron core loggers for GPT pretraining (#8354) (#8384) * Logging changes tested for gpt_pretraining * Additional args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fix dreambooth data sampler issue (#8400) (#8413) * Turn on drop last * Some neva fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * add ensemble decoding fix (#8427) (#8433) Signed-off-by: Nithin Rao 
Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeVA Tutorial Notebook (#8217) * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * add inference via script Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * add codeblocks to run torchrun in notebook Signed-off-by: Pratyush Muthukumar <[email protected]> --------- Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * mcore customization doc minor fix (#8421) (#8437) Signed-off-by: Huiying Li <[email protected]> Co-authored-by: Huiying <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add `loop_labels` algorithm for TDT greedy decoding (#8215) * Add `loop_labels` algorithm for TDT greedy decoding Signed-off-by: Vladimir Bataev <[email protected]> * Use `loop_labels` by default Signed-off-by: Vladimir Bataev <[email protected]> * Loop labels greedy decoding v2 Signed-off-by: Vladimir Bataev <[email protected]> * Add comments. 
Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched hypotheses Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched alignments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix test Signed-off-by: Vladimir Bataev <[email protected]> * Add computer for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Fix TDT decoding algorithm Signed-off-by: Vladimir Bataev <[email protected]> * Use loop frames by default for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Remove "loop frames" implementation for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix confidence. Use tensor for durations. Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add dist ckpt support for regular optimizers (#7749) (#8293) * Add dist ckpt support for regular optimizers * [tutorial] fixed missing RIR scripts file. 
(#8257) * fix imports * imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook * revert asr notebook --------- Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Multimodal r1.23.0 bug fix (#8315) (#8339) * Rename quick-gelu * ddpm config guard * Fix ddpm edit api * Fix insert_image_token cfg issue * neva updates * reformat * Add back jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs * Update default neva template --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * mcore ds fix (#8283) (#8385) * [tutorial] fixed missing RIR scripts file. 
(#8257) * add values to en tts dict (#7879) * mcore ds fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore * revert asr files * add comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset * update mcore version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg * update mcore commit * fix Bert unit tests * update bert tests * fix bert mcore test * fix gpt jenkins tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits * revert apex installation * turn off the fusion for jenkins --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * MCore dataset compatibility for tokenizers (#8390) (#8397) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. 
--------- Signed-off-by: Valerie Sarge <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432) * Improvements for Canary: - carry over custom keys when creating tarred manifests - selectable text field in ASR eval - get rid of prompt slicing, create proper inference prompts Signed-off-by: Piotr Żelasko <[email protected]> * set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add sbert to IR (#8445) * add sbert to IR Signed-off-by: ataghibakhsh <[email protected]> * add doc Signed-off-by: ataghibakhsh <[email protected]> * fix the auto_tokenizer property method reset bug Signed-off-by: ataghibakhsh <[email protected]> * addressed bot comments Signed-off-by: ataghibakhsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ataghibakhsh <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Update readme (#8440) * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * landing pages added * landing page added for vision * landing pages updated * some minor changes to the main readme * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: 
eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * typo fixed * update Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeMo-Mistral to HF converter bugfix. (#8353) (#8442) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixing mcore bert for TP, PP and SP (#8336) (#8443) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile --------- Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add LoRA support to all linear layers (#7988) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MCoreMixin changes. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * using new commit of meg-LM Signed-off-by: arendu <[email protected]> * add cpu_offloading_num_layers to conversion script until bug in megatron is fixed Signed-off-by: Chen Cui <[email protected]> * fix peft mixin arguments to follow mcore 0.5 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update megatron commit to fix ci error Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * add cfg default Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add Neva Template for NV-DPO Models (#8358) * add/rename from nvgpt to nv_steerlm, add nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * add nv_dpo conversation to accommodate empty system message Signed-off-by: HuiyingLi <[email protected]> * handle nv_dpo template text generation Signed-off-by: HuiyingLi <[email protected]> * add prompt 
string to nvgpt Signed-off-by: HuiyingLi <[email protected]> * bugfix for inference prompt template Signed-off-by: HuiyingLi <[email protected]> * bug fix for grabbing clean text Signed-off-by: Huiying Li <[email protected]> * fix code format Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> --------- Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email 
protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email 
protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: 
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update PEFT Doc (#8501) * update peft doc Signed-off-by: Chen Cui <[email protected]> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <[email protected]> * fix table 
Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * release updates (#8394) * release updates (#8378) * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <[email protected]> * add values to en tts dict (#7879) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * mcore ds fix Signed-off-by: Dmytro Pykhtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <[email protected]> * revert asr files Signed-off-by: dimapihtar <[email protected]> * add comments Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <[email protected]> * update mcore version Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <[email protected]> * update mcore commit Signed-off-by: dimapihtar <[email protected]> * fix Bert unit tests Signed-off-by: dimapihtar <[email protected]> * update bert tests Signed-off-by: dimapihtar <[email protected]> * fix bert mcore test Signed-off-by: dimapihtar <[email protected]> * fix gpt jenkins tests Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for dict data input type Signed-off-by: dimapihtar <[email protected]> * add mock ds test Signed-off-by: 
dimapihtar <[email protected]> * add test for dict data input type Signed-off-by: dimapihtar <[email protected]> * mcore ds fix Signed-off-by: dimapihtar <[email protected]> * data input fix Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> * Update megatron_gpt_model.py Signed-off-by: Dmytro Pykhtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Signed-off-by: Abhishree <[email protected]> Signed-off-by: yaoyu-33 <[email 
protected]> Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Selvaraj Anandaraj <[email protected]> Signed-off-by: Travis Bartley <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email 
protected]> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: jbaczek <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Jan Baczek <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Naga Venkatesh Gavini <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: tbartley94 <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email 
protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Signed-off-by: ataghibakhsh <[email protected]>

* Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Pablo Garay <[email protected]>

* MoE parameter passing (#8255) * MoE parameter passing Signed-off-by: Alexandros Koumparoulis <[email protected]> * Pass EP/MoE params in consumer scripts. Signed-off-by: Alexandros Koumparoulis <[email protected]> * PR fixes Signed-off-by: Alexandros Koumparoulis <[email protected]> * Use latest commit of mcore-0.5 Signed-off-by: Alexandros Koumparoulis <[email protected]> * CI fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Jiaqiz/option to disable adapters & merge all lora layers (#8029) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. 
* use adapter only when it is enabled Signed-off-by: jiaqi zeng <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lora merge script (#8113) Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * add peft ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * merge lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * support/fix cpu initialization Signed-off-by: Chen Cui <[email protected]> * add example usage Signed-off-by: Chen Cui <[email protected]> * fix TP due to distributed checkpoint Signed-off-by: Chen Cui <[email protected]> * updating the logic of merging lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * MCoreMixin changes. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * merge in fp32 then cast back Signed-off-by: Jiaqi Zeng <[email protected]> * remove ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * fix import Signed-off-by: Jiaqi Zeng <[email protected]> --------- Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update k2 version (#8478) Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek 
<[email protected]> * remove try-catches, these dependencies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email 
protected]> * Add dummy parameter to accommodate the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runners don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Handle float limit_val_batches (#8426) * Handle float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Rectify reconfiguration of float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Remove unused imports Signed-off-by: Abhishree <[email protected]> * Scale len(val_dataloader) with float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Return len(dataloader) in microbatches Signed-off-by: Abhishree <[email protected]> * Add back resetting of num val samples Signed-off-by: Abhishree <[email protected]> * Fix to ensure float limit_val_batches is multiple of num_micro_batches Signed-off-by: Abhishree <[email protected]> * Remove forcing eval samples to 1 for float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Fix bug wrt 0 limit_val_batches Signed-off-by: Abhishree <[email protected]> * Add missing mock_dataset line Signed-off-by: Abhishree <[email protected]> * Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0 Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore the hack forcing number of validation and test epochs to 1 Signed-off-by: Jan Baczek <[email protected]> * 
Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jan Baczek <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Fix tutorial links in user guide (#8497) Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Sequence Parallel for LoRA (#8369) * support lora + sequence parallel Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more comments Signed-off-by: Chen Cui <[email protected]> * add lora SP CI test Signed-off-by: Chen Cui <[email protected]> * support lora for all linear modules as in #7988 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Call proper method to replace (#8498) Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Added memory logger (#8395) * Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros 
Koumparoulis <[email protected]> * Canary refactor for Riva (#8363) * initial commit of bleu score tracking Signed-off-by: Travis Bartley <[email protected]> * initial commit, refactoring aed models for riva Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating Canary to support torch metrics Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fixes Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missed an empty batch conditional Signed-off-by: Travis Bartley <[email protected]> * Fixing dataloader issues Signed-off-by: Travis Bartley <[email protected]> * Finishing merge conflict with transcribe update Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: Travis Bartley <[email protected]> * copyright header fix Signed-off-by: Travis Bartley <[email protected]> * yet another merge conflict Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * making paired data management safer Signed-off-by: Travis Bartley <[email protected]> * sentencepiece needs bigger tokenizer... Signed-off-by: Travis Bartley <[email protected]> * sentencepiece tokenizer vocab needs to be +2 from vocab for canary Signed-off-by: Travis Bartley <[email protected]> * Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves. 
Signed-off-by: Travis Bartley <[email protected]> * merge conflict Signed-off-by: Travis Bartley <[email protected]> * Simplified tokenizer and corrected bug in dataloader Signed-off-by: Travis Bartley <[email protected]> * Cleaning up docstrings and fixing inference bug. Signed-off-by: Travis Bartley <[email protected]> * adding example scripts Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleaning up useless imports Signed-off-by: Travis Bartley <[email protected]> * adding unit tests Signed-off-by: Travis Bartley <[email protected]> * fixing unit tests Signed-off-by: Travis Bartley <[email protected]> * cfg name change Signed-off-by: Travis Bartley <[email protected]> * adding custom check to pass pytests Signed-off-by: Travis Bartley <[email protected]> * removing print script Signed-off-by: Travis Bartley <[email protected]> * catching bugs regarding tokens. Signed-off-by: Travis Bartley <[email protected]> * added docstrings and made examples scripts more generic Signed-off-by: Travis Bartley <[email protected]> * docstring deleted by accident Signed-off-by: Travis Bartley <[email protected]> * plurals in namespace Signed-off-by: Travis Bartley <[email protected]> * changing example script Signed-off-by: Travis Bartley <[email protected]> --------- Signed-off-by: Travis Bartley <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * add alpha scaling to lora (#8248) * removed deprecated peft model Signed-off-by: arendu <[email protected]> * add alpha Signed-off-by: arendu <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add alpha scaling to lora (#8483) * 
coldfix (#8412) Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixed errors in the CTM gen functions (#8416) (#8420) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367) * Add change_vocabulary and save_tokenizers() support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * fix path location and branch (#8314) * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <[email protected]> * update branch in tutorial Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Michal Futrega <[email protected]> * Add TP comm overlap knobs to AutocastTransformerLayer (#8290) Signed-off-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add deallocate pipeline output optimization (#8279) (#8318) * 
add deallocate pipeline output optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * remove assertion (#8302) (#8321) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Enable megatron core loggers for GPT pretraining (#8354) (#8384) * Logging changes tested for gpt_pretraining * Additional args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fix dreambooth data sampler issue (#8400) (#8413) * Turn on drop last * Some neva fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * add ensemble decoding fix (#8427) (#8433) Signed-off-by: Nithin Rao 
Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeVA Tutorial Notebook (#8217) * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * add inference via script Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * add codeblocks to run torchrun in notebook Signed-off-by: Pratyush Muthukumar <[email protected]> --------- Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * mcore customization doc minor fix (#8421) (#8437) Signed-off-by: Huiying Li <[email protected]> Co-authored-by: Huiying <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add `loop_labels` algorithm for TDT greedy decoding (#8215) * Add `loop_labels` algorithm for TDT greedy decoding Signed-off-by: Vladimir Bataev <[email protected]> * Use `loop_labels` by default Signed-off-by: Vladimir Bataev <[email protected]> * Loop labels greedy decoding v2 Signed-off-by: Vladimir Bataev <[email protected]> * Add comments. 
Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched hypotheses Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched alignments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix test Signed-off-by: Vladimir Bataev <[email protected]> * Add computer for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Fix TDT decoding algorithm Signed-off-by: Vladimir Bataev <[email protected]> * Use loop frames by default for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Remove "loop frames" implementation for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix confidence. Use tensor for durations. Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add dist ckpt support for regular optimizers (#7749) (#8293) * Add dist ckpt support for regular optimizers * [tutorial] fixed missing RIR scripts file. 
(#8257) * fix imports * imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook * revert asr notebook --------- Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Multimodal r1.23.0 bug fix (#8315) (#8339) * Rename quick-gelu * ddpm config guard * Fix ddpm edit api * Fix insert_image_token cfg issue * neva updates * reformat * Add back jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs * Update default neva template --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * mcore ds fix (#8283) (#8385) * [tutorial] fixed missing RIR scripts file. 
(#8257) * add values to en tts dict (#7879) * mcore ds fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore * revert asr files * add comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset * update mcore version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg * update mcore commit * fix Bert unit tests * update bert tests * fix bert mcore test * fix gpt jenkins tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits * revert apex installation * turn off the fusion for jenkins --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * MCore dataset compatibility for tokenizers (#8390) (#8397) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. 
--------- Signed-off-by: Valerie Sarge <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432) * Improvements for Canary: - carry over custom keys when creating tarred manifests - selectable text field in ASR eval - get rid of prompt slicing, create proper inference prompts Signed-off-by: Piotr Żelasko <[email protected]> * set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add sbert to IR (#8445) * add sbert to IR Signed-off-by: ataghibakhsh <[email protected]> * add doc Signed-off-by: ataghibakhsh <[email protected]> * fix the auto_tokenizer property method reset bug Signed-off-by: ataghibakhsh <[email protected]> * addressed bot comments Signed-off-by: ataghibakhsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ataghibakhsh <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Update readme (#8440) * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * landing pages added * landing page added for vision * landing pages updated * some minor changes to the main readme * update Signed-off-by: eharper <[email protected]> * update Signed-off-by:
eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * typo fixed * update Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeMo-Mistral to HF converter bugfix. (#8353) (#8442) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixing mcore bert for TP, PP and SP (#8336) (#8443) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile --------- Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add LoRA support to all linear layers (#7988) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MCoreMixin changes. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * using new commit of meg-LM Signed-off-by: arendu <[email protected]> * add cpu_offloading_num_layers to conversion script until bug in megatron is fixed Signed-off-by: Chen Cui <[email protected]> * fix peft mixin arguments to follow mcore 0.5 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update megatron commit to fix ci error Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * add cfg default Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add Neva Template for NV-DPO Models (#8358) * add/rename from nvgpt to nv_steerlm, add nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * add nv_dpo conversation to accommodate empty system message Signed-off-by: HuiyingLi <[email protected]> * handle nv_dpo template text generation Signed-off-by: HuiyingLi <[email protected]> * add prompt
string to nvgpt Signed-off-by: HuiyingLi <[email protected]> * bugfix for inference prompt template Signed-off-by: HuiyingLi <[email protected]> * bug fix for grabbing clean text Signed-off-by: Huiying Li <[email protected]> * fix code format Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> --------- Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email 
protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email 
protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: 
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update PEFT Doc (#8501) * update peft doc Signed-off-by: Chen Cui <[email protected]> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <[email protected]> * fix table 
Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * release updates (#8394) * release updates (#8378) * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <[email protected]> * add values to en tts dict (#7879) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * mcore ds fix Signed-off-by: Dmytro Pykhtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <[email protected]> * revert asr files Signed-off-by: dimapihtar <[email protected]> * add comments Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <[email protected]> * update mcore version Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <[email protected]> * update mcore commit Signed-off-by: dimapihtar <[email protected]> * fix Bert unit tests Signed-off-by: dimapihtar <[email protected]> * update bert tests Signed-off-by: dimapihtar <[email protected]> * fix bert mcore test Signed-off-by: dimapihtar <[email protected]> * fix gpt jenkins tests Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for dict data input type Signed-off-by: dimapihtar <[email protected]> * add mock ds test Signed-off-by: 
dimapihtar <[email protected]> * add test for dict data input type Signed-off-by: dimapihtar <[email protected]> * mcore ds fix Signed-off-by: dimapihtar <[email protected]> * data input fix Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> * Update megatron_gpt_model.py Signed-off-by: Dmytro Pykhtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Signed-off-by: Abhishree <[email protected]> Signed-off-by: yaoyu-33 <[email 
protected]> Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Selvaraj Anandaraj <[email protected]> Signed-off-by: Travis Bartley <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email 
protected]> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: jbaczek <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Jan Baczek <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Naga Venkatesh Gavini <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: tbartley94 <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email 
protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Signed-off-by: Pablo Garay <[email protected]>

* Refactor conversion scripts one in all Signed-off-by: yaoyu-33 <[email protected]> * Move bert converter Signed-off-by: yaoyu-33 <[email protected]> * [TTS] Add modules for mel spectrogram codec (#8238) * [TTS] Add modules for mel spectrogram codec Signed-off-by: Ryan <[email protected]> * [TTS] Add mel band validation Signed-off-by: Ryan <[email protected]> * [TTS] Add fullband mel encoder and more documentation Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * coldfix (#8412) Signed-off-by: George Zelenfroynd <[email protected]> * Fixed errors in the CTM gen functions (#8416) (#8420) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> * Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367) * Add change_vocabulary and save_tokenizers() support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> * fix path location and branch (#8314) * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <[email protected]> * update branch in tutorial Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]>
Co-authored-by: Nithin Rao Koluguri <nithinraok> * Add TP comm overlap knobs to AutocastTransformerLayer (#8290) Signed-off-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> * add deallocate pipeline output optimization (#8279) (#8318) * add deallocate pipeline output optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * remove assertion (#8302) (#8321) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Enable megatron core loggers for GPT pretraining (#8354) (#8384) * Logging changes tested for gpt_pretraining * Additional args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * Fix dreambooth data sampler issue (#8400) (#8413) * Turn on drop last * Some neva fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * add ensemble decoding fix (#8427) (#8433) 
Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * NeVA Tutorial Notebook (#8217) * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * add inference via script Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * add codeblocks to run torchrun in notebook Signed-off-by: Pratyush Muthukumar <[email protected]> --------- Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> * mcore customization doc minor fix (#8421) (#8437) Signed-off-by: Huiying Li <[email protected]> Co-authored-by: Huiying <[email protected]> * Add `loop_labels` algorithm for TDT greedy decoding (#8215) * Add `loop_labels` algorithm for TDT greedy decoding Signed-off-by: Vladimir Bataev <[email protected]> * Use `loop_labels` by default Signed-off-by: Vladimir Bataev <[email protected]> * Loop labels greedy decoding v2 Signed-off-by: Vladimir Bataev <[email protected]> * Add comments. 
Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched hypotheses Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched alignments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix test Signed-off-by: Vladimir Bataev <[email protected]> * Add computer for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Fix TDT decoding algorithm Signed-off-by: Vladimir Bataev <[email protected]> * Use loop frames by default for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Remove "loop frames" implementation for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix confidence. Use tensor for durations. Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> * Add dist ckpt support for regular optimizers (#7749) (#8293) * Add dist ckpt support for regular optimizers * [tutorial] fixed missing RIR scripts file. 
(#8257) * fix imports * imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook * revert asr notebook --------- Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Multimodal r1.23.0 bug fix (#8315) (#8339) * Rename quick-gelu * ddpm config guard * Fix ddpm edit api * Fix insert_image_token cfg issue * neva updates * reformat * Add back jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs * Update default neva template --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * mcore ds fix (#8283) (#8385) * [tutorial] fixed missing RIR scripts file. 
(#8257) * add values to en tts dict (#7879) * mcore ds fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore * revert asr files * add comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset * update mcore version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg * update mcore commit * fix Bert unit tests * update bert tests * fix bert mcore test * fix gpt jenkins tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits * revert apex installation * turn off the fusion for jenkins --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> * MCore dataset compatibility for tokenizers (#8390) (#8397) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. 
--------- Signed-off-by: Valerie Sarge <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432) * Improvements for Canary: - carry over custom keys when creatin tarred manifests - selectable text field in ASR eval - get rid of prompt slicing, create proper inference prompts Signed-off-by: Piotr Żelasko <[email protected]> * set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> * add sbert to IR (#8445) * add sbert to IR Signed-off-by: ataghibakhsh <[email protected]> * add doc Signed-off-by: ataghibakhsh <[email protected]> * fix the auto_tokenizer property method reset bug Signed-off-by: ataghibakhsh <[email protected]> * addressed bot comments Signed-off-by: ataghibakhsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ataghibakhsh <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update readme (#8440) * update Signed-off-by: eharper <[email protected]> * udpate Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * landing pages added * landing page added for vision * landing pages updated * some minor changes to the main readme * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update 
Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * typo fixed * update Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> * NeMo-Mistral to HF converter bugfix. (#8353) (#8442) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> * Fixing mcore bert for TP, PP and SP (#8336) (#8443) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile --------- Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Add LoRA support to all linear layers (#7988) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MCoreMixin chages. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * using new commit of meg-LM Signed-off-by: arendu <[email protected]> * add cpu_offloading_num_layers to conversion script until bug in megatron is fixed Signed-off-by: Chen Cui <[email protected]> * fix peft mixin arguments to follow mcore 0.5 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update megatron commit to fix ci error Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * add cfg default Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Add Neva Template for NV-DPO Models (#8358) * add/rename from nvgpt to nv_steerlm, add nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * add nv_dpo conversation to accomendate empty system message Signed-off-by: HuiyingLi <[email protected]> * handle nv_dpo template text generation Signed-off-by: HuiyingLi <[email protected]> * add prompt string to nvgpt Signed-off-by: HuiyingLi <[email protected]> * bugfix for inference prompt template Signed-off-by: HuiyingLi <[email protected]> * bug fix for grabbing clean text Signed-off-by: Huiying Li <[email protected]> * fix code format Signed-off-by: Huiying Li <[email protected]> --------- 
Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Huiying Li <[email protected]> * Account for mpirun use case in get_rank (#8429) Signed-off-by: Jan Lasek <[email protected]> * Add settings to suppress bf16 compile errors in CI on V100 (#8481) (#8482) * Add settings to suppress bf16 compile errors in CI on V100 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix canary chunk infer bug (#8449) * fix chunk infer bug Signed-off-by: stevehuang52 <[email protected]> * add support for duration=None, add lhotse support for relative audio path Signed-off-by: stevehuang52 <[email protected]> * add tests Signed-off-by: stevehuang52 <[email protected]> --------- Signed-off-by: stevehuang52 <[email protected]> * Add Baichuan2 support (#8282) * Add Baichuan2 support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Reworked MegatronPretrainingRandomBatchSampler to correctly handle epochs > 1 (#7920) * Initital commit of reworked MegatronPretrainingRandomBatchSampler Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed small length based bug Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Daniel Egert <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Euynaheh <[email protected]> * Add Baichuan2 support Signed-off-by: Euynaheh <[email protected]> * Add NeMo to HF conversion * fix code format * [pre-commit.ci] 
auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix code format * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Baichuan jenkins test * add_BOS bug fix * Update Jenkinsfile Signed-off-by: Euynaheh <[email protected]> --------- Signed-off-by: Daniel Egert <[email protected]> Signed-off-by: Euynaheh <[email protected]> Signed-off-by: Euynaheh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: trias702 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> * Jiaqiz/option to disable adapters & merge all lora layers (#8029) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * use adapter only when it is enabled Signed-off-by: jiaqi zeng <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lora merge script (#8113) Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * add peft ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * merge lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * support/fix cpu initialization Signed-off-by: Chen Cui <[email protected]> * add example usage Signed-off-by: Chen Cui <[email protected]> * fix TP due to distributed checkpoint Signed-off-by: Chen Cui <[email protected]> * updating the logic of merging lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * MCoreMixin chages. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * merge in fp32 then cast back Signed-off-by: Jiaqi Zeng <[email protected]> * remove ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * fix import Signed-off-by: Jiaqi Zeng <[email protected]> --------- Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> * Update k2 version (#8478) Signed-off-by: Vladimir Bataev <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update 
nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. 
Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> * Handle float limit_val_batches (#8426) * Handle float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Rectify reconfiguration of float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Remove unused imports Signed-off-by: Abhishree <[email protected]> * Scale len(val_dataloader) with float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Return len(dataloader) in microbatches Signed-off-by: Abhishree <[email protected]> * Add back resetting of num val samples Signed-off-by: Abhishree <[email protected]> * Fix to ensure float limit_val_batches is multiple of num_micro_batches Signed-off-by: Abhishree <[email protected]> * Remove forcing eval samples to 1 for float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Fix bug wrt 0 limiot_val_batches Signed-off-by: Abhishree <[email protected]> * Add missing mock_dataset line Signed-off-by: Abhishree <[email protected]> * Avoid ensuring limit_val_batches is a mutliple of microbatches for 1.0 Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore the hack forcing number of validation and test epochs to 1 Signed-off-by: Jan Baczek <[email protected]> * 
Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jan Baczek <[email protected]> * Fix tutorial links in user guide (#8497) Signed-off-by: yaoyu-33 <[email protected]> * Sequence Parallel for LoRA (#8369) * support lora + sequence parallel Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more comments Signed-off-by: Chen Cui <[email protected]> * add lora SP CI test Signed-off-by: Chen Cui <[email protected]> * support lora for all linear modules as in #7988 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Call proper method to replace (#8498) Signed-off-by: Naga Venkatesh Gavini <[email protected]> * Added memory logger (#8395) * Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * Canary refactor for Riva (#8363) * initial commit of bleu score tracking Signed-off-by: Travis Bartley <[email protected]> * initial commit, refactoring aed models for riva Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com 
hooks for more information, see https://pre-commit.ci * Updating Canary to support torch metrics Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fixes Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missed an empty batch conditional Signed-off-by: Travis Bartley <[email protected]> * Fixing dataloader issues Signed-off-by: Travis Bartley <[email protected]> * Finishing merge conflict with transcribe update Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: Travis Bartley <[email protected]> * copyright header fix Signed-off-by: Travis Bartley <[email protected]> * yet another merge conflict Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * making paired data management safer Signed-off-by: Travis Bartley <[email protected]> * sentencepiece needs bigger tokenizer... Signed-off-by: Travis Bartley <[email protected]> * sentencepiece tokenizer vocab needs to be +2 from vocab for canary Signed-off-by: Travis Bartley <[email protected]> * Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves. Signed-off-by: Travis Bartley <[email protected]> * merge conflit Signed-off-by: Travis Bartley <[email protected]> * Simplified tokenizer and corrected bug in dataloader Signed-off-by: Travis Bartley <[email protected]> * Cleaning up docstrings and fixing inference bug. 
Signed-off-by: Travis Bartley <[email protected]> * adding example scripts Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleaning up useless imports Signed-off-by: Travis Bartley <[email protected]> * adding unit tests Signed-off-by: Travis Bartley <[email protected]> * fixing unit tests Signed-off-by: Travis Bartley <[email protected]> * cfg name change Signed-off-by: Travis Bartley <[email protected]> * adding custom check to pass pytests Signed-off-by: Travis Bartley <[email protected]> * removing print script Signed-off-by: Travis Bartley <[email protected]> * catching bugs regarding tokens. Signed-off-by: Travis Bartley <[email protected]> * added docstrings and made examples scripts more generic Signed-off-by: Travis Bartley <[email protected]> * docstring deleted by accident Signed-off-by: Travis Bartley <[email protected]> * plurals in namespace Signed-off-by: Travis Bartley <[email protected]> * changing example script Signed-off-by: Travis Bartley <[email protected]> --------- Signed-off-by: Travis Bartley <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> * add alpha scaling to lora (#8248) * removed pdeprecated eft model Signed-off-by: arendu <[email protected]> * add alpha Signed-off-by: arendu <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add alpha scaling to lora (#8483) * coldfix (#8412) Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixed errors in the CTM gen functions (#8416) (#8420) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add 
change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367) * Add change_vocabulary and save_tokenizers() support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * fix path location and branch (#8314) * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <[email protected]> * updat ebranch in tutorial Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Michal Futrega <[email protected]> * Add TP comm overlap knobs to AutocastTransformerLayer (#8290) Signed-off-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add deallocate pipeline output optimization (#8279) (#8318) * add deallocate pipeline output optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * remove assertion (#8302) (#8321) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Enable megatron core loggers for GPT pretraining (#8354) (#8384) * Logging changes tested for gpt_pretraining * Additional args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fix dreambooth data sampler issue (#8400) (#8413) * Turn on drop last * Some neva fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * add ensemble decoding fix (#8427) (#8433) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeVA Tutorial Notebook (#8217) * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> * 
init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * add inference via script Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * add codeblocks to run torchrun in notebook Signed-off-by: Pratyush Muthukumar <[email protected]> --------- Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * mcore customization doc minor fix (#8421) (#8437) Signed-off-by: Huiying Li <[email protected]> Co-authored-by: Huiying <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add `loop_labels` algorithm for TDT greedy decoding (#8215) * Add `loop_labels` algorithm for TDT greedy decoding Signed-off-by: Vladimir Bataev <[email protected]> * Use `loop_labels` by default Signed-off-by: Vladimir Bataev <[email protected]> * Loop labels greedy decoding v2 Signed-off-by: Vladimir Bataev <[email protected]> * Add comments. 
Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched hypotheses Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched alignments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix test Signed-off-by: Vladimir Bataev <[email protected]> * Add computer for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Fix TDT decoding algorithm Signed-off-by: Vladimir Bataev <[email protected]> * Use loop frames by default for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Remove "loop frames" implementation for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix confidence. Use tensor for durations. Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add dist ckpt support for regular optimizers (#7749) (#8293) * Add dist ckpt support for regular optimizers * [tutorial] fixed missing RIR scripts file. 
(#8257) * fix imports * imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook * revert asr notebook --------- Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Multimodal r1.23.0 bug fix (#8315) (#8339) * Rename quick-gelu * ddpm config guard * Fix ddpm edit api * Fix insert_image_token cfg issue * neva updates * reformat * Add back jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs * Update default neva template --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * mcore ds fix (#8283) (#8385) * [tutorial] fixed missing RIR scripts file. 
(#8257) * add values to en tts dict (#7879) * mcore ds fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore * revert asr files * add comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset * update mcore version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg * update mcore commit * fix Bert unit tests * update bert tests * fix bert mcore test * fix gpt jenkins tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits * revert apex installation * turn off the fusion for jenkins --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * MCore dataset compatibility for tokenizers (#8390) (#8397) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. 
--------- Signed-off-by: Valerie Sarge <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432) * Improvements for Canary: - carry over custom keys when creatin tarred manifests - selectable text field in ASR eval - get rid of prompt slicing, create proper inference prompts Signed-off-by: Piotr Żelasko <[email protected]> * set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add sbert to IR (#8445) * add sbert to IR Signed-off-by: ataghibakhsh <[email protected]> * add doc Signed-off-by: ataghibakhsh <[email protected]> * fix the auto_tokenizer property method reset bug Signed-off-by: ataghibakhsh <[email protected]> * addressed bot comments Signed-off-by: ataghibakhsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ataghibakhsh <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Update readme (#8440) * update Signed-off-by: eharper <[email protected]> * udpate Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * landing pages added * landing page added for vision * landing pages updated * some minor changes to the main readme * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: 
eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * typo fixed * update Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeMo-Mistral to HF converter bugfix. (#8353) (#8442) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixing mcore bert for TP, PP and SP (#8336) (#8443) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile --------- Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add LoRA support to all linear layers (#7988) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * using new commit of meg-LM Signed-off-by: arendu <[email protected]> * add cpu_offloading_num_layers to conversion script until bug in megatron is fixed Signed-off-by: Chen Cui <[email protected]> * fix peft mixin arguments to follow mcore 0.5 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update megatron commit to fix ci error Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * add cfg default Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add Neva Template for NV-DPO Models (#8358) * add/rename from nvgpt to nv_steerlm, add nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * add nv_dpo conversation to accomendate empty system message Signed-off-by: HuiyingLi <[email protected]> * handle nv_dpo template text generation Signed-off-by: HuiyingLi <[email protected]> * add prompt 
string to nvgpt Signed-off-by: HuiyingLi <[email protected]> * bugfix for inference prompt template Signed-off-by: HuiyingLi <[email protected]> * bug fix for grabbing clean text Signed-off-by: Huiying Li <[email protected]> * fix code format Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> --------- Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email 
protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email 
protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: 
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> * Update PEFT Doc (#8501) * update peft doc Signed-off-by: Chen Cui <[email protected]> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * fix table 
Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> * release updates (#8394) * release updates (#8378) * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <[email protected]> * add values to en tts dict (#7879) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * mcore ds fix Signed-off-by: Dmytro Pykhtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <[email protected]> * revert asr files Signed-off-by: dimapihtar <[email protected]> * add comments Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <[email protected]> * update mcore version Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <[email protected]> * update mcore commit Signed-off-by: dimapihtar <[email protected]> * fix Bert unit tests Signed-off-by: dimapihtar <[email protected]> * update bert tests Signed-off-by: dimapihtar <[email protected]> * fix bert mcore test Signed-off-by: dimapihtar <[email protected]> * fix gpt jenkins tests Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for dict data input type Signed-off-by: dimapihtar <[email protected]> * add mock ds test Signed-off-by: dimapihtar <[email protected]> * add test for dict data input type Signed-off-by: dimapihtar <[email protected]> * 
mcore ds fix Signed-off-by: dimapihtar <[email protected]> * data input fix Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> * Update megatron_gpt_model.py Signed-off-by: Dmytro Pykhtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana G…

* Added memory logger
  Signed-off-by: Selvaraj Anandaraj <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>

* MoE parameter passing (#8255) * MoE parameter passing Signed-off-by: Alexandros Koumparoulis <[email protected]> * Pass EP/MoE params in consumer scripts. Signed-off-by: Alexandros Koumparoulis <[email protected]> * PR fixes Signed-off-by: Alexandros Koumparoulis <[email protected]> * Use latest commit of mcore-0.5 Signed-off-by: Alexandros Koumparoulis <[email protected]> * CI fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Jiaqiz/option to disable adapters & merge all lora layers (#8029) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. 
* use adapter only when it is enabled Signed-off-by: jiaqi zeng <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lora merge script (#8113) Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * add peft ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * merge lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * support/fix cpu initialization Signed-off-by: Chen Cui <[email protected]> * add example usage Signed-off-by: Chen Cui <[email protected]> * fix TP due to distributed checkpoint Signed-off-by: Chen Cui <[email protected]> * updating the logic of merging lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * merge in fp32 then cast back Signed-off-by: Jiaqi Zeng <[email protected]> * remove ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * fix import Signed-off-by: Jiaqi Zeng <[email protected]> --------- Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update k2 version (#8478) Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek 
<[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email 
protected]> * Add dummy parameter to accommodate the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runners don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Handle float limit_val_batches (#8426) * Handle float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Rectify reconfiguration of float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Remove unused imports Signed-off-by: Abhishree <[email protected]> * Scale len(val_dataloader) with float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Return len(dataloader) in microbatches Signed-off-by: Abhishree <[email protected]> * Add back resetting of num val samples Signed-off-by: Abhishree <[email protected]> * Fix to ensure float limit_val_batches is multiple of num_micro_batches Signed-off-by: Abhishree <[email protected]> * Remove forcing eval samples to 1 for float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Fix bug wrt 0 limit_val_batches Signed-off-by: Abhishree <[email protected]> * Add missing mock_dataset line Signed-off-by: Abhishree <[email protected]> * Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0 Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore the hack forcing number of validation and test epochs to 1 Signed-off-by: Jan Baczek <[email protected]> *
Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jan Baczek <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Fix tutorial links in user guide (#8497) Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Sequence Parallel for LoRA (#8369) * support lora + sequence parallel Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more comments Signed-off-by: Chen Cui <[email protected]> * add lora SP CI test Signed-off-by: Chen Cui <[email protected]> * support lora for all linear modules as in #7988 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Call proper method to replace (#8498) Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Added memory logger (#8395) * Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros 
Koumparoulis <[email protected]> * Canary refactor for Riva (#8363) * initial commit of bleu score tracking Signed-off-by: Travis Bartley <[email protected]> * initial commit, refactoring aed models for riva Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating Canary to support torch metrics Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fixes Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missed an empty batch conditional Signed-off-by: Travis Bartley <[email protected]> * Fixing dataloader issues Signed-off-by: Travis Bartley <[email protected]> * Finishing merge conflict with transcribe update Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: Travis Bartley <[email protected]> * copyright header fix Signed-off-by: Travis Bartley <[email protected]> * yet another merge conflict Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * making paired data management safer Signed-off-by: Travis Bartley <[email protected]> * sentencepiece needs bigger tokenizer... Signed-off-by: Travis Bartley <[email protected]> * sentencepiece tokenizer vocab needs to be +2 from vocab for canary Signed-off-by: Travis Bartley <[email protected]> * Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves. 
Signed-off-by: Travis Bartley <[email protected]> * merge conflit Signed-off-by: Travis Bartley <[email protected]> * Simplified tokenizer and corrected bug in dataloader Signed-off-by: Travis Bartley <[email protected]> * Cleaning up docstrings and fixing inference bug. Signed-off-by: Travis Bartley <[email protected]> * adding example scripts Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleaning up useless imports Signed-off-by: Travis Bartley <[email protected]> * adding unit tests Signed-off-by: Travis Bartley <[email protected]> * fixing unit tests Signed-off-by: Travis Bartley <[email protected]> * cfg name change Signed-off-by: Travis Bartley <[email protected]> * adding custom check to pass pytests Signed-off-by: Travis Bartley <[email protected]> * removing print script Signed-off-by: Travis Bartley <[email protected]> * catching bugs regarding tokens. Signed-off-by: Travis Bartley <[email protected]> * added docstrings and made examples scripts more generic Signed-off-by: Travis Bartley <[email protected]> * docstring deleted by accident Signed-off-by: Travis Bartley <[email protected]> * plurals in namespace Signed-off-by: Travis Bartley <[email protected]> * changing example script Signed-off-by: Travis Bartley <[email protected]> --------- Signed-off-by: Travis Bartley <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * add alpha scaling to lora (#8248) * removed pdeprecated eft model Signed-off-by: arendu <[email protected]> * add alpha Signed-off-by: arendu <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add alpha scaling to lora (#8483) * 
coldfix (#8412) Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixed errors in the CTM gen functions (#8416) (#8420) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367) * Add change_vocabulary and save_tokenizers() support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * fix path location and branch (#8314) * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <[email protected]> * updat ebranch in tutorial Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Michal Futrega <[email protected]> * Add TP comm overlap knobs to AutocastTransformerLayer (#8290) Signed-off-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add deallocate pipeline output optimization (#8279) (#8318) * 
add deallocate pipeline output optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * remove assertion (#8302) (#8321) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Enable megatron core loggers for GPT pretraining (#8354) (#8384) * Logging changes tested for gpt_pretraining * Additional args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fix dreambooth data sampler issue (#8400) (#8413) * Turn on drop last * Some neva fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * add ensemble decoding fix (#8427) (#8433) Signed-off-by: Nithin Rao 
Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeVA Tutorial Notebook (#8217) * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * add inference via script Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * add codeblocks to run torchrun in notebook Signed-off-by: Pratyush Muthukumar <[email protected]> --------- Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * mcore customization doc minor fix (#8421) (#8437) Signed-off-by: Huiying Li <[email protected]> Co-authored-by: Huiying <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add `loop_labels` algorithm for TDT greedy decoding (#8215) * Add `loop_labels` algorithm for TDT greedy decoding Signed-off-by: Vladimir Bataev <[email protected]> * Use `loop_labels` by default Signed-off-by: Vladimir Bataev <[email protected]> * Loop labels greedy decoding v2 Signed-off-by: Vladimir Bataev <[email protected]> * Add comments. 
Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched hypotheses Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched alignments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix test Signed-off-by: Vladimir Bataev <[email protected]> * Add computer for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Fix TDT decoding algorithm Signed-off-by: Vladimir Bataev <[email protected]> * Use loop frames by default for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Remove "loop frames" implementation for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix confidence. Use tensor for durations. Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add dist ckpt support for regular optimizers (#7749) (#8293) * Add dist ckpt support for regular optimizers * [tutorial] fixed missing RIR scripts file. 
(#8257) * fix imports * imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook * revert asr notebook --------- Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Multimodal r1.23.0 bug fix (#8315) (#8339) * Rename quick-gelu * ddpm config guard * Fix ddpm edit api * Fix insert_image_token cfg issue * neva updates * reformat * Add back jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs * Update default neva template --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * mcore ds fix (#8283) (#8385) * [tutorial] fixed missing RIR scripts file. 
(#8257) * add values to en tts dict (#7879) * mcore ds fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore * revert asr files * add comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset * update mcore version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg * update mcore commit * fix Bert unit tests * update bert tests * fix bert mcore test * fix gpt jenkins tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits * revert apex installation * turn off the fusion for jenkins --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * MCore dataset compatibility for tokenizers (#8390) (#8397) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. 
--------- Signed-off-by: Valerie Sarge <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432) * Improvements for Canary: - carry over custom keys when creating tarred manifests - selectable text field in ASR eval - get rid of prompt slicing, create proper inference prompts Signed-off-by: Piotr Żelasko <[email protected]> * set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add sbert to IR (#8445) * add sbert to IR Signed-off-by: ataghibakhsh <[email protected]> * add doc Signed-off-by: ataghibakhsh <[email protected]> * fix the auto_tokenizer property method reset bug Signed-off-by: ataghibakhsh <[email protected]> * addressed bot comments Signed-off-by: ataghibakhsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ataghibakhsh <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Update readme (#8440) * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * landing pages added * landing page added for vision * landing pages updated * some minor changes to the main readme * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: 
eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * typo fixed * update Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeMo-Mistral to HF converter bugfix. (#8353) (#8442) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixing mcore bert for TP, PP and SP (#8336) (#8443) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile --------- Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add LoRA support to all linear layers (#7988) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MCoreMixin changes. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * using new commit of meg-LM Signed-off-by: arendu <[email protected]> * add cpu_offloading_num_layers to conversion script until bug in megatron is fixed Signed-off-by: Chen Cui <[email protected]> * fix peft mixin arguments to follow mcore 0.5 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update megatron commit to fix ci error Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * add cfg default Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add Neva Template for NV-DPO Models (#8358) * add/rename from nvgpt to nv_steerlm, add nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * add nv_dpo conversation to accommodate empty system message Signed-off-by: HuiyingLi <[email protected]> * handle nv_dpo template text generation Signed-off-by: HuiyingLi <[email protected]> * add prompt 
string to nvgpt Signed-off-by: HuiyingLi <[email protected]> * bugfix for inference prompt template Signed-off-by: HuiyingLi <[email protected]> * bug fix for grabbing clean text Signed-off-by: Huiying Li <[email protected]> * fix code format Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> --------- Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email 
protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email 
protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: 
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update PEFT Doc (#8501) * update peft doc Signed-off-by: Chen Cui <[email protected]> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <[email protected]> * fix table 
Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * release updates (#8394) * release updates (#8378) * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <[email protected]> * add values to en tts dict (#7879) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * mcore ds fix Signed-off-by: Dmytro Pykhtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <[email protected]> * revert asr files Signed-off-by: dimapihtar <[email protected]> * add comments Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <[email protected]> * update mcore version Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <[email protected]> * update mcore commit Signed-off-by: dimapihtar <[email protected]> * fix Bert unit tests Signed-off-by: dimapihtar <[email protected]> * update bert tests Signed-off-by: dimapihtar <[email protected]> * fix bert mcore test Signed-off-by: dimapihtar <[email protected]> * fix gpt jenkins tests Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for dict data input type Signed-off-by: dimapihtar <[email protected]> * add mock ds test Signed-off-by: 
dimapihtar <[email protected]> * add test for dict data input type Signed-off-by: dimapihtar <[email protected]> * mcore ds fix Signed-off-by: dimapihtar <[email protected]> * data input fix Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> * Update megatron_gpt_model.py Signed-off-by: Dmytro Pykhtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Signed-off-by: Abhishree <[email protected]> Signed-off-by: yaoyu-33 <[email 
protected]> Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Selvaraj Anandaraj <[email protected]> Signed-off-by: Travis Bartley <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email 
protected]> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: jbaczek <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Jan Baczek <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Naga Venkatesh Gavini <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: tbartley94 <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email 
protected]> Co-authored-by: Jiaqi Zeng <[email protected]>

* Refactor conversion scripts one in all Signed-off-by: yaoyu-33 <[email protected]> * Move bert converter Signed-off-by: yaoyu-33 <[email protected]> * [TTS] Add modules for mel spectrogram codec (#8238) * [TTS] Add modules for mel spectrogram codec Signed-off-by: Ryan <[email protected]> * [TTS] Add mel band validation Signed-off-by: Ryan <[email protected]> * [TTS] Add fullband mel encoder and more documentation Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * coldfix (#8412) Signed-off-by: George Zelenfroynd <[email protected]> * Fixed errors in the CTM gen functions (#8416) (#8420) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> * Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367) * Add change_vocabulary and save_tokenizers() support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> * fix path location and branch (#8314) * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <[email protected]> * update branch in tutorial Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> 
Co-authored-by: Nithin Rao Koluguri <nithinraok> * Add TP comm overlap knobs to AutocastTransformerLayer (#8290) Signed-off-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> * add deallocate pipeline output optimization (#8279) (#8318) * add deallocate pipeline output optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * remove assertion (#8302) (#8321) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Enable megatron core loggers for GPT pretraining (#8354) (#8384) * Logging changes tested for gpt_pretraining * Additional args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * Fix dreambooth data sampler issue (#8400) (#8413) * Turn on drop last * Some neva fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * add ensemble decoding fix (#8427) (#8433) 
Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * NeVA Tutorial Notebook (#8217) * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * add inference via script Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * add codeblocks to run torchrun in notebook Signed-off-by: Pratyush Muthukumar <[email protected]> --------- Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> * mcore customization doc minor fix (#8421) (#8437) Signed-off-by: Huiying Li <[email protected]> Co-authored-by: Huiying <[email protected]> * Add `loop_labels` algorithm for TDT greedy decoding (#8215) * Add `loop_labels` algorithm for TDT greedy decoding Signed-off-by: Vladimir Bataev <[email protected]> * Use `loop_labels` by default Signed-off-by: Vladimir Bataev <[email protected]> * Loop labels greedy decoding v2 Signed-off-by: Vladimir Bataev <[email protected]> * Add comments. 
Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched hypotheses Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched alignments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix test Signed-off-by: Vladimir Bataev <[email protected]> * Add computer for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Fix TDT decoding algorithm Signed-off-by: Vladimir Bataev <[email protected]> * Use loop frames by default for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Remove "loop frames" implementation for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix confidence. Use tensor for durations. Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> * Add dist ckpt support for regular optimizers (#7749) (#8293) * Add dist ckpt support for regular optimizers * [tutorial] fixed missing RIR scripts file. 
(#8257) * fix imports * imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook * revert asr notebook --------- Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Multimodal r1.23.0 bug fix (#8315) (#8339) * Rename quick-gelu * ddpm config guard * Fix ddpm edit api * Fix insert_image_token cfg issue * neva updates * reformat * Add back jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs * Update default neva template --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * mcore ds fix (#8283) (#8385) * [tutorial] fixed missing RIR scripts file. 
(#8257) * add values to en tts dict (#7879) * mcore ds fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore * revert asr files * add comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset * update mcore version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg * update mcore commit * fix Bert unit tests * update bert tests * fix bert mcore test * fix gpt jenkins tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits * revert apex installation * turn off the fusion for jenkins --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> * MCore dataset compatibility for tokenizers (#8390) (#8397) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. 
--------- Signed-off-by: Valerie Sarge <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432) * Improvements for Canary: - carry over custom keys when creating tarred manifests - selectable text field in ASR eval - get rid of prompt slicing, create proper inference prompts Signed-off-by: Piotr Żelasko <[email protected]> * set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> * add sbert to IR (#8445) * add sbert to IR Signed-off-by: ataghibakhsh <[email protected]> * add doc Signed-off-by: ataghibakhsh <[email protected]> * fix the auto_tokenizer property method reset bug Signed-off-by: ataghibakhsh <[email protected]> * addressed bot comments Signed-off-by: ataghibakhsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ataghibakhsh <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update readme (#8440) * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * landing pages added * landing page added for vision * landing pages updated * some minor changes to the main readme * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update 
Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * typo fixed * update Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> * NeMo-Mistral to HF converter bugfix. (#8353) (#8442) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> * Fixing mcore bert for TP, PP and SP (#8336) (#8443) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile --------- Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Add LoRA support to all linear layers (#7988) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MCoreMixin chages. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * using new commit of meg-LM Signed-off-by: arendu <[email protected]> * add cpu_offloading_num_layers to conversion script until bug in megatron is fixed Signed-off-by: Chen Cui <[email protected]> * fix peft mixin arguments to follow mcore 0.5 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update megatron commit to fix ci error Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * add cfg default Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Add Neva Template for NV-DPO Models (#8358) * add/rename from nvgpt to nv_steerlm, add nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * add nv_dpo conversation to accommodate empty system message Signed-off-by: HuiyingLi <[email protected]> * handle nv_dpo template text generation Signed-off-by: HuiyingLi <[email protected]> * add prompt string to nvgpt Signed-off-by: HuiyingLi <[email protected]> * bugfix for inference prompt template Signed-off-by: HuiyingLi <[email protected]> * bug fix for grabbing clean text Signed-off-by: Huiying Li <[email protected]> * fix code format Signed-off-by: Huiying Li <[email protected]> --------- 
Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Huiying Li <[email protected]> * Account for mpirun use case in get_rank (#8429) Signed-off-by: Jan Lasek <[email protected]> * Add settings to suppress bf16 compile errors in CI on V100 (#8481) (#8482) * Add settings to suppress bf16 compile errors in CI on V100 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix canary chunk infer bug (#8449) * fix chunk infer bug Signed-off-by: stevehuang52 <[email protected]> * add support for duration=None, add lhotse support for relative audio path Signed-off-by: stevehuang52 <[email protected]> * add tests Signed-off-by: stevehuang52 <[email protected]> --------- Signed-off-by: stevehuang52 <[email protected]> * Add Baichuan2 support (#8282) * Add Baichuan2 support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Reworked MegatronPretrainingRandomBatchSampler to correctly handle epochs > 1 (#7920) * Initial commit of reworked MegatronPretrainingRandomBatchSampler Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed small length based bug Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Daniel Egert <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Euynaheh <[email protected]> * Add Baichuan2 support Signed-off-by: Euynaheh <[email protected]> * Add NeMo to HF conversion * fix code format * [pre-commit.ci] 
auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix code format * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Baichuan jenkins test * add_BOS bug fix * Update Jenkinsfile Signed-off-by: Euynaheh <[email protected]> --------- Signed-off-by: Daniel Egert <[email protected]> Signed-off-by: Euynaheh <[email protected]> Signed-off-by: Euynaheh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: trias702 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> * Jiaqiz/option to disable adapters & merge all lora layers (#8029) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * use adapter only when it is enabled Signed-off-by: jiaqi zeng <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lora merge script (#8113) Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * add peft ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * merge lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * support/fix cpu initialization Signed-off-by: Chen Cui <[email protected]> * add example usage Signed-off-by: Chen Cui <[email protected]> * fix TP due to distributed checkpoint Signed-off-by: Chen Cui <[email protected]> * updating the logic of merging lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * MCoreMixin chages. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * merge in fp32 then cast back Signed-off-by: Jiaqi Zeng <[email protected]> * remove ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * fix import Signed-off-by: Jiaqi Zeng <[email protected]> --------- Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> * Update k2 version (#8478) Signed-off-by: Vladimir Bataev <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * remove try-catches, these dependencies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update 
nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add dummy parameter to accommodate the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. 
Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runners don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> * Handle float limit_val_batches (#8426) * Handle float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Rectify reconfiguration of float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Remove unused imports Signed-off-by: Abhishree <[email protected]> * Scale len(val_dataloader) with float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Return len(dataloader) in microbatches Signed-off-by: Abhishree <[email protected]> * Add back resetting of num val samples Signed-off-by: Abhishree <[email protected]> * Fix to ensure float limit_val_batches is multiple of num_micro_batches Signed-off-by: Abhishree <[email protected]> * Remove forcing eval samples to 1 for float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Fix bug wrt 0 limit_val_batches Signed-off-by: Abhishree <[email protected]> * Add missing mock_dataset line Signed-off-by: Abhishree <[email protected]> * Avoid ensuring limit_val_batches is a multiple of microbatches for 1.0 Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore the hack forcing number of validation and test epochs to 1 Signed-off-by: Jan Baczek <[email protected]> * 
Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jan Baczek <[email protected]> * Fix tutorial links in user guide (#8497) Signed-off-by: yaoyu-33 <[email protected]> * Sequence Parallel for LoRA (#8369) * support lora + sequence parallel Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more comments Signed-off-by: Chen Cui <[email protected]> * add lora SP CI test Signed-off-by: Chen Cui <[email protected]> * support lora for all linear modules as in #7988 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Call proper method to replace (#8498) Signed-off-by: Naga Venkatesh Gavini <[email protected]> * Added memory logger (#8395) * Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * Canary refactor for Riva (#8363) * initial commit of bleu score tracking Signed-off-by: Travis Bartley <[email protected]> * initial commit, refactoring aed models for riva Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com 
hooks for more information, see https://pre-commit.ci * Updating Canary to support torch metrics Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fixes Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missed an empty batch conditional Signed-off-by: Travis Bartley <[email protected]> * Fixing dataloader issues Signed-off-by: Travis Bartley <[email protected]> * Finishing merge conflict with transcribe update Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: Travis Bartley <[email protected]> * copyright header fix Signed-off-by: Travis Bartley <[email protected]> * yet another merge conflict Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * making paired data management safer Signed-off-by: Travis Bartley <[email protected]> * sentencepiece needs bigger tokenizer... Signed-off-by: Travis Bartley <[email protected]> * sentencepiece tokenizer vocab needs to be +2 from vocab for canary Signed-off-by: Travis Bartley <[email protected]> * Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves. Signed-off-by: Travis Bartley <[email protected]> * merge conflict Signed-off-by: Travis Bartley <[email protected]> * Simplified tokenizer and corrected bug in dataloader Signed-off-by: Travis Bartley <[email protected]> * Cleaning up docstrings and fixing inference bug. 
Signed-off-by: Travis Bartley <[email protected]> * adding example scripts Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleaning up useless imports Signed-off-by: Travis Bartley <[email protected]> * adding unit tests Signed-off-by: Travis Bartley <[email protected]> * fixing unit tests Signed-off-by: Travis Bartley <[email protected]> * cfg name change Signed-off-by: Travis Bartley <[email protected]> * adding custom check to pass pytests Signed-off-by: Travis Bartley <[email protected]> * removing print script Signed-off-by: Travis Bartley <[email protected]> * catching bugs regarding tokens. Signed-off-by: Travis Bartley <[email protected]> * added docstrings and made examples scripts more generic Signed-off-by: Travis Bartley <[email protected]> * docstring deleted by accident Signed-off-by: Travis Bartley <[email protected]> * plurals in namespace Signed-off-by: Travis Bartley <[email protected]> * changing example script Signed-off-by: Travis Bartley <[email protected]> --------- Signed-off-by: Travis Bartley <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> * add alpha scaling to lora (#8248) * removed deprecated peft model Signed-off-by: arendu <[email protected]> * add alpha Signed-off-by: arendu <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add alpha scaling to lora (#8483) * coldfix (#8412) Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixed errors in the CTM gen functions (#8416) (#8420) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add 
change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367) * Add change_vocabulary and save_tokenizers() support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * fix path location and branch (#8314) * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <[email protected]> * updat ebranch in tutorial Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Michal Futrega <[email protected]> * Add TP comm overlap knobs to AutocastTransformerLayer (#8290) Signed-off-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add deallocate pipeline output optimization (#8279) (#8318) * add deallocate pipeline output optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * remove assertion (#8302) (#8321) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Enable megatron core loggers for GPT pretraining (#8354) (#8384) * Logging changes tested for gpt_pretraining * Additional args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fix dreambooth data sampler issue (#8400) (#8413) * Turn on drop last * Some neva fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * add ensemble decoding fix (#8427) (#8433) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeVA Tutorial Notebook (#8217) * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> * 
init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * add inference via script Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * add codeblocks to run torchrun in notebook Signed-off-by: Pratyush Muthukumar <[email protected]> --------- Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * mcore customization doc minor fix (#8421) (#8437) Signed-off-by: Huiying Li <[email protected]> Co-authored-by: Huiying <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add `loop_labels` algorithm for TDT greedy decoding (#8215) * Add `loop_labels` algorithm for TDT greedy decoding Signed-off-by: Vladimir Bataev <[email protected]> * Use `loop_labels` by default Signed-off-by: Vladimir Bataev <[email protected]> * Loop labels greedy decoding v2 Signed-off-by: Vladimir Bataev <[email protected]> * Add comments. 
Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched hypotheses Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched alignments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix test Signed-off-by: Vladimir Bataev <[email protected]> * Add computer for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Fix TDT decoding algorithm Signed-off-by: Vladimir Bataev <[email protected]> * Use loop frames by default for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Remove "loop frames" implementation for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix confidence. Use tensor for durations. Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add dist ckpt support for regular optimizers (#7749) (#8293) * Add dist ckpt support for regular optimizers * [tutorial] fixed missing RIR scripts file. 
(#8257) * fix imports * imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook * revert asr notebook --------- Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Multimodal r1.23.0 bug fix (#8315) (#8339) * Rename quick-gelu * ddpm config guard * Fix ddpm edit api * Fix insert_image_token cfg issue * neva updates * reformat * Add back jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs * Update default neva template --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * mcore ds fix (#8283) (#8385) * [tutorial] fixed missing RIR scripts file. 
(#8257) * add values to en tts dict (#7879) * mcore ds fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore * revert asr files * add comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset * update mcore version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg * update mcore commit * fix Bert unit tests * update bert tests * fix bert mcore test * fix gpt jenkins tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits * revert apex installation * turn off the fusion for jenkins --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * MCore dataset compatibility for tokenizers (#8390) (#8397) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. 
--------- Signed-off-by: Valerie Sarge <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432) * Improvements for Canary: - carry over custom keys when creatin tarred manifests - selectable text field in ASR eval - get rid of prompt slicing, create proper inference prompts Signed-off-by: Piotr Żelasko <[email protected]> * set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add sbert to IR (#8445) * add sbert to IR Signed-off-by: ataghibakhsh <[email protected]> * add doc Signed-off-by: ataghibakhsh <[email protected]> * fix the auto_tokenizer property method reset bug Signed-off-by: ataghibakhsh <[email protected]> * addressed bot comments Signed-off-by: ataghibakhsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ataghibakhsh <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Update readme (#8440) * update Signed-off-by: eharper <[email protected]> * udpate Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * landing pages added * landing page added for vision * landing pages updated * some minor changes to the main readme * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: 
eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * typo fixed * update Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeMo-Mistral to HF converter bugfix. (#8353) (#8442) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixing mcore bert for TP, PP and SP (#8336) (#8443) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile --------- Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add LoRA support to all linear layers (#7988) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * using new commit of meg-LM Signed-off-by: arendu <[email protected]> * add cpu_offloading_num_layers to conversion script until bug in megatron is fixed Signed-off-by: Chen Cui <[email protected]> * fix peft mixin arguments to follow mcore 0.5 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update megatron commit to fix ci error Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * add cfg default Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add Neva Template for NV-DPO Models (#8358) * add/rename from nvgpt to nv_steerlm, add nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * add nv_dpo conversation to accomendate empty system message Signed-off-by: HuiyingLi <[email protected]> * handle nv_dpo template text generation Signed-off-by: HuiyingLi <[email protected]> * add prompt 
string to nvgpt Signed-off-by: HuiyingLi <[email protected]> * bugfix for inference prompt template Signed-off-by: HuiyingLi <[email protected]> * bug fix for grabbing clean text Signed-off-by: Huiying Li <[email protected]> * fix code format Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> --------- Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email 
protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email 
protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: 
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> * Update PEFT Doc (#8501) * update peft doc Signed-off-by: Chen Cui <[email protected]> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * fix table 
Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> * release updates (#8394) * release updates (#8378) * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <[email protected]> * add values to en tts dict (#7879) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * mcore ds fix Signed-off-by: Dmytro Pykhtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <[email protected]> * revert asr files Signed-off-by: dimapihtar <[email protected]> * add comments Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <[email protected]> * update mcore version Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <[email protected]> * update mcore commit Signed-off-by: dimapihtar <[email protected]> * fix Bert unit tests Signed-off-by: dimapihtar <[email protected]> * update bert tests Signed-off-by: dimapihtar <[email protected]> * fix bert mcore test Signed-off-by: dimapihtar <[email protected]> * fix gpt jenkins tests Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for dict data input type Signed-off-by: dimapihtar <[email protected]> * add mock ds test Signed-off-by: dimapihtar <[email protected]> * add test for dict data input type Signed-off-by: dimapihtar <[email protected]> * 
mcore ds fix Signed-off-by: dimapihtar <[email protected]> * data input fix Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> * Update megatron_gpt_model.py Signed-off-by: Dmytro Pykhtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana G…

Added memory logger

fe3bd25

Signed-off-by: Selvaraj Anandaraj <[email protected]>
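
For readers skimming the timeline: the PR description says that setting `NEMO_LOG_MEMORY_USAGE` to 1 enables peak-memory logging at the end of each training step, on rank 0 only. A minimal sketch of that kind of gate follows. It is not the code from this commit; the helper names (`memory_logging_enabled`, `format_peak_memory`, `maybe_log_peak_memory`) and the message format are hypothetical, and the peak-memory source is passed in as a callable so the sketch runs without a GPU.

```python
import os


def memory_logging_enabled(env=None):
    """Return True when NEMO_LOG_MEMORY_USAGE is set to "1" (per the PR description)."""
    env = os.environ if env is None else env
    return env.get("NEMO_LOG_MEMORY_USAGE", "0") == "1"


def format_peak_memory(peak_bytes, step):
    """Hypothetical message format: peak memory in GiB for a given step."""
    gib = peak_bytes / 2**30
    return f"step {step}: peak memory {gib:.2f} GiB"


def maybe_log_peak_memory(step, rank, peak_bytes_fn, env=None, emit=print):
    """Log peak memory for rank 0 only, and only when the env variable enables it.

    peak_bytes_fn is an assumption of this sketch: any zero-argument callable
    returning a byte count, e.g. torch.cuda.max_memory_allocated in a CUDA run.
    """
    if rank != 0 or not memory_logging_enabled(env):
        return None
    message = format_peak_memory(peak_bytes_fn(), step)
    emit(message)
    return message
```

In a training loop this would be called once per step, after the optimizer update, with the process rank and a peak-memory query for the current device.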

github-actions bot added the NLP label Feb 10, 2024

[pre-commit.ci] auto fixes from pre-commit.com hooks

39e358c

for more information, see https://pre-commit.ci

ericharper approved these changes Feb 23, 2024

Merge branch 'main' into memory_logger

ceabf5a

sanandaraj5597 merged commit 53a7b72 into main Feb 23, 2024
15 checks passed

sanandaraj5597 deleted the memory_logger branch February 23, 2024 22:39
