Added memory logger #8395
Merged
Signed-off-by: Selvaraj Anandaraj <[email protected]>
for more information, see https://pre-commit.ci
ericharper approved these changes on Feb 23, 2024
LGTM. Thanks!
jenkins |
akoumpa pushed a commit that referenced this pull request on Feb 26, 2024
* Added memory logger
  Signed-off-by: Selvaraj Anandaraj <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Signed-off-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Alexandros Koumparoulis <[email protected]>
akoumpa added a commit that referenced this pull request on Feb 26, 2024
* MoE parameter passing (#8255) * MoE parameter passing Signed-off-by: Alexandros Koumparoulis <[email protected]> * Pass EP/MoE params in consumer scripts. Signed-off-by: Alexandros Koumparoulis <[email protected]> * PR fixes Signed-off-by: Alexandros Koumparoulis <[email protected]> * Use latest commit of mcore-0.5 Signed-off-by: Alexandros Koumparoulis <[email protected]> * CI fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Jiaqiz/option to disable adapters & merge all lora layers (#8029) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. 
* use adapter only when it is enabled Signed-off-by: jiaqi zeng <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lora merge script (#8113) Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * add peft ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * merge lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * support/fix cpu initialization Signed-off-by: Chen Cui <[email protected]> * add example usage Signed-off-by: Chen Cui <[email protected]> * fix TP due to distributed checkpoint Signed-off-by: Chen Cui <[email protected]> * updating the logic of merging lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * merge in fp32 then cast back Signed-off-by: Jiaqi Zeng <[email protected]> * remove ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * fix import Signed-off-by: Jiaqi Zeng <[email protected]> --------- Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update k2 version (#8478) Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek 
<[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email 
protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile 
Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. 
Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> * Handle float limit_val_batches (#8426) * Handle float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Rectify reconfiguration of float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Remove unused imports Signed-off-by: Abhishree <[email protected]> * Scale len(val_dataloader) with float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Return len(dataloader) in microbatches Signed-off-by: Abhishree <[email protected]> * Add back resetting of num val samples Signed-off-by: Abhishree <[email protected]> * Fix to ensure float limit_val_batches is multiple of num_micro_batches Signed-off-by: Abhishree <[email protected]> * Remove forcing eval samples to 1 for float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Fix bug wrt 0 limiot_val_batches Signed-off-by: Abhishree <[email protected]> * Add missing mock_dataset line Signed-off-by: Abhishree <[email protected]> * Avoid ensuring limit_val_batches is a mutliple of microbatches for 1.0 Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore the hack forcing number of validation and test epochs to 1 Signed-off-by: Jan Baczek <[email protected]> * 
Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jan Baczek <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Fix tutorial links in user guide (#8497) Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Sequence Parallel for LoRA (#8369) * support lora + sequence parallel Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more comments Signed-off-by: Chen Cui <[email protected]> * add lora SP CI test Signed-off-by: Chen Cui <[email protected]> * support lora for all linear modules as in #7988 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Call proper method to replace (#8498) Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Added memory logger (#8395) * Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros 
Koumparoulis <[email protected]> * Canary refactor for Riva (#8363) * initial commit of bleu score tracking Signed-off-by: Travis Bartley <[email protected]> * initial commit, refactoring aed models for riva Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating Canary to support torch metrics Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fixes Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missed an empty batch conditional Signed-off-by: Travis Bartley <[email protected]> * Fixing dataloader issues Signed-off-by: Travis Bartley <[email protected]> * Finishing merge conflict with transcribe update Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: Travis Bartley <[email protected]> * copyright header fix Signed-off-by: Travis Bartley <[email protected]> * yet another merge conflict Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * making paired data management safer Signed-off-by: Travis Bartley <[email protected]> * sentencepiece needs bigger tokenizer... Signed-off-by: Travis Bartley <[email protected]> * sentencepiece tokenizer vocab needs to be +2 from vocab for canary Signed-off-by: Travis Bartley <[email protected]> * Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves. 
Signed-off-by: Travis Bartley <[email protected]> * merge conflit Signed-off-by: Travis Bartley <[email protected]> * Simplified tokenizer and corrected bug in dataloader Signed-off-by: Travis Bartley <[email protected]> * Cleaning up docstrings and fixing inference bug. Signed-off-by: Travis Bartley <[email protected]> * adding example scripts Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleaning up useless imports Signed-off-by: Travis Bartley <[email protected]> * adding unit tests Signed-off-by: Travis Bartley <[email protected]> * fixing unit tests Signed-off-by: Travis Bartley <[email protected]> * cfg name change Signed-off-by: Travis Bartley <[email protected]> * adding custom check to pass pytests Signed-off-by: Travis Bartley <[email protected]> * removing print script Signed-off-by: Travis Bartley <[email protected]> * catching bugs regarding tokens. Signed-off-by: Travis Bartley <[email protected]> * added docstrings and made examples scripts more generic Signed-off-by: Travis Bartley <[email protected]> * docstring deleted by accident Signed-off-by: Travis Bartley <[email protected]> * plurals in namespace Signed-off-by: Travis Bartley <[email protected]> * changing example script Signed-off-by: Travis Bartley <[email protected]> --------- Signed-off-by: Travis Bartley <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * add alpha scaling to lora (#8248) * removed pdeprecated eft model Signed-off-by: arendu <[email protected]> * add alpha Signed-off-by: arendu <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add alpha scaling to lora (#8483) * 
coldfix (#8412) Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixed errors in the CTM gen functions (#8416) (#8420) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367) * Add change_vocabulary and save_tokenizers() support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * fix path location and branch (#8314) * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <[email protected]> * updat ebranch in tutorial Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Michal Futrega <[email protected]> * Add TP comm overlap knobs to AutocastTransformerLayer (#8290) Signed-off-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add deallocate pipeline output optimization (#8279) (#8318) * 
add deallocate pipeline output optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * remove assertion (#8302) (#8321) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Enable megatron core loggers for GPT pretraining (#8354) (#8384) * Logging changes tested for gpt_pretraining * Additional args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fix dreambooth data sampler issue (#8400) (#8413) * Turn on drop last * Some neva fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * add ensemble decoding fix (#8427) (#8433) Signed-off-by: Nithin Rao 
Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeVA Tutorial Notebook (#8217) * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * add inference via script Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * add codeblocks to run torchrun in notebook Signed-off-by: Pratyush Muthukumar <[email protected]> --------- Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * mcore customization doc minor fix (#8421) (#8437) Signed-off-by: Huiying Li <[email protected]> Co-authored-by: Huiying <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add `loop_labels` algorithm for TDT greedy decoding (#8215) * Add `loop_labels` algorithm for TDT greedy decoding Signed-off-by: Vladimir Bataev <[email protected]> * Use `loop_labels` by default Signed-off-by: Vladimir Bataev <[email protected]> * Loop labels greedy decoding v2 Signed-off-by: Vladimir Bataev <[email protected]> * Add comments. 
Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched hypotheses Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched alignments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix test Signed-off-by: Vladimir Bataev <[email protected]> * Add computer for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Fix TDT decoding algorithm Signed-off-by: Vladimir Bataev <[email protected]> * Use loop frames by default for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Remove "loop frames" implementation for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix confidence. Use tensor for durations. Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add dist ckpt support for regular optimizers (#7749) (#8293) * Add dist ckpt support for regular optimizers * [tutorial] fixed missing RIR scripts file. 
(#8257) * fix imports * imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook * revert asr notebook --------- Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Multimodal r1.23.0 bug fix (#8315) (#8339) * Rename quick-gelu * ddpm config guard * Fix ddpm edit api * Fix insert_image_token cfg issue * neva updates * reformat * Add back jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs * Update default neva template --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * mcore ds fix (#8283) (#8385) * [tutorial] fixed missing RIR scripts file. 
(#8257) * add values to en tts dict (#7879) * mcore ds fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore * revert asr files * add comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset * update mcore version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg * update mcore commit * fix Bert unit tests * update bert tests * fix bert mcore test * fix gpt jenkins tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits * revert apex installation * turn off the fusion for jenkins --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * MCore dataset compatibility for tokenizers (#8390) (#8397) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. 
--------- Signed-off-by: Valerie Sarge <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432) * Improvements for Canary: - carry over custom keys when creatin tarred manifests - selectable text field in ASR eval - get rid of prompt slicing, create proper inference prompts Signed-off-by: Piotr Żelasko <[email protected]> * set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add sbert to IR (#8445) * add sbert to IR Signed-off-by: ataghibakhsh <[email protected]> * add doc Signed-off-by: ataghibakhsh <[email protected]> * fix the auto_tokenizer property method reset bug Signed-off-by: ataghibakhsh <[email protected]> * addressed bot comments Signed-off-by: ataghibakhsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ataghibakhsh <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Update readme (#8440) * update Signed-off-by: eharper <[email protected]> * udpate Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * landing pages added * landing page added for vision * landing pages updated * some minor changes to the main readme * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: 
eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * typo fixed * update Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeMo-Mistral to HF converter bugfix. (#8353) (#8442) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixing mcore bert for TP, PP and SP (#8336) (#8443) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile --------- Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add LoRA support to all linear layers (#7988) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * using new commit of meg-LM Signed-off-by: arendu <[email protected]> * add cpu_offloading_num_layers to conversion script until bug in megatron is fixed Signed-off-by: Chen Cui <[email protected]> * fix peft mixin arguments to follow mcore 0.5 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update megatron commit to fix ci error Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * add cfg default Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add Neva Template for NV-DPO Models (#8358) * add/rename from nvgpt to nv_steerlm, add nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * add nv_dpo conversation to accomendate empty system message Signed-off-by: HuiyingLi <[email protected]> * handle nv_dpo template text generation Signed-off-by: HuiyingLi <[email protected]> * add prompt 
string to nvgpt Signed-off-by: HuiyingLi <[email protected]> * bugfix for inference prompt template Signed-off-by: HuiyingLi <[email protected]> * bug fix for grabbing clean text Signed-off-by: Huiying Li <[email protected]> * fix code format Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> --------- Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email 
protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email 
protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: 
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update PEFT Doc (#8501) * update peft doc Signed-off-by: Chen Cui <[email protected]> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <[email protected]> * fix table 
Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * release updates (#8394) * release updates (#8378) * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <[email protected]> * add values to en tts dict (#7879) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * mcore ds fix Signed-off-by: Dmytro Pykhtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <[email protected]> * revert asr files Signed-off-by: dimapihtar <[email protected]> * add comments Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <[email protected]> * update mcore version Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <[email protected]> * update mcore commit Signed-off-by: dimapihtar <[email protected]> * fix Bert unit tests Signed-off-by: dimapihtar <[email protected]> * update bert tests Signed-off-by: dimapihtar <[email protected]> * fix bert mcore test Signed-off-by: dimapihtar <[email protected]> * fix gpt jenkins tests Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for dict data input type Signed-off-by: dimapihtar <[email protected]> * add mock ds test Signed-off-by: 
dimapihtar <[email protected]> * add test for dict data input type Signed-off-by: dimapihtar <[email protected]> * mcore ds fix Signed-off-by: dimapihtar <[email protected]> * data input fix Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> * Update megatron_gpt_model.py Signed-off-by: Dmytro Pykhtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Signed-off-by: Abhishree <[email protected]> Signed-off-by: yaoyu-33 <[email 
protected]> Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Selvaraj Anandaraj <[email protected]> Signed-off-by: Travis Bartley <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email 
protected]> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: jbaczek <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Jan Baczek <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Naga Venkatesh Gavini <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: tbartley94 <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email 
protected]> Co-authored-by: Jiaqi Zeng <[email protected]>
yaoyu-33 pushed a commit that referenced this pull request on Feb 26, 2024
* Added memory logger
  Signed-off-by: Selvaraj Anandaraj <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Signed-off-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
yaoyu-33 added a commit that referenced this pull request on Feb 26, 2024
* MoE parameter passing (#8255) * MoE parameter passing Signed-off-by: Alexandros Koumparoulis <[email protected]> * Pass EP/MoE params in consumer scripts. Signed-off-by: Alexandros Koumparoulis <[email protected]> * PR fixes Signed-off-by: Alexandros Koumparoulis <[email protected]> * Use latest commit of mcore-0.5 Signed-off-by: Alexandros Koumparoulis <[email protected]> * CI fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Jiaqiz/option to disable adapters & merge all lora layers (#8029) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. 
* use adapter only when it is enabled Signed-off-by: jiaqi zeng <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lora merge script (#8113) Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * add peft ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * merge lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * support/fix cpu initialization Signed-off-by: Chen Cui <[email protected]> * add example usage Signed-off-by: Chen Cui <[email protected]> * fix TP due to distributed checkpoint Signed-off-by: Chen Cui <[email protected]> * updating the logic of merging lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * merge in fp32 then cast back Signed-off-by: Jiaqi Zeng <[email protected]> * remove ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * fix import Signed-off-by: Jiaqi Zeng <[email protected]> --------- Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update k2 version (#8478) Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek 
<[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email 
protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile 
Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. 
Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> * Handle float limit_val_batches (#8426) * Handle float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Rectify reconfiguration of float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Remove unused imports Signed-off-by: Abhishree <[email protected]> * Scale len(val_dataloader) with float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Return len(dataloader) in microbatches Signed-off-by: Abhishree <[email protected]> * Add back resetting of num val samples Signed-off-by: Abhishree <[email protected]> * Fix to ensure float limit_val_batches is multiple of num_micro_batches Signed-off-by: Abhishree <[email protected]> * Remove forcing eval samples to 1 for float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Fix bug wrt 0 limiot_val_batches Signed-off-by: Abhishree <[email protected]> * Add missing mock_dataset line Signed-off-by: Abhishree <[email protected]> * Avoid ensuring limit_val_batches is a mutliple of microbatches for 1.0 Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore the hack forcing number of validation and test epochs to 1 Signed-off-by: Jan Baczek <[email protected]> * 
Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jan Baczek <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Fix tutorial links in user guide (#8497) Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Sequence Parallel for LoRA (#8369) * support lora + sequence parallel Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more comments Signed-off-by: Chen Cui <[email protected]> * add lora SP CI test Signed-off-by: Chen Cui <[email protected]> * support lora for all linear modules as in #7988 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Call proper method to replace (#8498) Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Added memory logger (#8395) * Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros 
Koumparoulis <[email protected]> * Canary refactor for Riva (#8363) * initial commit of bleu score tracking Signed-off-by: Travis Bartley <[email protected]> * initial commit, refactoring aed models for riva Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating Canary to support torch metrics Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fixes Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missed an empty batch conditional Signed-off-by: Travis Bartley <[email protected]> * Fixing dataloader issues Signed-off-by: Travis Bartley <[email protected]> * Finishing merge conflict with transcribe update Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: Travis Bartley <[email protected]> * copyright header fix Signed-off-by: Travis Bartley <[email protected]> * yet another merge conflict Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * making paired data management safer Signed-off-by: Travis Bartley <[email protected]> * sentencepiece needs bigger tokenizer... Signed-off-by: Travis Bartley <[email protected]> * sentencepiece tokenizer vocab needs to be +2 from vocab for canary Signed-off-by: Travis Bartley <[email protected]> * Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves. 
Signed-off-by: Travis Bartley <[email protected]> * merge conflit Signed-off-by: Travis Bartley <[email protected]> * Simplified tokenizer and corrected bug in dataloader Signed-off-by: Travis Bartley <[email protected]> * Cleaning up docstrings and fixing inference bug. Signed-off-by: Travis Bartley <[email protected]> * adding example scripts Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleaning up useless imports Signed-off-by: Travis Bartley <[email protected]> * adding unit tests Signed-off-by: Travis Bartley <[email protected]> * fixing unit tests Signed-off-by: Travis Bartley <[email protected]> * cfg name change Signed-off-by: Travis Bartley <[email protected]> * adding custom check to pass pytests Signed-off-by: Travis Bartley <[email protected]> * removing print script Signed-off-by: Travis Bartley <[email protected]> * catching bugs regarding tokens. Signed-off-by: Travis Bartley <[email protected]> * added docstrings and made examples scripts more generic Signed-off-by: Travis Bartley <[email protected]> * docstring deleted by accident Signed-off-by: Travis Bartley <[email protected]> * plurals in namespace Signed-off-by: Travis Bartley <[email protected]> * changing example script Signed-off-by: Travis Bartley <[email protected]> --------- Signed-off-by: Travis Bartley <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * add alpha scaling to lora (#8248) * removed pdeprecated eft model Signed-off-by: arendu <[email protected]> * add alpha Signed-off-by: arendu <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add alpha scaling to lora (#8483) * 
coldfix (#8412) Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixed errors in the CTM gen functions (#8416) (#8420) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367) * Add change_vocabulary and save_tokenizers() support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * fix path location and branch (#8314) * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <[email protected]> * updat ebranch in tutorial Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Michal Futrega <[email protected]> * Add TP comm overlap knobs to AutocastTransformerLayer (#8290) Signed-off-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add deallocate pipeline output optimization (#8279) (#8318) * 
add deallocate pipeline output optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * remove assertion (#8302) (#8321) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Enable megatron core loggers for GPT pretraining (#8354) (#8384) * Logging changes tested for gpt_pretraining * Additional args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fix dreambooth data sampler issue (#8400) (#8413) * Turn on drop last * Some neva fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * add ensemble decoding fix (#8427) (#8433) Signed-off-by: Nithin Rao 
Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeVA Tutorial Notebook (#8217) * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * add inference via script Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * add codeblocks to run torchrun in notebook Signed-off-by: Pratyush Muthukumar <[email protected]> --------- Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * mcore customization doc minor fix (#8421) (#8437) Signed-off-by: Huiying Li <[email protected]> Co-authored-by: Huiying <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add `loop_labels` algorithm for TDT greedy decoding (#8215) * Add `loop_labels` algorithm for TDT greedy decoding Signed-off-by: Vladimir Bataev <[email protected]> * Use `loop_labels` by default Signed-off-by: Vladimir Bataev <[email protected]> * Loop labels greedy decoding v2 Signed-off-by: Vladimir Bataev <[email protected]> * Add comments. 
Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched hypotheses Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched alignments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix test Signed-off-by: Vladimir Bataev <[email protected]> * Add computer for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Fix TDT decoding algorithm Signed-off-by: Vladimir Bataev <[email protected]> * Use loop frames by default for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Remove "loop frames" implementation for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix confidence. Use tensor for durations. Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add dist ckpt support for regular optimizers (#7749) (#8293) * Add dist ckpt support for regular optimizers * [tutorial] fixed missing RIR scripts file. 
(#8257) * fix imports * imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook * revert asr notebook --------- Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Multimodal r1.23.0 bug fix (#8315) (#8339) * Rename quick-gelu * ddpm config guard * Fix ddpm edit api * Fix insert_image_token cfg issue * neva updates * reformat * Add back jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs * Update default neva template --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * mcore ds fix (#8283) (#8385) * [tutorial] fixed missing RIR scripts file. 
(#8257) * add values to en tts dict (#7879) * mcore ds fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore * revert asr files * add comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset * update mcore version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg * update mcore commit * fix Bert unit tests * update bert tests * fix bert mcore test * fix gpt jenkins tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits * revert apex installation * turn off the fusion for jenkins --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * MCore dataset compatibility for tokenizers (#8390) (#8397) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. 
--------- Signed-off-by: Valerie Sarge <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432) * Improvements for Canary: - carry over custom keys when creatin tarred manifests - selectable text field in ASR eval - get rid of prompt slicing, create proper inference prompts Signed-off-by: Piotr Żelasko <[email protected]> * set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add sbert to IR (#8445) * add sbert to IR Signed-off-by: ataghibakhsh <[email protected]> * add doc Signed-off-by: ataghibakhsh <[email protected]> * fix the auto_tokenizer property method reset bug Signed-off-by: ataghibakhsh <[email protected]> * addressed bot comments Signed-off-by: ataghibakhsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ataghibakhsh <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Update readme (#8440) * update Signed-off-by: eharper <[email protected]> * udpate Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * landing pages added * landing page added for vision * landing pages updated * some minor changes to the main readme * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: 
eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * typo fixed * update Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeMo-Mistral to HF converter bugfix. (#8353) (#8442) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixing mcore bert for TP, PP and SP (#8336) (#8443) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile --------- Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add LoRA support to all linear layers (#7988) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * using new commit of meg-LM Signed-off-by: arendu <[email protected]> * add cpu_offloading_num_layers to conversion script until bug in megatron is fixed Signed-off-by: Chen Cui <[email protected]> * fix peft mixin arguments to follow mcore 0.5 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update megatron commit to fix ci error Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * add cfg default Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add Neva Template for NV-DPO Models (#8358) * add/rename from nvgpt to nv_steerlm, add nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * add nv_dpo conversation to accomendate empty system message Signed-off-by: HuiyingLi <[email protected]> * handle nv_dpo template text generation Signed-off-by: HuiyingLi <[email protected]> * add prompt 
string to nvgpt Signed-off-by: HuiyingLi <[email protected]> * bugfix for inference prompt template Signed-off-by: HuiyingLi <[email protected]> * bug fix for grabbing clean text Signed-off-by: Huiying Li <[email protected]> * fix code format Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> --------- Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email 
protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email 
protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: 
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update PEFT Doc (#8501) * update peft doc Signed-off-by: Chen Cui <[email protected]> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <[email protected]> * fix table 
Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * release updates (#8394) * release updates (#8378) * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <[email protected]> * add values to en tts dict (#7879) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * mcore ds fix Signed-off-by: Dmytro Pykhtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <[email protected]> * revert asr files Signed-off-by: dimapihtar <[email protected]> * add comments Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <[email protected]> * update mcore version Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <[email protected]> * update mcore commit Signed-off-by: dimapihtar <[email protected]> * fix Bert unit tests Signed-off-by: dimapihtar <[email protected]> * update bert tests Signed-off-by: dimapihtar <[email protected]> * fix bert mcore test Signed-off-by: dimapihtar <[email protected]> * fix gpt jenkins tests Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for dict data input type Signed-off-by: dimapihtar <[email protected]> * add mock ds test Signed-off-by: 
dimapihtar <[email protected]> * add test for dict data input type Signed-off-by: dimapihtar <[email protected]> * mcore ds fix Signed-off-by: dimapihtar <[email protected]> * data input fix Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> * Update megatron_gpt_model.py Signed-off-by: Dmytro Pykhtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Signed-off-by: Abhishree <[email protected]> Signed-off-by: yaoyu-33 <[email 
protected]> Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Selvaraj Anandaraj <[email protected]> Signed-off-by: Travis Bartley <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email 
protected]> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: jbaczek <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Jan Baczek <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Naga Venkatesh Gavini <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: tbartley94 <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email 
protected]> Co-authored-by: Jiaqi Zeng <[email protected]>
zpx01 pushed a commit to zpx01/NeMo that referenced this pull request on Mar 8, 2024
* Added memory logger
  Signed-off-by: Selvaraj Anandaraj <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Signed-off-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Zeeshan Patel <[email protected]>
zpx01 pushed a commit to zpx01/NeMo that referenced this pull request on Mar 8, 2024
* MoE parameter passing (#8255) * MoE parameter passing Signed-off-by: Alexandros Koumparoulis <[email protected]> * Pass EP/MoE params in consumer scripts. Signed-off-by: Alexandros Koumparoulis <[email protected]> * PR fixes Signed-off-by: Alexandros Koumparoulis <[email protected]> * Use latest commit of mcore-0.5 Signed-off-by: Alexandros Koumparoulis <[email protected]> * CI fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Jiaqiz/option to disable adapters & merge all lora layers (#8029) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. 
* use adapter only when it is enabled Signed-off-by: jiaqi zeng <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lora merge script (#8113) Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * add peft ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * merge lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * support/fix cpu initialization Signed-off-by: Chen Cui <[email protected]> * add example usage Signed-off-by: Chen Cui <[email protected]> * fix TP due to distributed checkpoint Signed-off-by: Chen Cui <[email protected]> * updating the logic of merging lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * merge in fp32 then cast back Signed-off-by: Jiaqi Zeng <[email protected]> * remove ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * fix import Signed-off-by: Jiaqi Zeng <[email protected]> --------- Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update k2 version (#8478) Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek 
<[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email 
protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile 
Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. 
Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> * Handle float limit_val_batches (#8426) * Handle float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Rectify reconfiguration of float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Remove unused imports Signed-off-by: Abhishree <[email protected]> * Scale len(val_dataloader) with float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Return len(dataloader) in microbatches Signed-off-by: Abhishree <[email protected]> * Add back resetting of num val samples Signed-off-by: Abhishree <[email protected]> * Fix to ensure float limit_val_batches is multiple of num_micro_batches Signed-off-by: Abhishree <[email protected]> * Remove forcing eval samples to 1 for float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Fix bug wrt 0 limiot_val_batches Signed-off-by: Abhishree <[email protected]> * Add missing mock_dataset line Signed-off-by: Abhishree <[email protected]> * Avoid ensuring limit_val_batches is a mutliple of microbatches for 1.0 Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore the hack forcing number of validation and test epochs to 1 Signed-off-by: Jan Baczek <[email protected]> * 
Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jan Baczek <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Fix tutorial links in user guide (#8497) Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Sequence Parallel for LoRA (#8369) * support lora + sequence parallel Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more comments Signed-off-by: Chen Cui <[email protected]> * add lora SP CI test Signed-off-by: Chen Cui <[email protected]> * support lora for all linear modules as in #7988 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Call proper method to replace (#8498) Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Added memory logger (#8395) * Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros 
Koumparoulis <[email protected]> * Canary refactor for Riva (#8363) * initial commit of bleu score tracking Signed-off-by: Travis Bartley <[email protected]> * initial commit, refactoring aed models for riva Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating Canary to support torch metrics Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fixes Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missed an empty batch conditional Signed-off-by: Travis Bartley <[email protected]> * Fixing dataloader issues Signed-off-by: Travis Bartley <[email protected]> * Finishing merge conflict with transcribe update Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: Travis Bartley <[email protected]> * copyright header fix Signed-off-by: Travis Bartley <[email protected]> * yet another merge conflict Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * making paired data management safer Signed-off-by: Travis Bartley <[email protected]> * sentencepiece needs bigger tokenizer... Signed-off-by: Travis Bartley <[email protected]> * sentencepiece tokenizer vocab needs to be +2 from vocab for canary Signed-off-by: Travis Bartley <[email protected]> * Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves. 
Signed-off-by: Travis Bartley <[email protected]> * merge conflit Signed-off-by: Travis Bartley <[email protected]> * Simplified tokenizer and corrected bug in dataloader Signed-off-by: Travis Bartley <[email protected]> * Cleaning up docstrings and fixing inference bug. Signed-off-by: Travis Bartley <[email protected]> * adding example scripts Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleaning up useless imports Signed-off-by: Travis Bartley <[email protected]> * adding unit tests Signed-off-by: Travis Bartley <[email protected]> * fixing unit tests Signed-off-by: Travis Bartley <[email protected]> * cfg name change Signed-off-by: Travis Bartley <[email protected]> * adding custom check to pass pytests Signed-off-by: Travis Bartley <[email protected]> * removing print script Signed-off-by: Travis Bartley <[email protected]> * catching bugs regarding tokens. Signed-off-by: Travis Bartley <[email protected]> * added docstrings and made examples scripts more generic Signed-off-by: Travis Bartley <[email protected]> * docstring deleted by accident Signed-off-by: Travis Bartley <[email protected]> * plurals in namespace Signed-off-by: Travis Bartley <[email protected]> * changing example script Signed-off-by: Travis Bartley <[email protected]> --------- Signed-off-by: Travis Bartley <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * add alpha scaling to lora (#8248) * removed pdeprecated eft model Signed-off-by: arendu <[email protected]> * add alpha Signed-off-by: arendu <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add alpha scaling to lora (#8483) * 
coldfix (#8412) Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixed errors in the CTM gen functions (#8416) (#8420) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367) * Add change_vocabulary and save_tokenizers() support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * fix path location and branch (#8314) * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <[email protected]> * updat ebranch in tutorial Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Michal Futrega <[email protected]> * Add TP comm overlap knobs to AutocastTransformerLayer (#8290) Signed-off-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add deallocate pipeline output optimization (#8279) (#8318) * 
add deallocate pipeline output optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * remove assertion (#8302) (#8321) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Enable megatron core loggers for GPT pretraining (#8354) (#8384) * Logging changes tested for gpt_pretraining * Additional args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fix dreambooth data sampler issue (#8400) (#8413) * Turn on drop last * Some neva fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * add ensemble decoding fix (#8427) (#8433) Signed-off-by: Nithin Rao 
Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeVA Tutorial Notebook (#8217) * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * add inference via script Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * add codeblocks to run torchrun in notebook Signed-off-by: Pratyush Muthukumar <[email protected]> --------- Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * mcore customization doc minor fix (#8421) (#8437) Signed-off-by: Huiying Li <[email protected]> Co-authored-by: Huiying <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add `loop_labels` algorithm for TDT greedy decoding (#8215) * Add `loop_labels` algorithm for TDT greedy decoding Signed-off-by: Vladimir Bataev <[email protected]> * Use `loop_labels` by default Signed-off-by: Vladimir Bataev <[email protected]> * Loop labels greedy decoding v2 Signed-off-by: Vladimir Bataev <[email protected]> * Add comments. 
Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched hypotheses Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched alignments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix test Signed-off-by: Vladimir Bataev <[email protected]> * Add computer for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Fix TDT decoding algorithm Signed-off-by: Vladimir Bataev <[email protected]> * Use loop frames by default for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Remove "loop frames" implementation for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix confidence. Use tensor for durations. Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add dist ckpt support for regular optimizers (#7749) (#8293) * Add dist ckpt support for regular optimizers * [tutorial] fixed missing RIR scripts file. 
(#8257) * fix imports * imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook * revert asr notebook --------- Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Multimodal r1.23.0 bug fix (#8315) (#8339) * Rename quick-gelu * ddpm config guard * Fix ddpm edit api * Fix insert_image_token cfg issue * neva updates * reformat * Add back jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs * Update default neva template --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * mcore ds fix (#8283) (#8385) * [tutorial] fixed missing RIR scripts file. 
(#8257) * add values to en tts dict (#7879) * mcore ds fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore * revert asr files * add comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset * update mcore version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg * update mcore commit * fix Bert unit tests * update bert tests * fix bert mcore test * fix gpt jenkins tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits * revert apex installation * turn off the fusion for jenkins --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * MCore dataset compatibility for tokenizers (#8390) (#8397) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. 
--------- Signed-off-by: Valerie Sarge <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432) * Improvements for Canary: - carry over custom keys when creatin tarred manifests - selectable text field in ASR eval - get rid of prompt slicing, create proper inference prompts Signed-off-by: Piotr Żelasko <[email protected]> * set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add sbert to IR (#8445) * add sbert to IR Signed-off-by: ataghibakhsh <[email protected]> * add doc Signed-off-by: ataghibakhsh <[email protected]> * fix the auto_tokenizer property method reset bug Signed-off-by: ataghibakhsh <[email protected]> * addressed bot comments Signed-off-by: ataghibakhsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ataghibakhsh <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Update readme (#8440) * update Signed-off-by: eharper <[email protected]> * udpate Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * landing pages added * landing page added for vision * landing pages updated * some minor changes to the main readme * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: 
eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * typo fixed * update Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeMo-Mistral to HF converter bugfix. (#8353) (#8442) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixing mcore bert for TP, PP and SP (#8336) (#8443) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile --------- Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add LoRA support to all linear layers (#7988) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * using new commit of meg-LM Signed-off-by: arendu <[email protected]> * add cpu_offloading_num_layers to conversion script until bug in megatron is fixed Signed-off-by: Chen Cui <[email protected]> * fix peft mixin arguments to follow mcore 0.5 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update megatron commit to fix ci error Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * add cfg default Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add Neva Template for NV-DPO Models (#8358) * add/rename from nvgpt to nv_steerlm, add nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * add nv_dpo conversation to accomendate empty system message Signed-off-by: HuiyingLi <[email protected]> * handle nv_dpo template text generation Signed-off-by: HuiyingLi <[email protected]> * add prompt 
string to nvgpt Signed-off-by: HuiyingLi <[email protected]> * bugfix for inference prompt template Signed-off-by: HuiyingLi <[email protected]> * bug fix for grabbing clean text Signed-off-by: Huiying Li <[email protected]> * fix code format Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> --------- Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email 
protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email 
protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: 
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update PEFT Doc (#8501) * update peft doc Signed-off-by: Chen Cui <[email protected]> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <[email protected]> * fix table 
Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * release updates (#8394) * release updates (#8378) * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <[email protected]> * add values to en tts dict (#7879) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * mcore ds fix Signed-off-by: Dmytro Pykhtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <[email protected]> * revert asr files Signed-off-by: dimapihtar <[email protected]> * add comments Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <[email protected]> * update mcore version Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <[email protected]> * update mcore commit Signed-off-by: dimapihtar <[email protected]> * fix Bert unit tests Signed-off-by: dimapihtar <[email protected]> * update bert tests Signed-off-by: dimapihtar <[email protected]> * fix bert mcore test Signed-off-by: dimapihtar <[email protected]> * fix gpt jenkins tests Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for dict data input type Signed-off-by: dimapihtar <[email protected]> * add mock ds test Signed-off-by: 
dimapihtar <[email protected]> * add test for dict data input type Signed-off-by: dimapihtar <[email protected]> * mcore ds fix Signed-off-by: dimapihtar <[email protected]> * data input fix Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> * Update megatron_gpt_model.py Signed-off-by: Dmytro Pykhtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Signed-off-by: Abhishree <[email protected]> Signed-off-by: yaoyu-33 <[email 
protected]> Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Selvaraj Anandaraj <[email protected]> Signed-off-by: Travis Bartley <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email 
protected]> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: jbaczek <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Jan Baczek <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Naga Venkatesh Gavini <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: tbartley94 <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email 
protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Signed-off-by: Zeeshan Patel <[email protected]>
JRD971000 pushed a commit that referenced this pull request on Mar 15, 2024
* Added memory logger

Signed-off-by: Selvaraj Anandaraj <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: Selvaraj Anandaraj <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: ataghibakhsh <[email protected]>
JRD971000 added a commit that referenced this pull request on Mar 15, 2024
* MoE parameter passing (#8255) * MoE parameter passing Signed-off-by: Alexandros Koumparoulis <[email protected]> * Pass EP/MoE params in consumer scripts. Signed-off-by: Alexandros Koumparoulis <[email protected]> * PR fixes Signed-off-by: Alexandros Koumparoulis <[email protected]> * Use latest commit of mcore-0.5 Signed-off-by: Alexandros Koumparoulis <[email protected]> * CI fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Jiaqiz/option to disable adapters & merge all lora layers (#8029) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. 
* use adapter only when it is enabled Signed-off-by: jiaqi zeng <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lora merge script (#8113) Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * add peft ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * merge lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * support/fix cpu initialization Signed-off-by: Chen Cui <[email protected]> * add example usage Signed-off-by: Chen Cui <[email protected]> * fix TP due to distributed checkpoint Signed-off-by: Chen Cui <[email protected]> * updating the logic of merging lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * merge in fp32 then cast back Signed-off-by: Jiaqi Zeng <[email protected]> * remove ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * fix import Signed-off-by: Jiaqi Zeng <[email protected]> --------- Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update k2 version (#8478) Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek 
<[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email 
protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile 
Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. 
Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> * Handle float limit_val_batches (#8426) * Handle float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Rectify reconfiguration of float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Remove unused imports Signed-off-by: Abhishree <[email protected]> * Scale len(val_dataloader) with float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Return len(dataloader) in microbatches Signed-off-by: Abhishree <[email protected]> * Add back resetting of num val samples Signed-off-by: Abhishree <[email protected]> * Fix to ensure float limit_val_batches is multiple of num_micro_batches Signed-off-by: Abhishree <[email protected]> * Remove forcing eval samples to 1 for float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Fix bug wrt 0 limiot_val_batches Signed-off-by: Abhishree <[email protected]> * Add missing mock_dataset line Signed-off-by: Abhishree <[email protected]> * Avoid ensuring limit_val_batches is a mutliple of microbatches for 1.0 Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore the hack forcing number of validation and test epochs to 1 Signed-off-by: Jan Baczek <[email protected]> * 
Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jan Baczek <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Fix tutorial links in user guide (#8497) Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Sequence Parallel for LoRA (#8369) * support lora + sequence parallel Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more comments Signed-off-by: Chen Cui <[email protected]> * add lora SP CI test Signed-off-by: Chen Cui <[email protected]> * support lora for all linear modules as in #7988 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Call proper method to replace (#8498) Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Added memory logger (#8395) * Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros 
Koumparoulis <[email protected]> * Canary refactor for Riva (#8363) * initial commit of bleu score tracking Signed-off-by: Travis Bartley <[email protected]> * initial commit, refactoring aed models for riva Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating Canary to support torch metrics Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fixes Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missed an empty batch conditional Signed-off-by: Travis Bartley <[email protected]> * Fixing dataloader issues Signed-off-by: Travis Bartley <[email protected]> * Finishing merge conflict with transcribe update Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: Travis Bartley <[email protected]> * copyright header fix Signed-off-by: Travis Bartley <[email protected]> * yet another merge conflict Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * making paired data management safer Signed-off-by: Travis Bartley <[email protected]> * sentencepiece needs bigger tokenizer... Signed-off-by: Travis Bartley <[email protected]> * sentencepiece tokenizer vocab needs to be +2 from vocab for canary Signed-off-by: Travis Bartley <[email protected]> * Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves. 
Signed-off-by: Travis Bartley <[email protected]> * merge conflit Signed-off-by: Travis Bartley <[email protected]> * Simplified tokenizer and corrected bug in dataloader Signed-off-by: Travis Bartley <[email protected]> * Cleaning up docstrings and fixing inference bug. Signed-off-by: Travis Bartley <[email protected]> * adding example scripts Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleaning up useless imports Signed-off-by: Travis Bartley <[email protected]> * adding unit tests Signed-off-by: Travis Bartley <[email protected]> * fixing unit tests Signed-off-by: Travis Bartley <[email protected]> * cfg name change Signed-off-by: Travis Bartley <[email protected]> * adding custom check to pass pytests Signed-off-by: Travis Bartley <[email protected]> * removing print script Signed-off-by: Travis Bartley <[email protected]> * catching bugs regarding tokens. Signed-off-by: Travis Bartley <[email protected]> * added docstrings and made examples scripts more generic Signed-off-by: Travis Bartley <[email protected]> * docstring deleted by accident Signed-off-by: Travis Bartley <[email protected]> * plurals in namespace Signed-off-by: Travis Bartley <[email protected]> * changing example script Signed-off-by: Travis Bartley <[email protected]> --------- Signed-off-by: Travis Bartley <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * add alpha scaling to lora (#8248) * removed pdeprecated eft model Signed-off-by: arendu <[email protected]> * add alpha Signed-off-by: arendu <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add alpha scaling to lora (#8483) * 
coldfix (#8412) Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixed errors in the CTM gen functions (#8416) (#8420) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367) * Add change_vocabulary and save_tokenizers() support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * fix path location and branch (#8314) * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <[email protected]> * updat ebranch in tutorial Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Michal Futrega <[email protected]> * Add TP comm overlap knobs to AutocastTransformerLayer (#8290) Signed-off-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add deallocate pipeline output optimization (#8279) (#8318) * 
add deallocate pipeline output optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * remove assertion (#8302) (#8321) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Enable megatron core loggers for GPT pretraining (#8354) (#8384) * Logging changes tested for gpt_pretraining * Additional args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fix dreambooth data sampler issue (#8400) (#8413) * Turn on drop last * Some neva fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * add ensemble decoding fix (#8427) (#8433) Signed-off-by: Nithin Rao 
Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeVA Tutorial Notebook (#8217) * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * add inference via script Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * add codeblocks to run torchrun in notebook Signed-off-by: Pratyush Muthukumar <[email protected]> --------- Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * mcore customization doc minor fix (#8421) (#8437) Signed-off-by: Huiying Li <[email protected]> Co-authored-by: Huiying <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add `loop_labels` algorithm for TDT greedy decoding (#8215) * Add `loop_labels` algorithm for TDT greedy decoding Signed-off-by: Vladimir Bataev <[email protected]> * Use `loop_labels` by default Signed-off-by: Vladimir Bataev <[email protected]> * Loop labels greedy decoding v2 Signed-off-by: Vladimir Bataev <[email protected]> * Add comments. 
Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched hypotheses Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched alignments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix test Signed-off-by: Vladimir Bataev <[email protected]> * Add computer for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Fix TDT decoding algorithm Signed-off-by: Vladimir Bataev <[email protected]> * Use loop frames by default for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Remove "loop frames" implementation for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix confidence. Use tensor for durations. Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add dist ckpt support for regular optimizers (#7749) (#8293) * Add dist ckpt support for regular optimizers * [tutorial] fixed missing RIR scripts file. 
(#8257) * fix imports * imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook * revert asr notebook --------- Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Multimodal r1.23.0 bug fix (#8315) (#8339) * Rename quick-gelu * ddpm config guard * Fix ddpm edit api * Fix insert_image_token cfg issue * neva updates * reformat * Add back jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs * Update default neva template --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * mcore ds fix (#8283) (#8385) * [tutorial] fixed missing RIR scripts file. 
(#8257) * add values to en tts dict (#7879) * mcore ds fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore * revert asr files * add comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset * update mcore version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg * update mcore commit * fix Bert unit tests * update bert tests * fix bert mcore test * fix gpt jenkins tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits * revert apex installation * turn off the fusion for jenkins --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * MCore dataset compatibility for tokenizers (#8390) (#8397) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. 
--------- Signed-off-by: Valerie Sarge <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432) * Improvements for Canary: - carry over custom keys when creatin tarred manifests - selectable text field in ASR eval - get rid of prompt slicing, create proper inference prompts Signed-off-by: Piotr Żelasko <[email protected]> * set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add sbert to IR (#8445) * add sbert to IR Signed-off-by: ataghibakhsh <[email protected]> * add doc Signed-off-by: ataghibakhsh <[email protected]> * fix the auto_tokenizer property method reset bug Signed-off-by: ataghibakhsh <[email protected]> * addressed bot comments Signed-off-by: ataghibakhsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ataghibakhsh <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Update readme (#8440) * update Signed-off-by: eharper <[email protected]> * udpate Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * landing pages added * landing page added for vision * landing pages updated * some minor changes to the main readme * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: 
eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * typo fixed * update Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeMo-Mistral to HF converter bugfix. (#8353) (#8442) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixing mcore bert for TP, PP and SP (#8336) (#8443) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile --------- Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add LoRA support to all linear layers (#7988) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * using new commit of meg-LM Signed-off-by: arendu <[email protected]> * add cpu_offloading_num_layers to conversion script until bug in megatron is fixed Signed-off-by: Chen Cui <[email protected]> * fix peft mixin arguments to follow mcore 0.5 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update megatron commit to fix ci error Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * add cfg default Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add Neva Template for NV-DPO Models (#8358) * add/rename from nvgpt to nv_steerlm, add nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * add nv_dpo conversation to accomendate empty system message Signed-off-by: HuiyingLi <[email protected]> * handle nv_dpo template text generation Signed-off-by: HuiyingLi <[email protected]> * add prompt 
string to nvgpt Signed-off-by: HuiyingLi <[email protected]> * bugfix for inference prompt template Signed-off-by: HuiyingLi <[email protected]> * bug fix for grabbing clean text Signed-off-by: Huiying Li <[email protected]> * fix code format Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> --------- Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email 
protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email 
protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: 
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update PEFT Doc (#8501) * update peft doc Signed-off-by: Chen Cui <[email protected]> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <[email protected]> * fix table 
Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * release updates (#8394) * release updates (#8378) * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <[email protected]> * add values to en tts dict (#7879) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * mcore ds fix Signed-off-by: Dmytro Pykhtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <[email protected]> * revert asr files Signed-off-by: dimapihtar <[email protected]> * add comments Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <[email protected]> * update mcore version Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <[email protected]> * update mcore commit Signed-off-by: dimapihtar <[email protected]> * fix Bert unit tests Signed-off-by: dimapihtar <[email protected]> * update bert tests Signed-off-by: dimapihtar <[email protected]> * fix bert mcore test Signed-off-by: dimapihtar <[email protected]> * fix gpt jenkins tests Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for dict data input type Signed-off-by: dimapihtar <[email protected]> * add mock ds test Signed-off-by: 
dimapihtar <[email protected]> * add test for dict data input type Signed-off-by: dimapihtar <[email protected]> * mcore ds fix Signed-off-by: dimapihtar <[email protected]> * data input fix Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> * Update megatron_gpt_model.py Signed-off-by: Dmytro Pykhtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Signed-off-by: Abhishree <[email protected]> Signed-off-by: yaoyu-33 <[email 
protected]> Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Selvaraj Anandaraj <[email protected]> Signed-off-by: Travis Bartley <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email 
protected]> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: jbaczek <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Jan Baczek <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Naga Venkatesh Gavini <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: tbartley94 <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email 
protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Signed-off-by: ataghibakhsh <[email protected]>
pablo-garay pushed a commit that referenced this pull request Mar 19, 2024
* Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Pablo Garay <[email protected]>
pablo-garay added a commit that referenced this pull request Mar 19, 2024
* MoE parameter passing (#8255) * MoE parameter passing Signed-off-by: Alexandros Koumparoulis <[email protected]> * Pass EP/MoE params in consumer scripts. Signed-off-by: Alexandros Koumparoulis <[email protected]> * PR fixes Signed-off-by: Alexandros Koumparoulis <[email protected]> * Use latest commit of mcore-0.5 Signed-off-by: Alexandros Koumparoulis <[email protected]> * CI fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Jiaqiz/option to disable adapters & merge all lora layers (#8029) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. 
* use adapter only when it is enabled Signed-off-by: jiaqi zeng <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lora merge script (#8113) Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * add peft ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * merge lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * support/fix cpu initialization Signed-off-by: Chen Cui <[email protected]> * add example usage Signed-off-by: Chen Cui <[email protected]> * fix TP due to distributed checkpoint Signed-off-by: Chen Cui <[email protected]> * updating the logic of merging lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * merge in fp32 then cast back Signed-off-by: Jiaqi Zeng <[email protected]> * remove ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * fix import Signed-off-by: Jiaqi Zeng <[email protected]> --------- Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update k2 version (#8478) Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek 
<[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email 
protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile 
Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. 
Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> * Handle float limit_val_batches (#8426) * Handle float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Rectify reconfiguration of float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Remove unused imports Signed-off-by: Abhishree <[email protected]> * Scale len(val_dataloader) with float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Return len(dataloader) in microbatches Signed-off-by: Abhishree <[email protected]> * Add back resetting of num val samples Signed-off-by: Abhishree <[email protected]> * Fix to ensure float limit_val_batches is multiple of num_micro_batches Signed-off-by: Abhishree <[email protected]> * Remove forcing eval samples to 1 for float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Fix bug wrt 0 limiot_val_batches Signed-off-by: Abhishree <[email protected]> * Add missing mock_dataset line Signed-off-by: Abhishree <[email protected]> * Avoid ensuring limit_val_batches is a mutliple of microbatches for 1.0 Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore the hack forcing number of validation and test epochs to 1 Signed-off-by: Jan Baczek <[email protected]> * 
Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jan Baczek <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Fix tutorial links in user guide (#8497) Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Sequence Parallel for LoRA (#8369) * support lora + sequence parallel Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more comments Signed-off-by: Chen Cui <[email protected]> * add lora SP CI test Signed-off-by: Chen Cui <[email protected]> * support lora for all linear modules as in #7988 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Call proper method to replace (#8498) Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Added memory logger (#8395) * Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros 
Koumparoulis <[email protected]> * Canary refactor for Riva (#8363) * initial commit of bleu score tracking Signed-off-by: Travis Bartley <[email protected]> * initial commit, refactoring aed models for riva Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating Canary to support torch metrics Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fixes Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missed an empty batch conditional Signed-off-by: Travis Bartley <[email protected]> * Fixing dataloader issues Signed-off-by: Travis Bartley <[email protected]> * Finishing merge conflict with transcribe update Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: Travis Bartley <[email protected]> * copyright header fix Signed-off-by: Travis Bartley <[email protected]> * yet another merge conflict Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * making paired data management safer Signed-off-by: Travis Bartley <[email protected]> * sentencepiece needs bigger tokenizer... Signed-off-by: Travis Bartley <[email protected]> * sentencepiece tokenizer vocab needs to be +2 from vocab for canary Signed-off-by: Travis Bartley <[email protected]> * Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves. 
Signed-off-by: Travis Bartley <[email protected]> * merge conflit Signed-off-by: Travis Bartley <[email protected]> * Simplified tokenizer and corrected bug in dataloader Signed-off-by: Travis Bartley <[email protected]> * Cleaning up docstrings and fixing inference bug. Signed-off-by: Travis Bartley <[email protected]> * adding example scripts Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleaning up useless imports Signed-off-by: Travis Bartley <[email protected]> * adding unit tests Signed-off-by: Travis Bartley <[email protected]> * fixing unit tests Signed-off-by: Travis Bartley <[email protected]> * cfg name change Signed-off-by: Travis Bartley <[email protected]> * adding custom check to pass pytests Signed-off-by: Travis Bartley <[email protected]> * removing print script Signed-off-by: Travis Bartley <[email protected]> * catching bugs regarding tokens. Signed-off-by: Travis Bartley <[email protected]> * added docstrings and made examples scripts more generic Signed-off-by: Travis Bartley <[email protected]> * docstring deleted by accident Signed-off-by: Travis Bartley <[email protected]> * plurals in namespace Signed-off-by: Travis Bartley <[email protected]> * changing example script Signed-off-by: Travis Bartley <[email protected]> --------- Signed-off-by: Travis Bartley <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * add alpha scaling to lora (#8248) * removed pdeprecated eft model Signed-off-by: arendu <[email protected]> * add alpha Signed-off-by: arendu <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add alpha scaling to lora (#8483) * 
coldfix (#8412) Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixed errors in the CTM gen functions (#8416) (#8420) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367) * Add change_vocabulary and save_tokenizers() support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * fix path location and branch (#8314) * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <[email protected]> * updat ebranch in tutorial Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Michal Futrega <[email protected]> * Add TP comm overlap knobs to AutocastTransformerLayer (#8290) Signed-off-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add deallocate pipeline output optimization (#8279) (#8318) * 
add deallocate pipeline output optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * remove assertion (#8302) (#8321) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Enable megatron core loggers for GPT pretraining (#8354) (#8384) * Logging changes tested for gpt_pretraining * Additional args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fix dreambooth data sampler issue (#8400) (#8413) * Turn on drop last * Some neva fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * add ensemble decoding fix (#8427) (#8433) Signed-off-by: Nithin Rao 
Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeVA Tutorial Notebook (#8217) * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * add inference via script Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * add codeblocks to run torchrun in notebook Signed-off-by: Pratyush Muthukumar <[email protected]> --------- Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * mcore customization doc minor fix (#8421) (#8437) Signed-off-by: Huiying Li <[email protected]> Co-authored-by: Huiying <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add `loop_labels` algorithm for TDT greedy decoding (#8215) * Add `loop_labels` algorithm for TDT greedy decoding Signed-off-by: Vladimir Bataev <[email protected]> * Use `loop_labels` by default Signed-off-by: Vladimir Bataev <[email protected]> * Loop labels greedy decoding v2 Signed-off-by: Vladimir Bataev <[email protected]> * Add comments. 
Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched hypotheses Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched alignments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix test Signed-off-by: Vladimir Bataev <[email protected]> * Add computer for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Fix TDT decoding algorithm Signed-off-by: Vladimir Bataev <[email protected]> * Use loop frames by default for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Remove "loop frames" implementation for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix confidence. Use tensor for durations. Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add dist ckpt support for regular optimizers (#7749) (#8293) * Add dist ckpt support for regular optimizers * [tutorial] fixed missing RIR scripts file. 
(#8257) * fix imports * imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook * revert asr notebook --------- Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Multimodal r1.23.0 bug fix (#8315) (#8339) * Rename quick-gelu * ddpm config guard * Fix ddpm edit api * Fix insert_image_token cfg issue * neva updates * reformat * Add back jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs * Update default neva template --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * mcore ds fix (#8283) (#8385) * [tutorial] fixed missing RIR scripts file. 
(#8257) * add values to en tts dict (#7879) * mcore ds fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore * revert asr files * add comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset * update mcore version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg * update mcore commit * fix Bert unit tests * update bert tests * fix bert mcore test * fix gpt jenkins tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits * revert apex installation * turn off the fusion for jenkins --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * MCore dataset compatibility for tokenizers (#8390) (#8397) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. 
--------- Signed-off-by: Valerie Sarge <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432) * Improvements for Canary: - carry over custom keys when creatin tarred manifests - selectable text field in ASR eval - get rid of prompt slicing, create proper inference prompts Signed-off-by: Piotr Żelasko <[email protected]> * set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add sbert to IR (#8445) * add sbert to IR Signed-off-by: ataghibakhsh <[email protected]> * add doc Signed-off-by: ataghibakhsh <[email protected]> * fix the auto_tokenizer property method reset bug Signed-off-by: ataghibakhsh <[email protected]> * addressed bot comments Signed-off-by: ataghibakhsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ataghibakhsh <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Update readme (#8440) * update Signed-off-by: eharper <[email protected]> * udpate Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * landing pages added * landing page added for vision * landing pages updated * some minor changes to the main readme * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: 
eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * typo fixed * update Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeMo-Mistral to HF converter bugfix. (#8353) (#8442) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixing mcore bert for TP, PP and SP (#8336) (#8443) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile --------- Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add LoRA support to all linear layers (#7988) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * using new commit of meg-LM Signed-off-by: arendu <[email protected]> * add cpu_offloading_num_layers to conversion script until bug in megatron is fixed Signed-off-by: Chen Cui <[email protected]> * fix peft mixin arguments to follow mcore 0.5 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update megatron commit to fix ci error Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * add cfg default Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add Neva Template for NV-DPO Models (#8358) * add/rename from nvgpt to nv_steerlm, add nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * add nv_dpo conversation to accomendate empty system message Signed-off-by: HuiyingLi <[email protected]> * handle nv_dpo template text generation Signed-off-by: HuiyingLi <[email protected]> * add prompt 
string to nvgpt Signed-off-by: HuiyingLi <[email protected]> * bugfix for inference prompt template Signed-off-by: HuiyingLi <[email protected]> * bug fix for grabbing clean text Signed-off-by: Huiying Li <[email protected]> * fix code format Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> --------- Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email 
protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email 
protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: 
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update PEFT Doc (#8501) * update peft doc Signed-off-by: Chen Cui <[email protected]> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <[email protected]> * fix table 
Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * release updates (#8394) * release updates (#8378) * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <[email protected]> * add values to en tts dict (#7879) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * mcore ds fix Signed-off-by: Dmytro Pykhtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <[email protected]> * revert asr files Signed-off-by: dimapihtar <[email protected]> * add comments Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <[email protected]> * update mcore version Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <[email protected]> * update mcore commit Signed-off-by: dimapihtar <[email protected]> * fix Bert unit tests Signed-off-by: dimapihtar <[email protected]> * update bert tests Signed-off-by: dimapihtar <[email protected]> * fix bert mcore test Signed-off-by: dimapihtar <[email protected]> * fix gpt jenkins tests Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for dict data input type Signed-off-by: dimapihtar <[email protected]> * add mock ds test Signed-off-by: 
dimapihtar <[email protected]> * add test for dict data input type Signed-off-by: dimapihtar <[email protected]> * mcore ds fix Signed-off-by: dimapihtar <[email protected]> * data input fix Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> * Update megatron_gpt_model.py Signed-off-by: Dmytro Pykhtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Signed-off-by: Abhishree <[email protected]> Signed-off-by: yaoyu-33 <[email 
protected]> Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Selvaraj Anandaraj <[email protected]> Signed-off-by: Travis Bartley <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email 
protected]> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: jbaczek <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Jan Baczek <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Naga Venkatesh Gavini <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: tbartley94 <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email 
protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Signed-off-by: Pablo Garay <[email protected]>
ericharper added a commit that referenced this pull request Mar 19, 2024
* Refactor conversion scripts one in all Signed-off-by: yaoyu-33 <[email protected]> * Move bert converter Signed-off-by: yaoyu-33 <[email protected]> * [TTS] Add modules for mel spectrogram codec (#8238) * [TTS] Add modules for mel spectrogram codec Signed-off-by: Ryan <[email protected]> * [TTS] Add mel band validation Signed-off-by: Ryan <[email protected]> * [TTS] Add fullband mel encoder and more documentation Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * coldfix (#8412) Signed-off-by: George Zelenfroynd <[email protected]> * Fixed errors in the CTM gen functions (#8416) (#8420) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> * Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367) * Add change_vocabulary and save_tokenizers() support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> * fix path location and branch (#8314) * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <[email protected]> * updat ebranch in tutorial Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> 
Co-authored-by: Nithin Rao Koluguri <nithinraok> * Add TP comm overlap knobs to AutocastTransformerLayer (#8290) Signed-off-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> * add deallocate pipeline output optimization (#8279) (#8318) * add deallocate pipeline output optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * remove assertion (#8302) (#8321) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Enable megatron core loggers for GPT pretraining (#8354) (#8384) * Logging changes tested for gpt_pretraining * Additional args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * Fix dreambooth data sampler issue (#8400) (#8413) * Turn on drop last * Some neva fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * add ensemble decoding fix (#8427) (#8433) 
Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * NeVA Tutorial Notebook (#8217) * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * add inference via script Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * add codeblocks to run torchrun in notebook Signed-off-by: Pratyush Muthukumar <[email protected]> --------- Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> * mcore customization doc minor fix (#8421) (#8437) Signed-off-by: Huiying Li <[email protected]> Co-authored-by: Huiying <[email protected]> * Add `loop_labels` algorithm for TDT greedy decoding (#8215) * Add `loop_labels` algorithm for TDT greedy decoding Signed-off-by: Vladimir Bataev <[email protected]> * Use `loop_labels` by default Signed-off-by: Vladimir Bataev <[email protected]> * Loop labels greedy decoding v2 Signed-off-by: Vladimir Bataev <[email protected]> * Add comments. 
Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched hypotheses Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched alignments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix test Signed-off-by: Vladimir Bataev <[email protected]> * Add computer for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Fix TDT decoding algorithm Signed-off-by: Vladimir Bataev <[email protected]> * Use loop frames by default for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Remove "loop frames" implementation for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix confidence. Use tensor for durations. Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> * Add dist ckpt support for regular optimizers (#7749) (#8293) * Add dist ckpt support for regular optimizers * [tutorial] fixed missing RIR scripts file. 
(#8257) * fix imports * imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook * revert asr notebook --------- Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Multimodal r1.23.0 bug fix (#8315) (#8339) * Rename quick-gelu * ddpm config guard * Fix ddpm edit api * Fix insert_image_token cfg issue * neva updates * reformat * Add back jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs * Update default neva template --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * mcore ds fix (#8283) (#8385) * [tutorial] fixed missing RIR scripts file. 
(#8257) * add values to en tts dict (#7879) * mcore ds fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore * revert asr files * add comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset * update mcore version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg * update mcore commit * fix Bert unit tests * update bert tests * fix bert mcore test * fix gpt jenkins tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits * revert apex installation * turn off the fusion for jenkins --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> * MCore dataset compatibility for tokenizers (#8390) (#8397) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. 
--------- Signed-off-by: Valerie Sarge <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432) * Improvements for Canary: - carry over custom keys when creatin tarred manifests - selectable text field in ASR eval - get rid of prompt slicing, create proper inference prompts Signed-off-by: Piotr Żelasko <[email protected]> * set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> * add sbert to IR (#8445) * add sbert to IR Signed-off-by: ataghibakhsh <[email protected]> * add doc Signed-off-by: ataghibakhsh <[email protected]> * fix the auto_tokenizer property method reset bug Signed-off-by: ataghibakhsh <[email protected]> * addressed bot comments Signed-off-by: ataghibakhsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ataghibakhsh <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update readme (#8440) * update Signed-off-by: eharper <[email protected]> * udpate Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * landing pages added * landing page added for vision * landing pages updated * some minor changes to the main readme * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update 
Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * typo fixed * update Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> * NeMo-Mistral to HF converter bugfix. (#8353) (#8442) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> * Fixing mcore bert for TP, PP and SP (#8336) (#8443) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile --------- Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Add LoRA support to all linear layers (#7988) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MCoreMixin chages. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * using new commit of meg-LM Signed-off-by: arendu <[email protected]> * add cpu_offloading_num_layers to conversion script until bug in megatron is fixed Signed-off-by: Chen Cui <[email protected]> * fix peft mixin arguments to follow mcore 0.5 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update megatron commit to fix ci error Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * add cfg default Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Add Neva Template for NV-DPO Models (#8358) * add/rename from nvgpt to nv_steerlm, add nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * add nv_dpo conversation to accomendate empty system message Signed-off-by: HuiyingLi <[email protected]> * handle nv_dpo template text generation Signed-off-by: HuiyingLi <[email protected]> * add prompt string to nvgpt Signed-off-by: HuiyingLi <[email protected]> * bugfix for inference prompt template Signed-off-by: HuiyingLi <[email protected]> * bug fix for grabbing clean text Signed-off-by: Huiying Li <[email protected]> * fix code format Signed-off-by: Huiying Li <[email protected]> --------- 
Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Huiying Li <[email protected]> * Account for mpirun use case in get_rank (#8429) Signed-off-by: Jan Lasek <[email protected]> * Add settings to suppress bf16 compile errors in CI on V100 (#8481) (#8482) * Add settings to suppress bf16 compile errors in CI on V100 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix canary chunk infer bug (#8449) * fix chunk infer bug Signed-off-by: stevehuang52 <[email protected]> * add support for duration=None, add lhotse support for relative audio path Signed-off-by: stevehuang52 <[email protected]> * add tests Signed-off-by: stevehuang52 <[email protected]> --------- Signed-off-by: stevehuang52 <[email protected]> * Add Baichuan2 support (#8282) * Add Baichuan2 support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Reworked MegatronPretrainingRandomBatchSampler to correctly handle epochs > 1 (#7920) * Initital commit of reworked MegatronPretrainingRandomBatchSampler Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed small length based bug Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Daniel Egert <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Euynaheh <[email protected]> * Add Baichuan2 support Signed-off-by: Euynaheh <[email protected]> * Add NeMo to HF conversion * fix code format * [pre-commit.ci] 
auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix code format * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Baichuan jenkins test * add_BOS bug fix * Update Jenkinsfile Signed-off-by: Euynaheh <[email protected]> --------- Signed-off-by: Daniel Egert <[email protected]> Signed-off-by: Euynaheh <[email protected]> Signed-off-by: Euynaheh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: trias702 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> * Jiaqiz/option to disable adapters & merge all lora layers (#8029) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * use adapter only when it is enabled Signed-off-by: jiaqi zeng <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lora merge script (#8113) Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * add peft ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * merge lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * support/fix cpu initialization Signed-off-by: Chen Cui <[email protected]> * add example usage Signed-off-by: Chen Cui <[email protected]> * fix TP due to distributed checkpoint Signed-off-by: Chen Cui <[email protected]> * updating the logic of merging lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * MCoreMixin chages. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * merge in fp32 then cast back Signed-off-by: Jiaqi Zeng <[email protected]> * remove ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * fix import Signed-off-by: Jiaqi Zeng <[email protected]> --------- Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> * Update k2 version (#8478) Signed-off-by: Vladimir Bataev <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update 
nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. 
Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> * Handle float limit_val_batches (#8426) * Handle float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Rectify reconfiguration of float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Remove unused imports Signed-off-by: Abhishree <[email protected]> * Scale len(val_dataloader) with float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Return len(dataloader) in microbatches Signed-off-by: Abhishree <[email protected]> * Add back resetting of num val samples Signed-off-by: Abhishree <[email protected]> * Fix to ensure float limit_val_batches is multiple of num_micro_batches Signed-off-by: Abhishree <[email protected]> * Remove forcing eval samples to 1 for float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Fix bug wrt 0 limiot_val_batches Signed-off-by: Abhishree <[email protected]> * Add missing mock_dataset line Signed-off-by: Abhishree <[email protected]> * Avoid ensuring limit_val_batches is a mutliple of microbatches for 1.0 Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore the hack forcing number of validation and test epochs to 1 Signed-off-by: Jan Baczek <[email protected]> * 
Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jan Baczek <[email protected]> * Fix tutorial links in user guide (#8497) Signed-off-by: yaoyu-33 <[email protected]> * Sequence Parallel for LoRA (#8369) * support lora + sequence parallel Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more comments Signed-off-by: Chen Cui <[email protected]> * add lora SP CI test Signed-off-by: Chen Cui <[email protected]> * support lora for all linear modules as in #7988 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Call proper method to replace (#8498) Signed-off-by: Naga Venkatesh Gavini <[email protected]> * Added memory logger (#8395) * Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * Canary refactor for Riva (#8363) * initial commit of bleu score tracking Signed-off-by: Travis Bartley <[email protected]> * initial commit, refactoring aed models for riva Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com 
hooks for more information, see https://pre-commit.ci * Updating Canary to support torch metrics Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fixes Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missed an empty batch conditional Signed-off-by: Travis Bartley <[email protected]> * Fixing dataloader issues Signed-off-by: Travis Bartley <[email protected]> * Finishing merge conflict with transcribe update Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: Travis Bartley <[email protected]> * copyright header fix Signed-off-by: Travis Bartley <[email protected]> * yet another merge conflict Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * making paired data management safer Signed-off-by: Travis Bartley <[email protected]> * sentencepiece needs bigger tokenizer... Signed-off-by: Travis Bartley <[email protected]> * sentencepiece tokenizer vocab needs to be +2 from vocab for canary Signed-off-by: Travis Bartley <[email protected]> * Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves. Signed-off-by: Travis Bartley <[email protected]> * merge conflit Signed-off-by: Travis Bartley <[email protected]> * Simplified tokenizer and corrected bug in dataloader Signed-off-by: Travis Bartley <[email protected]> * Cleaning up docstrings and fixing inference bug. 
Signed-off-by: Travis Bartley <[email protected]> * adding example scripts Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleaning up useless imports Signed-off-by: Travis Bartley <[email protected]> * adding unit tests Signed-off-by: Travis Bartley <[email protected]> * fixing unit tests Signed-off-by: Travis Bartley <[email protected]> * cfg name change Signed-off-by: Travis Bartley <[email protected]> * adding custom check to pass pytests Signed-off-by: Travis Bartley <[email protected]> * removing print script Signed-off-by: Travis Bartley <[email protected]> * catching bugs regarding tokens. Signed-off-by: Travis Bartley <[email protected]> * added docstrings and made examples scripts more generic Signed-off-by: Travis Bartley <[email protected]> * docstring deleted by accident Signed-off-by: Travis Bartley <[email protected]> * plurals in namespace Signed-off-by: Travis Bartley <[email protected]> * changing example script Signed-off-by: Travis Bartley <[email protected]> --------- Signed-off-by: Travis Bartley <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> * add alpha scaling to lora (#8248) * removed pdeprecated eft model Signed-off-by: arendu <[email protected]> * add alpha Signed-off-by: arendu <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add alpha scaling to lora (#8483) * coldfix (#8412) Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixed errors in the CTM gen functions (#8416) (#8420) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add 
change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367) * Add change_vocabulary and save_tokenizers() support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * fix path location and branch (#8314) * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <[email protected]> * updat ebranch in tutorial Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Michal Futrega <[email protected]> * Add TP comm overlap knobs to AutocastTransformerLayer (#8290) Signed-off-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add deallocate pipeline output optimization (#8279) (#8318) * add deallocate pipeline output optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * remove assertion (#8302) (#8321) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Enable megatron core loggers for GPT pretraining (#8354) (#8384) * Logging changes tested for gpt_pretraining * Additional args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fix dreambooth data sampler issue (#8400) (#8413) * Turn on drop last * Some neva fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * add ensemble decoding fix (#8427) (#8433) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeVA Tutorial Notebook (#8217) * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> * 
init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * add inference via script Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * add codeblocks to run torchrun in notebook Signed-off-by: Pratyush Muthukumar <[email protected]> --------- Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * mcore customization doc minor fix (#8421) (#8437) Signed-off-by: Huiying Li <[email protected]> Co-authored-by: Huiying <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add `loop_labels` algorithm for TDT greedy decoding (#8215) * Add `loop_labels` algorithm for TDT greedy decoding Signed-off-by: Vladimir Bataev <[email protected]> * Use `loop_labels` by default Signed-off-by: Vladimir Bataev <[email protected]> * Loop labels greedy decoding v2 Signed-off-by: Vladimir Bataev <[email protected]> * Add comments. 
Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched hypotheses Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched alignments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix test Signed-off-by: Vladimir Bataev <[email protected]> * Add computer for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Fix TDT decoding algorithm Signed-off-by: Vladimir Bataev <[email protected]> * Use loop frames by default for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Remove "loop frames" implementation for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix confidence. Use tensor for durations. Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add dist ckpt support for regular optimizers (#7749) (#8293) * Add dist ckpt support for regular optimizers * [tutorial] fixed missing RIR scripts file. 
(#8257) * fix imports * imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook * revert asr notebook --------- Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Multimodal r1.23.0 bug fix (#8315) (#8339) * Rename quick-gelu * ddpm config guard * Fix ddpm edit api * Fix insert_image_token cfg issue * neva updates * reformat * Add back jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs * Update default neva template --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * mcore ds fix (#8283) (#8385) * [tutorial] fixed missing RIR scripts file. 
(#8257) * add values to en tts dict (#7879) * mcore ds fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore * revert asr files * add comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset * update mcore version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg * update mcore commit * fix Bert unit tests * update bert tests * fix bert mcore test * fix gpt jenkins tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits * revert apex installation * turn off the fusion for jenkins --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * MCore dataset compatibility for tokenizers (#8390) (#8397) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. 
--------- Signed-off-by: Valerie Sarge <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432) * Improvements for Canary: - carry over custom keys when creatin tarred manifests - selectable text field in ASR eval - get rid of prompt slicing, create proper inference prompts Signed-off-by: Piotr Żelasko <[email protected]> * set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add sbert to IR (#8445) * add sbert to IR Signed-off-by: ataghibakhsh <[email protected]> * add doc Signed-off-by: ataghibakhsh <[email protected]> * fix the auto_tokenizer property method reset bug Signed-off-by: ataghibakhsh <[email protected]> * addressed bot comments Signed-off-by: ataghibakhsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ataghibakhsh <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Update readme (#8440) * update Signed-off-by: eharper <[email protected]> * udpate Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * landing pages added * landing page added for vision * landing pages updated * some minor changes to the main readme * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: 
eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * typo fixed * update Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeMo-Mistral to HF converter bugfix. (#8353) (#8442) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixing mcore bert for TP, PP and SP (#8336) (#8443) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile --------- Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add LoRA support to all linear layers (#7988) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * using new commit of meg-LM Signed-off-by: arendu <[email protected]> * add cpu_offloading_num_layers to conversion script until bug in megatron is fixed Signed-off-by: Chen Cui <[email protected]> * fix peft mixin arguments to follow mcore 0.5 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update megatron commit to fix ci error Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * add cfg default Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add Neva Template for NV-DPO Models (#8358) * add/rename from nvgpt to nv_steerlm, add nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * add nv_dpo conversation to accomendate empty system message Signed-off-by: HuiyingLi <[email protected]> * handle nv_dpo template text generation Signed-off-by: HuiyingLi <[email protected]> * add prompt 
string to nvgpt Signed-off-by: HuiyingLi <[email protected]> * bugfix for inference prompt template Signed-off-by: HuiyingLi <[email protected]> * bug fix for grabbing clean text Signed-off-by: Huiying Li <[email protected]> * fix code format Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> --------- Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email 
protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email 
protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: 
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> * Update PEFT Doc (#8501) * update peft doc Signed-off-by: Chen Cui <[email protected]> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * fix table 
Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> * release updates (#8394) * release updates (#8378) * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <[email protected]> * add values to en tts dict (#7879) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * mcore ds fix Signed-off-by: Dmytro Pykhtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <[email protected]> * revert asr files Signed-off-by: dimapihtar <[email protected]> * add comments Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <[email protected]> * update mcore version Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <[email protected]> * update mcore commit Signed-off-by: dimapihtar <[email protected]> * fix Bert unit tests Signed-off-by: dimapihtar <[email protected]> * update bert tests Signed-off-by: dimapihtar <[email protected]> * fix bert mcore test Signed-off-by: dimapihtar <[email protected]> * fix gpt jenkins tests Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for dict data input type Signed-off-by: dimapihtar <[email protected]> * add mock ds test Signed-off-by: dimapihtar <[email protected]> * add test for dict data input type Signed-off-by: dimapihtar <[email protected]> * 
mcore ds fix Signed-off-by: dimapihtar <[email protected]> * data input fix Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> * Update megatron_gpt_model.py Signed-off-by: Dmytro Pykhtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana G…
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* MoE parameter passing (#8255) * MoE parameter passing Signed-off-by: Alexandros Koumparoulis <[email protected]> * Pass EP/MoE params in consumer scripts. Signed-off-by: Alexandros Koumparoulis <[email protected]> * PR fixes Signed-off-by: Alexandros Koumparoulis <[email protected]> * Use latest commit of mcore-0.5 Signed-off-by: Alexandros Koumparoulis <[email protected]> * CI fix Signed-off-by: Alexandros Koumparoulis <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Jiaqiz/option to disable adapters & merge all lora layers (#8029) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. 
* use adapter only when it is enabled Signed-off-by: jiaqi zeng <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lora merge script (#8113) Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * add peft ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * merge lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * support/fix cpu initialization Signed-off-by: Chen Cui <[email protected]> * add example usage Signed-off-by: Chen Cui <[email protected]> * fix TP due to distributed checkpoint Signed-off-by: Chen Cui <[email protected]> * updating the logic of merging lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * merge in fp32 then cast back Signed-off-by: Jiaqi Zeng <[email protected]> * remove ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * fix import Signed-off-by: Jiaqi Zeng <[email protected]> --------- Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update k2 version (#8478) Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek 
<[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email 
protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile 
Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. 
Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> * Handle float limit_val_batches (#8426) * Handle float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Rectify reconfiguration of float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Remove unused imports Signed-off-by: Abhishree <[email protected]> * Scale len(val_dataloader) with float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Return len(dataloader) in microbatches Signed-off-by: Abhishree <[email protected]> * Add back resetting of num val samples Signed-off-by: Abhishree <[email protected]> * Fix to ensure float limit_val_batches is multiple of num_micro_batches Signed-off-by: Abhishree <[email protected]> * Remove forcing eval samples to 1 for float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Fix bug wrt 0 limiot_val_batches Signed-off-by: Abhishree <[email protected]> * Add missing mock_dataset line Signed-off-by: Abhishree <[email protected]> * Avoid ensuring limit_val_batches is a mutliple of microbatches for 1.0 Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore the hack forcing number of validation and test epochs to 1 Signed-off-by: Jan Baczek <[email protected]> * 
Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jan Baczek <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Fix tutorial links in user guide (#8497) Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Sequence Parallel for LoRA (#8369) * support lora + sequence parallel Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more comments Signed-off-by: Chen Cui <[email protected]> * add lora SP CI test Signed-off-by: Chen Cui <[email protected]> * support lora for all linear modules as in #7988 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Call proper method to replace (#8498) Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Added memory logger (#8395) * Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros 
Koumparoulis <[email protected]> * Canary refactor for Riva (#8363) * initial commit of bleu score tracking Signed-off-by: Travis Bartley <[email protected]> * initial commit, refactoring aed models for riva Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updating Canary to support torch metrics Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fixes Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missed an empty batch conditional Signed-off-by: Travis Bartley <[email protected]> * Fixing dataloader issues Signed-off-by: Travis Bartley <[email protected]> * Finishing merge conflict with transcribe update Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: Travis Bartley <[email protected]> * copyright header fix Signed-off-by: Travis Bartley <[email protected]> * yet another merge conflict Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * making paired data management safer Signed-off-by: Travis Bartley <[email protected]> * sentencepiece needs bigger tokenizer... Signed-off-by: Travis Bartley <[email protected]> * sentencepiece tokenizer vocab needs to be +2 from vocab for canary Signed-off-by: Travis Bartley <[email protected]> * Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves. 
Signed-off-by: Travis Bartley <[email protected]> * merge conflit Signed-off-by: Travis Bartley <[email protected]> * Simplified tokenizer and corrected bug in dataloader Signed-off-by: Travis Bartley <[email protected]> * Cleaning up docstrings and fixing inference bug. Signed-off-by: Travis Bartley <[email protected]> * adding example scripts Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleaning up useless imports Signed-off-by: Travis Bartley <[email protected]> * adding unit tests Signed-off-by: Travis Bartley <[email protected]> * fixing unit tests Signed-off-by: Travis Bartley <[email protected]> * cfg name change Signed-off-by: Travis Bartley <[email protected]> * adding custom check to pass pytests Signed-off-by: Travis Bartley <[email protected]> * removing print script Signed-off-by: Travis Bartley <[email protected]> * catching bugs regarding tokens. Signed-off-by: Travis Bartley <[email protected]> * added docstrings and made examples scripts more generic Signed-off-by: Travis Bartley <[email protected]> * docstring deleted by accident Signed-off-by: Travis Bartley <[email protected]> * plurals in namespace Signed-off-by: Travis Bartley <[email protected]> * changing example script Signed-off-by: Travis Bartley <[email protected]> --------- Signed-off-by: Travis Bartley <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * add alpha scaling to lora (#8248) * removed pdeprecated eft model Signed-off-by: arendu <[email protected]> * add alpha Signed-off-by: arendu <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add alpha scaling to lora (#8483) * 
coldfix (#8412) Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixed errors in the CTM gen functions (#8416) (#8420) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367) * Add change_vocabulary and save_tokenizers() support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * fix path location and branch (#8314) * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <[email protected]> * updat ebranch in tutorial Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Michal Futrega <[email protected]> * Add TP comm overlap knobs to AutocastTransformerLayer (#8290) Signed-off-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add deallocate pipeline output optimization (#8279) (#8318) * 
add deallocate pipeline output optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * remove assertion (#8302) (#8321) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Enable megatron core loggers for GPT pretraining (#8354) (#8384) * Logging changes tested for gpt_pretraining * Additional args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fix dreambooth data sampler issue (#8400) (#8413) * Turn on drop last * Some neva fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * add ensemble decoding fix (#8427) (#8433) Signed-off-by: Nithin Rao 
Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeVA Tutorial Notebook (#8217) * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * add inference via script Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * add codeblocks to run torchrun in notebook Signed-off-by: Pratyush Muthukumar <[email protected]> --------- Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * mcore customization doc minor fix (#8421) (#8437) Signed-off-by: Huiying Li <[email protected]> Co-authored-by: Huiying <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add `loop_labels` algorithm for TDT greedy decoding (#8215) * Add `loop_labels` algorithm for TDT greedy decoding Signed-off-by: Vladimir Bataev <[email protected]> * Use `loop_labels` by default Signed-off-by: Vladimir Bataev <[email protected]> * Loop labels greedy decoding v2 Signed-off-by: Vladimir Bataev <[email protected]> * Add comments. 
Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched hypotheses Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched alignments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix test Signed-off-by: Vladimir Bataev <[email protected]> * Add computer for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Fix TDT decoding algorithm Signed-off-by: Vladimir Bataev <[email protected]> * Use loop frames by default for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Remove "loop frames" implementation for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix confidence. Use tensor for durations. Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add dist ckpt support for regular optimizers (#7749) (#8293) * Add dist ckpt support for regular optimizers * [tutorial] fixed missing RIR scripts file. 
(#8257) * fix imports * imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook * revert asr notebook --------- Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Multimodal r1.23.0 bug fix (#8315) (#8339) * Rename quick-gelu * ddpm config guard * Fix ddpm edit api * Fix insert_image_token cfg issue * neva updates * reformat * Add back jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs * Update default neva template --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * mcore ds fix (#8283) (#8385) * [tutorial] fixed missing RIR scripts file. 
(#8257) * add values to en tts dict (#7879) * mcore ds fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore * revert asr files * add comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset * update mcore version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg * update mcore commit * fix Bert unit tests * update bert tests * fix bert mcore test * fix gpt jenkins tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits * revert apex installation * turn off the fusion for jenkins --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * MCore dataset compatibility for tokenizers (#8390) (#8397) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. 
--------- Signed-off-by: Valerie Sarge <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432) * Improvements for Canary: - carry over custom keys when creatin tarred manifests - selectable text field in ASR eval - get rid of prompt slicing, create proper inference prompts Signed-off-by: Piotr Żelasko <[email protected]> * set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add sbert to IR (#8445) * add sbert to IR Signed-off-by: ataghibakhsh <[email protected]> * add doc Signed-off-by: ataghibakhsh <[email protected]> * fix the auto_tokenizer property method reset bug Signed-off-by: ataghibakhsh <[email protected]> * addressed bot comments Signed-off-by: ataghibakhsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ataghibakhsh <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Update readme (#8440) * update Signed-off-by: eharper <[email protected]> * udpate Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * landing pages added * landing page added for vision * landing pages updated * some minor changes to the main readme * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: 
eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * typo fixed * update Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeMo-Mistral to HF converter bugfix. (#8353) (#8442) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixing mcore bert for TP, PP and SP (#8336) (#8443) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile --------- Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add LoRA support to all linear layers (#7988) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * using new commit of meg-LM Signed-off-by: arendu <[email protected]> * add cpu_offloading_num_layers to conversion script until bug in megatron is fixed Signed-off-by: Chen Cui <[email protected]> * fix peft mixin arguments to follow mcore 0.5 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update megatron commit to fix ci error Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * add cfg default Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add Neva Template for NV-DPO Models (#8358) * add/rename from nvgpt to nv_steerlm, add nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * add nv_dpo conversation to accomendate empty system message Signed-off-by: HuiyingLi <[email protected]> * handle nv_dpo template text generation Signed-off-by: HuiyingLi <[email protected]> * add prompt 
string to nvgpt Signed-off-by: HuiyingLi <[email protected]> * bugfix for inference prompt template Signed-off-by: HuiyingLi <[email protected]> * bug fix for grabbing clean text Signed-off-by: Huiying Li <[email protected]> * fix code format Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> --------- Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email 
protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email 
protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: 
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * Update PEFT Doc (#8501) * update peft doc Signed-off-by: Chen Cui <[email protected]> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <[email protected]> * fix table 
Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> * release updates (#8394) * release updates (#8378) * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <[email protected]> * add values to en tts dict (#7879) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * mcore ds fix Signed-off-by: Dmytro Pykhtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <[email protected]> * revert asr files Signed-off-by: dimapihtar <[email protected]> * add comments Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <[email protected]> * update mcore version Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <[email protected]> * update mcore commit Signed-off-by: dimapihtar <[email protected]> * fix Bert unit tests Signed-off-by: dimapihtar <[email protected]> * update bert tests Signed-off-by: dimapihtar <[email protected]> * fix bert mcore test Signed-off-by: dimapihtar <[email protected]> * fix gpt jenkins tests Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for dict data input type Signed-off-by: dimapihtar <[email protected]> * add mock ds test Signed-off-by: 
dimapihtar <[email protected]> * add test for dict data input type Signed-off-by: dimapihtar <[email protected]> * mcore ds fix Signed-off-by: dimapihtar <[email protected]> * data input fix Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> * Update megatron_gpt_model.py Signed-off-by: Dmytro Pykhtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> --------- Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Signed-off-by: Abhishree <[email protected]> Signed-off-by: yaoyu-33 <[email 
protected]> Signed-off-by: Naga Venkatesh Gavini <[email protected]> Signed-off-by: Selvaraj Anandaraj <[email protected]> Signed-off-by: Travis Bartley <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: Alexandros Koumparoulis <[email 
protected]> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: jbaczek <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: Jan Baczek <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Naga Venkatesh Gavini <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: tbartley94 <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email 
protected]> Co-authored-by: Jiaqi Zeng <[email protected]>
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
* Refactor conversion scripts one in all Signed-off-by: yaoyu-33 <[email protected]> * Move bert converter Signed-off-by: yaoyu-33 <[email protected]> * [TTS] Add modules for mel spectrogram codec (#8238) * [TTS] Add modules for mel spectrogram codec Signed-off-by: Ryan <[email protected]> * [TTS] Add mel band validation Signed-off-by: Ryan <[email protected]> * [TTS] Add fullband mel encoder and more documentation Signed-off-by: Ryan <[email protected]> --------- Signed-off-by: Ryan <[email protected]> * coldfix (#8412) Signed-off-by: George Zelenfroynd <[email protected]> * Fixed errors in the CTM gen functions (#8416) (#8420) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> * Add change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367) * Add change_vocabulary and save_tokenizers() support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> * fix path location and branch (#8314) * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <[email protected]> * updat ebranch in tutorial Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> 
Co-authored-by: Nithin Rao Koluguri <nithinraok> * Add TP comm overlap knobs to AutocastTransformerLayer (#8290) Signed-off-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> * add deallocate pipeline output optimization (#8279) (#8318) * add deallocate pipeline output optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * remove assertion (#8302) (#8321) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Enable megatron core loggers for GPT pretraining (#8354) (#8384) * Logging changes tested for gpt_pretraining * Additional args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * Fix dreambooth data sampler issue (#8400) (#8413) * Turn on drop last * Some neva fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * add ensemble decoding fix (#8427) (#8433) 
Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> * NeVA Tutorial Notebook (#8217) * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * add inference via script Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * add codeblocks to run torchrun in notebook Signed-off-by: Pratyush Muthukumar <[email protected]> --------- Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> * mcore customization doc minor fix (#8421) (#8437) Signed-off-by: Huiying Li <[email protected]> Co-authored-by: Huiying <[email protected]> * Add `loop_labels` algorithm for TDT greedy decoding (#8215) * Add `loop_labels` algorithm for TDT greedy decoding Signed-off-by: Vladimir Bataev <[email protected]> * Use `loop_labels` by default Signed-off-by: Vladimir Bataev <[email protected]> * Loop labels greedy decoding v2 Signed-off-by: Vladimir Bataev <[email protected]> * Add comments. 
Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched hypotheses Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched alignments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix test Signed-off-by: Vladimir Bataev <[email protected]> * Add computer for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Fix TDT decoding algorithm Signed-off-by: Vladimir Bataev <[email protected]> * Use loop frames by default for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Remove "loop frames" implementation for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix confidence. Use tensor for durations. Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> * Add dist ckpt support for regular optimizers (#7749) (#8293) * Add dist ckpt support for regular optimizers * [tutorial] fixed missing RIR scripts file. 
(#8257) * fix imports * imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook * revert asr notebook --------- Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Multimodal r1.23.0 bug fix (#8315) (#8339) * Rename quick-gelu * ddpm config guard * Fix ddpm edit api * Fix insert_image_token cfg issue * neva updates * reformat * Add back jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs * Update default neva template --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * mcore ds fix (#8283) (#8385) * [tutorial] fixed missing RIR scripts file. 
(#8257) * add values to en tts dict (#7879) * mcore ds fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore * revert asr files * add comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset * update mcore version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg * update mcore commit * fix Bert unit tests * update bert tests * fix bert mcore test * fix gpt jenkins tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits * revert apex installation * turn off the fusion for jenkins --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> * MCore dataset compatibility for tokenizers (#8390) (#8397) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. 
--------- Signed-off-by: Valerie Sarge <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432) * Improvements for Canary: - carry over custom keys when creatin tarred manifests - selectable text field in ASR eval - get rid of prompt slicing, create proper inference prompts Signed-off-by: Piotr Żelasko <[email protected]> * set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> * add sbert to IR (#8445) * add sbert to IR Signed-off-by: ataghibakhsh <[email protected]> * add doc Signed-off-by: ataghibakhsh <[email protected]> * fix the auto_tokenizer property method reset bug Signed-off-by: ataghibakhsh <[email protected]> * addressed bot comments Signed-off-by: ataghibakhsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ataghibakhsh <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Update readme (#8440) * update Signed-off-by: eharper <[email protected]> * udpate Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * landing pages added * landing page added for vision * landing pages updated * some minor changes to the main readme * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update 
Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * typo fixed * update Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> * NeMo-Mistral to HF converter bugfix. (#8353) (#8442) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> * Fixing mcore bert for TP, PP and SP (#8336) (#8443) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile --------- Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Add LoRA support to all linear layers (#7988) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MCoreMixin chages. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * using new commit of meg-LM Signed-off-by: arendu <[email protected]> * add cpu_offloading_num_layers to conversion script until bug in megatron is fixed Signed-off-by: Chen Cui <[email protected]> * fix peft mixin arguments to follow mcore 0.5 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update megatron commit to fix ci error Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * add cfg default Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Eric Harper <[email protected]> * Add Neva Template for NV-DPO Models (#8358) * add/rename from nvgpt to nv_steerlm, add nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * add nv_dpo conversation to accomendate empty system message Signed-off-by: HuiyingLi <[email protected]> * handle nv_dpo template text generation Signed-off-by: HuiyingLi <[email protected]> * add prompt string to nvgpt Signed-off-by: HuiyingLi <[email protected]> * bugfix for inference prompt template Signed-off-by: HuiyingLi <[email protected]> * bug fix for grabbing clean text Signed-off-by: Huiying Li <[email protected]> * fix code format Signed-off-by: Huiying Li <[email protected]> --------- 
Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Huiying Li <[email protected]> * Account for mpirun use case in get_rank (#8429) Signed-off-by: Jan Lasek <[email protected]> * Add settings to suppress bf16 compile errors in CI on V100 (#8481) (#8482) * Add settings to suppress bf16 compile errors in CI on V100 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <[email protected]> Co-authored-by: Abhishree Thittenamane <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * fix canary chunk infer bug (#8449) * fix chunk infer bug Signed-off-by: stevehuang52 <[email protected]> * add support for duration=None, add lhotse support for relative audio path Signed-off-by: stevehuang52 <[email protected]> * add tests Signed-off-by: stevehuang52 <[email protected]> --------- Signed-off-by: stevehuang52 <[email protected]> * Add Baichuan2 support (#8282) * Add Baichuan2 support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Reworked MegatronPretrainingRandomBatchSampler to correctly handle epochs > 1 (#7920) * Initital commit of reworked MegatronPretrainingRandomBatchSampler Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed small length based bug Signed-off-by: Daniel Egert <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Daniel Egert <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Euynaheh <[email protected]> * Add Baichuan2 support Signed-off-by: Euynaheh <[email protected]> * Add NeMo to HF conversion * fix code format * [pre-commit.ci] 
auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix code format * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add Baichuan jenkins test * add_BOS bug fix * Update Jenkinsfile Signed-off-by: Euynaheh <[email protected]> --------- Signed-off-by: Daniel Egert <[email protected]> Signed-off-by: Euynaheh <[email protected]> Signed-off-by: Euynaheh <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: trias702 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> * Jiaqiz/option to disable adapters & merge all lora layers (#8029) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * use adapter only when it is enabled Signed-off-by: jiaqi zeng <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lora merge script (#8113) Signed-off-by: Chen Cui <[email protected]> Co-authored-by: Adi Renduchintala <[email protected]> * add peft ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * merge lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * support/fix cpu initialization Signed-off-by: Chen Cui <[email protected]> * add example usage Signed-off-by: Chen Cui <[email protected]> * fix TP due to distributed checkpoint Signed-off-by: Chen Cui <[email protected]> * updating the logic of merging lora weights for all layers, mcore only Signed-off-by: Jiaqi Zeng <[email protected]> * MCoreMixin chages. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * merge in fp32 then cast back Signed-off-by: Jiaqi Zeng <[email protected]> * remove ckpt to nemo Signed-off-by: Jiaqi Zeng <[email protected]> * fix import Signed-off-by: Jiaqi Zeng <[email protected]> --------- Signed-off-by: jiaqi zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Chen Cui <[email protected]> * Update k2 version (#8478) Signed-off-by: Vladimir Bataev <[email protected]> * Add mcore full TE transformer layer spec (#8328) * Add spec and implement autocast layer Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * remove try-catchs, these dependecies are mandatory for this file Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jan Baczek <[email protected]> * Check out this cool try/except clause Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused import Signed-off-by: Jan Baczek <[email protected]> * Add import tests to Jenkinsfile Signed-off-by: Jan Baczek <[email protected]> * Move import tests to Jenkins and remove code that is developed only for passing tests Signed-off-by: Jan Baczek <[email protected]> * Make test robust to faulty base configs Signed-off-by: Jan Baczek <[email protected]> * Use proper GPT implementation in the test Signed-off-by: Jan Baczek <[email protected]> * Update 
nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Sudhakar Singh <[email protected]> Signed-off-by: jbaczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Update nemo/collections/nlp/models/language_modeling/megatron/gpt_full_te_layer_autocast_spec.py Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: jbaczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add TE knobs to the copy of AutocastTransformerLayer Signed-off-by: Jan Baczek <[email protected]> * Add dummy parameter to accomodated for the changes in mcore Signed-off-by: Jan Baczek <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update mcore to 0.5.0 in Jenkins pipeline Signed-off-by: Jan Baczek <[email protected]> * Bump mcore commit. This is commit from tot, not any release. 
Signed-off-by: Jan Baczek <[email protected]> * Remove from the test config option that is incompatible with bias_activation_fusion Signed-off-by: Jan Baczek <[email protected]> * Bump TE version in CI to 1.4 Signed-off-by: Jan Baczek <[email protected]> * Update test Signed-off-by: Jan Baczek <[email protected]> * Change precision for the test - current runnens don't support bf16 Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Jan Baczek <[email protected]> Signed-off-by: jbaczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Sudhakar Singh <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> * Handle float limit_val_batches (#8426) * Handle float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Rectify reconfiguration of float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Remove unused imports Signed-off-by: Abhishree <[email protected]> * Scale len(val_dataloader) with float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Return len(dataloader) in microbatches Signed-off-by: Abhishree <[email protected]> * Add back resetting of num val samples Signed-off-by: Abhishree <[email protected]> * Fix to ensure float limit_val_batches is multiple of num_micro_batches Signed-off-by: Abhishree <[email protected]> * Remove forcing eval samples to 1 for float limit_val_batches Signed-off-by: Abhishree <[email protected]> * Fix bug wrt 0 limiot_val_batches Signed-off-by: Abhishree <[email protected]> * Add missing mock_dataset line Signed-off-by: Abhishree <[email protected]> * Avoid ensuring limit_val_batches is a mutliple of microbatches for 1.0 Signed-off-by: Abhishree <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Restore the hack forcing number of validation and test epochs to 1 Signed-off-by: Jan Baczek <[email protected]> * 
Change limit_val_batches to 1.0 for GPT pretraining test. The integer value is covered in other tests Signed-off-by: Jan Baczek <[email protected]> --------- Signed-off-by: Abhishree <[email protected]> Signed-off-by: Jan Baczek <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jan Baczek <[email protected]> * Fix tutorial links in user guide (#8497) Signed-off-by: yaoyu-33 <[email protected]> * Sequence Parallel for LoRA (#8369) * support lora + sequence parallel Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more comments Signed-off-by: Chen Cui <[email protected]> * add lora SP CI test Signed-off-by: Chen Cui <[email protected]> * support lora for all linear modules as in #7988 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Call proper method to replace (#8498) Signed-off-by: Naga Venkatesh Gavini <[email protected]> * Added memory logger (#8395) * Added memory logger Signed-off-by: Selvaraj Anandaraj <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: Selvaraj Anandaraj <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> * Canary refactor for Riva (#8363) * initial commit of bleu score tracking Signed-off-by: Travis Bartley <[email protected]> * initial commit, refactoring aed models for riva Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com 
hooks for more information, see https://pre-commit.ci * Updating Canary to support torch metrics Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fixes Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * missed an empty batch conditional Signed-off-by: Travis Bartley <[email protected]> * Fixing dataloader issues Signed-off-by: Travis Bartley <[email protected]> * Finishing merge conflict with transcribe update Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * style fix Signed-off-by: Travis Bartley <[email protected]> * copyright header fix Signed-off-by: Travis Bartley <[email protected]> * yet another merge conflict Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * making paired data management safer Signed-off-by: Travis Bartley <[email protected]> * sentencepiece needs bigger tokenizer... Signed-off-by: Travis Bartley <[email protected]> * sentencepiece tokenizer vocab needs to be +2 from vocab for canary Signed-off-by: Travis Bartley <[email protected]> * Update canary tokenizer to be more generic, updated metrics to manage special tokens removal themselves. Signed-off-by: Travis Bartley <[email protected]> * merge conflit Signed-off-by: Travis Bartley <[email protected]> * Simplified tokenizer and corrected bug in dataloader Signed-off-by: Travis Bartley <[email protected]> * Cleaning up docstrings and fixing inference bug. 
Signed-off-by: Travis Bartley <[email protected]> * adding example scripts Signed-off-by: Travis Bartley <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleaning up useless imports Signed-off-by: Travis Bartley <[email protected]> * adding unit tests Signed-off-by: Travis Bartley <[email protected]> * fixing unit tests Signed-off-by: Travis Bartley <[email protected]> * cfg name change Signed-off-by: Travis Bartley <[email protected]> * adding custom check to pass pytests Signed-off-by: Travis Bartley <[email protected]> * removing print script Signed-off-by: Travis Bartley <[email protected]> * catching bugs regarding tokens. Signed-off-by: Travis Bartley <[email protected]> * added docstrings and made examples scripts more generic Signed-off-by: Travis Bartley <[email protected]> * docstring deleted by accident Signed-off-by: Travis Bartley <[email protected]> * plurals in namespace Signed-off-by: Travis Bartley <[email protected]> * changing example script Signed-off-by: Travis Bartley <[email protected]> --------- Signed-off-by: Travis Bartley <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> * add alpha scaling to lora (#8248) * removed pdeprecated eft model Signed-off-by: arendu <[email protected]> * add alpha Signed-off-by: arendu <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add alpha scaling to lora (#8483) * coldfix (#8412) Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixed errors in the CTM gen functions (#8416) (#8420) Signed-off-by: Taejin Park <[email protected]> Co-authored-by: Taejin Park <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add 
change_vocabulary and save_tokenizers() support to Multitask ASR models (#8357) (#8367) * Add change_vocabulary and save_tokenizers() support * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update nemo/collections/asr/models/aed_multitask_models.py --------- Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * fix path location and branch (#8314) * fix path location and branch (#8304) * fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <[email protected]> * updat ebranch in tutorial Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Michal Futrega <[email protected]> * Add TP comm overlap knobs to AutocastTransformerLayer (#8290) Signed-off-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add deallocate pipeline output optimization (#8279) (#8318) * add deallocate pipeline output optimization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: 
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * remove assertion (#8302) (#8321) Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Keep max_seqlen and cu_seqlens_argmin for later micro-batches when PP>1 (#8334) (#8346) Signed-off-by: Sangkug Lym <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Enable megatron core loggers for GPT pretraining (#8354) (#8384) * Logging changes tested for gpt_pretraining * Additional args * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Aishwarya Bhandare <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fix dreambooth data sampler issue (#8400) (#8413) * Turn on drop last * Some neva fixes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * add ensemble decoding fix (#8427) (#8433) Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeVA Tutorial Notebook (#8217) * init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> * 
init commit - neva tutorial Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * NeVA tutorial notebook Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> * add inference via script Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * requested changes Signed-off-by: Pratyush Muthukumar <[email protected]> * add codeblocks to run torchrun in notebook Signed-off-by: Pratyush Muthukumar <[email protected]> --------- Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * mcore customization doc minor fix (#8421) (#8437) Signed-off-by: Huiying Li <[email protected]> Co-authored-by: Huiying <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add `loop_labels` algorithm for TDT greedy decoding (#8215) * Add `loop_labels` algorithm for TDT greedy decoding Signed-off-by: Vladimir Bataev <[email protected]> * Use `loop_labels` by default Signed-off-by: Vladimir Bataev <[email protected]> * Loop labels greedy decoding v2 Signed-off-by: Vladimir Bataev <[email protected]> * Add comments. 
Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched hypotheses Signed-off-by: Vladimir Bataev <[email protected]> * Add tests for batched alignments Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix comment Signed-off-by: Vladimir Bataev <[email protected]> * Fix test Signed-off-by: Vladimir Bataev <[email protected]> * Add computer for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Fix TDT decoding algorithm Signed-off-by: Vladimir Bataev <[email protected]> * Use loop frames by default for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Remove "loop frames" implementation for TDT Signed-off-by: Vladimir Bataev <[email protected]> * Clean up Signed-off-by: Vladimir Bataev <[email protected]> * Add comments Signed-off-by: Vladimir Bataev <[email protected]> * Fix confidence. Use tensor for durations. Signed-off-by: Vladimir Bataev <[email protected]> --------- Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add dist ckpt support for regular optimizers (#7749) (#8293) * Add dist ckpt support for regular optimizers * [tutorial] fixed missing RIR scripts file. 
(#8257) * fix imports * imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook * revert asr notebook --------- Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Multimodal r1.23.0 bug fix (#8315) (#8339) * Rename quick-gelu * ddpm config guard * Fix ddpm edit api * Fix insert_image_token cfg issue * neva updates * reformat * Add back jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix jenkins * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix bugs * Update default neva template --------- Signed-off-by: yaoyu-33 <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * mcore ds fix (#8283) (#8385) * [tutorial] fixed missing RIR scripts file. 
(#8257) * add values to en tts dict (#7879) * mcore ds fix * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore * revert asr files * add comments * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset * update mcore version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg * update mcore commit * fix Bert unit tests * update bert tests * fix bert mcore test * fix gpt jenkins tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update apex & TE commits * revert apex installation * turn off the fusion for jenkins --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * MCore dataset compatibility for tokenizers (#8390) (#8397) * Add unique_identifiers for all tokenizers and eod for SentencePieceTokenizer * Add generalized token aliases to TokenizerSpec to conform with MegatronTokenizer's interface. Remove now-redundant individual fixes from AutoTokenizer and SentencePieceTokenizer. 
--------- Signed-off-by: Valerie Sarge <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Canary: inference tokenization improvements; preserving custom keys when creating tarred manifests (#8432) * Improvements for Canary: - carry over custom keys when creatin tarred manifests - selectable text field in ASR eval - get rid of prompt slicing, create proper inference prompts Signed-off-by: Piotr Żelasko <[email protected]> * set ensure_ascii=False in tarred conversion to avoid breaking tokenizers trained on UTF-8 encoding Signed-off-by: Piotr Żelasko <[email protected]> --------- Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * add sbert to IR (#8445) * add sbert to IR Signed-off-by: ataghibakhsh <[email protected]> * add doc Signed-off-by: ataghibakhsh <[email protected]> * fix the auto_tokenizer property method reset bug Signed-off-by: ataghibakhsh <[email protected]> * addressed bot comments Signed-off-by: ataghibakhsh <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: ataghibakhsh <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Michal Futrega <[email protected]> * Update readme (#8440) * update Signed-off-by: eharper <[email protected]> * udpate Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * landing pages added * landing page added for vision * landing pages updated * some minor changes to the main readme * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: 
eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * update Signed-off-by: eharper <[email protected]> * typo fixed * update Signed-off-by: eharper <[email protected]> --------- Signed-off-by: eharper <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * NeMo-Mistral to HF converter bugfix. (#8353) (#8442) Signed-off-by: Alexandros Koumparoulis <[email protected]> Co-authored-by: akoumpa <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Fixing mcore bert for TP, PP and SP (#8336) (#8443) * Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile * Update Jenkinsfile * Update Jenkinsfile --------- Signed-off-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add LoRA support to all linear layers (#7988) * Added LoRA support for the Dense layer of Attention * Added LoRA MLP support to MCore and NeMo models. * Change LoRA config default to QKV. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bug with ddp training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * MCoreMixin chages. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * using new commit of meg-LM Signed-off-by: arendu <[email protected]> * add cpu_offloading_num_layers to conversion script until bug in megatron is fixed Signed-off-by: Chen Cui <[email protected]> * fix peft mixin arguments to follow mcore 0.5 Signed-off-by: Chen Cui <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update megatron commit to fix ci error Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * try to fix ci Signed-off-by: Chen Cui <[email protected]> * add cfg default Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> Co-authored-by: Eric Harper <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Add Neva Template for NV-DPO Models (#8358) * add/rename from nvgpt to nv_steerlm, add nv_dpo template Signed-off-by: HuiyingLi <[email protected]> * add nv_dpo conversation to accomendate empty system message Signed-off-by: HuiyingLi <[email protected]> * handle nv_dpo template text generation Signed-off-by: HuiyingLi <[email protected]> * add prompt 
string to nvgpt Signed-off-by: HuiyingLi <[email protected]> * bugfix for inference prompt template Signed-off-by: HuiyingLi <[email protected]> * bug fix for grabbing clean text Signed-off-by: Huiying Li <[email protected]> * fix code format Signed-off-by: Huiying Li <[email protected]> --------- Signed-off-by: HuiyingLi <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> * default for alpha Signed-off-by: arendu <[email protected]> Signed-off-by: Michal Futrega <[email protected]> * Rebase scaling alpha Signed-off-by: Michal Futrega <[email protected]> --------- Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email 
protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: arendu <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email 
protected]> Co-authored-by: Adi Renduchintala <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: arendu <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> --------- Signed-off-by: arendu <[email protected]> Signed-off-by: George Zelenfroynd <[email protected]> Signed-off-by: Michal Futrega <[email protected]> Signed-off-by: Taejin Park <[email protected]> Signed-off-by: smajumdar <[email protected]> Signed-off-by: Somshubra Majumdar <[email protected]> Signed-off-by: Nithin Rao Koluguri <nithinraok> Signed-off-by: Jaemin Choi <[email protected]> Signed-off-by: Jimmy Zhang <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Sangkug Lym <[email protected]> Signed-off-by: Aishwarya Bhandare <[email protected]> Signed-off-by: yaoyu-33 <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Pratyush Muthukumar <[email protected]> Signed-off-by: Huiying Li <[email protected]> Signed-off-by: Vladimir Bataev <[email protected]> Signed-off-by: Mikołaj Błaż <[email protected]> Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: Valerie Sarge <[email protected]> Signed-off-by: Piotr Żelasko <[email protected]> Signed-off-by: ataghibakhsh <[email protected]> Signed-off-by: eharper <[email protected]> Signed-off-by: Alexandros Koumparoulis <[email protected]> Signed-off-by: Shanmugam Ramasamy <[email protected]> Signed-off-by: Adi Renduchintala <[email protected]> Signed-off-by: Jiaqi Zeng <[email protected]> Signed-off-by: Chen Cui <[email protected]> Signed-off-by: HuiyingLi <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Michal Futrega <[email protected]> Co-authored-by: George <[email protected]> Co-authored-by: 
github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Taejin Park <[email protected]> Co-authored-by: Somshubra Majumdar <[email protected]> Co-authored-by: Piotr Żelasko <[email protected]> Co-authored-by: Nithin Rao <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: Jaemin Choi <[email protected]> Co-authored-by: JimmyZhang12 <[email protected]> Co-authored-by: Jimmy Zhang <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Sangkug Lym <[email protected]> Co-authored-by: Eric Harper <[email protected]> Co-authored-by: ashbhandare <[email protected]> Co-authored-by: Aishwarya Bhandare <[email protected]> Co-authored-by: yaoyu-33 <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Pratyush Muthukumar <[email protected]> Co-authored-by: Huiying <[email protected]> Co-authored-by: Vladimir Bataev <[email protected]> Co-authored-by: mikolajblaz <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: dimapihtar <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Pablo Garay <[email protected]> Co-authored-by: Valerie Sarge <[email protected]> Co-authored-by: Ali Taghibakhshi <[email protected]> Co-authored-by: ntajbakhsh <[email protected]> Co-authored-by: akoumpa <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Shanmugam Ramasamy <[email protected]> Co-authored-by: Tugrul Konuk <[email protected]> Co-authored-by: Jiaqi Zeng <[email protected]> Co-authored-by: HeyyyyyyG <[email protected]> Co-authored-by: Chen Cui <[email protected]> * Update PEFT Doc (#8501) * update peft doc Signed-off-by: Chen Cui <[email protected]> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * fix table 
Signed-off-by: Chen Cui <[email protected]> * fix table Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> * revert accidental commit Signed-off-by: Chen Cui <[email protected]> --------- Signed-off-by: Chen Cui <[email protected]> * release updates (#8394) * release updates (#8378) * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <[email protected]> * add values to en tts dict (#7879) Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> * mcore ds fix Signed-off-by: Dmytro Pykhtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update mcore Signed-off-by: dimapihtar <[email protected]> * revert asr files Signed-off-by: dimapihtar <[email protected]> * add comments Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for mcore mock dataset Signed-off-by: dimapihtar <[email protected]> * update mcore version Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update gpt cfg Signed-off-by: dimapihtar <[email protected]> * update mcore commit Signed-off-by: dimapihtar <[email protected]> * fix Bert unit tests Signed-off-by: dimapihtar <[email protected]> * update bert tests Signed-off-by: dimapihtar <[email protected]> * fix bert mcore test Signed-off-by: dimapihtar <[email protected]> * fix gpt jenkins tests Signed-off-by: dimapihtar <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add support for dict data input type Signed-off-by: dimapihtar <[email protected]> * add mock ds test Signed-off-by: dimapihtar <[email protected]> * add test for dict data input type Signed-off-by: dimapihtar <[email protected]> * 
mcore ds fix Signed-off-by: dimapihtar <[email protected]> * data input fix Signed-off-by: dimapihtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana Graterol Fuenmayor <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Signed-off-by: dimapihtar <[email protected]> Signed-off-by: Dmytro Pykhtar <[email protected]> Co-authored-by: Xuesong Yang <[email protected]> Co-authored-by: Mariana <[email protected]> Co-authored-by: Dmytro Pykhtar <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Pablo Garay <[email protected]> * Update megatron_gpt_model.py Signed-off-by: Dmytro Pykhtar <[email protected]> --------- Signed-off-by: Xuesong Yang <[email protected]> Signed-off-by: Mariana G…
This PR adds an environment variable that enables logging of peak memory usage at the end of each training step.
Set "NEMO_LOG_MEMORY_USAGE" to 1 to enable the logging. Only the memory usage of rank 0 is printed.
This env variable will be added to perf CI for NeMo.
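
The gating described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: only the variable name "NEMO_LOG_MEMORY_USAGE", the value 1, the rank-0 restriction, and the end-of-step timing come from the description. The helper names `memory_logging_enabled` and `maybe_log_peak_memory`, the `peak_bytes_fn` hook, and the message format are assumptions made for the example.

```python
import os

ENV_VAR = "NEMO_LOG_MEMORY_USAGE"  # name taken from the PR description


def memory_logging_enabled(environ=None):
    """Return True when the env variable is set to "1"."""
    environ = os.environ if environ is None else environ
    return environ.get(ENV_VAR, "0") == "1"


def maybe_log_peak_memory(step, rank, peak_bytes_fn, log_fn=print, environ=None):
    """Log peak memory for this step on rank 0 only; return the message or None.

    peak_bytes_fn is a hypothetical hook: any zero-argument callable that
    returns peak usage in bytes. With PyTorch, torch.cuda.max_memory_allocated
    would fit that shape (an assumption; the PR text does not say which call
    is used).
    """
    if rank != 0 or not memory_logging_enabled(environ):
        return None
    peak_gib = peak_bytes_fn() / 1024**3  # bytes -> GiB
    message = f"step {step}: peak memory {peak_gib:.2f} GiB"
    log_fn(message)
    return message
```

Under these assumptions, a training loop would call `maybe_log_peak_memory(step, rank, peak_bytes_fn)` once at the end of every step. Leaving the variable unset keeps the call a no-op, so the same loop can run with or without the logging.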