Advanced Diffusion Training Features #11246

zpx01 · 2024-11-10T03:17:35Z

What does this PR do?

Adds the following new features to the Diffusion Training Framework:

FSDP support
MovieGen 1B/5B/30B Stages 1-3 Training with FSDP + TP/SP + CP
EC-DiT model (https://arxiv.org/abs/2410.02098)
Mock video data module for mock training across Stages 1-3
Mixed Image-Video Training with THD Packed Sequencing
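EC-DiT (linked above) uses mixture-of-experts layers with expert-choice routing. Instead of each token picking its top experts, each expert picks the tokens it will process. Every expert therefore gets a fixed, balanced load, and an individual token may be processed by zero, one, or several experts. The sketch below shows only that generic routing step in plain Python. It is not the NeMo implementation. The function names, the softmax normalisation over experts, and the `capacity_factor` parameter are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token's expert scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def expert_choice_route(
    logits: Sequence[Sequence[float]], capacity_factor: float = 1.0
) -> Dict[int, List[int]]:
    """Hypothetical expert-choice router: every expert picks its top-`capacity` tokens.

    logits[t][e] is the router score of token t for expert e.
    Returns {expert_index: sorted token indices that expert processes}.
    """
    num_tokens = len(logits)
    num_experts = len(logits[0])
    # Fixed per-expert capacity, so load is balanced by construction (assumed formula).
    capacity = max(1, int(num_tokens * capacity_factor / num_experts))
    # Token-to-expert affinities, normalised over experts for each token (assumption).
    probs = [softmax(row) for row in logits]
    routing: Dict[int, List[int]] = {}
    for e in range(num_experts):
        # Rank all tokens by their affinity to expert e and keep the top `capacity`.
        ranked = sorted(range(num_tokens), key=lambda t: probs[t][e], reverse=True)
        routing[e] = sorted(ranked[:capacity])
    return routing
```

With 4 tokens and 2 experts, `capacity_factor=1.0` gives each expert 2 tokens. Raising the factor lets experts overlap on the same tokens.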

Collection: diffusion
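Mixed image-video training from the feature list combines samples of very different lengths, such as a single-frame image and a multi-frame video. These are packed into one sequence in the "THD" layout, i.e. [total_tokens, heads, head_dim]. Cumulative sequence lengths (`cu_seqlens`) mark where each sample starts, so variable-length attention kernels keep samples from attending to one another. The following is a minimal pure-Python sketch of that bookkeeping. It assumes one token per spatial patch per frame. `pack_thd`, `unpack_thd`, and `tokens_per_sample` are hypothetical names, not part of NeMo or Megatron-Core.

```python
from typing import List, Sequence, Tuple


def tokens_per_sample(frames: int, height: int, width: int, patch: int) -> int:
    """Latent patches for a (frames, height, width) latent with spatial patch size `patch`."""
    return frames * (height // patch) * (width // patch)


def pack_thd(samples: Sequence[Sequence[float]]) -> Tuple[List[float], List[int]]:
    """Concatenate variable-length samples into one packed buffer.

    Each sample is a flat list of per-token values standing in for a
    [tokens, heads, head_dim] tensor. Returns the packed buffer and cu_seqlens,
    where sample i occupies packed[cu_seqlens[i]:cu_seqlens[i + 1]].
    """
    packed: List[float] = []
    cu_seqlens = [0]
    for tokens in samples:
        packed.extend(tokens)
        cu_seqlens.append(cu_seqlens[-1] + len(tokens))
    return packed, cu_seqlens


def unpack_thd(packed: Sequence[float], cu_seqlens: Sequence[int]) -> List[List[float]]:
    """Recover the individual samples from the packed buffer using cu_seqlens."""
    return [list(packed[s:e]) for s, e in zip(cu_seqlens[:-1], cu_seqlens[1:])]
```

For example, with patch size 2, a 1-frame 32x32 latent contributes 256 tokens and an 8-frame 32x32 latent contributes 2048 tokens. Together they give `cu_seqlens` of [0, 256, 2304].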

Changelog

Add specific line by line info of high level changes in this PR.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

Make sure you have read and followed the Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

Related to # (issue)

Signed-off-by: Zeeshan Patel <[email protected]>

github-actions · 2024-11-10T03:19:38Z

beep boop 🤖: 🚨 The following files must be fixed before merge!

Your code was analyzed with PyLint. The following annotations have been identified:


------------------------------------
Your code has been rated at 10.00/10

Thank you for improving NeMo's documentation!

github-actions · 2024-11-10T03:19:54Z

beep boop 🤖: 🙏 The following files have warnings. In case you are familiar with these, please try helping us to improve the code base.

Your code was analyzed with PyLint. The following annotations have been identified:

************* Module nemo.collections.diffusion.models.dit.dit_embeddings
nemo/collections/diffusion/models/dit/dit_embeddings.py:141:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_embeddings.py:147:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/diffusion/models/dit/dit_embeddings.py:173:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_embeddings.py:178:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_embeddings.py:183:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_embeddings.py:16:0: W0611: Unused import math (unused-import)
nemo/collections/diffusion/models/dit/dit_embeddings.py:17:0: W0611: Unused Dict imported from typing (unused-import)
nemo/collections/diffusion/models/dit/dit_embeddings.py:17:0: W0611: Unused Literal imported from typing (unused-import)
nemo/collections/diffusion/models/dit/dit_embeddings.py:17:0: W0611: Unused Optional imported from typing (unused-import)
nemo/collections/diffusion/models/dit/dit_embeddings.py:19:0: W0611: Unused numpy imported as np (unused-import)
nemo/collections/diffusion/models/dit/dit_embeddings.py:21:0: W0611: Unused torch.nn.functional imported as F (unused-import)
nemo/collections/diffusion/models/dit/dit_embeddings.py:24:0: W0611: Unused Rearrange imported from einops.layers.torch (unused-import)
nemo/collections/diffusion/models/dit/dit_embeddings.py:26:0: W0611: Unused get_pos_emb_on_this_cp_rank imported from megatron.core.models.common.embeddings.rotary_pos_embedding (unused-import)
nemo/collections/diffusion/models/dit/dit_embeddings.py:28:0: W0611: Unused nn imported from torch (unused-import)
************* Module nemo.collections.diffusion.models.dit.dit_layer_spec
nemo/collections/diffusion/models/dit/dit_layer_spec.py:194:0: C0301: Line too long (138/119) (line-too-long)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:234:0: C0301: Line too long (132/119) (line-too-long)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:249:0: C0301: Line too long (124/119) (line-too-long)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:268:0: C0301: Line too long (124/119) (line-too-long)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:288:0: C0301: Line too long (124/119) (line-too-long)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:306:0: C0301: Line too long (124/119) (line-too-long)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:362:0: C0301: Line too long (138/119) (line-too-long)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:396:0: C0301: Line too long (129/119) (line-too-long)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:416:0: C0301: Line too long (129/119) (line-too-long)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:549:0: C0301: Line too long (127/119) (line-too-long)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:552:0: C0301: Line too long (138/119) (line-too-long)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:53:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:59:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:65:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:74:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:108:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:112:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:116:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:120:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:132:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:138:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:157:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:221:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:383:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:482:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:565:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:623:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:643:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:675:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:736:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:775:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:800:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:826:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:845:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:21:0: W0611: Unused rearrange imported from einops (unused-import)
nemo/collections/diffusion/models/dit/dit_layer_spec.py:22:0: W0611: Unused jit_fuser imported from megatron.core.jit (unused-import)
************* Module nemo.collections.diffusion.models.dit.dit_model
nemo/collections/diffusion/models/dit/dit_model.py:111:0: C0301: Line too long (156/119) (line-too-long)
nemo/collections/diffusion/models/dit/dit_model.py:115:0: C0301: Line too long (134/119) (line-too-long)
nemo/collections/diffusion/models/dit/dit_model.py:117:0: C0301: Line too long (157/119) (line-too-long)
nemo/collections/diffusion/models/dit/dit_model.py:40:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_model.py:44:0: C0115: Missing class docstring (missing-class-docstring)
nemo/collections/diffusion/models/dit/dit_model.py:53:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_model.py:71:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit/dit_model.py:20:0: W0611: Unused torch.nn.functional imported as F (unused-import)
************* Module nemo.collections.diffusion.models.dit_llama.dit_llama_layer_spec
nemo/collections/diffusion/models/dit_llama/dit_llama_layer_spec.py:76:0: C0301: Line too long (138/119) (line-too-long)
nemo/collections/diffusion/models/dit_llama/dit_llama_layer_spec.py:103:0: C0301: Line too long (129/119) (line-too-long)
nemo/collections/diffusion/models/dit_llama/dit_llama_layer_spec.py:119:0: C0301: Line too long (125/119) (line-too-long)
nemo/collections/diffusion/models/dit_llama/dit_llama_layer_spec.py:88:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/dit_llama/dit_llama_layer_spec.py:195:0: C0116: Missing function or method docstring (missing-function-docstring)
************* Module nemo.collections.diffusion.models.dit_llama.dit_llama_model
nemo/collections/diffusion/models/dit_llama/dit_llama_model.py:25:0: C0115: Missing class docstring (missing-class-docstring)
************* Module nemo.collections.diffusion.models.model
nemo/collections/diffusion/models/model.py:312:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/models/model.py:334:4: C0116: Missing function or method docstring (missing-function-docstring)
************* Module nemo.collections.diffusion.sampler.edm.edm_pipeline
nemo/collections/diffusion/sampler/edm/edm_pipeline.py:20:0: W0611: Unused rearrange imported from einops (unused-import)
************* Module nemo.collections.diffusion.train
nemo/collections/diffusion/train.py:223:0: C0116: Missing function or method docstring (missing-function-docstring)
nemo/collections/diffusion/train.py:258:0: C0116: Missing function or method docstring (missing-function-docstring)

-----------------------------------
Your code has been rated at 9.50/10

Thank you for improving NeMo's documentation!
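A report like the one above is easier to triage if the messages are grouped by symbol and category. For example, you can count how many are `unused-import` and how many are `missing-function-docstring`. The sketch below parses lines in the `path:line:col: CODE: message (symbol)` format shown in the report. It also implements what we understand to be PyLint's default `evaluation` expression. NeMo's configuration may override that expression. The report also does not state the statement count, so the 9.50/10 rating cannot be reproduced from it. `tally` and `pylint_score` are hypothetical helper names.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Matches report lines such as:
# nemo/.../dit_model.py:40:0: C0116: Missing function or method docstring (missing-function-docstring)
PYLINT_LINE = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<code>[CRWEF]\d{4}): (?P<msg>.*) \((?P<symbol>[a-z0-9-]+)\)$"
)


def tally(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count messages by symbol (e.g. 'unused-import') and by category letter (C/R/W/E/F)."""
    by_symbol: Counter = Counter()
    by_category: Counter = Counter()
    for line in lines:
        m = PYLINT_LINE.match(line.strip())
        if m:  # "************* Module ..." headers and score lines are skipped
            by_symbol[m.group("symbol")] += 1
            by_category[m.group("code")[0]] += 1
    return by_symbol, by_category


def pylint_score(
    error: int, warning: int, refactor: int, convention: int, statements: int, fatal: int = 0
) -> float:
    """Assumed default PyLint rating: errors weigh 5x, any fatal message scores 0."""
    if fatal:
        return 0.0
    return max(0.0, 10.0 - ((5 * error + warning + refactor + convention) / statements) * 10)
```

Counting by category shows whether a report is dominated by conventions (C) or by warnings (W). That matters because both weigh the same in the rating, while errors weigh five times as much.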

nemo/collections/diffusion/data/diffusion_fake_datamodule.py

+
+        video_latent = torch.ones(self.seq_length, c * p**2, dtype=torch.bfloat16) * 0.5
+        text_embedding = torch.randn(self.text_seqlen, self.text_dim, dtype=torch.bfloat16)
+        pos_emb = pos_id_3d.get_pos_id_3d(t=t, h=h // p, w=w // p).reshape(-1, 3)
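In the mock data module hunk above, each sample has three parts. The first is a constant video latent of shape (seq_length, c * p**2). The second is a random text embedding. The third is one 3D position ID per latent patch from `pos_id_3d.get_pos_id_3d(t=t, h=h // p, w=w // p)`. For these tensors to line up, `seq_length` presumably equals t * (h // p) * (w // p). That relationship and the frame-major (t, h, w) enumeration order are assumptions here. The sketch below reproduces the shape bookkeeping in plain Python. `get_pos_ids_3d` and `mock_sample_shapes` are hypothetical stand-ins, not NeMo functions.

```python
from itertools import product
from typing import List, Tuple


def get_pos_ids_3d(t: int, h: int, w: int) -> List[Tuple[int, int, int]]:
    """Hypothetical stand-in for get_pos_id_3d(...).reshape(-1, 3).

    Returns one (frame, row, col) triple per latent patch, with the frame
    index outermost and the column index innermost (assumed ordering).
    """
    return list(product(range(t), range(h), range(w)))


def mock_sample_shapes(
    t: int, h: int, w: int, c: int, p: int
) -> Tuple[int, int, List[Tuple[int, int, int]]]:
    """Shape bookkeeping for a mock sample with a t x h x w latent, c channels, patch size p."""
    pos_ids = get_pos_ids_3d(t, h // p, w // p)
    seq_length = len(pos_ids)  # one token per p x p spatial patch in every frame
    latent_dim = c * p**2      # values in one flattened patch, matching c * p**2 in the hunk
    return seq_length, latent_dim, pos_ids
```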


nemo/collections/diffusion/train.py

+        torch.cuda.memory._record_memory_history(
+            True,
+            # Keep 100,000 alloc/free events from before the snapshot
+            trace_alloc_max_entries=100000,
+            # Record stack information for the trace events
+            trace_alloc_record_context=True,
+        )
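The `train.py` hunk enables PyTorch's CUDA caching-allocator history through the private `torch.cuda.memory._record_memory_history` API, using its legacy boolean form. Here `trace_alloc_max_entries` bounds how many allocation/free events are kept, and `trace_alloc_record_context=True` attaches stack information to each event. Once the bound is reached, the oldest events are overwritten. A later snapshot therefore reflects the most recent activity; one way to take it is the equally private `torch.cuda.memory._dump_snapshot`. The pure-Python model below only illustrates that bounded-history behaviour with a `deque`. It is not PyTorch's implementation, and `BoundedAllocTrace` and `AllocEvent` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class AllocEvent:
    action: str                    # "alloc" or "free"
    addr: int
    size: int
    context: Optional[str] = None  # stack/context string, kept only if requested


class BoundedAllocTrace:
    """Conceptual model of a bounded allocation trace (not PyTorch's code).

    Keeps at most `max_entries` events; once full, each new event evicts the
    oldest one, mirroring what a cap like trace_alloc_max_entries implies.
    """

    def __init__(self, max_entries: int, record_context: bool = False) -> None:
        self.events: Deque[AllocEvent] = deque(maxlen=max_entries)
        self.record_context = record_context

    def record(self, action: str, addr: int, size: int, context: Optional[str] = None) -> None:
        # Drop the context unless context recording was enabled.
        kept = context if self.record_context else None
        self.events.append(AllocEvent(action, addr, size, kept))

    def snapshot(self) -> List[AllocEvent]:
        """Return the retained events, oldest first."""
        return list(self.events)
```

A cap of 100,000, as in the hunk, trades snapshot detail for host memory: a larger cap keeps a longer history before the point of interest.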


github-actions · 2024-11-10T04:33:26Z

[🤖]: Hi @zpx01 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

I'm just a bot, so I'll leave it to you what to do next.

//cc @pablo-garay @ko3n1g


new vfm training features

069254c

Signed-off-by: Zeeshan Patel <[email protected]>

zpx01 requested a review from ethanhe42 November 10, 2024 03:17

zpx01 added the Run CICD label Nov 10, 2024

github-advanced-security bot found potential problems Nov 10, 2024

zpx01 self-assigned this Nov 11, 2024

ethanhe42 approved these changes Nov 12, 2024

zpx01 merged commit 6e8e974 into main Nov 13, 2024
165 of 166 checks passed

zpx01 deleted the vfm branch November 13, 2024 04:09

zpx01 restored the vfm branch November 13, 2024 04:09
