From 183231c8f2d2874a88a753f084b2329b9b6639b2 Mon Sep 17 00:00:00 2001
From: Xu Kai
Date: Tue, 12 Sep 2023 17:29:27 +0800
Subject: [PATCH] [gptq] rebase to main (#4695)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* [gemini] fix tensor storage cleaning in state dict collection (#4396)
* [hotfix] fix unsafe async comm in zero (#4404)
  * improve stability of zero
  * fix wrong index
  * add record stream
* [doc] update Coati README (#4405)
  * style: apply formatter
  * fix: add outdated warnings
  * docs: add dataset format and polish README
  * fix: fix JSON format and typos
  * revert: revert 7b example
* [doc] fix a typo in examples/tutorial/auto_parallel/README.md (#4430)
  Co-authored-by: Siyuan Tian
* [cluster] add process group mesh (#4039)
  * [test] add process group mesh test
  * force sync
* [pipeline] add stage manager (#4093)
  * [test] add pipeline stage manager test
  * [pipeline] add docstring for stage manager
* [pipeline] implement p2p communication (#4100)
  * [test] add p2p communication test
  * [test] add rerun decorator
  * [test] rename to avoid conflict
* [pipeline] refactor 1f1b schedule (#4115) (ordering sketched below)
  * [api] update optimizer wrapper to fit pipeline
  * [pipeline] add base schedule and 1f1b schedule
  * [test] add pipeline schedule utils test
  * [pipeline] fix import
* [pipeline] add pipeline policy and bert forward (#4130)
  * add BertModel pipeline forward and tests
  * add Bert_Policy and test for policy
  * update formatting and the code
  * fix bugs and a name conflict
* [pipeline] build bloom model and policy, revise the base class of policy (#4161)
  * add bloom model and policy, revise the base class of policy
  * add bert_for_pretraining
* [pipeline] update shardformer policy and docstring
* [test] update shardformer tests; add shard util tests
* [shardformer] rename policy file name; fix type hint
* [pipeline] add bert_for_pretraining and bert_lmhead forward and policy (#4172)
  * fix typos; cancel warning
  * change the immediate output to default dict
  * change the default output of get_shared_params
* [pipeline] move bert related pipeline components to shardformer (#4187)
  * fix bert model and bert_lm_head model tests
  * done checks; skip bloom
* [shardformer] support lazy init (#4202)
  * linear, embedding, norm and fused linear support lazy init
  * [test] update shardformer test layer; shardformer with lazy init fits DDP
  * [lazy] hotfix deepcopy of param
  * fix bert, bloom, opt, t5, gpt2 and llama policies and update tests
* [pipeline] Bert pipeline for shardformer and its tests (#4197)
  * add pipeline forward and complete pipeline forward check
  * fix bert forward without pipeline
  * fix distribute layers
* [pipeline] Llama pipeline (#4205)
  * llama pipeline forward and tests
  * fix the output and attention_mask; bind argument to policy
  * Revert "bloom policy": this reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0;
    this policy should be reverted and copied to feature/bloom
  * cancel unneeded inputs
* [pipeline] Llama causal lm and llama for sequence classification pipeline (#4208)
  * finish llama causal lm and sequence classification
* [pipeline] add bloom model pipeline (#4210)
  * finish bloom model; test shard gpt2; clear cache
* [pipeline] Add Pipeline Forward for GPT2Model Shardformer (#4224)
  * fix typehint & docstring in sharder.py
  * update pipeline forward for GPT2Model and add test for it
  * add cache cleaning in gpt2 test
  * change assert to raise command
* [shardformer] fix base policy (#4229)
* [pipeline] add pipeline forward for variants of gpt2 (#4238)
  * add forward and test for GPT2LMHeadModel
  * arrange get_held_layers method and forward replacement
  * add forward for GPT2ForTokenClassification and GPT2ForSequenceClassification
  * fix test_shard_gpt2.py
  * add GPT2DoubleHeadsModel & fix bugs
  * add id checking in get_shared_params
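For context on the 1F1B schedule refactored in #4115 above: its ordering for one stage can be sketched in a few lines. This is a minimal illustration assuming hypothetical run_forward/run_backward callables, not the actual schedule class in colossalai/pipeline.

    # Minimal sketch of 1F1B ordering for one pipeline stage (illustrative only;
    # run_forward/run_backward are hypothetical stand-ins, not ColossalAI APIs).
    def one_f_one_b(stage, num_stages, num_microbatches, run_forward, run_backward):
        # Warmup: earlier stages run more forwards before the first backward.
        num_warmup = min(num_stages - stage - 1, num_microbatches)
        num_remaining = num_microbatches - num_warmup
        for i in range(num_warmup):
            run_forward(i)
        # Steady state: alternate one forward with one backward.
        for i in range(num_remaining):
            run_forward(num_warmup + i)
            run_backward(i)
        # Cooldown: drain the remaining backwards.
        for i in range(num_remaining, num_microbatches):
            run_backward(i)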
* [pipeline] All bert models (#4233)
  * llama pipeline forward and tests; causal lm and sequence classification
  * fix the output and attention_mask; bind argument to policy
  * Revert "bloom policy": this reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0;
    this policy should be reverted and copied to feature/bloom
  * cancel unneeded inputs; add pure pipeline test
  * finish all bert models and bert tests
  * fix test pipeline and data gen for qa
  * update the set pipeline forward; shared params
  * fix bugs
* [pipeline] finish bloom models pipeline and tests (#4223)
  * support all bloom models and add their policies
  * finish bloom pipeline and tests; add set pipeline
* [bugs] hot fix some testing bugs for new models (#4268)
  * hotfix the fx tracer
* [pipeline] support shardformer for GPT2ForQuestionAnswering & complete pipeline support for GPT2 (#4245)
  * change for transformers loggers
  * add forward for GPT2ForQuestionAnswering
  * fix assert; fix torchrec test
* [shardformer] support inplace sharding (#4251) (idea sketched below)
  * embedding, linear, layernorm and qkv support inplace sharding
  * [test] update shardformer layer test
  * fix shared param sharding
  * fix bert, bloom, llama, opt and t5 policies
  * fix fused qkv linear and other bugs; force sync
  * [test] fix bugs; fix transformers version
* [pipeline] refactor gpt2 pipeline forwards (#4287)
  * move gpt2 pipeline forwards to modeling folder
  * check pipeline status when adding replacing policy
  * fix typehint
  * fix arguments processing in gpt2_model_forward
* [pipeline] OPT model pipeline (#4258)
  * opt forward and test; finish opt model pipeline
  * fix opt; set transformers version
  * refactor the test pipeline
* [hotfix] fix opt pipeline (#4293)
* [pipeline] reformat for unified design (#4283)
  * reformat bert; fix a typo and a bug
* [pipeline] add pipeline support for T5Stack/T5EncoderModel (#4300)
  * modify t5 policy & add test
  * pipeline stage distribution for t5; complete t5 base policy
  * complete pipeline forward for T5Stack/T5EncoderModel
  * fix docstring; modify gpt2 pipeline test
  * move t5 util tests to test_pipeline
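The in-place sharding of #4251 replaces a layer's parameter data with its shard so existing module references stay valid. A rough sketch of the idea under assumed even divisibility; shard_linear_colwise_ is an illustrative helper, not the shardformer API:

    import torch
    import torch.distributed as dist

    # Illustrative sketch of in-place column sharding: the module object is
    # kept, only its weight/bias storage is swapped for this rank's shard.
    def shard_linear_colwise_(linear: torch.nn.Linear, tp_group) -> None:
        rank = dist.get_rank(tp_group)
        world_size = dist.get_world_size(tp_group)
        assert linear.out_features % world_size == 0, "out_features must divide evenly"
        step = linear.out_features // world_size
        # Slice this rank's rows of the output dimension.
        linear.weight.data = linear.weight.data[rank * step:(rank + 1) * step].clone()
        if linear.bias is not None:
            linear.bias.data = linear.bias.data[rank * step:(rank + 1) * step].clone()
        linear.out_features = step  # in-place swap: references stay valid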
* [pipeline] test pure pipeline process using llama (#4218)
  * llama pipeline forward and tests; causal lm and sequence classification
  * Revert "bloom policy": this reverts commit 8dee68a0a22568dbeed6d4563372b25e1e825fb0;
    this policy should be reverted and copied to feature/bloom
  * cancel unneeded inputs; add pure pipeline test
  * fixed version
* [pipeline] add pipeline support for all T5 models (#4310)
  * complete policy for T5Model & T5ForConditionalGeneration
  * modify function signature in forwards
  * add forward for T5Model and T5ForConditionalGeneration
  * fix hidden_states transporting in decoder
  * fix the passing of encoder_outputs
* [shardformer] support pipeline base vit model (#4284)
  * Feature/vit support (#4182): added tests; vit test finish and support; fix attention dropout
  * support base vit pipeline and vit downstream model
  * fix vit shard test
  * modify hidden states return type
  Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
* [plugin] add 3d parallel plugin (#4295)
  * [amp] add mixed precision optimizer
  * [booster] support pipeline
  * [plugin] 3d parallel plugin supports clip grad norm; rename 3d parallel plugin
  * [shardformer] fix sharder and add plugin test
* [ci] support testmon core pkg change detection (#4305)
* [hotfix] debug testmon; fix llama, p2p bugs and requirements
* [hotfix] fix gemini and zero test (#4333)
  * [hotfix] fix lazy init test
* [pipeline] fix return_dict/fix pure_pipeline_test (#4331)
* [pipeline] add unit test for 1f1b (#4303)
  * polish code and update ut version
* [pipeline] refactor test pipeline and remove useless utils in pipeline (#4324)
  * refactor tests and bloom model; finish policy tests
  * fix test pure pipeline; remove test pipeline and cut down launch process
* [pipeline] support fp32 for HybridPlugin/merge shardformer test and pipeline test into one file (#4354)
  * add naive optimizer for 3DPlugin/refactor gpt2 shardformer test
  * merge tests of PP/DP/TP combinations into one test file
  * fix bug when syncing grad for dp in HybridPlugin
  * update supported precisions for 3DPlugin/fix bug when shifting tp_degree
  * improve the passing of lazy_init; use sync_shared_params
* Feature/vit support (#4182)
  * [shardformer] added tests; vit test finish and support
  * fix attention dropout
* [shardformer] support SAM (#4231)
  * support SAM; add fused qkv for nn.Linear
  * update utils to support setting an element in a list
  * overwrite SamVisionAttention forward to use DropoutForParallelInput
  * remove unused code
* [shardformer] support whisper (#4212)
  * support whisper and its downstream models
  * fix bug in vocab embedding
  * update readme
* Feature/chatglm (#4240)
  * [shardformer] chatglm ready; import chatglm
  * [shardformer] add test kit in model zoo for chatglm
  * [shardformer] add first version of policy of chatglm
  * [shardformer] polish chatglm code
  * [shardformer] support chatglm without layernorm; chatglm shard without mlp sharding
  * [shardformer] ChatGLM supports layernorm sharding
  * [shardformer] register without auto policy
  * [shardformer] delete some file; pre-commit check files
  * [shardformer] fix chatglm configuration with pre-commit
  * [shardformer] support ChatGLMForConditionalGeneration & add fused layernorm for vit
* [shardformer] support Blip2 (#4243)
  * support base blip2 and downstream blip2 models
  * update readme; add forward injection
  * skip not compatible models in tests
  * fix test for gemini and low_level_zero plugin
* update some modules with new api version
* [test] skip some not compatible models
* [test] Hotfix/fix some model test and refactor check util api (#4369)
  * fix llama, bert, blip2, bloom, gpt2, opt, sam, t5, vit and whisper tests
  * polish code; adjust allclose parameters
  * add mistakenly deleted code
  * change loss function for some base models
* [shardformer] add util functions for shardformer tests/fix sync_shared_param (#4366)
  * add util functions for shardformer tests & rewrite gpt2 test
  * fix shared_params & embedding/merging
  * fix precision
* [pipeline] add chatglm (#4363)
  * add and finish chatglm pipeline
  * fix rmsnorm and chatglm shard
  * init
* [Shardformer] Merge flash attention branch to pipeline branch (#4362)
  * [shardformer] supported flash attention test dependency (#4158)
  * [shardformer] fix flash attention utils test (#4180)
  * [shardformer] opt support flash attention (#4163); move to modeling
  * [shardformer] add performance benchmark of shardformer (#4175); benchmark fix
  * [shardformer] llama support flash attention (#4185); move the import statement
    for xformers outside the forward function
  * [shardformer] gpt2 support flash attention (#4191)
  * [shardformer] bloom support flash attention (#4188); add assert to sequence length
  * [shardformer] bert support flash attention (#4206)
  * [shardformer] t5 support flash attention (#4216); fix typos
  * [shardformer] support 'paddedcausal' type of attention mask in ColoAttention (#4215)
  * [shardformer] t5 flash attention fix (#4239)
  * [shardformer] update gpt2 to use ColoAttention (#4234)
  * [shardformer] update opt and llama to use ColoAttention (#4226)
  * [shardformer] shardformer support jit fused operator (#4236): bloom and t5 support jit fused operator
  * [shardformer] add roadmap of flash attention
  * [shardformer] add type hint to 'self' param of forward
  * [shardformer] merge feature/shardformer-models branch to feature/flash-attention-shardformer branch (#4290)
  * [shardformer] whisper support flash attention (#4301); whisper support jit operator
  * [shardformer] sam support flash attention (#4316)
  * [shardformer] merge blip2/chatglm (#4321)
  * [shardformer] blip2 support flash attention and jit operator (#4325)
  * [shardformer] chatglm support flash attention and jit operator (#4330)
  * [shardformer] vit support flash attention and jit operator (#4334)
  * [pipeline] merge flash attention branch and fix conflicts
  * Merge branch 'feature/pipeline' into feature/pipeline
  * activate checks
  * fix flash attention tests
  * gemini ignore whisper
  * fix vit
  * fix xformers import handling
  Co-authored-by: Frank Lee
  Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
  Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
  Co-authored-by: klhhhhh <1412841649@qq.com>
* [pipeline] rewrite t5 tests & support multi-tensor transmitting in pipeline (#4388)
  * fix remaining t5 bugs/rewrite t5 tests
  * fix multi-tensor communication in pipeline
  * rearrange test_config; fix keyerror in sync_shared_params
  * fix get_held_layers & Randomnizer, complete t5 tests
  * erase printing
  * fix get_held_layers through modifying _release_unheld_layers
  * fix _get_recursive_held_layers bug
* [shardformer] update shardformer to use flash attention 2 (#4392) (see the sketch below)
  * cherry-pick flash attention 2 and fix
* [shardformer] test all optimizations (#4399)
* [pipeline] rewrite bert tests and fix some bugs (#4409)
  * rewrite bert test and data repeats
  * fix some bugs; delete pipeline tests and useless prints
* [shardformer] fix, test gpt2 for AMP+TP (#4403)
  * [shardformer] gpt2 tests fix
* [shardformer] rewrite tests for opt/bloom/llama/vit/chatglm (#4395)
  * rewrite opt, llama, bloom & vit, and chatglm tests
  * fix LinearCol for classifiers
  * add judge for other tp layers, fix lazy init in util
* [shardformer] update tests for all optimization (#4413)
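The flash-attention entries above swap a model's attention forward for a fused kernel. A rough sketch of that replacement, with PyTorch's built-in scaled_dot_product_attention standing in for the FlashAttention-2 kernel; this is illustrative, not the actual ColoAttention implementation:

    import torch.nn.functional as F

    # Sketch of an attention-forward replacement; q, k, v are
    # (batch, num_heads, seq_len, head_dim) tensors.
    def flash_attention_forward(q, k, v, attention_mask=None, dropout_p=0.0):
        return F.scaled_dot_product_attention(
            q, k, v,
            attn_mask=attention_mask,          # e.g. a precomputed padded-causal mask
            dropout_p=dropout_p,
            is_causal=attention_mask is None,  # fall back to causal when no mask given
        )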
* [shardformer] update t5 tests for using all optimizations (#4407)
* [shardformer] update bloom/llama/vit/chatglm tests (#4420)
  * [shardformer] update opt tests
* [misc] resolve code factor issues (#4433)
* [misc] update requirements
* [shardformer] fix embedding and imports
* [format] applied code formatting on changed files in pull request 4441 (#4445)
  Co-authored-by: github-actions
* [shardformer/sequence parallel] Cherry pick commit to new branch (#4450)
  * [shardformer/sequence parallel] Support sequence parallel for gpt2 (#4384)
    * [sequence parallel] add sequence parallel linear col/row support (#4336) (forward sketched below)
      * add sequence parallel linear col/row support with annotations
      * add support for gpt2 fused qkv linear layer
      * support sequence parallel in GPT2
      * add docstring and note; add requirements
      * remove unused flash-attn; modify flash attn test, setting and code
      * add assert before divide, rename forward function
    * [shardformer/test] fix gpt2 test with seq-parallel
  * [shardformer/sequence parallel] Overlap input gather and grad computation during col backward (#4401)
    * overlap gathering input / computing grad during col backward
    * modify test for overlap; simplify code
    * fix code and modify cuda stream synchronize
  * [shardformer/sequence parallel] polish code
* [shardformer] support DDP in HybridPlugin/add tp+dp tests (#4446)
  * add docstring for HybridParallelPlugin
* [devops] add large-scale distributed test marker (#4452)
  * [test] remove cpu and gpu markers; update pytest markers
  * [ci] update unit test ci
* [shardformer] support interleaved pipeline (#4448)
  * fix unit test; remove virtual stage test in stage mgr
  * add dropped type hint and updated bwd
* [shardformer/sequence parallel] support gpt2 seq parallel with pp/dp/tp (#4460)
  * fix a bug when waiting for stream done
  * delete unused gpt2_seq file
* [shardformer] bloom support sequence parallel (#4465)
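For the sequence-parallel linear of #4336 above: each rank holds a slice of the sequence and inputs are all-gathered along the sequence dimension before the column-parallel matmul. A minimal sketch under assumed layout; the real implementation also overlaps this gather with grad computation (#4401):

    import torch
    import torch.distributed as dist

    # Illustrative sequence-parallel column-linear forward, not the shardformer API.
    def seq_parallel_linear_col_forward(x_local, weight, bias, sp_group):
        world_size = dist.get_world_size(sp_group)
        # x_local: (batch, seq_len // world_size, hidden)
        gather_list = [torch.empty_like(x_local) for _ in range(world_size)]
        dist.all_gather(gather_list, x_local, group=sp_group)
        x_full = torch.cat(gather_list, dim=1)  # restore the full sequence
        return torch.nn.functional.linear(x_full, weight, bias)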
* [shardformer] bert support sequence parallel (#4455)
* [shardformer] Pipeline/whisper (#4456)
  * add some base tests and policies
  * finish whisper base model and conditional generation
  * delete useless whisper test; add argmin to replace
  * finish revision
* [shardformer] support tp+zero for shardformer (#4472)
  * support tp+zero/input type cast for hybridplugin
  * add tp+zero tests; fix bucket arguments
* [chat] update config and prompt (#4139)
  Co-authored-by: Qianran Ma
* rename chatglm to chatglm2 (#4484)
* [shardformer/sequence parallel] not support opt of seq-parallel, add warning and fix a bug in gpt2 pp (#4488)
* [shardformer] chatglm support sequence parallel (#4482)
* [shardformer] tests for 3d parallel (#4493)
* [gemini] improve compatibility and add static placement policy (#4479)
  * [gemini] remove distributed-related part from colotensor (#4379)
    * remove process group dependency and tp part from colo tensor
    * patch inplace op; fix param op hook and update tests
    * [test] remove useless tests; fix model zoo
    * [misc] fix and update requirements
  * [gemini] refactor gemini optimizer and gemini ddp (#4398)
    * update optimizer interface; rename gemini optimizer
    * refactor gemini ddp class
    * [example] update gemini related examples
    * [plugin] fix gemini plugin args
    * [test] update gemini ckpt tests; fix checkpoint io
    * [example] fix opt example and its requirements
  * [gemini] add static placement policy (#4443)
    * fix param offload; update gemini tests
    * [plugin] update gemini plugin and its docstring
    * [misc] fix flash attn requirement
    * [test] fix gemini checkpoint io test
  * [example] update resnet example result (#4457)
  * [example] update bert example result (#4458)
  * [doc] update gemini doc (#4468)
  * [example] update gemini related examples (#4473)
    * update gpt, dreambooth, vit, opt and palm examples
    * update vit and opt benchmarks
  * [hotfix] fix bert in model zoo (#4480)
    * [test] remove chatglm, sam and vit gemini tests
* [hotfix] fix opt tutorial example (#4497)
* [shardformer] vit/llama/t5 ignore the sequence parallelism flag and some fix (#4498)
  * [shardformer] jit fused fix
  * activate checks
* [format] applied code formatting on changed files in pull request 4479 (#4504)
  Co-authored-by: github-actions
* [zero] support zero2 with gradient accumulation (#4511) (usage sketched below)
  * support gradient accumulation with zero2
  * fix type
* [shardformer] opt fix (#4514)
  * [shardformer] jit fused fix; activate checks
  * [Test] test ci; fix
* [shardformer] support sharded checkpoint IO for models of HybridParallelPlugin (#4506)
  * add APIs; implement save_sharded_model
  * add test for hybrid checkpointio
  * implement naive, then efficient, sharded model loading
  * open a new file for hybrid checkpoint_io
  * fix circular importing and docstring
  * arrange arguments and apis
* [shardformer] zero1+pp and the corresponding tests (#4517)
  * finish pp+zero1; update test_shard_vit.py
* [shardformer/fix overlap bug] fix overlap bug, add overlap as an option in shardco… (#4516)
  * fix overlap bug and support bert, add overlap as an option in shardconfig
  * support overlap for chatglm and bloom
* [example] add llama2 example (#4527)
  * transfer llama-1 example; fit llama-2
  * refactor scripts folder; fit new gemini plugin
  * [cli] fix multinode runner
  * fit gemini optim checkpoint
  * update requirements
  * rename llama to llama2; update readme and pretrain script
* [shardformer] fix emerged bugs after updating transformers (#4526)
* [coati] add chatglm model (#4539)
  * update configuration of chatglm and add support in coati
  * add unit test & update chatglm default config & fix bos index issue
  * remove chatglm due to oom; add dataset pkg in requirement-text
  * fix parameter issue in test_models
  * add ref in tokenize & remove unnecessary parts
  * separate source & target tokenization in chatglm
  * add unit test to chatglm; fix test dataset issue
  * update truncation of chatglm
  * fix Colossalai version, also in tests
* [shardformer] Add overlap support for gpt2 (#4535)
  * add overlap support for gpt2; remove unused code
* [coati] update ci; fix colossalai version in coati examples
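A rough usage sketch for gradient accumulation under ZeRO-2 (#4511 above), assuming model/optimizer/criterion/dataloader are defined elsewhere; treat the exact plugin arguments as illustrative rather than authoritative:

    import colossalai
    from colossalai.booster import Booster
    from colossalai.booster.plugin import LowLevelZeroPlugin

    colossalai.launch_from_torch(config={})
    booster = Booster(plugin=LowLevelZeroPlugin(stage=2))
    model, optimizer, criterion, dataloader, _ = booster.boost(
        model, optimizer, criterion, dataloader)

    accum_steps = 4
    for step, (inputs, labels) in enumerate(dataloader):
        loss = criterion(model(inputs), labels) / accum_steps
        booster.backward(loss, optimizer)  # gradients accumulate in ZeRO buckets
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()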
* [shardformer] fix opt test hanging (#4521)
  * fix tests; remove print
  * add fix
* keep requirements same with main branch
* fix runtime prepare pass (#4502)
  Co-authored-by: lufang.chen
* [shardformer] support pp+tp+zero1 tests (#4531)
  * [shardformer] fix opt test hanging
  * [shardformer] pp+tp+zero1
* [example] update streamlit 0.73.1 to 1.11.1 (#4386)
* [example] change accelerate version (#4431)
  Co-authored-by: Siyuan Tian
  Co-authored-by: Hongxin Liu
* [devops] cancel previous runs in the PR (#4546)
* [shardformer] fix submodule replacement bug when enabling pp (#4544)
* [shardformer] support sharded optimizer checkpointIO of HybridParallelPlugin (#4540)
  * implement sharded optimizer saving and greedy loading, with more param info
  * fix bugs in optimizer sharded saving and loading
  * add pp+zero test; param group loading
  * add optimizer test & arrange checkpointIO utils
  * fix gemini sharding state_dict; add verbose option
  * add loading of master params; fix master/working mapping in fp16 amp
  * fix typehint
* [shardformer] support from_pretrained when loading model with HybridParallelPlugin (#4575)
  * hybrid plugin supports huggingface from_pretrained
  * add huggingface compatibility tests and folder cleaning
  * fix bugs
* [zero] fix zero ckptIO with offload (#4529)
  * fix load device; saved tensors in ckpt should be on CPU
  * fix unit tests; add clear cache; save memory for CI
* Update Dockerfile (#4499)
  * fix dockerfile build
* [Fix] Fix compile error (#4357)
* [pipeline] 1f1b schedule receive microbatch size (#4589)
* [checkpointio] optimize zero optim checkpoint io (#4591)
  * [zero] update checkpoint io to save memory
  * [checkpointio] add device map to save memory
* [DOC] hotfix/llama2news (#4595)
  * [doc] add llama2 news
* [doc] add llama2 benchmark (#4604)
* [shardformer] Pytree fix (#4533)
  * pytree test; test bert
  * revise; add register
* [shardformer] update bert finetune example with HybridParallelPlugin (#4584)
  * [shardformer] add bert finetune example
  * [shardformer] fix epoch change
  * [shardformer] broadcast add pp group
  * rebase feature/shardformer; update pipeline
  * [shardformer] bert finetune fix
  * [shardformer] add all_reduce operation to loss
  * [shardformer] make compatible with pytree
  * [shardformer] disable tp
  * [shardformer] add 3d plugin to ci test
  * [shardformer] update num_microbatches to None and update microbatch size
  * [shardformer] update assert
  * update scheduler
  Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
  Co-authored-by: Bin Jia <45593998+FoolPlayer@users.noreply.github.com>
  Co-authored-by: Baizhou Zhang
* [checkpointio] support huggingface from_pretrained for all plugins (#4606)
* [shardformer] Add overlap optional for HybridParallelPlugin (#4615)
  * add optional overlap for plugin; remove fixed todo
* [shardformer] update shardformer readme (#4617)
* [test] ignore gpt2 shardformer test (#4619)
* [zero] hotfix master param sync (#4618)
  * [zero] add method to update master params; update zero plugin
  * [plugin] update low level zero plugin
* [test] fix gemini checkpoint and gpt test (#4620)
* [legacy] move trainer to legacy (#4545)
  * [doc] update docs related to trainer
  * [test] ignore legacy test
* [legacy] move engine to legacy (#4560)
  * [example] fix seq parallel example and update its requirements
  * [test] test gemini plugin hang
* [legacy] move builder and registry to legacy (#4603)
* [release] update version (#4623)
* [shardformer] Support customized policy for llamav2 based model with HybridParallelPlugin (#4624)
  * enable policy assignment in HybridPlugin and enable llama policy for llamav2
  * remove Policy from Plugin; revert changes of plugin HybridParallelModule
  * upgrade, then revert, transformers version
  Co-authored-by: flybird11111 <1829166702@qq.com>
* [pipeline] set optimizer to optional in execute_pipeline (#4630) (usage sketched below)
  * arrange device and mixed precision in booster init
  * fix execute_pipeline in booster.py
* [example] update vit example for hybrid parallel plugin (#4641)
  * update vit example for hybrid plugin; reset tp/pp size
  * fix dataloader iteration bug
  * update optimizer passing in evaluation/add grad_accum
  * change criterion; wrap tqdm; fix pbar
  * change grad_accum to grad_checkpoint
* [devops] fix concurrency group and compatibility test (#4665)
  * [devops] fix tensornvme and colossalai install
* [shardformer] update llama2/opt finetune example and fix llama2 policy (#4645)
  * [shardformer] update llama2/opt finetune example and shardformer update to llama2
  * [shardformer] change dataset; fix CI
  * [example] update opt example; resolve comments
* [devops] fix concurrency group (#4667)
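A rough sketch of the execute_pipeline flow touched by #4630 above (the optimizer is now optional, e.g. for evaluation); the plugin arguments and the shape of the returned dict are illustrative, not authoritative:

    from colossalai.booster import Booster
    from colossalai.booster.plugin import HybridParallelPlugin

    plugin = HybridParallelPlugin(tp_size=2, pp_size=2, num_microbatches=4)
    booster = Booster(plugin=plugin)
    model, optimizer, criterion, dataloader, _ = booster.boost(
        model, optimizer, criterion, dataloader)

    data_iter = iter(dataloader)
    outputs = booster.execute_pipeline(
        data_iter, model, criterion,
        optimizer=optimizer,  # may be omitted when only running forward
        return_loss=True)
    if outputs["loss"] is not None:  # only the last stage holds the loss
        optimizer.step()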
* [legacy] move communication and nn to legacy and refactor logger (#4671)
  * [legacy] move communication to legacy (#4640)
  * [legacy] refactor logger and clean up legacy codes (#4654)
    * make logger independent of gpc; make optim independent of registry
    * move test engine to legacy
  * [legacy] move nn to legacy (#4656)
    * [checkpointio] fix save hf config
    * [test] remove useless rpc pp test
    * [legacy] fix nn init
    * [example] skip tutorial hybrid parallel example
  * [devops] test doc check
* [shardformer] fix gpt2 double head (#4663)
  * [shardformer] fix gpt2 test; add todo
* [Feature] The first PR to Add TP inference engine, kv-cache manager and related kernels for our inference system (#4577)
  * [infer] Infer/llama demo (#4503)
    * add infer example; stash; fix
  * [Kernels] add inference token attention kernel (#4505)
    * add token forward; fix tests and comments
    * add try-import of triton; add adapted license; add tests check
  * [Kernels] add necessary kernels (llama & bloom) for attention forward and kv-cache manager (#4485)
    * added _vllm_rms_norm and tests
    * adding and updating kernels with tests
    * edit comments; change names; change comments and fix imports
  * combine codes (#4509)
  * [feature] add KV cache manager for llama & bloom inference (#4495) (idea sketched below)
    * add kv cache memory manager and kv cache test
    * add state info during inference; revise on BatchInferState
    * format; rename file; file dir change
  * [Bug Fix] import llama context ops fix (#4524)
    * fix; add ops into __init__.py
  * [Infer] Add TPInferEngine and fix file path (#4532)
    * add engine for TP inference; move file path
    * fix and revise TPInferEngine, add test; remove unused file
    * add engine test demo
  * Add Inference test for llama (#4508)
    * add inference test for llama; fix conflict
    * feature: add some new features for llama engine
    * adapt colossalai triton interface; change the parent class of llama policy
    * add nvtx; move llama inference code to tensor_parallel
    * fix __init__.py; remove tensor_parallel
    * fix bugs in auto_policy.py; remove some unused code
    * move colossalai/tpinference to colossalai/inference/tensor_parallel
    * fix engine; bug fix: fix hang; remove llama_infer_engine.py
    Co-authored-by: yuanheng-zhao
    Co-authored-by: CjhHa1
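The KV cache manager of #4495 above reserves one large key/value buffer up front and hands token slots to requests, instead of growing past_key_values step by step. A condensed sketch of the idea; names and layout are illustrative, not the actual manager API:

    import torch

    # Illustrative pre-allocated KV cache with simple slot allocation.
    class SimpleKVCacheManager:
        def __init__(self, max_tokens, num_layers, num_heads, head_dim,
                     dtype=torch.float16, device="cuda"):
            shape = (num_layers, max_tokens, num_heads, head_dim)
            self.key_cache = torch.empty(shape, dtype=dtype, device=device)
            self.value_cache = torch.empty(shape, dtype=dtype, device=device)
            self.free = torch.ones(max_tokens, dtype=torch.bool, device=device)

        def alloc(self, num_tokens: int) -> torch.Tensor:
            idx = torch.nonzero(self.free).flatten()[:num_tokens]
            assert idx.numel() == num_tokens, "KV cache out of slots"
            self.free[idx] = False
            return idx  # slot indices the attention kernel reads/writes

        def release(self, idx: torch.Tensor) -> None:
            self.free[idx] = True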
* Revert "[infer] Add Bloom inference policy and replaced methods (#4512)" (#4552)
This reverts commit 17cfa5714083a81a505c097f1c411cd28162d922.
* [Doc] Add colossal inference doc (#4549)
* create readme
* add readme.md
* fix typos
* [infer] Add Bloom inference policy and replaced methods (#4553)
* add bloom inference methods and policy
* enable pass BatchInferState from model forward
* revise bloom infer layers/policies
* add engine for inference (draft)
* add test for bloom infer
* fix bloom infer policy and flow
* revise bloom test
* fix bloom file path
* remove unused codes
* fix bloom modeling
* fix dir typo
* fix trivial
* fix policy
* clean pr
* trivial fix
* trivial
* Fix Bugs In Llama Model Forward (#4550)
* add kv cache memory manager
* add stateinfo during inference
* add
* add infer example
* finish
* finish
* format
* format
* rename file
* add kv cache test
* revise on BatchInferState
* add inference test for llama
* fix conflict
* feature: add some new features for llama engine
* adapt colossalai triton interface
* Change the parent class of llama policy
* add nvtx
* move llama inference code to tensor_parallel
* fix __init__.py
* rm tensor_parallel
* fix: fix bugs in auto_policy.py
* fix: rm some unused codes
* mv colossalai/tpinference to colossalai/inference/tensor_parallel
* change __init__.py
* save change
* fix engine
* Bug fix: Fix hang
* remove llama_infer_engine.py
* bug fix: fix bugs about infer_state.is_context_stage
* remove policies
* fix: delete unused code
* fix: delete unused code
* remove unused code
* fix conflict

---------

Co-authored-by: yuanheng-zhao
Co-authored-by: CjhHa1

* [doc] add colossal inference fig (#4554)
* create readme
* add readme.md
* fix typos
* upload fig
* [NFC] fix docstring for colossal inference (#4555)
Fix docstring and comments in kv cache manager and bloom modeling
* fix docstring in llama modeling (#4557)
* [Infer] check import vllm (#4559)
* change import vllm
* import apply_rotary_pos_emb
* change import location
* [DOC] add installation req (#4561)
* add installation req
* fix
* slight change
* remove empty
* [Feature] rms-norm transfer into inference llama.py (#4563)
* add installation req
* fix
* slight change
* remove empty
* add rmsnorm policy
* add
* clean codes
* [infer] Fix tp inference engine (#4564)
* fix engine prepare data
* add engine test
* use bloom for testing
* revise on test
* revise on test
* reset shardformer llama (#4569)
* [infer] Fix engine - tensors on different devices (#4570)
* fix diff device in engine
* [codefactor] Feature/colossal inference (#4579)
* code factors
* remove
* change coding (#4581)
* [doc] complete README of colossal inference (#4585)
* complete fig
* Update README.md
* [doc] update readme (#4586)
* update readme
* Update README.md
* bug fix: fix bugs in llama and bloom (#4588)
* [BUG FIX] Fix test engine in CI and non-vllm kernels llama forward (#4592)
* fix tests
* clean
* clean
* fix bugs
* add
* fix llama non-vllm kernels bug
* modify
* clean codes
* [Kernel] Rmsnorm fix (#4598)
* fix tests
* clean
* clean
* fix bugs
* add
* fix llama non-vllm kernels bug
* modify
* clean codes
* add triton rmsnorm
* delete vllm kernel flag
* [Bug Fix] Fix bugs in llama (#4601)
* fix tests
* clean
* clean
* fix bugs
* add
* fix llama non-vllm kernels bug
* modify
* clean codes
* bug fix: remove rotary_positions_ids

---------

Co-authored-by: cuiqing.li

* [kernel] Add triton layer norm & replace norm for bloom (#4609)
* add layernorm for inference
* add test for layernorm kernel
* add bloom layernorm replacement policy
* trivial: path
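The rms-norm commits above replace llama's RMSNorm with fused Triton/vLLM kernels. For reference, this is the computation those kernels accelerate, written as plain PyTorch; it is a sketch of the semantics only, since the fused kernels avoid the intermediate tensors:

import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Normalize by the root mean square over the hidden dimension, then scale.
    variance = x.float().pow(2).mean(dim=-1, keepdim=True)
    return (x.float() * torch.rsqrt(variance + eps)).to(x.dtype) * weight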
* [Infer] Bug fix rotary embedding in llama (#4608)
* fix rotary embedding
* delete print
* fix init seq len bug
* rename pytest
* add benchmark for llama
* refactor codes
* delete useless code
* [bench] Add bloom inference benchmark (#4621)
* add bloom benchmark
* readme - update benchmark res
* trivial - uncomment for testing (#4622)
* [Infer] add check triton and cuda version for tests (#4627)
* fix rotary embedding
* delete print
* fix init seq len bug
* rename pytest
* add benchmark for llama
* refactor codes
* delete useless code
* add check triton and cuda
* Update sharder.py (#4629)
* [Inference] Hot fix some bugs and typos (#4632)
* fix
* fix test
* fix conflicts
* [typo] Comments fix (#4633)
* fallback
* fix comments
* bug fix: fix some bugs in test_llama and test_bloom (#4635)
* [Infer] delete benchmark in tests and fix bug for llama and bloom (#4636)
* fix rotary embedding
* delete print
* fix init seq len bug
* rename pytest
* add benchmark for llama
* refactor codes
* delete useless code
* add check triton and cuda
* delete benchmark and fix infer bugs
* delete benchmark for tests
* delete useless code
* delete benchmark function in utils
* [Fix] Revise TPInferEngine, inference tests and benchmarks (#4642)
* [Fix] revise TPInferEngine methods and inference tests
* fix llama/bloom infer benchmarks
* fix infer tests
* trivial fix: benchmarks
* trivial
* trivial: rm print
* modify utils filename for infer ops test (#4657)
* [Infer] Fix TPInferEngine init & inference tests, benchmarks (#4670)
* fix engine funcs
* TPInferEngine: receive shard config in init
* benchmarks: revise TPInferEngine init
* benchmarks: remove pytest decorator
* trivial fix
* use small model for tests
* [NFC] use args for infer benchmarks (#4674)
* revise infer default (#4683)
* [Fix] optimize/shard model in TPInferEngine init (#4684)
* remove using orig model in engine
* revise inference tests
* trivial: rename

---------

Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: Xu Kai
Co-authored-by: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Co-authored-by: yuehuayingxueluo <867460659@qq.com>
Co-authored-by: yuanheng-zhao
Co-authored-by: CjhHa1
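Several fixes above ("fix rotary embedding", "fix init seq len bug", "remove rotary_positions_ids") concern applying rotary position embeddings with the correct per-token position during decoding. The following PyTorch sketch shows the rotation itself; the shapes and the half-splitting convention are assumptions for illustration, not the code from this patch:

import torch

def apply_rotary(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # x: (..., seq_len, head_dim); cos/sin: (seq_len, head_dim // 2),
    # precomputed from each token's absolute position.
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return torch.cat((x1 * cos - x2 * sin, x2 * cos + x1 * sin), dim=-1)

If a decode step computes cos/sin from the wrong position, for example by ignoring the sequence's true length, keys and queries rotate by mismatched angles, which is the kind of bug an incorrect initial sequence length produces.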
* [doc] Update booster user documents. (#4669)
* update booster_api.md
* update booster_checkpoint.md
* update booster_plugins.md
* move transformers importing inside function
* fix Dict typing
* fix autodoc bug
* small fix
* [shardformer] update shardformer readme (#4689)
* [shardformer] update shardformer readme
* [shardformer] update shardformer readme
* [shardformer] update shardformer readme
* [shardformer] update shardformer readme
* [shardformer] update shardformer readme
* [gptq] add gptq kernel (#4416)
* add gptq
* refactor code
* fix tests
* replace auto-gptq
* rename inference/quant
* refactor test
* add auto-gptq as an option
* reset requirements
* change assert and check auto-gptq
* add import warnings
* change test flash attn version
* remove example
* change requirements of flash_attn
* modify tests
* [skip ci] change requirements-test
* [gptq] faster gptq cuda kernel (#4494)
* [skip ci] add cuda kernels
* add license
* [skip ci] fix max_input_len
* format files & change test size
* [skip ci]
* add gptq tensor parallel
* add gptq tp
* delete print
* add test gptq check
* add test auto gptq check

---------

Co-authored-by: Baizhou Zhang
Co-authored-by: LuGY <74758262+Gy-Lu@users.noreply.github.com>
Co-authored-by: Wenhao Chen
Co-authored-by: Tian Siyuan
Co-authored-by: Siyuan Tian
Co-authored-by: Hongxin Liu
Co-authored-by: Jianghai <72591262+CjhHa1@users.noreply.github.com>
Co-authored-by: FoolPlayer <45593998+FoolPlayer@users.noreply.github.com>
Co-authored-by: Kun Lin <81014421+klhhhhh@users.noreply.github.com>
Co-authored-by: klhhhhh <1412841649@qq.com>
Co-authored-by: FoolPlayer <498107402@qq.com>
Co-authored-by: flybird1111 <1829166702@qq.com>
Co-authored-by: Frank Lee
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions
Co-authored-by: Michelle <97082656+MichelleMa8@users.noreply.github.com>
Co-authored-by: Qianran Ma
Co-authored-by: yingliu-hpc <138852768+yingliu-hpc@users.noreply.github.com>
Co-authored-by: Ying Liu
Co-authored-by: Lufang Chen <64068400+vincentccc@users.noreply.github.com>
Co-authored-by: lufang.chen
Co-authored-by: ChengDaqi2023 <131479795+ChengDaqi2023@users.noreply.github.com>
Co-authored-by: 栾鹏 <825485697@qq.com>
Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>
Co-authored-by: binmakeswell
Co-authored-by: eric8607242
Co-authored-by: Cuiqing Li
Co-authored-by: Yuanheng Zhao <54058983+yuanheng-zhao@users.noreply.github.com>
Co-authored-by: yuehuayingxueluo <867460659@qq.com>
Co-authored-by: yuanheng-zhao
Co-authored-by: CjhHa1

---
.github/workflows/build_on_pr.yml | 14 +- .../compatiblity_test_on_dispatch.yml | 7 +- .github/workflows/compatiblity_test_on_pr.yml | 14 +- .../compatiblity_test_on_schedule.yml | 6 +- .github/workflows/doc_check_on_pr.yml | 8 +- .github/workflows/doc_test_on_pr.yml | 8 +- .github/workflows/example_check_on_pr.yml | 8 +- .github/workflows/run_chatgpt_examples.yml | 3 +- .github/workflows/run_chatgpt_unit_tests.yml | 3 +- LICENSE | 33 + README.md | 13 +- applications/Chat/README.md | 126 +- applications/Chat/benchmarks/README.md | 9 +- .../Chat/coati/dataset/sft_dataset.py | 75 +- .../Chat/coati/models/chatglm/__init__.py | 3 + .../coati/models/chatglm/chatglm_actor.py | 34 + .../coati/models/chatglm/chatglm_tokenizer.py | 446 +++++ .../models/chatglm/configuration_chatglm.py | 107 ++ .../coati/models/chatglm/modeling_chatglm.py | 1439 +++++++++++++++++ applications/Chat/coati/ray/README.md | 177 +- applications/Chat/coati/trainer/sft.py | 10 +-
applications/Chat/evaluate/README.md | 252 +-- .../Chat/evaluate/config/config_cn.json | 99 +- .../Chat/evaluate/config/config_en.json | 117 +- .../evaluation_prompt_cn.json | 6 +- .../evaluation_prompt_en.json | 6 +- applications/Chat/examples/README.md | 260 +-- .../Chat/examples/community/README.md | 15 +- .../Chat/examples/community/peft/README.md | 6 + .../Chat/examples/community/ray/README.md | 14 + applications/Chat/examples/requirements.txt | 1 + applications/Chat/examples/train_sft.py | 12 +- applications/Chat/inference/README.md | 24 +- applications/Chat/requirements-test.txt | 1 + applications/Chat/requirements.txt | 2 +- applications/Chat/tests/test_dataset.py | 106 +- applications/Chat/tests/test_models.py | 130 +- .../naive_amp/mixed_precision_optimizer.py | 149 ++ .../offload/base_offload_module.py | 2 +- .../passes/runtime_preparation_pass.py | 2 +- .../tensor_shard/node_handler/registry.py | 1 - colossalai/booster/booster.py | 140 +- colossalai/booster/plugin/__init__.py | 3 +- colossalai/booster/plugin/gemini_plugin.py | 106 +- .../booster/plugin/hybrid_parallel_plugin.py | 520 ++++++ .../booster/plugin/low_level_zero_plugin.py | 182 ++- colossalai/booster/plugin/pp_plugin_base.py | 21 + colossalai/checkpoint_io/__init__.py | 3 +- .../checkpoint_io/general_checkpoint_io.py | 4 +- .../hybrid_parallel_checkpoint_io.py | 702 ++++++++ colossalai/checkpoint_io/utils.py | 567 ++++--- colossalai/cli/benchmark/models.py | 2 +- colossalai/cli/launcher/run.py | 4 + colossalai/cluster/__init__.py | 3 +- colossalai/cluster/process_group_mesh.py | 209 +++ colossalai/context/parallel_context.py | 2 +- .../initializer_1d.py | 3 +- .../initializer_2d.py | 2 +- .../initializer_2p5d.py | 3 +- .../initializer_3d.py | 2 +- .../initializer_data.py | 2 +- .../initializer_model.py | 6 +- .../initializer_pipeline.py | 2 +- .../initializer_sequence.py | 2 +- .../initializer_tensor.py | 5 +- colossalai/inference/README.md | 117 ++ .../inference}/__init__.py | 0 .../inference/tensor_parallel/__init__.py | 4 + .../tensor_parallel/batch_infer_state.py | 55 + .../inference/tensor_parallel/engine.py | 294 ++++ .../tensor_parallel/kvcache_manager.py | 101 ++ .../tensor_parallel/modeling/__init__.py | 4 + .../tensor_parallel/modeling/bloom.py | 521 ++++++ .../tensor_parallel/modeling/llama.py | 359 ++++ .../tensor_parallel/policies/__init__.py | 4 + .../tensor_parallel/policies/bloom.py | 66 + .../tensor_parallel/policies/llama.py | 70 + colossalai/initialize.py | 8 +- colossalai/interface/__init__.py | 4 +- colossalai/interface/model.py | 11 + colossalai/interface/optimizer.py | 4 + colossalai/kernel/__init__.py | 7 + colossalai/kernel/cuda_native/__init__.py | 5 +- .../kernel/cuda_native/mha/mem_eff_attn.py | 15 +- colossalai/kernel/jit/option.py | 2 +- colossalai/kernel/triton/__init__.py | 5 + colossalai/kernel/triton/context_attention.py | 184 +++ .../kernel/triton/copy_kv_cache_dest.py | 69 + colossalai/kernel/triton/fused_layernorm.py | 83 + colossalai/kernel/triton/rms_norm.py | 72 + .../kernel/triton/rotary_embedding_kernel.py | 93 ++ .../{ops.py => self_attention_nofusion.py} | 120 +- colossalai/kernel/triton/softmax.py | 96 ++ colossalai/kernel/triton/softmax_kernel.py | 44 - .../kernel/triton/token_attention_kernel.py | 333 ++++ colossalai/lazy/lazy_init.py | 41 +- .../legacy}/__init__.py | 0 colossalai/{ => legacy}/builder/__init__.py | 0 colossalai/{ => legacy}/builder/builder.py | 4 +- .../{ => legacy}/communication/__init__.py | 18 +- .../{ => legacy}/communication/collective.py | 0 
colossalai/{ => legacy}/communication/p2p.py | 0 .../{ => legacy}/communication/p2p_v2.py | 0 colossalai/{ => legacy}/communication/ring.py | 0 .../{ => legacy}/communication/utils.py | 0 colossalai/{ => legacy}/engine/__init__.py | 0 .../{ => legacy}/engine/_base_engine.py | 12 +- .../engine/gradient_accumulation/__init__.py | 4 +- .../_gradient_accumulation.py | 4 +- .../engine/gradient_handler/__init__.py | 0 .../_base_gradient_handler.py | 0 .../_data_parallel_gradient_handler.py | 4 +- .../gradient_handler/_moe_gradient_handler.py | 4 +- .../_pipeline_parallel_gradient_handler.py | 2 +- .../_sequence_parallel_gradient_handler.py | 4 +- .../_zero_gradient_handler.py | 2 +- .../engine/gradient_handler/utils.py | 0 .../{ => legacy}/engine/schedule/__init__.py | 0 .../engine/schedule/_base_schedule.py | 2 +- .../engine/schedule/_non_pipeline_schedule.py | 2 +- .../engine/schedule/_pipeline_schedule.py | 12 +- .../engine/schedule/_pipeline_schedule_v2.py | 8 +- colossalai/legacy/nn/__init__.py | 4 + colossalai/{ => legacy}/nn/_ops/__init__.py | 0 colossalai/{ => legacy}/nn/_ops/_utils.py | 4 +- colossalai/{ => legacy}/nn/_ops/addmm.py | 0 colossalai/{ => legacy}/nn/_ops/batch_norm.py | 0 .../{ => legacy}/nn/_ops/element_wise.py | 0 colossalai/{ => legacy}/nn/_ops/embedding.py | 8 +- .../{ => legacy}/nn/_ops/embedding_bag.py | 8 +- colossalai/{ => legacy}/nn/_ops/layernorm.py | 5 +- colossalai/{ => legacy}/nn/_ops/linear.py | 0 colossalai/{ => legacy}/nn/_ops/loss.py | 9 +- colossalai/{ => legacy}/nn/_ops/view.py | 0 colossalai/legacy/nn/layer/__init__.py | 9 + .../{ => legacy}/nn/layer/base_layer.py | 0 .../nn/layer/colossalai_layer/__init__.py | 14 +- .../nn/layer/colossalai_layer/_utils.py | 0 .../nn/layer/colossalai_layer/dropout.py | 0 .../nn/layer/colossalai_layer/embedding.py | 303 ++-- .../nn/layer/colossalai_layer/linear.py | 2 +- .../layer/colossalai_layer/normalization.py | 83 +- .../legacy/nn/layer/parallel_1d/__init__.py | 17 + .../nn/layer/parallel_1d/_operation.py | 0 .../nn/layer/parallel_1d/_utils.py | 3 +- .../nn/layer/parallel_1d/layers.py | 4 +- .../nn/layer/parallel_2d/__init__.py | 11 +- .../nn/layer/parallel_2d/_operation.py | 21 +- .../nn/layer/parallel_2d/_utils.py | 0 .../nn/layer/parallel_2d/layers.py | 21 +- .../nn/layer/parallel_2p5d/__init__.py | 11 +- .../nn/layer/parallel_2p5d/_operation.py | 7 +- .../nn/layer/parallel_2p5d/_utils.py | 0 .../nn/layer/parallel_2p5d/layers.py | 28 +- .../nn/layer/parallel_3d/__init__.py | 11 +- .../nn/layer/parallel_3d/_operation.py | 2 +- .../nn/layer/parallel_3d/_utils.py | 0 .../nn/layer/parallel_3d/layers.py | 6 +- .../nn/layer/parallel_sequence/__init__.py | 2 +- .../nn/layer/parallel_sequence/_operation.py | 6 +- .../nn/layer/parallel_sequence/_utils.py | 0 .../nn/layer/parallel_sequence/layers.py | 10 +- colossalai/legacy/nn/layer/utils/__init__.py | 15 + .../{ => legacy}/nn/layer/utils/common.py | 3 +- .../{ => legacy}/nn/layer/vanilla/__init__.py | 0 .../{ => legacy}/nn/layer/vanilla/layers.py | 2 +- .../{ => legacy}/nn/layer/wrapper/__init__.py | 0 .../nn/layer/wrapper/pipeline_wrapper.py | 6 +- colossalai/legacy/nn/loss/__init__.py | 41 + colossalai/{ => legacy}/nn/loss/loss_1d.py | 211 +-- colossalai/{ => legacy}/nn/loss/loss_2d.py | 13 +- colossalai/{ => legacy}/nn/loss/loss_2p5d.py | 13 +- colossalai/{ => legacy}/nn/loss/loss_3d.py | 13 +- colossalai/{ => legacy}/nn/metric/__init__.py | 54 +- colossalai/{ => legacy}/nn/metric/_utils.py | 14 +- .../{ => legacy}/nn/metric/accuracy_2d.py | 3 +- .../{ => 
legacy}/nn/metric/accuracy_2p5d.py | 3 +- .../{ => legacy}/nn/metric/accuracy_3d.py | 68 +- .../{ => legacy}/nn/parallel/__init__.py | 0 .../{ => legacy}/nn/parallel/data_parallel.py | 0 .../nn/parallel/layers/__init__.py | 17 +- .../layers/cache_embedding/__init__.py | 4 +- .../layers/cache_embedding/base_embedding.py | 1 + .../layers/cache_embedding/cache_mgr.py | 20 +- .../cache_embedding/cached_embedding.py | 11 +- .../parallel/layers/cache_embedding/copyer.py | 4 +- .../cache_embedding/embedding_config.py | 0 .../parallel_cached_embedding.py | 9 +- .../parallel_cached_embedding_tablewise.py | 13 +- ..._cached_embedding_tablewise_split_cache.py | 14 +- .../nn/parallel/layers/colo_module.py | 5 +- .../nn/parallel/layers/embedding.py | 3 +- .../{ => legacy}/nn/parallel/layers/linear.py | 3 +- .../nn/parallel/layers/module_utils.py | 8 +- .../{ => legacy}/nn/parallel/reducer.py | 0 colossalai/{ => legacy}/registry/__init__.py | 0 colossalai/{ => legacy}/registry/registry.py | 4 +- colossalai/{ => legacy}/trainer/__init__.py | 0 colossalai/{ => legacy}/trainer/_trainer.py | 9 +- .../{ => legacy}/trainer/hooks/__init__.py | 9 +- .../{ => legacy}/trainer/hooks/_base_hook.py | 0 .../trainer/hooks/_checkpoint_hook.py | 7 +- .../{ => legacy}/trainer/hooks/_commons_.py | 0 .../{ => legacy}/trainer/hooks/_log_hook.py | 10 +- .../trainer/hooks/_lr_scheduler_hook.py | 3 +- .../trainer/hooks/_metric_hook.py | 19 +- colossalai/logging/logger.py | 47 +- colossalai/nn/__init__.py | 3 +- colossalai/nn/layer/__init__.py | 8 - colossalai/nn/layer/parallel_1d/__init__.py | 7 - colossalai/nn/layer/utils.py | 14 + colossalai/nn/layer/utils/__init__.py | 7 - colossalai/nn/loss/__init__.py | 40 - colossalai/nn/loss/loss_moe.py | 161 +- colossalai/nn/lr_scheduler/cosine.py | 5 - colossalai/nn/lr_scheduler/linear.py | 3 - colossalai/nn/lr_scheduler/multistep.py | 3 - colossalai/nn/lr_scheduler/onecycle.py | 3 - colossalai/nn/lr_scheduler/poly.py | 3 - colossalai/nn/lr_scheduler/torch.py | 8 +- colossalai/nn/optimizer/cpu_adam.py | 2 - colossalai/nn/optimizer/fused_adam.py | 2 - colossalai/nn/optimizer/fused_lamb.py | 2 - colossalai/nn/optimizer/fused_sgd.py | 2 - colossalai/nn/optimizer/hybrid_adam.py | 2 - colossalai/nn/optimizer/lamb.py | 3 - colossalai/nn/optimizer/lars.py | 36 +- colossalai/pipeline/p2p.py | 222 +++ colossalai/pipeline/pipelinable.py | 25 +- colossalai/pipeline/schedule/__init__.py | 7 + colossalai/pipeline/schedule/_utils.py | 184 +++ colossalai/pipeline/schedule/base.py | 35 + .../pipeline/schedule/interleaved_pp.py | 372 +++++ colossalai/pipeline/schedule/one_f_one_b.py | 320 ++++ colossalai/pipeline/stage_manager.py | 136 ++ colossalai/pipeline/utils.py | 11 +- colossalai/shardformer/README.md | 177 +- colossalai/shardformer/_utils.py | 44 +- ..._benchmark.py => convergence_benchmark.py} | 7 +- ..._benchmark.sh => convergence_benchmark.sh} | 4 +- .../examples/performance_benchmark.py | 88 + colossalai/shardformer/layer/__init__.py | 5 +- colossalai/shardformer/layer/_operation.py | 304 +++- colossalai/shardformer/layer/embedding.py | 82 +- colossalai/shardformer/layer/linear.py | 168 +- colossalai/shardformer/layer/normalization.py | 14 +- .../shardformer/layer/parallel_module.py | 9 +- .../shardformer/layer/qkv_fused_linear.py | 388 ++++- colossalai/shardformer/layer/utils.py | 9 +- colossalai/shardformer/modeling/bert.py | 1285 +++++++++++++++ colossalai/shardformer/modeling/blip2.py | 120 ++ colossalai/shardformer/modeling/bloom.py | 1007 ++++++++++++ 
colossalai/shardformer/modeling/chatglm2.py | 399 +++++ .../chatglm2_6b/configuration_chatglm.py | 58 + .../modeling/chatglm2_6b/modeling_chatglm.py | 1373 ++++++++++++++++ colossalai/shardformer/modeling/gpt2.py | 988 +++++++++++ colossalai/shardformer/modeling/jit.py | 34 + colossalai/shardformer/modeling/llama.py | 471 ++++++ colossalai/shardformer/modeling/opt.py | 666 ++++++++ colossalai/shardformer/modeling/sam.py | 203 +++ colossalai/shardformer/modeling/t5.py | 786 +++++++++ colossalai/shardformer/modeling/vit.py | 385 +++++ colossalai/shardformer/modeling/whisper.py | 962 +++++++++++ .../{autopolicy.py => auto_policy.py} | 72 +- .../{basepolicy.py => base_policy.py} | 97 +- colossalai/shardformer/policies/bert.py | 413 ++++- colossalai/shardformer/policies/blip2.py | 326 ++++ colossalai/shardformer/policies/bloom.py | 242 ++- colossalai/shardformer/policies/chatglm2.py | 262 +++ colossalai/shardformer/policies/gpt2.py | 333 +++- colossalai/shardformer/policies/llama.py | 161 +- colossalai/shardformer/policies/opt.py | 178 +- colossalai/shardformer/policies/sam.py | 223 +++ colossalai/shardformer/policies/t5.py | 282 +++- colossalai/shardformer/policies/vit.py | 314 +++- colossalai/shardformer/policies/whisper.py | 495 ++++++ colossalai/shardformer/shard/shard_config.py | 35 +- colossalai/shardformer/shard/sharder.py | 112 +- colossalai/shardformer/shard/shardformer.py | 15 +- colossalai/shardformer/shard/utils.py | 19 + colossalai/tensor/colo_parameter.py | 68 +- colossalai/tensor/colo_tensor.py | 298 +--- colossalai/tensor/d_tensor/api.py | 25 + colossalai/tensor/dist_spec_mgr.py | 1 - colossalai/tensor/param_op_hook.py | 101 +- colossalai/utils/__init__.py | 4 + colossalai/utils/common.py | 19 + .../data_sampler/data_parallel_sampler.py | 26 +- colossalai/utils/profiler/profiler.py | 18 +- .../profiler/stateful_tensor_mem_extention.py | 8 +- colossalai/zero/__init__.py | 5 +- colossalai/zero/gemini/__init__.py | 8 +- colossalai/zero/gemini/chunk/chunk.py | 10 +- colossalai/zero/gemini/chunk/manager.py | 16 +- colossalai/zero/gemini/chunk/search_utils.py | 25 +- colossalai/zero/gemini/colo_init_context.py | 2 +- colossalai/zero/gemini/gemini_ddp.py | 263 ++- colossalai/zero/gemini/gemini_mgr.py | 20 +- colossalai/zero/gemini/gemini_optimizer.py | 94 +- .../zero/gemini/memory_tracer/memory_stats.py | 2 +- .../memory_tracer/runtime_mem_tracer.py | 2 +- colossalai/zero/gemini/placement_policy.py | 197 +-- colossalai/zero/gemini/utils.py | 10 +- .../gemini/ophooks/_shard_grad_ophook.py | 2 +- .../gemini/ophooks/_shard_param_ophook.py | 2 +- .../zero/legacy/sharded_model/zero_hook.py | 2 +- .../low_level/bookkeeping/bucket_store.py | 55 +- .../low_level/bookkeeping/gradient_store.py | 4 +- colossalai/zero/low_level/low_level_optim.py | 82 +- colossalai/zero/low_level/readme.md | 44 +- colossalai/zero/wrapper.py | 4 +- docker/Dockerfile | 5 +- docs/README-zh-Hans.md | 14 +- .../advanced_tutorials/add_your_parallel.md | 9 +- ...parallelize_your_training_like_Megatron.md | 2 +- .../train_gpt_using_hybrid_parallelism.md | 9 +- .../train_vit_using_pipeline_parallelism.md | 17 +- .../train_vit_with_hybrid_parallelism.md | 15 +- docs/source/en/basics/booster_api.md | 27 +- docs/source/en/basics/booster_checkpoint.md | 2 +- docs/source/en/basics/booster_plugins.md | 25 +- docs/source/en/basics/engine_trainer.md | 9 +- docs/source/en/basics/model_checkpoint.md | 3 +- docs/source/en/features/gradient_handler.md | 5 +- .../en/features/mixed_precision_training.md | 2 +- 
docs/source/en/features/pipeline_parallel.md | 3 +- docs/source/en/features/zero_with_chunk.md | 79 +- .../advanced_tutorials/add_your_parallel.md | 9 +- ...parallelize_your_training_like_Megatron.md | 2 +- .../train_gpt_using_hybrid_parallelism.md | 9 +- .../train_vit_using_pipeline_parallelism.md | 17 +- .../train_vit_with_hybrid_parallelism.md | 15 +- docs/source/zh-Hans/basics/booster_api.md | 23 +- .../zh-Hans/basics/booster_checkpoint.md | 12 +- docs/source/zh-Hans/basics/booster_plugins.md | 32 +- docs/source/zh-Hans/basics/engine_trainer.md | 9 +- .../source/zh-Hans/basics/model_checkpoint.md | 3 +- .../zh-Hans/features/gradient_handler.md | 5 +- .../features/mixed_precision_training.md | 2 +- .../zh-Hans/features/pipeline_parallel.md | 3 +- .../zh-Hans/features/zero_with_chunk.md | 81 +- .../roberta/pretraining/run_pretraining.py | 7 +- examples/images/diffusion/requirements.txt | 2 +- examples/images/dreambooth/test_ci.sh | 3 +- .../dreambooth/train_dreambooth_colossalai.py | 53 +- .../train_dreambooth_colossalai_lora.py | 30 +- examples/images/resnet/README.md | 6 +- examples/images/resnet/train.py | 4 +- examples/images/vit/README.md | 4 +- examples/images/vit/args.py | 160 +- examples/images/vit/data.py | 22 +- examples/images/vit/run_benchmark.sh | 11 +- examples/images/vit/run_demo.sh | 13 +- examples/images/vit/test_ci.sh | 7 +- examples/images/vit/vit_benchmark.py | 112 +- examples/images/vit/vit_train_demo.py | 193 ++- examples/inference/bench_bloom.py | 100 ++ examples/inference/bench_llama.py | 128 ++ examples/language/bert/README.md | 18 +- examples/language/bert/finetune.py | 162 +- examples/language/bert/test_ci.sh | 2 +- examples/language/gpt/gemini/run_gemini.sh | 6 - examples/language/gpt/gemini/test_ci.sh | 22 +- .../language/gpt/gemini/train_gpt_demo.py | 106 +- .../language/gpt/titans/dataset/webtext.py | 2 +- examples/language/gpt/titans/model/embed.py | 10 +- examples/language/gpt/titans/model/gpt1d.py | 6 +- .../gpt/titans/model/pipeline_gpt1d.py | 2 +- examples/language/gpt/titans/train_gpt.py | 2 +- examples/language/llama/README.md | 11 - examples/language/llama2/README.md | 194 +++ examples/language/llama2/attn.py | 83 + examples/language/llama2/benchmark.py | 211 +++ examples/language/llama2/data_utils.py | 119 ++ examples/language/llama2/model_utils.py | 32 + .../language/llama2/performance_evaluator.py | 102 ++ examples/language/llama2/pretrain.py | 275 ++++ examples/language/llama2/requirements.txt | 9 + .../llama2/scripts/benchmark_70B/3d.sh | 17 + .../llama2/scripts/benchmark_70B/gemini.sh | 13 + .../scripts/benchmark_70B/gemini_auto.sh | 13 + .../llama2/scripts/benchmark_7B/gemini.sh | 13 + .../scripts/benchmark_7B/gemini_auto.sh | 13 + .../language/{llama => llama2}/test_ci.sh | 0 examples/language/opt/args.py | 140 +- examples/language/opt/opt_benchmark.py | 47 +- examples/language/opt/opt_train_demo.py | 128 +- examples/language/opt/run_demo.sh | 2 +- examples/language/palm/train.py | 90 +- examples/tutorial/auto_parallel/README.md | 2 +- examples/tutorial/hybrid_parallel/test_ci.sh | 6 +- examples/tutorial/hybrid_parallel/train.py | 2 +- examples/tutorial/opt/opt/requirements.txt | 2 +- examples/tutorial/opt/opt/run_clm.py | 33 +- examples/tutorial/opt/opt/test_ci.sh | 4 +- .../data/datasets/indexed_dataset.py | 77 +- .../tutorial/sequence_parallel/model/bert.py | 60 +- .../model/layers/bert_layer.py | 24 +- .../sequence_parallel/requirements.txt | 1 + examples/tutorial/sequence_parallel/train.py | 2 +- op_builder/utils.py | 3 +- pytest.ini | 
8 +- requirements/requirements-test.txt | 3 +- requirements/requirements.txt | 2 +- .../components_to_test/hanging_param_model.py | 2 +- tests/components_to_test/inline_op_model.py | 2 +- tests/components_to_test/nested_model.py | 2 +- .../repeated_computed_layers.py | 2 +- tests/components_to_test/simple_net.py | 2 +- tests/kit/model_zoo/transformers/__init__.py | 5 + tests/kit/model_zoo/transformers/albert.py | 13 +- tests/kit/model_zoo/transformers/bert.py | 52 +- tests/kit/model_zoo/transformers/blip2.py | 62 + tests/kit/model_zoo/transformers/bloom.py | 30 +- tests/kit/model_zoo/transformers/chatglm2.py | 58 + tests/kit/model_zoo/transformers/gpt.py | 66 +- tests/kit/model_zoo/transformers/llama.py | 3 + tests/kit/model_zoo/transformers/opt.py | 17 +- tests/kit/model_zoo/transformers/sam.py | 52 + tests/kit/model_zoo/transformers/t5.py | 10 +- tests/kit/model_zoo/transformers/vit.py | 64 + tests/kit/model_zoo/transformers/whisper.py | 91 ++ .../test_plugin/test_3d_plugin.py | 99 ++ .../test_plugin/test_gemini_plugin.py | 39 +- .../test_gemini_checkpoint_io.py | 54 +- .../test_gemini_torch_compability.py | 16 +- ...st_hybrid_parallel_plugin_checkpoint_io.py | 164 ++ .../test_low_level_zero_checkpoint_io.py | 26 +- .../test_plugins_huggingface_compatibility.py | 83 + tests/test_cluster/test_process_group_mesh.py | 151 ++ tests/test_config/test_load_config.py | 1 - tests/test_context/test_hybrid_parallel.py | 1 - tests/test_data/test_cifar10_dataset.py | 3 +- tests/test_data/test_data_parallel_sampler.py | 1 - .../test_deterministic_dataloader.py | 1 - .../test_cifar_with_data_pipeline_tensor.py | 100 -- ...test_cifar_with_data_pipeline_tensor_v2.py | 104 -- tests/test_ddp/test_ddp_ignore_params.py | 92 -- tests/test_ddp/test_ddp_state_dict.py | 67 - tests/test_ddp/test_reducer.py | 47 - .../test_hf_model/hf_tracer_utils.py | 1 + .../test_tracer/test_hf_model/test_hf_bert.py | 2 + .../test_tracer/test_hf_model/test_hf_gpt.py | 4 +- tests/test_infer/_utils.py | 53 + tests/test_infer/test_bloom_infer.py | 58 + tests/test_infer/test_infer_engine.py | 94 ++ tests/test_infer/test_kvcache_manager.py | 61 + tests/test_infer/test_llama_infer.py | 84 + .../test_infer_ops/cuda/test_vllm_rmsnorm.py | 60 + .../cuda/test_vllm_rotary_embedding.py | 156 ++ tests/test_infer_ops/triton/kernel_utils.py | 28 + .../triton/test_bloom_context_attention.py | 54 + .../triton/test_copy_kv_dest.py | 39 + .../triton/test_layernorm_triton.py | 44 + .../triton/test_llama_context_attention.py | 53 + .../triton/test_rotary_embedding.py | 56 + .../triton/test_self_attention_nonfusion.py} | 9 +- .../triton}/test_softmax.py | 12 +- .../triton/test_token_attn_1.py | 72 + .../triton/test_token_attn_2.py | 61 + .../triton/test_token_attn_fwd.py | 67 + .../triton/test_token_softmax.py | 48 + tests/test_lazy/test_models.py | 4 +- .../test_comm/test_boardcast_send_recv_v2.py | 2 +- .../{ => test_legacy}/test_comm/test_comm.py | 2 +- .../test_comm/test_object_list_p2p.py | 8 +- .../test_comm/test_object_list_p2p_v2.py | 2 +- .../test_engine/test_engine.py | 0 .../test_engine/test_gradient_accumluation.py | 0 .../test_1d/checks_1d}/__init__.py | 0 .../test_1d/checks_1d/check_layer_1d.py | 2 +- .../test_layers/test_1d/checks_1d/common.py | 31 +- .../test_layers/test_1d/test_1d.py | 0 .../test_2d/checks_2d}/__init__.py | 0 .../test_2d/checks_2d/check_layer_2d.py | 25 +- .../test_2d/checks_2d/check_operation_2d.py | 8 +- .../test_layers/test_2d/checks_2d/common.py | 0 .../test_layers/test_2d/test_2d.py | 0 
.../test_2p5d/checks_2p5d}/__init__.py | 0 .../test_2p5d/checks_2p5d/check_layer_2p5d.py | 25 +- .../checks_2p5d/check_operation_2p5d.py | 7 +- .../test_2p5d/checks_2p5d/common.py | 2 +- .../test_layers/test_2p5d/test_2p5d.py | 0 .../test_layers/test_3d/checks_3d/__init__.py | 0 .../test_3d/checks_3d/check_layer_3d.py | 6 +- .../test_layers/test_3d/checks_3d/common.py | 2 +- .../test_layers/test_3d/test_3d.py | 0 .../test_layers/test_cache_embedding.py | 2 +- .../test_sequence/checks_seq/__init__.py | 0 .../checks_seq/check_layer_seq.py | 2 +- .../test_sequence/test_sequence.py | 5 +- .../test_trainer/test_pipeline/test_p2p.py | 8 +- .../test_pipeline/test_pipeline_schedule.py | 0 .../test_trainer_with_non_pipe_schedule.py | 2 +- .../test_trainer_with_pipe_schedule.py | 2 +- tests/test_moe/test_grad_handler.py | 2 +- tests/test_moe/test_moe_zero_model.py | 2 +- tests/test_moe/test_moe_zero_optim.py | 2 +- tests/test_ops/test_addmm_tp.py | 73 - tests/test_ops/test_embedding_bag_tp.py | 43 - tests/test_ops/test_embedding_tp.py | 44 - tests/test_ops/test_linear_tp.py | 48 - tests/test_ops/test_loss_func.py | 48 - tests/test_ops/test_op.py | 87 - tests/test_ops/test_view.py | 97 -- .../test_cuda_rpc_performance.py | 90 -- tests/test_pipeline/test_p2p_communication.py | 59 + tests/test_pipeline/test_pipelinable.py | 2 + .../test_t5_pipeline_utils.py | 39 + .../test_whisper_pipeline_utils.py | 44 + .../test_schedule/test_interleaved.py | 161 ++ .../test_schedule/test_oneF_oneB.py | 134 ++ .../test_pipeline_schedule_utils.py | 47 + tests/test_pipeline/test_stage_manager.py | 78 + .../test_layer/test_embedding.py | 15 +- .../test_gpt2_qkv_fused_linear_1d.py | 152 ++ .../test_layer/test_layernorm.py | 18 +- .../test_layer/test_linear_1d.py | 102 +- .../test_layer/test_qkv_fused_linear_1d.py | 30 +- .../test_vocab_parallel_embedding_1d.py | 13 +- tests/test_shardformer/test_model/_utils.py | 303 +++- .../test_model/test_shard_bert.py | 239 ++- .../test_model/test_shard_blip2.py | 76 + .../test_model/test_shard_bloom.py | 250 ++- .../test_model/test_shard_chatglm2.py | 219 +++ .../test_model/test_shard_gpt2.py | 262 ++- .../test_model/test_shard_llama.py | 263 ++- .../test_model/test_shard_opt.py | 254 ++- .../test_model/test_shard_sam.py | 69 + .../test_model/test_shard_t5.py | 251 ++- .../test_model/test_shard_vit.py | 221 ++- .../test_model/test_shard_whisper.py | 246 +++ tests/test_shardformer/test_shard_utils.py | 27 + tests/test_shardformer/test_with_torch_ddp.py | 26 +- tests/test_tensor/core/test_tensor.py | 153 -- tests/test_tensor/model/test_gpt2.py | 148 -- tests/test_tensor/model/test_model.py | 334 ---- tests/test_tensor/model/test_module_spec.py | 227 --- .../test_tensor/test_colo_checkpoint_tools.py | 41 - tests/test_tensor/test_context.py | 64 - tests/test_tensor/test_sharded_linear.py | 232 --- tests/test_tensor/test_tp_with_zero.py | 143 -- .../test_activation_checkpointing.py | 1 - .../test_checkpoint/test_checkpoint_1d.py | 2 +- .../test_checkpoint/test_checkpoint_2d.py | 2 +- .../test_checkpoint/test_checkpoint_2p5d.py | 2 +- .../test_checkpoint/test_checkpoint_3d.py | 2 +- tests/test_utils/test_colo_checkpoint.py | 206 --- tests/test_utils/test_flash_attention.py | 3 +- .../test_utils/test_norm_gradient_clipping.py | 1 + .../test_zero/test_gemini/test_chunk_mgrv2.py | 10 +- tests/test_zero/test_gemini/test_chunkv2.py | 4 +- tests/test_zero/test_gemini/test_fwd_bwd.py | 105 +- .../test_gemini/test_gemini_use_rmt.py | 24 +- .../test_gemini/test_get_torch_model.py | 52 - 
tests/test_zero/test_gemini/test_grad_clip.py | 55 +- tests/test_zero/test_gemini/test_inference.py | 64 +- tests/test_zero/test_gemini/test_optim.py | 81 +- .../test_gemini/test_runtime_mem_tracer.py | 6 +- tests/test_zero/test_gemini/test_search.py | 58 +- .../test_gemini/test_zeroddp_state_dict.py | 80 +- .../test_zeroddp_state_dict_shard.py | 56 - .../test_gemini/test_zerooptim_state_dict.py | 51 +- .../test_zero/test_low_level/test_grad_acc.py | 28 +- .../test_zero/test_low_level/test_zero1_2.py | 2 +- .../test_low_level/test_zero_ckpt.py | 2 +- .../test_low_level/test_zero_init.py | 55 - .../test_zero/test_low_level/test_zero_tp.py | 1 + version.txt | 2 +- 569 files changed, 31709 insertions(+), 8013 deletions(-) create mode 100644 applications/Chat/coati/models/chatglm/__init__.py create mode 100644 applications/Chat/coati/models/chatglm/chatglm_actor.py create mode 100644 applications/Chat/coati/models/chatglm/chatglm_tokenizer.py create mode 100644 applications/Chat/coati/models/chatglm/configuration_chatglm.py create mode 100644 applications/Chat/coati/models/chatglm/modeling_chatglm.py create mode 100644 colossalai/amp/naive_amp/mixed_precision_optimizer.py create mode 100644 colossalai/booster/plugin/hybrid_parallel_plugin.py create mode 100644 colossalai/booster/plugin/pp_plugin_base.py create mode 100644 colossalai/checkpoint_io/hybrid_parallel_checkpoint_io.py create mode 100644 colossalai/cluster/process_group_mesh.py create mode 100644 colossalai/inference/README.md rename {tests/test_layers/test_1d/checks_1d => colossalai/inference}/__init__.py (100%) create mode 100644 colossalai/inference/tensor_parallel/__init__.py create mode 100644 colossalai/inference/tensor_parallel/batch_infer_state.py create mode 100644 colossalai/inference/tensor_parallel/engine.py create mode 100644 colossalai/inference/tensor_parallel/kvcache_manager.py create mode 100644 colossalai/inference/tensor_parallel/modeling/__init__.py create mode 100644 colossalai/inference/tensor_parallel/modeling/bloom.py create mode 100644 colossalai/inference/tensor_parallel/modeling/llama.py create mode 100644 colossalai/inference/tensor_parallel/policies/__init__.py create mode 100644 colossalai/inference/tensor_parallel/policies/bloom.py create mode 100644 colossalai/inference/tensor_parallel/policies/llama.py create mode 100644 colossalai/kernel/triton/__init__.py create mode 100644 colossalai/kernel/triton/context_attention.py create mode 100644 colossalai/kernel/triton/copy_kv_cache_dest.py create mode 100644 colossalai/kernel/triton/fused_layernorm.py create mode 100644 colossalai/kernel/triton/rms_norm.py create mode 100644 colossalai/kernel/triton/rotary_embedding_kernel.py rename colossalai/kernel/triton/{ops.py => self_attention_nofusion.py} (57%) create mode 100644 colossalai/kernel/triton/softmax.py delete mode 100644 colossalai/kernel/triton/softmax_kernel.py create mode 100644 colossalai/kernel/triton/token_attention_kernel.py rename {tests/test_layers/test_2d/checks_2d => colossalai/legacy}/__init__.py (100%) rename colossalai/{ => legacy}/builder/__init__.py (100%) rename colossalai/{ => legacy}/builder/builder.py (96%) rename colossalai/{ => legacy}/communication/__init__.py (53%) rename colossalai/{ => legacy}/communication/collective.py (100%) rename colossalai/{ => legacy}/communication/p2p.py (100%) rename colossalai/{ => legacy}/communication/p2p_v2.py (100%) rename colossalai/{ => legacy}/communication/ring.py (100%) rename colossalai/{ => legacy}/communication/utils.py (100%) rename 
colossalai/{ => legacy}/engine/__init__.py (100%) rename colossalai/{ => legacy}/engine/_base_engine.py (97%) rename colossalai/{ => legacy}/engine/gradient_accumulation/__init__.py (94%) rename colossalai/{ => legacy}/engine/gradient_accumulation/_gradient_accumulation.py (98%) rename colossalai/{ => legacy}/engine/gradient_handler/__init__.py (100%) rename colossalai/{ => legacy}/engine/gradient_handler/_base_gradient_handler.py (100%) rename colossalai/{ => legacy}/engine/gradient_handler/_data_parallel_gradient_handler.py (90%) rename colossalai/{ => legacy}/engine/gradient_handler/_moe_gradient_handler.py (94%) rename colossalai/{ => legacy}/engine/gradient_handler/_pipeline_parallel_gradient_handler.py (97%) rename colossalai/{ => legacy}/engine/gradient_handler/_sequence_parallel_gradient_handler.py (90%) rename colossalai/{ => legacy}/engine/gradient_handler/_zero_gradient_handler.py (92%) rename colossalai/{ => legacy}/engine/gradient_handler/utils.py (100%) rename colossalai/{ => legacy}/engine/schedule/__init__.py (100%) rename colossalai/{ => legacy}/engine/schedule/_base_schedule.py (98%) rename colossalai/{ => legacy}/engine/schedule/_non_pipeline_schedule.py (97%) rename colossalai/{ => legacy}/engine/schedule/_pipeline_schedule.py (98%) rename colossalai/{ => legacy}/engine/schedule/_pipeline_schedule_v2.py (96%) create mode 100644 colossalai/legacy/nn/__init__.py rename colossalai/{ => legacy}/nn/_ops/__init__.py (100%) rename colossalai/{ => legacy}/nn/_ops/_utils.py (99%) rename colossalai/{ => legacy}/nn/_ops/addmm.py (100%) rename colossalai/{ => legacy}/nn/_ops/batch_norm.py (100%) rename colossalai/{ => legacy}/nn/_ops/element_wise.py (100%) rename colossalai/{ => legacy}/nn/_ops/embedding.py (98%) rename colossalai/{ => legacy}/nn/_ops/embedding_bag.py (97%) rename colossalai/{ => legacy}/nn/_ops/layernorm.py (92%) rename colossalai/{ => legacy}/nn/_ops/linear.py (100%) rename colossalai/{ => legacy}/nn/_ops/loss.py (96%) rename colossalai/{ => legacy}/nn/_ops/view.py (100%) create mode 100644 colossalai/legacy/nn/layer/__init__.py rename colossalai/{ => legacy}/nn/layer/base_layer.py (100%) rename colossalai/{ => legacy}/nn/layer/colossalai_layer/__init__.py (97%) rename colossalai/{ => legacy}/nn/layer/colossalai_layer/_utils.py (100%) rename colossalai/{ => legacy}/nn/layer/colossalai_layer/dropout.py (100%) rename colossalai/{ => legacy}/nn/layer/colossalai_layer/embedding.py (97%) rename colossalai/{ => legacy}/nn/layer/colossalai_layer/linear.py (99%) rename colossalai/{ => legacy}/nn/layer/colossalai_layer/normalization.py (97%) create mode 100644 colossalai/legacy/nn/layer/parallel_1d/__init__.py rename colossalai/{ => legacy}/nn/layer/parallel_1d/_operation.py (100%) rename colossalai/{ => legacy}/nn/layer/parallel_1d/_utils.py (99%) rename colossalai/{ => legacy}/nn/layer/parallel_1d/layers.py (99%) rename colossalai/{ => legacy}/nn/layer/parallel_2d/__init__.py (59%) rename colossalai/{ => legacy}/nn/layer/parallel_2d/_operation.py (98%) rename colossalai/{ => legacy}/nn/layer/parallel_2d/_utils.py (100%) rename colossalai/{ => legacy}/nn/layer/parallel_2d/layers.py (99%) rename colossalai/{ => legacy}/nn/layer/parallel_2p5d/__init__.py (59%) rename colossalai/{ => legacy}/nn/layer/parallel_2p5d/_operation.py (99%) rename colossalai/{ => legacy}/nn/layer/parallel_2p5d/_utils.py (100%) rename colossalai/{ => legacy}/nn/layer/parallel_2p5d/layers.py (99%) rename colossalai/{ => legacy}/nn/layer/parallel_3d/__init__.py (62%) rename colossalai/{ => 
legacy}/nn/layer/parallel_3d/_operation.py (99%) rename colossalai/{ => legacy}/nn/layer/parallel_3d/_utils.py (100%) rename colossalai/{ => legacy}/nn/layer/parallel_3d/layers.py (99%) rename colossalai/{ => legacy}/nn/layer/parallel_sequence/__init__.py (74%) rename colossalai/{ => legacy}/nn/layer/parallel_sequence/_operation.py (97%) rename colossalai/{ => legacy}/nn/layer/parallel_sequence/_utils.py (100%) rename colossalai/{ => legacy}/nn/layer/parallel_sequence/layers.py (98%) create mode 100644 colossalai/legacy/nn/layer/utils/__init__.py rename colossalai/{ => legacy}/nn/layer/utils/common.py (99%) rename colossalai/{ => legacy}/nn/layer/vanilla/__init__.py (100%) rename colossalai/{ => legacy}/nn/layer/vanilla/layers.py (99%) rename colossalai/{ => legacy}/nn/layer/wrapper/__init__.py (100%) rename colossalai/{ => legacy}/nn/layer/wrapper/pipeline_wrapper.py (99%) create mode 100644 colossalai/legacy/nn/loss/__init__.py rename colossalai/{ => legacy}/nn/loss/loss_1d.py (96%) rename colossalai/{ => legacy}/nn/loss/loss_2d.py (96%) rename colossalai/{ => legacy}/nn/loss/loss_2p5d.py (95%) rename colossalai/{ => legacy}/nn/loss/loss_3d.py (95%) rename colossalai/{ => legacy}/nn/metric/__init__.py (87%) rename colossalai/{ => legacy}/nn/metric/_utils.py (95%) rename colossalai/{ => legacy}/nn/metric/accuracy_2d.py (89%) rename colossalai/{ => legacy}/nn/metric/accuracy_2p5d.py (88%) rename colossalai/{ => legacy}/nn/metric/accuracy_3d.py (85%) rename colossalai/{ => legacy}/nn/parallel/__init__.py (100%) rename colossalai/{ => legacy}/nn/parallel/data_parallel.py (100%) rename colossalai/{ => legacy}/nn/parallel/layers/__init__.py (56%) rename colossalai/{ => legacy}/nn/parallel/layers/cache_embedding/__init__.py (100%) rename colossalai/{ => legacy}/nn/parallel/layers/cache_embedding/base_embedding.py (99%) rename colossalai/{ => legacy}/nn/parallel/layers/cache_embedding/cache_mgr.py (99%) rename colossalai/{ => legacy}/nn/parallel/layers/cache_embedding/cached_embedding.py (98%) rename colossalai/{ => legacy}/nn/parallel/layers/cache_embedding/copyer.py (97%) rename colossalai/{ => legacy}/nn/parallel/layers/cache_embedding/embedding_config.py (100%) rename colossalai/{ => legacy}/nn/parallel/layers/cache_embedding/parallel_cached_embedding.py (96%) rename colossalai/{ => legacy}/nn/parallel/layers/cache_embedding/parallel_cached_embedding_tablewise.py (99%) rename colossalai/{ => legacy}/nn/parallel/layers/cache_embedding/parallel_cached_embedding_tablewise_split_cache.py (99%) rename colossalai/{ => legacy}/nn/parallel/layers/colo_module.py (98%) rename colossalai/{ => legacy}/nn/parallel/layers/embedding.py (92%) rename colossalai/{ => legacy}/nn/parallel/layers/linear.py (93%) rename colossalai/{ => legacy}/nn/parallel/layers/module_utils.py (99%) rename colossalai/{ => legacy}/nn/parallel/reducer.py (100%) rename colossalai/{ => legacy}/registry/__init__.py (100%) rename colossalai/{ => legacy}/registry/registry.py (98%) rename colossalai/{ => legacy}/trainer/__init__.py (100%) rename colossalai/{ => legacy}/trainer/_trainer.py (98%) rename colossalai/{ => legacy}/trainer/hooks/__init__.py (75%) rename colossalai/{ => legacy}/trainer/hooks/_base_hook.py (100%) rename colossalai/{ => legacy}/trainer/hooks/_checkpoint_hook.py (96%) rename colossalai/{ => legacy}/trainer/hooks/_commons_.py (100%) rename colossalai/{ => legacy}/trainer/hooks/_log_hook.py (98%) rename colossalai/{ => legacy}/trainer/hooks/_lr_scheduler_hook.py (97%) rename colossalai/{ => 
legacy}/trainer/hooks/_metric_hook.py (97%) delete mode 100644 colossalai/nn/layer/parallel_1d/__init__.py create mode 100644 colossalai/nn/layer/utils.py delete mode 100644 colossalai/nn/layer/utils/__init__.py create mode 100644 colossalai/pipeline/p2p.py create mode 100644 colossalai/pipeline/schedule/__init__.py create mode 100644 colossalai/pipeline/schedule/_utils.py create mode 100644 colossalai/pipeline/schedule/base.py create mode 100644 colossalai/pipeline/schedule/interleaved_pp.py create mode 100644 colossalai/pipeline/schedule/one_f_one_b.py create mode 100644 colossalai/pipeline/stage_manager.py rename colossalai/shardformer/examples/{shardformer_benchmark.py => convergence_benchmark.py} (95%) rename colossalai/shardformer/examples/{shardformer_benchmark.sh => convergence_benchmark.sh} (68%) create mode 100644 colossalai/shardformer/examples/performance_benchmark.py create mode 100644 colossalai/shardformer/modeling/bert.py create mode 100644 colossalai/shardformer/modeling/blip2.py create mode 100644 colossalai/shardformer/modeling/chatglm2.py create mode 100644 colossalai/shardformer/modeling/chatglm2_6b/configuration_chatglm.py create mode 100644 colossalai/shardformer/modeling/chatglm2_6b/modeling_chatglm.py create mode 100644 colossalai/shardformer/modeling/gpt2.py create mode 100644 colossalai/shardformer/modeling/jit.py create mode 100644 colossalai/shardformer/modeling/llama.py create mode 100644 colossalai/shardformer/modeling/opt.py create mode 100644 colossalai/shardformer/modeling/sam.py create mode 100644 colossalai/shardformer/modeling/t5.py create mode 100644 colossalai/shardformer/modeling/vit.py create mode 100644 colossalai/shardformer/modeling/whisper.py rename colossalai/shardformer/policies/{autopolicy.py => auto_policy.py} (61%) rename colossalai/shardformer/policies/{basepolicy.py => base_policy.py} (62%) create mode 100644 colossalai/shardformer/policies/blip2.py create mode 100644 colossalai/shardformer/policies/chatglm2.py create mode 100644 colossalai/shardformer/policies/sam.py create mode 100644 colossalai/shardformer/policies/whisper.py create mode 100644 colossalai/shardformer/shard/utils.py create mode 100644 examples/inference/bench_bloom.py create mode 100644 examples/inference/bench_llama.py delete mode 100644 examples/language/llama/README.md create mode 100644 examples/language/llama2/README.md create mode 100644 examples/language/llama2/attn.py create mode 100644 examples/language/llama2/benchmark.py create mode 100644 examples/language/llama2/data_utils.py create mode 100644 examples/language/llama2/model_utils.py create mode 100644 examples/language/llama2/performance_evaluator.py create mode 100644 examples/language/llama2/pretrain.py create mode 100644 examples/language/llama2/requirements.txt create mode 100644 examples/language/llama2/scripts/benchmark_70B/3d.sh create mode 100644 examples/language/llama2/scripts/benchmark_70B/gemini.sh create mode 100644 examples/language/llama2/scripts/benchmark_70B/gemini_auto.sh create mode 100644 examples/language/llama2/scripts/benchmark_7B/gemini.sh create mode 100644 examples/language/llama2/scripts/benchmark_7B/gemini_auto.sh rename examples/language/{llama => llama2}/test_ci.sh (100%) create mode 100644 tests/kit/model_zoo/transformers/blip2.py create mode 100644 tests/kit/model_zoo/transformers/chatglm2.py create mode 100644 tests/kit/model_zoo/transformers/sam.py create mode 100644 tests/kit/model_zoo/transformers/vit.py create mode 100644 tests/kit/model_zoo/transformers/whisper.py 
create mode 100644 tests/test_booster/test_plugin/test_3d_plugin.py create mode 100644 tests/test_checkpoint_io/test_hybrid_parallel_plugin_checkpoint_io.py create mode 100644 tests/test_checkpoint_io/test_plugins_huggingface_compatibility.py create mode 100644 tests/test_cluster/test_process_group_mesh.py delete mode 100644 tests/test_data_pipeline_tensor_parallel/test_cifar_with_data_pipeline_tensor.py delete mode 100644 tests/test_data_pipeline_tensor_parallel/test_cifar_with_data_pipeline_tensor_v2.py delete mode 100644 tests/test_ddp/test_ddp_ignore_params.py delete mode 100644 tests/test_ddp/test_ddp_state_dict.py delete mode 100644 tests/test_ddp/test_reducer.py create mode 100644 tests/test_infer/_utils.py create mode 100644 tests/test_infer/test_bloom_infer.py create mode 100644 tests/test_infer/test_infer_engine.py create mode 100644 tests/test_infer/test_kvcache_manager.py create mode 100644 tests/test_infer/test_llama_infer.py create mode 100644 tests/test_infer_ops/cuda/test_vllm_rmsnorm.py create mode 100644 tests/test_infer_ops/cuda/test_vllm_rotary_embedding.py create mode 100644 tests/test_infer_ops/triton/kernel_utils.py create mode 100644 tests/test_infer_ops/triton/test_bloom_context_attention.py create mode 100644 tests/test_infer_ops/triton/test_copy_kv_dest.py create mode 100644 tests/test_infer_ops/triton/test_layernorm_triton.py create mode 100644 tests/test_infer_ops/triton/test_llama_context_attention.py create mode 100644 tests/test_infer_ops/triton/test_rotary_embedding.py rename tests/{test_kernels/test_self_attention.py => test_infer_ops/triton/test_self_attention_nonfusion.py} (91%) rename tests/{test_kernels => test_infer_ops/triton}/test_softmax.py (70%) create mode 100644 tests/test_infer_ops/triton/test_token_attn_1.py create mode 100644 tests/test_infer_ops/triton/test_token_attn_2.py create mode 100644 tests/test_infer_ops/triton/test_token_attn_fwd.py create mode 100644 tests/test_infer_ops/triton/test_token_softmax.py rename tests/{ => test_legacy}/test_comm/test_boardcast_send_recv_v2.py (93%) rename tests/{ => test_legacy}/test_comm/test_comm.py (96%) rename tests/{ => test_legacy}/test_comm/test_object_list_p2p.py (98%) rename tests/{ => test_legacy}/test_comm/test_object_list_p2p_v2.py (97%) rename tests/{ => test_legacy}/test_engine/test_engine.py (100%) rename tests/{ => test_legacy}/test_engine/test_gradient_accumluation.py (100%) rename tests/{test_layers/test_2p5d/checks_2p5d => test_legacy/test_layers/test_1d/checks_1d}/__init__.py (100%) rename tests/{ => test_legacy}/test_layers/test_1d/checks_1d/check_layer_1d.py (99%) rename tests/{ => test_legacy}/test_layers/test_1d/checks_1d/common.py (94%) rename tests/{ => test_legacy}/test_layers/test_1d/test_1d.py (100%) rename tests/{test_layers/test_3d/checks_3d => test_legacy/test_layers/test_2d/checks_2d}/__init__.py (100%) rename tests/{ => test_legacy}/test_layers/test_2d/checks_2d/check_layer_2d.py (97%) rename tests/{ => test_legacy}/test_layers/test_2d/checks_2d/check_operation_2d.py (96%) rename tests/{ => test_legacy}/test_layers/test_2d/checks_2d/common.py (100%) rename tests/{ => test_legacy}/test_layers/test_2d/test_2d.py (100%) rename tests/{test_layers/test_sequence/checks_seq => test_legacy/test_layers/test_2p5d/checks_2p5d}/__init__.py (100%) rename tests/{ => test_legacy}/test_layers/test_2p5d/checks_2p5d/check_layer_2p5d.py (98%) rename tests/{ => test_legacy}/test_layers/test_2p5d/checks_2p5d/check_operation_2p5d.py (97%) rename tests/{ => 
test_legacy}/test_layers/test_2p5d/checks_2p5d/common.py (75%) rename tests/{ => test_legacy}/test_layers/test_2p5d/test_2p5d.py (100%) create mode 100644 tests/test_legacy/test_layers/test_3d/checks_3d/__init__.py rename tests/{ => test_legacy}/test_layers/test_3d/checks_3d/check_layer_3d.py (99%) rename tests/{ => test_legacy}/test_layers/test_3d/checks_3d/common.py (95%) rename tests/{ => test_legacy}/test_layers/test_3d/test_3d.py (100%) rename tests/{ => test_legacy}/test_layers/test_cache_embedding.py (99%) create mode 100644 tests/test_legacy/test_layers/test_sequence/checks_seq/__init__.py rename tests/{ => test_legacy}/test_layers/test_sequence/checks_seq/check_layer_seq.py (91%) rename tests/{ => test_legacy}/test_layers/test_sequence/test_sequence.py (97%) rename tests/{ => test_legacy}/test_trainer/test_pipeline/test_p2p.py (98%) rename tests/{ => test_legacy}/test_trainer/test_pipeline/test_pipeline_schedule.py (100%) rename tests/{ => test_legacy}/test_trainer/test_trainer_with_non_pipe_schedule.py (97%) rename tests/{ => test_legacy}/test_trainer/test_trainer_with_pipe_schedule.py (98%) delete mode 100644 tests/test_ops/test_addmm_tp.py delete mode 100644 tests/test_ops/test_embedding_bag_tp.py delete mode 100644 tests/test_ops/test_embedding_tp.py delete mode 100644 tests/test_ops/test_linear_tp.py delete mode 100644 tests/test_ops/test_loss_func.py delete mode 100644 tests/test_ops/test_op.py delete mode 100644 tests/test_ops/test_view.py delete mode 100644 tests/test_pipeline/test_cuda_rpc_performance.py create mode 100644 tests/test_pipeline/test_p2p_communication.py create mode 100644 tests/test_pipeline/test_pipeline_utils/test_t5_pipeline_utils.py create mode 100644 tests/test_pipeline/test_pipeline_utils/test_whisper_pipeline_utils.py create mode 100644 tests/test_pipeline/test_schedule/test_interleaved.py create mode 100644 tests/test_pipeline/test_schedule/test_oneF_oneB.py create mode 100644 tests/test_pipeline/test_schedule/test_pipeline_schedule_utils.py create mode 100644 tests/test_pipeline/test_stage_manager.py create mode 100644 tests/test_shardformer/test_layer/test_gpt2_qkv_fused_linear_1d.py create mode 100644 tests/test_shardformer/test_model/test_shard_blip2.py create mode 100644 tests/test_shardformer/test_model/test_shard_chatglm2.py create mode 100644 tests/test_shardformer/test_model/test_shard_sam.py create mode 100644 tests/test_shardformer/test_model/test_shard_whisper.py create mode 100644 tests/test_shardformer/test_shard_utils.py delete mode 100644 tests/test_tensor/core/test_tensor.py delete mode 100644 tests/test_tensor/model/test_gpt2.py delete mode 100644 tests/test_tensor/model/test_model.py delete mode 100644 tests/test_tensor/model/test_module_spec.py delete mode 100644 tests/test_tensor/test_colo_checkpoint_tools.py delete mode 100644 tests/test_tensor/test_context.py delete mode 100644 tests/test_tensor/test_sharded_linear.py delete mode 100644 tests/test_tensor/test_tp_with_zero.py delete mode 100644 tests/test_utils/test_colo_checkpoint.py delete mode 100644 tests/test_zero/test_gemini/test_get_torch_model.py delete mode 100644 tests/test_zero/test_gemini/test_zeroddp_state_dict_shard.py delete mode 100644 tests/test_zero/test_low_level/test_zero_init.py diff --git a/.github/workflows/build_on_pr.yml b/.github/workflows/build_on_pr.yml index 8a1bc8e113de..291d6adac2b2 100644 --- a/.github/workflows/build_on_pr.yml +++ b/.github/workflows/build_on_pr.yml @@ -61,8 +61,8 @@ jobs: run: shell: bash concurrency: - group: ${{ 
github.head_ref }} - cancel-in-progress: false + group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-repare-cache + cancel-in-progress: true steps: - name: Copy testmon cache run: | # branch name may contain slash, we need to replace it with space @@ -87,8 +87,8 @@ jobs: anyLibraryFileChanged: ${{ steps.find-lib-change.outputs.any_changed }} runs-on: ubuntu-latest concurrency: - group: ${{ github.head_ref }} - cancel-in-progress: false + group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-detect-change + cancel-in-progress: true steps: - uses: actions/checkout@v2 with: @@ -147,8 +147,8 @@ jobs: run: shell: bash concurrency: - group: ${{ github.head_ref }} - cancel-in-progress: false + group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-run-test + cancel-in-progress: true steps: - name: Checkout TensorNVMe uses: actions/checkout@v2 @@ -208,7 +208,7 @@ jobs: - name: Execute Unit Testing run: | - CURL_CA_BUNDLE="" PYTHONPATH=$PWD pytest --testmon --testmon-cov=. --durations=10 tests/ + CURL_CA_BUNDLE="" PYTHONPATH=$PWD pytest -m "not largedist" --testmon --testmon-forceselect --testmon-cov=. --durations=10 tests/ env: DATA: /data/scratch/cifar-10 NCCL_SHM_DISABLE: 1 diff --git a/.github/workflows/compatiblity_test_on_dispatch.yml b/.github/workflows/compatiblity_test_on_dispatch.yml index 1778d64ee287..2f03c8ced98d 100644 --- a/.github/workflows/compatiblity_test_on_dispatch.yml +++ b/.github/workflows/compatiblity_test_on_dispatch.yml @@ -44,7 +44,7 @@ jobs: name: Test for PyTorch Compatibility needs: matrix_preparation if: github.repository == 'hpcaitech/ColossalAI' - runs-on: [self-hosted, gpu] + runs-on: [self-hosted, 8-gpu] strategy: fail-fast: false matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}} @@ -64,7 +64,7 @@ jobs: - name: Install tensornvme run: | cd TensorNVMe - conda install cmake + apt update && apt install -y cmake pip install -r requirements.txt pip install -v . - uses: actions/checkout@v2 @@ -83,8 +83,7 @@ jobs: fi - name: Install Colossal-AI run: | - pip install -r requirements/requirements.txt - pip install -v --no-cache-dir . + CUDA_EXT=1 pip install -v . 
diff --git a/.github/workflows/compatiblity_test_on_pr.yml b/.github/workflows/compatiblity_test_on_pr.yml
index c0f45c65a7fc..a621c7e3427d 100644
--- a/.github/workflows/compatiblity_test_on_pr.yml
+++ b/.github/workflows/compatiblity_test_on_pr.yml
@@ -13,8 +13,8 @@ jobs:
     outputs:
       matrix: ${{ steps.set-matrix.outputs.matrix }}
     concurrency:
-      group: ${{ github.head_ref }}
-      cancel-in-progress: false
+      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-prepare-matrix
+      cancel-in-progress: true
     steps:
      - uses: actions/checkout@v3
      - id: set-matrix
@@ -35,7 +35,7 @@
     name: Test for PyTorch Compatibility
     needs: matrix_preparation
     if: github.repository == 'hpcaitech/ColossalAI'
-    runs-on: [self-hosted, gpu]
+    runs-on: [self-hosted, 8-gpu]
     strategy:
       fail-fast: false
       matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
@@ -44,8 +44,8 @@
       options: --gpus all --rm -v /data/scratch/cifar-10:/data/scratch/cifar-10
     timeout-minutes: 120
     concurrency:
-      group: ${{ github.head_ref }}
-      cancel-in-progress: false
+      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-run-test-${{ matrix.container }}
+      cancel-in-progress: true
     steps:
       - name: Install dependencies
         run: |
@@ -58,7 +58,7 @@
       - name: Install tensornvme
         run: |
           cd TensorNVMe
-          conda install cmake
+          apt update && apt install -y cmake
           pip install -r requirements.txt
           pip install -v .
       - uses: actions/checkout@v2
@@ -78,7 +78,7 @@
       - name: Install Colossal-AI
         run: |
-          pip install -v --no-cache-dir .
+          CUDA_EXT=1 pip install -v .
           pip install -r requirements/requirements-test.txt
       - name: Unit Testing
         run: |
diff --git a/.github/workflows/compatiblity_test_on_schedule.yml b/.github/workflows/compatiblity_test_on_schedule.yml
index 15ac4f1a92bb..9933224f5675 100644
--- a/.github/workflows/compatiblity_test_on_schedule.yml
+++ b/.github/workflows/compatiblity_test_on_schedule.yml
@@ -32,7 +32,7 @@
     name: Test for PyTorch Compatibility
     needs: matrix_preparation
     if: github.repository == 'hpcaitech/ColossalAI'
-    runs-on: [self-hosted, gpu]
+    runs-on: [self-hosted, 8-gpu]
     strategy:
       fail-fast: false
       matrix: ${{fromJson(needs.matrix_preparation.outputs.matrix)}}
@@ -54,7 +54,7 @@
       - name: Install tensornvme
         run: |
           cd TensorNVMe
-          conda install cmake
+          apt update && apt install -y cmake
           pip install -r requirements.txt
           pip install -v .
       - uses: actions/checkout@v2
@@ -75,7 +75,7 @@
       - name: Install Colossal-AI
         run: |
-          pip install -v --no-cache-dir .
+          CUDA_EXT=1 pip install -v .
           pip install -r requirements/requirements-test.txt
       - name: Unit Testing
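The three compatibility workflows make the same pair of changes: cmake now comes from apt rather than conda (the test containers run as root, so no sudo is required), and Colossal-AI is installed with CUDA_EXT=1 so its CUDA kernels are compiled once at install time rather than being built on first use. A sketch of the resulting install sequence follows; the container image tag and step names are assumptions for illustration, not taken from this patch:

# Sketch of the revised install sequence (image tag and step names assumed).
jobs:
  compatibility_test:
    runs-on: [self-hosted, 8-gpu]
    container:
      image: hpcaitech/pytorch-cuda:example-tag   # hypothetical tag
      options: --gpus all --rm
    steps:
      - uses: actions/checkout@v2
      - name: Install build tools
        run: |
          # the container runs as root, so apt needs no sudo
          apt update && apt install -y cmake
      - name: Install Colossal-AI and test requirements
        run: |
          # CUDA_EXT=1 builds the CUDA extensions during installation
          # instead of deferring compilation to first use
          CUDA_EXT=1 pip install -v .
          pip install -r requirements/requirements-test.txt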
diff --git a/.github/workflows/doc_check_on_pr.yml b/.github/workflows/doc_check_on_pr.yml
index 848991bd3a82..ee8a82128dd7 100644
--- a/.github/workflows/doc_check_on_pr.yml
+++ b/.github/workflows/doc_check_on_pr.yml
@@ -17,8 +17,8 @@
       github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI'
     runs-on: ubuntu-latest
     concurrency:
-      group: ${{ github.head_ref }}
-      cancel-in-progress: false
+      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-check-i18n
+      cancel-in-progress: true
     steps:
       - uses: actions/checkout@v2
@@ -35,8 +35,8 @@
       github.event.pull_request.base.repo.full_name == 'hpcaitech/ColossalAI'
     runs-on: ubuntu-latest
     concurrency:
-      group: ${{ github.head_ref }}
-      cancel-in-progress: false
+      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-check-doc
+      cancel-in-progress: true
     steps:
       - uses: actions/checkout@v2
         with:
diff --git a/.github/workflows/doc_test_on_pr.yml b/.github/workflows/doc_test_on_pr.yml
index 2a07a2297bfb..a3df2c50e6d3 100644
--- a/.github/workflows/doc_test_on_pr.yml
+++ b/.github/workflows/doc_test_on_pr.yml
@@ -20,8 +20,8 @@
       any_changed: ${{ steps.changed-files.outputs.any_changed }}
       changed_files: ${{ steps.changed-files.outputs.all_changed_files }}
     concurrency:
-      group: ${{ github.head_ref }}
-      cancel-in-progress: false
+      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-detect-change
+      cancel-in-progress: true
     name: Detect changed example files
     steps:
       - uses: actions/checkout@v3
@@ -63,8 +63,8 @@
       run:
         shell: bash
     concurrency:
-      group: ${{ github.head_ref }}
-      cancel-in-progress: false
+      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-run-doctest
+      cancel-in-progress: true
     steps:
       - name: Checkout ColossalAI-Documentation
         uses: actions/checkout@v2
diff --git a/.github/workflows/example_check_on_pr.yml b/.github/workflows/example_check_on_pr.yml
index ee456c25f2b5..ec23b9d1c59f 100644
--- a/.github/workflows/example_check_on_pr.yml
+++ b/.github/workflows/example_check_on_pr.yml
@@ -21,8 +21,8 @@
       anyChanged: ${{ steps.setup-matrix.outputs.anyChanged }}
     name: Detect changed example files
     concurrency:
-      group: ${{ github.head_ref }}
-      cancel-in-progress: false
+      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-detect-change
+      cancel-in-progress: true
     steps:
       - uses: actions/checkout@v3
         with:
@@ -81,8 +81,8 @@
       options: --gpus all --rm -v /data/scratch/examples-data:/data/
     timeout-minutes: 10
     concurrency:
-      group: ${{ github.head_ref }}
-      cancel-in-progress: false
+      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-run-example-${{ matrix.directory }}
+      cancel-in-progress: true
     steps:
       - uses: actions/checkout@v3
diff --git a/.github/workflows/run_chatgpt_examples.yml b/.github/workflows/run_chatgpt_examples.yml
index 650689498fda..a336526897e2 100644
--- a/.github/workflows/run_chatgpt_examples.yml
+++ b/.github/workflows/run_chatgpt_examples.yml
@@ -28,9 +28,8 @@ jobs:
       - name: Checkout ColossalAI
         uses: actions/checkout@v2

-      - name: Install ColossalAI and ChatGPT
+      - name: Install ChatGPT
         run: |
-          pip install -e .
           cd applications/Chat
           pip install -v .
           pip install -r examples/requirements.txt
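Two of the jobs above fan out over a matrix (per-container compatibility tests and per-directory example checks), so their concurrency group also interpolates the matrix value; without that extra suffix, every matrix entry would fall into the same group and parallel entries would cancel one another. A sketch with placeholder matrix values, not directories from this repository:

# Sketch of a matrix-aware concurrency group (matrix entries are placeholders).
jobs:
  run_example:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        directory: [example_a, example_b]   # hypothetical entries
    concurrency:
      # Interpolating matrix.directory gives each matrix entry its own group;
      # a shared group would let parallel entries cancel one another.
      group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-run-example-${{ matrix.directory }}
      cancel-in-progress: true
    steps:
      - uses: actions/checkout@v3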
diff --git a/.github/workflows/run_chatgpt_unit_tests.yml b/.github/workflows/run_chatgpt_unit_tests.yml
index 47c80fc9a9fe..ec5c8ffa319f 100644
--- a/.github/workflows/run_chatgpt_unit_tests.yml
+++ b/.github/workflows/run_chatgpt_unit_tests.yml
@@ -30,9 +30,8 @@ jobs:
       - name: Checkout ColossalAI
         uses: actions/checkout@v2

-      - name: Install ColossalAI and ChatGPT
+      - name: Install ChatGPT
         run: |
-          pip install -e .
           cd applications/Chat
           pip install -v .
           pip install -r requirements-test.txt
diff --git a/LICENSE b/LICENSE
index 0db47bd8986f..280129eb8f35 100644
--- a/LICENSE
+++ b/LICENSE
@@ -397,6 +397,39 @@ Copyright 2021- HPC-AI Technology Inc. All rights reserved.
    ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
    POSSIBILITY OF SUCH DAMAGE.
+
+   ---------------- LICENSE FOR VLLM TEAM ----------------
+
+   from VLLM TEAM:
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       https://github.com/vllm-project/vllm/blob/main/LICENSE
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+
+   ---------------- LICENSE FOR LIGHTLLM TEAM ----------------
+
+   from LIGHTLLM TEAM:
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       https://github.com/ModelTC/lightllm/blob/main/LICENSE
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+
    ---------------- LICENSE FOR AutoGPTQ ----------------

    From AutoGPTQ:
diff --git a/README.md b/README.md
index 44e4f97f1f4e..0ddcdab741a4 100644
--- a/README.md
+++ b/README.md
@@ -25,6 +25,7 @@
 ## Latest News
+* [2023/09] [70 Billion Parameter LLaMA2 Model Training Accelerated by 195%](https://www.hpc-ai.tech/blog/70b-llama2-training)
 * [2023/07] [HPC-AI Tech Raises 22 Million USD in Series A Funding](https://www.hpc-ai.tech/blog/hpc-ai-tech-raises-22-million-usd-in-series-a-funding-to-fuel-team-expansion-and-business-growth)
 * [2023/07] [65B Model Pretraining Accelerated by 38%, Best Practices for Building LLaMA-Like Base Models Open-Source](https://www.hpc-ai.tech/blog/large-model-pretraining)
 * [2023/03] [ColossalChat: An Open-Source Solution for Cloning ChatGPT With a Complete RLHF Pipeline](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b)
@@ -50,7 +51,7 @@
  • Parallel Training Demo