[gemini] gemini support tensor parallelism. #4942
Commits on Nov 9, 2023
dc0dc0b
dd59ca2  [inference] Add smoothquant for llama (hpcaitech#4904)
  * [inference] add int8 rotary embedding kernel for smoothquant (hpcaitech#4843)
  * [inference] add smoothquant llama attention (hpcaitech#4850)
  * add smoothquant llama attention
  * remove useless code
  * remove useless code
  * fix import error
  * rename file name
  * [inference] add silu linear fusion for smoothquant llama mlp (hpcaitech#4853)
  * add silu linear
  * update skip condition
  * catch smoothquant cuda lib exception
  * process exception for tests
  * [inference] add llama mlp for smoothquant (hpcaitech#4854)
  * add llama mlp for smoothquant
  * fix down out scale
  * remove duplicate lines
  * add llama mlp check
  * delete useless code
  * [inference] add smoothquant llama (hpcaitech#4861)
  * add smoothquant llama
  * fix attention accuracy
  * fix accuracy
  * add kv cache and save pretrained
  * refactor example
  * delete smooth
  * refactor code
  * [inference] add smooth function and delete useless code for smoothquant (hpcaitech#4895)
  * add smooth function and delete useless code
  * update datasets
  * remove duplicate import
  * delete useless file
  * refactor codes (hpcaitech#4902)
  * refactor code
  * add license
  * add torch-int and smoothquant license
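
For context, the core of SmoothQuant is migrating activation outliers into the weights, per input channel, before int8 quantization. A minimal sketch of that smoothing step, assuming the standard scale formula s_j = max|X_j|^alpha / max|W_j|^(1-alpha); the function name and default alpha are illustrative, not this repo's API:

```python
import torch

def smooth_linear(act_absmax: torch.Tensor, weight: torch.Tensor, alpha: float = 0.5):
    """Fold per-channel smoothing scales into a linear layer's weight.

    act_absmax: max |activation| per input channel, shape (in_features,)
    weight:     linear weight, shape (out_features, in_features)
    At runtime activations are divided by `scales`, so the product is
    unchanged: (X / s) @ (W * s).T == X @ W.T, but X / s is easier to
    quantize to int8 because its outliers have been flattened.
    """
    w_absmax = weight.abs().amax(dim=0)  # per-input-channel weight range
    scales = (act_absmax.pow(alpha) / w_absmax.pow(1 - alpha)).clamp(min=1e-5)
    return scales, weight * scales.unsqueeze(0)
```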
52707c6  Update flash_attention_patch.py
  To be compatible with a recent change in the Transformers library, where a new
  argument 'padding_mask' was added to the forward function of the attention
  layer (huggingface/transformers#25598).
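
A sketch of the compatibility pattern, assuming the usual monkey-patch setup; the function body and extra arguments are illustrative. The patched forward simply accepts the new keyword so newer Transformers releases can call it without a TypeError:

```python
from typing import Optional

import torch

def attention_forward(
    self,
    hidden_states: torch.Tensor,
    attention_mask: Optional[torch.Tensor] = None,
    padding_mask: Optional[torch.Tensor] = None,  # added upstream in huggingface/transformers#25598
    **kwargs,
):
    # Accepting (and ignoring) padding_mask keeps this patched forward callable
    # from both old and new Transformers versions; the flash-attention path
    # derives padding information from attention_mask instead.
    ...
```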
61ec9f7  [kernel] support pure fp16 for cpu adam and update gemini optim tests (hpcaitech#4921)
  * [kernel] support pure fp16 for cpu adam (hpcaitech#4896)
  * [kernel] fix cpu adam kernel for pure fp16 and update tests (hpcaitech#4919)
  * [kernel] fix cpu adam
  * [test] update gemini optim test
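
For background, "pure fp16" here means parameters, gradients, and both Adam moments live in half precision, with no fp32 master copy. A minimal single-tensor sketch of such a step (not the fused CPU kernel; staging the moment math in fp32 is an assumption about how rounding error is kept in check):

```python
import torch

@torch.no_grad()
def adam_step_fp16(p, g, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with every stored tensor in torch.float16.

    The moment updates are computed in fp32 locally, then stored back to
    fp16, limiting accumulated rounding error per step.
    """
    g32, m32, v32 = g.float(), m.float(), v.float()
    m32.mul_(beta1).add_(g32, alpha=1 - beta1)
    v32.mul_(beta2).addcmul_(g32, g32, value=1 - beta2)
    m_hat = m32 / (1 - beta1 ** step)   # bias correction
    v_hat = v32 / (1 - beta2 ** step)
    p.add_((-lr * m_hat / (v_hat.sqrt() + eps)).half())
    m.copy_(m32.half())
    v.copy_(v32.half())
```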
561553b  [format] applied code formatting on changed files in pull request 4908 (hpcaitech#4918)
  Co-authored-by: github-actions <[email protected]>
8d42002  [gemini] support gradient accumulation (hpcaitech#4869)
  * add test
  * fix no_sync bug in low level zero plugin
  * fix test
  * add argument for grad accum
  * add grad accum in backward hook for gemini
  * finish implementation, rewrite tests
  * fix test
  * skip stuck model in low level zero test
  * update doc
  * optimize communication & fix gradient checkpoint
  * modify doc
  * cleaning codes
  * update cpu adam fp16 case
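
As a refresher on the feature itself, gradient accumulation runs several micro-batch backward passes before a single optimizer step, so the effective batch size grows without extra activation memory. A generic PyTorch sketch (the Gemini plugin and backward-hook wiring are omitted; names are illustrative):

```python
def train_epoch(model, optimizer, criterion, dataloader, accum_steps=4):
    """Accumulate gradients over `accum_steps` micro-batches per update."""
    optimizer.zero_grad()
    for i, (x, y) in enumerate(dataloader):
        loss = criterion(model(x), y) / accum_steps  # scale so summed grads average out
        loss.backward()                              # grads accumulate in .grad
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```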
da55732  [hotfix] fix torch 2.0 compatibility (hpcaitech#4936)
  * [hotfix] fix launch
  * [test] fix test gemini optim
  * [shardformer] fix vit
775ea1b
0074178  [format] applied code formatting on changed files in pull request 4820 (hpcaitech#4886)
  Co-authored-by: github-actions <[email protected]>
907aa98
31fddbc  [Refactor] Integrated some lightllm kernels into token-attention (hpcaitech#4946)
  * add some req for inference
  * clean codes
  * add codes
  * add some lightllm deps
  * clean codes
  * hello
  * delete rms files
  * add some comments
  * add comments
  * add doc
  * add lightllm deps
  * add lightllm chatglm2 kernels
  * add lightllm chatglm2 kernels
  * replace rotary embedding with lightllm kernel
  * add some comments
  * add some comments
  * add some comments
  * add
  * replace fwd kernel att1
  * fix a arg
  * add
  * add
  * fix token attention
  * add some comments
  * clean codes
  * modify comments
  * fix readme
  * fix bug
  * fix bug
  Co-authored-by: cuiqing.li <[email protected]>
  Co-authored-by: CjhHa1 <[email protected]>
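
The rotary embedding these kernels accelerate rotates each query/key feature pair by a position-dependent angle. A plain-PyTorch reference sketch of the interleaved-pair variant (illustrative, not the lightllm Triton kernel):

```python
import torch

def apply_rotary(x: torch.Tensor, positions: torch.Tensor, theta: float = 10000.0):
    """x: (T, d) with d even; positions: (T,) token indices."""
    d = x.shape[-1]
    inv_freq = theta ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)  # (d/2,)
    ang = positions.float().unsqueeze(-1) * inv_freq                       # (T, d/2)
    cos, sin = ang.cos(), ang.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]       # each feature pair is one 2-D point
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin      # 2-D rotation by the pair's angle
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```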
8633a87  [test] merge old components to test to model zoo (hpcaitech#4945)
  * [test] add custom models in model zoo
  * [test] update legacy test
  * [test] update model zoo
  * [test] update gemini test
  * [test] remove components to test
9d543af  [inference] add reference and fix some bugs (hpcaitech#4937)
  * add reference and fix some bugs
  * update gptq init
  Co-authored-by: Xu Kai <[email protected]>
fe79560  [Inference] Add Bench Chatglm2 script (hpcaitech#4963)
  * add bench chatglm
  * fix bug and make utils
  Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
a610046  [Pipeline inference] Combine kvcache with pipeline inference (hpcaitech#4938)
  * merge kvcache with pipeline inference and refactor the code structure
  * support ppsize > 2
  * refactor pipeline code
  * do pre-commit
  * modify benchmark
  * fix benchmark
  * polish code
  * add docstring and update readme
  * refactor the code
  * fix some logic bug of ppinfer
  * polish readme
  * fix typo
  * skip infer test
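
For context, the KV cache being merged here stores each layer's past keys and values so every decode step attends over history without recomputing it. A minimal sketch of such a cache (shapes and names are illustrative):

```python
import torch

class KVCache:
    """Preallocated per-layer key/value cache for incremental decoding."""

    def __init__(self, batch, heads, max_len, head_dim, dtype=torch.float16):
        self.k = torch.zeros(batch, heads, max_len, head_dim, dtype=dtype)
        self.v = torch.zeros_like(self.k)
        self.len = 0

    def append(self, k_new, v_new):
        # k_new / v_new: (batch, heads, t_new, head_dim) for the newest token(s)
        t = k_new.shape[2]
        self.k[:, :, self.len:self.len + t] = k_new
        self.v[:, :, self.len:self.len + t] = v_new
        self.len += t
        # Return views over the valid prefix for this step's attention.
        return self.k[:, :, :self.len], self.v[:, :, :self.len]
```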
3b8137d
9fce43b  [Inference] Dynamic Batching Inference, online and offline (hpcaitech#4953)
  * [inference] Dynamic Batching for Single and Multiple GPUs (hpcaitech#4831)
  * finish batch manager
  * 1
  * first
  * fix
  * fix dynamic batching
  * llama infer
  * finish test
  * support different lengths generating
  * del prints
  * del prints
  * fix
  * fix bug
    Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
  * [inference] Async dynamic batching (hpcaitech#4894)
  * finish input and output logic
  * add generate
  * test forward
  * 1
  * [inference]Re push async dynamic batching (hpcaitech#4901)
  * adapt to ray server
  * finish async
  * finish test
  * del test
    Co-authored-by: yuehuayingxueluo <[email protected]>
  * Revert "[inference]Re push async dynamic batching (hpcaitech#4901)" (hpcaitech#4905)
    This reverts commit fbf3c09.
  * Revert "[inference] Async dynamic batching (hpcaitech#4894)"
    This reverts commit fced140.
  * Revert "[inference] Async dynamic batching (hpcaitech#4894)" (hpcaitech#4909)
    This reverts commit fced140.
  * Add Ray Distributed Environment Init Scripts
  * support DynamicBatchManager base function
  * revert _set_tokenizer version
  * add driver async generate
  * add async test
  * fix bugs in test_ray_dist.py
  * add get_tokenizer.py
  * fix code style
  * fix bugs about No module named 'pydantic' in ci test
  * fix bugs in ci test
  * fix bugs in ci test
  * fix bugs in ci test
  * [infer]Add Ray Distributed Environment Init Scripts (hpcaitech#4911)
  * Revert "[inference] Async dynamic batching (hpcaitech#4894)"
    This reverts commit fced140.
  * Add Ray Distributed Environment Init Scripts
  * support DynamicBatchManager base function
  * revert _set_tokenizer version
  * add driver async generate
  * add async test
  * fix bugs in test_ray_dist.py
  * add get_tokenizer.py
  * fix code style
  * fix bugs about No module named 'pydantic' in ci test
  * fix bugs in ci test
  * fix bugs in ci test
  * fix bugs in ci test
  * support dynamic batch for bloom model and is_running function
  * [Inference]Test for new Async engine (hpcaitech#4935)
  * infer engine
  * infer engine
  * test engine
  * test engine
  * new manager
  * change step
  * add
  * test
  * fix
  * fix
  * finish test
  * finish test
  * finish test
  * finish test
  * add license
    Co-authored-by: yuehuayingxueluo <[email protected]>
  * add assertion for config (hpcaitech#4947)
  * [Inference] Finish dynamic batching offline test (hpcaitech#4948)
  * test
  * fix test
  * fix quant
  * add default
  * fix
  * fix some bugs
  * fix some bugs
  * fix
  * fix bug
  * fix bugs
  * reset param
  Co-authored-by: yuehuayingxueluo <[email protected]>
  Co-authored-by: Cuiqing Li <[email protected]>
  Co-authored-by: CjhHa1 <cjh18671720497@outlook.com>
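
Conceptually, a dynamic batch manager of this kind keeps a waiting queue, admits requests into the running batch whenever there is room, and retires finished sequences after every decode step. A minimal sketch, with all class and method names (including the assumed `engine.decode_step`) chosen for illustration rather than taken from the engine:

```python
from collections import deque

class DynamicBatchManager:
    """Continuous batching: grow/shrink the running batch every decode step."""

    def __init__(self, engine, max_batch_size=32):
        self.engine = engine                  # assumed to expose decode_step(batch)
        self.max_batch_size = max_batch_size
        self.waiting = deque()
        self.running = []

    def add_request(self, req):
        self.waiting.append(req)

    def step(self):
        # Admit queued requests while there is room in the batch.
        while self.waiting and len(self.running) < self.max_batch_size:
            self.running.append(self.waiting.popleft())
        finished = self.engine.decode_step(self.running)  # one token per sequence
        # Retire sequences that hit EOS or their length limit.
        self.running = [r for r in self.running if r not in finished]
        return finished
```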
62eb99f  [Kernels] Updated Triton kernels into 2.1.0 and adding flash-decoding for llama token attention (hpcaitech#4965)
  * adding flash-decoding
  * clean
  * adding kernel
  * adding flash-decoding
  * add integration
  * add
  * adding kernel
  * adding kernel
  * adding triton 2.1.0 features for inference
  * update bloom triton kernel
  * remove useless vllm kernels
  * clean codes
  * fix
  * adding files
  * fix readme
  * update llama flash-decoding
  Co-authored-by: cuiqing.li <[email protected]>
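
Flash-decoding parallelizes single-query attention over a long KV cache by scoring each chunk independently and then merging the partial outputs with their log-sum-exp weights. A plain-PyTorch reference sketch of that reduction (illustrative, not the Triton kernel):

```python
import torch

def flash_decode(q, k, v, chunk=256):
    """q: (d,), k/v: (T, d). Returns softmax(k @ q / sqrt(d)) @ v, chunk by chunk."""
    d = q.shape[-1]
    outs, lses = [], []
    for s in range(0, k.shape[0], chunk):
        scores = (k[s:s + chunk] @ q) / d ** 0.5          # (t,) partial logits
        lses.append(torch.logsumexp(scores, dim=0))       # chunk's log-normalizer
        outs.append(torch.softmax(scores, dim=0) @ v[s:s + chunk])
    w = torch.softmax(torch.stack(lses), dim=0)           # each chunk's share of mass
    return (torch.stack(outs) * w.unsqueeze(-1)).sum(0)   # exact global softmax result
```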
fa1cbd3  fix ColossalEval (hpcaitech#4992)
  Co-authored-by: Xu Yuanchen <[email protected]>
3209431  [doc] Update doc for colossal-inference (hpcaitech#4989)
  * update doc
  * Update README.md
  Co-authored-by: cuiqing.li <[email protected]>
f0482f4  [hotfix] Fix the bug where process groups were not being properly released (hpcaitech#4940)
  * Fix the bug where process groups were not being properly released.
  * test
  * Revert "test"
    This reverts commit 479900c.
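
In PyTorch, "releasing" a process group means explicitly destroying it so its NCCL/Gloo communicators are freed rather than leaked. A minimal sketch of the pattern (the helper and its body are illustrative):

```python
import torch.distributed as dist

def make_and_release_subgroup(ranks):
    """Create a subgroup, use it, then free its communicator resources."""
    group = dist.new_group(ranks=ranks)    # every rank must call this collectively
    try:
        ...                                # collectives on `group` go here
    finally:
        dist.destroy_process_group(group)  # release NCCL/Gloo resources for the group
```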
cd8ad65
5266946
ab8468c  [Pipeline Inference] Merge pp with tp (hpcaitech#4993)
  * refactor pipeline into new CaiInferEngine
  * update llama modeling forward
  * merge tp with pp
  * update docstring
  * optimize test workflow and example
  * fix typo
  * add assert and todo
f9c1920  [release] update version (hpcaitech#4995)
  * [release] update version
  * [hotfix] fix ci
2043b9d  [gemini] gemini support tp
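
As background on what "Gemini supports TP" entails, tensor parallelism shards each linear layer across ranks, e.g. a column-parallel split where every rank computes its slice of the output features and the slices are gathered. A minimal forward-only sketch, with class and initialization details assumed for illustration (not Gemini's actual classes):

```python
import torch
import torch.distributed as dist
import torch.nn as nn

class ColumnParallelLinear(nn.Module):
    """Each rank owns out_features // world_size output columns of W."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.world_size = dist.get_world_size()
        assert out_features % self.world_size == 0
        self.weight = nn.Parameter(torch.empty(out_features // self.world_size, in_features))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, x):
        local = x @ self.weight.t()                   # (..., out / world_size)
        shards = [torch.empty_like(local) for _ in range(self.world_size)]
        dist.all_gather(shards, local.contiguous())   # collect every rank's slice
        return torch.cat(shards, dim=-1)              # full (..., out_features)
```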
da1915d
9fd9e69  update checkpointIO
a89f2fd  support fused layernorm
2406cb0  update fusedlayernorm
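
Fused LayerNorm support usually means preferring a single-kernel implementation such as APEX's FusedLayerNorm and falling back to the stock PyTorch op when it is unavailable. A hedged sketch of that selection pattern:

```python
import torch.nn as nn

def build_layernorm(hidden_size: int, eps: float = 1e-5) -> nn.Module:
    """Prefer APEX's fused kernel; fall back to the stock PyTorch op."""
    try:
        from apex.normalization import FusedLayerNorm
        return FusedLayerNorm(hidden_size, eps=eps)  # one fused CUDA kernel
    except ImportError:
        return nn.LayerNorm(hidden_size, eps=eps)    # portable fallback
```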
a0509a6
12cd780
0110902
86a5eca
6f13876
5f16e4f
adead50
ed825dc
37494c3
73da4ca
cf2bc63
6c85a9e
8dd4b41
3d8319e
66ffed5  modify tp gather method
c40c459
bc575a2