[feature] add gptq for inference #4754

Merged 14 commits on Sep 22, 2023

Commits on Sep 19, 2023

  1. [gptq] add gptq kernel (hpcaitech#4416)

    * add gptq
    
    * refactor code
    
    * fix tests
    
    * replace auto-gptq
    
    * rename inference/quant
    
    * refactor test
    
    * add auto-gptq as an option
    
    * reset requirements
    
    * change assert and check auto-gptq
    
    * add import warnings
    
    * change test flash attn version
    
    * remove example
    
    * change requirements of flash_attn
    
    * modify tests
    
    * [skip ci] change requirements-test
    Xu-Kai committed Sep 19, 2023
    08b928b
  2. [gptq] faster gptq cuda kernel (hpcaitech#4494)

    * [skip ci] add cuda kernels
    
    * add license
    
    * [skip ci] fix max_input_len
    
    * format files & change test size
    
    * [skip ci]
    Xu-Kai committed Sep 19, 2023
    5bd381d
  3. [gptq] add gptq tensor parallel (hpcaitech#4538)

    * add gptq tensor parallel
    
    * add gptq tp
    
    * delete print
    
    * add test gptq check
    
    * add test auto gptq check
    Xu-Kai committed Sep 19, 2023
    145ff94
  4. [gptq] combine gptq and kv cache manager (hpcaitech#4706)

    * combine gptq and kv cache manager
    
    * add init bits
    
    * delete useless code
    
    * add model path
    
    * delete useless print and update test
    
    * delete useless import
    
    * move option gptq to shard config
    Xu-Kai committed Sep 19, 2023
    aefe767
  5. 27b48b3
  6. update bloom policy

    Xu-Kai committed Sep 19, 2023
    d896733
  7. delete useless code

    Xu-Kai committed Sep 19, 2023
    aa8201f
  8. 8c30608
  9. c430416
  10. update import linear for tests

    Xu-Kai committed Sep 19, 2023
    6f2159f

Commits on Sep 20, 2023

  1. d4db1bf
  2. f085c54

Commits on Sep 21, 2023

  1. fix triton kernel

    Xu-Kai committed Sep 21, 2023
    ee16a32
  2. add triton import

    Xu-Kai committed Sep 21, 2023
    9d4d7ff