
Release PyPi package + Create GitHub workflow #9

Merged · 39 commits into main from release_package · Sep 1, 2023

Conversation

casper-hansen (Owner)

  • expand setup.py for the PyPI package (see the sketch after this list)
  • automatically build when a new tag is pushed, to help release the PyPI package:

    git tag v0.0.1
    git push origin v0.0.1

  • create an additional example
  • convert prints to logging
  • standardize asm calls in the CUDA kernel
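
For the first item, here is a minimal sketch of the kind of PyPI-oriented metadata setup.py typically grows for a release; the package name, version, description, and source list below are illustrative placeholders, not the exact values added in this PR:

# Illustrative sketch only: name, version, and sources are placeholders.
from setuptools import setup, find_packages
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="autoawq",                  # placeholder package name
    version="0.0.1",                 # would match the pushed tag above
    description="AWQ quantization kernels and inference engine",
    packages=find_packages(),
    ext_modules=[
        CUDAExtension(
            "awq_inference_engine",
            ["awq_cuda/pybind.cpp", "awq_cuda/quantization/gemm_cuda_gen.cu"],
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)

With a tag-triggered workflow, the pushed tag (e.g. v0.0.1) is typically what determines the version string that ends up on PyPI.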

casper-hansen (Owner, Author) commented Aug 28, 2023

It seems like the build fails silently on multiple kernels.

logs.txt

pos_encoding_kernels.cu

Command:

nvcc  \
-I/usr/share/miniconda/envs/build/lib/python3.8/site-packages/torch/include \
-I/usr/share/miniconda/envs/build/lib/python3.8/site-packages/torch/include/torch/csrc/api/include \
-I/usr/share/miniconda/envs/build/lib/python3.8/site-packages/torch/include/TH \
-I/usr/share/miniconda/envs/build/lib/python3.8/site-packages/torch/include/THC \
-I/usr/share/miniconda/envs/build/include \
-I/usr/share/miniconda/envs/build/include/python3.8 \
-c /home/runner/work/AutoAWQ/AutoAWQ/awq_cuda/position_embedding/pos_encoding_kernels.cu \
-o /home/runner/work/AutoAWQ/AutoAWQ/build/temp.linux-x86_64-cpython-38/awq_cuda/position_embedding/pos_encoding_kernels.o \
-D__CUDA_NO_HALF_OPERATORS__ \
-D__CUDA_NO_HALF_CONVERSIONS__ \
-D__CUDA_NO_BFLOAT16_CONVERSIONS__ \
-D__CUDA_NO_HALF2_OPERATORS__ \
--expt-relaxed-constexpr \
--compiler-options '-fPIC' \
-O3 \
-std=c++17 \
--threads 2 \
-gencode arch=compute_80,code=sm_80 \
-gencode arch=compute_89,code=sm_89 \
-gencode arch=compute_90,code=sm_90 \
-gencode arch=compute_86,code=sm_86 \
-DTORCH_API_INCLUDE_EXTENSION_H \
-DPYBIND11_COMPILER_TYPE=\"_gcc\" \
-DPYBIND11_STDLIB=\"_libstdcpp\" \
-DPYBIND11_BUILD_ABI=\"_cxxabi1011\" \
-DTORCH_EXTENSION_NAME=awq_inference_engine \
-D_GLIBCXX_USE_CXX11_ABI=0

gemm_cuda_gen.cu

Command:

nvcc  \
-I/usr/share/miniconda/envs/build/lib/python3.9/site-packages/torch/include \
-I/usr/share/miniconda/envs/build/lib/python3.9/site-packages/torch/include/torch/csrc/api/include \
-I/usr/share/miniconda/envs/build/lib/python3.9/site-packages/torch/include/TH \
-I/usr/share/miniconda/envs/build/lib/python3.9/site-packages/torch/include/THC \
-I/usr/share/miniconda/envs/build/include \
-I/usr/share/miniconda/envs/build/include/python3.9 \
-c /home/runner/work/AutoAWQ/AutoAWQ/awq_cuda/quantization/gemm_cuda_gen.cu \
-o /home/runner/work/AutoAWQ/AutoAWQ/build/temp.linux-x86_64-cpython-39/awq_cuda/quantization/gemm_cuda_gen.o \
-D__CUDA_NO_HALF_OPERATORS__ \
-D__CUDA_NO_HALF_CONVERSIONS__ \
-D__CUDA_NO_BFLOAT16_CONVERSIONS__ \
-D__CUDA_NO_HALF2_OPERATORS__ \
--expt-relaxed-constexpr \
--compiler-options '-fPIC' \
-O3 \
-std=c++17 \
--threads 2 \
-gencode arch=compute_80,code=sm_80 \
-gencode arch=compute_89,code=sm_89 \
-gencode arch=compute_90,code=sm_90 \
-gencode arch=compute_86,code=sm_86 \
-DTORCH_API_INCLUDE_EXTENSION_H \
-DPYBIND11_COMPILER_TYPE=\"_gcc\" \
-DPYBIND11_STDLIB=\"_libstdcpp\" \
-DPYBIND11_BUILD_ABI=\"_cxxabi1011\" \
-DTORCH_EXTENSION_NAME=awq_inference_engine \
-D_GLIBCXX_USE_CXX11_ABI=0
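
To make a silent failure visible, one option (not taken from this PR's logs) is to replay a single nvcc invocation outside the setuptools build and print its exit code and stderr directly. A minimal Python sketch, with the argument list abbreviated rather than copied in full:

# Replay one failing nvcc invocation in isolation so its exit code and stderr
# are not swallowed by the parallel extension build.
import subprocess

cmd = [
    "nvcc",
    "-c", "awq_cuda/position_embedding/pos_encoding_kernels.cu",
    "-o", "/tmp/pos_encoding_kernels.o",
    # ... paste the remaining -I/-D/-gencode flags from the command above ...
]

result = subprocess.run(cmd, capture_output=True, text=True)
print("exit code:", result.returncode)
print(result.stderr)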

RunPod NVCC

This seems to work without problems.

nvcc \
-I/usr/local/lib/python3.10/dist-packages/torch/include \
-I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include \
-I/usr/local/lib/python3.10/dist-packages/torch/include/TH \
-I/usr/local/lib/python3.10/dist-packages/torch/include/THC \
-I/usr/local/cuda/include \
-I/usr/include/python3.10 \
-c awq_cuda/position_embedding/pos_encoding_kernels.cu \
-o build/temp.linux-x86_64-cpython-310/awq_cuda/position_embedding/pos_encoding_kernels.o \
-D__CUDA_NO_HALF_OPERATORS__ \
-D__CUDA_NO_HALF_CONVERSIONS__ \
-D__CUDA_NO_BFLOAT16_CONVERSIONS__ \
-D__CUDA_NO_HALF2_OPERATORS__ \
--expt-relaxed-constexpr \
--compiler-options '-fPIC' \
-O3 \
-std=c++17 \
--threads 2 \
-gencode arch=compute_80,code=sm_80 \
-gencode arch=compute_89,code=sm_89 \
-gencode arch=compute_90,code=sm_90 \
-gencode arch=compute_86,code=sm_86 \
-DTORCH_API_INCLUDE_EXTENSION_H \
-DPYBIND11_COMPILER_TYPE=\"_gcc\" \
-DPYBIND11_STDLIB=\"_libstdcpp\" \
-DPYBIND11_BUILD_ABI=\"_cxxabi1011\" \
-DTORCH_EXTENSION_NAME=awq_inference_engine \
-D_GLIBCXX_USE_CXX11_ABI=0 

casper-hansen (Owner, Author) commented Aug 29, 2023

Releasing the package is getting close; however, only Linux will be supported at this time because the Windows build fails. I tried the following code in setup.py, but it did not fix the build.

import os
from distutils.sysconfig import get_python_lib

from torch.utils import cpp_extension

if os.name == "nt":
    # On Windows, try to work around the link error
    # "LNK2001: unresolved external symbol cublasHgemm".
    cuda_path = os.environ.get("CUDA_PATH", None)
    if cuda_path is None:
        raise ValueError("The environment variable CUDA_PATH must be set to the path to the CUDA install when installing from source on Windows systems.")

    pytorch_dir = get_python_lib() + '/torch'
    extra_link_args = ["-L", f"{cuda_path}/lib/x64/cudart.lib",
                       "-L", f"{cuda_path}/lib/x64/cublas.lib",
                       "-L", f'{pytorch_dir}/lib/libtorch.lib']

    print('Windows: Detected CUDA path, adding extra link args')
    print(extra_link_args)
else:
    extra_link_args = []

extensions = [
    cpp_extension.CUDAExtension(
        "awq_inference_engine",
        [
            "awq_cuda/pybind.cpp",
            "awq_cuda/quantization/gemm_cuda_gen.cu",
            "awq_cuda/layernorm/layernorm.cu",
            "awq_cuda/position_embedding/pos_encoding_kernels.cu"
        ],
        extra_compile_args={
            "cxx": ["-g", "-O3", "-fopenmp", "-lgomp", "-std=c++17"],
            "nvcc": ["-O3", "-std=c++17"]
        },
        extra_link_args=extra_link_args
    )
]
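
For reference, a possible alternative to the raw -L pairs above (untested here, and not necessarily what ultimately fixed the Windows build in this PR) is to let setuptools pass the CUDA library directory and library names to the linker, since -L expects a directory rather than a .lib file and the linker resolves cublasHgemm from cublas.lib:

# Hypothetical sketch: name the CUDA libraries instead of passing .lib paths
# as -L arguments. Paths assume the standard CUDA toolkit layout on Windows.
import os
from torch.utils.cpp_extension import CUDAExtension

cuda_path = os.environ["CUDA_PATH"]

extension = CUDAExtension(
    "awq_inference_engine",
    [
        "awq_cuda/pybind.cpp",
        "awq_cuda/quantization/gemm_cuda_gen.cu",
        "awq_cuda/layernorm/layernorm.cu",
        "awq_cuda/position_embedding/pos_encoding_kernels.cu"
    ],
    library_dirs=[os.path.join(cuda_path, "lib", "x64")],
    libraries=["cublas", "cudart"],
)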

@casper-hansen casper-hansen marked this pull request as ready for review August 29, 2023 22:02
casper-hansen (Owner, Author) commented
Now that Linux and Windows support has been added, this is ready to be merged.

casper-hansen merged commit f0eba43 into main on Sep 1, 2023
casper-hansen deleted the release_package branch on September 2, 2023 22:26