
Build flash-attn takes a lot of time #1038

Open · Sayli2000 opened this issue Jul 10, 2024 · 2 comments

@Sayli2000

I'm trying to install the flash-attn package, but the build takes an extremely long time.
I've made sure that ninja is installed.
[two screenshots of the stalled pip build output]
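
(A minimal sanity check, assuming PyTorch's standard extension builder: torch.utils.cpp_extension falls back to a much slower non-ninja compile path when ninja isn't visible to it, so it's worth confirming detection rather than just installation.)

# Check that ninja is on PATH and that torch's extension builder detects it
ninja --version
python -c "from torch.utils.cpp_extension import is_ninja_available; print(is_ninja_available())"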

@Ph0rk0z

Ph0rk0z commented Jul 10, 2024

If I ever get out of GH jail: #1025 (comment)

@tridao
Contributor

tridao commented Jul 10, 2024

Yep, it takes a long time because of all the templating.
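
(For context, a sketch of the usual mitigations, assuming the MAX_JOBS knob documented in the project README and the prebuilt wheels published on the project's GitHub releases page; neither command is from this thread.)

# Raise the number of parallel compile jobs if RAM allows;
# each nvcc job can need several GB of memory:
MAX_JOBS=8 pip install flash-attn --no-build-isolation

# Or skip compiling entirely: install a prebuilt wheel matching your
# Python/torch/CUDA combination from
# https://github.com/Dao-AILab/flash-attention/releases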

@puneeshkhanna

Same here

MAX_JOBS=4 pip -v install flash-attn==2.6.3 --no-build-isolation

I used the verbose option; the build gets stuck in the C++ compilation indefinitely. I tried other versions, but the problem is the same.

copying flash_attn/ops/triton/__init__.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops/triton
copying flash_attn/ops/triton/cross_entropy.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops/triton
copying flash_attn/ops/triton/k_activations.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops/triton
copying flash_attn/ops/triton/layer_norm.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops/triton
copying flash_attn/ops/triton/linear.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops/triton
copying flash_attn/ops/triton/mlp.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops/triton
copying flash_attn/ops/triton/rotary.py -> build/lib.linux-x86_64-cpython-310/flash_attn/ops/triton
running build_ext
/lustre1/tier2/users/puneesh.khanna/miniconda3/envs/falcon-moe/lib/python3.10/site-packages/torch/utils/cpp_extension.py:418: UserWarning: The detected CUDA version (12.2) has a minor version mismatch with the version that was used to compile PyTorch (12.1). Most likely this shouldn't be a problem.
warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
/lustre1/tier2/users/puneesh.khanna/miniconda3/envs/falcon-moe/lib/python3.10/site-packages/torch/utils/cpp_extension.py:428: UserWarning: There are no g++ version bounds defined for CUDA version 12.2
warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
building 'flash_attn_2_cuda' extension
creating /tmp/pip-install-14eos5qz/flash-attn_021be3b5eaac41e793324f2128cf5d4c/build/temp.linux-x86_64-cpython-310
creating /tmp/pip-install-14eos5qz/flash-attn_021be3b5eaac41e793324f2128cf5d4c/build/temp.linux-x86_64-cpython-310/csrc
creating /tmp/pip-install-14eos5qz/flash-attn_021be3b5eaac41e793324f2128cf5d4c/build/temp.linux-x86_64-cpython-310/csrc/flash_attn
creating /tmp/pip-install-14eos5qz/flash-attn_021be3b5eaac41e793324f2128cf5d4c/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/src
Emitting ninja build file /tmp/pip-install-14eos5qz/flash-attn_021be3b5eaac41e793324f2128cf5d4c/build/temp.linux-x86_64-cpython-310/build.ninja...
Compiling objects...
Using envvar MAX_JOBS (4) as the number of workers...
[1/85] c++ -MMD -MF /tmp/pip-install-14eos5qz/flash-attn_021be3b5eaac41e793324f2128cf5d4c/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/flash_api.o.d -pthread -B /lustre1/tier2/users/puneesh.khanna/miniconda3/envs/venv/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /lustre1/tier2/users/puneesh.khanna/miniconda3/envs/venv/include -fPIC -O2 -isystem /lustre1/tier2/users/puneesh.khanna/miniconda3/envs/venv/include -fPIC -I/tmp/pip-install-14eos5qz/flash-attn_021be3b5eaac41e793324f2128cf5d4c/csrc/flash_attn -I/tmp/pip-install-14eos5qz/flash-attn_021be3b5eaac41e793324f2128cf5d4c/csrc/flash_attn/src -I/tmp/pip-install-14eos5qz/flash-attn_021be3b5eaac41e793324f2128cf5d4c/csrc/cutlass/include -I/lustre1/tier2/users/puneesh.khanna/miniconda3/envs/venv/lib/python3.10/site-packages/torch/include -I/lustre1/tier2/users/puneesh.khanna/miniconda3/envs/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/lustre1/tier2/users/puneesh.khanna/miniconda3/envs/venv/lib/python3.10/site-packages/torch/include/TH -I/lustre1/tier2/users/puneesh.khanna/miniconda3/envs/venv/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/lustre1/tier2/users/puneesh.khanna/miniconda3/envs/venv/include/python3.10 -c -c /tmp/pip-install-14eos5qz/flash-attn_021be3b5eaac41e793324f2128cf5d4c/csrc/flash_attn/flash_api.cpp -o /tmp/pip-install-14eos5qz/flash-attn_021be3b5eaac41e793324f2128cf5d4c/build/temp.linux-x86_64-cpython-310/csrc/flash_attn/flash_api.o -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
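
(The [1/85] prefix is ninja's progress counter: 85 translation units in total, and a single heavily templated .cu file can legitimately take many minutes. A rough way, not from this thread, to tell a slow build from a hung one, using the temp paths printed in the log above:)

# Count finished object files accumulating under the pip temp build dir:
watch -n 30 'find /tmp/pip-install-*/flash-attn_*/build -name "*.o" | wc -l'

# Confirm compiler processes are still consuming CPU:
top -b -n 1 | head -n 20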
