AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam' #1846

Closed
maxmaier59 opened this issue Mar 18, 2022 · 64 comments · Fixed by #1879

Comments

@maxmaier59

When I try to do fine-tuning with DeepSpeed, I get the following error message:

Traceback (most recent call last):
File "/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 97, in del
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
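(For context: the AttributeError in __del__ is only a secondary symptom of the extension failing to load. A minimal sketch of the failure pattern, not the actual DeepSpeed source, just the shape of it:)

class FakeCPUAdam:
    def __init__(self):
        # In DeepSpeed this is roughly: self.ds_opt_adam = CPUAdamBuilder().load()
        # Here we simulate that load failing, as it does in this report.
        raise ImportError("cpu_adam_op...so: undefined symbol")

    def __del__(self):
        # __del__ still runs on the partially constructed object, so the
        # attribute access below is what raises the AttributeError.
        self.ds_opt_adam.destroy_adam(0)

try:
    FakeCPUAdam()
except ImportError as err:
    print("real cause:", err)
# Python then prints "Exception ignored in: <function FakeCPUAdam.__del__ ...>"
# followed by the AttributeError, just like the message above.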

I have built DeepSpeed with

git clone https://github.com/microsoft/DeepSpeed
cd DeepSpeed
DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 pip install -e . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check

It seems that ds_opt_adam was not built

This is the output I've got:

/media/max/Volume/GPT/finetune/DeepSpeed
Using pip 21.2.4 from /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/pip (python 3.8)
Obtaining file:///media/max/Volume/GPT/finetune/DeepSpeed
/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/pip/_internal/commands/install.py:229: UserWarning: Disabling all use of wheels due to the use of --build-option / --global-option / --install-option.
cmdoptions.check_install_build_global(options)
Running command python setup.py egg_info
DS_BUILD_OPS=0
Installed CUDA version 11.4 does not match the version torch was compiled with 11.5 but since the APIs are compatible, accepting this combination
Install Ops={'cpu_adam': 1, 'cpu_adagrad': False, 'fused_adam': False, 'fused_lamb': False, 'sparse_attn': False, 'transformer': False, 'stochastic_transformer': False, 'async_io': 1, 'utils': 1, 'quantizer': False, 'transformer_inference': False}
version=0.6.0+a32e9b33, git_hash=a32e9b33, git_branch=HEAD
install_requires=['hjson', 'ninja', 'numpy', 'packaging', 'psutil', 'py-cpuinfo', 'torch', 'tqdm', 'triton==1.0.0']
compatible_ops={'cpu_adam': True, 'cpu_adagrad': True, 'fused_adam': True, 'fused_lamb': True, 'sparse_attn': True, 'transformer': True, 'stochastic_transformer': True, 'async_io': True, 'utils': True, 'quantizer': True, 'transformer_inference': True}
ext_modules=[<setuptools.extension.Extension('deepspeed.ops.adam.cpu_adam_op') at 0x7f2b7bd0e820>, <setuptools.extension.Extension('deepspeed.ops.aio.async_io_op') at 0x7f2b7bbdd790>, <setuptools.extension.Extension('deepspeed.ops.utils_op') at 0x7f2b7bb5ff70>]
running egg_info
creating /tmp/pip-pip-egg-info-vqyrd9dj/deepspeed.egg-info
writing /tmp/pip-pip-egg-info-vqyrd9dj/deepspeed.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-pip-egg-info-vqyrd9dj/deepspeed.egg-info/dependency_links.txt
writing entry points to /tmp/pip-pip-egg-info-vqyrd9dj/deepspeed.egg-info/entry_points.txt
writing requirements to /tmp/pip-pip-egg-info-vqyrd9dj/deepspeed.egg-info/requires.txt
writing top-level names to /tmp/pip-pip-egg-info-vqyrd9dj/deepspeed.egg-info/top_level.txt
writing manifest file '/tmp/pip-pip-egg-info-vqyrd9dj/deepspeed.egg-info/SOURCES.txt'
reading manifest file '/tmp/pip-pip-egg-info-vqyrd9dj/deepspeed.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.hip' under directory 'deepspeed'
warning: no files found matching '*.cc' under directory 'deepspeed'
warning: no files found matching '*.tr' under directory 'csrc'
warning: no files found matching '*.cc' under directory 'csrc'
adding license file 'LICENSE'
writing manifest file '/tmp/pip-pip-egg-info-vqyrd9dj/deepspeed.egg-info/SOURCES.txt'
deepspeed build time = 0.36443185806274414 secs
Requirement already satisfied: hjson in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from deepspeed==0.6.0+a32e9b33) (3.0.2)
Requirement already satisfied: ninja in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from deepspeed==0.6.0+a32e9b33) (1.10.2.3)
Requirement already satisfied: numpy in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from deepspeed==0.6.0+a32e9b33) (1.22.3)
Requirement already satisfied: packaging in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from deepspeed==0.6.0+a32e9b33) (21.3)
Requirement already satisfied: psutil in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from deepspeed==0.6.0+a32e9b33) (5.9.0)
Requirement already satisfied: py-cpuinfo in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from deepspeed==0.6.0+a32e9b33) (8.0.0)
Requirement already satisfied: torch in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from deepspeed==0.6.0+a32e9b33) (1.11.0+cu115)
Requirement already satisfied: tqdm in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from deepspeed==0.6.0+a32e9b33) (4.63.0)
Requirement already satisfied: triton==1.0.0 in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from deepspeed==0.6.0+a32e9b33) (1.0.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from packaging->deepspeed==0.6.0+a32e9b33) (3.0.4)
Requirement already satisfied: typing-extensions in /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages (from torch->deepspeed==0.6.0+a32e9b33) (3.10.0.2)
Installing collected packages: deepspeed
Attempting uninstall: deepspeed
Found existing installation: deepspeed 0.5.9+d0ab7224
Uninstalling deepspeed-0.5.9+d0ab7224:
Removing file or directory /home/max/anaconda3/envs/gptneo_finetuned/bin/deepspeed
Removing file or directory /home/max/anaconda3/envs/gptneo_finetuned/bin/deepspeed.pt
Removing file or directory /home/max/anaconda3/envs/gptneo_finetuned/bin/ds
Removing file or directory /home/max/anaconda3/envs/gptneo_finetuned/bin/ds_elastic
Removing file or directory /home/max/anaconda3/envs/gptneo_finetuned/bin/ds_report
Removing file or directory /home/max/anaconda3/envs/gptneo_finetuned/bin/ds_ssh
Removing file or directory /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/deepspeed-0.5.9+d0ab7224-py3.8.egg-info
Removing file or directory /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/deepspeed/
Removing file or directory /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/op_builder/
Successfully uninstalled deepspeed-0.5.9+d0ab7224
Running setup.py develop for deepspeed
Running command /home/max/anaconda3/envs/gptneo_finetuned/bin/python -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/media/max/Volume/GPT/finetune/DeepSpeed/setup.py'"'"'; file='"'"'/media/max/Volume/GPT/finetune/DeepSpeed/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' build_ext -j8 develop --no-deps
DS_BUILD_OPS=0
Installed CUDA version 11.4 does not match the version torch was compiled with 11.5 but since the APIs are compatible, accepting this combination
Install Ops={'cpu_adam': 1, 'cpu_adagrad': False, 'fused_adam': False, 'fused_lamb': False, 'sparse_attn': False, 'transformer': False, 'stochastic_transformer': False, 'async_io': 1, 'utils': 1, 'quantizer': False, 'transformer_inference': False}
version=0.6.0+a32e9b33, git_hash=a32e9b33, git_branch=HEAD
install_requires=['hjson', 'ninja', 'numpy', 'packaging', 'psutil', 'py-cpuinfo', 'torch', 'tqdm', 'triton==1.0.0']
compatible_ops={'cpu_adam': True, 'cpu_adagrad': True, 'fused_adam': True, 'fused_lamb': True, 'sparse_attn': True, 'transformer': True, 'stochastic_transformer': True, 'async_io': True, 'utils': True, 'quantizer': True, 'transformer_inference': True}
ext_modules=[<setuptools.extension.Extension('deepspeed.ops.adam.cpu_adam_op') at 0x7f41e6e48f10>, <setuptools.extension.Extension('deepspeed.ops.aio.async_io_op') at 0x7f41e6214790>, <setuptools.extension.Extension('deepspeed.ops.utils_op') at 0x7f41e6193f40>]
running build_ext
building 'deepspeed.ops.adam.cpu_adam_op' extension
building 'deepspeed.ops.aio.async_io_op' extension
creating build
creating build/temp.linux-x86_64-3.8
building 'deepspeed.ops.utils_op' extension
creating build/temp.linux-x86_64-3.8/csrc
creating build/temp.linux-x86_64-3.8/csrc
creating build/temp.linux-x86_64-3.8/csrc/adam
creating build/temp.linux-x86_64-3.8/csrc/utils
creating build/temp.linux-x86_64-3.8/csrc/aio
gcc -pthread -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/utils/flatten_unflatten.cpp -o build/temp.linux-x86_64-3.8/csrc/utils/flatten_unflatten.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -DTORCH_EXTENSION_NAME=utils_op -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
creating build/temp.linux-x86_64-3.8/csrc/aio/py_lib
creating build/temp.linux-x86_64-3.8/csrc/common
gcc -pthread -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc/includes -I/home/max/anaconda3/envs/gptneo_finetuned/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/adam/cpu_adam.cpp -o build/temp.linux-x86_64-3.8/csrc/adam/cpu_adam.o -O3 -std=c++14 -g -Wno-reorder -L/home/max/anaconda3/envs/gptneo_finetuned/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -DTORCH_EXTENSION_NAME=cpu_adam_op -D_GLIBCXX_USE_CXX11_ABI=0
creating build/temp.linux-x86_64-3.8/csrc/aio/common
gcc -pthread -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc/aio/py_lib -Icsrc/aio/common -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/aio/py_lib/deepspeed_py_copy.cpp -o build/temp.linux-x86_64-3.8/csrc/aio/py_lib/deepspeed_py_copy.o -g -Wall -O0 -std=c++14 -shared -fPIC -Wno-reorder -march=native -fopenmp -D__AVX256 -laio -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=async_io_op -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++

In file included from csrc/includes/cpu_adam.h:12,
from csrc/adam/cpu_adam.cpp:1:
csrc/includes/simd.h:63: warning: ignoring #pragma unroll [-Wunknown-pragmas]
63 | #pragma unroll
|
csrc/includes/simd.h:71: warning: ignoring #pragma unroll [-Wunknown-pragmas]
71 | #pragma unroll
|
csrc/includes/simd.h:79: warning: ignoring #pragma unroll [-Wunknown-pragmas]
79 | #pragma unroll
|
csrc/includes/simd.h:87: warning: ignoring #pragma unroll [-Wunknown-pragmas]
87 | #pragma unroll
|
csrc/includes/simd.h:95: warning: ignoring #pragma unroll [-Wunknown-pragmas]
95 | #pragma unroll
|
csrc/includes/simd.h:103: warning: ignoring #pragma unroll [-Wunknown-pragmas]
103 | #pragma unroll
|
csrc/includes/simd.h:109: warning: ignoring #pragma unroll [-Wunknown-pragmas]
109 | #pragma unroll
|
csrc/includes/simd.h:115: warning: ignoring #pragma unroll [-Wunknown-pragmas]
115 | #pragma unroll
|
csrc/includes/simd.h:121: warning: ignoring #pragma unroll [-Wunknown-pragmas]
121 | #pragma unroll
|
csrc/includes/simd.h:127: warning: ignoring #pragma unroll [-Wunknown-pragmas]
127 | #pragma unroll
|
csrc/includes/simd.h:133: warning: ignoring #pragma unroll [-Wunknown-pragmas]
133 | #pragma unroll
|
gcc -pthread -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc/aio/py_lib -Icsrc/aio/common -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/aio/py_lib/py_ds_aio.cpp -o build/temp.linux-x86_64-3.8/csrc/aio/py_lib/py_ds_aio.o -g -Wall -O0 -std=c++14 -shared -fPIC -Wno-reorder -march=native -fopenmp -D__AVX256__ -laio -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -DTORCH_EXTENSION_NAME=async_io_op -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/deepspeed
creating build/lib.linux-x86_64-3.8/deepspeed/ops
g++ -pthread -shared -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -L/home/max/anaconda3/envs/gptneo_finetuned/lib -Wl,-rpath=/home/max/anaconda3/envs/gptneo_finetuned/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.8/csrc/utils/flatten_unflatten.o -L/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.8/deepspeed/ops/utils_op.cpython-38-x86_64-linux-gnu.so
csrc/adam/cpu_adam.cpp: In member function ‘void Adam_Optimizer::Step_1(float*, float*, float*, float*, size_t, half*, bool)’:
csrc/adam/cpu_adam.cpp:45:17: warning: ‘params_cast_h’ may be used uninitialized in this function [-Wmaybe-uninitialized]
45 | half* params_cast_h;
| ^~~~~~~~~~~~~
csrc/adam/cpu_adam.cpp:44:17: warning: ‘grads_cast_h’ may be used uninitialized in this function [-Wmaybe-uninitialized]
44 | half* grads_cast_h;
| ^~~~~~~~~~~~
/home/max/anaconda3/envs/gptneo_finetuned/bin/nvcc -Icsrc/includes -I/home/max/anaconda3/envs/gptneo_finetuned/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/common/custom_cuda_kernel.cu -o build/temp.linux-x86_64-3.8/csrc/common/custom_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -DTORCH_EXTENSION_NAME=cpu_adam_op -D_GLIBCXX_USE_CXX11_ABI=0
creating build/lib.linux-x86_64-3.8/deepspeed/ops/adam
g++ -pthread -shared -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -L/home/max/anaconda3/envs/gptneo_finetuned/lib -Wl,-rpath=/home/max/anaconda3/envs/gptneo_finetuned/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.8/csrc/adam/cpu_adam.o build/temp.linux-x86_64-3.8/csrc/common/custom_cuda_kernel.o -L/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/lib -L/home/max/anaconda3/envs/gptneo_finetuned/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda_cu -ltorch_cuda_cpp -o build/lib.linux-x86_64-3.8/deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so
gcc -pthread -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc/aio/py_lib -Icsrc/aio/common -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/aio/py_lib/deepspeed_py_aio.cpp -o build/temp.linux-x86_64-3.8/csrc/aio/py_lib/deepspeed_py_aio.o -g -Wall -O0 -std=c++14 -shared -fPIC -Wno-reorder -march=native -fopenmp -D__AVX256__ -laio -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -DTORCH_EXTENSION_NAME=async_io_op -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
gcc -pthread -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc/aio/py_lib -Icsrc/aio/common -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/aio/py_lib/deepspeed_py_aio_handle.cpp -o build/temp.linux-x86_64-3.8/csrc/aio/py_lib/deepspeed_py_aio_handle.o -g -Wall -O0 -std=c++14 -shared -fPIC -Wno-reorder -march=native -fopenmp -D__AVX256__ -laio -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -DTORCH_EXTENSION_NAME=async_io_op -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
gcc -pthread -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc/aio/py_lib -Icsrc/aio/common -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/aio/py_lib/deepspeed_aio_thread.cpp -o build/temp.linux-x86_64-3.8/csrc/aio/py_lib/deepspeed_aio_thread.o -g -Wall -O0 -std=c++14 -shared -fPIC -Wno-reorder -march=native -fopenmp -D__AVX256__ -laio -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=async_io_op -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++

gcc -pthread -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc/aio/py_lib -Icsrc/aio/common -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/aio/common/deepspeed_aio_utils.cpp -o build/temp.linux-x86_64-3.8/csrc/aio/common/deepspeed_aio_utils.o -g -Wall -O0 -std=c++14 -shared -fPIC -Wno-reorder -march=native -fopenmp -D__AVX256__ -laio -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -DTORCH_EXTENSION_NAME=async_io_op -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
gcc -pthread -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc/aio/py_lib -Icsrc/aio/common -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/aio/common/deepspeed_aio_common.cpp -o build/temp.linux-x86_64-3.8/csrc/aio/common/deepspeed_aio_common.o -g -Wall -O0 -std=c++14 -shared -fPIC -Wno-reorder -march=native -fopenmp -D__AVX256__ -laio -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=async_io_op -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
csrc/aio/common/deepspeed_aio_common.cpp: In function ‘void _do_io_submit_singles(long long int, long long int, std::unique_ptr<aio_context>&, std::vector<std::chrono::duration >&)’:
csrc/aio/common/deepspeed_aio_common.cpp:76:20: warning: unused variable ‘submit_ret’ [-Wunused-variable]
76 | const auto submit_ret = io_submit(aio_ctxt->_io_ctxt, 1, aio_ctxt->_iocbs.data() + i);
| ^~~~~~~~~~
csrc/aio/common/deepspeed_aio_common.cpp: In function ‘void _do_io_submit_block(long long int, long long int, std::unique_ptr<aio_context>&, std::vector<std::chrono::duration >&)’:
csrc/aio/common/deepspeed_aio_common.cpp:96:16: warning: unused variable ‘submit_ret’ [-Wunused-variable]
96 | const auto submit_ret = io_submit(aio_ctxt->_io_ctxt, n_iocbs, aio_ctxt->iocbs.data());
| ^~~~~~~~~~
csrc/aio/common/deepspeed_aio_common.cpp: In function ‘int regular_read(const char*, std::vector&)’:
csrc/aio/common/deepspeed_aio_common.cpp:280:16: warning: unused variable ‘f_size’ [-Wunused-variable]
280 | const auto f_size = get_file_size(filename, num_bytes);
| ^~~~~~
csrc/aio/common/deepspeed_aio_common.cpp: In function ‘bool validate_buffer(const char*, void*, long long int)’:
csrc/aio/common/deepspeed_aio_common.cpp:307:16: warning: unused variable ‘reg_ret’ [-Wunused-variable]
307 | const auto reg_ret = regular_read(filename, regular_buffer);
| ^~~~~~~
gcc -pthread -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Icsrc/aio/py_lib -Icsrc/aio/common -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/TH -I/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/include/THC -I/home/max/anaconda3/envs/gptneo_finetuned/include/python3.8 -c csrc/aio/common/deepspeed_aio_types.cpp -o build/temp.linux-x86_64-3.8/csrc/aio/common/deepspeed_aio_types.o -g -Wall -O0 -std=c++14 -shared -fPIC -Wno-reorder -march=native -fopenmp -D__AVX256__ -laio -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=async_io_op -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
creating build/lib.linux-x86_64-3.8/deepspeed/ops/aio
g++ -pthread -shared -B /home/max/anaconda3/envs/gptneo_finetuned/compiler_compat -L/home/max/anaconda3/envs/gptneo_finetuned/lib -Wl,-rpath=/home/max/anaconda3/envs/gptneo_finetuned/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.8/csrc/aio/py_lib/deepspeed_py_copy.o build/temp.linux-x86_64-3.8/csrc/aio/py_lib/py_ds_aio.o build/temp.linux-x86_64-3.8/csrc/aio/py_lib/deepspeed_py_aio.o build/temp.linux-x86_64-3.8/csrc/aio/py_lib/deepspeed_py_aio_handle.o build/temp.linux-x86_64-3.8/csrc/aio/py_lib/deepspeed_aio_thread.o build/temp.linux-x86_64-3.8/csrc/aio/common/deepspeed_aio_utils.o build/temp.linux-x86_64-3.8/csrc/aio/common/deepspeed_aio_common.o build/temp.linux-x86_64-3.8/csrc/aio/common/deepspeed_aio_types.o -L/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-3.8/deepspeed/ops/aio/async_io_op.cpython-38-x86_64-linux-gnu.so -laio
running develop
running egg_info
creating deepspeed.egg-info
writing deepspeed.egg-info/PKG-INFO
writing dependency_links to deepspeed.egg-info/dependency_links.txt
writing entry points to deepspeed.egg-info/entry_points.txt
writing requirements to deepspeed.egg-info/requires.txt
writing top-level names to deepspeed.egg-info/top_level.txt
writing manifest file 'deepspeed.egg-info/SOURCES.txt'
reading manifest file 'deepspeed.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/utils/cpp_extension.py:788: UserWarning: The detected CUDA version (11.4) has a minor version mismatch with the version that was used to compile PyTorch (11.5). Most likely this shouldn't be a problem.
warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
warning: no files found matching '*.hip' under directory 'deepspeed'
warning: no files found matching '*.cc' under directory 'deepspeed'
warning: no files found matching '*.tr' under directory 'csrc'
warning: no files found matching '*.cc' under directory 'csrc'
adding license file 'LICENSE'
writing manifest file 'deepspeed.egg-info/SOURCES.txt'
running build_ext
copying build/lib.linux-x86_64-3.8/deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so -> deepspeed/ops/adam
copying build/lib.linux-x86_64-3.8/deepspeed/ops/aio/async_io_op.cpython-38-x86_64-linux-gnu.so -> deepspeed/ops/aio
copying build/lib.linux-x86_64-3.8/deepspeed/ops/utils_op.cpython-38-x86_64-linux-gnu.so -> deepspeed/ops
Creating /home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/deepspeed.egg-link (link to .)
Adding deepspeed 0.6.0+a32e9b33 to easy-install.pth file
Installing deepspeed script to /home/max/anaconda3/envs/gptneo_finetuned/bin
Installing deepspeed.pt script to /home/max/anaconda3/envs/gptneo_finetuned/bin
Installing ds script to /home/max/anaconda3/envs/gptneo_finetuned/bin
Installing ds_ssh script to /home/max/anaconda3/envs/gptneo_finetuned/bin
Installing ds_report script to /home/max/anaconda3/envs/gptneo_finetuned/bin
Installing ds_elastic script to /home/max/anaconda3/envs/gptneo_finetuned/bin

Installed /media/max/Volume/GPT/finetune/DeepSpeed
/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/utils/cpp_extension.py:788: UserWarning: The detected CUDA version (11.4) has a minor version mismatch with the version that was used to compile PyTorch (11.5). Most likely this shouldn't be a problem.
warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
deepspeed build time = 90.15858387947083 secs

@jeffra
Collaborator

jeffra commented Mar 19, 2022

Can you share the output of ds_report after your install?

Also, I recently discovered a potential issue with this pre-compile style (see #1840). Can you see if you get the same error installing this way:

DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 pip install -e .

@maxmaier59
Author

Here is the output of ds_report


DeepSpeed C++/CUDA extension op report

NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.

JIT compiled ops requires ninja
ninja .................. [OKAY]

op name ................ installed .. compatible

cpu_adam ............... [YES] ...... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [YES] ...... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]

DeepSpeed general environment info:
torch install path ............... ['/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch']
torch version .................... 1.11.0+cu115
torch cuda version ............... 11.5
torch hip version ................ None
nvcc version ..................... 11.4
deepspeed install path ........... ['/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed']
deepspeed info ................... 0.6.0+a32e9b33, a32e9b3, HEAD
deepspeed wheel compiled w. ...... torch 1.11, cuda 11.5, hip 0.0

@maxmaier59
Author

Here is the command I've used for installation:

TORCH_CUDA_ARCH_LIST="8.6" DS_BUILD_CPU_ADAM=1 DS_BUILD_OP_ADAM=1 DS_BUILD_UTILS=1 DS_BUILD_AIO=1 pip install -e.
#--global-option="build_ext" --global-option="-j8" --no-cache -v
#--disable-pip-version-check 2>&1 | tee build.log

using
DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 pip install -e .

makes no difference

@maxmaier59
Author

I think the root cause of the problem is this:

ImportError: /media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so: undefined symbol: curandCreateGenerator
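One way to confirm that the missing symbol is the real problem is the rough diagnostic sketch below (the .so path is taken from the error above; the libcurand soname may differ on your system):

import ctypes
import torch  # brings the torch/c10 libraries into the process first

so_path = "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so"
try:
    ctypes.CDLL(so_path)
except OSError as err:
    print(err)  # expected to repeat "undefined symbol: curandCreateGenerator"

# If pre-loading libcurand globally lets the extension load, the pre-built .so
# was simply not linked against libcurand (soname may be e.g. libcurand.so.10).
ctypes.CDLL("libcurand.so", mode=ctypes.RTLD_GLOBAL)
ctypes.CDLL(so_path)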

@jeffra
Collaborator

jeffra commented Mar 20, 2022

Ohh I see your comments on this issue now as well (pytorch/pytorch#69666). If you try a recent torch nightly build does it still exhibit the issue?

@maxmaier59
Author

Hmm, I've tried with the torch nightly build, but I am still getting the same error message.


DeepSpeed C++/CUDA extension op report

NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.

JIT compiled ops requires ninja
ninja .................. [OKAY]

op name ................ installed .. compatible

cpu_adam ............... [YES] ...... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [YES] ...... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]

DeepSpeed general environment info:
torch install path ............... ['/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch']
torch version .................... 1.12.0.dev20220320+cu115
torch cuda version ............... 11.5
torch hip version ................ None
nvcc version ..................... 11.4
deepspeed install path ........... ['/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed']
deepspeed info ................... 0.6.0+a32e9b33, a32e9b3, HEAD
deepspeed wheel compiled w. ...... torch 1.12, cuda 11.5, hip 0.0

Please let me know if you need any additional information

@maxmaier59
Author

I wonder if there is any hope of getting this fixed, or of finding a workaround.
Is it just me having this problem, or does it affect other people who want to use DeepSpeed with the Adam optimizer as well?

@tjruwase
Contributor

@maxmaier59, can you clarify whether your intention is to use the DeepSpeed CPUAdam or the torch Adam optimizer?

@maxmaier59
Author

I am not sure what the difference is.
Without a better understanding, I would like to use the DeepSpeed CPUAdam optimizer.

@tjruwase
Contributor

@maxmaier59, CPUAdam was created for executing optimizer computations on CPU instead of GPU. Please see this tutorial for more details.
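For reference, a minimal standalone usage sketch (assuming the cpu_adam op can actually be built on the machine):

import torch
from deepspeed.ops.adam import DeepSpeedCPUAdam

model = torch.nn.Linear(8, 2)                              # parameters live on CPU
optimizer = DeepSpeedCPUAdam(model.parameters(), lr=5e-6)  # Adam math runs in the C++ CPU op
loss = model(torch.randn(4, 8)).sum()
loss.backward()
optimizer.step()

This is roughly what ZeRO-Offload does under the hood when the optimizer is offloaded to CPU, as in the config posted later in this thread.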

@maxmaier59
Author

In this case I need CPUAdam

@maxmaier59
Author

Please, can somebody help me solve this problem?

I wonder what is going on. It seems to me that either the CPUAdam optimizer for DeepSpeed has been abandoned or I am doing something wrong. If the latter is the case, can somebody please help me find my error so I can fix the problem?

If the former is the case, I wonder why the optimizer was dropped. Is there any alternative?

@tjruwase
Contributor

@maxmaier59, apologies for the delayed response. CPUAdam is still very much an important part of DeepSpeed as our offloading technologies depend on it. I am a bit confused about whether the original issue was observed during build or during an actual run. The issue mentions an attribute error which suggests this occurred during a run, so in that case can you please repaste or point me to the stack trace? Sorry for asking you to provide this again.

@maxmaier59
Author

Many thanks for getting back to me!
The error occurs during an actual run.
Here is the command to start deepspeed:

deepspeed --num_gpus=2 run_clm.py \
--deepspeed ds_config.json \
--model_name_or_path EleutherAI/gpt-neo-2.7B \
--train_file train.csv \
--validation_file validation.csv \
--do_train \
--do_eval \
--fp16 \
--overwrite_cache \
--evaluation_strategy="steps" \
--output_dir finetuned \
--num_train_epochs 1 \
--eval_steps 15 \
--gradient_accumulation_steps 2 \
--per_device_train_batch_size 4 \
--use_fast_tokenizer False \
--learning_rate 5e-06 \
--warmup_steps 10

And here is the output:

[2022-03-24 22:33:28,352] [WARNING] [runner.py:155:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2022-03-24 22:33:28,382] [INFO] [runner.py:438:main] cmd = /home/max/anaconda3/envs/gptneo_finetuned/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 run_clm.py --deepspeed ds_config.json --model_name_or_path EleutherAI/gpt-neo-2.7B --train_file train.csv --validation_file validation.csv --do_train --do_eval --fp16 --overwrite_cache --evaluation_strategy=steps --output_dir finetuned --num_train_epochs 1 --eval_steps 15 --gradient_accumulation_steps 2 --per_device_train_batch_size 4 --use_fast_tokenizer False --learning_rate 5e-06 --warmup_steps 10
[2022-03-24 22:33:29,110] [INFO] [launch.py:103:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2022-03-24 22:33:29,110] [INFO] [launch.py:109:main] nnodes=1, num_local_procs=2, node_rank=0
[2022-03-24 22:33:29,111] [INFO] [launch.py:122:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2022-03-24 22:33:29,111] [INFO] [launch.py:123:main] dist_world_size=2
[2022-03-24 22:33:29,111] [INFO] [launch.py:125:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2022-03-24 22:33:30,474] [INFO] [distributed.py:46:init_distributed] Initializing torch distributed with backend: nccl
03/24/2022 22:33:30 - WARNING - main - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True
03/24/2022 22:33:30 - INFO - main - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
bf16=False,
bf16_full_eval=False,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=ds_config.json,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_steps=15,
evaluation_strategy=IntervalStrategy.STEPS,
fp16=True,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=2,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-06,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=finetuned/runs/Mar24_22-33-30_max-Desktop,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=IntervalStrategy.STEPS,
lr_scheduler_type=SchedulerType.LINEAR,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mp_parameters=,
no_cuda=False,
num_train_epochs=1.0,
optim=OptimizerNames.ADAMW_HF,
output_dir=finetuned,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=4,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
remove_unused_columns=True,
report_to=[],
resume_from_checkpoint=None,
run_name=finetuned,
save_on_each_node=False,
save_steps=500,
save_strategy=IntervalStrategy.STEPS,
save_total_limit=None,
seed=42,
sharded_ddp=[],
skip_memory_metrics=True,
tf32=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_legacy_prediction_loop=False,
warmup_ratio=0.0,
warmup_steps=10,
weight_decay=0.0,
xpu_backend=None,
)
03/24/2022 22:33:30 - WARNING - main - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: True
03/24/2022 22:33:31 - WARNING - datasets.builder - Using custom data configuration default-e1878cb86e47ddff
03/24/2022 22:33:31 - WARNING - datasets.builder - Using custom data configuration default-e1878cb86e47ddff
03/24/2022 22:33:31 - WARNING - datasets.builder - Reusing dataset csv (/home/max/.cache/huggingface/datasets/csv/default-e1878cb86e47ddff/0.0.0/433e0ccc46f9880962cc2b12065189766fbb2bee57a221866138fb9203c83519)
100%|███████████████████████████████████████████| 2/2 [00:00<00:00, 1381.75it/s]
03/24/2022 22:33:31 - WARNING - datasets.builder - Reusing dataset csv (/home/max/.cache/huggingface/datasets/csv/default-e1878cb86e47ddff/0.0.0/433e0ccc46f9880962cc2b12065189766fbb2bee57a221866138fb9203c83519)
100%|████████████████████████████████████████████| 2/2 [00:00<00:00, 361.31it/s]
[INFO|configuration_utils.py:648] 2022-03-24 22:33:31,586 >> loading configuration file https://huggingface.co/EleutherAI/gpt-neo-2.7B/resolve/main/config.json from cache at /home/max/.cache/huggingface/transformers/3c80ef2946e1aacc6dd37cb986ea989c29c92775701655bedf14d8791825a30b.f1ede5af01beb85af6cba189a5671dbac3fe256282f737ff0fedf1db882ca729
[INFO|configuration_utils.py:684] 2022-03-24 22:33:31,589 >> Model config GPTNeoConfig {
"_name_or_path": "EleutherAI/gpt-neo-2.7B",
"activation_function": "gelu_new",
"architectures": [
"GPTNeoForCausalLM"
],
"attention_dropout": 0,
"attention_layers": [
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local"
],
"attention_types": [
[
[
"global",
"local"
],
16
]
],
"bos_token_id": 50256,
"embed_dropout": 0,
"eos_token_id": 50256,
"gradient_checkpointing": false,
"hidden_size": 2560,
"initializer_range": 0.02,
"intermediate_size": null,
"layer_norm_epsilon": 1e-05,
"max_position_embeddings": 2048,
"model_type": "gpt_neo",
"num_heads": 20,
"num_layers": 32,
"resid_dropout": 0,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 50,
"temperature": 0.9
}
},
"tokenizer_class": "GPT2Tokenizer",
"transformers_version": "4.17.0",
"use_cache": true,
"vocab_size": 50257,
"window_size": 256
}

[INFO|configuration_utils.py:648] 2022-03-24 22:33:32,544 >> loading configuration file https://huggingface.co/EleutherAI/gpt-neo-2.7B/resolve/main/config.json from cache at /home/max/.cache/huggingface/transformers/3c80ef2946e1aacc6dd37cb986ea989c29c92775701655bedf14d8791825a30b.f1ede5af01beb85af6cba189a5671dbac3fe256282f737ff0fedf1db882ca729
[INFO|configuration_utils.py:684] 2022-03-24 22:33:32,546 >> Model config GPTNeoConfig {
"_name_or_path": "EleutherAI/gpt-neo-2.7B",
"activation_function": "gelu_new",
"architectures": [
"GPTNeoForCausalLM"
],
"attention_dropout": 0,
"attention_layers": [
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local"
],
"attention_types": [
[
[
"global",
"local"
],
16
]
],
"bos_token_id": 50256,
"embed_dropout": 0,
"eos_token_id": 50256,
"gradient_checkpointing": false,
"hidden_size": 2560,
"initializer_range": 0.02,
"intermediate_size": null,
"layer_norm_epsilon": 1e-05,
"max_position_embeddings": 2048,
"model_type": "gpt_neo",
"num_heads": 20,
"num_layers": 32,
"resid_dropout": 0,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 50,
"temperature": 0.9
}
},
"tokenizer_class": "GPT2Tokenizer",
"transformers_version": "4.17.0",
"use_cache": true,
"vocab_size": 50257,
"window_size": 256
}

[INFO|tokenization_utils_base.py:1786] 2022-03-24 22:33:34,931 >> loading file https://huggingface.co/EleutherAI/gpt-neo-2.7B/resolve/main/vocab.json from cache at /home/max/.cache/huggingface/transformers/d4455fdc7c8e2bcf94a0bfe134b748a93c37ecadb7b8f6b0eb508ffdd433a61e.a1b97b074a5ac71fad0544c8abc1b3581803d73832476184bde6cff06a67b6bb
[INFO|tokenization_utils_base.py:1786] 2022-03-24 22:33:34,931 >> loading file https://huggingface.co/EleutherAI/gpt-neo-2.7B/resolve/main/merges.txt from cache at /home/max/.cache/huggingface/transformers/5660be25091706bde0cfb60f17ae72c7a2aa40223d68954d4d8ffd1fc6995643.f5b91da9e34259b8f4d88dbc97c740667a0e8430b96314460cdb04e86d4fc435
[INFO|tokenization_utils_base.py:1786] 2022-03-24 22:33:34,931 >> loading file https://huggingface.co/EleutherAI/gpt-neo-2.7B/resolve/main/added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:1786] 2022-03-24 22:33:34,931 >> loading file https://huggingface.co/EleutherAI/gpt-neo-2.7B/resolve/main/special_tokens_map.json from cache at /home/max/.cache/huggingface/transformers/953b5ce47652cf8b6e945b3570bfa7621164c337e05419b954dbe0a4d16a7480.3ae9ae72462581d20e36bc528e9c47bb30cd671bb21add40ca0b24a0be9fac22
[INFO|tokenization_utils_base.py:1786] 2022-03-24 22:33:34,931 >> loading file https://huggingface.co/EleutherAI/gpt-neo-2.7B/resolve/main/tokenizer_config.json from cache at /home/max/.cache/huggingface/transformers/57ccc3b8af045ea106fffa36bcc8b764e9702b5f4c1f7b3aad70ccfcaa931221.c31b6b7d3225be0c43bc0f8e5d84d03a8b49fdb6b9f6009bbfff1f9cc5ec18bc
[INFO|configuration_utils.py:648] 2022-03-24 22:33:35,408 >> loading configuration file https://huggingface.co/EleutherAI/gpt-neo-2.7B/resolve/main/config.json from cache at /home/max/.cache/huggingface/transformers/3c80ef2946e1aacc6dd37cb986ea989c29c92775701655bedf14d8791825a30b.f1ede5af01beb85af6cba189a5671dbac3fe256282f737ff0fedf1db882ca729
[INFO|configuration_utils.py:684] 2022-03-24 22:33:35,409 >> Model config GPTNeoConfig {
"_name_or_path": "EleutherAI/gpt-neo-2.7B",
"activation_function": "gelu_new",
"architectures": [
"GPTNeoForCausalLM"
],
"attention_dropout": 0,
"attention_layers": [
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local",
"global",
"local"
],
"attention_types": [
[
[
"global",
"local"
],
16
]
],
"bos_token_id": 50256,
"embed_dropout": 0,
"eos_token_id": 50256,
"gradient_checkpointing": false,
"hidden_size": 2560,
"initializer_range": 0.02,
"intermediate_size": null,
"layer_norm_epsilon": 1e-05,
"max_position_embeddings": 2048,
"model_type": "gpt_neo",
"num_heads": 20,
"num_layers": 32,
"resid_dropout": 0,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 50,
"temperature": 0.9
}
},
"tokenizer_class": "GPT2Tokenizer",
"transformers_version": "4.17.0",
"use_cache": true,
"vocab_size": 50257,
"window_size": 256
}

[INFO|modeling_utils.py:1431] 2022-03-24 22:33:36,020 >> loading weights file https://huggingface.co/EleutherAI/gpt-neo-2.7B/resolve/main/pytorch_model.bin from cache at /home/max/.cache/huggingface/transformers/0839a11efa893f2a554f8f540f904b0db0e5320a2b1612eb02c3fd25471c189a.a144c17634fa6a7823e398888396dd623e204dce9e33c3175afabfbf24bd8f56
[INFO|modeling_utils.py:1485] 2022-03-24 22:33:40,536 >> Detected DeepSpeed ZeRO-3: activating zero.init() for this model
[2022-03-24 22:33:59,526] [INFO] [partition_parameters.py:456:__exit__] finished initializing model with 2.78B parameters
/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/nn/modules/module.py:1383: UserWarning: positional arguments and argument "destination" are deprecated. nn.Module.state_dict will not accept them in the future. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/torch/nn/modules/module.py:1383: UserWarning: positional arguments and argument "destination" are deprecated. nn.Module.state_dict will not accept them in the future. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
warnings.warn(
[INFO|modeling_utils.py:1702] 2022-03-24 22:34:17,407 >> All model checkpoint weights were used when initializing GPTNeoForCausalLM.

[INFO|modeling_utils.py:1710] 2022-03-24 22:34:17,407 >> All the weights of GPTNeoForCausalLM were initialized from the model checkpoint at EleutherAI/gpt-neo-2.7B.
If your task is similar to the task the model of the checkpoint was trained on, you can already use GPTNeoForCausalLM for predictions without further training.
0%| | 0/1 [00:00<?, ?ba/s][WARNING|tokenization_utils_base.py:3397] 2022-03-24 22:34:23,519 >> Token indices sequence length is longer than the specified maximum sequence length for this model (1462828 > 2048). Running this sequence through the model will result in indexing errors
100%|█████████████████████████████████████████████| 1/1 [00:05<00:00, 5.40s/ba]
Token indices sequence length is longer than the specified maximum sequence length for this model (1462828 > 2048). Running this sequence through the model will result in indexing errors
100%|█████████████████████████████████████████████| 1/1 [00:05<00:00, 5.44s/ba]
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 51.73ba/s]
run_clm.py:360: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(
03/24/2022 22:34:24 - WARNING - main - The tokenizer picked seems to have a very large model_max_length (2048). Picking 1024 instead. You can change that default value by passing --block_size xxx.
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 47.52ba/s]
run_clm.py:360: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(
03/24/2022 22:34:24 - WARNING - main - The tokenizer picked seems to have a very large model_max_length (2048). Picking 1024 instead. You can change that default value by passing --block_size xxx.
100%|█████████████████████████████████████████████| 1/1 [00:01<00:00, 1.14s/ba]
100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 147.62ba/s]
[INFO|trainer.py:457] 2022-03-24 22:34:25,574 >> Using amp half precision backend
[2022-03-24 22:34:25,578] [INFO] [logging.py:69:log_dist] [Rank 0] DeepSpeed info: version=0.6.0+a32e9b33, git-hash=a32e9b33, git-branch=HEAD
100%|█████████████████████████████████████████████| 1/1 [00:01<00:00, 1.11s/ba]
100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 182.52ba/s]
[2022-03-24 22:34:25,842] [INFO] [engine.py:277:__init__] DeepSpeed Flops Profiler Enabled: False
Traceback (most recent call last):
File "run_clm.py", line 478, in
main()
File "run_clm.py", line 441, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/transformers/trainer.py", line 1240, in train
deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
File "/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/transformers/deepspeed.py", line 424, in deepspeed_init
deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/init.py", line 119, in initialize
Traceback (most recent call last):
File "run_clm.py", line 478, in
engine = DeepSpeedEngine(args=args,
File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/runtime/engine.py", line 293, in init
main()
File "run_clm.py", line 441, in main
self._configure_optimizer(optimizer, model_parameters)
File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/runtime/engine.py", line 1062, in _configure_optimizer
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/transformers/trainer.py", line 1240, in train
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/runtime/engine.py", line 1147, in _configure_basic_optimizer
deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
File "/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/site-packages/transformers/deepspeed.py", line 424, in deepspeed_init
deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/init.py", line 119, in initialize
optimizer = DeepSpeedCPUAdam(model_parameters,
File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam.py", line 83, in init
engine = DeepSpeedEngine(args=args,
File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/runtime/engine.py", line 293, in init
self.ds_opt_adam = CPUAdamBuilder().load()
self._configure_optimizer(optimizer, model_parameters)
File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/op_builder/builder.py", line 455, in load
File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/runtime/engine.py", line 1062, in _configure_optimizer
return importlib.import_module(self.absolute_name())
File "/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/importlib/init.py", line 127, in import_module
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/runtime/engine.py", line 1147, in _configure_basic_optimizer
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1014, in _gcd_import
File "", line 991, in _find_and_load
optimizer = DeepSpeedCPUAdam(model_parameters,
File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam.py", line 83, in init
File "", line 975, in _find_and_load_unlocked
self.ds_opt_adam = CPUAdamBuilder().load()
File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/op_builder/builder.py", line 455, in load
File "", line 657, in _load_unlocked
File "", line 556, in module_from_spec
return importlib.import_module(self.absolute_name())
File "/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/importlib/init.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1014, in _gcd_import
File "", line 1166, in create_module
File "", line 991, in _find_and_load
File "", line 219, in _call_with_frames_removed
File "", line 975, in _find_and_load_unlocked
ImportError: /media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so: undefined symbol: curandCreateGenerator
File "", line 657, in _load_unlocked
File "", line 556, in module_from_spec
File "", line 1166, in create_module
File "", line 219, in _call_with_frames_removed
ImportError: /media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so: undefined symbol: curandCreateGenerator

Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7fa491b11a60>
Traceback (most recent call last):
File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam.py", line 97, in del
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7fe75daa6a60>
Traceback (most recent call last):
File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam.py", line 97, in del
self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
[2022-03-24 22:34:27,182] [INFO] [launch.py:178:sigkill_handler] Killing subprocess 69193
[2022-03-24 22:34:27,183] [INFO] [launch.py:178:sigkill_handler] Killing subprocess 69194
[2022-03-24 22:34:27,183] [ERROR] [launch.py:184:sigkill_handler] ['/home/max/anaconda3/envs/gptneo_finetuned/bin/python', '-u', 'run_clm.py', '--local_rank=1', '--deepspeed', 'ds_config.json', '--model_name_or_path', 'EleutherAI/gpt-neo-2.7B', '--train_file', 'train.csv', '--validation_file', 'validation.csv', '--do_train', '--do_eval', '--fp16', '--overwrite_cache', '--evaluation_strategy=steps', '--output_dir', 'finetuned', '--num_train_epochs', '1', '--eval_steps', '15', '--gradient_accumulation_steps', '2', '--per_device_train_batch_size', '4', '--use_fast_tokenizer', 'False', '--learning_rate', '5e-06', '--warmup_steps', '10'] exits with return code = 1

@tjruwase
Contributor

--deepspeed ds_config.json

Thanks! Can you please share the contents of ds_config.json?

@maxmaier59
Author

Here is the ds_config.json

{
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto"
        }
    },
    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto"
        }
    },
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "nvme_path": "nvme_data",
            "pin_memory": false,
            "buffer_count": 4,
            "fast_init": false
        },
        "offload_param": {
            "device": "cpu",
            "nvme_path": "nvme_param",
            "pin_memory": false,
            "buffer_count": 5,
            "buffer_size": 1e8,
            "max_in_cpu": 1e10
        },
        "aio": {
            "block_size": 262144,
            "queue_depth": 32,
            "thread_count": 1,
            "single_submit": false,
            "overlap_events": true
        },
        "overlap_comm": true,
        "contiguous_gradients": true,
        "sub_group_size": 1e9,
        "reduce_bucket_size": "auto",
        "stage3_prefetch_bucket_size": "auto",
        "stage3_param_persistence_threshold": "auto",
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e9,
        "stage3_gather_fp16_weights_on_model_save": true
    },

    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 2000,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}

@maxmaier59
Copy link
Author

BTW, there is a simpler way to reproduce the problem:
Please see the DeepSpeed tutorial for installation:
https://www.deepspeed.ai/tutorials/advanced-install/

DS_BUILD_OPS=1 pip install deepspeed

And then run this:
python -c 'import deepspeed; deepspeed.ops.adam.cpu_adam.CPUAdamBuilder().load()'

Error message:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/op_builder/builder.py", line 455, in load
    return importlib.import_module(self.absolute_name())
  File "/home/max/anaconda3/envs/gptneo_finetuned/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 657, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 556, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1166, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /media/max/Volume/GPT/finetune/DeepSpeed/deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so: undefined symbol: curandCreateGenerator

Can you please let me know what is the official way to build DeepSpeed to be able to run the cpu_adam optimizer?

To me this seems fundamentally broken, or maybe I am fundamentally misunderstanding how this is supposed to work.

@jeffra
Copy link
Collaborator

jeffra commented Mar 28, 2022

Hi @maxmaier59, so sorry you’re running into this issue. One thing I don’t recall if I’ve asked. Can you use JIT to compile cpu Adam successfully? You can try this by installing deepspeed w/o any DS_* variables or via: “DS_BUILD_OPS=0 pip install deepspeed”.

After install you can force a build of cpu Adam in a Python shell via:

import deepspeed
deepspeed.ops.op_builder.CPUAdamBuilder().load()

You’ll need ninja installed for this to work, many setups already have this though. More info here: https://github.com/ninja-build/ninja/wiki/Pre-built-Ninja-packages
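If you're not sure whether ninja is available, a quick check along these lines works (just a sketch; any install route for ninja is fine):

ninja --version || pip install ninja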

@maxmaier59
Copy link
Author

Hello Jeff,
many thanks for your suggestions. This fixed my problem!
With that I was able to get the cpu Adam optimizer compiled and the finetuning started!
:-)
Many, many thanks to you and also to Olatunji
This was exactly what I was looking for.

@jeffra
Copy link
Collaborator

jeffra commented Mar 28, 2022

Excellent, really glad to hear. It still concerns me that the pre-compilation method doesn't work for you but I am glad you are unblocked for now at least. I'll close this issue for now, feel free to re-open if you have further issues along this line.

@jeffra jeffra closed this as completed Mar 28, 2022
@sayakpaul
Copy link

I am also facing a similar problem and have described it in detail here: https://discuss.huggingface.co/t/run-translation-py-example-is-erroring-out-with-the-recommended-settings/16432

@sayakpaul
Copy link

#1846 (comment) solved my problem too, but I think it's still a matter of concern.

@stas00
Copy link
Collaborator

stas00 commented Apr 4, 2022

@jeffra, if you remember these 2 interconnected threads:

I am pretty sure that's the cause of the problem for pre-building.

If you remember torch conda build worked but pip was failing.

@maxmaier59, please check if the problem goes away if you installed torch via conda.
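For example, something along these lines (the cudatoolkit version here is only an assumption; pick the build that matches your driver):

conda create -n ds-test python=3.8
conda activate ds-test
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch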

@sayakpaul
Copy link

Is there a recent workaround I could refer to in case installing via conda isn't an option?

@stas00

@stas00
Copy link
Collaborator

stas00 commented Apr 4, 2022

JIT build is the workaround if conda is not an option. And the main thread is pytorch/pytorch#69666

For some reason the problem went away for me with pip and pre-building, but perhaps that's not the case for all configurations?

Could you please post the output of python -m torch.utils.collect_env? Mine is:

PyTorch version: 1.11.0
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 21.10 (x86_64)
GCC version: (Ubuntu 10.3.0-11ubuntu1) 10.3.0
Clang version: 13.0.0-2
CMake version: version 3.21.3
Libc version: glibc-2.34

Python version: 3.8.12 (default, Oct 12 2021, 13:49:34)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.15.32-051532-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 11.6.124
GPU models and configuration:
GPU 0: NVIDIA Graphics Device
GPU 1: NVIDIA GeForce RTX 3090

Nvidia driver version: 510.47.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.3.3
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.3.3
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.3.3
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.3.3
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.3.3
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.3.3
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.3.3
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.2
[pip3] torch==1.11.0
[pip3] torch-scatter==2.0.9
[pip3] torchaudio==0.11.0
[pip3] torchvision==0.12.0+cu115
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               11.3.1               h2bc3f7f_2
[conda] functorch                 0.0.1a0+2228c3b           dev_0    <develop>
[conda] mkl                       2021.4.0           h06a4308_640
[conda] mkl-service               2.4.0            py38h7f8727e_0
[conda] mkl_fft                   1.3.1            py38hd3c417c_0
[conda] mkl_random                1.2.2            py38h51133e4_0
[conda] mypy-extensions           0.4.3                    pypi_0    pypi
[conda] numpy                     1.20.3                   pypi_0    pypi
[conda] numpy-base                1.21.2           py38h79a1101_0
[conda] pytorch                   1.11.0          py3.8_cuda11.3_cudnn8.2.0_0    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch-nightly
[conda] torch-scatter             2.0.9                    pypi_0    pypi
[conda] torchaudio                0.11.0               py38_cu113    pytorch
[conda] torchvision               0.12.0+cu115             pypi_0    pypi

@maxmaier59
Copy link
Author

As I've mentioned above, building with pip fails:

DS_BUILD_OPS=1 pip install deepspeed

as well as

TORCH_CUDA_ARCH_LIST="8.6" DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 python setup.py build_ext -j8 bdist_wheel

Here is my environment:

Collecting environment information...
PyTorch version: 1.12.0.dev20220320+cu115
Is debug build: False
CUDA used to build PyTorch: 11.5
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (crosstool-NG 1.24.0.133_b0863d8_dirty) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.13.0-39-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 11.4.48
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3060
GPU 1: NVIDIA GeForce RTX 3060

Nvidia driver version: 510.54
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] torch==1.12.0.dev20220320+cu115
[pip3] torchaudio==0.12.0.dev20220320+cu115
[pip3] torchvision==0.13.0.dev20220320+cu115
[conda] cudatoolkit-dev 11.4.0 h5e8e339_5 conda-forge
[conda] numpy 1.22.3 pypi_0 pypi
[conda] torch 1.12.0.dev20220320+cu115 pypi_0 pypi
[conda] torchaudio 0.12.0.dev20220320+cu115 pypi_0 pypi
[conda] torchvision 0.13.0.dev20220320+cu115 pypi_0 pypi

@stas00
Copy link
Collaborator

stas00 commented Apr 5, 2022

OK, I have created a new conda env and I'm able to reproduce the problem:

conda create -y -n py38-pt112 python=3.8
conda activate py38-pt112
pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu115/torch_nightly.html -U
pip install deepspeed

note, I first install deepspeed normally, so that it installs all the binary dependencies correctly. With pre-build it gets forced to build binary dependencies from scratch rather than fetch them from pypi.

git clone https://github.com/microsoft/DeepSpeed
cd DeepSpeed
TORCH_CUDA_ARCH_LIST="8.6" DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 pip install -e . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check 

no failure reported during the build.

python -c "import deepspeed; deepspeed.ops.op_builder.CPUAdamBuilder().load()"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/mnt/nvme0/code/github/00optimize/deepspeed/deepspeed/ops/op_builder/builder.py", line 461, in load
    return importlib.import_module(self.absolute_name())
  File "/home/stas/anaconda3/envs/py38-pt112/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 657, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 556, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1166, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /mnt/nvme0/code/github/00optimize/deepspeed/deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so: undefined symbol: curandCreateGenerator
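
A quick way to confirm that the pre-built .so was simply never linked against libcurand (paths as in my checkout above; exact output will vary per system):

ldd deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so | grep curand   # no libcurand entry shows up
nm -D deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so | grep curandCreateGenerator   # symbol is listed as undefined (U)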

@sayakpaul
Copy link

@stas00 mine is:

Collecting environment information...
PyTorch version: 1.11.0+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 10 (buster) (x86_64)
GCC version: (Debian 8.3.0-6) 8.3.0
Clang version: Could not collect
CMake version: version 3.13.4
Libc version: glibc-2.10

Python version: 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-4.19.0-20-cloud-amd64-x86_64-with-debian-10.12
Is CUDA available: True
CUDA runtime version: 11.0.221
GPU models and configuration: 
GPU 0: Tesla V100-SXM2-16GB
GPU 1: Tesla V100-SXM2-16GB
GPU 2: Tesla V100-SXM2-16GB
GPU 3: Tesla V100-SXM2-16GB

Nvidia driver version: 470.57.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.19.5
[pip3] torch==1.11.0+cu113
[conda] mypy_extensions           0.4.3            py37h89c1867_4    conda-forge
[conda] numpy                     1.19.5           py37h038b26d_2    conda-forge
[conda] torch                     1.11.0+cu113             pypi_0    pypi

@stas00
Copy link
Collaborator

stas00 commented Apr 10, 2022

So glad to hear it finally worked, @maxmaier59!

Does it mean that you tried installing the binary wheel and it didn't work?

@djaym7
Copy link

djaym7 commented Mar 6, 2023

current main branch 0.8.2+4ae3a3da gives the same error. pip installing 0.8.1 works fine
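
i.e. pinning the older release works around it for now:

pip install deepspeed==0.8.1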

@Misoknisky
Copy link

And since the JIT build worked for you, let's see what you have under the torch extensions cache, i.e. ~/.cache/torch_extensions/py38_cu115:

find ~/.cache/torch_extensions/py38_cu115/cpu_adam/

I wonder if somehow something gets messed up there.

e.g. on my set up I have:

$ cd DeepSpeed
$ ls -l deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 stas stas 11M Apr  5 08:41 deepspeed/ops/adam/cpu_adam_op.cpython-38-x86_64-linux-gnu.so

$ ls -l /home/stas/.cache/torch_extensions/py38_cu113/cpu_adam/cpu_adam.so
-rwxrwxr-x 1 stas stas 11M Apr  8 13:15 /home/stas/.cache/torch_extensions/py38_cu113/cpu_adam/cpu_adam.so*

to get the latter one, I of course needed to do JIT, so I had to do:

pip uninstall deepspeed -y
pip install deepspeed 
python -c "from deepspeed.ops.op_builder import CPUAdamBuilder; CPUAdamBuilder().load()"

and may be let's see the log of the last command, and then we can compare its build to the prebuild log - and perhaps find what's mismatching.

Sorry, I also have this problem. I executed this command and the output is as follows, but I still get the "AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'" error, so how do I fix it?

Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.4532833099365234 seconds

@BramVanroy
Copy link
Contributor

For the record, I did not have this issue on 0.10.0 but when I upgraded to the current main (777ae39), I got the issue as well. Pre-building the cpu adam solved the issue but regardless, it seems an important issue to raise.

@flckv
Copy link

flckv commented Jul 26, 2023

hi @BramVanroy how did you pre-build the cpu adam?

@BramVanroy
Copy link
Contributor

hi @BramVanroy how did you pre-build the cpu adam?

You can find the instructions here: https://www.deepspeed.ai/tutorials/advanced-install/#pre-install-deepspeed-ops

@bobo0810
Copy link

DS_BUILD_CPU_ADAM=1 DS_BUILD_UTILS=1 pip install deepspeed -U

@finlytics-hub
Copy link

I'm still facing the same issue.

RuntimeError during finetuning:
Installed CUDA version 11.5 does not match the version torch was compiled with 11.8 but since the APIs are compatible, accepting this combination
Using /home/asad/.cache/torch_extensions/py311_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/asad/.cache/torch_extensions/py311_cu118/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/4] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output custom_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/include -isystem /home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/include/TH -isystem /home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/include/THC -isystem /usr/local/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 -c /home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o 
FAILED: custom_cuda_kernel.cuda.o 
/usr/bin/nvcc --generate-dependencies-with-compile --dependency-output custom_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/include -isystem /home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/include/TH -isystem /home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/include/THC -isystem /usr/local/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 -c /home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o 
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
435 |         function(_Functor&& __f)
    |                                                                                                                                                 ^ 
/usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
530 |         operator=(_Functor&& __f)
    |                                                                                                                                                  ^ 
/usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’
[2/4] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/include -isystem /home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/include/TH -isystem /home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/include/THC -isystem /usr/local/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -D__ENABLE_CUDA__ -c /home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o 
[3/4] c++ -MMD -MF cpu_adam_impl.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/include -isystem /home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/include/TH -isystem /home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/include/THC -isystem /usr/local/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -L/usr/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256__ -D__ENABLE_CUDA__ -c /home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/deepspeed/ops/csrc/adam/cpu_adam_impl.cpp -o cpu_adam_impl.o 
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2095, in _run_ninja_build
  subprocess.run(
File "/usr/local/lib/python3.11/subprocess.py", line 571, in run
  raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/asad/Spectral/challenge-2/DeepSeek-Coder/finetune/finetune_deepseekcoder.py", line 193, in <module>
  train()
File "/home/asad/Spectral/challenge-2/DeepSeek-Coder/finetune/finetune_deepseekcoder.py", line 187, in train
  trainer.train()
File "/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/transformers/trainer.py", line 1537, in train
  return inner_training_loop(
         ^^^^^^^^^^^^^^^^^^^^
File "/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/transformers/trainer.py", line 1675, in _inner_training_loop
  model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
                                             ^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/accelerate/accelerator.py", line 1209, in prepare
  result = self._prepare_deepspeed(*args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/accelerate/accelerator.py", line 1582, in _prepare_deepspeed
  engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/deepspeed/__init__.py", line 171, in initialize
  engine = DeepSpeedEngine(args=args,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 304, in __init__
  self._configure_optimizer(optimizer, model_parameters)
File "/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1208, in _configure_optimizer
  basic_optimizer = self._configure_basic_optimizer(model_parameters)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1279, in _configure_basic_optimizer
  optimizer = DeepSpeedCPUAdam(model_parameters,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/deepspeed/ops/adam/cpu_adam.py", line 94, in __init__
  self.ds_opt_adam = CPUAdamBuilder().load()
                     ^^^^^^^^^^^^^^^^^^^^^^^
File "/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py", line 458, in load
  return self.jit_load(verbose)
         ^^^^^^^^^^^^^^^^^^^^^^
File "/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/deepspeed/ops/op_builder/builder.py", line 502, in jit_load
  op_module = load(name=self.name,
              ^^^^^^^^^^^^^^^^^^^^
File "/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1305, in load
  return _jit_compile(
         ^^^^^^^^^^^^^
File "/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1709, in _jit_compile
  _write_ninja_file_and_build_library(
File "/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1822, in _write_ninja_file_and_build_library
  _run_ninja_build(
File "/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2111, in _run_ninja_build
  raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f7601f99800>
Traceback (most recent call last):
File "/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/deepspeed/ops/adam/cpu_adam.py", line 102, in __del__
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
ds_report
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
    runtime if needed. Op compatibility means that your system
    meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
[WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING]  using untested triton version (2.2.0+e28a256d71), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/torch']
torch version .................... 2.3.0.dev20240105+cu118
deepspeed install path ........... ['/home/asad/Spectral/challenge-2/c2env/lib/python3.11/site-packages/deepspeed']
deepspeed info ................... 0.12.6, unknown, unknown
torch cuda version ............... 11.8
torch hip version ................ None
nvcc version ..................... 11.5
deepspeed wheel compiled w. ...... torch 0.0, cuda 0.0
shared memory (/dev/shm) size .... 29.42 GB

What I have tried:

Stack trace of Jeffra's suggestion
>>> deepspeed.ops.op_builder.CPUAdamBuilder().load()
Installed CUDA version 11.5 does not match the version torch was compiled with 11.8 but since the APIs are compatible, accepting this combination
Using /home/asad/.cache/torch_extensions/py311_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/asad/.cache/torch_extensions/py311_cu118/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /usr/bin/nvcc --generate-dependencies-with-compile --dependency-output custom_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/asad/Spectral/challenge-2/DeepSpeed/csrc/includes -I/usr/include -isystem /home/asad/Spectral/challenge-2/c2env2/lib/python3.11/site-packages/torch/include -isystem /home/asad/Spectral/challenge-2/c2env2/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/asad/Spectral/challenge-2/c2env2/lib/python3.11/site-packages/torch/include/TH -isystem /home/asad/Spectral/challenge-2/c2env2/lib/python3.11/site-packages/torch/include/THC -isystem /usr/local/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 -c /home/asad/Spectral/challenge-2/DeepSpeed/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o 
FAILED: custom_cuda_kernel.cuda.o 
/usr/bin/nvcc --generate-dependencies-with-compile --dependency-output custom_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/asad/Spectral/challenge-2/DeepSpeed/csrc/includes -I/usr/include -isystem /home/asad/Spectral/challenge-2/c2env2/lib/python3.11/site-packages/torch/include -isystem /home/asad/Spectral/challenge-2/c2env2/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/asad/Spectral/challenge-2/c2env2/lib/python3.11/site-packages/torch/include/TH -isystem /home/asad/Spectral/challenge-2/c2env2/lib/python3.11/site-packages/torch/include/THC -isystem /usr/local/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 -c /home/asad/Spectral/challenge-2/DeepSpeed/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o 
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
435 |         function(_Functor&& __f)
    |                                                                                                                                                 ^ 
/usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
530 |         operator=(_Functor&& __f)
    |                                                                                                                                                  ^ 
/usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/home/asad/Spectral/challenge-2/c2env2/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2095, in _run_ninja_build
  subprocess.run(
File "/usr/local/lib/python3.11/subprocess.py", line 571, in run
  raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/asad/Spectral/challenge-2/DeepSpeed/op_builder/builder.py", line 478, in load
  return self.jit_load(verbose)
         ^^^^^^^^^^^^^^^^^^^^^^
File "/home/asad/Spectral/challenge-2/DeepSpeed/op_builder/builder.py", line 522, in jit_load
  op_module = load(name=self.name,
              ^^^^^^^^^^^^^^^^^^^^
File "/home/asad/Spectral/challenge-2/c2env2/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1305, in load
  return _jit_compile(
         ^^^^^^^^^^^^^
File "/home/asad/Spectral/challenge-2/c2env2/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1709, in _jit_compile
  _write_ninja_file_and_build_library(
File "/home/asad/Spectral/challenge-2/c2env2/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1822, in _write_ninja_file_and_build_library
  _run_ninja_build(
File "/home/asad/Spectral/challenge-2/c2env2/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2111, in _run_ninja_build
  raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'

Any help will be appreciated please!

@Protekly
Copy link

Clone the repo and run python setup.py install, for example as below.
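
(Repo URL as used earlier in this thread; just a sketch of the source install, not the only way:)

git clone https://github.com/microsoft/DeepSpeed
cd DeepSpeed
python setup.py install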

@rangehow
Copy link

rangehow commented Apr 4, 2024

Even the latest version of deepspeed still has this problem ...

@42elenz
Copy link

42elenz commented Apr 27, 2024

I am also struggling with this problem.
I installed PyTorch via pip rather than via conda, because when I install it via conda it doesn't recognize my CUDA.
As far as I understand, the problem doesn't occur if you install with conda, but that doesn't work for me because then my CUDA isn't recognized.
I don't understand the pre-build solutions above.
Can someone give step-by-step instructions, with the commands I should run, to solve this problem in an easy way?
I also have no sudo rights, so installing cudatoolkit would have to work via conda (or via pip, if that exists).

Summary:
Just a pip installation for PyTorch (pip3 install torch torchvision torchaudio)
cudatoolkit installation only possible via conda or pip (no sudo rights)

Hoping for help.

@Protekly
Copy link

Protekly commented Apr 27, 2024 via email

@Alpha-Girl
Copy link

I face the same problem [AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam']
deepspeed==0.14.2 python=3.10

@yannikkellerde
Copy link

Just had the same problem:
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
preceded by
subprocess.CalledProcessError: Command '['which', '/u/xxxxx/conda-envs/xxxxx/bin/x86_64-conda-linux-gnu-c++']' returned non-zero exit status

I managed to fix it by installing x86_64-conda-linux-gnu-c++ using conda install gxx_linux-64. Maybe that helps someone.
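
Spelled out as a command (channel choice is up to you; conda-forge also carries the package):

conda install gxx_linux-64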

@lekurile
Copy link
Contributor

lekurile commented Jul 15, 2024

Hi @yannikkellerde, @Alpha-Girl, @42elenz, @rangehow,

Can you please try uninstalling deepspeed:

python3 -m pip uninstall deepspeed

If deepspeed was installed from source, also clean the repo:

cd DeepSpeed
git clean -fdx

Then install w/ DS_BUILD_CPU_ADAM=1:

DS_BUILD_CPU_ADAM=1 python3 -m pip install deepspeed

From source:

cd DeepSpeed
DS_BUILD_CPU_ADAM=1 python3 -m pip install -e .

@lekurile
Copy link
Contributor

lekurile commented Aug 1, 2024

Hi @maxmaier59,

Can you please try installing the latest DeepSpeed from source since #5780 has been merged addressing this issue.

Thanks,
Lev

@LiamZhao326
Copy link

Hi @maxmaier59, so sorry you’re running into this issue. One thing I don’t recall if I’ve asked. Can you use JIT to compile cpu Adam successfully? You can try this by installing deepspeed w/o any DS_* variables or via: “DS_BUILD_OPS=0 pip install deepspeed”.

After install you can force a build of cpu Adam in a Python shell via:

import deepspeed
deepspeed.ops.op_builder.CPUAdamBuilder().load()

You’ll need ninja installed for this to work, many setups already have this though. More info here: https://github.com/ninja-build/ninja/wiki/Pre-built-Ninja-packages

hello jeff! I followed your advice but I ran into a new error:
Emitting ninja build file /home/kodak/.cache/torch_extensions/py38_cu113/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/home/kodak/.conda/envs/control-v11/lib/python3.8/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so
FAILED: cpu_adam.so
c++ cpu_adam.o cpu_adam_impl.o -shared -lcurand -L/home/kodak/.conda/envs/control-v11/lib/python3.8/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o cpu_adam.so
/usr/bin/ld: cannot find -lcurand
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/home/kodak/.conda/envs/control-v11/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1808, in _run_ninja_build
    subprocess.run(
  File "/home/kodak/.conda/envs/control-v11/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/raid/disk1/zzcworkspace/AdaBLDM-main/train1.py", line 12, in <module>
    deepspeed.ops.op_builder.CPUAdamBuilder().load()
  File "/home/kodak/.conda/envs/control-v11/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 508, in load
    return self.jit_load(verbose)
  File "/home/kodak/.conda/envs/control-v11/lib/python3.8/site-packages/deepspeed/ops/op_builder/builder.py", line 555, in jit_load
    op_module = load(name=self.name,
  File "/home/kodak/.conda/envs/control-v11/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1202, in load
    return _jit_compile(
  File "/home/kodak/.conda/envs/control-v11/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1425, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/home/kodak/.conda/envs/control-v11/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1537, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/home/kodak/.conda/envs/control-v11/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1824, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'cpu_adam'
can you help me with that? Thank you so much!

@LiamZhao326
Copy link

I’ve solved the problem here

@den-run-ai
Copy link

this solved it for me:

DS_BUILD_OPS=0 DS_BUILD_CPU_ADAM=1 pip install deepspeed --no-cache

@gndps
Copy link

gndps commented Nov 2, 2024

First you want to check whether cpu_adam is compatible with your driver version + CUDA version pair. You can do this using the ds_report command as mentioned here. Once you've confirmed that your system can build cpu_adam, you can debug further using your error log. For me, the steps to fix it were these:

ds_report | grep adam
which returned the op name, installed status, and compatibility like this:

fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]

I knew that it was compatible, so I stopped chasing compatibility issues and focused on other parts of the error logs. Then I noticed a missing Python.h in the log, which led me to doing this:

sudo apt install python3-dev
sudo apt install libpython3.11-dev

which fixed the problem for me
