TensorRT-LLM build issue #464

Closed
tapansstardog opened this issue Nov 24, 2023 · 1 comment
tapansstardog commented Nov 24, 2023

Team,

I have a g5.12xlarge machine and started hitting this problem yesterday.
I built the tensorrt_llm image successfully a month ago. Unfortunately, I deleted that image, and now that I am rebuilding it with the same process as described here, I get the error below while installing mpi4py:

#0 47.00       /usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -c src/lib-pmpi/vt-hyb.c -o build/temp.linux-x86_64-3.10/src/lib-pmpi/vt-hyb.o
#0 47.00       /usr/local/mpi/bin/mpicc -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -g -fwrapv -O2 -Wl,-Bsymbolic-functions -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,--no-as-needed build/temp.linux-x86_64-3.10/src/lib-pmpi/vt-hyb.o -o build/lib.linux-x86_64-3.10/mpi4py/lib-pmpi/libvt-hyb.so
#0 47.00       running build_ext
#0 47.00       MPI configuration: [mpi] from 'mpi.cfg'
#0 47.00       MPI C compiler:    /usr/local/mpi/bin/mpicc
#0 47.00       MPI C++ compiler:  /usr/local/mpi/bin/mpicxx
#0 47.00       MPI F compiler:    /usr/local/mpi/bin/mpifort
#0 47.00       MPI F90 compiler:  /usr/local/mpi/bin/mpif90
#0 47.00       MPI F77 compiler:  /usr/local/mpi/bin/mpif77
#0 47.00       checking for dlopen() availability ...
#0 47.00       checking for header 'dlfcn.h' ...
#0 47.00       x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
#0 47.00       success!
#0 47.00       removing: _configtest.c _configtest.o
#0 47.00       success!
#0 47.00       checking for library 'dl' ...
#0 47.00       x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
#0 47.00       x86_64-linux-gnu-gcc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -ldl -o _configtest
#0 47.00       success!
#0 47.00       removing: _configtest.c _configtest.o _configtest
#0 47.00       checking for function 'dlopen' ...
#0 47.00       x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
#0 47.00       x86_64-linux-gnu-gcc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -ldl -o _configtest
#0 47.00       success!
#0 47.00       removing: _configtest.c _configtest.o _configtest
#0 47.00       building 'mpi4py.dl' extension
#0 47.00       x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DHAVE_DLFCN_H=1 -DHAVE_DLOPEN=1 -I/usr/include/python3.10 -c src/dynload.c -o build/temp.linux-x86_64-3.10/src/dynload.o
#0 47.00       x86_64-linux-gnu-gcc -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -g -fwrapv -O2 -Wl,-Bsymbolic-functions -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.10/src/dynload.o -Lbuild/temp.linux-x86_64-3.10 -ldl -o build/lib.linux-x86_64-3.10/mpi4py/dl.cpython-310-x86_64-linux-gnu.so
#0 47.00       checking for MPI compile and link ...
#0 47.00       /usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
#0 47.00       success!
#0 47.00       removing: _configtest.c _configtest.o
#0 47.00       /usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
#0 47.00       /usr/local/mpi/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -o _configtest
#0 47.00       success!
#0 47.00       removing: _configtest.c _configtest.o _configtest
#0 47.00       checking for missing MPI functions/symbols ...
#0 47.00       /usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
#0 47.00       success!
#0 47.00       removing: _configtest.c _configtest.o
#0 47.00       checking for function 'MPI_Type_create_f90_integer' ...
#0 47.00       /usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
#0 47.00       /usr/local/mpi/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -o _configtest
#0 47.00       success!
#0 47.00       removing: _configtest.c _configtest.o _configtest
#0 47.00       checking for function 'MPI_Type_create_f90_real' ...
#0 47.00       /usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
#0 47.00       /usr/local/mpi/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -o _configtest
#0 47.00       success!
#0 47.00       removing: _configtest.c _configtest.o _configtest
#0 47.00       checking for function 'MPI_Type_create_f90_complex' ...
#0 47.00       /usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
#0 47.00       /usr/local/mpi/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -o _configtest
#0 47.00       success!
#0 47.00       removing: _configtest.c _configtest.o _configtest
#0 47.00       checking for function 'MPI_Status_c2f' ...
#0 47.00       /usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
#0 47.00       /usr/local/mpi/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -o _configtest
#0 47.00       success!
#0 47.00       removing: _configtest.c _configtest.o _configtest
#0 47.00       checking for function 'MPI_Status_f2c' ...
#0 47.00       /usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
#0 47.00       /usr/local/mpi/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -o _configtest
#0 47.00       success!
#0 47.00       removing: _configtest.c _configtest.o _configtest
#0 47.00       checking for symbol 'MPI_LB' ...
#0 47.00       /usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
#0 47.00       /usr/local/mpi/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -o _configtest
#0 47.00       success!
#0 47.00       removing: _configtest.c _configtest.o _configtest
#0 47.00       checking for symbol 'MPI_UB' ...
#0 47.00       /usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
#0 47.00       /usr/local/mpi/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -o _configtest
#0 47.00       success!
#0 47.00       removing: _configtest.c _configtest.o _configtest
#0 47.00       checking for dlopen() availability ...
#0 47.00       checking for header 'dlfcn.h' ...
#0 47.00       /usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
#0 47.00       success!
#0 47.00       removing: _configtest.c _configtest.o
#0 47.00       success!
#0 47.00       checking for library 'dl' ...
#0 47.00       /usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
#0 47.00       /usr/local/mpi/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -ldl -o _configtest
#0 47.00       success!
#0 47.00       removing: _configtest.c _configtest.o _configtest
#0 47.00       checking for function 'dlopen' ...
#0 47.00       /usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.10 -c _configtest.c -o _configtest.o
#0 47.00       /usr/local/mpi/bin/mpicc _configtest.o -Lbuild/temp.linux-x86_64-3.10 -ldl -o _configtest
#0 47.00       success!
#0 47.00       removing: _configtest.c _configtest.o _configtest
#0 47.00       building 'mpi4py.MPI' extension
#0 47.00       /usr/local/mpi/bin/mpicc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DHAVE_DLFCN_H=1 -DHAVE_DLOPEN=1 -I/usr/include/python3.10 -c src/MPI.c -o build/temp.linux-x86_64-3.10/src/MPI.o
#0 47.00       /usr/local/mpi/bin/mpicc -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -g -fwrapv -O2 -Wl,-Bsymbolic-functions -g -fwrapv -O2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.10/src/MPI.o -Lbuild/temp.linux-x86_64-3.10 -ldl -o build/lib.linux-x86_64-3.10/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so
#0 47.00       writing build/lib.linux-x86_64-3.10/mpi4py/mpi.cfg
#0 47.00       Traceback (most recent call last):
#0 47.00         File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
#0 47.00           main()
#0 47.00         File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
#0 47.00           json_out['return_val'] = hook(**hook_input['kwargs'])
#0 47.00         File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
#0 47.00           return _build_backend().build_wheel(wheel_directory, config_settings,
#0 47.00         File "/tmp/pip-build-env-akzi4u8r/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 404, in build_wheel
#0 47.00           return self._build_with_temp_dir(
#0 47.00         File "/tmp/pip-build-env-akzi4u8r/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 389, in _build_with_temp_dir
#0 47.00           self.run_setup()
#0 47.00         File "/tmp/pip-build-env-akzi4u8r/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 311, in run_setup
#0 47.00           exec(code, locals())
#0 47.00         File "<string>", line 644, in <module>
#0 47.00         File "<string>", line 641, in main
#0 47.00         File "<string>", line 492, in run_setup
#0 47.00         File "/tmp/pip-install-4_ju1s6y/mpi4py_a4ad3742c507435085a6fc4b887bf5d4/conf/mpidistutils.py", line 541, in setup
#0 47.00           return fcn_setup(**attrs)
#0 47.00         File "/tmp/pip-build-env-akzi4u8r/overlay/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 103, in setup
#0 47.00           return distutils.core.setup(**attrs)
#0 47.00         File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
#0 47.00           dist.run_commands()
#0 47.00         File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
#0 47.00           self.run_command(cmd)
#0 47.00         File "/tmp/pip-build-env-akzi4u8r/overlay/local/lib/python3.10/dist-packages/setuptools/dist.py", line 963, in run_command
#0 47.00           super().run_command(command)
#0 47.00         File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
#0 47.00           cmd_obj.run()
#0 47.00         File "/tmp/pip-build-env-akzi4u8r/overlay/local/lib/python3.10/dist-packages/wheel/bdist_wheel.py", line 371, in run
#0 47.00           install = self.reinitialize_command("install", reinit_subcommands=True)
#0 47.00         File "/tmp/pip-build-env-akzi4u8r/overlay/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 216, in reinitialize_command
#0 47.00           cmd = _Command.reinitialize_command(self, command, reinit_subcommands)
#0 47.00         File "/usr/lib/python3.10/distutils/cmd.py", line 305, in reinitialize_command
#0 47.00           return self.distribution.reinitialize_command(command,
#0 47.00         File "/usr/lib/python3.10/distutils/dist.py", line 938, in reinitialize_command
#0 47.00           command = self.get_command_obj(command_name)
#0 47.00         File "/usr/lib/python3.10/distutils/dist.py", line 858, in get_command_obj
#0 47.00           cmd_obj = self.command_obj[command] = klass(self)
#0 47.00         File "/tmp/pip-build-env-akzi4u8r/overlay/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 174, in __init__
#0 47.00           super().__init__(dist)
#0 47.00         File "/usr/lib/python3.10/distutils/cmd.py", line 62, in __init__
#0 47.00           self.initialize_options()
#0 47.00         File "/tmp/pip-build-env-akzi4u8r/overlay/local/lib/python3.10/dist-packages/setuptools/command/install.py", line 50, in initialize_options
#0 47.00           orig.install.initialize_options(self)
#0 47.00         File "/usr/lib/python3.10/_distutils_system_mod.py", line 33, in initialize_options
#0 47.00           super().initialize_options()
#0 47.00       TypeError: super(type, obj): obj must be an instance or subtype of type
#0 47.00       [end of output]
#0 47.00   
#0 47.00   note: This error originates from a subprocess, and is likely not a problem with pip.
#0 47.00   ERROR: Failed building wheel for mpi4py
#0 47.00 Failed to build mpi4py
#0 47.00 ERROR: Could not build wheels for mpi4py, which is required to install pyproject.toml-based projects
#0 47.35 
#0 47.35 [notice] A new release of pip is available: 23.2.1 -> 23.3.1
#0 47.35 [notice] To update, run: python -m pip install --upgrade pip
------
Dockerfile.multi:14
--------------------
  12 |     
  13 |     COPY docker/common/install_base.sh install_base.sh
  14 | >>> RUN bash ./install_base.sh && rm install_base.sh
  15 |     
  16 |     COPY docker/common/install_cmake.sh install_cmake.sh
--------------------
ERROR: failed to solve: process "/bin/bash -c bash ./install_base.sh && rm install_base.sh" did not complete successfully: exit code: 1
make: *** [Makefile:47: release_build] Error 1

I ran the following steps before building tensorrt_llm:

git submodule update --init --recursive
git lfs install
git lfs pull

I have also built the images: tritonserver_cibase, tritonserver, tritonserver-trtllm, tritonserver_buildbase

Any suggestions?

@matichon-vultureprime
Contributor

Solved in issue #447.

Shixiaowei02 self-assigned this Nov 24, 2023
Shixiaowei02 added the invalid and triaged labels Nov 24, 2023