Improvement suggestions for the multi-backend-refactor installation instructions #1219

Titus-von-Koeller · 2024-05-24T14:43:59Z · Maintainer

Please help us out with snippets and recommendations to make the experience as pain-free as possible.

lhl · 2024-05-26T04:07:55Z

I just tested out the multi-backend-refactor for ROCm (Ubuntu 22.04 LTS HWE, ROCm 6.0 (using the standard AMD ROCm repo)) on RDNA3 navi3x gfx1100 (W7900 and 7900XTX). The basic installation instructions worked fine for me.

I tested loading a llama3 model w/ load_in_8bit and load_in_4bit (I can never remember the new config settings) and it seemed to run fine on each card, both w/ both GPUs visible (although it only loads on the first) and on each separately using HIP_VISIBLE_DEVICES.

Quick perf notes on the W7900 and a Llama 3 8B model, using torch.no_grad() and torch 2.3.0+rocm6.0:

19 tok/s w/ full model
6 tok/s w/ load_in_8bit
26 tok/s w/ load_in_4bit

I tested a couple of times just to make sure it wasn't a fluke, but it looks like, for gfx1100 at least, load_in_8bit is very slow for some reason.
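For anyone who wants to reproduce numbers like these, here is a minimal sketch of how tok/s could be measured. It is not the script used for the figures above; generate_fn is a hypothetical callable that runs one generation and returns the number of newly generated tokens.

```python
import time
from typing import Callable


def tokens_per_second(generate_fn: Callable[[], int], warmup: int = 1, runs: int = 3) -> float:
    """Average tokens/second over several timed runs of generate_fn.

    generate_fn is a hypothetical stand-in: it should run one generation
    and return how many new tokens were produced.
    """
    for _ in range(warmup):
        generate_fn()  # warm-up runs are not timed (caches, kernel setup)

    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        total_tokens += generate_fn()
        total_seconds += time.perf_counter() - start
    return total_tokens / total_seconds


def relative_speed(tok_s: float, baseline_tok_s: float) -> float:
    """Throughput relative to a baseline (1.0 means equal speed)."""
    return tok_s / baseline_tok_s
```

With the numbers above, relative_speed(6, 19) is roughly 0.32 and relative_speed(26, 19) is roughly 1.37, so 8-bit runs at about a third of full-model speed while 4-bit is somewhat faster. If generate_fn does not itself wait for the GPU to finish, it should synchronize (e.g. torch.cuda.synchronize()) before returning, otherwise the timing can be too optimistic.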

1 reply

Titus-von-Koeller Jun 3, 2024
Maintainer Author

Really cool, thanks so much @lhl for your valuable feedback, really appreciated!

cc @pnunna93 (from AMD)

sanchitintel · 2024-05-28T22:05:12Z

Hi @jianan-gu @mingfeima @ashokei @Kanya-Mo,

As mentioned in the bitsandbytes README, we can provide feedback for Intel CPUs & GPUs here.
Among other things, it'd allow us to report the status of support of #1226 (while @jianan-gu contributed to the multi-backend-refactor branch, it has also had other commits since, and since it doesn't seem to be covered by a CI job, we could report its latest status here).

Thanks!

1 reply

Titus-von-Koeller Jun 5, 2024
Maintainer Author

Hey @sanchitintel, thanks for getting in touch. I think for general reporting and discussion topics, this discussion is better suited. I would like to keep these threads here mostly about the installation process + instructions.

Be sure to always tag me, so that it doesn't slip under my radar. Currently, we're really short on resources and otherwise it might go unnoticed for too long. Thanks!

PatchouliPatch · 2024-05-31T11:51:38Z

how do I compile it from source? is the installation for both HIP and CUDA the same?

11 replies

PatchouliPatch Jun 5, 2024

also, just curious but do you think it's possible to achieve a sort-of CPU + GPU training with some of the layers offloaded to the CPU with bitsandbytes? Or maybe even an NPU in the future?

pnunna93 Jun 5, 2024

Hi, could you please share the outputs of the following commands?
pip show torch
hipconfig --version

here ya go friend

(lowlevnltnt) gabriel@gabriel-B650-AORUS-ELITE-AX:~/Documents/llava-data-prep$ pip3 show torch
Name: torch
Version: 2.3.0+rocm6.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3
Location: /home/gabriel/anaconda3/envs/lowlevnltnt/lib/python3.10/site-packages
Requires: filelock, fsspec, jinja2, networkx, pytorch-triton-rocm, sympy, typing-extensions
Required-by: accelerate, bitsandbytes, flash-attn, lion-pytorch, thop, torchaudio, torchvision, ultralytics

hipconfig --version -> 6.1.40092-038397aaa

The torch version is for rocm6.0. Please reinstall 6.1 torch and build bitsandbytes again:
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.1/
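
The mismatch diagnosed here (torch built for rocm6.0, system ROCm at 6.1) can be checked mechanically. A minimal sketch, assuming that agreement on major.minor is what matters; the function names are made up for illustration, and the two strings would come from torch.__version__ and the output of hipconfig --version.

```python
import re
from typing import Optional, Tuple


def rocm_from_torch_version(torch_version: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a local version tag like '2.3.0+rocm6.0'."""
    match = re.search(r"\+rocm(\d+)\.(\d+)", torch_version)
    return (int(match.group(1)), int(match.group(2))) if match else None


def rocm_from_hipconfig(hip_version: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from hipconfig output like '6.1.40092-038397aaa'."""
    match = re.match(r"\s*(\d+)\.(\d+)", hip_version)
    return (int(match.group(1)), int(match.group(2))) if match else None


def versions_match(torch_version: str, hip_version: str) -> bool:
    """True only if both versions parse and agree on major.minor (an assumption)."""
    torch_rocm = rocm_from_torch_version(torch_version)
    system_rocm = rocm_from_hipconfig(hip_version)
    return torch_rocm is not None and torch_rocm == system_rocm
```

For the output posted above, versions_match("2.3.0+rocm6.0", "6.1.40092-038397aaa") is False, which is the situation the reinstall fixes. A torch version without any +rocm tag also yields False, since it is not a ROCm build.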

PatchouliPatch Jun 6, 2024

Alright, this worked for me too. Didn't know that ROCm support was this strict. Thanks for this. What kind of metrics and data would be useful for you guys for me to test?

jmelovich Jun 24, 2024

This was helpful for me, as I have been having issues building but this thread made me realize that I had torch 2.3.1+rocm5.7 installed. Uninstalling it and reinstalling it with pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.1/ fixed this issue.

Also for anyone else who might be facing a similar situation as me, I was getting this error while trying to build: Compiling the HIP compiler identification source file "CMakeHIPCompilerId.hip" failed. Compiler: /opt/rocm-6.1.3/llvm/bin/clang++

This fixed it for me: sudo apt-get install libstdc++-12-dev

Titus-von-Koeller Jul 17, 2024
Maintainer Author

also, just curious but do you think it's possible to achieve a sort-of CPU + GPU training with some of the layers offloaded to the CPU with bitsandbytes? Or maybe even an NPU in the future?

@PatchouliPatch In principle these things could be possible going forward if they're possible in vanilla PyTorch. Currently, the multi-backend approach is still implemented with some custom logic, but in the upcoming weeks I'm going to refactor things so that the BNB operations currently in the Backend class will instead be registered with PyTorch as custom ops via the torch.library API. This will make it so that the PyTorch dispatcher is used to decide which implementations to choose in the same way it's done in PyTorch. A big part of that is which device the tensor is on. It's my understanding that if the tensor is on NPU or CPU, then the implementations for that hardware are automatically triggered.
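
To make the dispatch idea concrete, here is a toy sketch in plain Python. It does not use torch.library or any bitsandbytes code; the registry, the decorator, and the op name are all made up for illustration. It only shows the principle: implementations are registered per device type, and the device type of the input decides which one runs.

```python
from typing import Callable, Dict, Tuple

# (op name, device type) -> implementation. Purely illustrative registry.
_REGISTRY: Dict[Tuple[str, str], Callable] = {}


def register(op_name: str, device_type: str) -> Callable:
    """Decorator that registers an implementation of op_name for one device type."""
    def decorator(fn: Callable) -> Callable:
        _REGISTRY[(op_name, device_type)] = fn
        return fn
    return decorator


def dispatch(op_name: str, device_type: str, *args, **kwargs):
    """Look up and call the implementation matching the device type."""
    try:
        impl = _REGISTRY[(op_name, device_type)]
    except KeyError:
        raise NotImplementedError(
            f"{op_name} has no implementation for device '{device_type}'"
        ) from None
    return impl(*args, **kwargs)


@register("quantize_absmax", "cpu")
def _quantize_absmax_cpu(values):
    # Hypothetical CPU path: scale values into the int8 range by the absolute maximum.
    scale = max(abs(v) for v in values) or 1.0
    return [round(v / scale * 127) for v in values], scale
```

Calling dispatch("quantize_absmax", "cpu", [2.0, -2.0, 0.0]) runs the CPU implementation, while asking for a device type nobody has registered (say "npu") raises NotImplementedError. That is the situation a community-contributed backend would fill in.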

Either way, I think that's not the main use-case we're aiming for, but with the upcoming refactor things like that might be possible within the framework of what PyTorch allows. We'd be happy to hear about it and support it to a certain degree, if folks choose to experiment with this in the future.

Regarding NPU, that of course also depends if someone from the community is willing to contribute the implementations for that. At BNB we'll continue to focus on CUDA implementations, but are happy to assist with and merge community implementations, as is the case now with AMD and Intel. The idea is that Apple Silicon follows next. In principle every other platform can be supported if there's enough interest from the community and they take the lead on implementation and continued maintenance thereof.

mohamedyassin1 · 2024-06-19T23:11:03Z

Hi, I just tested out building bitsandbytes following the ROCm installation instructions.

Environment details:

OS: Ubuntu 22.04.4
Graphics Card: AMD Radeon PRO W7900, gfx1100
Using a docker container built from the rocm/dev-ubuntu-22.04:6.1 image tag

After navigating to the "multi-backend-refactor" branch, and following the installation instructions, I get these errors when trying to perform the step cmake -DCOMPUTE_BACKEND=hip -S .:

Could not find a package configuration file provided by "hiprand"
Could not find a package configuration file provided by "hipsparse"

I was able to fix these errors with:

apt install hiprand
apt install hipsparse

Then, when at the step to make, I get the following errors:
bitsandbytes/csrc/ops.hip:10:10: fatal error: 'hipcub/hipcub.hpp' file not found
   10 | #include <hipcub/hipcub.hpp>
bitsandbytes/csrc/kernels.hip:13:10: fatal error: 'thrust/host_vector.h' file not found
   13 | #include <thrust/host_vector.h>

I was able to fix these errors with:

apt install hipcub
apt install rocthrust
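
The four fixes so far can be collected into a small lookup. This is only a sketch built from the error/package pairs reported in this post; the function name is made up, and the mapping is certainly not exhaustive.

```python
from typing import Dict, List

# Error fragment -> apt package, taken from the errors and fixes reported above.
MISSING_DEPENDENCY_HINTS: Dict[str, str] = {
    'provided by "hiprand"': "hiprand",
    'provided by "hipsparse"': "hipsparse",
    "'hipcub/hipcub.hpp' file not found": "hipcub",
    "'thrust/host_vector.h' file not found": "rocthrust",
}


def suggest_packages(build_log: str) -> List[str]:
    """Return the apt packages to install for known errors found in build_log."""
    return [
        package
        for fragment, package in MISSING_DEPENDENCY_HINTS.items()
        if fragment in build_log
    ]
```

Feeding it the cmake or make output gives the list of packages to pass to apt install; an empty list means none of the known errors were found, not that the build is fine.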

Furthermore, when trying to run the example script bitsandbytes/examples/int8_inference_huggingface.py, these two dependencies were needed and were not mentioned in the instructions:

pip install sentencepiece
pip install protobuf

I realize these issues may be due to the fact that I'm using a Docker container as my environment, but figured I'd share my process regardless.

(Extra) Here are some Dockerfile snippets I used to help build bitsandbytes on the multi-backend-refactor branch:

FROM rocm/dev-ubuntu-22.04:6.1

RUN apt-get update && \
    apt-get install -y \
    git git-lfs \
    rocthrust \
    hipcub \
    hiprand \
    hipblas \
    hipblaslt \
    hipsparse \
    curl \
    && curl -L https://cmake.org/files/v3.29/cmake-3.29.6-linux-x86_64.tar.gz --output /tmp/cmake-3.29.6.tar.gz \
    && tar -xzf /tmp/cmake-3.29.6.tar.gz -C /tmp/ && cd /tmp/cmake-3.29.6-linux-x86_64/ \
    && cp bin/ share/ doc/ /usr/local/ -r && rm -rf /tmp/cmake-3.29.6*

RUN python3 -m pip install sentencepiece
RUN python3 -m pip install protobuf
RUN python3 -m pip install --upgrade pip
RUN python3 -m pip install einops lion_pytorch accelerate
RUN python3 -m pip install git+https://github.com/ROCm/transformers.git
RUN python3 -m pip install -U "huggingface_hub[cli]"

Update:

To get the example script bitsandbytes/examples/int8_inference_huggingface.py to successfully run without any modifications, I changed the rocm docker tag from 6.1 to 6.0. Might come back with a more detailed update regarding issues with 6.1 after some more tests.

Relevant pip modules for torch that I used:

torch 2.3.1+rocm6.0
torchaudio 2.3.1+rocm6.0
torchvision 0.18.1+rocm6.0
pytorch-triton-rocm 2.3.1

If I recall correctly, pip list only showed torch 2.3.1, not torch 2.3.1+rocm6.0, after following the ROCm installation instructions, so I manually installed torch using: pip3 install --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0. I might be mistaken about this though.

5 replies

pnunna93 Jun 20, 2024

@mohamedyassin1 , please use rocm/pytorch:latest docker, it has all hip packages and pytorch installed. If you'd like to use a rocm base docker and install pytorch manually you can use rocm/dev-ubuntu-22.04:6.1-complete, it comes with all hip packages.

mohamedyassin1 Jun 20, 2024

Thank you! that helps a lot

Titus-von-Koeller Jul 17, 2024
Maintainer Author

@mohamedyassin1 Thank you so much for your well-written and detailed post, as well as for making things fully reproducible by providing the Dockerfile, really cool and helpful!

@pnunna93 Is there anything we can improve in the AMD-related docs based on this feedback? On another note, I think while the Docker approach is really helpful, we shouldn't depend on only that for our main docs. A lot of people can't or don't want to use Docker, or only have access to a machine without admin rights, for example in an academic lab. Looking at it from that perspective, it would be great if one could install the missing system dependencies with conda-forge, because that doesn't require sudo. Not sure if that's possible?

Would you be so kind to submit a PR to update the docs for the example? That would be super helpful! The example worked fine for you otherwise, @mohamedyassin1 ?

mohamedyassin1 Jul 19, 2024

@Titus-von-Koeller Can do! Will double check the example and update documentation accordingly through a PR.

pnunna93 Jul 31, 2024

@Titus-von-Koeller, I have linked the official ROCm install instructions in the documentation as part of the packaging PR. They provide alternatives for installing ROCm directly.

PatchouliPatch · 2024-07-17T01:09:05Z

Heya, I had the unfortunate event of my SSD corrupting so I had to reinstall Ubuntu.

For some reason, Bitsandbytes compiles fine but fails when I run python3 -m bitsandbytes to test.
I get this:

(torch) user@chingu:~/bitsandbytes$ python3 -m bitsandbytes
Could not find the bitsandbytes CUDA binary at PosixPath('/home/bitsandbytes/bitsandbytes/libbitsandbytes_hip.so')
Could not load bitsandbytes native library: /home/bitsandbytes/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/home/bitsandbytes/bitsandbytes/cextension.py", line 124, in <module>
    lib = get_native_library()
          ^^^^^^^^^^^^^^^^^^^^
  File "/home/bitsandbytes/bitsandbytes/cextension.py", line 104, in get_native_library
    dll = ct.cdll.LoadLibrary(str(binary_path))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anaconda3/envs/torch/lib/python3.11/ctypes/__init__.py", line 454, in LoadLibrary
    return self._dlltype(name)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/anaconda3/envs/torch/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: /home/bitsandbytes/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory

CUDA Setup failed despite CUDA being available. Please run the following command to get more information:

python -m bitsandbytes

Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++ BUG REPORT INFORMATION ++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
CUDA specs: CUDASpecs(highest_compute_capability=(11, 0), cuda_version_string='61', cuda_version_tuple=(6, 1))
PyTorch settings found: CUDA_VERSION=61, Highest Compute Capability: (11, 0).
Library not found: /home/gabriel/bitsandbytes/bitsandbytes/libbitsandbytes_hip.so. Maybe you need to compile it from source?
If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION`,
for example, `make CUDA_VERSION=113`.

The CUDA version for the compile might depend on your conda install, if using conda.
Inspect CUDA version via `conda list | grep cuda`.
WARNING: CUDA versions lower than 11 are currently not supported for LLM.int8().
You will be only to use 8-bit optimizers and quantization routines!
To manually override the PyTorch CUDA version please see: https://github.com/TimDettmers/bitsandbytes/blob/main/docs/source/nonpytorchcuda.mdx
The directory listed in your path is found to be non-existent: local/chingu
The directory listed in your path is found to be non-existent: @/tmp/.ICE-unix/1630,unix/chingu
The directory listed in your path is found to be non-existent: /etc/xdg/xdg-ubuntu
The directory listed in your path is found to be non-existent: /org/gnome/Terminal/screen/135c10b6_a8a1_4974_8ac8_294d12aa05a3
CUDA SETUP: WARNING! CUDA runtime files not found in any environmental path.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Checking that the library is importable and CUDA is callable...
Couldn't load the bitsandbytes library, likely due to missing binaries.
Please ensure bitsandbytes is properly installed.

For source installations, compile the binaries with `cmake -DCOMPUTE_BACKEND=cuda -S .`.
See the documentation for more details if needed.

Trying a simple check anyway, but this will likely fail...
Traceback (most recent call last):
  File "/home/bitsandbytes/bitsandbytes/diagnostics/main.py", line 66, in main
    sanity_check()
  File "/home/bitsandbytes/bitsandbytes/diagnostics/main.py", line 40, in sanity_check
    adam.step()
  File "/home/anaconda3/envs/torch/lib/python3.11/site-packages/torch/optim/optimizer.py", line 487, in wrapper
    out = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/anaconda3/envs/torch/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/bitsandbytes/bitsandbytes/optim/optimizer.py", line 287, in step
    self.update_step(group, p, gindex, pindex)
  File "/home/anaconda3/envs/torch/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/bitsandbytes/bitsandbytes/optim/optimizer.py", line 496, in update_step
    F.optimizer_update_32bit(
  File "/home/bitsandbytes/bitsandbytes/functional.py", line 1160, in optimizer_update_32bit
    return backends[g.device.type].optimizer_update_32bit(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bitsandbytes/bitsandbytes/backends/cuda.py", line 870, in optimizer_update_32bit
    optim_func = str2optimizer32bit[optimizer_name][0]
                 ^^^^^^^^^^^^^^^^^^
NameError: name 'str2optimizer32bit' is not defined
Above we output some debug information.
Please provide this info when creating an issue via https://github.com/TimDettmers/bitsandbytes/issues/new/choose
WARNING: Please be sure to sanitize sensitive info from the output before posting it.

am I doing anything wrong?

hipconfig --version: 6.1.40093-bd86f1708
torch:
Name: torch
Version: 2.5.0.dev20240715+rocm6.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3
Location: /home/gabriel/anaconda3/envs/torch/lib/python3.11/site-packages
Requires: filelock, fsspec, jinja2, networkx, pytorch-triton-rocm, sympy, typing-extensions
Required-by: bitsandbytes, torchaudio, torchvision

does this only work with certain python versions?

2 replies

PatchouliPatch Jul 17, 2024

Never mind, solved with:

cmake -DCOMPUTE_BACKEND=hip -S .
make

apparently, the instructions above were inadequate

Titus-von-Koeller Jul 17, 2024
Maintainer Author

@PatchouliPatch thanks a lot for the feedback!

@pnunna93 I think the part that's outputting

For source installations, compile the binaries with `cmake -DCOMPUTE_BACKEND=cuda -S .`.
See the documentation for more details if needed.

would benefit from an update that detects whether ROCm is available and adds a fitting installation command there. We haven't tested that yet, but should we also consider people having both CUDA and ROCm?

Either way, I think that string should also contain a && make, because otherwise it's incomplete.
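
A rough sketch of what such a message helper could look like. The function name and its parameter are hypothetical, not existing bitsandbytes code; the caller is assumed to pass in something like torch.version.hip, which is None on non-ROCm PyTorch builds.

```python
from typing import Optional


def source_build_hint(hip_version: Optional[str]) -> str:
    """Build the 'compile from source' hint for the detected backend.

    hip_version is assumed to be supplied by the caller (e.g. torch.version.hip);
    None or an empty string is treated as "not a ROCm build" and falls back to CUDA.
    """
    backend = "hip" if hip_version else "cuda"
    return (
        "For source installations, compile the binaries with "
        f"`cmake -DCOMPUTE_BACKEND={backend} -S . && make`."
    )
```

This only covers picking between the two commands and appending && make. It does not answer the open question of machines that have both CUDA and ROCm installed.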
