CUDA out of memory #7

Open
Tronic opened this issue Apr 20, 2023 · 6 comments

Comments

@Tronic

Tronic commented Apr 20, 2023

Trying to load the medium or large model, I get out-of-memory errors. Loading the small model with float16 precision works, but it takes all of my 24 GB of VRAM. Is there any way to limit JAX memory usage? The OpenAI model is far more modest in its requirements. Reducing the model weights to float16 would be a good idea too.

@sanchit-gandhi
Owner

See related: huggingface/transformers#22224

@sanchit-gandhi
Owner

sanchit-gandhi commented Apr 21, 2023

You can also convert the parameters to float16/bfloat16 as follows:

# for fp16
pipeline.params = pipeline.model.to_fp16(pipeline.params)
# for bf16
pipeline.params = pipeline.model.to_bf16(pipeline.params)
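
For context, a rough end-to-end sketch of the fp16 conversion (the FlaxWhisperPipline class name is taken from this repo's README, and the checkpoint name and audio file are placeholders):

from whisper_jax import FlaxWhisperPipline

# load the pipeline (checkpoint name is illustrative)
pipeline = FlaxWhisperPipline("openai/whisper-small")

# cast the Flax parameters to float16, roughly halving the weight memory
pipeline.params = pipeline.model.to_fp16(pipeline.params)

# transcribe as usual, now with half-precision weights
outputs = pipeline("audio.mp3")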

@arnavmehta7

arnavmehta7 commented Apr 21, 2023

@sanchit-gandhi It is a bit concerning that it can take up to 30+ GB of GPU memory during batch inference. What batch size would be ideal to keep usage low, say under 12 GB of VRAM?

@seboslaw

I tried running the medium model on a T4 Colab instance. It took 14 minutes to transcribe a 10-minute audio file. Is this due to the memory constraints and the model paging out, or is it running on the CPU altogether?

@themanyone

themanyone commented Apr 27, 2023

I get this error after updating the video card drivers or kernel and forgetting to reboot afterwards. You can use GreenWithEnvy (gwe), available in most distro repos, to profile NVIDIA cards and see what, if anything, is going on there. Update: gwe seems like a bloated version of nvidia-smi, which already ships with the video drivers, so just use that.

@sanchit-gandhi
Owner

Note that the phenomenon of JAX using 90% of your GPU memory just to load the model is due to JAX's GPU memory allocation: https://jax.readthedocs.io/en/latest/gpu_memory_allocation.html

JAX doesn't actually require all of this memory, but blocks it out to prevent fragmentation.

If you want to disable this, you can do so with the environment variable XLA_PYTHON_CLIENT_PREALLOCATE:

XLA_PYTHON_CLIENT_PREALLOCATE=false python run_benchmark.py
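
If you prefer to set the flag from Python rather than the shell, a minimal sketch (it has to be set before jax is first imported):

import os

# must be set before jax (or any module that imports jax) is loaded
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"

import jax  # the allocator now grows memory on demand instead of preallocating ~90%
print(jax.devices())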

A more reliable way of monitoring your JAX memory usage is jax-smi: https://github.com/ayaka14732/jax-smi
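
Rough usage sketch for jax-smi, going from that project's README (treat the exact function name as an assumption): call its tracking hook in your script, then run the jax-smi CLI from a second terminal.

from jax_smi import initialise_tracking

# start the background thread that records live device memory usage
initialise_tracking()

# ... run the Whisper JAX workload here, and watch it with `jax-smi` in another terminal ...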

Still working on figuring out how we can load the large-v2 checkpoint on a 16 GB T4 GPU!
