CUDA out of memory due to huge memory allocation request #18767
Comments
Thank you for the analysis. This is most likely an index/shape issue in the new onnx path and will need triage. It may be incomplete support for a negative index on an op that then gets multiplied through to what you are seeing. Let me raise it to the appropriate folks. |
I'm able to repro the error. |
The CPU compilation failure may or may not be related to the GPU runtime failures. |
Can you please post the IR dump after all passes for this model? |
will be lowered to
@PhaneeshB Since the input data size for torch.aten.reshape is ‘?x?’, we need to figure out a way to convert ‘?’ to a positive value/real shape size (materialize the -1 in the IR). And since aten.reshape is decomposed to the aten.view op, we need to add or debug these semantics in ConvertAtenViewOp, ConvertAtenViewOpStrict, or ConvertAtenViewOpToReshape in TorchToLinalg/DataMovement.cpp. Here is the previous work on the view op: Based on https://pytorch.org/docs/stable/generated/torch.reshape.html, a single dimension may be -1, in which case it’s inferred from the remaining dimensions and the number of elements in the input. |
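The -1 inference rule from the torch.reshape docs cited above can be sketched as follows. This is a minimal illustration for fully static shapes only; the function name is made up for this sketch, and it sidesteps the dynamic `?` dimensions that make the lowering in DataMovement.cpp hard:

```python
import math

def infer_reshape(in_shape, target_shape):
    """Infer the single -1 dimension per torch.reshape semantics (static shapes only)."""
    numel = math.prod(in_shape)
    known = math.prod(d for d in target_shape if d != -1)
    # The -1 dim is the total element count divided by the product of known dims.
    return [numel // known if d == -1 else d for d in target_shape]

print(infer_reshape([2, 3, 4], [-1, 4]))  # [6, 4]
```

With a `?x?` input, `numel` is unknown at compile time, which is why the conversion has to materialize the -1 from runtime dimension values instead.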
What happened?
I compiled GPT and tried to run it using `iree-run-module`. It errored with the following message:
Before the error message there is the statistics dump:
And right before it is the command that caused the error:
If I understand the statistics correctly, this is not caused by my GPU not having enough memory: the allocation is for ~4GB, the peak GPU usage so far was ~466MB, and my GPU has 32GB of memory. So I went digging.
I inserted a print statement at `iree/runtime/src/iree/hal/drivers/cuda/memory_pools.c:234`, inside the function `iree_hal_cuda_memory_pools_alloca(...)`, right before the call to `cuMemAllocFromPoolAsync(...)` (actually right before the wrapper `IREE_CURESULT_TO_STATUS`), to see how much memory the runtime tried to allocate. The allocation size was a 64-bit unsigned integer on my machine, so I used the `%zu` format specifier. The amount printed was 18446744073709548544 bytes, which explains the error.

I traced the call chain back to the file `iree/runtime/src/iree/modules/hal/module.c`, specifically to the shim definition at lines 1096-1122. In that function, the argument to the allocation command is cast to the HAL device size. So I added two print statements to see the original value of the argument and the value of the allocation size after the cast. The size before the cast was -3072 (through compiler errors, I determined the right format specifier for the `args->i7` variable). For `int32_t`, 4294964224 (which is the supposed allocation size, see the trace) overflows to -3072, so I suspect an overflow is the cause of all this.

Thank you in advance.
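The suspected narrowing can be reproduced outside IREE. The sketch below (illustrative only, not IREE's actual code path) emulates casting the 64-bit size to `int32_t` and then reinterpreting the negative result as an unsigned 64-bit allocation size, which yields exactly the two values reported above:

```python
def truncate_i32(x):
    """Emulate casting an integer to a 32-bit signed int (two's complement wraparound)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

requested = 4294964224              # supposed allocation size from the trace (2**32 - 3072)
narrowed = truncate_i32(requested)  # the value seen before the cast in module.c
widened = narrowed % (1 << 64)      # negative value reinterpreted as an unsigned 64-bit size

print(narrowed)  # -3072
print(widened)   # 18446744073709548544
```

In other words, a size just under 2^32 truncates to -3072 in a 32-bit signed slot, and sign extension into a 64-bit unsigned size then produces the ~1.8e19-byte request that CUDA rejects.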
Steps to reproduce your issue
1. Download the GPT ONNX model:
   - Either directly download it from https://drive.google.com/file/d/1w-TgnDylg43YUOQtffjo1SeRTCF_h7lE/view?usp=sharing
   - Or export the pretrained huggingface TF model from `openai-community/openai-gpt` (view gist) and convert it to ONNX using the tf2onnx package and the command `python -m tf2onnx.convert --saved-model ./gpt-tf/ --output model.onnx --opset 17` (`pip install transformers tensorflow tf-keras tf2onnx`)
2. Import it with the command `iree-import-onnx model.onnx -o model.mlir` (`pip install iree-compiler[onnx]`)
3. Compile it with the command `iree-compile --iree-hal-target-backends=cuda --iree-cuda-target=sm_70 --dump-compilation-phases-to=./model-phases/ --iree-vm-target-index-bits=64 --iree-stream-resource-index-bits=64 model.mlir -o model.vmfb > output.mlir 2>&1`
4. Run the compiled module with the command `iree-run-module --trace_execution=true --print_statistics=true --device=cuda --module=model.vmfb --function=tf2onnx --input="1x4xsi32=1" --input="1x4xsi32=1" --input="1x4xsi32=1" > trace.txt 2>&1`
5. View the trace file `trace.txt`
What component(s) does this issue relate to?
Runtime
Version information
Commit hash of iree:
1f3382d7305d7b2920fe7cb6072b07ca81945f28
Versions of pip packages used:
Additional context
Build environment and commands:
My system has a 80-thread Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz and a Tesla V100-PCIE-32GB GPU.
I'm using the following Dockerfile:
To build the compiler and runtime, I'm using the following commands:
python -m pip install -r runtime/bindings/python/iree/runtime/build_requirements.txt
cmake --build ../iree-build/ -j 30