CUDA out of memory due to huge memory allocation request #18767
Comments
Thank you for the analysis. This is most likely an index/shape issue in the new onnx path and will need triage. It may be incomplete support for a negative index on an op that then gets multiplied through to what you are seeing. Let me raise it to the appropriate folks. |
I'm able to repro the error. |
The CPU compilation failure may or may not be related to the GPU runtime failures. |
Can you please post the IR dump after all passes for this model? |
will be lowered to
@PhaneeshB Since the input data size for torch.aten.reshape is ‘?x?’, we need to figure out a way to convert ‘?’ to a positive value/real shape size (materialize the -1 in the IR). And since aten.reshape is decomposed to the aten.view op, we need to add or debug these semantics in ConvertAtenViewOp, ConvertAtenViewOpStrict, or ConvertAtenViewOpToReshape in TorchToLinalg/DataMovement.cpp. Here is the previous work on the view op: Based on https://pytorch.org/docs/stable/generated/torch.reshape.html, a single dimension may be -1, in which case it’s inferred from the remaining dimensions and the number of elements in the input. |
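The -1 inference rule from the torch.reshape docs cited above can be sketched as follows. This is a minimal illustration for fully static shapes only; the function name is made up for this sketch, and it sidesteps the dynamic `?` dimensions that make the lowering in DataMovement.cpp hard:

```python
import math

def infer_reshape(in_shape, target_shape):
    """Infer the single -1 dimension per torch.reshape semantics (static shapes only)."""
    numel = math.prod(in_shape)
    known = math.prod(d for d in target_shape if d != -1)
    # The -1 dim is the total element count divided by the product of known dims.
    return [numel // known if d == -1 else d for d in target_shape]

print(infer_reshape([2, 3, 4], [-1, 4]))  # [6, 4]
```

With a `?x?` input, `numel` is unknown at compile time, which is why the conversion has to materialize the -1 from runtime dimension values instead.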
What happened?
I compiled GPT and tried to run it using `iree-run-module`. It errored with the following message:
Before the error message there is the statistics dump:
And right before it is the command that caused the error:
If I understand the statistics correctly, this is not caused by my GPU not having enough memory: the allocation is for ~4GB, the peak GPU usage so far was ~466MB, and my GPU has 32GB of memory. So I went digging.
I inserted a print statement at `iree/runtime/src/iree/hal/drivers/cuda/memory_pools.c:234`, inside the function `iree_hal_cuda_memory_pools_alloca(...)`, right before the call to `cuMemAllocFromPoolAsync(...)` (actually right before the wrapper `IREE_CURESULT_TO_STATUS`), to see how much memory the runtime tried to allocate. The allocation size was a 64-bit unsigned integer on my machine, so I used the `%zu` format specifier. The amount printed was 18446744073709548544 bytes, which explains the error.

I traced the call chain back to the file `iree/runtime/src/iree/modules/hal/module.c`, specifically to the shim definition at lines 1096-1122. In that function, the argument to the allocation command is cast to the HAL device size. So I added two print statements to see the original value of the argument and the value of the allocation size after the cast. The size before the cast was -3072 (through compiler errors, I determined the right format specifier for the `args->i7` variable). For `int32_t`, 4294964224 (which is the supposed allocation size, see the trace) overflows to -3072, so I suspect an overflow is the cause of all this.

Thank you in advance.
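The suspected narrowing can be reproduced outside IREE. The sketch below (illustrative only, not IREE's actual code path) emulates casting the 64-bit size to `int32_t` and then reinterpreting the negative result as an unsigned 64-bit allocation size, which yields exactly the two values reported above:

```python
def truncate_i32(x):
    """Emulate casting an integer to a 32-bit signed int (two's complement wraparound)."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

requested = 4294964224              # supposed allocation size from the trace (2**32 - 3072)
narrowed = truncate_i32(requested)  # the value seen before the cast in module.c
widened = narrowed % (1 << 64)      # negative value reinterpreted as an unsigned 64-bit size

print(narrowed)  # -3072
print(widened)   # 18446744073709548544
```

In other words, a size just under 2^32 truncates to -3072 in a 32-bit signed slot, and sign extension into a 64-bit unsigned size then produces the ~1.8e19-byte request that CUDA rejects.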
Steps to reproduce your issue
1. Download the GPT ONNX model:
   - Either directly download it from https://drive.google.com/file/d/1w-TgnDylg43YUOQtffjo1SeRTCF_h7lE/view?usp=sharing
   - Or export the pretrained huggingface TF model from `openai-community/openai-gpt` (view gist) and convert it to ONNX using the tf2onnx package and the command `python -m tf2onnx.convert --saved-model ./gpt-tf/ --output model.onnx --opset 17` (`pip install transformers tensorflow tf-keras tf2onnx`)
2. Import it with the command `iree-import-onnx model.onnx -o model.mlir` (`pip install iree-compiler[onnx]`)
3. Compile it with the command `iree-compile --iree-hal-target-backends=cuda --iree-cuda-target=sm_70 --dump-compilation-phases-to=./model-phases/ --iree-vm-target-index-bits=64 --iree-stream-resource-index-bits=64 model.mlir -o model.vmfb > output.mlir 2>&1`
4. Run the compiled module with the command `iree-run-module --trace_execution=true --print_statistics=true --device=cuda --module=model.vmfb --function=tf2onnx --input="1x4xsi32=1" --input="1x4xsi32=1" --input="1x4xsi32=1" > trace.txt 2>&1`
5. View the trace file `trace.txt`
What component(s) does this issue relate to?
Runtime
Version information
Commit hash of iree:
1f3382d7305d7b2920fe7cb6072b07ca81945f28
Versions of pip packages used:
Additional context
Build environment and commands:
My system has a 80-thread Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz and a Tesla V100-PCIE-32GB GPU.
I'm using the following Dockerfile:
To build the compiler and runtime, I'm using the following commands:
python -m pip install -r runtime/bindings/python/iree/runtime/build_requirements.txt
cmake --build ../iree-build/ -j 30