CUDA_ERROR_NOT_INITIALIZED from reduce call #1842

jonasdelacour · 2024-09-12T19:41:18Z

I previously posted about an issue related to static default policies in the oneapi/dpl headers, which was fixed by defining the macro ONEDPL_USE_PREDEFINED_POLICIES to 0.

I have now run into the same error in a slightly different way:

#define ONEDPL_USE_PREDEFINED_POLICIES 0
#include <oneapi/dpl/algorithm>
#include <oneapi/dpl/execution>
#include <unistd.h>
#include <sys/wait.h>
#include <vector>

int main(){
    {
        pid_t pid = fork();
        sycl::queue Q(sycl::gpu_selector_v);
        auto policy = oneapi::dpl::execution::make_device_policy(Q);
        std::vector<int> vec = {1, 2, 3, 4, 5};
        oneapi::dpl::reduce(policy, vec.begin(), vec.end());
        waitpid(pid, NULL, 0);
    }

    {
        pid_t pid = fork();
        waitpid(pid, NULL, 0);
    }
    return 0;
}

This again produces the error:

UR CUDA ERROR:
        Value:           3
        Name:            CUDA_ERROR_NOT_INITIALIZED
        Description:     initialization error
        Function:        setContext
        Source Location: /tmp/tmp.HhqPmzG672/intel-llvm-mirror/build/_deps/unified-runtime-src/source/adapters/cuda/context.hpp:142

Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES)

UR CUDA ERROR:
        Value:           3
        Name:            CUDA_ERROR_NOT_INITIALIZED
        Description:     initialization error
        Function:        setContext
        Source Location: /tmp/tmp.HhqPmzG672/intel-llvm-mirror/build/_deps/unified-runtime-src/source/adapters/cuda/context.hpp:142

Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES)

Without the code in the second set of braces, the program no longer produces any runtime errors. Crucially, if no call to oneapi::dpl::reduce is made, the program also runs without errors. This suggests to me that this function somehow leaks shared pointers to a sycl::queue, or something similar, which would prevent the release of the underlying CUDA context.

ICPX version:

Intel(R) oneAPI DPC++/C++ Compiler 2024.2.1 (2024.2.1.20240711)

Compilation command:

icpx minimum_crash.cc -fsycl -fsycl-targets=nvptx64-nvidia-cuda

mmichel11 · 2024-09-12T21:06:28Z

Hi, @jonasdelacour

I was able to reproduce your issue with the CUDA backend. I was also able to reproduce it independently of oneDPL, using only pure SYCL code:

#include <unistd.h>
#include <sys/wait.h>
#include <vector>
#include <sycl/sycl.hpp>

int main(){
    {
        pid_t pid = fork();
        sycl::queue Q(sycl::gpu_selector_v);
        std::vector<int> vec = {1, 2, 3, 4, 5};
        {
            sycl::buffer<int> buf(vec.data(), vec.size());
            Q.submit([&](sycl::handler& cgh) {
                        auto acc = buf.get_access<sycl::access::mode::read_write>(cgh);
                        cgh.parallel_for(sycl::range<1>(vec.size()), [=](sycl::item<1> it) {
                            acc[it] += 1;
                        });
                    });
        }
        waitpid(pid, NULL, 0);
    }

    {
        pid_t pid = fork();
        waitpid(pid, NULL, 0);
    }
    return 0;
}

UR CUDA ERROR:
        Value:           3
        Name:            CUDA_ERROR_NOT_INITIALIZED
        Description:     initialization error
        Function:        setContext
        Source Location: /tmp/tmp.IpEAV9Rdzp/intel-llvm-mirror/build/_deps/unified-runtime-src/source/adapters/cuda/context.hpp:142

Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES)

UR CUDA ERROR:
        Value:           3
        Name:            CUDA_ERROR_NOT_INITIALIZED
        Description:     initialization error
        Function:        setContext
        Source Location: /tmp/tmp.IpEAV9Rdzp/intel-llvm-mirror/build/_deps/unified-runtime-src/source/adapters/cuda/context.hpp:142

Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES)

This seems to be a general problem with the SYCL CUDA backend rather than an issue within oneDPL, so I would recommend filing an issue at https://github.com/intel/llvm.
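In the meantime, one possible workaround is sketched below. It assumes the failure comes from calling fork() in a process that has already initialized the CUDA context, which is a guess based on the reproducers above and not a confirmed diagnosis. The idea is to confine all device work to a short-lived child, so the long-lived parent never initializes the device runtime and can fork again later. The helper name run_in_child is hypothetical and is not part of SYCL or oneDPL; the sketch uses only POSIX calls.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <functional>

// Hypothetical helper (not a SYCL or oneDPL API): runs `work` in a forked
// child process and returns the child's exit code, or -1 if fork/waitpid
// fails or the child terminates abnormally.
int run_in_child(const std::function<int()>& work) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;  // fork failed
    }
    if (pid == 0) {
        // Child: the only process that is meant to touch the device runtime.
        int rc = work();
        // _exit skips atexit handlers and static destructors inherited
        // from the parent, and does not flush inherited stdio buffers.
        _exit(rc);
    }
    // Parent: wait for the child and translate its exit status.
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In a SYCL program, the callable passed to run_in_child would construct the sycl::queue, submit the kernel, and return 0 on success, so that no queue or context is ever created in the parent. Exit codes are limited to the range 0 to 255, so results larger than that would need another channel such as a pipe.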

jonasdelacour added the bug label Sep 12, 2024
