DirectMLNpuInference fails to run on the intel NPU #625

Open
Lucashien opened this issue Aug 14, 2024 · 5 comments

Lucashien commented Aug 14, 2024

I’m encountering issues when attempting to run DirectML inference on an Intel NPU.
Specifically, the sample code uses my GPU instead of targeting the NPU; the relevant code is below.
When I set the GUID to DXCORE_HARDWARE_TYPE_ATTRIBUTE_NPU, the application fails to find the NPU device and prints "No NPU device found."

ComPtr<IDXCoreAdapter> adapter;
if (factory)
{
    const GUID dxGUIDs[] = { DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE };
    ComPtr<IDXCoreAdapterList> adapterList;
    THROW_IF_FAILED(factory->CreateAdapterList(ARRAYSIZE(dxGUIDs), dxGUIDs, IID_PPV_ARGS(&adapterList)));
    for (uint32_t i = 0, adapterCount = adapterList->GetAdapterCount(); i < adapterCount; i++)
    {
        ComPtr<IDXCoreAdapter> currentGpuAdapter;
        THROW_IF_FAILED(adapterList->GetAdapter(static_cast<uint32_t>(i), IID_PPV_ARGS(&currentGpuAdapter)));

        if (!forceComputeOnlyDevice && !forceGenericMLDevice)
        {
            // No device restrictions
            adapter = std::move(currentGpuAdapter);
            break;
        }
        else if (forceComputeOnlyDevice && currentGpuAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE))
        {
            adapter = std::move(currentGpuAdapter);
            break;
        }
        else if (forceGenericMLDevice && currentGpuAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_GENERIC_ML))
        {
            adapter = std::move(currentGpuAdapter);
            break;
        }
    }
}

Here are the specifics of my hardware and software setup:

CPU: Intel(R) Core(TM) Ultra 9 185H
GPU: RTX 4060 Laptop
NPU: Intel(R) AI Boost
Driver Version: 32.0.100.2688
DirectX Version: 12

NuGet information:
[screenshot]

Has anyone successfully run DirectML inference on an Intel NPU? If so, what steps were taken to properly configure the adapter and ensure the NPU was used?

Thank you for your assistance!

@WTian-Yu

Hi, I can run this code on an Intel Ultra 7 155U.
I've already updated the OS to the 24H2 Dev channel and installed the Windows 11 SDK (10.0.26100.0) in Visual Studio.

void InitializeDirectML(ID3D12Device1** d3dDeviceOut, ID3D12CommandQueue** commandQueueOut, IDMLDevice** dmlDeviceOut) {
    // Whether to skip adapters which support Graphics in order to target NPU for testing
    //bool forceComputeOnlyDevice = true;
    ComPtr<IDXCoreAdapterFactory> factory;
    HMODULE dxCoreModule = LoadLibraryW(L"DXCore.dll");
    if (dxCoreModule)
    {
        auto dxcoreCreateAdapterFactory = reinterpret_cast<HRESULT(WINAPI*)(REFIID, void**)>(
            GetProcAddress(dxCoreModule, "DXCoreCreateAdapterFactory")
            );
        if (dxcoreCreateAdapterFactory)
        {
            dxcoreCreateAdapterFactory(IID_PPV_ARGS(&factory));
        }
    }
    // Create the DXCore Adapter
    ComPtr<IDXCoreAdapter> adapter;
    if (factory)
    {
        const GUID dxGUIDs[] = { DXCORE_ADAPTER_ATTRIBUTE_D3D12_GENERIC_ML };
        ComPtr<IDXCoreAdapterList> adapterList;
        THROW_IF_FAILED(factory->CreateAdapterList(ARRAYSIZE(dxGUIDs), dxGUIDs, IID_PPV_ARGS(&adapterList)));
        for (uint32_t i = 0, adapterCount = adapterList->GetAdapterCount(); i < adapterCount; i++)
        {
            ComPtr<IDXCoreAdapter> nextGpuAdapter;
            THROW_IF_FAILED(adapterList->GetAdapter(static_cast<uint32_t>(i), IID_PPV_ARGS(&nextGpuAdapter)));
            if (nextGpuAdapter->IsAttributeSupported(DXCORE_HARDWARE_TYPE_ATTRIBUTE_NPU))
            {
                adapter = std::move(nextGpuAdapter);
                break;
            }
        }
    }
    // Create the D3D12 Device
    ComPtr<ID3D12Device1> d3dDevice;
    if (adapter)
    {
        HMODULE d3d12Module = LoadLibraryW(L"d3d12.dll");
        if (d3d12Module)
        {
            auto d3d12CreateDevice = reinterpret_cast<HRESULT(WINAPI*)(IUnknown*, D3D_FEATURE_LEVEL, REFIID, void*)>(
                GetProcAddress(d3d12Module, "D3D12CreateDevice")
                );
            if (d3d12CreateDevice)
            {
                THROW_IF_FAILED(d3d12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_1_0_GENERIC, IID_PPV_ARGS(&d3dDevice)));
            }
        }
    }
    // Create the DML Device and D3D12 Command Queue
    ComPtr<IDMLDevice> dmlDevice;
    ComPtr<ID3D12CommandQueue> commandQueue;
    if (d3dDevice)
    {
        D3D12_COMMAND_QUEUE_DESC queueDesc = {};
        queueDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
        THROW_IF_FAILED(d3dDevice->CreateCommandQueue(
            &queueDesc,
            IID_PPV_ARGS(commandQueue.ReleaseAndGetAddressOf())));
        HMODULE dmlModule = LoadLibraryW(L"DirectML.dll");
        if (dmlModule)
        {
            auto dmlCreateDevice = reinterpret_cast<HRESULT(WINAPI*)(ID3D12Device*, DML_CREATE_DEVICE_FLAGS, DML_FEATURE_LEVEL, REFIID, void*)>(
                GetProcAddress(dmlModule, "DMLCreateDevice1")
                );
            if (dmlCreateDevice)
            {
                THROW_IF_FAILED(dmlCreateDevice(d3dDevice.Get(), DML_CREATE_DEVICE_FLAG_NONE, DML_FEATURE_LEVEL_5_0, IID_PPV_ARGS(dmlDevice.ReleaseAndGetAddressOf())));
            }
        }
    }

    d3dDevice.CopyTo(d3dDeviceOut);
    commandQueue.CopyTo(commandQueueOut);
    dmlDevice.CopyTo(dmlDeviceOut);
}

@Lucashien
Author

Thanks for sharing your experience. I will try updating my OS to the Dev channel. Thank you!

@xiaoweiChen

Updating to the Windows 11 SDK (10.0.26100.0) fixed the "DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE not found" error on my side.

@kmaki565

@Lucashien
I was able to make the NPU run the model with the following changes:
[screenshot]
In my case, the third adapter appears to be the NPU device (Intel AI Boost). Upgrading to a Windows Insider build was not necessary.

HW: ThinkPad X1 Carbon Gen 12, Intel(R) Core(TM) Ultra 7 155U
OS: Windows 11 23H2 (Build 22631.4037)

@idg10

idg10 commented Oct 8, 2024

I'm on the older Intel NPU that is present in the Surface Laptop Studio 2. I believe it's a Movidius 3700VC. (Its PCI hardware id is ven_8086&dev_6240.)

Although I was able to force this example to use that device simply by adjusting the for loop so it starts at a higher offset, thus skipping past the various other devices the example would otherwise choose, I get a problem when I reach this line:

THROW_IF_FAILED(d3d12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_1_0_CORE, IID_PPV_ARGS(&d3dDevice)));

I've added code to enable the D3D debug layer, and with that in place, I see this:

Exception thrown at 0x00007FF91BE76D9A in DirectMLNpuInference.exe: Microsoft C++ exception: _com_error at memory location 0x000000824F0FC310.
Exception thrown at 0x00007FF91BE76D9A in DirectMLNpuInference.exe: Microsoft C++ exception: SHASTA::Exception<D3D12::KMB::AdapterTraits,long> at memory location 0x000000824F0FC470.
D3D12: Removing Device.
D3D12 WARNING: ID3D12Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_DRIVER_INTERNAL_ERROR: There is strong evidence that the driver has performed an undefined operation; but it may be because the application performed an illegal or undefined operation to begin with.). [ EXECUTION WARNING #233: DEVICE_REMOVAL_PROCESS_POSSIBLY_AT_FAULT]

Initially I was on v31.0.100.2016 of the NPU driver, which is what Windows Update installs. I found that the Intel NPU driver page lists newer versions, but the latest (32.0.100.2820) doesn't actually support this device. But 32.0.100.2408 does support the device, and I've been able to install that. (And apparently there is a package on Windows Update that includes this version but I couldn't work out how to get Windows to offer me that.)

But I still get the same error.

So I think there are two issues here:

  1. the logic in the device selection loop isn't quite right
  2. this example just doesn't work for the Intel NPU that's in a Surface Laptop Studio 2

I think 1 is down to this line here:

else if (forceComputeOnlyDevice && currentGpuAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE))

That won't select a compute-only device; it will select any device that offers compute. On my laptop, that is every device: Intel(R) Iris(R) Xe Graphics, NVIDIA GeForce RTX 4060 Laptop GPU, Intel(R) NPU, and even the Microsoft Basic Render Driver software device.

I think that should probably be this:

else if (forceComputeOnlyDevice && currentGpuAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE)
    && !currentGpuAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_GRAPHICS))

So this will match only if the device supports compute and does not support graphics. That's what I'd expect "compute-only device" to mean, and it does indeed reject all devices except the Intel NPU.

But having fixed that, the code just doesn't seem to work. I know the Intel driver still reports DirectML support as "preview". Are there any examples anywhere that show successful DirectML use on the Intel NPU that's in the Surface Laptop Studio 2?
