DirectMLNpuInference fails to run on the intel NPU #625

Open
Lucashien opened this issue Aug 14, 2024 · 5 comments

Lucashien commented Aug 14, 2024

I’m encountering issues when attempting to run DirectML inference on an Intel NPU.
Specifically, the sample code uses my GPU instead of targeting the NPU; the relevant code is below.
When I set the GUID to DXCORE_HARDWARE_TYPE_ATTRIBUTE_NPU, the application fails to find the NPU device and prints "No NPU device found."

ComPtr<IDXCoreAdapter> adapter;
if (factory)
{
    const GUID dxGUIDs[] = { DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE };
    ComPtr<IDXCoreAdapterList> adapterList;
    THROW_IF_FAILED(factory->CreateAdapterList(ARRAYSIZE(dxGUIDs), dxGUIDs, IID_PPV_ARGS(&adapterList)));
    for (uint32_t i = 0, adapterCount = adapterList->GetAdapterCount(); i < adapterCount; i++)
    {
        ComPtr<IDXCoreAdapter> currentGpuAdapter;
        THROW_IF_FAILED(adapterList->GetAdapter(static_cast<uint32_t>(i), IID_PPV_ARGS(&currentGpuAdapter)));

        if (!forceComputeOnlyDevice && !forceGenericMLDevice)
        {
            // No device restrictions
            adapter = std::move(currentGpuAdapter);
            break;
        }
        else if (forceComputeOnlyDevice && currentGpuAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE))
        {
            adapter = std::move(currentGpuAdapter);
            break;
        }
        else if (forceGenericMLDevice && currentGpuAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_GENERIC_ML))
        {
            adapter = std::move(currentGpuAdapter);
            break;
        }
    }
}

Here are the specifics of my hardware and software setup:

CPU: Intel(R) Core(TM) Ultra 9 185H
GPU: RTX 4060 Laptop
NPU: Intel(R) AI Boost
Driver Version: 32.0.100.2688
DirectX Version: 12

NuGet information:
[screenshot]

Has anyone successfully run DirectML inference on an Intel NPU? If so, what steps were taken to properly configure the adapter and ensure the NPU was used?

Thank you for your assistance!

@WTian-Yu

Hi, I can run this code on an Intel Ultra 7 155U.
I've already updated the OS to the 24H2 Dev channel and installed the Windows 11 SDK (10.0.26100.0) in Visual Studio.

void InitializeDirectML(ID3D12Device1** d3dDeviceOut, ID3D12CommandQueue** commandQueueOut, IDMLDevice** dmlDeviceOut) {
    // Whether to skip adapters which support Graphics in order to target NPU for testing
    //bool forceComputeOnlyDevice = true;
    ComPtr<IDXCoreAdapterFactory> factory;
    HMODULE dxCoreModule = LoadLibraryW(L"DXCore.dll");
    if (dxCoreModule)
    {
        auto dxcoreCreateAdapterFactory = reinterpret_cast<HRESULT(WINAPI*)(REFIID, void**)>(
            GetProcAddress(dxCoreModule, "DXCoreCreateAdapterFactory")
            );
        if (dxcoreCreateAdapterFactory)
        {
            dxcoreCreateAdapterFactory(IID_PPV_ARGS(&factory));
        }
    }
    // Create the DXCore Adapter
    ComPtr<IDXCoreAdapter> adapter;
    if (factory)
    {
        const GUID dxGUIDs[] = { DXCORE_ADAPTER_ATTRIBUTE_D3D12_GENERIC_ML };
        ComPtr<IDXCoreAdapterList> adapterList;
        THROW_IF_FAILED(factory->CreateAdapterList(ARRAYSIZE(dxGUIDs), dxGUIDs, IID_PPV_ARGS(&adapterList)));
        for (uint32_t i = 0, adapterCount = adapterList->GetAdapterCount(); i < adapterCount; i++)
        {
            ComPtr<IDXCoreAdapter> nextGpuAdapter;
            THROW_IF_FAILED(adapterList->GetAdapter(static_cast<uint32_t>(i), IID_PPV_ARGS(&nextGpuAdapter)));
            if (nextGpuAdapter->IsAttributeSupported(DXCORE_HARDWARE_TYPE_ATTRIBUTE_NPU))
            {
                adapter = std::move(nextGpuAdapter);
                break;
            }
        }
    }
    // Create the D3D12 Device
    ComPtr<ID3D12Device1> d3dDevice;
    if (adapter)
    {
        HMODULE d3d12Module = LoadLibraryW(L"d3d12.dll");
        if (d3d12Module)
        {
            auto d3d12CreateDevice = reinterpret_cast<HRESULT(WINAPI*)(IUnknown*, D3D_FEATURE_LEVEL, REFIID, void*)>(
                GetProcAddress(d3d12Module, "D3D12CreateDevice")
                );
            if (d3d12CreateDevice)
            {
                THROW_IF_FAILED(d3d12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_1_0_GENERIC, IID_PPV_ARGS(&d3dDevice)));
            }
        }
    }
    // Create the DML Device and D3D12 Command Queue
    ComPtr<IDMLDevice> dmlDevice;
    ComPtr<ID3D12CommandQueue> commandQueue;
    if (d3dDevice)
    {
        D3D12_COMMAND_QUEUE_DESC queueDesc = {};
        queueDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
        THROW_IF_FAILED(d3dDevice->CreateCommandQueue(
            &queueDesc,
            IID_PPV_ARGS(commandQueue.ReleaseAndGetAddressOf())));
        HMODULE dmlModule = LoadLibraryW(L"DirectML.dll");
        if (dmlModule)
        {
            auto dmlCreateDevice = reinterpret_cast<HRESULT(WINAPI*)(ID3D12Device*, DML_CREATE_DEVICE_FLAGS, DML_FEATURE_LEVEL, REFIID, void*)>(
                GetProcAddress(dmlModule, "DMLCreateDevice1")
                );
            if (dmlCreateDevice)
            {
                THROW_IF_FAILED(dmlCreateDevice(d3dDevice.Get(), DML_CREATE_DEVICE_FLAG_NONE, DML_FEATURE_LEVEL_5_0, IID_PPV_ARGS(dmlDevice.ReleaseAndGetAddressOf())));
            }
        }
    }

    d3dDevice.CopyTo(d3dDeviceOut);
    commandQueue.CopyTo(commandQueueOut);
    dmlDevice.CopyTo(dmlDeviceOut);
}

@Lucashien
Author

Thanks for sharing your experience. I will try updating my OS to the Dev channel. Thank you!

@xiaoweiChen

Updating to the Windows 11 SDK (10.0.26100.0) fixed the "DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE not found" error on my side.

@kmaki565

@Lucashien
I was able to make the NPU run the model with the following changes:
[screenshot]
In my case, the third adapter appears to be the NPU device (Intel AI Boost). Upgrading to a Windows Insider build was not necessary.

HW: ThinkPad X1 Carbon Gen 12, Intel(R) Core(TM) Ultra 7 155U
OS: Windows 11 23H2 (Build 22631.4037)

@idg10

idg10 commented Oct 8, 2024

I'm on the older Intel NPU that is present in the Surface Laptop Studio 2. I believe it's a Movidius 3700VC. (Its PCI hardware id is ven_8086&dev_6240.)

Although I was able to force this example to use that device simply by adjusting the for loop so it starts at a higher offset, thus skipping past the various other devices the example would otherwise choose, I get a problem when I reach this line:

THROW_IF_FAILED(d3d12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_1_0_CORE, IID_PPV_ARGS(&d3dDevice)));

I've added code to enable the D3D debug layer, and with that in place, I see this:

Exception thrown at 0x00007FF91BE76D9A in DirectMLNpuInference.exe: Microsoft C++ exception: _com_error at memory location 0x000000824F0FC310.
Exception thrown at 0x00007FF91BE76D9A in DirectMLNpuInference.exe: Microsoft C++ exception: SHASTA::Exception<D3D12::KMB::AdapterTraits,long> at memory location 0x000000824F0FC470.
D3D12: Removing Device.
D3D12 WARNING: ID3D12Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_DRIVER_INTERNAL_ERROR: There is strong evidence that the driver has performed an undefined operation; but it may be because the application performed an illegal or undefined operation to begin with.). [ EXECUTION WARNING #233: DEVICE_REMOVAL_PROCESS_POSSIBLY_AT_FAULT]

Initially I was on v31.0.100.2016 of the NPU driver, which is what Windows Update installs. I found that the Intel NPU driver page lists newer versions, but the latest (32.0.100.2820) doesn't actually support this device. But 32.0.100.2408 does support the device, and I've been able to install that. (And apparently there is a package on Windows Update that includes this version but I couldn't work out how to get Windows to offer me that.)

But I still get the same error.

So I think there are two issues here:

  1. the logic in the device selection loop isn't quite right
  2. this example just doesn't work for the Intel NPU that's in a Surface Laptop Studio 2

I think 1 is down to this line here:

else if (forceComputeOnlyDevice && currentGpuAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE))

That won't select a compute-only device; it will select any device that offers compute. On my laptop, that is every device: Intel(R) Iris(R) Xe Graphics, NVIDIA GeForce RTX 4060 Laptop GPU, Intel(R) NPU, and even the Microsoft Basic Render Driver software device.

I think that should probably be this:

else if (forceComputeOnlyDevice && currentGpuAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_CORE_COMPUTE)
    && !currentGpuAdapter->IsAttributeSupported(DXCORE_ADAPTER_ATTRIBUTE_D3D12_GRAPHICS))

So this will match only if the device supports compute and does not support graphics. That's what I'd expect "compute-only device" to mean, and it does indeed reject all devices except the Intel NPU.

But having fixed that, the code just doesn't seem to work. I know the Intel driver still reports DirectML support as "preview". Are there any examples anywhere that show successful DirectML use on the Intel NPU that's in the Surface Laptop Studio 2?
