
[BUG]: RTX 4080 - crash while loading Vulkan #886

Open
SeriousOldMan opened this issue Aug 1, 2024 · 8 comments
SeriousOldMan commented Aug 1, 2024

Description

System: Win 11, RTX 4080 with the absolute latest Nvidia driver

  1. No CUDA installed on the system
  2. cuda 11, cuda 12 and vulkan all installed in the runtimes folder
  3. When starting up and loading the model, an exception is thrown while initializing Vulkan
  4. The error also occurs when cuda 11 and cuda 12 are not present

This error goes away once CUDA is installed from the Nvidia site, but I wanted to use Vulkan, since (in theory) it does not require installing additional drivers from Nvidia.

Reproduction Steps

See above.

Code:

using LLama.Common;
using LLama;
using System.Globalization;
using System.Text;

namespace LLMRuntime;

public class LLMExecutor
{
    string ModelPath;
    double Temperature;
    int MaxTokens;
    int GPULayers;

    ModelParams Parameters;
    LLamaWeights Model;
    InteractiveExecutor Executor;

    public LLMExecutor(string modelPath, double temperature, int maxTokens, int gpuLayers)
    {
        ModelPath = modelPath;
        Temperature = temperature;
        MaxTokens = maxTokens;
        GPULayers = gpuLayers;

        Parameters = new ModelParams(modelPath)
        {
            ContextSize = 32768,
            GpuLayerCount = gpuLayers 
        };
        Model = LLamaWeights.LoadFromFile(Parameters);
        Executor = new InteractiveExecutor(Model.CreateContext(Parameters));
    }

    public string ParsePrompt(ChatHistory chatHistory, string prompt)
    {
        void addMessage(AuthorRole role, string message)
        {
            if (role != AuthorRole.Unknown)
                chatHistory.AddMessage(role, message);
        }

        AuthorRole role = AuthorRole.Unknown;
        string message = "";

        foreach (string line in prompt.Split(new string[] { Environment.NewLine, "\n" }, StringSplitOptions.None))
        {
            string input = line.Trim();

            if (input.StartsWith("<|###"))
            {
                addMessage(role, message);

                message = "";

                if (input == "<|### System ###|>")
                    role = AuthorRole.System;
                else if (input == "<|### Assistant ###|>")
                    role = AuthorRole.Assistant;
                else if (input == "<|### User ###|>")
                    role = AuthorRole.User;
            }
            else
                message += (input + Environment.NewLine);
        }

        return (role == AuthorRole.User) ? message : "";
    }

    public async Task<string> AskAsync(string prompt)
    {
        // Add chat histories as prompt to tell AI how to act.
        var chatHistory = new ChatHistory();

        string userInput = ParsePrompt(chatHistory, prompt);

        ChatSession session = new(Executor, chatHistory);

        InferenceParams inferenceParams = new InferenceParams()
        {
            MaxTokens = MaxTokens,
            AntiPrompts = new List<string> { "User:" }
        };

        string result = "";

        await foreach (
            var text
            in session.ChatAsync(
                new ChatHistory.Message(AuthorRole.User, userInput),
                inferenceParams))
            result += text;

        return result;
    }

    public string Ask(string prompt)
    {

        return AskAsync(prompt).Result;
    }
}

static class Program
{
    static string WaitForPrompt(string fileName)
    {
        while (true)
        {
            if (File.Exists(fileName))
            {
                // Read the prompt, then remove the file so the next prompt can be written
                string prompt;

                using (StreamReader promptStream = new StreamReader(fileName))
                    prompt = promptStream.ReadToEnd();

                File.Delete(fileName);

                return prompt;
            }

            Thread.Sleep(100);
        }
    }

    [STAThread]
    static void Main(string[] args)
    {
        Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("en-US");

        try
        {
            LLMExecutor executor = new LLMExecutor(args[2],
                                                   (args.Length > 3) ? Double.Parse(args[3]) : 0.5,
                                                   (args.Length > 4) ? int.Parse(args[4]) : 2048,
                                                   (args.Length > 5) ? int.Parse(args[5]) : 0);

            while (true)
            {
                string prompt = WaitForPrompt(args[0]);

                if (prompt.Trim() == "Exit")
                    break;

                try
                {
                    string answer = executor.Ask(prompt);

                    using (StreamWriter outStream = new StreamWriter(args[1], false, Encoding.Unicode))
                        outStream.Write(answer);
                }
                catch (Exception)
                {
                    using (StreamWriter outStream = new StreamWriter(args[1], false, Encoding.Unicode))
                        outStream.Write("Error");
                }
            }
        }
        catch (Exception)
        {
            System.Environment.Exit(1);
        }
    }
}

Environment & Configuration

See description...

Known Workarounds

Install CUDA from the Nvidia site.
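
An alternative workaround sketch: instead of installing CUDA system-wide, tell LLamaSharp's native library loader which backend to prefer, so the crashing Vulkan runtime is never probed. This is an assumption-laden sketch, not the author's code: the `NativeLibraryConfig` API has changed between LLamaSharp releases, so the exact calls may need adjusting for your version.

```csharp
using LLama.Native;

// Sketch only: steer native backend selection before any model is loaded.
// NativeLibraryConfig must be configured before the first native call
// (e.g. before LLamaWeights.LoadFromFile), otherwise it has no effect.
NativeLibraryConfig.Instance
    .WithCuda()   // prefer the CUDA backend over Vulkan
    .WithLogs();  // log which native library actually gets selected
```

The logging call is worth keeping during diagnosis, since it shows whether the loader really picked the backend you expected.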

m0nsky (Contributor) commented Aug 2, 2024

What is the error?

SeriousOldMan (Author) commented Aug 2, 2024

> What is the error?

Hi, it is the same error as described in issue #887 (opened just after my report):

ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: NVIDIA GeForce RTX 4080 (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32
Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Repeat 2 times:
   at LLama.Native.SafeLlamaModelHandle.llama_load_model_from_file(System.String, LLama.Native.LLamaModelParams)
   at LLama.Native.SafeLlamaModelHandle.LoadFromFile(System.String, LLama.Native.LLamaModelParams)
   at LLama.LLamaWeights.LoadFromFile(LLama.Abstractions.IModelParams)
   at LLMRuntime.LLMExecutor..ctor(System.String, Double, Int32, Int32)
   at LLMRuntime.Program.Main(System.String[])

@SeriousOldMan (Author)

By the way, it is independent of the ContextSize in the ModelParams. I tried different values; Vulkan always crashes, while CUDA is fine.


m0nsky commented Aug 6, 2024

Seems like this is a llama.cpp issue, not a LLamaSharp issue.

Not sure why installing CUDA impacts it, though. Are you sure it was not a coincidence? It seems like the native library loader is correctly selecting the Vulkan backend and it's throwing an error on the llama.cpp side.

@LSXAxeller

Ah yes, I submitted an issue on llama.cpp after testing their binaries a few days ago, so this looks like an upstream issue. I thought it was just a problem with AMD, but it appears to be a general issue with the Vulkan backend.

@SeriousOldMan (Author)

> Seems like this is a llama.cpp issue, not a LLamaSharp issue.
>
> Not sure why installing CUDA impacts it, though. Are you sure it was not a coincidence? It seems like the native library loader is correctly selecting the Vulkan backend and it's throwing an error on the llama.cpp side.

It may be that installing CUDA fixed it because CUDA is then selected first and Vulkan is never touched. By the way, it only fixes it if the CUDA driver from Nvidia is also installed; otherwise CUDA is skipped and Vulkan is tried next. Seems reasonable...

@GalactixGod

I had this issue using Vulkan as well. I don't have a GPU on that device, but it would always crash.

Oddly, setting the GPU layer count to 1 got it running for me: a count of 0 crashed and a count of 2 crashed, both with the same error you are reporting about an attempt to read/write protected memory. YMMV.
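
Applied to the `ModelParams` construction from the original report, that workaround would look roughly like the sketch below. Note that `GpuLayerCount = 1` is purely empirical, taken from this comment, not a documented fix:

```csharp
var parameters = new ModelParams(modelPath)
{
    ContextSize = 32768,
    // Empirical workaround from this thread: 0 and 2 crashed under Vulkan,
    // 1 happened to run on the commenter's machine. Your mileage may vary.
    GpuLayerCount = 1
};
```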

@LSXAxeller

It's a driver crash during device initialization caused by an external program hooking into the Vulkan driver. In my case it was the Mirillis Action! game recorder; after uninstalling it, everything ran fine. Going back to my llama.cpp issue, user 0cc4m guided me to the fix.
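
Programs like game recorders and overlays typically hook Vulkan through implicit layers, which the Khronos loader injects into every Vulkan application. On Windows, these are registered in the registry, so a quick way to spot such a third-party hook (a diagnostic sketch; the keys below are the standard loader locations, output depends entirely on your machine) is:

```shell
:: List implicit Vulkan layers registered system-wide (Windows).
:: Any third-party overlay/recorder listed here is injected into every Vulkan app.
reg query "HKLM\SOFTWARE\Khronos\Vulkan\ImplicitLayers"
reg query "HKLM\SOFTWARE\WOW6432Node\Khronos\Vulkan\ImplicitLayers"
:: Per-user registrations:
reg query "HKCU\SOFTWARE\Khronos\Vulkan\ImplicitLayers"
```

Each value points at a layer manifest JSON; the file path usually makes the owning program obvious, which is how a recorder like Action! can be identified and removed.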
