
GPTQ Env vars: catch correct type of error #596

Merged Jul 12, 2023 (5 commits)
Conversation

@ssmi153 (Contributor) commented Jul 12, 2023

What does this PR do?

When passing in environment variables like gptq_bits, we still get errors thrown from TGI because the try/except block is catching the wrong type of exception. This PR fixes that.

@Narsil - let me know if this is how you want this formatted. My Python is a little shaky, so I hope this syntax is correct.
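For readers following along, here is a minimal sketch of the shape of the fix. This is not the exact diff: the helper name, the `.item()` calls, and the environment-variable names are assumptions pieced together from the discussion below; the one point the PR actually establishes is that a missing tensor surfaces as `RuntimeError`, so that is the exception the fallback must catch.

```python
import os

def _load_gptq_params(weights):
    """Hypothetical helper: read GPTQ metadata from the checkpoint,
    falling back to environment variables when the tensors are absent."""
    try:
        bits = weights.get_tensor("gptq_bits").item()
        groupsize = weights.get_tensor("gptq_groupsize").item()
    except RuntimeError:
        # get_filename() raises RuntimeError (not KeyError) for a missing
        # tensor, so this is the exception type the fallback must catch.
        bits = int(os.environ["GPTQ_BITS"])            # assumed env-var name
        groupsize = int(os.environ["GPTQ_GROUPSIZE"])  # assumed env-var name
    return bits, groupsize
```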

@Narsil (Collaborator) commented Jul 12, 2023

Can you provide an example of a model where this error is triggered instead? I'm very surprised that this error could be raised.

@ssmi153 (Contributor, Author) commented Jul 12, 2023

This is the line the exception is being thrown from: https://github.com/huggingface/text-generation-inference/blob/f2f0289fb99c7caab0c3749fdf211e4d5ab2938b/server/text_generation_server/utils/weights.py#L49C22-L49C22

Following the code through, this is what I've pieced together (sketched below):

1. We call bits = self.get_tensor("gptq_bits").
2. get_tensor() calls self.get_filename(tensor_name).
3. get_filename() calls self.routing.get() for this tensor, which returns None because the tensor doesn't exist.
4. That leads get_filename() to raise RuntimeError(f"weight {tensor_name} does not exist"), so we end up with a RuntimeError.
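For reference, a condensed paraphrase of the relevant weights.py code at the linked commit (bodies abbreviated; only the control flow that matters here is shown):

```python
class Weights:
    def __init__(self, routing):
        # routing maps tensor names to the safetensors file containing them
        self.routing = routing

    def get_filename(self, tensor_name: str):
        filename = self.routing.get(tensor_name, None)  # None if absent
        if filename is None:
            # This is the RuntimeError that bubbles up to the caller,
            # so `except KeyError` would never fire here.
            raise RuntimeError(f"weight {tensor_name} does not exist")
        return str(filename), tensor_name

    def get_tensor(self, tensor_name: str):
        filename, tensor_name = self.get_filename(tensor_name)
        # ... open the safetensors file and return the named tensor ...
```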

Happy to be proven wrong here!

... I'll find a model in a sec that triggers this and put the link here ...

@Narsil (Collaborator) commented Jul 12, 2023

100%!

Catching and reraising at its best...

@Narsil (Collaborator) commented Jul 12, 2023

Merging, tests are red because you don't have access to our secrets.

@Narsil merged commit 3628559 into huggingface:main Jul 12, 2023
2 of 5 checks passed
@Narsil (Collaborator) commented Jul 12, 2023

@olivier FYI

@ssmi153 (Contributor, Author) commented Jul 12, 2023

Quick update: I've got a model in a private repo, quantized with GPTQ-for-Llama, that triggers this reliably: https://huggingface.co/ssmi153/student-feedback-llama-30b-guanaco-2023-07-03-final-gptq. I can DM you a HuggingFace token to give you access (probably shouldn't put that out in public though!).

I've been trying to find other public GPTQ model files that do it too, but for all of TheBloke's conversions I run into another issue:

```
2023-07-12T17:57:26.137143Z ERROR text_generation_launcher (shard-manager, rank 0): Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 78, in serve
    server.serve(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 175, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 142, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 215, in get_model
    return FlashLlama(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py", line 65, in __init__
    model = FlashLlamaForCausalLM(config, weights)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 371, in __init__
    self.model = FlashLlamaModel(config, weights)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 310, in __init__
    [
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 311, in <listcomp>
    FlashLlamaLayer(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 246, in __init__
    self.self_attn = FlashLlamaAttention(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py", line 121, in __init__
    self.query_key_value = TensorParallelColumnLinear.load_multi(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/layers.py", line 251, in load_multi
    weight = weights.get_multi_weights_col(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 125, in get_multi_weights_col
    w = [self.get_tensor(f"{p}.g_idx") for p in prefixes]
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 125, in <listcomp>
    w = [self.get_tensor(f"{p}.g_idx") for p in prefixes]
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 62, in get_tensor
    filename, tensor_name = self.get_filename(tensor_name)
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/weights.py", line 49, in get_filename
    raise RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight model.layers.0.self_attn.q_proj.g_idx does not exist
```

This is from this model: https://huggingface.co/TheBloke/orca_mini_v2_7B-GPTQ

It's also a conversion using GPTQ-for-Llama, so it should be broadly compatible. Here's the quantization config:

```json
{
  "bits": 4,
  "group_size": 128,
  "damp_percent": 0.01,
  "desc_act": false,
  "sym": true,
  "true_sequential": true
}
```

Is this incompatibility due to desc_act = false, or something else?
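For what it's worth, one possible reading of the traceback: the checkpoint simply has no g_idx tensors, while get_multi_weights_col loads {p}.g_idx unconditionally. Below is a hedged sketch of a tolerant load, not TGI code: the function name and the packing arithmetic are assumptions. The one solid fact it leans on is that without act-order (desc_act = false), row i of a GPTQ layer belongs to group i // group_size, so a missing g_idx can be synthesized.

```python
import torch

def load_g_idx(weights, prefix: str, qweight: torch.Tensor,
               bits: int, groupsize: int) -> torch.Tensor:
    """Hypothetical sketch: load g_idx if present, otherwise synthesize
    the sequential grouping that desc_act=false checkpoints imply."""
    try:
        return weights.get_tensor(f"{prefix}.g_idx")
    except RuntimeError:
        # qweight packs 32 // bits rows per int32 row (assumed layout),
        # so recover the unpacked input-feature count first.
        infeatures = qweight.shape[0] * (32 // bits)
        # Without act-order, row i belongs to group i // groupsize.
        return torch.arange(infeatures, dtype=torch.int32) // groupsize
```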

@Narsil (Collaborator) commented Jul 13, 2023

Annnnd this is why I don't particularly enjoy maintaining external models...

I'm not sure I have the bandwidth to really investigate an escape hatch.

@ssmi153 (Contributor, Author) commented Jul 14, 2023

@Narsil, I worked out how to get at least some of the quantised versions of TheBloke's conversions working, which is good news. Take a look at #601, where I've added more details.
