ValueError: Failed to resolve model_id when calling text-generation-inference service from Langchain #17779
Comments
🤖 Hey @kunalchamoli, great to see you diving into some more adventures with LangChain! Looks like you've stumbled upon a tricky situation. I'm on it and will get back to you with a detailed response shortly.
I can easily call this service from the example shown here by using the requests library.
@kunalchamoli you should try updating your version of langchain-community.
🤖 Based on the error message and stack trace you provided, it seems like the issue is related to the resolution of the model_id. The error message suggests that no model id could be found for the given inference server URL, and that your Hugging Face token may not have access to the endpoint.
Here is the relevant code snippet, the _resolve_model_id method:

```python
def _resolve_model_id(self) -> None:
    """Resolve the model_id from the LLM's inference_server_url"""
    from huggingface_hub import list_inference_endpoints

    available_endpoints = list_inference_endpoints("*")
    if isinstance(self.llm, HuggingFaceHub) or (
        hasattr(self.llm, "repo_id") and self.llm.repo_id
    ):
        self.model_id = self.llm.repo_id
        return
    elif isinstance(self.llm, HuggingFaceTextGenInference):
        endpoint_url: Optional[str] = self.llm.inference_server_url
    else:
        endpoint_url = self.llm.endpoint_url
    for endpoint in available_endpoints:
        if endpoint.url == endpoint_url:
            self.model_id = endpoint.repository
    if not self.model_id:
        raise ValueError(
            "Failed to resolve model_id:"
            f"Could not find model id for inference server: {endpoint_url}"
            "Make sure that your Hugging Face token has access to the endpoint."
        )
```

You can find this method in the ChatHuggingFace class. To resolve this issue, you can try the following steps:

1. Check that the inference server URL you pass to the LLM matches the URL of one of your Hugging Face inference endpoints.
2. Make sure that the Hugging Face token you are using has access to that endpoint.

If the issue persists after trying these steps, please provide more information about the inference server URL and the Hugging Face token you are using. This will help in further diagnosing the issue.
@aymeric-roucher does the example code expect the model to be hosted on the Hugging Face Hub or on Hugging Face servers? I have hosted it on a local instance.
@kunalchamoli did you find a solution? I am facing the same issue. I've hosted Mistral on a local GPU instance.
Same issue here.
@Akashgowtham1 did you solve this problem?
Yeah, the same error is happening to me. This looks like a bug: I have an on-premise Mistral Text Generation Inference server, so it should not be trying to connect to the Hugging Face Hub or Hugging Face servers at all. This was working fine with the deprecated class, HuggingFaceTextGenInference, but not with its current alternative, HuggingFaceEndpoint.
I have also come across this issue when I tried to migrate my code from the deprecated HuggingFaceTextGenInference class. Could we please get some clarification on whether this support will be added in the future? Before or after?
Hi everyone. Anyway, I was able to resolve this issue by explicitly passing in a model_id. For example, below I explicitly pass in the model_id for meta-llama/Meta-Llama-3-8B-Instruct:

```python
llm = HuggingFaceTextGenInference(
    inference_server_url=os.environ['LLAMA_INSTRUCT_URL'],
    max_new_tokens=512,
    top_k=50,
    temperature=0.1,
    repetition_penalty=1.03,
    server_kwargs={
        "headers": {
            "Authorization": f"Bearer {os.environ['HF_TOKEN']}",
            "Content-Type": "application/json",
        }
    },
)

chat_model = ChatHuggingFace(llm=llm, model_id='meta-llama/Meta-Llama-3-8B-Instruct')
```
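If the same workaround carries over to the newer HuggingFaceEndpoint class (the final `if not self.model_id` check in the snippet above suggests an explicitly provided model_id should survive the failed lookup), a rough sketch might look like the following. The endpoint URL environment variable, token, and model choice are simply carried over from the example above as assumptions, not tested claims:

```python
import os

from langchain_community.chat_models.huggingface import ChatHuggingFace
from langchain_community.llms import HuggingFaceEndpoint

# Point HuggingFaceEndpoint at the local TGI server instead of a Hub repo.
llm = HuggingFaceEndpoint(
    endpoint_url=os.environ["LLAMA_INSTRUCT_URL"],  # assumed local TGI URL
    max_new_tokens=512,
    top_k=50,
    temperature=0.1,
    repetition_penalty=1.03,
    huggingfacehub_api_token=os.environ.get("HF_TOKEN"),  # only if Hub auth is still needed
)

# Passing model_id explicitly means ChatHuggingFace does not need to find the
# local endpoint URL in your list of Hub inference endpoints.
chat_model = ChatHuggingFace(
    llm=llm,
    model_id="meta-llama/Meta-Llama-3-8B-Instruct",
)
```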
The HuggingFaceEndpoint class does work with a local TGI. However, the ChatHuggingFace class can pose an issue. Let's say you load the tokenizer locally and you are running the HF TGI on a local server. The init method of ChatHuggingFace makes an internal call to _resolve_model_id, which checks whether you are using HuggingFaceHub, the deprecated HuggingFaceTextGenInference, or HuggingFacePipeline. If not, it checks whether your endpoint URL is in your list of Hugging Face inference endpoints, and if it isn't, it throws the error above. So one solution is to subclass ChatHuggingFace and override _resolve_model_id:
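For illustration, a minimal sketch of that override might look like this (the class name, model, and endpoint URL are made up, and the import paths assume the langchain-community 0.1.x layout used in this thread); it simply trusts the model_id you pass in and never calls list_inference_endpoints:

```python
from transformers import AutoTokenizer

from langchain_community.chat_models.huggingface import ChatHuggingFace
from langchain_community.llms import HuggingFaceEndpoint


class LocalTGIChatHuggingFace(ChatHuggingFace):
    """ChatHuggingFace variant for a self-hosted TGI server (hypothetical name)."""

    def _resolve_model_id(self) -> None:
        # Skip the Hub endpoint lookup entirely: trust the model_id given at
        # init instead of calling huggingface_hub.list_inference_endpoints.
        if not self.model_id:
            raise ValueError("Pass model_id explicitly when using a local TGI server.")


# Hypothetical local setup: tokenizer loaded locally, TGI running on localhost.
model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = HuggingFaceEndpoint(endpoint_url="http://localhost:8080", max_new_tokens=512)

chat_model = LocalTGIChatHuggingFace(llm=llm, model_id=model_id, tokenizer=tokenizer)
```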
As long as llm is an instance of HuggingFaceEndpoint, this will work.
Note that if you use ChatHuggingFace and want streaming through the Runnable and RunnableSequence interfaces, you will have to define your own _stream and _astream methods, which is what I had to do.
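Below is a rough sketch of what such a _stream override could look like, building on the hypothetical subclass sketched above and assuming the _to_chat_prompt helper that ChatHuggingFace uses internally in langchain-community 0.1.x; an _astream version would follow the same pattern using the LLM's astream:

```python
from typing import Any, Iterator, List, Optional

from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.messages import AIMessageChunk, BaseMessage
from langchain_core.outputs import ChatGenerationChunk


class StreamingLocalTGIChatHuggingFace(LocalTGIChatHuggingFace):
    def _stream(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> Iterator[ChatGenerationChunk]:
        # Render the chat messages with the tokenizer's chat template, then
        # stream raw text tokens from the wrapped LLM and re-wrap each one as
        # a chat generation chunk so .stream() on the chat model yields output.
        prompt = self._to_chat_prompt(messages)
        for token in self.llm.stream(prompt, stop=stop, **kwargs):
            chunk = ChatGenerationChunk(message=AIMessageChunk(content=token))
            if run_manager:
                run_manager.on_llm_new_token(token, chunk=chunk)
            yield chunk
```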
Checked other resources
Example Code
Error Message and Stack Trace (if applicable)
Description
I have hosted text-generation-inference on a separate instance and I am trying to call it from a LangChain server hosted on another server.
But I am getting this error.
System Info