
ValueError: Failed to resolve model_id when calling text-generation-inference service from Langchain #17779

Closed
4 tasks done
kunalchamoli opened this issue Feb 20, 2024 · 13 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

@kunalchamoli

kunalchamoli commented Feb 20, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.

Example Code

import os

from langchain_community.chat_models import ChatHuggingFace
from langchain_community.llms import HuggingFaceTextGenInference

ENDPOINT_URL = "<YOUR_ENDPOINT_URL_HERE>"
HF_TOKEN = os.getenv("HUGGINGFACEHUB_API_TOKEN")

llm = HuggingFaceTextGenInference(
    inference_server_url=ENDPOINT_URL,
    max_new_tokens=512,
    top_k=50,
    temperature=0.1,
    repetition_penalty=1.03,
    server_kwargs={
        "headers": {
            "Authorization": f"Bearer {HF_TOKEN}",
            "Content-Type": "application/json",
        }
    },
)

# This is the call that raises the error (see the stack trace below).
chat_model = ChatHuggingFace(llm=llm)

Error Message and Stack Trace (if applicable)

File "/app/test_lang.py", line 36, in
chat_model = ChatHuggingFace(llm=llm)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/langchain_community/chat_models/huggingface.py", line 54, in init
self._resolve_model_id()
File "/usr/local/lib/python3.11/site-packages/langchain_community/chat_models/huggingface.py", line 158, in _resolve_model_id
raise ValueError(
ValueError: Failed to resolve model_id Could not find model id for inference server provided: http://xx.xx.xx.xxx/
Make sure that your Hugging Face token has access to the endpoint.

Description

I have hosted text-generation-inference on a separate instance, and I am trying to call it from a LangChain server hosted on another machine, but I am getting this error.

System Info

langchain==0.1.7
langchain-community==0.0.20
langchain-core==0.1.23

dosubot bot commented Feb 20, 2024

🤖

Hey @kunalchamoli, great to see you diving into some more adventures with LangChain! Looks like you've stumbled upon a tricky situation. I'm on it and will get back to you with a detailed response shortly.

@dosubot dosubot bot added the 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature label Feb 20, 2024
@kunalchamoli
Author

I can easily call this service, following the example shown here, using the requests library.
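
For reference, a direct request of that sort might look like the sketch below (not the original poster's exact code; the URL is a placeholder and the token is assumed to live in HUGGINGFACEHUB_API_TOKEN):

import os

import requests

# Placeholder URL for the self-hosted text-generation-inference instance.
TGI_URL = "http://localhost:8080/generate"

response = requests.post(
    TGI_URL,
    headers={
        "Authorization": f"Bearer {os.getenv('HUGGINGFACEHUB_API_TOKEN')}",
        "Content-Type": "application/json",
    },
    json={"inputs": "What is Deep Learning?", "parameters": {"max_new_tokens": 50}},
)
print(response.json())  # e.g. {"generated_text": "..."}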

@aymeric-roucher
Contributor

@kunalchamoli you should try updating your version of langchain-community to 0.0.21: we've just published a release that changes this class a bit, and with this latest version your code works for me.


dosubot bot commented Feb 20, 2024

🤖

Based on the error message and stack trace you provided, it seems the issue is related to the resolution of the model_id in the ChatHuggingFace class. The model_id is resolved from the LLM's inference server URL. If the model_id cannot be resolved, a ValueError is raised.

The error message suggests that the model_id could not be found for the provided inference server URL. This could be due to a few reasons:

  1. The inference server URL provided does not match any of the available endpoints from the Hugging Face Hub. You can verify this by checking the list of available endpoints from the Hugging Face Hub.

  2. The Hugging Face token does not have access to the endpoint. You can verify this by checking the permissions of your Hugging Face token.

Here is the relevant code snippet from the _resolve_model_id method in the ChatHuggingFace class:

def _resolve_model_id(self) -> None:
    """Resolve the model_id from the LLM's inference_server_url"""

    from huggingface_hub import list_inference_endpoints
    available_endpoints = list_inference_endpoints("*")
    if isinstance(self.llm, HuggingFaceHub) or (
        hasattr(self.llm, "repo_id") and self.llm.repo_id
    ):
        self.model_id = self.llm.repo_id
        return
    elif isinstance(self.llm, HuggingFaceTextGenInference):
        endpoint_url: Optional[str] = self.llm.inference_server_url
    else:
        endpoint_url = self.llm.endpoint_url

    for endpoint in available_endpoints:
        if endpoint.url == endpoint_url:
            self.model_id = endpoint.repository

    if not self.model_id:
        raise ValueError(
            "Failed to resolve model_id:"
            f"Could not find model id for inference server: {endpoint_url}"
            "Make sure that your Hugging Face token has access to the endpoint."
        )

You can find this method in the ChatHuggingFace class.

To resolve this issue, you can try the following steps:

  1. Verify that the inference server URL provided in the HuggingFaceTextGenInference class instantiation matches one of the available endpoints from the Hugging Face Hub.

  2. Verify that your Hugging Face token has the necessary permissions to access the endpoint.

If the issue persists after trying these steps, please provide more information about the inference server URL and the Hugging Face token you are using. This will help in further diagnosing the issue.
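
As a starting point for the first check, here is a minimal sketch (an illustration, assuming the huggingface_hub package is installed and your token is in HUGGINGFACEHUB_API_TOKEN) that lists the Inference Endpoints your token can see, so you can compare their URLs against the one you pass to HuggingFaceTextGenInference:

import os

from huggingface_hub import list_inference_endpoints

token = os.getenv("HUGGINGFACEHUB_API_TOKEN")
# List the Inference Endpoints visible to this token across all namespaces.
for endpoint in list_inference_endpoints("*", token=token):
    print(endpoint.url, endpoint.repository)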


@kunalchamoli
Author

@aymeric-roucher does the example code expect the model to be hosted on the Hugging Face Hub or on Hugging Face servers? I have hosted it on a local instance.

@Akashgowtham1

@kunalchamoli did you find a solution? I am facing the same issue. I've hosted Mistral on a local GPU instance.

@anhnh2002

Same issue here

@anhnh2002

@Akashgowtham1 did you solve this problem?

@sergiovadyen

Yeah, the same error is happening to me. This looks like a bug: I have an on-premise Mistral Text Generation Inference server, so it should not be trying to connect to the Hugging Face Hub or Hugging Face servers. This was working fine with the deprecated class, HuggingFaceTextGenInference, but not with its current alternative, HuggingFaceEndpoint.

@Simon-Stone
Contributor

Simon-Stone commented Mar 14, 2024

I have also come across this issue when I tried to migrate my code from the deprecated HuggingFaceTextGenInference to HuggingFaceEndpoint. It seems that HuggingFaceEndpoint is not supposed to be used with self-hosted instances of Text Generation Inference.

Could we please get some clarification on whether this support will be added in the future? Before or after HuggingFaceTextGenInference is removed?

@Taimoor0217

Taimoor0217 commented May 5, 2024

Hi everyone,
I also came across the same issue; it seems like something might be wrong with the list_inference_endpoints method in the huggingface_hub package.

Anyway, I was able to resolve the issue by explicitly passing in a model_id when instantiating the chat model.

For example, in the snippet below, I explicitly pass in the model_id for Llama-3-8B-Instruct:

llm = HuggingFaceTextGenInference(
    inference_server_url=os.environ['LLAMA_INSTRUCT_URL'],
    max_new_tokens=512,
    top_k=50,
    temperature=0.1,
    repetition_penalty=1.03,
    server_kwargs={
        "headers": {
            "Authorization": f"Bearer {os.environ['HF_TOKEN']}",
            "Content-Type": "application/json",
        }
    },
)
chat_model = ChatHuggingFace(llm=llm,  model_id='meta-llama/Meta-Llama-3-8B-Instruct')

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Aug 4, 2024
@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 11, 2024
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Aug 11, 2024
@DanielViglione

DanielViglione commented Sep 22, 2024

> I have also come across this issue when I tried to migrate my code from the deprecated HuggingFaceTextGenInference to HuggingFaceEndpoint. It seems that HuggingFaceEndpoint is not supposed to be used with self-hosted instances of Text Generation Inference.
>
> Could we please get some clarification on whether this support will be added in the future? Before or after HuggingFaceTextGenInference is removed?

The HuggingFaceEndpoint class does work with a local TGI. However, the ChatHuggingFace class can pose an issue. Let's say you load the tokenizer locally and you are running the HF TGI on a local server:

from transformers import AutoTokenizer

# llm here is a HuggingFaceEndpoint pointing at the local TGI server.
tokenizer = AutoTokenizer.from_pretrained('./local_tokenizer/')
chat = ChatHuggingFace(llm=llm, tokenizer=tokenizer)

The __init__ method of ChatHuggingFace makes an internal call to _resolve_model_id. This checks whether you are using HuggingFaceHub, the deprecated HuggingFaceTextGenInference, or HuggingFacePipeline. If not, it checks whether your endpoint URL is in the list of Hugging Face inference endpoints, and throws an error if it is not.

So one solution is to subclass ChatHuggingFace and override _resolve_model_id:

class MyChatHuggingFace(ChatHuggingFace):
    def _resolve_model_id(self) -> None:
        self.model_id = 'meta-llama/Meta-Llama-3.1-8B-Instruct'

As long as llm is an instance of HuggingFaceEndpoint, this will work.
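
For completeness, here is a minimal end-to-end sketch of that workaround; the endpoint URL, model id, and tokenizer path are illustrative placeholders, and it assumes the langchain_huggingface package rather than any particular setup from this thread:

from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
from transformers import AutoTokenizer

class MyChatHuggingFace(ChatHuggingFace):
    def _resolve_model_id(self) -> None:
        # Pin the model id instead of looking it up on the Hugging Face Hub.
        self.model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# Placeholder URL for the self-hosted TGI instance.
llm = HuggingFaceEndpoint(endpoint_url="http://localhost:8080/", max_new_tokens=512)
tokenizer = AutoTokenizer.from_pretrained("./local_tokenizer/")
chat_model = MyChatHuggingFace(llm=llm, tokenizer=tokenizer)
print(chat_model.invoke("Hello!").content)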

@DanielViglione

Note that if you use ChatHuggingFace and want streaming through the Runnable and RunnableSequence interfaces, you will have to define your own _stream and _astream methods, which is what I had to do.
