ValueError: Failed to resolve model_id when calling text-generation-inference service from Langchain #17779
Comments
🤖 Hey @kunalchamoli, great to see you diving into some more adventures with LangChain! Looks like you've stumbled upon a tricky situation. I'm on it and will get back to you with a detailed response shortly.
I can easily call this service from the example shown here by using the requests library.
@kunalchamoli you should try updating your version of langchain-community.
🤖 Based on the error message and stack trace you provided, it seems like the issue is related to the resolution of the model_id. The error message suggests that no model id could be found for the given inference server URL, and that your Hugging Face token may not have access to the endpoint.
Here is the relevant code snippet, the _resolve_model_id method:

```python
def _resolve_model_id(self) -> None:
    """Resolve the model_id from the LLM's inference_server_url"""
    from huggingface_hub import list_inference_endpoints

    available_endpoints = list_inference_endpoints("*")
    if isinstance(self.llm, HuggingFaceHub) or (
        hasattr(self.llm, "repo_id") and self.llm.repo_id
    ):
        self.model_id = self.llm.repo_id
        return
    elif isinstance(self.llm, HuggingFaceTextGenInference):
        endpoint_url: Optional[str] = self.llm.inference_server_url
    else:
        endpoint_url = self.llm.endpoint_url
    for endpoint in available_endpoints:
        if endpoint.url == endpoint_url:
            self.model_id = endpoint.repository
    if not self.model_id:
        raise ValueError(
            "Failed to resolve model_id:"
            f"Could not find model id for inference server: {endpoint_url}"
            "Make sure that your Hugging Face token has access to the endpoint."
        )
```

You can find this method in the ChatHuggingFace class. To resolve this issue, you can try the following steps:

1. Check that the inference server URL you pass to the LLM matches the URL of one of your Hugging Face inference endpoints.
2. Make sure that the Hugging Face token you are using has access to that endpoint.

If the issue persists after trying these steps, please provide more information about the inference server URL and the Hugging Face token you are using. This will help in further diagnosing the issue.
@aymeric-roucher does the example code expect the model to be hosted on the Hugging Face Hub or on Hugging Face servers? I have hosted it on a local instance.
@kunalchamoli did you find a solution? I am facing the same issue. I've hosted Mistral on a local GPU instance.
Same issue here.
@Akashgowtham1 did you solve this problem?
Yeah, the same error is happening to me. This looks like a bug: I have an on-premise Mistral Text Generation Inference server, so it should not be trying to connect to the Hugging Face Hub or Hugging Face servers at all. This was working fine with the deprecated class, HuggingFaceTextGenInference, but not with its current alternative, HuggingFaceEndpoint.
I have also come across this issue when I tried to migrate my code from the deprecated HuggingFaceTextGenInference class. Could we please get some clarification on whether this support will be added in the future? Before or after?
Hi everyone. Anyway, I was able to resolve this issue by explicitly passing in a model_id. For example, below I explicitly pass in the model_id for meta-llama/Meta-Llama-3-8B-Instruct:

```python
llm = HuggingFaceTextGenInference(
    inference_server_url=os.environ['LLAMA_INSTRUCT_URL'],
    max_new_tokens=512,
    top_k=50,
    temperature=0.1,
    repetition_penalty=1.03,
    server_kwargs={
        "headers": {
            "Authorization": f"Bearer {os.environ['HF_TOKEN']}",
            "Content-Type": "application/json",
        }
    },
)

chat_model = ChatHuggingFace(llm=llm, model_id='meta-llama/Meta-Llama-3-8B-Instruct')
```
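If the same workaround carries over to the newer HuggingFaceEndpoint class (the final `if not self.model_id` check in the snippet above suggests an explicitly provided model_id should survive the failed lookup), a rough sketch might look like the following. The endpoint URL environment variable, token, and model choice are simply carried over from the example above as assumptions, not tested claims:

```python
import os

from langchain_community.chat_models.huggingface import ChatHuggingFace
from langchain_community.llms import HuggingFaceEndpoint

# Point HuggingFaceEndpoint at the local TGI server instead of a Hub repo.
llm = HuggingFaceEndpoint(
    endpoint_url=os.environ["LLAMA_INSTRUCT_URL"],  # assumed local TGI URL
    max_new_tokens=512,
    top_k=50,
    temperature=0.1,
    repetition_penalty=1.03,
    huggingfacehub_api_token=os.environ.get("HF_TOKEN"),  # only if Hub auth is still needed
)

# Passing model_id explicitly means ChatHuggingFace does not need to find the
# local endpoint URL in your list of Hub inference endpoints.
chat_model = ChatHuggingFace(
    llm=llm,
    model_id="meta-llama/Meta-Llama-3-8B-Instruct",
)
```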
The HuggingFaceEndpoint class does work with a local TGI. However, the ChatHuggingFace class can pose an issue. Let's say you load the tokenizer locally and you are running the HF TGI on a local server. The init method of ChatHuggingFace makes an internal call to _resolve_model_id, which checks whether you are using HuggingFaceHub, the deprecated HuggingFaceTextGenInference, or HuggingFacePipeline. If not, it checks whether your endpoint URL is in your list of Hugging Face inference endpoints, and if it isn't, it throws the error above. So one solution is to subclass ChatHuggingFace and override _resolve_model_id:
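For illustration, a minimal sketch of that override might look like this (the class name, model, and endpoint URL are made up, and the import paths assume the langchain-community 0.1.x layout used in this thread); it simply trusts the model_id you pass in and never calls list_inference_endpoints:

```python
from transformers import AutoTokenizer

from langchain_community.chat_models.huggingface import ChatHuggingFace
from langchain_community.llms import HuggingFaceEndpoint


class LocalTGIChatHuggingFace(ChatHuggingFace):
    """ChatHuggingFace variant for a self-hosted TGI server (hypothetical name)."""

    def _resolve_model_id(self) -> None:
        # Skip the Hub endpoint lookup entirely: trust the model_id given at
        # init instead of calling huggingface_hub.list_inference_endpoints.
        if not self.model_id:
            raise ValueError("Pass model_id explicitly when using a local TGI server.")


# Hypothetical local setup: tokenizer loaded locally, TGI running on localhost.
model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
llm = HuggingFaceEndpoint(endpoint_url="http://localhost:8080", max_new_tokens=512)

chat_model = LocalTGIChatHuggingFace(llm=llm, model_id=model_id, tokenizer=tokenizer)
```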
As long as llm is an instance of HuggingFaceEndpoint, this will work.
Note that if you use ChatHuggingFace and want streaming through the Runnable and RunnableSequence interfaces, you will have to define your own _stream and _astream methods, which is what I had to do.
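Below is a rough sketch of what such a _stream override could look like, building on the hypothetical subclass sketched above and assuming the _to_chat_prompt helper that ChatHuggingFace uses internally in langchain-community 0.1.x; an _astream version would follow the same pattern using the LLM's astream:

```python
from typing import Any, Iterator, List, Optional

from langchain_core.callbacks import CallbackManagerForLLMRun
from langchain_core.messages import AIMessageChunk, BaseMessage
from langchain_core.outputs import ChatGenerationChunk


class StreamingLocalTGIChatHuggingFace(LocalTGIChatHuggingFace):
    def _stream(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> Iterator[ChatGenerationChunk]:
        # Render the chat messages with the tokenizer's chat template, then
        # stream raw text tokens from the wrapped LLM and re-wrap each one as
        # a chat generation chunk so .stream() on the chat model yields output.
        prompt = self._to_chat_prompt(messages)
        for token in self.llm.stream(prompt, stop=stop, **kwargs):
            chunk = ChatGenerationChunk(message=AIMessageChunk(content=token))
            if run_manager:
                run_manager.on_llm_new_token(token, chunk=chunk)
            yield chunk
```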
Checked other resources
Example Code
Error Message and Stack Trace (if applicable)
Description
I have hosted text-generation-inference on a separate instance and I am trying to call it from a LangChain server hosted on another server.
But I am getting this error.
System Info