HF demo #117

Closed
mahmoudaymo opened this issue Sep 14, 2023 · 8 comments

@mahmoudaymo

Could you please add some demos on how to use the Hugging Face models locally and on a server? I am struggling with how to use them.

@okhat
Collaborator

okhat commented Sep 14, 2023

@arnavsinghvi11

@arnavsinghvi11
Collaborator

Hi @mahmoudaymo ,

The using_local_models.md documentation from this PR should be particularly helpful.
The Text-Generation-Inference Server section covers using Hugging Face models through the text-generation-inference (TGI) server supported by Hugging Face.
HFModel corresponds to using the HF models locally.
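
For illustration, here is a minimal sketch of the two usages (the model name, URL, and port are placeholders; the class names follow the docs in that PR):

import dspy

# Local: load a Hugging Face model directly on this machine (placeholder model name)
local_lm = dspy.HFModel(model='meta-llama/Llama-2-7b-hf')

# Server: connect to a running text-generation-inference server (placeholder URL/port)
tgi_lm = dspy.HFClientTGI(model='meta-llama/Llama-2-7b-hf', port=8080, url='http://localhost')

# Use whichever client you configured as the default LM
dspy.settings.configure(lm=tgi_lm)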

More class-specific documentation can be found within language_models_client.md in the same PR

Let me know if that helps or if you have any additional questions!

@mahmoudaymo
Author

Thank you very much, this is very helpful. I will close the issue.

@mahmoudaymo
Author

mahmoudaymo commented Sep 15, 2023

Hi again,

I followed the instructions from docs/using_local_models.md:
llm = dspy.HFModel(model = 'meta-llama/Llama-2-7b-hf')

I got this error:


AttributeError Traceback (most recent call last)
in <cell line: 1>()
----> 1 llm = dspy.HFModel(model='TheBloke/Llama-2-7B-Chat-GGML')
2 #turbo = dspy.OpenAI(model='gpt-3.5-turbo')
3 colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://localhost:8001/wiki17_abstracts')
4
5 dspy.settings.configure(lm=llm, rm=colbertv2_wiki17_abstracts)

AttributeError: module 'dspy' has no attribute 'HFModel'

Then I imported HFModel from dsp as follows:
from dsp import HFModel
llm = HFModel(model='TheBloke/Llama-2-7B-Chat-GGML')

I got the following error:


TypeError Traceback (most recent call last)
in <cell line: 1>()
----> 1 turbo = HFModel(model='TheBloke/Llama-2-7B-Chat-GGML')
2 #turbo = dspy.OpenAI(model='gpt-3.5-turbo')
3 colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://localhost:8001/wiki17_abstracts')
4
5 dspy.settings.configure(lm=turbo, rm=colbertv2_wiki17_abstracts)

/usr/local/lib/python3.10/dist-packages/dsp/modules/hf.py in __init__(self, model, checkpoint, is_client, hf_device_map)
55 if not self.is_client:
56 try:
---> 57 architecture = AutoConfig.from_pretrained(model).__dict__["architectures"][0]
58 self.encoder_decoder_model = ("ConditionalGeneration" in architecture) or ("T5WithLMHeadModel" in architecture)
59 self.decoder_only_model = ("CausalLM" in architecture) or ("GPT2LMHeadModel" in architecture)

TypeError: 'NoneType' object is not subscriptable

It seems that architectures is not part of the model config.
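
For reference, a quick way to confirm this (the None result is what the subscript error in the traceback implies):

from transformers import AutoConfig

# Inspect the config of the GGML repo that failed above
config = AutoConfig.from_pretrained('TheBloke/Llama-2-7B-Chat-GGML')
print(config.architectures)  # None here, so indexing [0] inside HFModel fails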

mahmoudaymo reopened this Sep 15, 2023
@okhat
Collaborator

okhat commented Sep 15, 2023

Thanks @mahmoudaymo! @arnavsinghvi11 is looking into this.

@arnavsinghvi11
Collaborator

Hi @mahmoudaymo

You are correct that architectures is not found in the model config.

I would actually suggest the following pivots to overcome this:

  1. Using HFClientTGI over HFModel - the TGI server offers significant speed benefits and various other features like weight conversions, sharding, quantization, etc., which will make your runs much smoother compared to HFModel.
  2. Pivoting to the GPTQ format instead of GGML - TGI currently does not support GGML formats but does support the GPTQ version of this model, which you can use as per this thread. (If using the GGML format is a strict requirement, I can do some more digging on how to support it within some of the other server frameworks we support.)

Some key notes to keep in mind (a rough setup sketch follows these notes):
- Use the latest TGI Docker image (http://ghcr.io/huggingface/text-generation-inference:latest); our current docs mention the 0.9.3 version, but the latest version supports more advanced features and models (which we will eventually upgrade to as well).
- Pass the --quantize gptq flag for the GPTQ format.
- Refer to any other open issues on the TGI repo and/or additional relevant info on which other GPTQ models are supported.
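
Putting those together, a minimal sketch, assuming the GPTQ variant of the model and default TGI ports (adjust the volume path, port, and model id as needed):

# Launch the TGI server in Docker (placeholder volume and port):
#   docker run --gpus all -p 8080:80 -v $PWD/data:/data \
#     ghcr.io/huggingface/text-generation-inference:latest \
#     --model-id TheBloke/Llama-2-7B-Chat-GPTQ --quantize gptq

import dspy

# Point DSPy at the running TGI server (placeholder URL/port)
llm = dspy.HFClientTGI(model='TheBloke/Llama-2-7B-Chat-GPTQ', port=8080, url='http://localhost')
dspy.settings.configure(lm=llm)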

Please let me know if you have any additional questions!

@mahmoudaymo
Author

Thank you very much @arnavsinghvi11. The issue is resolved, so I will close it.
Looking forward to seeing the new features of the library.

@antaripg

Hi,
I haven't found any straightforward method to connect to a Hugging Face Inference Endpoint from DSPy. I have a fine-tuned model on Hugging Face that I want to use through an endpoint, as I don't have the resources to download the model locally. Previously, I used other frameworks like LangChain and LlamaIndex to access the endpoint and could do so easily. I don't see a similar option in the DSPy documentation; I have seen the vLLM server and HFClientTGI, but both require the model to be deployed locally. It would be great if anyone could direct me towards a cookbook or code snippet to achieve this.
