HF demo #117

Closed
mahmoudaymo opened this issue Sep 14, 2023 · 8 comments

@mahmoudaymo

Could you please add some demos on how to use the Hugging Face models locally and on a server? I am struggling with how to use them.

@okhat
Collaborator

okhat commented Sep 14, 2023

@arnavsinghvi11

@arnavsinghvi11
Collaborator

Hi @mahmoudaymo ,

The using_local_models.md documentation from this PR should be particularly helpful.
The Text-Generation-Inference Server section covers using Hugging Face models through the text-generation-inference (TGI) server supported by Hugging Face.
HFModel corresponds to using the HF models locally.
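
For illustration, here is a minimal sketch of the two usages (the model name, URL, and port are placeholders; the class names follow the docs in that PR):

import dspy

# Local: load a Hugging Face model directly on this machine (placeholder model name)
local_lm = dspy.HFModel(model='meta-llama/Llama-2-7b-hf')

# Server: connect to a running text-generation-inference server (placeholder URL/port)
tgi_lm = dspy.HFClientTGI(model='meta-llama/Llama-2-7b-hf', port=8080, url='http://localhost')

# Use whichever client you configured as the default LM
dspy.settings.configure(lm=tgi_lm)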

More class-specific documentation can be found within language_models_client.md in the same PR

Let me know if that helps or if you have any additional questions!

@mahmoudaymo
Author

Thank you very much, this is very helpful. I will close the issue.

@mahmoudaymo
Author

mahmoudaymo commented Sep 15, 2023

Hi again,

I followed the instructions from docs/using_local_models.md:
llm = dspy.HFModel(model = 'meta-llama/Llama-2-7b-hf')

I got this error:


AttributeError Traceback (most recent call last)
in <cell line: 1>()
----> 1 llm = dspy.HFModel(model='TheBloke/Llama-2-7B-Chat-GGML')
2 #turbo = dspy.OpenAI(model='gpt-3.5-turbo')
3 colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://localhost:8001/wiki17_abstracts')
4
5 dspy.settings.configure(lm=llm, rm=colbertv2_wiki17_abstracts)

AttributeError: module 'dspy' has no attribute 'HFModel'

Then I imported HFModel from dsp as follows:
from dsp import HFModel
llm = HFModel(model='TheBloke/Llama-2-7B-Chat-GGML')

I got the following error:


TypeError Traceback (most recent call last)
in <cell line: 1>()
----> 1 turbo = HFModel(model='TheBloke/Llama-2-7B-Chat-GGML')
2 #turbo = dspy.OpenAI(model='gpt-3.5-turbo')
3 colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://localhost:8001/wiki17_abstracts')
4
5 dspy.settings.configure(lm=turbo, rm=colbertv2_wiki17_abstracts)

/usr/local/lib/python3.10/dist-packages/dsp/modules/hf.py in __init__(self, model, checkpoint, is_client, hf_device_map)
55 if not self.is_client:
56 try:
---> 57 architecture = AutoConfig.from_pretrained(model).__dict__["architectures"][0]
58 self.encoder_decoder_model = ("ConditionalGeneration" in architecture) or ("T5WithLMHeadModel" in architecture)
59 self.decoder_only_model = ("CausalLM" in architecture) or ("GPT2LMHeadModel" in architecture)

TypeError: 'NoneType' object is not subscriptable

It seems that architectures is not part of the model config.
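
For reference, a quick way to confirm this (the None result is what the subscript error in the traceback implies):

from transformers import AutoConfig

# Inspect the config of the GGML repo that failed above
config = AutoConfig.from_pretrained('TheBloke/Llama-2-7B-Chat-GGML')
print(config.architectures)  # None here, so indexing [0] inside HFModel fails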

mahmoudaymo reopened this Sep 15, 2023
@okhat
Collaborator

okhat commented Sep 15, 2023

Thanks @mahmoudaymo! @arnavsinghvi11 is looking into this.

@arnavsinghvi11
Collaborator

Hi @mahmoudaymo

You are correct that architectures is not found in the model config.

I would actually suggest the following pivots to overcome this:

  1. Using HFClientTGI over HFModel - the TGI server offers significant speed benefits and various other features like weight conversions, sharding, quantization, etc., which will make your runs much smoother compared to HFModel.
  2. Pivoting to the GPTQ format instead of GGML - TGI currently does not support GGML formats but does support the GPTQ version of this model, which you can use as per this thread. (If using the GGML format is a strict requirement, I can do some more digging on how to support it within some of the other server frameworks we support.)

Some key notes to keep in mind (a rough setup sketch follows these notes):
- Use the latest TGI Docker image (http://ghcr.io/huggingface/text-generation-inference:latest); our current docs mention the 0.9.3 version, but the latest version supports more advanced features and models (which we will eventually upgrade to as well).
- Pass the --quantize gptq flag for the GPTQ format.
- Refer to any other open issues on the TGI repo and/or additional relevant info on which other GPTQ models are supported.
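
Putting those together, a minimal sketch, assuming the GPTQ variant of the model and default TGI ports (adjust the volume path, port, and model id as needed):

# Launch the TGI server in Docker (placeholder volume and port):
#   docker run --gpus all -p 8080:80 -v $PWD/data:/data \
#     ghcr.io/huggingface/text-generation-inference:latest \
#     --model-id TheBloke/Llama-2-7B-Chat-GPTQ --quantize gptq

import dspy

# Point DSPy at the running TGI server (placeholder URL/port)
llm = dspy.HFClientTGI(model='TheBloke/Llama-2-7B-Chat-GPTQ', port=8080, url='http://localhost')
dspy.settings.configure(lm=llm)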

Please let me know if you have any additional questions!

@mahmoudaymo
Author

Thank you very much @arnavsinghvi11. The issue is resolved, so I will close it.
Looking forward to seeing the new features of the library.

@antaripg

Hi,
I haven't found any straightforward method to connect to a Hugging Face Inference Endpoint from DSPy. I have a fine-tuned model on Hugging Face that I want to use through an endpoint, as I don't have the resources to download the model locally. Previously, I used other frameworks like LangChain and LlamaIndex to access the endpoint and could do so easily. I don't see a similar option in the DSPy documentation; I have seen the vLLM server and HFClientTGI, but both require the model to be deployed locally. It would be great if anyone could direct me towards a cookbook or code snippet to achieve this.
