
Request: smaller dependency surface on Transformers model types? #408

Closed

andysalerno opened this issue Oct 22, 2023 · 5 comments

@andysalerno

Hi, this is a very vague feature request :D

There are some parts of the code (on the main branch; I haven't looked at the upcoming branches mentioned in #395) that depend on private/internal functionality of Transformers types, mostly around the cache and logits history.

The problem with this is that it's difficult to create wrappers for other model providers, like GPTQ, AWQ, llama.cpp, etc., because you have to replicate some pretty specific private code from the Transformers types that Guidance expects to see.

So my request is this: is it possible to define a smaller interface/surface, not specific to Transformers (or at least not relying on anything internal to Transformers), that can be used to host models from other sources?

No idea how difficult this is or if it's realistic, just something that I would personally love to see.
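
To make it concrete, here's a rough sketch of the kind of surface I have in mind (all names here are hypothetical, not actual guidance APIs):

```python
# Hypothetical sketch of a minimal, backend-agnostic interface; none of
# these names exist in guidance today, they just illustrate the idea.
from typing import Any, Optional, Protocol, Sequence, Tuple


class LogitsModel(Protocol):
    def tokenize(self, text: str) -> Sequence[int]:
        """Encode text into token ids."""
        ...

    def detokenize(self, tokens: Sequence[int]) -> str:
        """Decode token ids back into text."""
        ...

    def next_logits(
        self, tokens: Sequence[int], cache: Optional[Any] = None
    ) -> Tuple[Sequence[float], Any]:
        """Run a forward pass and return (next-token logits, opaque cache).

        The cache is whatever the backend wants to reuse on the next call;
        the host never inspects it, so no Transformers internals leak out.
        """
        ...
```

A wrapper for GPTQ, AWQ, or llama.cpp would then only need to implement those three methods, and the cache stays an opaque backend detail instead of something that has to mimic Transformers internals.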

@freckletonj

I think guidance is effectively dead now. The maintainers are nearly radio silent, and although I don't love LMQL, it's the best alternative I'm aware of right now, so maybe you'll get more mileage there? I had GPTQ stuff working with it, and I think they claim compatibility with llama.cpp.

That said, if this is something you go after, here are a couple of relevant Transformers PRs that are changing caching-related things:

huggingface/transformers#26681

huggingface/transformers#25086

@paucodeici

In fact, the pythonic branch they are developing is taking a long time, but it is still active. They have just neglected to communicate with the community about their progress.

For controlled generation you have LMQL, but personally I find it quite hard to use compared to guidance. It hides most of what happens, which makes it hard to debug when things go wrong (and that is often the case; I have an unexplainable memory leak with LMQL).

Also, there is outlines (I haven't tried it yet, but their paper https://arxiv.org/pdf/2307.09702.pdf is at least very easy to understand).

@freckletonj

@paucodeici I agree with everything you've said about LMQL. It's a pain to use and debug. I still haven't found anything better yet, though. Guidance was way better, but alas, it seems pretty dead.

Have you tried the pythonic branch?

@paucodeici

paucodeici commented Nov 7, 2023 via email

@marcotcr
Collaborator

Please check out our current attempt in the new release, where we integrate llama.cpp as well.
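
For anyone landing here, usage with the llama.cpp backend looks roughly like this (the model path is a placeholder; see the README for current details):

```python
from guidance import models, gen

# Load a local GGUF model through the llama.cpp backend
# (the path below is a placeholder).
llama = models.LlamaCpp("path/to/model.gguf")

# Constrained generation without relying on Transformers internals.
lm = llama + "The capital of France is " + gen("capital", max_tokens=5)
print(lm["capital"])
```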
