Project Roadmap #57

tgaddair · 2023-11-22T20:53:03Z

RileyCodes · 2023-11-22T22:45:03Z

is AWQ supported?

tgaddair · 2023-11-22T22:57:22Z

Hey @RileyCodes, not yet, will add that to the roadmap!

abhibst · 2023-11-23T15:53:14Z

does we have tested bitsandbytes Quantization ?

tgaddair · 2023-11-23T20:22:51Z

Hey @abhibst, I've done some basic sanity checks on it, but haven't tested it very thoroughly. Please feel free to report any issues you encounter and I'll take a look!

abhibst · 2023-11-23T21:44:58Z

Sure Thanks for confirming

sansavision · 2023-11-29T20:48:29Z

How would you go about adding this in Stable Diffusion? I am really interested in experimenting with that.

tgaddair · 2023-11-29T22:16:03Z

Hey @sansavision, at a high level it would look a lot like the LoRA pipeline used in Diffusers: https://github.com/huggingface/api-inference-community/blob/main/docker_images/diffusers/app/pipelines/text_to_image.py#L25

A v0 shouldn't be too bad, we would basically just run a single forward pass to generate the image and perform postprocessing (as part of the existing Prefill step) and short-circuit the Decode step.

flozi00 · 2023-12-03T21:38:00Z

If no one has started I will start working on awq tomorrow

tgaddair · 2023-12-03T22:14:21Z

Nice! Thanks @flozi00, that would be awesome!

SamGalanakis · 2023-12-06T12:33:00Z

Any plans to support vision transformers from huggingface / timm? A lot of potential use cases there for deploying many classifiers. If not what would that entail? Would be open to contributing if possible.

tgaddair · 2023-12-06T17:49:06Z

Hey @SamGalanakis, great suggestion! The plan at the moment is to start by supporting text classifiers. Once that framework is in place for that, it should be hopefully relatively straightforward to support image classifiers as well. Happy to start a thread on Discord to discuss!

flozi00 · 2023-12-06T18:17:45Z

Whisper would be also very cool 😄

SamGalanakis · 2023-12-06T18:25:26Z

@tgaddair Ok clear, joined the discord will look out for it!

Hap-Zhang · 2023-12-15T07:51:56Z

Hi, @tgaddair , could I know how long it will take to support the stable diffusion model?

tgaddair · 2023-12-16T21:19:15Z

Hey @Hap-Zhang, the plan at the moment is to add it after we add support for embedding generation and text classification. Both of those are planned for January 2024, so in the next month.

Hap-Zhang · 2023-12-18T01:51:50Z

@tgaddair Okay, got it. Thank you very much for your efforts. Stay tuned for it.

AdithyanI · 2024-01-08T16:10:49Z

If we could have OpenAI compatible endpoints that would be great too. So we can use this as drop in replacement for OpenAI models :)

tgaddair · 2024-01-08T17:19:43Z

Hey @AdithyanI, yes, this should be coming this week or next! See #145 to follow progress.

AdithyanI · 2024-01-08T22:36:26Z

@tgaddair oh wow that would be awesome! Thank you so much for the work here.
If you need someone to test it out; let me know. Happy to test it out.

Is the discord still open for others to join :) ?
I followed the link of the repo, and it says it is expired.

tgaddair · 2024-01-09T22:06:20Z

@AdithyanI this should be landing some time today :)

#170

tgaddair · 2024-01-09T22:07:03Z

Hey @AdithyanI, the Discord should be available. Are you using this link?

https://discord.gg/CBgdrGnZjy

AdithyanI · 2024-01-11T07:54:22Z

@tgaddair I asked for outlines repo authors to add support to this : dottxt-ai/outlines#523
Then it would be great to have text guided generation :)

I don't know how hard is it to integrate that here.
Do you folks know if this is something that can be supported by LORAX?

tgaddair · 2024-01-12T05:22:20Z

Thanks for starting the Outlines thread @AdithyanI! Looks like the maintainer created an issue #176. Excited to explore this integration!

K-Mistele · 2024-02-20T21:52:49Z

Would it be possible to add in context length-scaling methods like Self-Extend , Rope scaling, and/or yarn scaling? I know that llama.cpp has a good implementation of these in their server, and self-extend in particular is much more stable than rope or yarn. Having long context or doing context enhancement is super important for RAG applications.

thincal · 2024-02-26T18:42:57Z

About the supported models, could you consider the ChatGLM3 ? @tgaddair

thincal · 2024-03-10T17:22:09Z

LongLoRA

It seems that LongLoRA proposed shifted short attention is compatible with Flash-Attention, and not required during inference (ref: https://huggingface.co/Yukang/Llama-2-13b-longlora-8k#highlights), if that is true, could you share what's the planed support in LoRAX inference side? thanks @tgaddair

remiconnesson · 2024-03-17T15:05:21Z

Do you plan on supporting AQLM to setve LoRa of Mixtral Instruct with Lorax?

tgaddair · 2024-03-17T20:37:58Z

Hey @thincal, the last thing we need to support LongLoRA, if I remember correctly, is #231 which @geoffreyangus is planning to pick up next week.

@remiconnesson, we have PR #233 from @flozi00 for AQLM. It's pretty close to landing, but just needs a little additional work to finish it up. If no one else picks it up, I can probably take a look in the next week or two.

amir-in-a-cynch · 2024-04-01T17:07:31Z

Are T5 based models on the Roadmap?

remiconnesson · 2024-04-01T21:27:34Z

@tgaddair

@remiconnesson, we have PR #233 from @flozi00 for AQLM. It's pretty close to landing, but just needs a little additional work to finish it up. If no one else picks it up, I can probably take a look in the next week or two.

Hello :) How far do you think we are for this PR to be merged? :)

tgaddair · 2024-04-03T16:50:20Z

Hey @remiconnesson, will probably be the next thing I take a look at after wrapping up speculative decoding this week.

@amir-in-a-cynch we can definitely add T5 to the roadmap!

tomrance · 2024-04-22T14:46:57Z

Hello, will you integrate / merge / migrate to the latest hugging face text-generation-inference as it is back now with Apache 2.0 license?

bdalal · 2024-08-09T17:45:27Z

Is there an expected release date for v0.11?

tgaddair added the enhancement New feature or request label Nov 22, 2023

tgaddair pinned this issue Nov 22, 2023

arnavgarg1 unpinned this issue Nov 28, 2023

tgaddair pinned this issue Nov 29, 2023

thincal mentioned this issue Mar 7, 2024

decapoda-research/llama-13b-hf is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models' #310

Open

remiconnesson mentioned this issue Mar 22, 2024

Are there any tools that can serve AQLM quantized models? Vahe1994/AQLM#44

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Project Roadmap #57

Project Roadmap #57

tgaddair commented Nov 22, 2023 •

edited by ajtejankar

Loading

RileyCodes commented Nov 22, 2023

tgaddair commented Nov 22, 2023

abhibst commented Nov 23, 2023

tgaddair commented Nov 23, 2023

abhibst commented Nov 23, 2023

sansavision commented Nov 29, 2023

tgaddair commented Nov 29, 2023

flozi00 commented Dec 3, 2023

tgaddair commented Dec 3, 2023

SamGalanakis commented Dec 6, 2023

tgaddair commented Dec 6, 2023

flozi00 commented Dec 6, 2023

SamGalanakis commented Dec 6, 2023

Hap-Zhang commented Dec 15, 2023

tgaddair commented Dec 16, 2023

Hap-Zhang commented Dec 18, 2023

AdithyanI commented Jan 8, 2024

tgaddair commented Jan 8, 2024

AdithyanI commented Jan 8, 2024 •

edited

Loading

tgaddair commented Jan 9, 2024

tgaddair commented Jan 9, 2024

AdithyanI commented Jan 11, 2024

tgaddair commented Jan 12, 2024

K-Mistele commented Feb 20, 2024

thincal commented Feb 26, 2024

thincal commented Mar 10, 2024

remiconnesson commented Mar 17, 2024 •

edited

Loading

tgaddair commented Mar 17, 2024

amir-in-a-cynch commented Apr 1, 2024

remiconnesson commented Apr 1, 2024

tgaddair commented Apr 3, 2024

tomrance commented Apr 22, 2024

bdalal commented Aug 9, 2024

Project Roadmap #57

Project Roadmap #57

Comments

tgaddair commented Nov 22, 2023 • edited by ajtejankar Loading

v0.10

v0.11

Previous Releases

v0.9

Backlog

Models

Adapters

Throughput / Latency

Quantization

Usability

RileyCodes commented Nov 22, 2023

tgaddair commented Nov 22, 2023

abhibst commented Nov 23, 2023

tgaddair commented Nov 23, 2023

abhibst commented Nov 23, 2023

sansavision commented Nov 29, 2023

tgaddair commented Nov 29, 2023

flozi00 commented Dec 3, 2023

tgaddair commented Dec 3, 2023

SamGalanakis commented Dec 6, 2023

tgaddair commented Dec 6, 2023

flozi00 commented Dec 6, 2023

SamGalanakis commented Dec 6, 2023

Hap-Zhang commented Dec 15, 2023

tgaddair commented Dec 16, 2023

Hap-Zhang commented Dec 18, 2023

AdithyanI commented Jan 8, 2024

tgaddair commented Jan 8, 2024

AdithyanI commented Jan 8, 2024 • edited Loading

tgaddair commented Jan 9, 2024

tgaddair commented Jan 9, 2024

AdithyanI commented Jan 11, 2024

tgaddair commented Jan 12, 2024

K-Mistele commented Feb 20, 2024

thincal commented Feb 26, 2024

thincal commented Mar 10, 2024

remiconnesson commented Mar 17, 2024 • edited Loading

tgaddair commented Mar 17, 2024

amir-in-a-cynch commented Apr 1, 2024

remiconnesson commented Apr 1, 2024

tgaddair commented Apr 3, 2024

tomrance commented Apr 22, 2024

bdalal commented Aug 9, 2024

tgaddair commented Nov 22, 2023 •

edited by ajtejankar

Loading

AdithyanI commented Jan 8, 2024 •

edited

Loading

remiconnesson commented Mar 17, 2024 •

edited

Loading