Add LoRA dynamic switching for inference #71
Comments
@Jeevi10 Hey, can you link some resources on dynamic LoRA specifically for Whisper, mainly how this type of inference works and how to use LoRA to fine-tune Whisper?
@StephennFernandes Thank you for your reply.

Resources for dynamic LoRA: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/enc_dec#run-bart-with-lora. This is the example repo where I got the idea from. Unfortunately, I don't see any implementation specific to Whisper. Just to give you an idea, I created a running example using Hugging Face `transformers` and `peft`:

```python
import torch
from datasets import load_dataset
from peft import PeftModel
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
model_id = "distil-whisper/distil-large-v3"

base_model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
base_model.to(device)
processor = AutoProcessor.from_pretrained(model_id)

# Load the first LoRA adapter and register a second one for later switching
peft_model_id = "path to checkpoint adapter 1"
model = PeftModel.from_pretrained(base_model, peft_model_id, adapter_name="adapter 1")
model.load_adapter("path to checkpoint adapter 2", adapter_name="adapter 2")  # second adapter path is a placeholder

# Enable static cache for generation
model.generation_config.cache_implementation = "static"

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device=device,
)

dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")

def iterate_data(dataset):
    for item in dataset:
        yield item["audio"]

# set the batch size in accordance to your device
BATCH_SIZE = 16

# run streamed inference with adapter 1
predictions = []
for out in pipe(iterate_data(dataset), batch_size=BATCH_SIZE):
    predictions.append(out["text"])
print(predictions)

# switch adapters on the fly, then run streamed inference with adapter 2
pipe.model.set_adapter("adapter 2")
predictions = []
for out in pipe(iterate_data(dataset), batch_size=BATCH_SIZE):
    predictions.append(out["text"])
print(predictions)
```

Whisper fine-tuning with LoRA: https://github.com/Vaibhavs10/fast-whisper-finetuning
@Jeevi10 thanks for the heads up. I'll try to write an update for WhisperS2T for being able to use dynamic adapters |
@StephennFernandes I am looking forward to it.
Add dynamic LoRA (Low-Rank Adaptation) switching functionality, allowing users to change LoRA adapters on the fly during inference without reloading the entire model.
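For illustration, a minimal sketch of the core idea built on Hugging Face `peft` (this is not WhisperS2T's API; the adapter names and paths below are hypothetical placeholders):

```python
from peft import PeftModel
from transformers import AutoModelForSpeechSeq2Seq

# Load the base Whisper model once, then attach several LoRA adapters to it.
base = AutoModelForSpeechSeq2Seq.from_pretrained("distil-whisper/distil-large-v3")
model = PeftModel.from_pretrained(base, "adapters/domain_a", adapter_name="domain_a")  # hypothetical path
model.load_adapter("adapters/domain_b", adapter_name="domain_b")  # hypothetical path

def generate_with_adapter(inputs, adapter_name):
    # Activating an adapter is an in-place switch; the base weights are never reloaded.
    model.set_adapter(adapter_name)
    return model.generate(**inputs)
```

Because `set_adapter` only changes which adapter is active, switching per request is cheap compared to reloading a full model.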