Releases: oobabooga/text-generation-webui
v1.16
Backend updates
- Transformers: bump to 4.46.
- Accelerate: bump to 1.0.
Changes
- Add whisper turbo (#6423). Thanks @SeanScripts.
- Add RWKV-World instruction template (#6456). Thanks @MollySophia.
- Minor Documentation update - query cuda compute for docker .env (#6469). Thanks @practical-dreamer.
- Remove lm_eval and optimum from requirements (they don't seem to be necessary anymore).
Bug fixes
- Fix llama.cpp loader not being random. Thanks @reydeljuego12345.
- Fix temperature_last when temperature not in sampler priority (#6439). Thanks @ThisIsPIRI.
- Make token bans work again on HF loaders (#6488). Thanks @ThisIsPIRI.
- Fix for systems that have bash in a non-standard directory (#6428). Thanks @LuNeder.
- Fix Intel bug described in #6253 (#6433). Thanks @schorschie.
- Fix locally compiled llama-cpp-python failing to import.
v1.15
Backend updates
- Transformers: bump to 4.45.
- ExLlamaV2: bump to 0.2.3.
- ExLlamaV2: add tensor parallelism to increase multi-GPU inference speeds (#6356). Thanks @RandomInternetPreson.
- flash-attention: bump to 2.6.3.
- llama-cpp-python: bump to 0.3.1.
- bitsandbytes: bump to 0.44.
- PyTorch: bump to 2.4.1.
- ROCm: bump wheels to 6.1.2.
- Remove AutoAWQ, AutoGPTQ, HQQ, and AQLM from `requirements.txt`:
  - AutoAWQ and AutoGPTQ were removed due to lack of support for PyTorch 2.4.1 and CUDA 12.1.
  - HQQ and AQLM were removed to make the project leaner since they're experimental with limited use.
  - You can still install those libraries manually if you are interested.
Changes
- Exclude Top Choices (XTC): A sampler that boosts creativity, breaks writing clichés, and inhibits non-verbatim repetition (#6335). Thanks @p-e-w.
- Make it possible to sort repetition penalties with "Sampler priority". The new keywords are:
  - `repetition_penalty`
  - `presence_penalty`
  - `frequency_penalty`
  - `dry`
  - `encoder_repetition_penalty`
  - `no_repeat_ngram`
  - `xtc` (not a repetition penalty but also added in this update)
- Don't import PEFT unless necessary. This makes the web UI launch faster.
- Add beforeunload event to add confirmation dialog when leaving page (#6279). Thanks @leszekhanusz.
- Update API documentation with examples to list/load models (#5902). Thanks @joachimchauvet.
- Training pro update script.py (#6359). Thanks @FartyPants.
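The XTC entry above gives only a one-line description. As a rough illustration, and assuming the behavior described in #6335 (on a random fraction of sampling steps, every token at or above a probability threshold is dropped except the least likely of them), the idea can be sketched in plain Python. The function name `xtc_filter`, its parameter names, and the default values are hypothetical and chosen for illustration; this is not the web UI's implementation.

```python
import random


def xtc_filter(probs, threshold=0.1, probability=0.5, rng=random):
    """Return a renormalized copy of `probs` after an XTC-style cut.

    `probs` maps token -> probability. All names and defaults here are
    illustrative assumptions, not the project's actual API.
    """
    # Only act on a random fraction of sampling steps.
    if rng.random() >= probability:
        return dict(probs)

    # Tokens the model considers "top choices".
    above = [tok for tok, p in probs.items() if p >= threshold]
    if len(above) < 2:
        # With fewer than two top choices there is nothing to exclude.
        return dict(probs)

    # Keep only the least likely top choice, plus everything below the threshold.
    keep = min(above, key=lambda tok: probs[tok])
    kept = {tok: p for tok, p in probs.items() if tok == keep or tok not in above}

    # Renormalize so the remaining probabilities sum to 1.
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}
```

With `probability=1.0` the cut is always applied, so a distribution such as `{"the": 0.5, "a": 0.3, "one": 0.15, "zebra": 0.05}` loses "the" and "a" and is renormalized over "one" and "zebra". With `probability=0.0` the distribution is returned unchanged.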
Bug fixes
- Fix UnicodeDecodeError for BPE-based Models (especially GLM-4) (#6357). Thanks @GralchemOz.
- API: Relax multimodal format, fixes HuggingFace Chat UI (#6353). Thanks @Papierkorb.
- Force /bin/bash shell for conda (#6386). Thanks @Thireus.
- Do not set value for histories in chat when --multi-user is used (#6317). Thanks @mashb1t.
- Fix typo in OpenAI response format (#6365). Thanks @jsboige.
v1.14
v1.13
Backend updates
- llama-cpp-python: bump to 0.2.85 (adds Llama 3.1 support).
UI updates
- Make `compress_pos_emb` float (#6276). Thanks @hocjordan.
- Make `n_ctx`, `max_seq_len`, and `truncation_length` numbers rather than sliders, to make it possible to type the context length manually.
- Improve the style of headings in chat messages.
- LaTeX rendering:
  - Add back single `$` for inline equations.
  - Fix rendering for equations enclosed between `\[` and `\]`.
  - Fix rendering for multiline equations.
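Single-`$` delimiters are ambiguous in ordinary text, as in the "apples cost $1, oranges cost $2" phrase mentioned in the v1.12 notes. The sketch below contrasts a naive pattern with a stricter one modeled on Pandoc's documented rule for dollar-delimited math (the opening `$` must be followed by a non-space character, and the closing `$` must be preceded by a non-space character and not followed by a digit). It is an illustration only: the release notes do not say which rule the web UI applies, and the names `NAIVE_INLINE` and `STRICT_INLINE` are hypothetical.

```python
import re

# Naive rule: anything between two dollar signs is treated as math.
NAIVE_INLINE = re.compile(r"\$(.+?)\$")

# Stricter rule in the style of Pandoc's dollar-math handling:
#   - the opening $ is immediately followed by a non-space character
#   - the closing $ is immediately preceded by a non-space character
#   - the closing $ is not immediately followed by a digit
STRICT_INLINE = re.compile(r"\$(?=\S)(.+?)(?<=\S)\$(?!\d)")

prices = "apples cost $1, oranges cost $2"
math = r"Euler: $e^{i\pi}+1=0$ holds"


def inline_math(pattern, text):
    """Return the list of spans the pattern would render as inline math."""
    return pattern.findall(text)
```

Here `inline_math(NAIVE_INLINE, prices)` picks up the text between the two prices as if it were an equation, while `inline_math(STRICT_INLINE, prices)` finds nothing and `inline_math(STRICT_INLINE, math)` still finds the equation.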
Bug fixes
- Fix saving characters through the UI.
- Fix instruct mode displaying "quotes" as ""double quotes"".
- Fix chat sometimes not scrolling down after sending a message.
- Fix the chat "stop" event.
- Make `--idle-timeout` work for API requests.
Other changes
- Model downloader: improve the progress bar by adding the filename, size, and download speed for each downloaded file.
- Better handle the Llama 3.1 Jinja2 template by not including its optional "tools" headers.
v1.12
Backend updates
- Transformers: bump to 4.43 (adds Llama 3.1 support).
- ExLlamaV2: bump to 0.1.8 (adds Llama 3.1 support).
- AutoAWQ: bump to 0.2.6 (adds Llama 3.1 support).
- Remove AutoAWQ as a standalone loader. I found that hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 works better when loaded directly through Transformers, and that's what the README recommends. AutoAWQ is still used in the background.
UI updates
- Make text between quote characters colored in chat and chat-instruct modes.
- Prevent LaTeX from being rendered for inline "$", as that caused problems for phrases like "apples cost $1, oranges cost $2".
- Make the markdown cache infinite and clear it when switching to another chat. This cache exists because the markdown conversion is CPU-intensive. By making it infinite, messages in a full 128k context will be cached, making the UI more responsive for long conversations.
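The markdown cache described above (unbounded, cleared when switching chats) can be sketched with Python's standard `functools.lru_cache`. This is a minimal illustration under assumptions: the function names `convert_to_markdown` and `switch_chat` and the placeholder conversion are hypothetical stand-ins, not the project's actual code.

```python
import functools


@functools.lru_cache(maxsize=None)  # maxsize=None makes the cache unbounded
def convert_to_markdown(text: str) -> str:
    """Stand-in for a CPU-intensive markdown-to-HTML conversion."""
    return "<p>" + text.strip() + "</p>"


def switch_chat() -> None:
    """Drop every cached conversion, since the old chat's messages are no longer shown."""
    convert_to_markdown.cache_clear()
```

Re-rendering a long conversation then only pays the conversion cost for messages that have not been seen before, and `convert_to_markdown.cache_info()` can be used to inspect hits and the current cache size.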
Bug fixes
- Fix a race condition that caused the default character to not be loaded correctly on startup.
- Fix Linux shebangs (#6110). Thanks @LuNeder.
Other changes
- Make the Google Colab notebook use the one-click installer instead of its own Python environment for better stability.
- Disable flash-attention on Google Colab by default, as its GPU models do not support it.
v1.11
UI updates
- Optimize the UI: events triggered by clicking on buttons, selecting values from dropdown menus, etc. have been refactored to minimize the number of connections made between the UI and the server. As a result, the UI is now significantly faster and more responsive.
- Use chat-instruct mode by default: most models nowadays are instruction-following models, and this mode automatically uses the model's Jinja2 template to generate the prompt, leading to higher-quality outputs.
- Improve the style of code blocks in light mode.
- Increase the font weight of chat messages (for chat and chat-instruct modes).
- Use gr.Number for RoPE scaling parameters (#6233). Thanks @Vhallo.
- Don't export the instruction template to settings.yaml on "Save UI defaults to settings.yaml" (it gets ignored and replaced with the model template).
Backend updates
- llama-cpp-python: bump to 0.2.83 (adds Mistral-Nemo support).
Other changes
- training: Added ChatML-format.json format example (#5899). Thanks @FartyPants.
- Customize the subpath for gradio, use with reverse proxy (#5106). Thanks @canoalberto.
Bug fixes
- Fix an issue where the chat contents sometimes disappear for a split second during streaming (#6247). Thanks @Patronics.
- Fix the chat UI losing its vertical scrolling position when the input area grows to more than 1 line.
v1.10.1
v1.10
Library updates
- llama-cpp-python: bump to 0.2.82.
- ExLlamaV2: bump to 0.1.7 (adds Gemma-2 support).
Changes
- Add new `--no_xformers` and `--no_sdpa` flags for ExLlamaV2.
  - Note: to use Gemma-2 with ExLlamaV2, you currently must use the `--no_flash_attn --no_xformers --no_sdpa` flags, or check the corresponding checkboxes in the UI before loading the model, otherwise it will perform very badly.
- Minor UI updates.
v1.9.1
v1.9
Backend updates
- 4-bit and 8-bit kv cache options have been added to llama.cpp and llamacpp_HF. They reuse the existing `--cache_8bit` and `--cache_4bit` flags. Thanks @GodEmperor785 for figuring out what values to pass to llama-cpp-python.
- Transformers:
  - Add eager attention option to make Gemma-2 work correctly (#6188). Thanks @GralchemOz.
  - Automatically detect bfloat16/float16 precision when loading models in 16-bit precision.
  - Automatically apply eager attention to models with `Gemma2ForCausalLM` architecture.
- Gemma-2 support: Automatically detect and apply the optimal settings for this model with the two changes above. No need to set `--bf16 --use_eager_attention` manually.
- Automatically obtain the EOT token from Jinja2 templates and add it to the stopping strings, fixing Llama-3-Instruct not stopping. No need to add `<eot>` to the custom stopping strings anymore.
UI updates
- Whisper STT overhaul: this extension has been rewritten, replacing the Gradio microphone component with a custom microphone element that is much more reliable (#6194). Thanks @RandomInternetPreson, @TimStrauven, and @mamei16.
- Make the character dropdown menu coexist in the "Chat" tab and the "Parameters > Character" tab, after some people pointed out that moving it entirely to the Chat tab makes it harder to edit characters.
- Colors in the light theme have been improved, making it a bit more aesthetic.
- Increase the chat area on mobile devices.
Bug fixes
- Fix the API request to AUTOMATIC1111 in the sd-api-pictures extension.
- Fix a glitch when switching tabs with "Show controls" unchecked in the chat tab and extensions loaded.
Library updates
- llama-cpp-python: bump to 0.2.81 (adds Gemma-2 support).
- Transformers: bump to 4.42 (adds Gemma-2 support).
Support
- GitHub Sponsors: https://github.com/sponsors/oobabooga
- ko-fi: https://ko-fi.com/oobabooga