Nightly (#676)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer
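
A hedged sketch of how the continued-pretraining trainer added above is typically configured; the UnslothTrainingArguments name and its embedding_learning_rate argument follow Unsloth's public continued-pretraining examples and are assumptions here, not details taken from this commit:

```python
# Hedged sketch: training arguments for continued pretraining, with a smaller
# learning rate for the embedding / lm_head modules.
# UnslothTrainingArguments and embedding_learning_rate are assumed names.
from unsloth import UnslothTrainingArguments

args = UnslothTrainingArguments(
    output_dir = "outputs",
    per_device_train_batch_size = 2,
    max_steps = 100,
    learning_rate = 5e-5,            # main / LoRA parameters
    embedding_learning_rate = 5e-6,  # embed_tokens and lm_head train more slowly
)
```

These arguments would then be passed to the continued-pretraining trainer together with a model, tokenizer, and dataset.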

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported
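
A minimal usage sketch for the is_bfloat16_supported helper exported above; the TrainingArguments values are placeholders:

```python
# Pick fp16 vs bf16 automatically based on hardware support.
from unsloth import is_bfloat16_supported
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir = "outputs",
    per_device_train_batch_size = 2,
    fp16 = not is_bfloat16_supported(),
    bf16 = is_bfloat16_supported(),
)
```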

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens
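
The long run of tokenizer_utils.py updates below deals with untrained tokens. Below is a hedged illustration of the general idea (detect never-trained embedding rows and move them to the mean of the trained rows), not the exact logic added here; model is assumed to be an already-loaded causal LM:

```python
# Hedged illustration only: find embedding rows that look untrained (all zero)
# and re-initialise them to the mean of the trained rows, so newly added or
# unused tokens do not destabilise training. Not the exact unsloth logic.
import torch

embed_matrix = model.get_input_embeddings().weight.data   # model is assumed
untrained    = embed_matrix.abs().sum(dim = 1) == 0        # crude detection
if untrained.any():
    mean_row = embed_matrix[~untrained].mean(dim = 0)
    embed_matrix[untrained] = mean_row
```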

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters
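
A hedged usage sketch for the revision parameter added in #629; the model name and revision value are placeholders:

```python
# Pin the checkpoint to a specific branch, tag, or commit on the Hugging Face Hub.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "unsloth/mistral-7b-v0.3",  # placeholder model
    max_seq_length = 2048,
    load_in_4bit   = True,
    revision       = "main",                     # any branch, tag, or commit hash
)
```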

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add saving to llama.cpp GGML in save.py.

* Fix the conversion command and the path of the convert-to-GGML function.

* Add autosaving of the LoRA to the GGML function

* Create a LoRA save function for conversion to GGML

* Test fix #2 for saving the LoRA

* Test fix #3 to save the LoRA adapters for conversion to GGML

* Remove unwanted tokenizer saving for conversion to GGML and add a few print statements.

* The tokenizer was needed for saving, so added it back; also made it more Unsloth-style by using positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to the older version of the code and added a few comments.

* Test fix 1 for arch

* Test fix 2 for a new Mistral error.

* Test fix 3

* Revert to the old version for testing.

* Upload issue test fix 1

* Fix 2 for uploading GGML

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone it with the recursive option, it works.

Co-authored-by: Daniel Han <[email protected]>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <[email protected]>
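
A hedged sketch of the Triton 3.0 compatibility fix referenced in #227: libcuda_dirs moved between Triton releases, so the import needs a fallback. The exact module paths below are assumptions about Triton's internal layout, not quoted from this commit:

```python
# Try the older Triton 2.x location first, then fall back to the Triton 3.0
# location. Both paths are assumptions; the fallback pattern is the point.
try:
    from triton.common.build import libcuda_dirs            # Triton 2.x (assumed)
except ImportError:
    from triton.backends.nvidia.driver import libcuda_dirs  # Triton 3.0 (assumed)
```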

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <[email protected]>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling
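
The bullet above only notes that the RoPE-scaling check is now automatic. As a hedged illustration of linear RoPE scaling in general (not the exact llama.py logic), the factor can be derived from the requested and trained context lengths:

```python
# Hedged illustration: if the requested context exceeds the base model's
# trained context, derive a linear RoPE scaling factor; otherwise leave it off.
max_seq_length       = 4096   # requested context length (placeholder)
model_max_seq_length = 2048   # base model's trained context length (placeholder)

if max_seq_length > model_max_seq_length:
    rope_scaling = {"type": "linear",
                    "factor": max_seq_length / model_max_seq_length}
else:
    rope_scaling = None
```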

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

* Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf (#651)

* Nightly (#649)

* Fix bug in save.py with interpreting quantization_method as a string that prevents GGUF from saving

* Implemented better list management but forgot to actually use the new list variable; fixed

* Check the type of the given quantization method and raise a type error if it is not a list or a string

* Update save.py

---------

Co-authored-by: Daniel Han <[email protected]>
Co-authored-by: Michael Han <[email protected]>
Co-authored-by: Eliot Hall <[email protected]>
Co-authored-by: Rickard Edén <[email protected]>
Co-authored-by: XiaoYang <[email protected]>
Co-authored-by: Oseltamivir <[email protected]>
Co-authored-by: mahiatlinux <[email protected]>
Co-authored-by: Sébastien De Greef <[email protected]>
Co-authored-by: Alberto Ferrer <[email protected]>
Co-authored-by: Thomas Viehmann <[email protected]>
Co-authored-by: Walter Korman <[email protected]>
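
A hedged usage sketch of the behaviour restored by the #651 fixes above: quantization_method may be a single string or a list, so one call can produce several GGUF files. The model name and method names are placeholders:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained("unsloth/mistral-7b-v0.3-bnb-4bit")

# Single method as a string:
model.save_pretrained_gguf("gguf_out", tokenizer, quantization_method = "q4_k_m")

# Several methods as a list (see the "Multiple GGUF saving" bullet above):
model.save_pretrained_gguf(
    "gguf_out", tokenizer,
    quantization_method = ["q4_k_m", "q8_0", "f16"],
)
```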

* Revert "Fix breaking bug in save.py with interpreting quantization_method as …" (#652)

This reverts commit 30605de.

* Revert "Revert "Fix breaking bug in save.py with interpreting quantization_me…" (#653)

This reverts commit e2b2083.

* Update llama.py

* peft

* patch

* Update loader.py

* retrain

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* offload

* Update llama.py

* Create a starter script for command-line training to integrate into ML ops pipelines. (#623)

* Update chat_templates.py

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Ollama

* Update chat_templates.py

* ollama

* Update mapper.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Fixes

* clearer messages

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* log

* Update __init__.py

* Update llama.py

* Update __init__.py

---------

Co-authored-by: Michael Han <[email protected]>
Co-authored-by: Eliot Hall <[email protected]>
Co-authored-by: Rickard Edén <[email protected]>
Co-authored-by: XiaoYang <[email protected]>
Co-authored-by: Oseltamivir <[email protected]>
Co-authored-by: mahiatlinux <[email protected]>
Co-authored-by: Sébastien De Greef <[email protected]>
Co-authored-by: Alberto Ferrer <[email protected]>
Co-authored-by: Thomas Viehmann <[email protected]>
Co-authored-by: Walter Korman <[email protected]>
Co-authored-by: ArcadaLabs-Jason <[email protected]>
12 people committed Jun 21, 2024
1 parent 4af390e commit 933d9fe
Showing 3 changed files with 47 additions and 34 deletions.
29 changes: 18 additions & 11 deletions unsloth/__init__.py
@@ -17,17 +17,20 @@
import sys
from packaging.version import Version

# Define a list of modules to check
MODULES_TO_CHECK = ["bitsandbytes"]

# Check if any of the modules in the list have been imported
for module in MODULES_TO_CHECK:
if module in sys.modules:
raise ImportError(f"Unsloth: Please import Unsloth before {module}.")
pass
pass

# Currently only supports 1 GPU, or else seg faults will occur.
# # Define a list of modules to check
# MODULES_TO_CHECK = ["bitsandbytes"]

# # Check if any of the modules in the list have been imported
# for module in MODULES_TO_CHECK:
# if module in sys.modules:
# raise ImportError(f"Unsloth: Please import Unsloth before {module}.")
# pass
# pass

# Unsloth currently does not work on multi GPU setups - sadly we are a 2 brother team so
# enabling it will require much more work, so we have to prioritize. Please understand!
# We do have a beta version, which you can contact us about!
# Thank you for your understanding and we appreciate it immensely!
if "CUDA_VISIBLE_DEVICES" in os.environ:
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
devices = os.environ["CUDA_VISIBLE_DEVICES"]
@@ -36,6 +39,10 @@
first_id = devices.split(",")[0]
warnings.warn(
f"Unsloth: 'CUDA_VISIBLE_DEVICES' is currently {devices} \n"\
"Unsloth currently does not work on multi GPU setups - sadly we are a 2 brother team so "\
"enabling it will require much more work, so we have to prioritize. Please understand!"\
"We do have a beta version, which you can contact us about!\n"\
"Thank you for your understanding and we appreciate it immensely!\n\n"\
"Multiple CUDA devices detected but we require a single device.\n"\
f"We will override CUDA_VISIBLE_DEVICES to first device: {first_id}."
)
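
A hedged sketch of how to work with the single-GPU requirement enforced above: pin one device before importing unsloth so this check never has to override anything. The device index is a placeholder:

```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # expose exactly one GPU (placeholder index)

import unsloth  # safe: only a single device is visible, so no warning or override
```
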
45 changes: 24 additions & 21 deletions unsloth/models/llama.py
@@ -1165,10 +1165,10 @@ def from_pretrained(
inner_training_loop = Trainer._original_training_loop
except:
raise RuntimeError(
"Our OSS was designed for people with few GPU resources to level the playing field.\n"
"The OSS Apache 2 license only supports one GPU - please obtain a commercial license.\n"
"We're a 2 person team, so we still have to fund our development costs - thanks!\n"
"If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
'Unsloth currently does not work on multi GPU setups - sadly we are a 2 brother team so '\
'enabling it will require much more work, so we have to prioritize. Please understand!\n'\
'We do have a separate beta version, which you can contact us about!\n'\
'Thank you for your understanding and we appreciate it immensely!'
)
pass

@@ -1201,7 +1201,10 @@ def from_pretrained(
output = re.findall(rb'([\\d]{1,})[\\s]{1,}M', output)
output = sum(int(x.decode('utf-8'))/1024 > 4 for x in output)
if output > 1: raise RuntimeError(
'Error: More than 1 GPUs have a lot of VRAM usage. Please obtain a commercial license.')
'Unsloth currently does not work on multi GPU setups - sadly we are a 2 brother team so '\\
'enabling it will require much more work, so we have to prioritize. Please understand!\\n'\\
'We do have a separate beta version, which you can contact us about!\\n'\\
'Thank you for your understanding and we appreciate it immensely!')
for _ in range(3):
gc.collect()
torch.cuda.empty_cache()"""
@@ -1214,10 +1217,10 @@ def from_pretrained(
args.gradient_accumulation_steps // self._train_batch_size
if n_total_devices > 1:
logger.warning_once(
"* Our OSS was designed for people with few GPU resources to level the playing field.\\n"
"* The OSS Apache 2 license only supports one GPU - please obtain a commercial license.\\n"
"* We're a 2 person team, so we still have to fund our development costs - thanks!\\n"
"* If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
'* Unsloth currently does not work on multi GPU setups - sadly we are a 2 brother team so ' \\
'* enabling it will require much more work, so we have to prioritize. Please understand!\\n' \\
'* We do have a separate beta version, which you can contact us about!\\n'\\
'* Thank you for your understanding and we appreciate it immensely!'
)
debug_info ="""
debug_info = debug_info.split('\n')
@@ -1244,10 +1247,10 @@ def from_pretrained(
n_total_devices = total_batches // ga // bsz
if n_total_devices > 1:
logger.warning_once(
"* Our OSS was designed for people with few GPU resources to level the playing field.\\n"
"* The OSS Apache 2 license only supports one GPU - please obtain a commercial license.\\n"
"* We're a 2 person team, so we still have to fund our development costs - thanks!\\n"
"* If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
'* Unsloth currently does not work on multi GPU setups - sadly we are a 2 brother team so ' \\
'* enabling it will require much more work, so we have to prioritize. Please understand!\\n' \\
'* We do have a separate beta version, which you can contact us about!\\n'\\
'* Thank you for your understanding and we appreciate it immensely!'
)
divisor = n_total_devices / 1
bsz = self._train_batch_size = max(int(bsz / divisor), 1)
@@ -1273,10 +1276,10 @@ def from_pretrained(
)
if "n_total_devices >" not in inner_training_loop:
raise RuntimeError(
"Our OSS was designed for people with few GPU resources to level the playing field.\n"
"The OSS Apache 2 license only supports one GPU - please obtain a commercial license.\n"
"We're a 2 person team, so we still have to fund our development costs - thanks!\n"
"If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
'Unsloth currently does not work on multi GPU setups - sadly we are a 2 brother team so '\
'enabling it will require much more work, so we have to prioritize. Please understand!\n'\
'We do have a separate beta version, which you can contact us about!\n'\
'Thank you for your understanding and we appreciate it immensely!'
)
pass
inner_training_loop = inner_training_loop.replace(
Expand Down Expand Up @@ -1783,10 +1786,10 @@ def patch_peft_model(
from transformers.trainer import Trainer
if Trainer._inner_training_loop.__name__ != "_fast_inner_training_loop":
raise RuntimeError(
"Our OSS was designed for people with few GPU resources to level the playing field.\n"
"The OSS Apache 2 license only supports one GPU - please obtain a commercial license.\n"
"We're a 2 person team, so we still have to fund our development costs - thanks!\n"
"If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
'Unsloth currently does not work on multi GPU setups - sadly we are a 2 brother team so '\
'enabling it will require much more work, so we have to prioritize. Please understand!\n'\
'We do have a separate beta version, which you can contact us about!\n'\
'Thank you for your understanding and we appreciate it immensely!'
)
pass

7 changes: 5 additions & 2 deletions unsloth/tokenizer_utils.py
@@ -954,7 +954,7 @@ def patch_sft_trainer_tokenizer():
"\n"\
"if self._inner_training_loop.__name__ != '_fast_inner_training_loop':\n"\
" raise RuntimeError(\n"\
" 'Do not edit specific areas of the Unsloth codebase or you will get CUDA segfaults.'\n"\
" 'Please do not edit specific areas of the Unsloth codebase or you will get CUDA segfaults.'\n"\
" )\n"\
"pass\n"\
"n_devices = torch.cuda.device_count()\n"\
@@ -964,7 +964,10 @@
"output = re.findall(rb'([\\d]{1,})[\\s]{1,}M', output)\n"\
"output = sum(int(x.decode('utf-8'))/1024 > 4 for x in output)\n"\
"if output > 1: raise RuntimeError(\n"\
" 'Error: More than 1 GPUs have a lot of VRAM usage. Please obtain a commercial license.')\n"\
" 'Unsloth currently does not work on multi GPU setups - sadly we are a 2 brother team so '\\\n"\
" 'enabling it will require much more work, so we have to prioritize. Please understand!\\n'\\\n"\
" 'We do have a separate beta version, which you can contact us about!\\n'\\\n"\
" 'Thank you for your understanding and we appreciate it immensely!')\n"\
"for _ in range(3):\n"\
" gc.collect()\n"\
" torch.cuda.empty_cache()\n"\
