Nightly (#676)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer
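
A hedged sketch of how the continued-pretraining trainer added above is typically configured; the UnslothTrainingArguments name and its embedding_learning_rate argument follow Unsloth's public continued-pretraining examples and are assumptions here, not details taken from this commit:

```python
# Hedged sketch: training arguments for continued pretraining, with a smaller
# learning rate for the embedding / lm_head modules.
# UnslothTrainingArguments and embedding_learning_rate are assumed names.
from unsloth import UnslothTrainingArguments

args = UnslothTrainingArguments(
    output_dir = "outputs",
    per_device_train_batch_size = 2,
    max_steps = 100,
    learning_rate = 5e-5,            # main / LoRA parameters
    embedding_learning_rate = 5e-6,  # embed_tokens and lm_head train more slowly
)
```

These arguments would then be passed to the continued-pretraining trainer together with a model, tokenizer, and dataset.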

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported
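
A minimal usage sketch for the is_bfloat16_supported helper exported above; the TrainingArguments values are placeholders:

```python
# Pick fp16 vs bf16 automatically based on hardware support.
from unsloth import is_bfloat16_supported
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir = "outputs",
    per_device_train_batch_size = 2,
    fp16 = not is_bfloat16_supported(),
    bf16 = is_bfloat16_supported(),
)
```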

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens
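
The long run of tokenizer_utils.py updates below deals with untrained tokens. Below is a hedged illustration of the general idea (detect never-trained embedding rows and move them to the mean of the trained rows), not the exact logic added here; model is assumed to be an already-loaded causal LM:

```python
# Hedged illustration only: find embedding rows that look untrained (all zero)
# and re-initialise them to the mean of the trained rows, so newly added or
# unused tokens do not destabilise training. Not the exact unsloth logic.
import torch

embed_matrix = model.get_input_embeddings().weight.data   # model is assumed
untrained    = embed_matrix.abs().sum(dim = 1) == 0        # crude detection
if untrained.any():
    mean_row = embed_matrix[~untrained].mean(dim = 0)
    embed_matrix[untrained] = mean_row
```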

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (#630)

* Support revision parameter in FastLanguageModel.from_pretrained (#629)

* support `revision` parameter

* match unsloth formatting of named parameters
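
A hedged usage sketch for the revision parameter added in #629; the model name and revision value are placeholders:

```python
# Pin the checkpoint to a specific branch, tag, or commit on the Hugging Face Hub.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "unsloth/mistral-7b-v0.3",  # placeholder model
    max_seq_length = 2048,
    load_in_4bit   = True,
    revision       = "main",                     # any branch, tag, or commit hash
)
```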

* clears any selected_adapters before calling internal_model.save_pretrained (#609)

* Update __init__.py (#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (#345)

* Add saving to llama.cpp GGML in save.py.

* Fix the conversion command and the path of the convert-to-GGML function.

* Add autosaving of the LoRA to the GGML function

* Create a LoRA save function for conversion to GGML

* Test fix #2 for saving the LoRA

* Test fix #3 to save the LoRA adapters for conversion to GGML

* Remove unwanted tokenizer saving for conversion to GGML and add a few print statements.

* The tokenizer was needed for saving, so added it back; also made it more Unsloth-style by using positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to the older version of the code and added a few comments.

* Test fix 1 for arch

* Test fix 2 for a new Mistral error.

* Test fix 3

* Revert to the old version for testing.

* Upload issue test fix 1

* Fix 2 for uploading GGML

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (#619)

* llama.cpp failing (#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone it with the recursive option, it works.

Co-authored-by: Daniel Han <[email protected]>

* fix libcuda_dirs import for triton 3.0 (#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <[email protected]>
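
A hedged sketch of the Triton 3.0 compatibility fix referenced in #227: libcuda_dirs moved between Triton releases, so the import needs a fallback. The exact module paths below are assumptions about Triton's internal layout, not quoted from this commit:

```python
# Try the older Triton 2.x location first, then fall back to the Triton 3.0
# location. Both paths are assumptions; the fallback pattern is the point.
try:
    from triton.common.build import libcuda_dirs            # Triton 2.x (assumed)
except ImportError:
    from triton.backends.nvidia.driver import libcuda_dirs  # Triton 3.0 (assumed)
```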

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <[email protected]>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Update mistral.py

* Auto check rope scaling
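
The bullet above only notes that the RoPE-scaling check is now automatic. As a hedged illustration of linear RoPE scaling in general (not the exact llama.py logic), the factor can be derived from the requested and trained context lengths:

```python
# Hedged illustration: if the requested context exceeds the base model's
# trained context, derive a linear RoPE scaling factor; otherwise leave it off.
max_seq_length       = 4096   # requested context length (placeholder)
model_max_seq_length = 2048   # base model's trained context length (placeholder)

if max_seq_length > model_max_seq_length:
    rope_scaling = {"type": "linear",
                    "factor": max_seq_length / model_max_seq_length}
else:
    rope_scaling = None
```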

* Update llama.py

* Update llama.py

* Update llama.py

* GPU support

* Typo

* Update gemma.py

* gpu

* Multiple GGUF saving

* Update save.py

* Update save.py

* check PEFT and base

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update chat_templates.py

* Fix breaking bug in save.py with interpreting quantization_method as a string when saving to gguf (#651)

* Nightly (#649)

* Fix bug in save.py with interpreting quantization_method as a string that prevents GGUF from saving

* Implemented better list management but forgot to actually use the new list variable; fixed

* Check the type of the given quantization method and raise a type error if it is not a list or a string

* Update save.py

---------

Co-authored-by: Daniel Han <[email protected]>
Co-authored-by: Michael Han <[email protected]>
Co-authored-by: Eliot Hall <[email protected]>
Co-authored-by: Rickard Edén <[email protected]>
Co-authored-by: XiaoYang <[email protected]>
Co-authored-by: Oseltamivir <[email protected]>
Co-authored-by: mahiatlinux <[email protected]>
Co-authored-by: Sébastien De Greef <[email protected]>
Co-authored-by: Alberto Ferrer <[email protected]>
Co-authored-by: Thomas Viehmann <[email protected]>
Co-authored-by: Walter Korman <[email protected]>
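
A hedged usage sketch of the behaviour restored by the #651 fixes above: quantization_method may be a single string or a list, so one call can produce several GGUF files. The model name and method names are placeholders:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained("unsloth/mistral-7b-v0.3-bnb-4bit")

# Single method as a string:
model.save_pretrained_gguf("gguf_out", tokenizer, quantization_method = "q4_k_m")

# Several methods as a list (see the "Multiple GGUF saving" bullet above):
model.save_pretrained_gguf(
    "gguf_out", tokenizer,
    quantization_method = ["q4_k_m", "q8_0", "f16"],
)
```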

* Revert "Fix breaking bug in save.py with interpreting quantization_method as …" (#652)

This reverts commit 30605de.

* Revert "Revert "Fix breaking bug in save.py with interpreting quantization_me…" (#653)

This reverts commit e2b2083.

* Update llama.py

* peft

* patch

* Update loader.py

* retrain

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* offload

* Update llama.py

* Create a starter script for command-line training to integrate into ML ops pipelines. (#623)

* Update chat_templates.py

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Ollama

* Update chat_templates.py

* ollama

* Update mapper.py

* Update chat_templates.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Fixes

* clearer messages

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* log

* Update __init__.py

* Update llama.py

* Update __init__.py

---------

Co-authored-by: Michael Han <[email protected]>
Co-authored-by: Eliot Hall <[email protected]>
Co-authored-by: Rickard Edén <[email protected]>
Co-authored-by: XiaoYang <[email protected]>
Co-authored-by: Oseltamivir <[email protected]>
Co-authored-by: mahiatlinux <[email protected]>
Co-authored-by: Sébastien De Greef <[email protected]>
Co-authored-by: Alberto Ferrer <[email protected]>
Co-authored-by: Thomas Viehmann <[email protected]>
Co-authored-by: Walter Korman <[email protected]>
Co-authored-by: ArcadaLabs-Jason <[email protected]>
12 people committed Jun 21, 2024
1 parent 4af390e commit 933d9fe
Showing 3 changed files with 47 additions and 34 deletions.
29 changes: 18 additions & 11 deletions unsloth/__init__.py
@@ -17,17 +17,20 @@
import sys
from packaging.version import Version

# Define a list of modules to check
MODULES_TO_CHECK = ["bitsandbytes"]

# Check if any of the modules in the list have been imported
for module in MODULES_TO_CHECK:
if module in sys.modules:
raise ImportError(f"Unsloth: Please import Unsloth before {module}.")
pass
pass

# Currently only supports 1 GPU, or else seg faults will occur.
# # Define a list of modules to check
# MODULES_TO_CHECK = ["bitsandbytes"]

# # Check if any of the modules in the list have been imported
# for module in MODULES_TO_CHECK:
# if module in sys.modules:
# raise ImportError(f"Unsloth: Please import Unsloth before {module}.")
# pass
# pass

# Unsloth currently does not work on multi GPU setups - sadly we are a 2 brother team so
# enabling it will require much more work, so we have to prioritize. Please understand!
# We do have a beta version, which you can contact us about!
# Thank you for your understanding and we appreciate it immensely!
if "CUDA_VISIBLE_DEVICES" in os.environ:
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
devices = os.environ["CUDA_VISIBLE_DEVICES"]
@@ -36,6 +39,10 @@
first_id = devices.split(",")[0]
warnings.warn(
f"Unsloth: 'CUDA_VISIBLE_DEVICES' is currently {devices} \n"\
"Unsloth currently does not work on multi GPU setups - sadly we are a 2 brother team so "\
"enabling it will require much more work, so we have to prioritize. Please understand!"\
"We do have a beta version, which you can contact us about!\n"\
"Thank you for your understanding and we appreciate it immensely!\n\n"\
"Multiple CUDA devices detected but we require a single device.\n"\
f"We will override CUDA_VISIBLE_DEVICES to first device: {first_id}."
)
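
A hedged sketch of how to work with the single-GPU requirement enforced above: pin one device before importing unsloth so this check never has to override anything. The device index is a placeholder:

```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # expose exactly one GPU (placeholder index)

import unsloth  # safe: only a single device is visible, so no warning or override
```
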
45 changes: 24 additions & 21 deletions unsloth/models/llama.py
@@ -1165,10 +1165,10 @@ def from_pretrained(
inner_training_loop = Trainer._original_training_loop
except:
raise RuntimeError(
"Our OSS was designed for people with few GPU resources to level the playing field.\n"
"The OSS Apache 2 license only supports one GPU - please obtain a commercial license.\n"
"We're a 2 person team, so we still have to fund our development costs - thanks!\n"
"If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
'Unsloth currently does not work on multi GPU setups - sadly we are a 2 brother team so '\
'enabling it will require much more work, so we have to prioritize. Please understand!\n'\
'We do have a separate beta version, which you can contact us about!\n'\
'Thank you for your understanding and we appreciate it immensely!'
)
pass

@@ -1201,7 +1201,10 @@ def from_pretrained(
output = re.findall(rb'([\\d]{1,})[\\s]{1,}M', output)
output = sum(int(x.decode('utf-8'))/1024 > 4 for x in output)
if output > 1: raise RuntimeError(
'Error: More than 1 GPUs have a lot of VRAM usage. Please obtain a commercial license.')
'Unsloth currently does not work on multi GPU setups - sadly we are a 2 brother team so '\\
'enabling it will require much more work, so we have to prioritize. Please understand!\\n'\\
'We do have a separate beta version, which you can contact us about!\\n'\\
'Thank you for your understanding and we appreciate it immensely!')
for _ in range(3):
gc.collect()
torch.cuda.empty_cache()"""
@@ -1214,10 +1217,10 @@ def from_pretrained(
args.gradient_accumulation_steps // self._train_batch_size
if n_total_devices > 1:
logger.warning_once(
"* Our OSS was designed for people with few GPU resources to level the playing field.\\n"
"* The OSS Apache 2 license only supports one GPU - please obtain a commercial license.\\n"
"* We're a 2 person team, so we still have to fund our development costs - thanks!\\n"
"* If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
'* Unsloth currently does not work on multi GPU setups - sadly we are a 2 brother team so ' \\
'* enabling it will require much more work, so we have to prioritize. Please understand!\\n' \\
'* We do have a separate beta version, which you can contact us about!\\n'\\
'* Thank you for your understanding and we appreciate it immensely!'
)
debug_info ="""
debug_info = debug_info.split('\n')
@@ -1244,10 +1247,10 @@ def from_pretrained(
n_total_devices = total_batches // ga // bsz
if n_total_devices > 1:
logger.warning_once(
"* Our OSS was designed for people with few GPU resources to level the playing field.\\n"
"* The OSS Apache 2 license only supports one GPU - please obtain a commercial license.\\n"
"* We're a 2 person team, so we still have to fund our development costs - thanks!\\n"
"* If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
'* Unsloth currently does not work on multi GPU setups - sadly we are a 2 brother team so ' \\
'* enabling it will require much more work, so we have to prioritize. Please understand!\\n' \\
'* We do have a separate beta version, which you can contact us about!\\n'\\
'* Thank you for your understanding and we appreciate it immensely!'
)
divisor = n_total_devices / 1
bsz = self._train_batch_size = max(int(bsz / divisor), 1)
@@ -1273,10 +1276,10 @@ def from_pretrained(
)
if "n_total_devices >" not in inner_training_loop:
raise RuntimeError(
"Our OSS was designed for people with few GPU resources to level the playing field.\n"
"The OSS Apache 2 license only supports one GPU - please obtain a commercial license.\n"
"We're a 2 person team, so we still have to fund our development costs - thanks!\n"
"If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
'Unsloth currently does not work on multi GPU setups - sadly we are a 2 brother team so '\
'enabling it will require much more work, so we have to prioritize. Please understand!\n'\
'We do have a separate beta version, which you can contact us about!\n'\
'Thank you for your understanding and we appreciate it immensely!'
)
pass
inner_training_loop = inner_training_loop.replace(
Expand Down Expand Up @@ -1783,10 +1786,10 @@ def patch_peft_model(
from transformers.trainer import Trainer
if Trainer._inner_training_loop.__name__ != "_fast_inner_training_loop":
raise RuntimeError(
"Our OSS was designed for people with few GPU resources to level the playing field.\n"
"The OSS Apache 2 license only supports one GPU - please obtain a commercial license.\n"
"We're a 2 person team, so we still have to fund our development costs - thanks!\n"
"If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
'Unsloth currently does not work on multi GPU setups - sadly we are a 2 brother team so '\
'enabling it will require much more work, so we have to prioritize. Please understand!\n'\
'We do have a separate beta version, which you can contact us about!\n'\
'Thank you for your understanding and we appreciate it immensely!'
)
pass

7 changes: 5 additions & 2 deletions unsloth/tokenizer_utils.py
@@ -954,7 +954,7 @@ def patch_sft_trainer_tokenizer():
"\n"\
"if self._inner_training_loop.__name__ != '_fast_inner_training_loop':\n"\
" raise RuntimeError(\n"\
" 'Do not edit specific areas of the Unsloth codebase or you will get CUDA segfaults.'\n"\
" 'Please do not edit specific areas of the Unsloth codebase or you will get CUDA segfaults.'\n"\
" )\n"\
"pass\n"\
"n_devices = torch.cuda.device_count()\n"\
@@ -964,7 +964,10 @@
"output = re.findall(rb'([\\d]{1,})[\\s]{1,}M', output)\n"\
"output = sum(int(x.decode('utf-8'))/1024 > 4 for x in output)\n"\
"if output > 1: raise RuntimeError(\n"\
" 'Error: More than 1 GPUs have a lot of VRAM usage. Please obtain a commercial license.')\n"\
" 'Unsloth currently does not work on multi GPU setups - sadly we are a 2 brother team so '\\\n"\
" 'enabling it will require much more work, so we have to prioritize. Please understand!\\n'\\\n"\
" 'We do have a separate beta version, which you can contact us about!\\n'\\\n"\
" 'Thank you for your understanding and we appreciate it immensely!')\n"\
"for _ in range(3):\n"\
" gc.collect()\n"\
" torch.cuda.empty_cache()\n"\
