Training not working with default script #1698
Comments
Hi. I am facing the same issue. Any progress yet? |
I have an interesting observation here. I tried running this in a lightning.ai Studio and it works. The litgpt version there is 0.3.0.dev0, which is not listed at https://pypi.org/project/litgpt/#history, so it seems to be a dev version rolled out internally. I am facing the above issue on version 0.4.11. |
Can you please dump the source code and upload it somewhere?

import os
import shutil
import sys


def dump_package_source(package_name, output_dir):
    """Copy the installed package's source directory into output_dir."""
    try:
        # Import the package to locate its directory on disk.
        package = __import__(package_name)
        package_path = os.path.dirname(package.__file__)
        destination_dir = os.path.join(output_dir, package_name)
        # copytree raises an error if destination_dir already exists.
        shutil.copytree(package_path, destination_dir)
        print(f"Source code of the package '{package_name}' has been dumped to '{destination_dir}'")
    except ImportError:
        print(f"Package '{package_name}' is not installed.")
    except Exception as e:
        print(f"An error occurred: {str(e)}")


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python dump_package.py <package_name> <output_dir>")
    else:
        package_name = sys.argv[1]
        output_dir = sys.argv[2]
        dump_package_source(package_name, output_dir)
|
Your litdata version may be too high for litgpt. |
Thanks for the note. I was out for the last 2 weeks and haven't had a chance to look into it yet. |
I just tested it in a Studio on both CPU and GPU, and it seemed to work fine:
Progress: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.05it/s]
Workers are finished.
Finished data processing!
Verifying settings ...
Measured TFLOPs: 7.93
Epoch 8 | iter 128 step 1 | loss train: 10.943, val: n/a | iter time: 445.72 ms (step) remaining time: 0:08:16
Epoch 16 | iter 256 step 2 | loss train: 9.708, val: n/a | iter time: 364.51 ms (step) remaining time: 0:05:51
This was with versions installed from the latest main branch:
Could you let me know which LitData version you were using? You can use |
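For anyone following along, here is a minimal sketch of one way to report the installed versions. It is not necessarily the command the comment above was referring to; it only relies on the standard-library `importlib.metadata` module, and the helper name `installed_versions` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package_name: version string, or None if not installed}.

    Hypothetical helper for this thread; not part of LitGPT or LitData.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(["litgpt", "litdata"]).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Pasting the printed lines into a comment makes it easier to compare environments where training works against ones where it fails.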
There might be a LitData bug. I was getting an error with the LitGPT code and a simpler self-contained example. Reported it here: Lightning-AI/litdata#367 |
litdata==0.2.24 |
Thanks! |
I'm getting the same error, @rasbt:
I'm a big fan of litgpt (as well as nanoGPT), because they are minimal / clean implementations. Should we consider having an option to run this small example (pythia-160m) without relying on litdata? |
Thanks for the kind comment. Ideally, it would be nice to keep LitData here because then we don't have to maintain two implementations, one for the small scale and one for the larger scale experiments. A contributor to LitData mentioned that they are looking into this issue, so hopefully it gets fixed soon. |
Hi, any resolution to this on Linux systems? |
I'm also getting this error. What's the current solution? Which older version works? I tried downgrading and it still did not work. |
Bug description
I tried to train using the example script in the README but I get an error. The script is the following:
Here's the entire output from terminal:
What operating system are you using?
Linux
LitGPT Version