🐛 Bug

The View Experiment screen shows 100% progress, but the workflow is not done. Looking at the CLI output from where the backend was launched, it shows the run is still progressing through validation.

Snippet:

INFO: validation progress: 86%|########6 | 57/66 [3:02:13<30:28, 203.12s/it]

Ideally the screen would not show 100%, but the actual progress taking validation into account, or it should at least indicate somewhere that the run is still in the validation stage. Right now, one has to look at a log file to see what is still going on.
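A minimal sketch of the kind of display being asked for. The function and field names here are assumptions for illustration, not the actual H2O LLM Studio code: the idea is to compute the percentage over the combined training + validation step budget and to label the current stage.

```python
# Hypothetical sketch (names are assumptions, not the real API):
# report progress over the combined step budget and expose the
# current stage instead of a bare percentage.

def format_progress(train_done: int, train_total: int,
                    val_done: int, val_total: int) -> str:
    """Return a progress string that accounts for validation steps."""
    done = train_done + val_done
    total = train_total + val_total
    pct = 100.0 * done / total if total else 0.0
    # Once all training steps are consumed, the run is in validation.
    stage = "validation" if train_done >= train_total else "training"
    return f"{pct:.0f}% ({stage})"
```

With numbers like those in the snippet (57/66 validation steps done after training finished), this would render something like "99% (validation)" rather than a flat 100%.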
To Reproduce
I created an experiment to do a training exercise with default values, except:
LLM Backbone = meta-llama/Meta-Llama-3.1-8B-Instruct
Train Data: oasst
Start it up.
LLM Studio version
v1.13.0-dev
Thank you for reporting. This is likely just a rounding issue, as we consider the total steps for both training and validation when calculating the progress made.
Depending on the metric, the validation steps may take a while to run through, which can show up as a rounded 100% while significant time still remains.
For a better user experience, we could consider capping the progress at 99% as long as validation hasn't fully finished. Even though the number would be "wrong", it might better reflect what the user is seeing and expecting.
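The proposed cap could look like the following. This is a sketch under assumptions (hypothetical names, not the shipped implementation): rounding may push the displayed percentage to 100 while validation steps remain, so the display value is held at 99 until validation completes.

```python
# Sketch of the proposed cap (assumption, not the actual implementation):
# never display 100% while validation is still running, even when
# rounding would otherwise push the percentage up to 100.

def display_percent(done_steps: int, total_steps: int,
                    validation_finished: bool) -> int:
    pct = round(100 * done_steps / total_steps) if total_steps else 0
    if not validation_finished:
        pct = min(pct, 99)  # hold at 99% until validation completes
    return pct
```

For example, 1064 of 1066 total steps rounds to 100%, but the cap would keep the UI at 99% until the validation loop finishes.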