
[BUG] Experiment Progress shows 100% but status Running with an ETA > 0 #854

Closed
bagelbig opened this issue Sep 13, 2024 · 2 comments · Fixed by #861
Labels: type/bug (Bug in code), type/good first issue (Good for newcomers)

Comments

@bagelbig

🐛 Bug

The View Experiment screen shows 100% progress, but the workflow is not done. The CLI output from the terminal where the backend was launched shows that validation is still in progress.

Snippet:
INFO: validation progress: 86%|########6 | 57/66 [3:02:13<30:28, 203.12s/it]

Ideally it would not show 100% but the actual progress, taking validation into account, or it should at least indicate somewhere that the experiment is still in the validation stage. Right now, one has to check a log file to see what is still going on.

To Reproduce

I created an experiment to do a training exercise, default values, except:
LLM Backbone = meta-llama/Meta-Llama-3.1-8B-Instruct
Train Data: oasst

Start it up.

LLM Studio version

v1.13.0-dev

@bagelbig bagelbig added the type/bug Bug in code label Sep 13, 2024
@pascal-pfeiffer
Collaborator

Thank you for reporting. This is likely only a rounding issue, as we consider the total steps for both training and validation when calculating the progress that was made.

import numpy as np

# combined progress across the training and validation phases
curr_total_step = curr_step + curr_val_step
total_steps = max(total_training_steps + total_validation_steps, 1)

info["progress"].append(f"{np.round(curr_total_step / total_steps, 2)}")

Depending on the metric, the validation steps may take a while to run through, which shows up as a rounded 100% while significant time still remains for the experiment.
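To illustrate the rounding issue, here is a minimal sketch with hypothetical step counts (the actual counts depend on the experiment configuration): once the training steps are done and validation is nearly complete, rounding to two decimals already yields 1.0 even though several validation steps remain.

```python
import numpy as np

# Hypothetical step counts chosen to reproduce the issue:
# training is finished, 5 of 66 validation steps remain.
curr_step, curr_val_step = 1000, 61
total_training_steps, total_validation_steps = 1000, 66

curr_total_step = curr_step + curr_val_step          # 1061
total_steps = max(total_training_steps + total_validation_steps, 1)  # 1066

# 1061 / 1066 ≈ 0.9953, which rounds to 1.0 at two decimals
print(np.round(curr_total_step / total_steps, 2))  # 1.0
```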

@pascal-pfeiffer
Collaborator

pascal-pfeiffer commented Sep 13, 2024

For a better user experience, we could consider capping the progress at 99% as long as validation hasn't fully finished. Even though the number is "wrong", it might better reflect what the user is seeing and expecting.

Or use flooring instead of rounding.
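Both suggestions could be combined in a small helper; this is only a sketch of the proposed fix, not the actual implementation, and the function name and signature are hypothetical:

```python
import numpy as np

def progress_fraction(curr_step, curr_val_step,
                      total_training_steps, total_validation_steps):
    """Progress in [0, 1] that only reaches 1.0 once every step is done."""
    curr_total_step = curr_step + curr_val_step
    total_steps = max(total_training_steps + total_validation_steps, 1)
    # floor to two decimals instead of rounding, so 0.9953 becomes 0.99
    pct = np.floor(100 * curr_total_step / total_steps) / 100
    # additionally cap at 0.99 until validation has fully finished
    if curr_total_step < total_steps:
        pct = min(pct, 0.99)
    return pct
```

With the same hypothetical counts as above, `progress_fraction(1000, 61, 1000, 66)` stays at 0.99, and only the final validation step brings it to 1.0.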
