
[BUG] Experiment Progress shows 100% but status Running with an ETA > 0 #854

Closed
bagelbig opened this issue Sep 13, 2024 · 2 comments · Fixed by #861
Labels: type/bug (Bug in code), type/good first issue (Good for newcomers)

Comments

@bagelbig

🐛 Bug

The View Experiment screen shows 100% progress, but the workflow is not done. The CLI output from the terminal where the backend was launched shows that validation is still in progress.

Snippet:
INFO: validation progress: 86%|########6 | 57/66 [3:02:13<30:28, 203.12s/it]

Ideally it would not show 100% but the actual progress, taking validation into account, or it should at least indicate somewhere that the experiment is still in the validation stage. Right now, one has to check a log file to see what is still going on.

To Reproduce

I created an experiment to do a training exercise, default values, except:
LLM Backbone = meta-llama/Meta-Llama-3.1-8B-Instruct
Train Data: oasst

Start it up.

LLM Studio version

v1.13.0-dev

@bagelbig bagelbig added the type/bug Bug in code label Sep 13, 2024
@pascal-pfeiffer
Collaborator

Thank you for reporting. This is likely only a rounding issue, as we consider the total steps for both training and validation when calculating the progress that was made.

import numpy as np

# combined progress across the training and validation phases
curr_total_step = curr_step + curr_val_step
total_steps = max(total_training_steps + total_validation_steps, 1)

info["progress"].append(f"{np.round(curr_total_step / total_steps, 2)}")

Depending on the metric, the validation steps may take a while to run through, which shows up as a rounded 100% while significant time still remains for the experiment.
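To illustrate the rounding issue, here is a minimal sketch with hypothetical step counts (the actual counts depend on the experiment configuration): once the training steps are done and validation is nearly complete, rounding to two decimals already yields 1.0 even though several validation steps remain.

```python
import numpy as np

# Hypothetical step counts chosen to reproduce the issue:
# training is finished, 5 of 66 validation steps remain.
curr_step, curr_val_step = 1000, 61
total_training_steps, total_validation_steps = 1000, 66

curr_total_step = curr_step + curr_val_step          # 1061
total_steps = max(total_training_steps + total_validation_steps, 1)  # 1066

# 1061 / 1066 ≈ 0.9953, which rounds to 1.0 at two decimals
print(np.round(curr_total_step / total_steps, 2))  # 1.0
```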

@pascal-pfeiffer
Collaborator

pascal-pfeiffer commented Sep 13, 2024

For a better user experience, we could consider capping the progress at 99% as long as validation hasn't fully finished. Even though the number is "wrong", it might better reflect what the user is seeing and expecting.

Or use flooring instead of rounding.
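Both suggestions could be combined in a small helper; this is only a sketch of the proposed fix, not the actual implementation, and the function name and signature are hypothetical:

```python
import numpy as np

def progress_fraction(curr_step, curr_val_step,
                      total_training_steps, total_validation_steps):
    """Progress in [0, 1] that only reaches 1.0 once every step is done."""
    curr_total_step = curr_step + curr_val_step
    total_steps = max(total_training_steps + total_validation_steps, 1)
    # floor to two decimals instead of rounding, so 0.9953 becomes 0.99
    pct = np.floor(100 * curr_total_step / total_steps) / 100
    # additionally cap at 0.99 until validation has fully finished
    if curr_total_step < total_steps:
        pct = min(pct, 0.99)
    return pct
```

With the same hypothetical counts as above, `progress_fraction(1000, 61, 1000, 66)` stays at 0.99, and only the final validation step brings it to 1.0.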
