
Introduce prepare for eval, fix evaluation bug #789

Open · wants to merge 5 commits into main
Conversation

@Niccolo-Ajroldi commented Sep 15, 2024

Description

This pull request introduces a prepare_for_eval function and updates the code to support it.

The implementation follows the blueprint laid out by @fsschneider in #719 (comment) and fixes the bug where a submission that exceeds max_runtime still receives a free evaluation (see, again, #719 (comment)).

Function signature

The arguments of prepare_for_eval are the same as those of update_params, except for batch: prepare_for_eval should be agnostic to the last batch used during training. The return type is the same as that of update_params.

from typing import List, Tuple

def prepare_for_eval(workload: spec.Workload,
                     current_param_container: spec.ParameterContainer,
                     current_params_types: spec.ParameterTypeTree,
                     model_state: spec.ModelAuxiliaryState,
                     hyperparameters: spec.Hyperparameters,
                     loss_type: spec.LossType,
                     optimizer_state: spec.OptimizerState,
                     eval_results: List[Tuple[int, float]],
                     global_step: int,
                     rng: spec.RandomState) -> spec.UpdateReturn:
  # Default no-op implementation: return the current state unchanged.
  return (optimizer_state, current_param_container, model_state)
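The default implementation above is a no-op. To illustrate what the hook enables (a hypothetical example, not part of this PR), a submission that maintains an exponential moving average (EMA) of its weights could use prepare_for_eval to swap the averaged parameters in before evaluation. A minimal sketch, assuming a dict-like optimizer_state with an assumed 'ema_params' entry:

def prepare_for_eval(workload, current_param_container, current_params_types,
                     model_state, hyperparameters, loss_type, optimizer_state,
                     eval_results, global_step, rng):
  # Hypothetical: evaluate on the EMA weights instead of the raw training
  # weights. 'ema_params' is an assumed key maintained by update_params.
  ema_params = optimizer_state.get('ema_params', current_param_container)
  return (optimizer_state, ema_params, model_state)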

List of changes

In submission_runner.py:

  • add a timed call to prepare_for_eval (see the sketch after this list)
  • add a profiler
  • move del batch to before prepare_for_eval (instead of before evaluation)
  • update accumulated_submission_time after prepare_for_eval
  • compute is_time_remaining after prepare_for_eval
  • proceed to evaluation iff is_time_remaining
  • add prep_eval_rng
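
Taken together, the relevant part of the training loop would look roughly like the sketch below (simplified; prep_eval_rng, accumulated_submission_time, and is_time_remaining follow this PR, while the timer and the eval_model arguments are placeholders):

import time

del batch  # Release the last training batch before prepare_for_eval.

prep_start = time.monotonic()  # Placeholder timer; the runner has its own helpers.
optimizer_state, model_params, model_state = submission.prepare_for_eval(
    workload, model_params, model_params_types, model_state, hyperparameters,
    workload.loss_type, optimizer_state, eval_results, global_step,
    prep_eval_rng)
accumulated_submission_time += time.monotonic() - prep_start

# Charging prepare_for_eval against the budget closes the loophole: a
# submission that runs out of max_runtime here no longer gets a free eval.
is_time_remaining = accumulated_submission_time < max_allowed_runtime_sec
if is_time_remaining:
  eval_result = workload.eval_model(...)  # Arguments elided in this sketch.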

Minor changes:

  • add PrepareForEvalFn to spec (sketched after this list)
  • add prepare_for_eval to the submission template
  • add prepare_for_eval to all PyTorch and JAX submissions
  • update the docs
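
On the spec side, the new PrepareForEvalFn alias can mirror the existing UpdateParamsFn Callable, minus the batch argument. A sketch of what it might look like inside spec.py, where the referenced classes are defined (the exact form in the PR may differ):

from typing import Callable, List, Tuple

PrepareForEvalFn = Callable[[
    Workload,                 # workload
    ParameterContainer,       # current_param_container
    ParameterTypeTree,        # current_params_types
    ModelAuxiliaryState,      # model_state
    Hyperparameters,          # hyperparameters
    LossType,                 # loss_type
    OptimizerState,           # optimizer_state
    List[Tuple[int, float]],  # eval_results
    int,                      # global_step
    RandomState,              # rng
], UpdateReturn]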

Fixes #719 and #758.

@Niccolo-Ajroldi requested a review from a team as a code owner September 15, 2024 11:32
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

Development

Successfully merging this pull request may close these issues.

Inform submission about evaluation step