
[air] Deprecate MlflowTrainableMixin, move to setup_mlflow() function #31295

Merged (24 commits) on Jan 5, 2023

Conversation

@krfricke (Contributor) commented Dec 22, 2022

Why are these changes needed?

Following #29828 (which replaced the wandb_mixin decorator with setup_wandb()), this PR introduces a setup_mlflow() function to replace the previous mlflow mixin.

The current mlflow mixin can't be used with Ray AIR trainers. By adding a setup_mlflow() function that can simply be called in (data parallel) function trainers, we follow the recent changes to wandb and provide a similar interface.

setup_mlflow() also follows the wandb integration in acting on the configuration dict.

Lastly, this PR moves AIR integration tests to the correct subdirectory and combines the old test_mlflow.py into the new test_integration_mlflow.py.
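The calling pattern described above can be sketched as follows. Note this is an illustrative stub, not the real API: `setup_mlflow` here is a stand-in for `ray.air.integrations.mlflow.setup_mlflow`, so the snippet only shows the shape of a function trainer that configures MLflow from its config dict:

```python
# Stand-in stub for the real setup_mlflow(); records metric calls
# instead of talking to an MLflow tracking server.
class _FakeMlflow:
    def __init__(self):
        self.metrics = []

    def log_metric(self, key, value, step=None):
        self.metrics.append((key, value, step))


def setup_mlflow(config):
    # The real function would read MLflow settings from `config`
    # and return a configured mlflow handle.
    return _FakeMlflow()


def train_func(config):
    # Runs once per (data parallel) worker; acts on the config dict.
    mlflow = setup_mlflow(config)
    for epoch in range(config["epochs"]):
        loss = 1.0 / (epoch + 1)
        mlflow.log_metric("loss", loss, step=epoch)
    return mlflow


result = train_func({"epochs": 3})
```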

Related issue number

Closes #27966

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Kai Fricke added 9 commits December 1, 2022 22:09
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Kai Fricke added 5 commits December 22, 2022 11:12
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
@Yard1 (Member) commented Jan 3, 2023

Should we raise a warning if both the callback and setup_mlflow are used?

Signed-off-by: Kai Fricke <[email protected]>
@krfricke (Contributor, Author) commented Jan 3, 2023

It's hard to detect (as we are in different processes). Maybe it even works? :-D

@Yard1 (Member) commented Jan 3, 2023

It probably will, but I imagine it could be an easy mistake to make (I doubt you'd want to log the same metrics twice). Still, saying so in the docs should be enough.


@PublicAPI(stability="alpha")
def setup_mlflow(
    config: Optional[Dict] = None, rank_zero_only: bool = True, **kwargs
A reviewer (Contributor) commented:

why does config need to be passed in here if all the configuration is passed in through kwargs? Is this primarily for backwards compatibility?

@amogkam (Contributor) commented Jan 3, 2023:

I would prefer to actually have all the configuration args listed explicitly in the API instead of kwargs, and don't require that any configurations need to be passed in via the config dict.

@krfricke (Contributor, Author) replied:

I had this in a previous iteration; the reason I went with config was mainly to keep consistency with setup_wandb.

I'm happy to move back to pure kwargs and not parse a config, but I think we should then also do this for Weights & Biases.

@krfricke (Contributor, Author) replied:

I guess the main reason we do this for W&B is that we pass the other config parameters as run metadata (wandb.init takes a config parameter). We could do the same in mlflow to justify the config.

@krfricke (Contributor, Author) replied:

I've updated the PR to also log parameters. Let's chat offline about which API we want to go with.

@krfricke (Contributor, Author) replied:

Context:

  • We got rid of the @wandb_mixin decorator in favor of a setup_wandb(config) function for W&B
  • We're doing the same for MLflow
  • The question is whether we should use a config-based API, setup_mlflow(config), or an explicit API, setup_mlflow(experiment_name, experiment_id, tracking_uri, …)

Pro config:

  • Same way we used to configure the mixins
  • Required for W&B so it logs the config to the run (otherwise users have to do this manually; they are used to passing it to wandb.init())
  • We should be consistent between wandb and mlflow mixins

Pro args:

  • Explicit about which args are supported
  • Passing the mixin configs via config was a workaround because this was the easiest way to pass these settings and still retain some flexibility. But with the explicit setup_xyz() APIs this workaround isn’t needed anymore

Proposal:

  • How about we go with setup_mlflow(config, experiment_name, experiment_id) (and similarly for wandb)? If an mlflow or wandb config key is set, we print a deprecation warning and tell people to pass the configs as proper arguments instead.
  • This will get rid of the “special” config keys but we can still log the rest of the config automatically.
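The proposal above could look roughly like the following. This is a hypothetical sketch, not the actual Ray implementation: `resolve_mlflow_settings` is an invented helper showing the suggested merge order (explicit kwargs win, a legacy "mlflow" config key is still honored but triggers a deprecation warning, and the rest of the config remains available for logging):

```python
import warnings


def resolve_mlflow_settings(config=None, **explicit):
    # Hypothetical helper sketching the proposed behavior.
    config = dict(config or {})
    legacy = config.pop("mlflow", {})  # "special" config key, deprecated
    if legacy:
        warnings.warn(
            "Pass MLflow settings as arguments to setup_mlflow() "
            "instead of via an 'mlflow' config key",
            DeprecationWarning,
        )
    # Explicit arguments override legacy config-key values.
    settings = {**legacy, **explicit}
    # The remaining config can still be logged automatically as run params.
    return settings, config


settings, params = resolve_mlflow_settings(
    {"lr": 0.01, "mlflow": {"experiment_name": "exp"}},
    tracking_uri="file:///tmp/mlruns",
)
```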

@krfricke krfricke requested a review from amogkam January 4, 2023 15:32
@amogkam (Contributor) left a comment:

Thanks @krfricke, lgtm! Would be good to get another review from @xwjiang2010, @Yard1, or @justinvyu before merging.

@Yard1 (Member) left a comment:

LGTM! just some nits

Co-authored-by: Amog Kamsetty <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
@krfricke krfricke merged commit 5817391 into ray-project:master Jan 5, 2023
@krfricke krfricke deleted the air/mlflow-setup branch January 5, 2023 14:47
AmeerHajAli pushed a commit that referenced this pull request Jan 12, 2023
…#31295)

tamohannes pushed a commit to ju2ez/ray that referenced this pull request Jan 25, 2023
…ray-project#31295)

@SebastianBodza (Contributor) commented:
Working with MLflow Projects in combination with Ray is quite buggy.

When running an MLflow project, an MLflow session is started. Ray then puts all MLflow logs into the same run, so the losses, parameters, etc. are simply overwritten. Same as in #19909.

There are also, imho, some errors in the setup. When trying to put everything in separate runs by calling setup_mlflow(config, rank_zero_only=False), an error is raised saying that the experiment with the Tune trial name does not exist. This is a bug in my opinion: the experiment_name and run_name are mixed up.

My suggestion is:

  • If no experiment name is provided, use the default experiment (id=0)
  • If an experiment name is provided, actually use that name and do not overwrite it with the default_trial_name
  • Rework get_world_rank, because this led to a RuntimeError even though we were in a train session

@SebastianBodza (Contributor) commented Mar 2, 2023:

Ok, this seems to be an instance of the question raised above:

The question is whether we should use a config-based API, setup_mlflow(config), or an explicit API, setup_mlflow(experiment_name, experiment_id, tracking_uri, …)

Currently the config dict has the lowest priority. This means that when any defaults are generated in the code itself, the config dict is ignored. An example is the following:

    # Defaults are filled in before the config dict is ever consulted:
    experiment_id = experiment_id or default_trial_id
    experiment_name = experiment_name or default_trial_name

    # Setup mlflow. Since experiment_id/experiment_name are now always
    # truthy, the `or mlflow_config.get(...)` fallbacks below never fire.
    mlflow_util = _MLflowLoggerUtil()
    mlflow_util.setup_mlflow(
        tracking_uri=tracking_uri or mlflow_config.get("tracking_uri", None),
        registry_uri=registry_uri or mlflow_config.get("registry_uri", None),
        experiment_id=experiment_id or mlflow_config.get("experiment_id", None),
        experiment_name=experiment_name or mlflow_config.get("experiment_name", None),
        tracking_token=tracking_token or mlflow_config.get("tracking_token", None),
        create_experiment_if_not_exists=create_experiment_if_not_exists,
    )
    # Note: run_name is set from experiment_name, conflating the two concepts.
    mlflow_util.start_run(
        run_name=experiment_name,
        tags=tags or mlflow_config.get("tags", None),
        set_active=True,
    )
    mlflow_util.log_params(_config)

There the experiment_id and experiment_name are set to the trial_id and trial_name. Imho there is a mix-up of experiment_name and run_name. Defaulting the run_name to the trial_name makes sense to me, but for the experiment_name it does not. We should stick there to experiment_id=0 (the default experiment for MLflow).

Setting the experiment_id to the default_trial_id makes no sense, because Ray's naming convention is not the same as MLflow's.

We should also switch the order of the or statements, so that the config dict gets higher priority.

Lastly, since Tune is configured with dicts, I would stick to the call with a dictionary. Additionally, as with wandb, the config is logged as a dict.
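The priority problem described above can be reproduced in isolation (the names and values here are illustrative, not taken from the Ray code):

```python
# Because defaults are applied before the `or` chain that consults the
# config dict, the config value can never win.
default_trial_name = "trial_0001"
config = {"experiment_name": "my_experiment"}

experiment_name = None                                   # nothing passed explicitly
experiment_name = experiment_name or default_trial_name  # default fills in first
resolved = experiment_name or config.get("experiment_name")
# resolved is "trial_0001"; config["experiment_name"] is shadowed
```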

@SebastianBodza SebastianBodza mentioned this pull request Mar 2, 2023
amogkam pushed a commit that referenced this pull request Mar 28, 2023
…ks (#33782)

These were soft deprecated in favor of their AIR counterparts in the following PRs:

(Ray 2.2)  #29828
(Ray 2.3)  #31295
For Ray 2.4, these will be hard deprecated.

---------

Signed-off-by: Matthew Deng <[email protected]>
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
…ks (ray-project#33782)

ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
…ks (ray-project#33782)

Development

Successfully merging this pull request may close these issues.

[AIR/Tune] Deprecate mixins and replace with setup functions
4 participants