
[air] Deprecate MlflowTrainableMixin, move to setup_mlflow() function #31295

Merged (24 commits) on Jan 5, 2023

Conversation

@krfricke (Contributor) commented Dec 22, 2022

Why are these changes needed?

Following #29828 (which replaced the wandb_mixin decorator with setup_wandb()), this PR introduces a setup_mlflow() function to replace the previous mlflow mixin.

The current mlflow mixin can't be used with Ray AIR trainers. By adding a setup_mlflow() function that can simply be called in (data parallel) function trainers, we follow the recent changes to wandb and provide a similar interface.

setup_mlflow() also follows the wandb integration in acting on the configuration dict.

Lastly, this PR moves AIR integration tests to the correct subdirectory and combines the old test_mlflow.py into the new test_integration_mlflow.py.
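The calling pattern described above can be sketched as follows. Note this is an illustrative stub, not the real API: `setup_mlflow` here is a stand-in for `ray.air.integrations.mlflow.setup_mlflow`, so the snippet only shows the shape of a function trainer that configures MLflow from its config dict:

```python
# Stand-in stub for the real setup_mlflow(); records metric calls
# instead of talking to an MLflow tracking server.
class _FakeMlflow:
    def __init__(self):
        self.metrics = []

    def log_metric(self, key, value, step=None):
        self.metrics.append((key, value, step))


def setup_mlflow(config):
    # The real function would read MLflow settings from `config`
    # and return a configured mlflow handle.
    return _FakeMlflow()


def train_func(config):
    # Runs once per (data parallel) worker; acts on the config dict.
    mlflow = setup_mlflow(config)
    for epoch in range(config["epochs"]):
        loss = 1.0 / (epoch + 1)
        mlflow.log_metric("loss", loss, step=epoch)
    return mlflow


result = train_func({"epochs": 3})
```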

Related issue number

Closes #27966

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Kai Fricke added 9 commits December 1, 2022 22:09
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Kai Fricke added 5 commits December 22, 2022 11:12
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
@Yard1 (Member) commented Jan 3, 2023

Should we raise a warning if both the callback and setup_mlflow are used?

Signed-off-by: Kai Fricke <[email protected]>
@krfricke (Contributor, Author) commented Jan 3, 2023

It's hard to detect (as we are in different processes). Maybe it even works? :-D

@Yard1 (Member) commented Jan 3, 2023

It probably will, but I imagine it could be an easy mistake to make (I doubt you'd want to log the same metrics twice). Still, saying so in the docs should be enough.


@PublicAPI(stability="alpha")
def setup_mlflow(
    config: Optional[Dict] = None, rank_zero_only: bool = True, **kwargs
A reviewer (Contributor) commented:

why does config need to be passed in here if all the configuration is passed in through kwargs? Is this primarily for backwards compatibility?

@amogkam (Contributor) commented Jan 3, 2023:

I would prefer to actually have all the configuration args listed explicitly in the API instead of kwargs, and don't require that any configurations need to be passed in via the config dict.

@krfricke (Contributor, Author) replied:

I had this in a previous iteration; the reason I went with config was mainly to keep consistency with setup_wandb.

I'm happy to move back to pure kwargs and not parse a config, but I think we should then also do this for Weights & Biases.

@krfricke (Contributor, Author) replied:

I guess the main reason we do this for W&B is that we pass the other config parameters as run metadata (wandb.init takes a config parameter). We could do the same in mlflow to justify the config.

@krfricke (Contributor, Author) replied:

I've updated the PR to also log parameters. Let's chat offline about which API we want to go with.

@krfricke (Contributor, Author) replied:

Context:

  • We got rid of the @wandb_mixin decorator in favor of a setup_wandb(config) function for W&B
  • We're doing the same for MLflow
  • The question is whether we should use a config-based API, setup_mlflow(config), or an explicit API, setup_mlflow(experiment_name, experiment_id, tracking_uri, …)

Pro config:

  • Same way we used to configure the mixins
  • Required for W&B so it logs the config to the run (otherwise users have to do this manually; they are used to passing it to wandb.init())
  • We should be consistent between wandb and mlflow mixins

Pro args:

  • Explicit about which args are supported
  • Passing the mixin configs via config was a workaround because this was the easiest way to pass these settings and still retain some flexibility. But with the explicit setup_xyz() APIs this workaround isn’t needed anymore

Proposal:

  • How about we go with setup_mlflow(config, experiment_name, experiment_id) (and similarly for wandb)? If an mlflow or wandb config key is set, we print a deprecation warning and tell people to pass the configs as proper arguments instead.
  • This will get rid of the “special” config keys but we can still log the rest of the config automatically.
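The proposal above could look roughly like the following. This is a hypothetical sketch, not the actual Ray implementation: `resolve_mlflow_settings` is an invented helper showing the suggested merge order (explicit kwargs win, a legacy "mlflow" config key is still honored but triggers a deprecation warning, and the rest of the config remains available for logging):

```python
import warnings


def resolve_mlflow_settings(config=None, **explicit):
    # Hypothetical helper sketching the proposed behavior.
    config = dict(config or {})
    legacy = config.pop("mlflow", {})  # "special" config key, deprecated
    if legacy:
        warnings.warn(
            "Pass MLflow settings as arguments to setup_mlflow() "
            "instead of via an 'mlflow' config key",
            DeprecationWarning,
        )
    # Explicit arguments override legacy config-key values.
    settings = {**legacy, **explicit}
    # The remaining config can still be logged automatically as run params.
    return settings, config


settings, params = resolve_mlflow_settings(
    {"lr": 0.01, "mlflow": {"experiment_name": "exp"}},
    tracking_uri="file:///tmp/mlruns",
)
```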

@krfricke krfricke requested a review from amogkam January 4, 2023 15:32
@amogkam (Contributor) left a comment:

Thanks @krfricke, lgtm! Would be good to get another review from @xwjiang2010, @Yard1, or @justinvyu before merging.

@Yard1 (Member) left a comment:

LGTM! just some nits

Co-authored-by: Amog Kamsetty <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
@krfricke krfricke merged commit 5817391 into ray-project:master Jan 5, 2023
@krfricke krfricke deleted the air/mlflow-setup branch January 5, 2023 14:47
AmeerHajAli pushed a commit that referenced this pull request Jan 12, 2023
…#31295)

tamohannes pushed a commit to ju2ez/ray that referenced this pull request Jan 25, 2023
…ray-project#31295)

@SebastianBodza (Contributor) commented:
Working with MLflow Projects in combination with Ray is quite buggy.

When running an MLflow project, an MLflow session is started. Ray then puts all MLflow logs into the same run, so the losses, parameters, etc. are simply overwritten. Same as in #19909.

There are also, imho, some errors in the setup. When trying to put everything in separate runs by calling setup_mlflow(config, rank_zero_only=False), an error is raised saying that the experiment with the Tune trial name does not exist. This is a bug in my opinion: the experiment_name and run_name are mixed up.

My suggestion is:

  • If no experiment name is provided, use the default experiment (id=0)
  • If an experiment name is provided, actually use that name and do not overwrite it with the default_trial_name
  • Rework get_world_rank, because this led to a RuntimeError even though we were in a train session

@SebastianBodza (Contributor) commented Mar 2, 2023:

Ok, this seems to be an instance of the question raised above:

The question is whether we should use a config-based API, setup_mlflow(config), or an explicit API, setup_mlflow(experiment_name, experiment_id, tracking_uri, …)

Currently the config dict has the lowest priority. This means that when any defaults are generated in the code itself, the config dict is ignored. An example is the following:

    # Defaults are filled in before the config dict is ever consulted:
    experiment_id = experiment_id or default_trial_id
    experiment_name = experiment_name or default_trial_name

    # Setup mlflow. Since experiment_id/experiment_name are now always
    # truthy, the `or mlflow_config.get(...)` fallbacks below never fire.
    mlflow_util = _MLflowLoggerUtil()
    mlflow_util.setup_mlflow(
        tracking_uri=tracking_uri or mlflow_config.get("tracking_uri", None),
        registry_uri=registry_uri or mlflow_config.get("registry_uri", None),
        experiment_id=experiment_id or mlflow_config.get("experiment_id", None),
        experiment_name=experiment_name or mlflow_config.get("experiment_name", None),
        tracking_token=tracking_token or mlflow_config.get("tracking_token", None),
        create_experiment_if_not_exists=create_experiment_if_not_exists,
    )
    # Note: run_name is set from experiment_name, conflating the two concepts.
    mlflow_util.start_run(
        run_name=experiment_name,
        tags=tags or mlflow_config.get("tags", None),
        set_active=True,
    )
    mlflow_util.log_params(_config)

There the experiment_id and experiment_name are set to the trial_id and trial_name. Imho there is a mix-up of experiment_name and run_name. Defaulting the run_name to the trial_name makes sense to me, but for the experiment_name it does not. We should stick there to experiment_id=0 (the default experiment for MLflow).

Setting the experiment_id to the default_trial_id makes no sense, because Ray's naming convention is not the same as MLflow's.

We should also switch the order of the or statements, so that the config dict gets higher priority.

Lastly, since Tune is configured with dicts, I would stick to the call with a dictionary. Additionally, as with wandb, the config is logged as a dict.
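The priority problem described above can be reproduced in isolation (the names and values here are illustrative, not taken from the Ray code):

```python
# Because defaults are applied before the `or` chain that consults the
# config dict, the config value can never win.
default_trial_name = "trial_0001"
config = {"experiment_name": "my_experiment"}

experiment_name = None                                   # nothing passed explicitly
experiment_name = experiment_name or default_trial_name  # default fills in first
resolved = experiment_name or config.get("experiment_name")
# resolved is "trial_0001"; config["experiment_name"] is shadowed
```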

@SebastianBodza SebastianBodza mentioned this pull request Mar 2, 2023
amogkam pushed a commit that referenced this pull request Mar 28, 2023
…ks (#33782)

These were soft deprecated in favor of their AIR counterparts in the following PRs:

(Ray 2.2)  #29828
(Ray 2.3)  #31295
For Ray 2.4, these will be hard deprecated.

---------

Signed-off-by: Matthew Deng <[email protected]>
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
…ks (ray-project#33782)

ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
…ks (ray-project#33782)

Development

Successfully merging this pull request may close these issues.

[AIR/Tune] Deprecate mixins and replace with setup functions
4 participants