Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tune/Train] Make MLflowLoggerUtil copyable #23333

Merged
merged 3 commits into from
Mar 25, 2022

Conversation

Yard1
Copy link
Member

@Yard1 Yard1 commented Mar 18, 2022

Why are these changes needed?

Makes sure mlflow module is not saved as an attribute in MLflowLoggerUtil, which was causing an exception when running deepcopy on the callback.

Related issue number

Closes #23330

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@amogkam
Copy link
Contributor

amogkam commented Mar 18, 2022

Hey @Yard1 does this change work for multi-node? I believe the reason we had it stored as an attribute is because on the driver side, we modify mlflow configurations which are saved as module attributes. If on worker nodes, we don't reuse the same mlflow module as on the driver, then would we lose those configurations?

Btw, it seems like this might just be a problem with dictionary deep copying, not with the callback itself. We have a test here for running MLflowLoggerCallback with ray Client, which means it must be serializable with cloudpickle right? https://github.com/ray-project/ray/blob/master/python/ray/tune/tests/test_client.py#L95.

@amogkam
Copy link
Contributor

amogkam commented Mar 18, 2022

I think the fix we want to make is just not to include run_config in self._params in the Trainer.

@Yard1
Copy link
Member Author

Yard1 commented Mar 18, 2022

@amogkam ok I changed the approach, PTAL

Copy link
Contributor

@krfricke krfricke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for this?

@Yard1
Copy link
Member Author

Yard1 commented Mar 23, 2022

@krfricke for every callback or just this?

@amogkam amogkam merged commit ebb592b into ray-project:master Mar 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] MLfLowLoggerCallback is not possible to deepcopy
4 participants