Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hook doesn't allow configuration updates at runtime #395

Closed
kasperjanehag opened this issue Jan 19, 2023 · 5 comments · Fixed by #535
Closed

Hook doesn't allow configuration updates at runtime #395

kasperjanehag opened this issue Jan 19, 2023 · 5 comments · Fixed by #535
Assignees
Labels
enhancement New feature or request waiting-for-kedro The implementation of this feature is blocked by a ticket in kedro

Comments

@kasperjanehag
Copy link

Description

In large kedro projects where you have multiple different pipelines working in sequence / paralell you may want to override certain kedro-mlflow settings like tracking.experiment.name during runtime. For example you may want to do something like

kedro run --pipeline pipeline_1 --params mlflow.tracking.experiment.name=pipeline_1

to make sure that pipeline_1 results are tracked and stored to the right experiment name. The current implementation of MlflowHook, which looks something like this

... 
try:
    conf_mlflow_yml = context.config_loader.get("mlflow*", "mlflow*/**")
except MissingConfigException:
    ...
    conf_mlflow_yml = {}
mlflow_config = KedroMlflowConfig.parse_obj(conf_mlflow_yml)
  ...

doesn't provide any other way to set the experiment name except for keeping one individual kedro configuraiton per pipeline.

Possible Implementation

I suggest, and can support in implementing, that a method call is added right before

mlflow_config = KedroMlflowConfig.parse_obj(conf_mlflow_yml)

where the mlflow_config object is updated based on all parameters starting with "mlflow" from context.params.

Any ideas?

@kasperjanehag kasperjanehag changed the title feat: Mlflow hook doesn't allow settings to be update at runtime Hook doesn't allow configuration updates at runtime Jan 19, 2023
@Galileo-Galilei
Copy link
Owner

Hi, thank you very much for your suggestion!

The idea looks cool, but I have a bunch of questions / remarks :

  • it seems quite tedious to type a long entry with the CLI command
  • there is a high risk to "forget" it when typing the command, leading to inconsistency
  • when I look at the mlflow.yml, I think there are very little keys one wants to modify at runtime : likely the experiment, and the run properties (name, id, nested). All other entries should very likely not be modified at runtime but are part of the environment.
  • if your use case is specifically about making the experiment "dynamic", do you think it would be better to support to add a key like experiment: ${PIPELINE_NAME} in the mlflow.yml?

I think that the recommended way to change the experiment is to use a different environment with a dedicated mlflow.yml entry and to use kedro run --env=<my_env>. I understand that it may not suit your use case because you may already have use the environment for another configuration overriding and you want to change the experiment conditionnaly to the pipeline for the same environment.

I'd be glad to help you if you want to open a PR!

@kasperjanehag
Copy link
Author

Hi @Galileo-Galilei.

I agree writing it along every command line would be tedious, but in production scenarios this is carried out by some orchestrator anyway. Regarding your last comment with kedro run --env=<STAGING/PROD/>, that what I already use to handle config in my different environments. However, since my project has 20+ pipelines, having one mlflow.yml for every pipeline and envirment would force me to have 40-60 different mlflow.ymls for different pipelines in different environments.

I suggest we move along with a modified version of your suggestion where the user may have certain settings parsed at runtime by using template strings and environment variables. Example:

´experiment: "prefix-${MLFLOW_ENV_1}_${MLFLOW_ENV_2}-suffix"`

Benefits would be high flexability and high compatabilityt with orchestration tools (ENVs works everywhere). What do you think? Happy to write the PR if you guide me where you want the code to be located.

@Galileo-Galilei
Copy link
Owner

Hi @kasperjanehag, I understand your issue about multiple combination of pipeline x env and I suspected something like that. I have some good and bad news :)

Environment variables are already supported...

The good news is that what you suggest (combining fixed string and templated environment variables) is already possible: kedro-mlflow leverages the ConfigLoader of your kedro project, so you can even use jinja2 inside the configuration file and itr will be properly parsed. You just need to configure your kedro project to accept environment variables:

# settings.py

from kedro.config import TemplatedConfigLoader  # new import
import os # new import

CONFIG_LOADER_CLASS = TemplatedConfigLoader
CONFIG_LOADER_ARGS = {
    "globals_dict": os.environ,
    }
}

and your above example will work automatically. Even better, I think this will become the default in kedro==0.18.5 which should be released soon

... but runtime CLI params are not

The bad news is that if you also want to use the pipeline name as in the original question (e.g. use something like ${PIPELINE_NAME} where pipeline name is the one given to the CLI, this is much harder. The TemplatedConfigLoader does not add to its globals_dict the runtime cli arguments. By far the easiest way to have this supported is to modify kedro itself to support such a use case. Open an issue in kedro's repo, and I'll support the feature request as much as I can.

That said, it is unlikely they release the feature soon so I may support this at the kedro-mlflow's level. The idea is to modify at runtime the config loader in above mentioned lines (

conf_mlflow_yml = context.config_loader.get("mlflow*", "mlflow*/**")
) to update the global_dict with the run_params arguments of before_pipeline run. However this is tricky because the 2 hooks need to interact and I'd like to avoid that if possible.

@Galileo-Galilei Galileo-Galilei self-assigned this Feb 6, 2023
@Galileo-Galilei Galileo-Galilei added the enhancement New feature or request label Feb 6, 2023
@Galileo-Galilei Galileo-Galilei added the waiting-for-kedro The implementation of this feature is blocked by a ticket in kedro label Oct 28, 2023
@Galileo-Galilei
Copy link
Owner

This feature will likely wait the implementation of this issue on kedro's side: kedro-org/kedro#2866

@Galileo-Galilei
Copy link
Owner

For the record, this is now possible with the runtime_params resolver with OmegaConfigLoader. I need to document it. Further reference can be found on slack

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request waiting-for-kedro The implementation of this feature is blocked by a ticket in kedro
Projects
Status: ✅ Done
Development

Successfully merging a pull request may close this issue.

2 participants