Hook doesn't allow configuration updates at runtime #395

kasperjanehag · 2023-01-19T13:11:25Z

Description

In large kedro projects where several pipelines run in sequence or in parallel, you may want to override certain kedro-mlflow settings, such as tracking.experiment.name, at runtime. For example, you may want to do something like

kedro run --pipeline pipeline_1 --params mlflow.tracking.experiment.name=pipeline_1

to make sure that pipeline_1 results are tracked and stored to the right experiment name. The current implementation of MlflowHook, which looks something like this

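from kedro.config import MissingConfigException
from kedro_mlflow.config.kedro_mlflow_config import KedroMlflowConfig

# Inside MlflowHook (excerpt)
...
try:
    # Load mlflow.yml from the active configuration environment
    conf_mlflow_yml = context.config_loader.get("mlflow*", "mlflow*/**")
except MissingConfigException:
    ...
    # No mlflow.yml found: fall back to the default settings
    conf_mlflow_yml = {}
mlflow_config = KedroMlflowConfig.parse_obj(conf_mlflow_yml)
...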

doesn't provide any way to set the experiment name other than keeping a separate kedro configuration per pipeline.

Possible Implementation

I suggest (and can help implement) adding a method call right before

mlflow_config = KedroMlflowConfig.parse_obj(conf_mlflow_yml)

that updates conf_mlflow_yml with all parameters starting with "mlflow" from context.params, so they take precedence over the values in mlflow.yml.
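A minimal sketch of the proposed update step, assuming kedro has already turned the dotted CLI key into a nested dict, so that context.params looks like {"mlflow": {"tracking": {...}}}. deep_update and apply_mlflow_params are hypothetical names, not kedro or kedro-mlflow APIs, and the config values are illustrative. A deep merge is used so that overriding tracking.experiment.name leaves the other settings in mlflow.yml untouched.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_update(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base in which nested keys from overrides replace existing values."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so sibling keys (e.g. other experiment settings) are preserved
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = value
    return merged


def apply_mlflow_params(conf_mlflow_yml: Dict[str, Any], params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: let params under the "mlflow" key override mlflow.yml entries."""
    return deep_update(conf_mlflow_yml, params.get("mlflow", {}))


# Illustrative values: mlflow.yml content, and context.params after
# --params mlflow.tracking.experiment.name=pipeline_1 (assumed already nested)
conf_mlflow_yml = {"tracking": {"experiment": {"name": "default", "restore_if_deleted": True}}}
params = {"mlflow": {"tracking": {"experiment": {"name": "pipeline_1"}}}, "other_param": 1}

merged = apply_mlflow_params(conf_mlflow_yml, params)
print(merged)
# {'tracking': {'experiment': {'name': 'pipeline_1', 'restore_if_deleted': True}}}
```

The merged dict would then be passed to KedroMlflowConfig.parse_obj as before, so the usual validation still applies to the overridden values.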

Any ideas?


Galileo-Galilei · 2023-01-20T21:13:14Z

Hi, thank you very much for your suggestion!

The idea looks cool, but I have a bunch of questions / remarks:

- it seems quite tedious to type such a long entry in the CLI command
- there is a high risk of "forgetting" it when typing the command, leading to inconsistencies
- when I look at mlflow.yml, I think very few keys need to be modified at runtime: likely the experiment and the run properties (name, id, nested). All other entries should very likely not be modified at runtime, since they are part of the environment.
- if your use case is specifically about making the experiment "dynamic", do you think it would be better to support a key like experiment: ${PIPELINE_NAME} in mlflow.yml?

I think the recommended way to change the experiment is to use a different environment with a dedicated mlflow.yml entry and to run kedro run --env=<my_env>. I understand that this may not suit your use case: you may already use environments for other configuration overrides, and you want to change the experiment depending on the pipeline within the same environment.

I'd be glad to help you if you want to open a PR!

kasperjanehag · 2023-01-22T08:36:19Z

Hi @Galileo-Galilei.

I agree that writing it in every command would be tedious, but in production scenarios this is handled by an orchestrator anyway. Regarding your last comment about kedro run --env=<STAGING/PROD/>, that's what I already use to handle config across my environments. However, since my project has 20+ pipelines, having one mlflow.yml for every pipeline and environment combination would force me to maintain 40-60 different mlflow.yml files.

I suggest we go ahead with a modified version of your suggestion, where certain settings are resolved at runtime from template strings and environment variables. Example:

experiment: "prefix-${MLFLOW_ENV_1}_${MLFLOW_ENV_2}-suffix"
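To make the intended behaviour concrete, here is a small sketch of how such a template would resolve. It uses Python's string.Template, which happens to share the ${VAR} syntax; this is an illustration of the idea, not how kedro's config loaders do their templating. MLFLOW_ENV_1 and MLFLOW_ENV_2 are the arbitrary names from the example, and their values are made up for the demo.

```python
import os
from string import Template

# Template from the example above. MLFLOW_ENV_1 / MLFLOW_ENV_2 are arbitrary,
# user-chosen names; they are injected here only so the demo is deterministic.
env = {**os.environ, "MLFLOW_ENV_1": "pipeline_1", "MLFLOW_ENV_2": "prod"}

template = Template("prefix-${MLFLOW_ENV_1}_${MLFLOW_ENV_2}-suffix")
experiment = template.substitute(env)  # raises KeyError if a variable is unset
print(experiment)  # prefix-pipeline_1_prod-suffix
```

Using substitute rather than safe_substitute makes a missing variable fail loudly, which is usually preferable to silently logging runs to an experiment literally named "...${MLFLOW_ENV_1}...".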

The benefits would be high flexibility and high compatibility with orchestration tools (environment variables work everywhere). What do you think? Happy to write the PR if you tell me where you want the code to live.

Galileo-Galilei · 2023-01-24T21:26:05Z

Hi @kasperjanehag, I understand your issue with the many pipeline x environment combinations, and I suspected something like that. I have some good and some bad news :)

Environment variables are already supported...

The good news is that what you suggest (combining fixed strings and templated environment variables) is already possible: kedro-mlflow uses the ConfigLoader of your kedro project, so you can even use jinja2 inside the configuration file and it will be parsed properly. You just need to configure your kedro project to accept environment variables:

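# settings.py
import os  # new import

from kedro.config import TemplatedConfigLoader  # new import

# Use the TemplatedConfigLoader so that ${...} placeholders in conf files
# (including mlflow.yml) are filled from globals_dict
CONFIG_LOADER_CLASS = TemplatedConfigLoader
CONFIG_LOADER_ARGS = {
    # Plain dict copy of the environment variables, exposed as template globals
    "globals_dict": dict(os.environ),
}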

With this, your example above will work out of the box. Even better, I think this will become the default in kedro==0.18.5, which should be released soon.

... but runtime CLI params are not

The bad news is that if you also want to use the pipeline name as in the original question (e.g. something like ${PIPELINE_NAME}, where the pipeline name is the one passed to the CLI), this is much harder. The TemplatedConfigLoader does not add the runtime CLI arguments to its globals_dict. By far the easiest way to support this is to modify kedro itself to handle such a use case. Open an issue in kedro's repo, and I'll support the feature request as much as I can.

That said, it is unlikely they will release the feature soon, so I may support this at the kedro-mlflow level. The idea is to modify the config loader at runtime in the lines mentioned above, i.e. line 63 of kedro-mlflow/kedro_mlflow/framework/hooks/mlflow_hook.py (commit a4276b3):

conf_mlflow_yml = context.config_loader.get("mlflow*", "mlflow*/**")

and to update its globals_dict with the run_params argument of before_pipeline_run. However, this is tricky because the two hooks need to interact, and I'd like to avoid that if possible.
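A minimal sketch of that two-hook interaction, assuming (a) mlflow.yml holds a raw ${PIPELINE_NAME} placeholder that the config loader leaves untouched, and (b) run_params["pipeline_name"] is None when the default pipeline runs. PipelineAwareExperimentHook is a hypothetical class, not kedro-mlflow's MlflowHook, and string.Template is swapped in for the config loader's own templating. It also shows why this is tricky: the experiment name only becomes known in before_pipeline_run, after after_context_created has already run.

```python
from string import Template
from types import SimpleNamespace
from typing import Any, Dict, Optional


class PipelineAwareExperimentHook:
    """Hypothetical sketch: resolve ${PIPELINE_NAME} once the CLI pipeline is known.

    In a kedro project both methods would be decorated with
    kedro.framework.hooks.hook_impl; the decorator is omitted here so the
    sketch runs without kedro installed.
    """

    def __init__(self) -> None:
        self.experiment_template: Optional[str] = None
        self.experiment_name: Optional[str] = None

    def after_context_created(self, context: Any) -> None:
        # Step 1: the --pipeline value is not known yet, so only keep the
        # raw, still-templated experiment name from mlflow.yml.
        conf = context.config_loader.get("mlflow*", "mlflow*/**")
        self.experiment_template = conf["tracking"]["experiment"]["name"]

    def before_pipeline_run(self, run_params: Dict[str, Any], pipeline: Any, catalog: Any) -> None:
        # Step 2: run_params carries the --pipeline value (assumed None for
        # the default pipeline, which kedro registers as "__default__").
        pipeline_name = run_params.get("pipeline_name") or "__default__"
        self.experiment_name = Template(self.experiment_template).safe_substitute(
            PIPELINE_NAME=pipeline_name
        )


# Usage with a stand-in context whose config loader returns a templated mlflow.yml
fake_context = SimpleNamespace(
    config_loader=SimpleNamespace(
        get=lambda *patterns: {"tracking": {"experiment": {"name": "proj-${PIPELINE_NAME}"}}}
    )
)
hook = PipelineAwareExperimentHook()
hook.after_context_created(fake_context)
hook.before_pipeline_run({"pipeline_name": "pipeline_1"}, pipeline=None, catalog=None)
print(hook.experiment_name)  # proj-pipeline_1
```

Any mlflow setup that depends on the experiment (such as creating or activating it) would also have to move to before_pipeline_run in this design, which is exactly the coupling between the two hooks that the comment wants to avoid.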

Galileo-Galilei · 2023-10-28T19:52:26Z

This feature will likely have to wait for the implementation of this issue on kedro's side: kedro-org/kedro#2866

Galileo-Galilei · 2023-11-24T22:44:26Z

For the record, this is now possible with the runtime_params resolver of the OmegaConfigLoader. I need to document it. Further reference can be found on Slack.
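For illustration only, a toy imitation of the resolver syntax ${runtime_params:<key>, <default>} as described in the kedro docs: the value comes from the runtime parameters if present, otherwise from the default. This is a regex stand-in written for this thread, not kedro's implementation (which is an OmegaConf resolver inside OmegaConfigLoader and also parses non-string defaults); resolve_runtime_params and experiment_name are hypothetical names.

```python
import re
from typing import Any, Dict

# Matches ${runtime_params:<key>} or ${runtime_params:<key>, <default>}
_RUNTIME_PARAMS = re.compile(r"\$\{runtime_params:\s*(\w+)\s*(?:,\s*([^}]*?)\s*)?\}")


def resolve_runtime_params(value: str, runtime_params: Dict[str, Any]) -> str:
    """Toy stand-in for the runtime_params resolver (string values only)."""

    def _replace(match: re.Match) -> str:
        key, default = match.group(1), match.group(2)
        if key in runtime_params:
            return str(runtime_params[key])
        if default is not None:
            return default
        raise KeyError(f"runtime parameter '{key}' not provided and no default given")

    return _RUNTIME_PARAMS.sub(_replace, value)


template = "${runtime_params:experiment_name, default_experiment}"
print(resolve_runtime_params(template, {"experiment_name": "pipeline_1"}))  # pipeline_1
print(resolve_runtime_params(template, {}))  # default_experiment
```

With the real OmegaConfigLoader, the mlflow.yml entry would look something like name: ${runtime_params:experiment_name, default_experiment} under tracking.experiment, combined with a command such as kedro run --pipeline pipeline_1 --params experiment_name=pipeline_1. The default keeps runs without the extra CLI parameter working.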

kasperjanehag changed the title from "feat: Mlflow hook doesn't allow settings to be update at runtime" to "Hook doesn't allow configuration updates at runtime" Jan 19, 2023

Galileo-Galilei self-assigned this Feb 6, 2023

Galileo-Galilei added the enhancement New feature or request label Feb 6, 2023

Galileo-Galilei mentioned this issue Jul 16, 2023

Kedro-MLflow on AWS batch causes every node to be logged as a separate run #432

Closed

Galileo-Galilei added the waiting-for-kedro The implementation of this feature is blocked by a ticket in kedro label Oct 28, 2023

Galileo-Galilei added a commit that referenced this issue Apr 7, 2024

📝 Document the ability to update configuration at runtime (#395)

11bdbfe

Galileo-Galilei mentioned this issue Apr 7, 2024

Document the ability to update configuration at runtime (#395) #535

Merged


Galileo-Galilei added a commit that referenced this issue Apr 7, 2024

📝 Document the ability to update configuration at runtime (#395)

3cc099e

Galileo-Galilei closed this as completed in #535 Apr 7, 2024

Galileo-Galilei added a commit that referenced this issue Apr 7, 2024

📝 Document the ability to update configuration at runtime (#395)

c67b38f
