Add migration steps for TemplatedConfigLoader to OmegaConfigLoader #2904

Merged
merged 26 commits into from
Aug 30, 2023
Changes from 23 commits
28fa916
Add migration steps for CL to OCL
merelcht Aug 2, 2023
25f89f1
Merge branch 'main' into migration-docs-config-loaders
stichbury Aug 2, 2023
798f608
Add new migration guide to index
stichbury Aug 2, 2023
04bdf71
Merge branch 'main' into migration-docs-config-loaders
merelcht Aug 2, 2023
c2d3c8e
Address review comments
merelcht Aug 3, 2023
ffe59d7
Merge branch 'main' into migration-docs-config-loaders
merelcht Aug 3, 2023
bd624f9
Try diff highlighting
merelcht Aug 4, 2023
1e8b76c
Add diff highlight to all examples
merelcht Aug 4, 2023
5a6fa50
Add migration steps for TCL to OCL
merelcht Aug 7, 2023
47f22f5
Add Jinja2 migration example and improve diff highlighting
merelcht Aug 7, 2023
44ec087
Merge branch 'main' into migration-docs-tcl
merelcht Aug 8, 2023
0925537
Fix merge conflicts
merelcht Aug 8, 2023
9b25d47
Merge branch 'main' into migration-docs-tcl
merelcht Aug 18, 2023
c359752
Add section on globals for OCL
merelcht Aug 18, 2023
97827ff
Merge branch 'main' into migration-docs-tcl
merelcht Aug 18, 2023
da4f432
Apply suggestions from code review
merelcht Aug 18, 2023
d04e444
update how to do default globals for ocl
merelcht Aug 18, 2023
1f15fef
Apply suggestions from code review
merelcht Aug 18, 2023
9272efd
Merge branch 'main' into migration-docs-tcl
noklam Aug 25, 2023
0b49d4c
Apply suggestions from code review
merelcht Aug 29, 2023
6baa550
Add link to globals docs for ocl
merelcht Aug 29, 2023
62510e6
Merge branch 'main' into migration-docs-tcl
merelcht Aug 29, 2023
40df8ff
Fix link + address review comments
merelcht Aug 29, 2023
17b594e
Set correct 0.18.x version
merelcht Aug 29, 2023
ee7785d
Merge branch 'main' into migration-docs-tcl
merelcht Aug 30, 2023
367539e
Apply suggestions from code review
merelcht Aug 30, 2023
175 changes: 175 additions & 0 deletions docs/source/configuration/config_loader_migration.md
@@ -60,3 +60,178 @@ In this example, `"catalog"` is the key to the default catalog patterns specifie
For error and exception handling, most errors are the same. Those you need to be aware of that are different between the original `ConfigLoader` and `OmegaConfigLoader` are as follows:
* `OmegaConfigLoader` throws a `MissingConfigException` when configuration paths don't exist, rather than the `ValueError` used in `ConfigLoader`.
* In `OmegaConfigLoader`, if there is bad syntax in your configuration files, it will trigger a `ParserError` instead of a `BadConfigException` used in `ConfigLoader`.

## [`TemplatedConfigLoader`](/kedro.config.TemplatedConfigLoader) to [`OmegaConfigLoader`](/kedro.config.OmegaConfigLoader)

### 1. Install the required library
The [`OmegaConfigLoader`](advanced_configuration.md#omegaconfigloader) was introduced in Kedro `0.18.5` and is based on [OmegaConf](https://omegaconf.readthedocs.io/). Features that replace `TemplatedConfigLoader` functionality were released in later versions, so we recommend that you install at least Kedro version `0.18.X` to properly replace the `TemplatedConfigLoader` with `OmegaConfigLoader`.
You can install both this Kedro version and `omegaconf` using `pip`:

```bash
pip install kedro==0.18.X
```
This is the minimum required Kedro version; it includes `omegaconf` as a dependency and the functionality needed to replace `TemplatedConfigLoader`.
Alternatively, you can run:
```bash
pip install -U kedro
```

This command installs the most recent version of Kedro, which also includes `omegaconf` as a dependency.

### 2. Use the `OmegaConfigLoader`
To use `OmegaConfigLoader` in your project, set the `CONFIG_LOADER_CLASS` constant in your [`src/<package_name>/settings.py`](../kedro_project_setup/settings.md):

```diff
+ from kedro.config import OmegaConfigLoader # new import

+ CONFIG_LOADER_CLASS = OmegaConfigLoader
```

### 3. Import statements
Replace the import statement for `TemplatedConfigLoader` with the one for `OmegaConfigLoader`:

```diff
- from kedro.config import TemplatedConfigLoader

+ from kedro.config import OmegaConfigLoader
```

### 4. File format support
`OmegaConfigLoader` supports only `yaml` and `json` file formats. Make sure that all your configuration files are in one of these formats. If you were using other formats with `TemplatedConfigLoader`, convert them to `yaml` or `json`.

⚠️ [vale] reported by reviewdog 🐶
[Kedro.weaselwords] 'only' is a weasel word!
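
To check the file-format requirement from step 4 before migrating, a small standard-library script can list configuration files that are neither YAML nor JSON. This is only a sketch: the function name `find_unsupported_config_files` is hypothetical, and the set of accepted extensions (`.yml`, `.yaml`, `.json`) is an assumption based on the formats named above, not something read from Kedro itself.

```python
from pathlib import Path

# Assumption: these extensions correspond to the "yaml" and "json" formats named above.
SUPPORTED_SUFFIXES = {".yml", ".yaml", ".json"}


def find_unsupported_config_files(conf_source):
    """Return relative paths of files under conf_source that are not YAML or JSON."""
    root = Path(conf_source)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
        and not path.name.startswith(".")  # skip hidden files such as .gitkeep
        and path.suffix.lower() not in SUPPORTED_SUFFIXES
    )
```

Calling `find_unsupported_config_files("conf")` from the project root would return the files that still need converting; an empty list means every file already uses one of the assumed extensions.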


### 5. Load configuration
Loading configuration with `OmegaConfigLoader` differs slightly from `TemplatedConfigLoader`, which allowed users to access configuration through the `.get()` method and required patterns as an argument.
With `OmegaConfigLoader`, you fetch configuration through a configuration key that points to [configuration patterns specified in the loader class](configuration_basics.md#configuration-patterns) or [provided in the `CONFIG_LOADER_ARGS`](advanced_configuration.md#how-to-change-which-configuration-files-are-loaded) in `settings.py`.

```diff
- conf_path = str(project_path / settings.CONF_SOURCE)
- conf_loader = TemplatedConfigLoader(conf_source=conf_path, env="local")
- catalog = conf_loader.get("catalog*")

+ conf_path = str(project_path / settings.CONF_SOURCE)
+ config_loader = OmegaConfigLoader(conf_source=conf_path, env="local")
+ catalog = config_loader["catalog"] # note the key accessor syntax
```

In this example, the `"catalog"` key points to the default catalog patterns specified in the `OmegaConfigLoader` class.

### 6. Templating of values
Contributor

I think, under this section, it would also be helpful to flag that variable interpolation is scoped to a particular configuration type and to the same environment. For example, templated keys from local will not overwrite base, and the keys are resolved within the environment, which is different from how TemplatedConfigLoader uses the globals.

Contributor

Seconding this. Do we already have the relevant docs on the OmegaConfigLoader page? Moving everything to globals would obviously work (this is how the TemplatedConfigLoader works), but we may want to advise using catalog_globals.yml etc. to replace the use of globals.py for certain things.

Templating of values is done through native [variable interpolation in `OmegaConfigLoader`](advanced_configuration.md#how-to-do-templating-with-the-omegaconfigloader). Where in `TemplatedConfigLoader` it was necessary to provide the template values in a `globals` file or dictionary, in `OmegaConfigLoader` you can provide these values within the same file that has the placeholders or in a file whose name follows [the same config pattern specified](configuration_basics.md#configuration-patterns).
The variable interpolation is scoped to a specific configuration type and environment. If you want to share templated values across configuration types and environments, [you will need to use globals](#7-globals).

📝 [vale] reported by reviewdog 🐶
[Kedro.sentencelength] Try to keep your sentence length to 30 words or fewer.

⚠️ [vale] reported by reviewdog 🐶
[Kedro.toowordy] 'it was' is too wordy

To migrate a templated **catalog** file from `TemplatedConfigLoader` to `OmegaConfigLoader`, do the following:
1. Rename `conf/base/globals.yml` to match the patterns specified for catalog (`["catalog*", "catalog*/**", "**/catalog*"]`), for example `conf/base/catalog_globals.yml`.
2. Add an underscore `_` to the start of each catalog template key. This is needed because of how catalog entries are validated.

```diff
- bucket_name: "my_s3_bucket"
+ _bucket_name: "my_s3_bucket" # kedro requires `_` to mark templatable keys
- key_prefix: "my/key/prefix/"
+ _key_prefix: "my/key/prefix/"

- datasets:
+ _datasets:
    csv: "pandas.CSVDataSet"
    spark: "spark.SparkDataSet"

```

3. Update `catalog.yml` with the underscores `_` at the beginning of the templated value names.
```diff
  raw_boat_data:
-   type: "${datasets.spark}"
+   type: "${_datasets.spark}"
-   filepath: "s3a://${bucket_name}/${key_prefix}/raw/boats.csv"
+   filepath: "s3a://${_bucket_name}/${_key_prefix}/raw/boats.csv"
    file_format: parquet

  raw_car_data:
-   type: "${datasets.csv}"
+   type: "${_datasets.csv}"
-   filepath: "s3://${bucket_name}/data/${key_prefix}/raw/cars.csv"
+   filepath: "s3://${_bucket_name}/data/${_key_prefix}/raw/cars.csv"
```
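
For catalogs with many placeholders, the renaming in step 3 could be scripted. The following is a minimal sketch, not part of Kedro: `prefix_template_keys` is a hypothetical helper that rewrites `${key}` and `${key.sub}` placeholders in a text to their underscore-prefixed form, for a set of top-level keys you supply. It only touches placeholders; the key definitions in the renamed globals file still need the underscore added separately.

```python
import re


def prefix_template_keys(text, keys):
    """Rewrite `${key}` / `${key.sub}` placeholders to `${_key}` / `${_key.sub}`."""
    if not keys:
        return text
    # Longest names first so that a key never shadows a longer key sharing its prefix.
    names = "|".join(re.escape(key) for key in sorted(keys, key=len, reverse=True))
    # The lookahead requires the key to end at "." or "}", so `${bucket_name_x}` is untouched.
    pattern = re.compile(r"\$\{(" + names + r")(?=[.}])")
    return pattern.sub(lambda match: "${_" + match.group(1), text)
```

Because the pattern requires the key to follow `${` directly, placeholders that already carry the underscore are left alone, so running the helper twice is harmless.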

#### Providing default values for templates via `oc.select`
To provide a default for a template value, use [the omegaconf `oc.select` resolver](https://omegaconf.readthedocs.io/en/latest/custom_resolvers.html#oc-select).

```diff
  boats:
    users:
      - fred
-     - "${write_only_user|ron}"
+     - "${oc.select:write_only_user,ron}"
```

Contributor

you may want to define this as it currently doesn't have context
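
The `key|default` to `oc.select` rewrite shown in the diff above is mechanical, so it could be applied with a regular expression. The helper below is a hypothetical sketch using only the standard library; it assumes keys consist of word characters and dots, and it copies the default verbatim, so defaults containing commas or other special characters would need manual review.

```python
import re

# Matches `${key|default}` where key is made of word characters and dots.
_DEFAULT_PLACEHOLDER = re.compile(r"\$\{([\w.]+)\|([^}]*)\}")


def convert_default_syntax(text):
    """Rewrite `${key|default}` placeholders as `${oc.select:key,default}`."""
    return _DEFAULT_PLACEHOLDER.sub(r"${oc.select:\1,\2}", text)
```

Placeholders without a `|default` part do not match the pattern and pass through unchanged.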

### 7. Globals
If you want to share variables across configuration types (for example, parameters and catalog) and across environments, you need to use [the custom globals resolver with the `OmegaConfigLoader`](advanced_configuration.md#how-to-use-global-variables-with-the-omegaconfigloader).
The `OmegaConfigLoader` requires global values to be provided in a `globals.yml` file. Note that using a `globals_dict` to provide globals is not supported with `OmegaConfigLoader`. The following section explains the differences between using globals with `TemplatedConfigLoader` and the `OmegaConfigLoader`.

Let's assume your project contains a `conf/base/globals.yml` file with the following contents:

```yaml
bucket_name: "my_s3_bucket"
key_prefix: "my/key/prefix/"

datasets:
  csv: "pandas.CSVDataSet"
  spark: "spark.SparkDataSet"

folders:
  raw: "01_raw"
  int: "02_intermediate"
  pri: "03_primary"
  fea: "04_feature"
```

You no longer need to set the `CONFIG_LOADER_ARGS` variable in [`src/<package_name>/settings.py`](../kedro_project_setup/settings.md) to find this `globals.yml` file, because the `OmegaConfigLoader` is configured to pick up files named `globals.yml` by default.
Comment on lines +188 to +189

Contributor

Should we add a link in case they name it something other than globals.yml?

Contributor

What's the replacement for globals_dict?

Member Author

There's no replacement. That's why I wrote "The OmegaConfigLoader requires global values to be provided in a globals.yml file.", but I can explicitly mention that the globals_dict is not supported with OCL for clarity.

Member Author

"Should we add a link in case they name it something other than globals.yml?" @noklam what should we link to?

Contributor

@merelcht How about https://docs.kedro.org/en/latest/configuration/advanced_configuration.html#how-to-use-global-variables-with-the-omegaconfigloader?

Btw I think we should bubble up OmegaConfigLoader docs since we are trying to make it default. The hierarchy is also broken there is nothing inside OmegaConfigLoader. Cc @stichbury @ankatiyar

Member Author

> The hierarchy is also broken there is nothing inside OmegaConfigLoader.

I don't think it's broken, we don't have any sub-sections under OmegaConfigLoader because those are all under "Advanced Kedro configuration". We should probably clean this up when all docs on ConfigLoader and TemplatedConfigLoader are removed as part of: #2692

Contributor

@noklam We will be doing some of the moving around sections in the docs for #2900 after the "make starters use OmegaConfigLoader by default" PRs are merged. (@lrcouto)


```diff
- CONFIG_LOADER_ARGS = {"globals_pattern": "*globals.yml"}
```

The globals templating in your catalog configuration will need to be updated to use the globals resolver as follows:

```diff
  raw_boat_data:
-   type: "${datasets.spark}"
+   type: "${globals:datasets.spark}" # nested paths into global dict are allowed
-   filepath: "s3a://${bucket_name}/${key_prefix}/${folders.raw}/boats.csv"
+   filepath: "s3a://${globals:bucket_name}/${globals:key_prefix}/${globals:folders.raw}/boats.csv"
    file_format: parquet

  raw_car_data:
-   type: "${datasets.csv}"
+   type: "${globals:datasets.csv}"
-   filepath: "s3://${bucket_name}/data/${key_prefix}/${folders.raw}/${filename|cars.csv}" # default to 'cars.csv' if the 'filename' key is not found in the global dict
+   filepath: "s3://${globals:bucket_name}/data/${globals:key_prefix}/${globals:folders.raw}/${globals:filename,'cars.csv'}" # default to 'cars.csv' if the 'filename' key is not found in the global dict
```
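
The same rewrite could be scripted across a larger catalog. The sketch below is a hypothetical standard-library helper, not a Kedro utility: it converts plain `${key}` placeholders to `${globals:key}` and `${key|default}` to `${globals:key,'default'}`, mirroring the diff above. It assumes every plain placeholder in the text refers to a global, and it quotes every default as a string, so numeric defaults or defaults containing a single quote would need adjusting by hand.

```python
import re

# Matches `${key}` or `${key|default}`; placeholders that already use a
# resolver (for example `${globals:...}`) contain ":" and therefore do not match.
_PLACEHOLDER = re.compile(r"\$\{([\w.]+)(?:\|([^}]*))?\}")


def to_globals_syntax(text):
    """Rewrite TemplatedConfigLoader placeholders to use the globals resolver."""

    def _replace(match):
        key, default = match.group(1), match.group(2)
        if default is None:
            return "${globals:" + key + "}"
        # Assumption: defaults are treated as strings and wrapped in single quotes.
        return "${globals:" + key + ",'" + default + "'}"

    return _PLACEHOLDER.sub(_replace, text)
```

Review the result before committing it: placeholders that should stay local to the catalog (the underscore-prefixed keys from step 6) would also be converted if they use the plain `${key}` form.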

### 8. Deprecation of Jinja2
`OmegaConfigLoader` does not support Jinja2 syntax in configuration. However, you can achieve similar functionality with the `OmegaConfigLoader` in combination with [dataset factories](../data/kedro_dataset_factories.md).
If you take the example from [the `TemplatedConfigLoader` with Jinja2 documentation](advanced_configuration.md#how-to-use-jinja2-syntax-in-configuration), you can rewrite your configuration as follows to work with `OmegaConfigLoader`:
Contributor

I wonder if we want to provide some opinion here on why Jinja is a suboptimal solution here, not built for a whitespaced language etc.

Or point to articles like this one (last paragraph of "Pitfall 1")


```diff
  # catalog.yml
- {% for speed in ['fast', 'slow'] %}
- {{ speed }}-trains:
+ "{speed}-trains":
    type: MemoryDataSet

- {{ speed }}-cars:
+ "{speed}-cars":
    type: pandas.CSVDataSet
-   filepath: s3://${bucket_name}/{{ speed }}-cars.csv
+   filepath: s3://${bucket_name}/{speed}-cars.csv
    save_args:
      index: true

- {% endfor %}
```
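
To see why one `"{speed}-cars"` entry can stand in for the Jinja2 loop, it may help to look at how such a pattern can be matched against a dataset name. The sketch below is purely illustrative and is not Kedro's implementation: `match_factory_pattern` is a hypothetical function that turns a `{placeholder}` pattern into a regular expression and returns the captured values.

```python
import re

_FIELD = r"\{[A-Za-z_]\w*\}"


def match_factory_pattern(pattern, dataset_name):
    """Return the placeholder values if dataset_name fits pattern, else None."""
    regex = ""
    # re.split with a capturing group keeps the `{name}` fields in the result.
    for part in re.split("(" + _FIELD + ")", pattern):
        if re.fullmatch(_FIELD, part):
            regex += "(?P<" + part[1:-1] + ">.+)"  # named group per placeholder
        else:
            regex += re.escape(part)  # literal text must match exactly
    match = re.fullmatch(regex, dataset_name)
    return match.groupdict() if match else None
```

With this sketch, both `fast-cars` and `slow-cars` fit the `"{speed}-cars"` pattern, each with its own value for `speed`, which is what the `{% for %}` loop previously enumerated by hand.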

### 9. Exception handling
For error and exception handling, most errors are the same. The difference to be aware of between `TemplatedConfigLoader` and `OmegaConfigLoader` is as follows:
* For missing template values, `OmegaConfigLoader` throws `omegaconf.errors.InterpolationKeyError`.