Replies: 6 comments
-
How much of this is still relevant in the face of keeping only |
Beta Was this translation helpful? Give feedback.
-
It's less relevant if there is only ever one class. I do still think separating the two is beneficial, even if it's just on the function level. User can still extend on the abstract class, we cannot assume everyone is settled with OmegaConfigLoader? Maybe config can also partially read from cloud key vault and still resolved with OmegaConf. I am aware there is a lot of maybe here. |
Beta Was this translation helpful? Give feedback.
-
Related: #2973 |
Beta Was this translation helpful? Give feedback.
-
Question: What new use cases would this enable? (So that we can justify the refactoring) |
Beta Was this translation helpful? Give feedback.
-
Introspection of the config files is one possibility #2821 (comment) |
Beta Was this translation helpful? Give feedback.
-
I feel pydantic would be very helpful here too |
Beta Was this translation helpful? Give feedback.
-
Introduction
The ConfigLoader class in Kedro is currently responsible for loading, resolving, validating, and merging configurations for various components. However, it has become too big, and its tests have become more complicated. This issue proposes to make ConfigLoader more modular and remove its strong coupling with the Kedro project structure.
Background
Background
Kedro is a framework that consists of several major components, including DataCatalog, ConfigLoader, Pipeline, and Runner. There are also other classes such as KedroSession and KedroContext, which are mainly used to orchestrate the kedro run.
Of these components, the DataCatalog is the most standalone and can be used separately without the Kedro project template. However, this is not the case for the ConfigLoader, and there is potential to make it more modular.
Problem
The current ConfigLoader class has many assumptions, such as assuming that you are working with a Kedro project, and that all configurations are file-based. There are workarounds that you can use hooks to override non-file base configuration, but that's not ideal. Many of these assumptions may not hold true, and it indicates that there is strong coupling with the Kedro project structure.
The easiest way to test this is to try to create a ConfigLoader in a Jupyter notebook and load some configuration files, and it's not straightforward.
Be as concrete as you can about:
What's in scope
What's not in scope
Design
To address these issues, I propose separating the ConfigLoader class into ConfigLoader and ConfigResolver, which can potentially be orchestrated by a ConfigManager class. Once we realize that the current ConfigLoader has two jobs, loading and resolving (e.g., Templating, interpolating, etc.), we can split them into smaller classes.
For example, we can create an AbstractConfigLoader base class, a FileConfigLoader class for loading configuration from a file, and a DatabaseConfigLoader class for loading configuration from a database. We can also create an AbstractConfigResolver base class and several implementations of ConfigResolver classes, such as JinjaConfigResolver for resolving Jinja templates and OmegaConfigResolver for resolving OmegaConf config files.
This design can help us remove the file-based assumption and easily load configurations from various sources such as a cloud key vault, a database or a parameter server. The resolving logic will remain similar to current implementations.
How may this help? Once we get rid of the file-base assumption, we can easily load configuration from a Cloud KeyVault, a parameter server, or even from a database. The resolving logic will remain similar as the current implementations.
Alternatives considered
An alternative we considered was creating an extra ConfigManager class to cache or validate configurations. However, these functions are not too common, so we decided to focus on the loading and resolving functions.
Another alternative would be to let the KedroContext class take on some of the responsibilities of the ConfigManager. However, this would require significant changes to the Kedro framework and was ultimately deemed unnecessary.
Testing
Explain the testing strategies to verify your design correctness (if possible).
Rollout strategy
The first iteration of our proposed solution could involve refactoring the FileConfigLoader class to be more modular, along with adding the OmegaConfigResolver and TemplatedConfigResolver. Future iterations can add more types of config loaders.
Is the change backward compatible? If not, what is the migration strategy?
This change is not backward compatible, and it will require changes to the API. Therefore, we will need to devise a migration strategy to help users transition smoothly to the new API.
Beta Was this translation helpful? Give feedback.
All reactions