Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] - Replace ConfigLoader with ConfigLoader and ConfigResolver #2481

Closed
noklam opened this issue Mar 29, 2023 · 6 comments
Closed

[Proposal] - Replace ConfigLoader with ConfigLoader and ConfigResolver #2481

noklam opened this issue Mar 29, 2023 · 6 comments
Assignees

Comments

@noklam
Copy link
Contributor

noklam commented Mar 29, 2023

Introduction

The ConfigLoader class in Kedro is currently responsible for loading, resolving, validating, and merging configurations for various components. However, it has become too big, and its tests have become more complicated. This issue proposes to make ConfigLoader more modular and remove its strong coupling with the Kedro project structure.

Background

Background

Kedro is a framework that consists of several major components, including DataCatalog, ConfigLoader, Pipeline, and Runner. There are also other classes such as KedroSession and KedroContext, which are mainly used to orchestrate the kedro run.

Of these components, the DataCatalog is the most standalone and can be used separately without the Kedro project template. However, this is not the case for the ConfigLoader, and there is potential to make it more modular.

Problem

The current ConfigLoader class has many assumptions, such as assuming that you are working with a Kedro project, and that all configurations are file-based. There are workarounds that you can use hooks to override non-file base configuration, but that's not ideal. Many of these assumptions may not hold true, and it indicates that there is strong coupling with the Kedro project structure.

The easiest way to test this is to try to create a ConfigLoader in a Jupyter notebook and load some configuration files, and it's not straightforward.
Be as concrete as you can about:

What's in scope

What's not in scope

Design

To address these issues, I propose separating the ConfigLoader class into ConfigLoader and ConfigResolver, which can potentially be orchestrated by a ConfigManager class. Once we realize that the current ConfigLoader has two jobs, loading and resolving (e.g., Templating, interpolating, etc.), we can split them into smaller classes.

For example, we can create an AbstractConfigLoader base class, a FileConfigLoader class for loading configuration from a file, and a DatabaseConfigLoader class for loading configuration from a database. We can also create an AbstractConfigResolver base class and several implementations of ConfigResolver classes, such as JinjaConfigResolver for resolving Jinja templates and OmegaConfigResolver for resolving OmegaConf config files.

This design can help us remove the file-based assumption and easily load configurations from various sources such as a cloud key vault, a database or a parameter server. The resolving logic will remain similar to current implementations.

class AbstractConfigLoader(UserDict):
    """Abstract base class for all ConfigLoader implementations."""
    # We can keep the Dict-like interface

class FileConfigLoader(AbstractConfigLoader):
    """ConfigLoader implementation for loading configuration from a file."""
    # This will be the same as the current `ConfigLoader`
           
class DatabaseConfigLoader(AbstractConfigLoader):
    """ConfigLoader implementation for loading configuration from a database."""
    def load(self, connection_string):
          ...

class ConfigResolver:
    """Abstract base class for all ConfigResolver implementations."""

    def resolve(self) -> Dict[str, Any]:
        """Resolves the configuration data."""
        raise NotImplementedError

class JinjaConfigResolver(ConfigResolver):
    """ConfigResolver implementation for resolving Jinja templates."""

    def resolve(self) -> Dict[str, Any]:
        """Resolves the configuration data using Jinja templates."""
        # implementation here

class OmegaConfigResolver(ConfigResolver):
    """ConfigResolver implementation for resolving OmegaConf config files."""
    ...
    def resolve(self) -> Dict[str, Any]:
        """Resolves the configuration data using OmegaConf config files."""
        # implementation here

How may this help? Once we get rid of the file-base assumption, we can easily load configuration from a Cloud KeyVault, a parameter server, or even from a database. The resolving logic will remain similar as the current implementations.

Alternatives considered

An alternative we considered was creating an extra ConfigManager class to cache or validate configurations. However, these functions are not too common, so we decided to focus on the loading and resolving functions.

Another alternative would be to let the KedroContext class take on some of the responsibilities of the ConfigManager. However, this would require significant changes to the Kedro framework and was ultimately deemed unnecessary.

Testing

Explain the testing strategies to verify your design correctness (if possible).

Rollout strategy

The first iteration of our proposed solution could involve refactoring the FileConfigLoader class to be more modular, along with adding the OmegaConfigResolver and TemplatedConfigResolver. Future iterations can add more types of config loaders.

Is the change backward compatible? If not, what is the migration strategy?
This change is not backward compatible, and it will require changes to the API. Therefore, we will need to devise a migration strategy to help users transition smoothly to the new API.

Note Is there anyone try to use ConfigLoader as a standalone component? Would love to hear more feedback!

@astrojuanlu
Copy link
Member

How much of this is still relevant in the face of keeping only OmegaConfigLoader?

@noklam
Copy link
Contributor Author

noklam commented Aug 31, 2023

It's less relevant if there is only ever one class. I do still think separating the two is beneficial, even if it's just on the function level.

User can still extend on the abstract class, we cannot assume everyone is settled with OmegaConfigLoader?

Maybe config can also partially read from cloud key vault and still resolved with OmegaConf. I am aware there is a lot of maybe here.

@astrojuanlu
Copy link
Member

Related: #2973

@astrojuanlu
Copy link
Member

Question: What new use cases would this enable? (So that we can justify the refactoring)

@astrojuanlu
Copy link
Member

Introspection of the config files is one possibility #2821 (comment)

@datajoely
Copy link
Contributor

I feel pydantic would be very helpful here too

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

3 participants