How to Override Deeply Nested Settings using Environment Variables? #203

Closed
kschwab opened this issue Dec 28, 2023 · 10 comments · Fixed by #204 or #348
@kschwab
Contributor

kschwab commented Dec 28, 2023

Hello,

Is it possible to override a deeply nested setting without having to redefine the entirety of the model?

Below is a modified example based off of Parsing environment variable values:

import os
from pydantic import BaseModel, ValidationError
from pydantic_settings import BaseSettings, SettingsConfigDict


class DeepSubModel(BaseModel):
    v4: str


class SubModel(BaseModel):
    v1: str
    v2: bytes
    v3: int
    deep: DeepSubModel


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_nested_delimiter='__')

    v0: str
    sub_model: SubModel

    @classmethod
    def settings_customise_sources(
        cls,
        settings_cls,
        init_settings,
        env_settings,
        dotenv_settings,
        file_secret_settings):
        return env_settings, init_settings, file_secret_settings


# Ideal scenario would be a simple point modification
os.environ['SUB_MODEL__DEEP__V4'] = 'override-v4'

try:
    print(Settings(v0='0', sub_model=SubModel(v1='init-v1', v2=b'init-v2', v3=3, 
                   deep=DeepSubModel(v4='init-v4'))).model_dump())
except ValidationError as e:
    print(e)
    """
    pydantic_core._pydantic_core.ValidationError: 3 validation errors for Settings
    sub_model.v1
      Field required [type=missing, input_value={'deep': {'v4': 'override-v4'}}, input_type=dict]
        For further information visit https://errors.pydantic.dev/2.5/v/missing
    sub_model.v2
      Field required [type=missing, input_value={'deep': {'v4': 'override-v4'}}, input_type=dict]
        For further information visit https://errors.pydantic.dev/2.5/v/missing
    sub_model.v3
      Field required [type=missing, input_value={'deep': {'v4': 'override-v4'}}, input_type=dict]
        For further information visit https://errors.pydantic.dev/2.5/v/missing
    """

# Current scenario seems to require the entire definition of nested models etc.
os.environ['SUB_MODEL'] = '{"v1": "reinit-v1", "v2": "reinit-v2"}'
os.environ['SUB_MODEL__V3'] = '33'

print(Settings(v0='0', sub_model=SubModel(v1='init-v1', v2=b'init-v2', v3=3, 
               deep=DeepSubModel(v4='init-v4'))).model_dump())
"""
{'v0': '0', 'sub_model': {'v1': 'reinit-v1', 'v2': b'reinit-v2', 'v3': 33, 'deep': {'v4': 'override-v4'}}}
"""

The difference here is Settings is defined through instantiation instead of environment variables. Ideally, the below concept would still apply to Settings with respect to nested precedence, allowing for point modifications of nested variables:

Nested environment variables take precedence over the top-level environment variable JSON...
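
The precedence rule quoted above can be illustrated with a minimal sketch. It assumes a hypothetical helper, `explode_env` (not part of pydantic-settings), that lower-cases variable names, splits them on the delimiter, and decodes JSON only for values that look like objects or arrays. The environment values are illustrative, and this is a simplification, not the library's implementation.

```python
import json
from typing import Any


def explode_env(env: dict[str, str], delimiter: str = '__') -> dict[str, Any]:
    """Hypothetical helper: turn flat env vars into a nested dict."""
    result: dict[str, Any] = {}
    # A name always sorts before its own nested variants
    # ('SUB_MODEL' < 'SUB_MODEL__V3'), so nested values are applied last.
    for name in sorted(env):
        *parents, leaf = name.lower().split(delimiter)
        target = result
        for part in parents:
            target = target.setdefault(part, {})
        raw = env[name]
        # Only decode values that look like JSON objects or arrays.
        value = json.loads(raw) if raw.lstrip().startswith(('{', '[')) else raw
        target[leaf] = value
    return result


env = {
    'SUB_MODEL': '{"v1": "reinit-v1", "v3": "3"}',
    'SUB_MODEL__V3': '33',
    'SUB_MODEL__DEEP__V4': 'override-v4',
}
exploded = explode_env(env)
print(exploded)
```

Here the nested `SUB_MODEL__V3` wins over the `v3` key inside the top-level JSON, and `SUB_MODEL__DEEP__V4` adds a key the JSON never mentioned.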

@kschwab
Contributor Author

kschwab commented Dec 28, 2023

Mirroring the instantiated fields in the environment works:

import os
import json
from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict, EnvSettingsSource


class DeepSubModel(BaseModel):
    v4: str


class SubModel(BaseModel):
    v1: str
    v2: str
    v3: int
    deep: DeepSubModel


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_nested_delimiter='__')

    v0: str
    sub_model: SubModel

    @classmethod
    def settings_customise_sources(
        cls,
        settings_cls,
        init_settings,
        env_settings,
        dotenv_settings,
        file_secret_settings):

        # Mirror the instantiated fields in the environment
        for key, value in cls.model_construct(**init_settings.init_kwargs).model_dump().items():
            os.environ[f'{env_settings.env_prefix}{key}'] = json.dumps(value) if isinstance(value, dict) else value

        # Re-instantiate EnvSettingsSource so it uses the latest environment
        return EnvSettingsSource(settings_cls,
                                 env_settings.case_sensitive,
                                 env_settings.env_prefix,
                                 env_settings.env_nested_delimiter), init_settings, file_secret_settings


os.environ['SUB_MODEL__DEEP__V4'] = 'override-v4'

print(Settings(v0='0', sub_model=SubModel(v1='init-v1', v2=b'init-v2', v3=3, 
               deep=DeepSubModel(v4='init-v4'))).model_dump())
"""
{'v0': '0', 'sub_model': {'v1': 'init-v1', 'v2': 'init-v2', 'v3': 3, 'deep': {'v4': 'override-v4'}}}
"""

If the above were formalized under EnvSettingsSource, an additional option could be added to apply the changes in a local copy of the environment to avoid polluting the original.

The changes are pretty straightforward, I can throw up a PR if desired.
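
One way to avoid polluting the original environment with the mirroring approach is to restore it afterwards. The sketch below assumes a hypothetical context manager, `temporary_env`, built only on the standard library; it restores `os.environ` on exit rather than working on a true local copy, and the variable name `DEMO_SUB_MODEL` is made up for the example.

```python
import os
from contextlib import contextmanager
from typing import Iterator, Mapping


@contextmanager
def temporary_env(overrides: Mapping[str, str]) -> Iterator[None]:
    """Hypothetical helper: apply overrides, then restore os.environ."""
    saved = dict(os.environ)  # snapshot of the current environment
    os.environ.update(overrides)
    try:
        yield
    finally:
        # Put the environment back exactly as it was before entering.
        os.environ.clear()
        os.environ.update(saved)


with temporary_env({'DEMO_SUB_MODEL': '{"v1": "init-v1"}'}):
    inside = os.environ['DEMO_SUB_MODEL']  # visible only inside the block

print(inside)
print('DEMO_SUB_MODEL' in os.environ)
```

Anything that reads the environment inside the `with` block sees the mirrored values; once the block exits, they are gone.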

@hramezani
Member

Thanks @kschwab for reporting this.

pydantic-settings initializes all sources and merges the results by deep_update.

Here are the results of the sources in your example:

  • EnvSettingsSource -> {'sub_model': {'deep': {'v4': 'override-v4'}}}
  • InitSettingsSource -> {'v0': '0', 'sub_model': SubModel(v1='init-v1', v2=b'init-v2', v3=3, deep=DeepSubModel(v4='init-v4'))}

As you can see, in InitSettingsSource sub_model is an object, not a dict. So deep_update can't merge into the SubModel object; instead it replaces the object with the whole dict {'deep': {'v4': 'override-v4'}}, which is why v1, v2 and v3 are reported as missing.

If you initialize your Settings class with a dict, you will get the correct result:

print(Settings(v0='0', sub_model={'v1': 'init-v1', 'v2': b'init-v2', 'v3': 3, 'deep':{'v4':'init-v4'}}).model_dump())
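
The merge behaviour described above can be reproduced with a small standalone sketch. It assumes a simplified `deep_update` that only recurses when both sides are dicts, and uses plain dataclasses as stand-ins for the pydantic models, so it runs without pydantic installed. It approximates the behaviour and is not the library's code.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class DeepSubModel:  # stand-in for the pydantic DeepSubModel
    v4: str


@dataclass
class SubModel:  # stand-in for the pydantic SubModel
    v1: str
    deep: DeepSubModel


def deep_update(mapping: dict[str, Any], *updating_mappings: dict[str, Any]) -> dict[str, Any]:
    """Simplified merge: recurse only when both values are dicts."""
    updated = mapping.copy()
    for updating in updating_mappings:
        for key, value in updating.items():
            if isinstance(updated.get(key), dict) and isinstance(value, dict):
                updated[key] = deep_update(updated[key], value)
            else:
                # An object on the left cannot be merged into, so it is replaced.
                updated[key] = value
    return updated


env_state = {'sub_model': {'deep': {'v4': 'override-v4'}}}
init_as_object = {'v0': '0', 'sub_model': SubModel(v1='init-v1', deep=DeepSubModel(v4='init-v4'))}
init_as_dict = {'v0': '0', 'sub_model': {'v1': 'init-v1', 'deep': {'v4': 'init-v4'}}}

merged_object = deep_update(init_as_object, env_state)
merged_dict = deep_update(init_as_dict, env_state)
print(merged_object)  # {'v0': '0', 'sub_model': {'deep': {'v4': 'override-v4'}}}
print(merged_dict)    # {'v0': '0', 'sub_model': {'v1': 'init-v1', 'deep': {'v4': 'override-v4'}}}
```

With the object input, `v1` is lost because the whole `sub_model` value is replaced. With the dict input, only `v4` changes.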

@kschwab
Contributor Author

kschwab commented Dec 29, 2023

Got it, thanks @hramezani for the response.

If I understand correctly, InitSettingsSource is the only source currently that can return a dict with objects instead of a dict with dicts. Can we update InitSettingsSource such that it will always return a dict with dicts? e.g.:

class InitSettingsSource(PydanticBaseSettingsSource):
    """
    Source class for loading values provided during settings class initialization.
    """

    def __call__(self) -> dict[str, Any]:
        InitSettings = create_model('InitSettings', **{key: (Any, val) for key,val in self.init_kwargs.items()})
        return InitSettings().model_dump()

This would then allow for the below approaches to be equivalent:

# 1. Expressing "sub_model" and "deep" as dicts
print(Settings(v0='0', sub_model={'v1': 'init-v1', 'v2': b'init-v2', 'v3': 3, 
               'deep':{'v4':'init-v4'}}).model_dump())

# 2. Expressing "sub_model" and "deep" as objects
print(Settings(v0='0', sub_model=SubModel(v1='init-v1', v2=b'init-v2', v3=3, 
               deep=DeepSubModel(v4='init-v4'))).model_dump())

# 3. Expressing "sub_model" as object and "deep" as dict
print(Settings(v0='0', sub_model=SubModel(v1='init-v1', v2=b'init-v2', v3=3, 
               deep={'v4':'init-v4'})).model_dump())

# 4. Expressing "sub_model" as dict and "deep" as object
print(Settings(v0='0', sub_model={'v1': 'init-v1', 'v2': b'init-v2', 'v3': 3, 
               'deep':DeepSubModel(v4='init-v4')}).model_dump())

As an aside, our application is in the simulation and modelling domain. Having the ability to express model variants in object form improves readability and reduces our configuration maintenance. Pydantic settings is perfect for this; it just needs a slight tweak to enable this generally.

@hramezani
Member

Good idea @kschwab

You can use TypeAdapter as well:

    def __call__(self) -> dict[str, Any]:
        return TypeAdapter(dict[str, Any]).dump_python(self.init_kwargs)

Would you like to open a PR? Otherwise I will do it.
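
As a quick check of what the TypeAdapter suggestion does, here is a minimal sketch, assuming pydantic v2 is installed. The `init_kwargs` dict below is a stand-in for `self.init_kwargs`, and the models are trimmed versions of the ones in this thread.

```python
from typing import Any

from pydantic import BaseModel, TypeAdapter


class DeepSubModel(BaseModel):
    v4: str


class SubModel(BaseModel):
    v1: str
    deep: DeepSubModel


# Stand-in for InitSettingsSource.init_kwargs: a dict that holds a model object.
init_kwargs = {'v0': '0', 'sub_model': SubModel(v1='init-v1', deep=DeepSubModel(v4='init-v4'))}

# Dumping through dict[str, Any] serializes nested models into plain dicts.
dumped = TypeAdapter(dict[str, Any]).dump_python(init_kwargs)
print(dumped)  # {'v0': '0', 'sub_model': {'v1': 'init-v1', 'deep': {'v4': 'init-v4'}}}
print(type(dumped['sub_model']))  # <class 'dict'>
```

The dumped value contains only dicts, so a dict-based merge can recurse into it. The original `SubModel` type is no longer present in the result.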

@kschwab
Contributor Author

kschwab commented Jan 2, 2024

Done. Opened a PR. Thanks!

@hramezani
Member

@kschwab I am going to revert the fix for this issue because it introduced a breaking change.

I think it would be easy for you to implement your custom InitSettingsSource and override the __call__ like:

class CustomInitSettingsSource(InitSettingsSource):
    def __call__(self) -> dict[str, Any]:
        return TypeAdapter(Dict[str, Any]).dump_python(self.init_kwargs, by_alias=True)

@kschwab
Contributor Author

kschwab commented Feb 19, 2024

@hramezani, yes, reverting the original commit makes sense. Actually #241 and @moonrail raise a couple of interesting points.

The above suggestion would not work for cases such as #241, where the type information must be preserved; i.e., the fix needs to handle sources that return a dict containing objects. Currently only InitSettingsSource does this, but generally speaking, user-defined sources could do so as well.

A better solution is to modify deep_update such that it takes into account objects:

def deep_update(
    mapping: dict[KeyType, Any], *updating_mappings: dict[KeyType, Any]
) -> dict[KeyType, Any]:
    updated_mapping = mapping.copy()
    for updating_mapping in updating_mappings:
        for key, new_val in updating_mapping.items():
            if key in updated_mapping:
                old_val = updated_mapping[key]
                old_val_type = type(old_val)
                if is_model_class(old_val_type) and isinstance(new_val, dict):
                    old_val = old_val.model_dump()
                updated_mapping[key] = (
                    TypeAdapter(old_val_type).validate_python(deep_update(old_val, new_val))
                    if isinstance(old_val, dict) and isinstance(new_val, dict)
                    else new_val
                )
            else:
                updated_mapping[key] = new_val
    return updated_mapping

Where the resulting object type would be:

  • dict if old_val and new_val are both dicts
  • type(old_val) if old_val is a model instance and new_val is a dict
  • type(new_val) if new_val is not a dict

The above will also resolve the second point raised in #241, "copy-by-reference to copy-by-value", which really only applies to BaseModel derived objects.

I'll open a new PR with the changes and additional tests to cover #241 once ready.
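
The three rules listed above can be exercised with a self-contained sketch, assuming pydantic v2 is installed. It is a lightly restructured variant of the proposal, not the code from the PR: `isinstance(old_val, BaseModel)` stands in for the `is_model_class` check, and the models and values are trimmed versions of the ones in this thread.

```python
from typing import Any

from pydantic import BaseModel, TypeAdapter


class DeepSubModel(BaseModel):
    v4: str


class SubModel(BaseModel):
    v1: str
    deep: DeepSubModel


def deep_update(mapping: dict[str, Any], *updating_mappings: dict[str, Any]) -> dict[str, Any]:
    """Object-aware merge: models are dumped, merged, then re-validated."""
    updated = mapping.copy()
    for updating in updating_mappings:
        for key, new_val in updating.items():
            old_val = updated.get(key)
            if isinstance(new_val, dict) and isinstance(old_val, (dict, BaseModel)):
                old_type = type(old_val)
                old_dict = old_val.model_dump() if isinstance(old_val, BaseModel) else old_val
                merged_val = deep_update(old_dict, new_val)
                # Rebuild the original type (dict stays dict, model stays model).
                updated[key] = TypeAdapter(old_type).validate_python(merged_val)
            else:
                # new_val is not a dict (or there is nothing to merge into).
                updated[key] = new_val
    return updated


init_state = {'v0': '0', 'sub_model': SubModel(v1='init-v1', deep=DeepSubModel(v4='init-v4'))}
env_state = {'v0': 'env-v0', 'sub_model': {'deep': {'v4': 'override-v4'}}}

merged = deep_update(init_state, env_state)
print(type(merged['sub_model']).__name__)  # SubModel
print(merged['sub_model'].deep.v4)         # override-v4
print(init_state['sub_model'].deep.v4)     # init-v4
```

The merged value is a new `SubModel` instance, so the object passed at initialization is left unchanged.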

@kschwab
Contributor Author

kschwab commented Feb 20, 2024

@hramezani opened PR #244 with updated fixes 👍🏾

@hramezani
Member

Thanks @kschwab for creating the new PR.

I am afraid this change could break pydantic-settings again: it changes deep_update, so a hidden bug there could break all the sources.

BTW, we can keep the PR open and merge it in V3

@kschwab
Contributor Author

kschwab commented Feb 21, 2024

@hramezani no concerns there, I agree on the risk. V3 sounds good.
