Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revisit: Make KedroContext a dataclass and add config_loader as a property #1459

Closed
3 tasks done
Tracked by #1441
noklam opened this issue Apr 20, 2022 · 9 comments · Fixed by #1465 or #3300
Closed
3 tasks done
Tracked by #1441

Revisit: Make KedroContext a dataclass and add config_loader as a property #1459

noklam opened this issue Apr 20, 2022 · 9 comments · Fixed by #1465 or #3300
Assignees

Comments

@noklam
Copy link
Contributor

noklam commented Apr 20, 2022

Description

As KedroSession now control the lifecycle of Kedro's run, KedroContext act like a container and it stores important attributes. Since we now dropped Python 3.6 support, we can make use of Python's dataclass to further tidy up KedroContext's constructor code.

todo:

@noklam
Copy link
Contributor Author

noklam commented Apr 21, 2022

Turns out using the dataclasss doesn't seem to reduce any boilerplate code since we need to keep the attributes "private". You can find more details in this blog post https://florimond.dev/en/posts/2018/10/reconciling-dataclasses-and-properties-in-python/.

attrs support this but it's one more dependency to add.
https://www.attrs.org/en/stable/examples.html

@noklam
Copy link
Contributor Author

noklam commented Jul 30, 2023

This was merged already, but as we approaching 0.19, I want to revise if it's a good idea to freeze KedroContext.
Related Issues

We should also consider how easy for plugin developer to use KedroContext. In the past we were more negative about storing context and states in Hooks for following hooks. However in #2690, we may want to document this. If this is not a bad idea, do we need to re-visit the decision we made?

@noklam
Copy link
Contributor Author

noklam commented Aug 1, 2023

python/cpython#93133 (comment)

Potentially helpful with the issues that using mock with dataclass or attr.

@Galileo-Galilei
Copy link
Contributor

Galileo-Galilei commented Oct 28, 2023

I cross-reference #3214 here which is likely a duplicate of this, sorry!

Description

The KedroContext has been made a frozen dataclass in the develop branch since May 2022 and the PR #1465.
I think it should be unfrozen to keep te current behaviour and enable users to modify the context.

The PR has triggered a lot of discussions (see in the issue), but it can be summarized as follows:

  1. if the class is frozen, it prevents users to modify the context with unintended side_effects, which has been made easier with the after_context_created
  2. if the class is not frozen, users can add attributes to the context, e.g. plugin configuration. The main advantage of this approach is that it makes the "interactive" workflow (i.e. within a notebook) consistent with the "command line" workflow (i.e. running with the kedro run command).

Basically, we (@antonymilne, @noklam and myself) were not absolutely sure of what the best decision was, but @noklam pointed out that we have a workaround to force setting the attribute of a frozen dataclass by using context.__setattr__("attribute_name", attribute_value) so we could have the benefits of both 1 & 2.

Context

I am currently preparing the migration of kedro-mlflow to kedro=0.19, and it turns out that the workaround identified back then does not work. I can try do dark magic by digging deeply in python internals and kind of redefining __setattr__ on the fly, but it looks very (very!) bad and I am not even sure I can make it work properly.

Unless we find a workaround, we are forced to choose between 1 & 2. My personal preference go to 2. for the following reasons:

  • Kedro has been open source for 4 years now, and I have never seen a bug due to a user involuntarily modifying the context, even during the last past year with the after_context_created hook. This obviously may happen, but we should acknowledge that the few users who modify the context do it "at their own risk" and know what they do.
  • I personally think (but I am biased as a plugin author 😅 ) that the consistency with the interactive workflow (not doing so will basically remove the mlflow configuration from context and make it hard to use in notebooks) and the simplified experience for plugin authors is worth letting them modifying the class.

Possible Implementation

Replace @frozen decorator by @define(slots=False). The @slots=False is needed to enable adding new attributes.

@frozen
class KedroContext:
"""``KedroContext`` is the base class which holds the configuration and
Kedro's main functionality.

Possible Alternatives

Keep it frozen.

@noklam
Copy link
Contributor Author

noklam commented Oct 28, 2023

Thanks for x-ref the issues. The issue has been 1.5 years old and my thinking has changed a little bit as well. I agree with you mostly because:

  1. It has not been causing any issue since the hook is added 1.5 years ago.
  2. Stateful hook is more common in Kedro now, so users are likely to bump into this problem often.

Basically, I think we can take a middle-ground which is Mutable class + immutable attributes. The key is to use attrs.setters.frozen, I mentioned that in #1465 but honestly I don't remember why I did not use it but go for a full frozen class. Would this be sufficient?

import attr
from attrs.setters import frozen

@attr.s
class Foo:
  # A partial frozen class
  bar: int = attr.field(default=0, on_setattr=frozen)

f = Foo()
f.abc = 3 # It's ok to mutate the class
f.bar = 3 # It's not ok to mutate existing attribtue

@astrojuanlu
Copy link
Member

Is this possible with plain dataclass? (To avoid the extra dependency if possible)

@Galileo-Galilei
Copy link
Contributor

Choosing between attrs and dataclasses

In the original issue #1465, there are detailed analysis of the pros and cons and justifications to choose over attrs. The conclusion seems to be that it is much easier to make the class "partially" frozen (e.g. some attributes are frozen and other are not,namely the one prefixed with an underscore like _hook_manager) with attrs rather than with dataclasses (even if it was possible too with a lot of boilerplate code).

Since we finally ended up making the entire class frozen, I am not sure why this was relevant in the end, but I may miss someething. This needs to be reassesed once we figure out if we really want to make the class or some of its attributes frozen.

Should we make some fields frozen?

If I understand correctly, this was the original motivation of the PR, but I think the following arguments still stands:

It has not been causing any issue since the hook is added 1.5 years ago.

So I would be in favor of not freezing anything and use a pure dataclass. This would also helps to get rid of attrs (the motivation for thsi package was to make only some attributes frozen) and one less dependency, many less future maintenance problems :)

Workaround to add fields on a frozen dataclass

I've played a bit around this, and here are some food for thoughts. In a nutshell:

Package Override existing attribute Add extra attribute
dataclass
attrs

This implies that if we switch to dataclass instead of attrs, we can keep the exact current behaviour, but the price is to have a boilerplate code as described in the first paragraph.

frozen dataclass can be overriden and added

from dataclasses import dataclass

@dataclass(frozen=True)
class PointDataClass:
    x: int
    y: int

point_data_class = PointDataClass(x=1, y=1)
point_data_class.z = 2 # # FrozenInstanceError: cannot assign to field 'z'
object.__setattr__(point_data_class, "y", 4)  # works: you can override an existing field
print(point_data_class.y)
point_data_class.__setattr__("z", 2)  # self.__setattr__ does not work: FrozenInstanceError: cannot assign to field 'z'
object.__setattr__(point_data_class, "z", 2)  # object.__setattr__ works! You can add an attribute
print(point_data_class.z) # 2

frozen attrs attributes can be overriden and but not added

from attrs import frozen

@frozen
class PointAttrs:
    x: int
    y: int

point_attrs = PointAttrs(x=1, y=1)
point_attrs.z = 2
point_attrs.__setattr__("z", 2)  # self.__setattr__ does not work: FrozenInstanceError
object.__setattr__(point_attrs, "y", 4)  # object.__setattr__ works : you can override an existing field
print(point_attrs.y) # 4
object.__setattr__(point_attrs, "z", 2)  #  # object.__setattr__ does NOT work! You CANNOT add an attribute: AttributeError: 'PointAttrs' object has no attribute 'z'

@noklam
Copy link
Contributor Author

noklam commented Nov 6, 2023

@astrojuanlu attrs aren't really "extra" because it was always a secondary dependencies used by other library. Partial immutability is only possible with attr.

So I think the key here is:

  • Should we make it partially immutable or just keep everything mutable?

The example above shows that we can make it works for partial immutability. In my mind attr is a superset of dataclass so I don't expect things that only work with dataclasses

Whether or not we should use attr is an implementation decision, it's not necessary an extra dependencies because it is already used by other core dependencies (maybe we can double check),

@merelcht merelcht changed the title Make KedroContext a dataclass and add config_loader as a property Revisit: Make KedroContext a dataclass and add config_loader as a property Nov 6, 2023
@noklam noklam linked a pull request Nov 13, 2023 that will close this issue
7 tasks
@merelcht
Copy link
Member

Completed in #3300

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
5 participants