This repository has been archived by the owner on Nov 30, 2022. It is now read-only.

Refactor strategy instantiation for more extensibility #1254

Merged
merged 12 commits into from
Sep 7, 2022

Conversation

adamsachs
Contributor

@adamsachs adamsachs commented Sep 2, 2022

Purpose

Make our strategies more extensible by creating a Strategy abstract class that uses the built-in __subclasses__() method to find and instantiate Strategy subclasses.

Before this change, many of our strategies (with the exception of MaskingStrategy implementations, which had been refactored in #560) were registered by means of hardcoded enums in the core fidesops codebase. If a developer wanted to implement their own strategy, they had to update the core fidesops codebase.

With this change, developers outside of core fidesops can implement their own strategy (whether that's an AuthenticationStrategy, MaskingStrategy, PaginationStrategy, or PostProcessorStrategy) and leverage it in the system by simply importing their subclass and ensuring that it defines a unique name class variable, along with a configuration_model variable pointing to the strategy's pydantic configuration class. As an example:

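from typing import Any, Dict, List, Optional, Union

# StrategyConfiguration and PostProcessorStrategy are imported from fidesops
# (import paths omitted here).

class SomeStrategyConfiguration(StrategyConfiguration):
    some_key: str = "default value"

class SomeStrategy(PostProcessorStrategy):
    name = "some postprocessor strategy"  # unique name used to look up the strategy
    configuration_model = SomeStrategyConfiguration

    def __init__(self, configuration: SomeStrategyConfiguration):
        self.some_config = configuration.some_key

    def process(
        self, data: Any, identity_data: Optional[Dict[str, Any]] = None
    ) -> Union[List[Dict[str, Any]], Dict[str, Any]]:
        # Placeholder: a real strategy would transform `data` here.
        pass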

Changes

  • an abstract base class Strategy that defines logic for strategy retrieval and instantiation, through a generic get_strategy method
    • update existing strategy types to inherit from this new base class.
    • the base class defines standardized class variables name and configuration_model that are used to identify and instantiate strategy subtypes in a consistent manner
  • remove existing strategy factories as they are no longer needed; update references in the codebase to use the new get_strategy method for the corresponding Strategy subtype
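The lookup described in the changes above can be sketched roughly as follows. This is a minimal illustration, not the merged code: it uses dataclasses in place of pydantic models so it runs with the standard library alone, and `UppercaseStrategy`, `UppercaseConfiguration`, the `get_strategy` signature, the error message, and the use of `inspect.isabstract` to skip abstract strategy types are all assumptions.

```python
import inspect
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, ClassVar, Dict, List, Type


@dataclass
class StrategyConfiguration:
    """Stand-in for the pydantic configuration base class (assumption)."""


class Strategy(ABC):
    """Hypothetical base class; the real one in fidesops may differ."""

    name: ClassVar[str]
    configuration_model: ClassVar[Type[StrategyConfiguration]]

    def __init__(self, configuration: StrategyConfiguration):
        self.configuration = configuration

    @classmethod
    def get_strategies(cls) -> List[Type["Strategy"]]:
        """Return every concrete subclass below `cls`, at any depth."""
        found: List[Type["Strategy"]] = []
        for sub in cls.__subclasses__():
            # Abstract strategy types (e.g. PostProcessorStrategy) are skipped,
            # but their own subclasses are still searched.
            if not inspect.isabstract(sub):
                found.append(sub)
            found.extend(sub.get_strategies())
        return found

    @classmethod
    def get_strategy(cls, strategy_name: str, configuration: Dict[str, Any]) -> "Strategy":
        """Instantiate the subclass of `cls` whose `name` matches."""
        for sub in cls.get_strategies():
            if sub.name == strategy_name:
                return sub(sub.configuration_model(**configuration))
        valid = ", ".join(sub.name for sub in cls.get_strategies())
        raise ValueError(f"Strategy '{strategy_name}' does not exist. Valid strategies: [{valid}]")


class PostProcessorStrategy(Strategy):
    @abstractmethod
    def process(self, data: Any) -> Any:
        ...


class MaskingStrategy(Strategy):
    @abstractmethod
    def mask(self, value: str) -> str:
        ...


@dataclass
class UppercaseConfiguration(StrategyConfiguration):
    suffix: str = ""


class UppercaseStrategy(PostProcessorStrategy):
    """Hypothetical third-party strategy; importing it is all the 'registration' needed."""

    name = "uppercase"
    configuration_model = UppercaseConfiguration

    def __init__(self, configuration: UppercaseConfiguration):
        self.suffix = configuration.suffix

    def process(self, data: Any) -> Any:
        return str(data).upper() + self.suffix


strategy = PostProcessorStrategy.get_strategy("uppercase", {"suffix": "!"})
result = strategy.process("abc")
```

Because the lookup starts from the strategy type it is called on, `MaskingStrategy.get_strategy("uppercase", {})` raises in this sketch: the post-processor above is not visible from the masking type.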

Checklist

  • Update CHANGELOG.md file
    • Merge in main so the most recent CHANGELOG.md file is being appended to
    • Add description within the Unreleased section in an appropriate category. Add a new category from the list at the top of the file if the needed one isn't already there.
    • Add a link to this PR at the end of the description with the PR number as the text. example: #1
  • Applicable documentation updated (guides, quickstart, postman collections, tutorial, fidesdemo, database diagram)
  • If docs updated (select one):
    • documentation complete, or draft/outline provided (tag docs-team to complete/review on this branch)
    • documentation issue created (tag docs-team to complete issue separately)
  • Good unit test/integration test coverage
  • This PR contains a DB migration. If checked, the reviewer should confirm with the author that the down_revision correctly references the previous migration before merging
  • The Run Unsafe PR Checks label has been applied, and checks have passed, if this PR touches any external services

Ticket

Fixes #562

@adamsachs adamsachs added run unsafe ci checks Triggers running of unsafe CI checks Needs doc review SaaS Connector The issue indicates development work for a specific SaaS application labels Sep 2, 2022
@adamsachs adamsachs self-assigned this Sep 2, 2022
@adamsachs adamsachs changed the title 562 refactor strategy instantiation for more extensibility Refactor strategy instantiation for more extensibility Sep 2, 2022
@adamsachs
Contributor Author

the shopify external unsafe integration test is failing but i don't think that's related to my changes?

looks like unsafe tests were never run on the shopify PR that was merged a few days ago, and the unsafe tests have failed on every run since then, except for @galvana's draft PR, which actually seems to be fixing the shopify issue.

@galvana is that accurate? if so, then i think we can ignore the failure here...

@@ -48,6 +45,7 @@
FIDESOPS_AUTOGENERATED_STORAGE_KEY = "fidesops_autogenerated_storage_destination"
AUTOGENERATED_ACCESS_KEY = "download"
AUTOGENERATED_ERASURE_KEY = "delete"
STRING_REWRITE_STRATEGY_NAME = "string_rewrite"
Contributor Author

is it OK to edit this file by hand? i hesitated before doing so since it's a generated migration file...

Collaborator

This is fine since it's functionally equivalent, we're just cleaning up the constants.

Contributor

Worth noting that this file wasn't actually auto-generated either, it's a data migration instead of a schema migration and automatically adds several rows to multiple tables in the fidesops application database!

Contributor Author

oh, nice - thanks for clarifying that @pattisdr!

@galvana
Collaborator

galvana commented Sep 6, 2022

@adamsachs, yes, we can ignore the Shopify issues as part of this ticket

@sanders41
Contributor

I believe I found the shopify issue and fixed it as part of #1260 🤞

Contributor

@pattisdr pattisdr left a comment

Nice work, main thing is fixing up _find_all_strategy_subclasses - the latest commit broke this, and adding tests around it. I do like this implementation generally.
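For context on why the discovery helper needs recursion and dedicated tests: `__subclasses__()` returns only direct subclasses, so a helper that looks one level deep silently misses implementations that sit further down the hierarchy. The sketch below isolates that step. The class names and both helper functions are hypothetical, and the thread does not show what the actual bug was, so this illustrates one plausible failure mode rather than the real one.

```python
from typing import List


class Strategy:
    """Hypothetical root type."""


class MaskingStrategy(Strategy):
    """Hypothetical strategy type."""


class HashMaskingStrategy(MaskingStrategy):
    name = "hash"


class SaltedHashMaskingStrategy(HashMaskingStrategy):
    """Grandchild of MaskingStrategy, used to exercise the recursion."""

    name = "salted_hash"


def find_direct_subclasses(cls: type) -> List[type]:
    """One level only: what a non-recursive helper would return."""
    return cls.__subclasses__()


def find_all_subclasses(cls: type) -> List[type]:
    """Depth-first walk over the whole subclass tree."""
    found: List[type] = []
    for sub in cls.__subclasses__():
        found.append(sub)
        found.extend(find_all_subclasses(sub))
    return found


def test_direct_lookup_misses_grandchildren() -> None:
    assert SaltedHashMaskingStrategy not in find_direct_subclasses(MaskingStrategy)


def test_recursive_lookup_finds_nested_subclasses() -> None:
    names = {sub.name for sub in find_all_subclasses(MaskingStrategy)}
    assert names == {"hash", "salted_hash"}


test_direct_lookup_misses_grandchildren()
test_recursive_lookup_finds_nested_subclasses()
```

A test that declares at least one grandchild class, as above, is what makes a one-level-deep regression visible.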

Comment on lines 32 to 37
    def test_read_strategies(self, api_client: TestClient):
        expected_response = []
-       for strategy in MaskingStrategyFactory.get_strategies():
+       for strategy in MaskingStrategy.get_strategies():
            expected_response.append(strategy.get_description())

        response = api_client.get(V1_URL_PREFIX + MASKING_STRATEGY)
Contributor

This test doesn't pick up the error. Currently get_strategies is broken and returns an empty list, but the expected_response here is also (incorrectly) an empty list, so the comparison still passes.

Contributor Author

yeah, i've been a bit uncertain about this test since i first saw it. but i'm also not really sure on the best way to resolve it - i feel like any approach to getting a "true" list of strategies is either going to rely on duplicating the logic of get_strategies(), or it's going to be prone to false negatives if/when more masking strategies are added to the testing runtime (whether that's core fidesops updates or, potentially, some extended runtime like -plus).

obviously we need more robust testing for get_strategies(), as you've correctly pointed out. but what do you think about keeping this as is, and just focusing on firming up the tests around get_strategies() itself?

Contributor

The get_strategies() improvement was the main thing, but perhaps asserting that the response is non-empty at least?
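The suggestion above can be illustrated with a small stand-alone sketch. Nothing here is the real test: `fake_endpoint` stands in for the API call, the two `get_strategies` functions simulate a broken and a working discovery step, and the strategy name `"hash"` is made up (`"string_rewrite"` is taken from the constant shown earlier in this PR).

```python
from typing import Callable, List

GetStrategies = Callable[[], List[str]]


def broken_get_strategies() -> List[str]:
    """Simulates the regression: discovery finds nothing."""
    return []


def working_get_strategies() -> List[str]:
    """Simulates a healthy discovery step."""
    return ["hash", "string_rewrite"]


def fake_endpoint(get_strategies: GetStrategies) -> List[str]:
    """Stand-in for the GET request against the masking strategy endpoint."""
    return [f"description of {name}" for name in get_strategies()]


def weak_check(get_strategies: GetStrategies) -> bool:
    """Mirrors the original test: expected and actual share the same source."""
    expected = [f"description of {name}" for name in get_strategies()]
    return fake_endpoint(get_strategies) == expected


def robust_check(get_strategies: GetStrategies) -> bool:
    """Same comparison, plus a guard against a vacuous pass on empty lists."""
    expected = [f"description of {name}" for name in get_strategies()]
    return len(expected) > 0 and fake_endpoint(get_strategies) == expected


weak_passes_when_broken = weak_check(broken_get_strategies)
robust_passes_when_broken = robust_check(broken_get_strategies)
robust_passes_when_working = robust_check(working_get_strategies)
```

The non-empty guard does not replace direct tests of get_strategies(); it only keeps the endpoint test from passing when both sides of the comparison are empty.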

@adamsachs
Contributor Author

adamsachs commented Sep 7, 2022

@ethyca/docs-authors would you be able to take a quick look at the docs changes here? this PR is instead of #1163, as we decided on a different implementation approach -- sorry for the double-work! let me know if you've got suggestions.

i've also tweaked the description of related docs ticket #1169 to reference this PR, rather than the now outdated #1163

Contributor

@pattisdr pattisdr left a comment

Good work @adamsachs 🏆

A small improvement might be adding #1254 (comment), but otherwise this looks good to me.

I'll let your team take care of merging in case there's more left to do.

Contributor

@conceptualshark conceptualshark left a comment

Just a little tweak - otherwise this looks good!

- In order to leverage an implemented masking strategy, the `MaskingStrategy` subclass must be registered with the `MaskingStrategyFactory`. To register a new `MaskingStrategy`, use the `register` decorator on the `MaskingStrategy` subclass definition, as shown in the above example.
-
- The value passed as the argument to the decorator must be the registered name of the `MaskingStrategy` subclass. This is the same value defined by [callers](#using-fidesops-as-a-masking-service) in the `"masking_strategy"."strategy"` field.
+ In order to leverage an implemented masking strategy, the `MaskingStrategy` subclass must be imported into the application runtime. Also, the `MaskingStrategy` class must define two class variables: `name`, which is the unique, registered name that callers [callers](#using-fidesops-as-a-masking-service) will use in their `"masking_strategy"."strategy"` field to invoke the strategy; and `configuration_model`, which references the configuration class used to parameterize the strategy.
Contributor

Suggested change
- In order to leverage an implemented masking strategy, the `MaskingStrategy` subclass must be imported into the application runtime. Also, the `MaskingStrategy` class must define two class variables: `name`, which is the unique, registered name that callers [callers](#using-fidesops-as-a-masking-service) will use in their `"masking_strategy"."strategy"` field to invoke the strategy; and `configuration_model`, which references the configuration class used to parameterize the strategy.
+ In order to leverage an implemented masking strategy, the `MaskingStrategy` subclass must be imported into the application runtime. Also, the `MaskingStrategy` class must define two class variables: `name`, which is the unique, registered name that [callers](#using-fidesops-as-a-masking-service) will use in their `"masking_strategy"."strategy"` field to invoke the strategy; and `configuration_model`, which references the configuration class used to parameterize the strategy.

Adam Sachs added 9 commits September 7, 2022 14:54
Collaborator

@galvana galvana left a comment

I like this approach! I also like how there's a distinction between the different strategy "types" instead of just one combined pool of strategies.

@adamsachs adamsachs merged commit f478a6a into main Sep 7, 2022
@adamsachs adamsachs deleted the 562-subclasses-builtin branch September 7, 2022 20:29
sanders41 pushed a commit that referenced this pull request Sep 22, 2022
* Instantiate strategies via abstract Strategy base class

A generalized Strategy abstract base class provides generalized getter methods
that instantiate strategy subclasses (implementations).
These methods rely on the builtin __subclasses__() method to identify Strategy subclasses,
which allows for more dynamic and extensible strategy implementation, removing the need
for a hardcoded enumeration of supported Strategy implementations.
Abstract strategy types inherit from this new abstract base class,
and strategy subclasses (implementations) must provide `name` and `configuration_model` attributes
that are leveraged by the new instantiation mechanism in the abstract base class.

* Update get_description() to be a class rather than static method

This allows the method to leverage the new `name` class variable rather than
relying on a static constant variable.

* Remove strategy factories and update references

Strategy factories are no longer needed with refactored Strategy getters.
Update the uses (references) of strategy factories throughout the codebase
to now rely on the new Strategy getters.
Strategy subclasses (implementations) now need to be imported explicitly
in __init__.py's because they used to be imported in factory modules.
Also remove the old MaskingStrategy registration/factory mechanisms.

* Remove strategy name constants

Now that the abstract Strategy base class enforces implementation subclasses
to have a `name` class attribute, this attribute should be relied upon rather than
the arbitrary name constants declared previously.
The get_strategy_name() abstract method is also superfluous, as the `name`
class attribute can be used as a standardized way to retrieve the strategy name.

* Remove get_configuration_model() abstract method

The generalized strategy getter now relies upon the `configuration_model`
class variable that's on each Strategy. Therefore we no longer need the
get_configuration_model() getter on each Strategy subclass.

* Update MaskingStrategy docs with new Strategy functionality

* Update changelog

* Improve recursion in _find_all_strategy_subclasses

* Fix recursion bug when finding all strategies

Update associated tests to make sure the recursion is properly tested

* Tweak conditional for falsy check

* Make get_strategies endpoint test more robust

* Fix typo in documentation

Co-authored-by: Adam Sachs <[email protected]>
Labels
Needs doc review run unsafe ci checks Triggers running of unsafe CI checks SaaS Connector The issue indicates development work for a specific SaaS application
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider refactoring factories for increased extensibility
5 participants