From dc2eda7abe6022044c7faa7d641443407751f180 Mon Sep 17 00:00:00 2001 From: Armin Ronacher Date: Thu, 18 Aug 2022 18:47:34 +0200 Subject: [PATCH 1/4] Import Reorg Plan --- text/0004-import-reorg.md | 120 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 120 insertions(+) create mode 100644 text/0004-import-reorg.md diff --git a/text/0004-import-reorg.md b/text/0004-import-reorg.md new file mode 100644 index 00000000..9a44e31d --- /dev/null +++ b/text/0004-import-reorg.md @@ -0,0 +1,120 @@ +* Start Date: 2022-08-18 +* RFC Type: decision +* RFC PR: https://github.com/getsentry/rfcs/pull/4 + +# Summary + +This RFC proposes a preliminary refactoring of the Sentry monolith in order +to support later compartmentalization into services by eliminating mass +re-exports in `sentry.app` and `sentry.models`. The goal is to make the +dependency tree easier to understand in order to break it into separate +services by making modules the boundary of our dependencies. + +In other words we want to be able to say that if module A depends on module B +it depends on the entirey of it. Today this is violated by a `sentry.app` +and `sentry.models`. + +# Motivation + +Sentry's CI and even local testing times are quickly increasing through the +code size in Sentry. We run pretty much all tests at all times as we are not +able to determine which change is going to require which code to run. The +solution in parts to this will likely be to compartmentalize Sentry into smaller +services. Today drawing these service boundaries however is relatively tricky +because a lot of code within Sentry is deeply intertwined. + +This also in parts shows up in import times. To run a single test file for a +utils module (`test_rust.py`) pytest will spend 0.2 seconds in test execution vs +2 seconds in import time. While imports are so far acceptable, the bigger issue +caused by it is that it increases the total amount of surface that we need to +consider for test execution. + +The main motivator however is the inability to draw boundaries within Sentry +today. For instance today `sentry.http` pulls in `sentry.models` (to import +`EventError` to access a constant) which pulls in _all_ database models. +`sentry.http` itself is needed by `sentry.utils.pytest.sentry` and some other +places. Because all models are imported, not just the model declarations are +imported but a lot of the application. For instance via the +`sentry.models.identity` the `sentry.analytics` system is pulled. +`sentry.models.integrations` will pull in the entire integration platform code +including `sentry.pipeline` which drags in the entire incidents system, API +helpers and more. `sentry.models.organizationmember` pulls in `sentry.app` +which then pulls in `tsdb`, `buffer`, `nodestore` etc. + +While it's undoubtedly true that today many of these imports will happen anyways +as we are globally configuring sentry in the tests, getting imports under control +will let us slowly break the Sentry monolith into distinct services in an easier +and more controlled manner. + +Making these imports however explicit enables us to better understand the real +dependencies through imports. Today we cannot use import tracking to see the +real dependencies because they are obfuscated through the mass re-exports. + +# Background + +This proposal came out of the desire to attempt to isolate the processing +pipeline out of the majority of the Sentry codebase. The end goal for that is +to be able to perform important changes to the event processing pipeline in +isolation of the rest of the code base to reduce the time spent in CI for +important changes to it. + +As such all code related to the processing pipeline should be moved into a clear +structure and have a largely independent test setup (think moving all of processing +related logic to `sentry.services.processing` or `sentry_processing` for better +enforcability). + +# Options Considered + +the proposal is to require developers to import models from the declaring model +instead of the re-import. that means rather than to import +`from sentry.models import user` the developer is required to import from +`sentry.models.user` instead. This has a few benefits: + +1. People are less likely to accidentally use imports out of the `sentry.models` + module that exist today but were unintentional. As an example we have seen + users of `from sentry.models import Any` because vscode adds auto imports from + the first seen module and it happens that we accidentally re-export the `Any` + type from `sentry.models` rather than `typing`. +2. It becomes easier to understand what is declared where when not using IDEs. + In particular some of the constants which are currently imported from the + `sentry.models` module can be hard to pinpoint to (eg: where is `sentry.models.ONE_DAY` + coming from?) +3. We can start enforcing isolation on the module level to enable + compartmentalization with lints. + +# Drawbacks + +The drawback of this change is that imports become more verbose: + +Before: + +```python +from sentry.models import Integration, OrganizationIntegration, Organization, \ + OrganizationMember, User +``` + +After: + +```python +from sentry.models.integrations import Integration, OrganizationIntegration +from sentry.models.organization import Organization +from sentry.models.organizationmember import OrganizationMember +from sentry.models.user import User +``` + +# Outside of Scope / Unresolved Questions + +This RFC does not yet set out a plan for the actual comparmentalization. The goal +is to start enabling the ability to use the module boundary to track and visualize +dependencies. + +Out of scope is also the eliminiation of further star re-exports. The majority of +other star re-exports we have are somewhat contained within modules which are already +largely self contained modules. As an example `sentry.identity` has a lot of star +exports from the different identity modules (`github`, `slack`, `google` etc.). As +we are likely going to consider this to be a self contained piece of code there is +only limited benefit of changing this. + +However poventially we want to be more specific about re-exporting in modules by +being explicit about what is being re-exported to make code discovery easier and to +avoid accidentaly mis-imports such as pulling in types from `sentry.models`. From 472a0a4f531803a1d98cd176a1c4a0f313421424 Mon Sep 17 00:00:00 2001 From: Armin Ronacher Date: Thu, 18 Aug 2022 18:52:43 +0200 Subject: [PATCH 2/4] Added rollout plan --- text/0004-import-reorg.md | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/text/0004-import-reorg.md b/text/0004-import-reorg.md index 9a44e31d..8e008925 100644 --- a/text/0004-import-reorg.md +++ b/text/0004-import-reorg.md @@ -82,6 +82,18 @@ instead of the re-import. that means rather than to import 3. We can start enforcing isolation on the module level to enable compartmentalization with lints. +# Rollout Plan + +The implementation of the migration path is a multi stage process: + +1. fully canonicalize all the imports from `sentry.app` and `sentry.models` + in `getsentry` and prevent the introduction of future through a lint. After + this point all changes can be seen locally to `sentry`. +2. fully canonicalize all the imports from `sentry.app` in `sentry`. +3. remove the re-exports in `sentry.app`. +4. gradually canonicalize the imports of models from the `sentry` codebase. +5. eliminate the re-exports of models in `sentry.models`. + # Drawbacks The drawback of this change is that imports become more verbose: From 5379131eeec1a09b7a186ff5987c3bfac60172fb Mon Sep 17 00:00:00 2001 From: Armin Ronacher Date: Thu, 18 Aug 2022 20:15:24 +0200 Subject: [PATCH 3/4] Update text/0004-import-reorg.md Co-authored-by: anthony sottile <103459774+asottile-sentry@users.noreply.github.com> --- text/0004-import-reorg.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0004-import-reorg.md b/text/0004-import-reorg.md index 8e008925..88225b87 100644 --- a/text/0004-import-reorg.md +++ b/text/0004-import-reorg.md @@ -67,8 +67,8 @@ enforcability). the proposal is to require developers to import models from the declaring model instead of the re-import. that means rather than to import -`from sentry.models import user` the developer is required to import from -`sentry.models.user` instead. This has a few benefits: +`from sentry.models import User` the developer is required to import +`from sentry.models.user import User` instead. This has a few benefits: 1. People are less likely to accidentally use imports out of the `sentry.models` module that exist today but were unintentional. As an example we have seen From 62274d1b8385fea4f6a99983776b9cfa6cae5901 Mon Sep 17 00:00:00 2001 From: Armin Ronacher Date: Mon, 22 Aug 2022 21:58:11 +0200 Subject: [PATCH 4/4] Add lazy imports into plan for shell usage --- text/0004-import-reorg.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/text/0004-import-reorg.md b/text/0004-import-reorg.md index 88225b87..492c6d51 100644 --- a/text/0004-import-reorg.md +++ b/text/0004-import-reorg.md @@ -93,6 +93,8 @@ The implementation of the migration path is a multi stage process: 3. remove the re-exports in `sentry.app`. 4. gradually canonicalize the imports of models from the `sentry` codebase. 5. eliminate the re-exports of models in `sentry.models`. +6. introduce a lazy-import utility into `sentry.models` for exclusive use in the + Sentry shell. # Drawbacks