
Autogenerate renders TypeDecorator instance instead of underlying impl type #1386

Open
saifelse opened this issue Jan 5, 2024 · 7 comments
Labels
autogenerate - rendering · cookbook requested · use case (not quite a feature and not quite a bug, something we just didn't think of)

Comments

@saifelse
Contributor

saifelse commented Jan 5, 2024

Describe the bug

This isn't a bug per se, but a small improvement for autogenerate when using TypeDecorator.

When a TypeDecorator is used in a column definition, e.g.:

"""
File: app/models/foo.py
"""
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.types import TypeDecorator
...

class JSONBData(TypeDecorator):
   impl = JSONB

foo = Table("foo", MetaData(), Column("data", JSONBData))

The auto-generated migration ends up referencing the TypeDecorator:

op.add_column("foo", sa.Column("data", app.models.foo.JSONBData(), nullable=True))

which is annoying for two reasons:

  1. The import is not automatically rendered.
  2. The migration file has an unnecessary dependency on app: if app/models/foo.py is refactored, we may need to update this migration file... which could have been avoided if, instead of rendering app.models.foo.JSONBData, alembic directly rendered the underlying impl, postgresql.JSONB.

I'm not sure if there are any scenarios where it is actually preferable to have the TypeDecorator in the autogenerated file. If there are use cases for it, would it be sensible to make this a config option instead of unconditional?

Expected behavior
Ideally, we'd generate the same thing as when foo = Table("foo", MetaData(), Column("data", JSONB)) is provided, i.e.:

op.add_column("foo", sa.Column("data", postgresql.JSONB(astext_type=Text()), nullable=True))

To Reproduce

Test case:

diff --git a/tests/test_autogen_render.py b/tests/test_autogen_render.py
index eeeb92e..9755869 100644
--- a/tests/test_autogen_render.py
+++ b/tests/test_autogen_render.py
@@ -33,6 +33,7 @@ from sqlalchemy.sql import false
 from sqlalchemy.sql import literal_column
 from sqlalchemy.sql import table
 from sqlalchemy.types import TIMESTAMP
+from sqlalchemy.types import TypeDecorator
 from sqlalchemy.types import UserDefinedType
 
 from alembic import autogenerate
@@ -1078,6 +1079,21 @@ class AutogenRenderTest(TestBase):
             "server_default='5', nullable=True))",
         )
 
+    def test_render_add_column_type_decorator(self):
+        self.autogen_context.opts["user_module_prefix"] = None
+
+        class MyType(TypeDecorator):
+            impl = Integer
+
+        op_obj = ops.AddColumnOp(
+            "foo", Column("x", MyType, server_default="5")
+        )
+        eq_ignore_whitespace(
+            autogenerate.render_op_text(self.autogen_context, op_obj),
+            "op.add_column('foo', sa.Column('x', sa.Integer(), "
+            "server_default='5', nullable=True))",
+        )
+
     @testing.emits_warning("Can't validate argument ")
     def test_render_add_column_custom_kwarg(self):
         col = Column(

Error

When running tox -e py-sqlalchemy -- tests/test_autogen_render.py with the above patch:

  File "/Users/saif/contrib/alembic/tests/test_autogen_render.py", line 1091, in test_render_add_column_type_decorator
    eq_ignore_whitespace(
  File "/Users/saif/contrib/alembic/alembic/testing/assertions.py", line 111, in eq_ignore_whitespace
    assert a == b, msg or "%r != %r" % (a, b)
           ^^^^^^
AssertionError: "op.add_column('foo', sa.Column('x', tests.test_autogen_render.MyType(), server_default='5', nullable=True))" != "op.add_column('foo', sa.Column('x', sa.Integer(), server_default='5', nullable=True))"

Versions.

  • OS: macOS 14.1.2
  • Python: 3.11.6
  • Alembic: 1.13.1
  • SQLAlchemy: 1.3.24 / 2.0.23
  • Database: Postgres
  • DBAPI: psycopg2

Have a nice day!

saifelse added the requires triage (New issue that requires categorization) label on Jan 5, 2024
saifelse added a commit to benchling/alembic that referenced this issue Jan 5, 2024
@CaselIT
Member

CaselIT commented Jan 5, 2024

Hi

> I'm not sure if there are any scenarios where it is actually preferable to have the TypeDecorator in the autogenerated file. If there are use cases for it, would it be sensible to make this a config option instead of unconditional?

Well, a type decorator can render a different type depending on the dialect, so this change, if accepted, would need to be made a config option. For example: https://docs.sqlalchemy.org/en/20/core/custom_types.html#backend-agnostic-guid-type

Also, at the moment alembic supports customization of what type is rendered, documented here: https://alembic.sqlalchemy.org/en/latest/autogenerate.html#affecting-the-rendering-of-types-themselves

Maybe an extension to that documentation showing an example function that renders the impl of the type decorator could be enough?
@zzzeek opinion on this?

CaselIT added the autogenerate - rendering and use case (not quite a feature and not quite a bug, something we just didn't think of) labels and removed the requires triage (New issue that requires categorization) label on Jan 5, 2024
@zzzeek
Member

zzzeek commented Jan 5, 2024

"""
File: app/models/foo.py
"""
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.types import TypeDecorator
...

class JSONBData(TypeDecorator):
impl = JSONB

foo = Table("foo", MetaData(), Column("data", JSONBData))


The auto-generated migration ends up referencing the TypeDecorator:

```python
op.add_column("foo", sa.Column("data", app.models.foo.JSONBData(), nullable=True))

which is annoying for two reasons:

1. The import is not automatically rendered.

We have a configuration path for this, which is to add your application's standard type import paths to the script.py.mako template. There are some other ways to add imports dynamically as well, but they are not as easy to document.
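
For illustration, a minimal sketch of that approach, based on the stock script.py.mako (the app import is the hand-added, project-specific line; the module path is just the one from this report):

# script.py.mako (excerpt) -- the op/sa imports and the ${imports} placeholder
# are already in the stock template
from alembic import op
import sqlalchemy as sa
import app.models.foo  # hand-added so rendered app.models.foo.JSONBData() resolves
${imports if imports else ""}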

> 2. The migration file has an unnecessary dependency on `app`: if app/models/foo.py is refactored, we may need to update this migration file... which could have been avoided if, instead of rendering `app.models.foo.JSONBData`, alembic directly rendered the underlying impl, `postgresql.JSONB`.

If you are refactoring your application to move where a particular symbol is imported from, you necessarily have to change all the places that reference it, so I don't see why an alembic migration file is any different from any other file that refers to it.

> I'm not sure if there are any scenarios where it is actually preferable to have the TypeDecorator in the autogenerated file.

There is: the op.create_table() directive actually returns the Table object it just created, so you can then use it for bulk_insert(), where you definitely want your TypeDecorator to be in effect.
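
A minimal sketch of that pattern, reusing the JSONBData type from the original report (the table layout and seed row are hypothetical):

from alembic import op
import sqlalchemy as sa

import app.models.foo

def upgrade():
    # create_table() returns the Table object it just created...
    foo = op.create_table(
        "foo",
        sa.Column("id", sa.Integer(), primary_key=True),
        sa.Column("data", app.models.foo.JSONBData(), nullable=True),
    )
    # ...so bulk_insert() can use it directly; JSONBData's bind processing
    # applies to the seeded row, which is why keeping the TypeDecorator in
    # the rendered migration can be desirable.
    op.bulk_insert(foo, [{"id": 1, "data": {"seeded": True}}])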

> If there are use cases for it, would it be sensible to make this a config option instead of unconditional?

There already is! Make yourself a render_item callable that does this for all TypeDecorators; I will gladly accept an addition to the cookbook for this:

def render_item(type_, obj, autogen_context):
    """Apply custom rendering for selected items."""

    if type_ == 'type' and isinstance(obj, TypeDecorator):
        return f"sa.{obj.impl!r}"

    # default rendering for other objects
    return False

def run_migrations_online():
    # ...

    context.configure(
        connection=connection,
        target_metadata=target_metadata,
        render_item=render_item,
        # ...
    )

    # ...

@saifelse
Contributor Author

saifelse commented Jan 5, 2024

Thanks @CaselIT and @zzzeek for the feedback!

For some additional context on where I'm coming from: I've already implemented a monkeypatch that applies a change similar to the one I've proposed, and we've been using it for a couple of years now, so for the most part developers get correctly generated migrations... but at the cost of someone maintaining ugly monkeypatch code in sync with alembic's changelog. I was hoping to fold this into the alembic source code if it made sense. It looks like there are already cases where folks want to preserve their type decorators in the migration, so finding the right configuration, instead of monkeypatching or broadly making the change, seems sensible to me.

The monkeypatch, if you're curious:
from alembic import context
from alembic.autogenerate.render import _repr_type
import sqlalchemy.types as types

migration_context = context.get_context()
old_render_type = migration_context.impl.render_type

def new_render_type(type_, autogen_context):
    # If we are dealing with a custom TypeDecorator, run _repr_type on the underlying implementation.
    if isinstance(type_, types.TypeDecorator) and type(type_).__module__.startswith("app."):
        return _repr_type(type_.impl, autogen_context)
    # Otherwise, do the default behavior.
    return old_render_type(type_, autogen_context)

migration_context.impl.render_type = new_render_type

But in the interest of properly communicating why this sort of recipe is useful, I'll respond to some of the things you both have mentioned 😇.

> If you are refactoring your application to move where a particular symbol is imported from, you necessarily have to change all the places that reference it, so I don't see why an alembic migration file is any different from any other file that refers to it.

That's generally true... for our project, we've decided to strictly enforce that migrations are disjoint from app, which has some pretty big benefits:

  • Easier maintenance: a migration, at the end of the day, applies changes to our database using a DBAPI, and we typically autogenerate it from our constantly-iterated-upon application's ORM models. By avoiding importing our application code in our migrations, we end up with a growing collection of past migrations that are self-contained scripts: they can be referenced by developers writing new migrations, with minimal maintenance cost, since they do not import our constantly evolving app.
  • Lower likelihood of bugs: there's also less possibility of accidentally refactoring a migration to do something different. We manage 100+ databases that are not released to on the same cadence, so not refactoring our migrations needlessly is one additional measure to keep our databases in sync.
  • Performance: our app is monolithic, so importing it comes with a good deal of baggage (cost of runtime imports, installing all dependencies used by the app). As a migration is effectively a self-contained set of DBAPI instructions, we can keep our migrations package lightweight dependency-wise, and can easily dockerize and run our migrations without bundling them with our monolithic application.

There are many ways to combat the above issues, which I concur are not universal issues, but we found this decoupling to work for us.

> Well, a type decorator can render a different type depending on the dialect, so this change, if accepted, would need to be made a config option.

FWIW, my proposed fix (#1387) happens after a dialect has attempted to render the type, so I think the mentioned use case still works; at least, no test broke where I put my logic.

> We have a configuration path for this, which is to add your application's standard type import paths to the script.py.mako template. There are some other ways to add imports dynamically as well, but they are not as easy to document.

Good to know. In my monolithic application, we have many places where a type decorator could live, so a standard set of imports isn't totally doable, and the overhead of custom code for adding dynamic imports sounds like a heavier burden on the team; it's also not allowed by my team's "migrations do not depend on app" requirement.

> There already is! Make yourself a render_item callable that does this for all TypeDecorators; I will gladly accept an addition to the cookbook for this:

Nice! I'll try this out, and attempt to write up a cookbook recipe.

@saifelse
Contributor Author

saifelse commented Jan 5, 2024

re: render_item, one thing to note is that with my proposed implementation, when dealing with the impl = JSONB case, we get the from sqlalchemy.dialects import postgresql import for free since it uses the existing machinery, whereas using render_item, I think I'll have to duplicate some of the logic from

    if mod.startswith("sqlalchemy.dialects"):
        match = re.match(r"sqlalchemy\.dialects\.(\w+)", mod)
        assert match is not None
        dname = match.group(1)
        if imports is not None:
            imports.add("from sqlalchemy.dialects import %s" % dname)
        if impl_rt:
            return impl_rt
        else:
            return "%s.%r" % (dname, type_)
    elif impl_rt:
        return impl_rt
    elif not _skip_variants and sqla_compat._type_has_variants(type_):
        return _render_Variant_type(type_, autogen_context)
    elif mod.startswith("sqlalchemy."):
        if "_render_%s_type" % type_.__visit_name__ in globals():
            fn = globals()["_render_%s_type" % type_.__visit_name__]
            return fn(type_, autogen_context)
        else:
            prefix = _sqlalchemy_autogenerate_prefix(autogen_context)
            return "%s%r" % (prefix, type_)
, which isn't the end of the world, just more complexity to maintain in userland.

One way of implementing render_item would be to do something very similar to the monkeypatch I shared above, where instead of adding our custom logic, we directly invoke alembic.autogenerate.render._repr_type. As this is currently a private function, the recipe would be fairly brittle unless _repr_type were made "public" and changes to its signature were called out as a minor/major breaking change?
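
For the record, a sketch of what a self-contained render_item could look like if it duplicated just the dialect-import part of that logic instead of calling the private _repr_type (this ignores variants and the other corner cases _repr_type handles, so treat it as an approximation):

import re

from sqlalchemy.types import TypeDecorator

def render_item(type_, obj, autogen_context):
    """Render a TypeDecorator's underlying impl instead of the decorator."""
    if type_ == "type" and isinstance(obj, TypeDecorator):
        impl = obj.impl  # instance of the underlying implementation type
        mod = type(impl).__module__
        match = re.match(r"sqlalchemy\.dialects\.(\w+)", mod)
        if match:
            # dialect-specific impl, e.g. postgresql.JSONB: make sure the
            # dialect import lands in the generated migration
            dname = match.group(1)
            autogen_context.imports.add(f"from sqlalchemy.dialects import {dname}")
            return f"{dname}.{impl!r}"
        # plain SQLAlchemy impl, e.g. sa.Integer()
        return f"sa.{impl!r}"
    # default rendering for everything else
    return False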

@zzzeek
Member

zzzeek commented Jan 5, 2024

This is the thing with recipes. Think of real-world use: you have an app, you're using maybe two or three database-specific types, and you know you're using PostgreSQL.

Realistically, I don't think having the TypeDecorator render is any kind of issue. I don't really feel "maintenance" or "bugs" are impacted very much at all by this rendering, and certainly not "performance": database migrations can certainly afford an extra 0.3 seconds of startup to import your application's utility module. But as stated, there's the render_item() recipe and you can do whatever you want.

So re: render_item's "automatic logic with imports", again, real world: your app is hardcoded to PostgreSQL, so OK, add "from sqlalchemy.dialects import postgresql" to your script.py.mako template. That's how 99% of people would solve this, and it's not a problem. But if you want to be more fancy, you can do that also; the imports collection is passed to the hook via the autogen_context, see the next paragraph at https://alembic.sqlalchemy.org/en/latest/autogenerate.html#affecting-the-rendering-of-types-themselves :

def render_item(type_, obj, autogen_context):
    """Apply custom rendering for selected items."""

    if type_ == 'type' and isinstance(obj, MySpecialType):
        # add import for this type
        autogen_context.imports.add("from mymodel import types")
        return "types.%r" % obj

    # default rendering for other objects
    return False

I think, overall, the vibe I want to impart is that this is a simple thing Alembic does, and it allows a hook to customize it. We want to keep things simple, not overthink, not qualify all of Alembic's API with N^18 flags and switches: just imperative code where needed.

@lachaib
Contributor

lachaib commented Sep 13, 2024

Hello,
Bringing my 2 cents here: it's really bothersome for many developers who are generally not proficient with alembic, even with recipes (which TBH are most of the time not well followed), to have lots of extra scripts they won't understand or be able to maintain.
Besides, some developers may use type decorators without being aware of it (for example, some types from sqlalchemy-utils).

I really think there is a possibility to ease their lives, even just slightly.

For instance, I took the original test case, and instead of expecting the TypeDecorator's impl in the rendering (which I understand may be complex, since some type decorators may be dialect-dependent), I changed the expectation to add the import automatically.
It took just one line to add:

    else:
        prefix = _user_autogenerate_prefix(autogen_context, type_)
+        autogen_context.imports.add(prefix.rstrip("."))
        return "%s%r" % (prefix, type_)

This single line of code could lighten the load on hundreds of repositories that would otherwise have to copy your recipe above.

The good thing is that this imports attribute is a set, so having alembic pre-fill it would remain compatible with those who also populate it manually; but for new projects, or people reading the release notes carefully, that'll be less code in their repositories 😁
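
For concreteness, a rough sketch of what the autogenerated output would then look like, assuming the change above were accepted and using the JSONBData example from the top of the issue (whether the __module__ path is actually importable is a separate question):

# hypothetical autogenerated migration excerpt
from alembic import op
import sqlalchemy as sa
import app.models.foo  # <- added automatically via autogen_context.imports

def upgrade():
    op.add_column("foo", sa.Column("data", app.models.foo.JSONBData(), nullable=True))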

How would you feel about this trade-off?

@zzzeek
Member

zzzeek commented Sep 18, 2024

>     else:
>         prefix = _user_autogenerate_prefix(autogen_context, type_)
> +        autogen_context.imports.add(prefix.rstrip("."))
>         return "%s%r" % (prefix, type_)

> This single line of code could lighten the load on hundreds of repositories that would otherwise have to copy your recipe above.

> The good thing is that this imports attribute is a set, so having alembic pre-fill it would remain compatible with those who also populate it manually; but for new projects, or people reading the release notes carefully, that'll be less code in their repositories 😁

we're talking about a single line of code in their script.py.mako file or a similar line in their env.py file.

> How would you feel about this trade-off?

We can't assume that a user-defined type is importable based on what it says in the __module__ attribute: the user-defined type could be inside of a class definition, it could be generated on the fly from a function, and overall people usually have specific conventions they want to use for importing their own library code into source files. Hardcoding it here creates a bigger problem for all of these cases, where something "magical and wrong" is happening.
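
A small illustration of that point (module path and names here are made up; the printed values depend on where the code actually lives):

from sqlalchemy.types import Integer, TypeDecorator

def make_type():
    # a type generated on the fly: its __module__ / __qualname__ do not
    # correspond to anything importable as "module.Name"
    class GeneratedType(TypeDecorator):
        impl = Integer

    return GeneratedType

T = make_type()
print(T.__module__, T.__qualname__)
# e.g. "app.factories make_type.<locals>.GeneratedType": an auto-added
# "import app.factories" would not make app.factories.GeneratedType resolvable,
# so an import derived from __module__ can still yield a broken migration file.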

As mentioned above, we have a prominent documentation section which explains why Alembic works the way it does, as well as exactly what someone needs to do when they are using non-SQLAlchemy types. SQLAlchemy and Alembic both subscribe to a philosophy of zero assumptions about a user's environment, and it's assumed that users will be reading the documentation.
