Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MySQL Boolean autogenerate comparson to TINYINT #605

Closed
johaven opened this issue Oct 1, 2019 · 20 comments
Closed

MySQL Boolean autogenerate comparson to TINYINT #605

johaven opened this issue Oct 1, 2019 · 20 comments
Labels
autogenerate - detection autogenerate for enums a long term subject, tagging issues related to this bug Something isn't working data types mysql

Comments

@johaven
Copy link

johaven commented Oct 1, 2019

Alembic version: 1.2.1
SQL: MariaDB

Each upgrade (with compare_type enabled) duplicate constraints :

paid = db.Column(db.Boolean)
satisfaction = db.Column(db.Boolean)
  CONSTRAINT `CONSTRAINT_1` CHECK (`paid` in (0,1)),
  CONSTRAINT `CONSTRAINT_2` CHECK (`satisfaction` in (0,1)),
  CONSTRAINT `CONSTRAINT_3` CHECK (`paid` in (0,1)),
  CONSTRAINT `CONSTRAINT_4` CHECK (`satisfaction` in (0,1)),
  CONSTRAINT `CONSTRAINT_5` CHECK (`paid` in (0,1)),
  CONSTRAINT `CONSTRAINT_6` CHECK (`satisfaction` in (0,1)),
  CONSTRAINT `CONSTRAINT_7` CHECK (`paid` in (0,1)),
  CONSTRAINT `CONSTRAINT_8` CHECK (`satisfaction` in (0,1)),
  CONSTRAINT `CONSTRAINT_9` CHECK (`paid` in (0,1)),
  CONSTRAINT `CONSTRAINT_10` CHECK (`satisfaction` in (0,1)),
  CONSTRAINT `CONSTRAINT_11` CHECK (`paid` in (0,1)),
  CONSTRAINT `CONSTRAINT_12` CHECK (`satisfaction` in (0,1)),
  CONSTRAINT `CONSTRAINT_13` CHECK (`paid` in (0,1)),
  CONSTRAINT `CONSTRAINT_14` CHECK (`satisfaction` in (0,1)),
  CONSTRAINT `CONSTRAINT_15` CHECK (`paid` in (0,1)),
  CONSTRAINT `CONSTRAINT_16` CHECK (`satisfaction` in (0,1)),
  CONSTRAINT `CONSTRAINT_17` CHECK (`paid` in (0,1)),
  CONSTRAINT `CONSTRAINT_18` CHECK (`satisfaction` in (0,1))
@zzzeek zzzeek added question usage and API questions bug Something isn't working mysql and removed question usage and API questions labels Oct 1, 2019
@zzzeek
Copy link
Member

zzzeek commented Oct 1, 2019

this is because compare_type does not detect a MySQL TINYINT column as the same as Boolean, so you will need to adjust these migrations manually or use a custom type comparator.

there is then a second issue that when "changing" the type of the column, it does not drop the old constraint.

unfortunately there is no good way to automate these things and I have no means of fixing this issue in the near future.

@zzzeek zzzeek changed the title Constraints are duplicated MySQL Boolean autogenerate comparson to TINYINT; auto-constraints not dropped for MySQL change type Oct 1, 2019
@zzzeek zzzeek added data types autogenerate for enums a long term subject, tagging issues related to this labels Oct 1, 2019
@zzzeek
Copy link
Member

zzzeek commented Oct 1, 2019

this is not specifically "enums" but this is the same area of issue.

@johaven
Copy link
Author

johaven commented Oct 1, 2019

I do not understand totally because logically when doing the upgrade there is a conversion to the SQL language based on the type of database, no? Starting from this idea, it is possible to compare what has been generated in SQL and the structure of the table (for example via a "show create table"). I am wrong ?

@zzzeek
Copy link
Member

zzzeek commented Oct 1, 2019

MySQL has a datatype called "TINYINT". You can use this datatype directly in your schema, but it is not a SQL standard type; only MySQL has it. SQLAlchemy has a datatype called "Boolean" that generates any number of different datatypes on databases, depending on what capabilities they have. On SQL Server, it uses BIT; on Postgresql, BOOLEAN, on MySQL, TINYINT.

The thing is you can use TINYINT / BIT / etc. without them corresponding to SQLAlchemy's Boolean.

Alembic therefore does not, at the moment, make the assumption that if a metadata model has Boolean, and the target database has TINYINT, that this TINYINT is the same datatype.

This is 100% a bug, because even if the user did use TINYINT(display_width=1), and then changed their column to Boolean, that should not produce a change in datatype.

That part of this is not too difficult to fix but needs a comprehensive approach and tests throughout the library to fully release.

However, with all of that, the situation gets much worse. SQLAlchemy's Boolean and Enuim datatypes also generate CHECK constraints, which is a decision I would never have made today if I could have seen these issues coming. There's no autogenerate support for CHECK constraints either, and this would be very difficult to do generically. However, again in the case of Boolean / Enum, we should be taking a bit of a guess that there likely is a CHECK constraint on the table when these datatypes are used. However, that is also quite complicated because these CHECK constraints have no particular name (they are unnamed by default and as you can see MySQL just generates a name), and we'd have to basically search for them and parse their SQL text on a per-database basis in order to identify the ones that we think are relevant to our particular database column.

So we can solve your immediate issue here with the first part of this but there's a longer term problem in locating and finding these CHECK constraints, such as if you changed your database type from Boolean to INTEGER, we'd need to DROP that constraint.

@johaven
Copy link
Author

johaven commented Oct 1, 2019

I understand the difficulty now, thank you very much for the explanation.

The best solution (for now) would be to disable the creation of constraints when desired.
There is an option that allows this when declaring a field like Boolean (https://docs.sqlalchemy.org/en/13/core/type_basics.html#sqlalchemy.types.Boolean).

I tested it on my fields:

paid = db.Column(db.Boolean, create_constraint=False)
satisfaction = db.Column(db.Boolean, create_constraint=False)

But this does not work currently in alembic, the constraints are still created :(
Independently i have this error when i use the create_constraint option:

/../sqlalchemy/sql/base.py:299: SAWarning: Can't validate argument 'create_constraint'; can't locate any SQLAlchemy dialect named 'create'
  % (k, dialect_name)

@zzzeek
Copy link
Member

zzzeek commented Oct 1, 2019

that's an argument of boolean:

Boolean(create_constraint=False)

@johaven
Copy link
Author

johaven commented Oct 1, 2019

Sorry, i am tired 😅 , it works with correct initialization:

paid = db.Column(db.Boolean(create_constraint=False))
satisfaction = db.Column(db.Boolean(create_constraint=False))

@johaven
Copy link
Author

johaven commented Oct 1, 2019

that's an argument of boolean:

Boolean(create_constraint=False)

Faster than me :) thanks anyway !

@pbecotte
Copy link
Contributor

pbecotte commented Oct 7, 2019

I have been looking at this a bit. It kind of feels like the issue is with the adapt api in sqlalchemy. While we could put special cases directly in alembic to handle this, there are other places it could be useful. My thought was that we could use the native attributes like supports_native_boolean here. So the Boolean class would return Boolean if you check the affinity/adapt from Postgres, but would report an Integer affinity in Mysql where that variable is False.

I was thinking could move a bit of logic out to the type system- instead of type_api.adapt_type walking the Inheritance chain, have it call a method on the type that will do it instead. Maybe walk up the chain until the first type that has the method defined? Then for Enum, Bool, Sequence etc it could be overridden with the multiple possibilities.

Does it sound like that is a useful path to explore, or am I thinking about this all wrong?

@zzzeek
Copy link
Member

zzzeek commented Oct 7, 2019

I think it's a valid idea to explore having Alembic use the type adaptation system to do the compare, but as far as the "move type_api.adapt_type" part I think you need to look at the bigger dialect-level part of this that should already do what you're looking for, which would be the dialect.type_descriptor() method - that gives you the type which the dialect uses, given a generic type.

however, this won't work for the types we care about right now, e.g. boolean and enum, because there is no adaption used with these, the way Boolean becomes TINYINT is the MySQL compiler renders "BOOL" for it, and the way Enum becomes CHAR is that it just renders CHAR.

were you proposing a change on the SQLAlchemy side?

@pbecotte
Copy link
Contributor

pbecotte commented Oct 8, 2019

Yeah, my thought was to have the generic type come out as Integer instead of Boolean when the supports_native_boolean flag was false. Not at all sure how to make that happen- but wanted to check if it sounded reasonable before I dug in any further. If you think it's worth looking at, I'll explore a little more and see what the actual change would look like.

@zzzeek
Copy link
Member

zzzeek commented Oct 8, 2019

I think we'd have to rethink the approach on the SQLAlchemy side, since adaptation is only used to a limited extent, not really for DDL. A different architecture might not be a bad thing but I dont know if I have the appetite for that right now as that would be a huge change. The type comparison on the Alembic side is not really the biggest problem, a few hardcoded types is not the end of the world and it can be changed later. the bigger issue is the check constraint thing, and also the extra mechanics for Postgresql ENUMs. and all the tests for all of it.

@pbecotte
Copy link
Contributor

pbecotte commented Oct 8, 2019

Okay- I'll work on the alembic only solution. Is there a way to introspect WHICH of those integer columns (TINYINT, BIT, etc) is used on a particular dialect, or do I just need to stick a switch statement of some sort (or an attribute on the dialect?)

@zzzeek
Copy link
Member

zzzeek commented Oct 8, 2019

yeah...nothing too great, it's like visit_BOOLEAN -> BOOL in code like this: https://github.com/sqlalchemy/sqlalchemy/blob/master/lib/sqlalchemy/dialects/mysql/base.py#L2057 there's a class called TypeCompiler that delivers the string representation of the type for each dialect. For Enum->CHAR it's in the base, because there is no enum type: https://github.com/sqlalchemy/sqlalchemy/blob/master/lib/sqlalchemy/sql/compiler.py#L3398 then for mysql it switches at https://github.com/sqlalchemy/sqlalchemy/blob/master/lib/sqlalchemy/dialects/mysql/base.py#L2018 .

for table reflection, there's this whole "let's render the DDL for a type" but in reverse, it goes back to the more specific type, e.g. in this mapping: https://github.com/sqlalchemy/sqlalchemy/blob/master/lib/sqlalchemy/dialects/mysql/base.py#L1145 . this is what table reflection uses to get back the reflected types, in the case of MySQL BOOL we get TINYINT.

so you can see the disconnect, the python-> DDL and DDL->python mappings are not symmetrical in special cases like this. this particular version of the typing system came along in version 0.6 or so of sqlalchemy. prior to that it might have worked in more of a class-hierarchy / class-adaptation system kind of way, the TypeCompiler thing came in to provide more fluidity to the system and to get rid of lots of extra classes and such.

So there is, in fact, a way you could automate the comparison here. you could run the TypeCompiler and compare the strings:

>>> from sqlalchemy import Enum
>>> from sqlalchemy.dialects import mysql
>>> mysql.dialect().type_compiler.process(Enum('x', 'y', 'z', native_enum=False))
'VARCHAR(1)'
>>> mysql.dialect().type_compiler.process(mysql.VARCHAR(1))
'VARCHAR(1)'

>>> mysql.dialect().type_compiler.process(Enum('x', 'y', 'z'))
"ENUM('x','y','z')"
>>> mysql.dialect().type_compiler.process(mysql.ENUM('x', 'y', 'z'))
"ENUM('x','y','z')"

here it is doing a tricky one for Postgresql:

>>> from sqlalchemy.dialects import postgresql
>>> from sqlalchemy import DateTime
>>> postgresql.dialect().type_compiler.process(DateTime())
'TIMESTAMP WITHOUT TIME ZONE'
>>> postgresql.dialect().type_compiler.process(postgresql.TIMESTAMP())
'TIMESTAMP WITHOUT TIME ZONE'

still, the mysql impl needs to know that BOOL and TINYINT are synonymous:

>>> from sqlalchemy import Boolean
>>> mysql.dialect().type_compiler.process(Boolean())
'BOOL'
>>> mysql.dialect().type_compiler.process(mysql.TINYINT())
'TINYINT'

I would probably have a short translation dicttionary in the alembic MySQL impl class for things like "BOOL" -> "TINYINT".

this would also be a way that the type comparison could pick up on really small things like changes in length and things like that.

i haven't had breakfast yet and wondering if I'm overlooking something really obviously not-working about this, because it seems like it might solve most of the problem

@pbecotte
Copy link
Contributor

pbecotte commented Oct 8, 2019

Was looking at the compare_columns function in DefaultImpl. Something like this-

  def compare_type(self, inspector_column, metadata_column):
        # snip
        if conn_type._compare_type_affinity(metadata_impl):
            comparator = _type_comparators.get(conn_type._type_affinity, None)
            return comparator and comparator(metadata_impl, conn_type)

        return self.compare_non_native_types(inspector_column, metadata_column)

    def compare_non_native_types(self, inspector_column, metadata_column):
        comparator = self.non_native_comparator.get(metadata_column.type._type_affinity)
        if comparator:
            return comparator(inspector_column)
        return True

And then in MySqlImpl-

class MySQLImpl(DefaultImpl):
    __dialect__ = "mysql"

    transactional_ddl = False
    non_native_comparator = {
        sqltypes.Boolean: lambda col: not (isinstance(col.type, TINYINT) and col.type.display_width == 1),
    }

That code gets the simple tests I added passing, but... not familiar enough with the code base to know if that's a good or bad way of sticking that logic in there haha. Will also have to go through and look at the various database implementations to flesh out that mapping, but again wanted to run that through.

I also noticed that the current implementation does not show changes between TINYINT() and TINYINT(display_width=1) - is this intentional, or should this be something that we are hoping to update as well? Last question- I was thinking adding one more step there- check if ANY check constraint(1, 0) exists on that column- if it doesn't then return True so that it gets added. Would involve a more introspection queries though.

@zzzeek
Copy link
Member

zzzeek commented Oct 9, 2019

so sure, that looks like a start where we just have some rules for these specific types, that is fine. But I wonder if making a more dramatic leap and just comparing all types on the string representation like in my example above returned by the dialect would solve everything at once ? as far as comparing the elements inside the types, yes that is also something that "doesn't work" etiher right now. the string comparison approach would likely help with that also but might also produce false positives.

@pbecotte
Copy link
Contributor

pbecotte commented Nov 8, 2019

I put in a first draft PR for this. As you guessed, using the type compiler gets you 90% of the way there- the only things left to handle are default precisions and the couple weird cases we already mentioned. All feedback welcome (am going to look at the constraint issue as well - have some ideas)

@sqla-tester
Copy link
Collaborator

Paul Becotte has proposed a fix for this issue in the master branch:

Update the type comparison code used for schema autogeneration. Compare https://gerrit.sqlalchemy.org/1561

@zzzeek zzzeek changed the title MySQL Boolean autogenerate comparson to TINYINT; auto-constraints not dropped for MySQL change type MySQL Boolean autogenerate comparson to TINYINT Feb 4, 2020
@zzzeek
Copy link
Member

zzzeek commented Feb 4, 2020

split out the CHECK part of this, which I'm fairly pessimistic anything is going to really happen for a long time, into #652.

@sqla-tester
Copy link
Collaborator

Paul Becotte has proposed a fix for this issue in the master branch:

Update the type comparison code used for schema autogeneration. Compare https://gerrit.sqlalchemy.org/1561

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
autogenerate - detection autogenerate for enums a long term subject, tagging issues related to this bug Something isn't working data types mysql
Projects
None yet
Development

No branches or pull requests

4 participants