Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace pyodbc with pymssql #3435

Merged
merged 24 commits into from
Jun 16, 2023
Merged

Conversation

ThomasLaPiana
Copy link
Contributor

@ThomasLaPiana ThomasLaPiana commented Jun 2, 2023

Closes #3152

Note to reviewers: as I developed this on an x86 platform, I need help confirming from ARM users that everything works as expected!

Code Changes

  • add to requirements.txt
  • remove the dangerous-requirements.txt
  • update the Dockerfile to remove the platform-checking magic
  • update documentation
  • remove the logic from setup.py
  • fix how MS SQL test data is seeded

Steps to Confirm

  • on an ARM chip (Mac M-series), the nox -s build command works as expected (doesn't use emulation, chooses the right platform automatically)
  • on an ARM chip, connecting to an MSSQL server instance with pymssql works (can try running the ops integration tests locally)

Pre-Merge Checklist

  • All CI Pipelines Succeeded
  • Documentation:
    • documentation complete, updated in our dev docs
  • Issue Requirements are Met
  • Relevant Follow-Up Issues Created
  • Update CHANGELOG.md

Description Of Changes

This is an investigation into and a possible solution for replacing pyodbc with pymssql as our SQL Server adapter. It doesn't come with the same platform caveats as pyodbc and therefore would simplify and streamline our build/deploy logic.

This PR fully removes some of the platform-specific "magic" we've been doing, and simplifies the build session as well as the Dockerfile. We're no longer locked out of mssql connections on ARM chips

@ThomasLaPiana ThomasLaPiana self-assigned this Jun 2, 2023
@cypress
Copy link

cypress bot commented Jun 2, 2023

Passing run #2743 ↗︎

0 4 0 0 Flakiness 0
⚠️ You've recorded test results over your free plan limit.
Upgrade your plan to view test results.

Details:

Merge 5f28b54 into 45c0bca...
Project: fides Commit: cacbb82ba6 ℹ️
Status: Passed Duration: 00:52 💡
Started: Jun 16, 2023 6:50 AM Ended: Jun 16, 2023 6:51 AM

This comment has been generated by cypress-bot as a result of this project's GitHub integration settings.

@codecov
Copy link

codecov bot commented Jun 2, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.11 🎉

Comparison is base (ba4f5e4) 87.01% compared to head (9b9adf5) 87.13%.

❗ Current head 9b9adf5 differs from pull request most recent head 057db90. Consider uploading reports for the commit 057db90 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3435      +/-   ##
==========================================
+ Coverage   87.01%   87.13%   +0.11%     
==========================================
  Files         311      312       +1     
  Lines       18863    18762     -101     
  Branches     2402     2390      -12     
==========================================
- Hits        16414    16348      -66     
+ Misses       2025     1991      -34     
+ Partials      424      423       -1     
Impacted Files Coverage Δ
...nnection_configuration/connection_secrets_mssql.py 100.00% <ø> (ø)
src/fides/api/service/connectors/sql_connector.py 74.77% <ø> (+7.61%) ⬆️
src/fides/core/utils.py 88.33% <100.00%> (+0.09%) ⬆️

... and 43 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@ThomasLaPiana ThomasLaPiana added the run unsafe ci checks Runs fides-related CI checks that require sensitive credentials label Jun 2, 2023
@ThomasLaPiana
Copy link
Contributor Author

I've narrowed the issue down to autocommit. Our test setup relies on this as it creates/drops databases but it works slightly differently with pymssql. It'll take some fenagling but there is definitely a way forward here

@ThomasLaPiana
Copy link
Contributor Author

@SteveDMurphy I'm getting this very close....

@SteveDMurphy
Copy link
Contributor

@SteveDMurphy I'm getting this very close....

Excited to see you jumped into this !! More pressing on this has come recently from @mfbrown too 👍🏽

How do you feel about the idea of switching these after getting your hands dirty a bit?? I see both are still listed as supported by Microsoft, and the recent activity on the repo is nice but I'm a little concerned the recent commits haven't generated a release candidate just yet 🤔

@ThomasLaPiana
Copy link
Contributor Author

@SteveDMurphy I'm getting this very close....

Excited to see you jumped into this !! More pressing on this has come recently from @mfbrown too 👍🏽

How do you feel about the idea of switching these after getting your hands dirty a bit?? I see both are still listed as supported by Microsoft, and the recent activity on the repo is nice but I'm a little concerned the recent commits haven't generated a release candidate just yet 🤔

@SteveDMurphy as in, how do I feel about making the switch from pyodbc to pymssql after working on this PR? My feeling is that this is vastly superior. The only reasons I've run into any problem at all was because the connectors are slightly different for writing to the database, but for reading they're identical.

This completely mitigates the issues we saw here, as well as completely removes the need for the dangerous-requirements.txt which also means we can remove some of the platform logic from the docker commands.

Overall this one change leads to significant simplification across the whole codebase. Although there haven't been recent commits, database drivers in general should be slow-moving as they are a standard and shouldn't need to be updated much. As long as it allows us to read from SQL Server databases, which hopefully should "never" change, I think we're fine here.

@ThomasLaPiana
Copy link
Contributor Author

final two test failures appear to be caused by the new driver returning a different error...going to diagnose but everything else looks like its working

@ThomasLaPiana
Copy link
Contributor Author

@SteveDMurphy something else I want to test/confirm here is that with these changes, we no longer need the platform logic in the docker_nox.py file...we should now be able to let Docker figure that out automatically since we don't have a step that potentially breaks things and requires a certain platform...

I'm going to make that change and attempt to test it myself by forcing the build platform to arm64

@ThomasLaPiana
Copy link
Contributor Author

I'm hitting some bugs locally when trying to build as an arm64 platform, seems to specifically be around openssl. Adding apt dependencies to see if I can fix it...

@ThomasLaPiana ThomasLaPiana marked this pull request as ready for review June 5, 2023 11:43
@adamsachs
Copy link
Contributor

@ThomasLaPiana nox -s build succeeded and seemed to work well for me, so that hurdle is gone. however i was still getting errors connecting to mssql when running nox -s "pytest(ops-integration) such that those integration tests are failing for me. here's an example stack trace:

connection_config_mssql = <fides.api.models.connectionconfig.ConnectionConfig object at 0xffff8173b4c0>

    @pytest.fixture(scope="function")
    def mssql_integration_session_cls(connection_config_mssql):
        uri = MicrosoftSQLServerConnector(connection_config_mssql).build_uri()
>       engine = get_db_engine(database_uri=uri)

tests/fixtures/mssql_fixtures.py:66:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.10/site-packages/fides/api/db/session.py:34: in get_db_engine
    return create_engine(
/root/.local/lib/python3.10/site-packages/sqlalchemy/util/deprecations.py:309: in warned
    return fn(*args, **kwargs)
/root/.local/lib/python3.10/site-packages/sqlalchemy/engine/create.py:560: in create_engine
    dbapi = dialect_cls.dbapi(**dbapi_args)
/root/.local/lib/python3.10/site-packages/sqlalchemy/dialects/mssql/pymssql.py:81: in dbapi
    module = __import__("pymssql")
/root/.local/lib/python3.10/site-packages/pymssql/__init__.py:3: in <module>
    from ._pymssql import *
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   ImportError: libsybdb.so.5: cannot open shared object file: No such file or directory

src/pymssql/_pymssql.pyx:1: ImportError

haven't had a whole lot of time to look into it further but posting it now just to give an incremental update. curious what others with an M1 are seeing.

@SteveDMurphy
Copy link
Contributor

haven't had a whole lot of time to look into it further but posting it now just to give an incremental update. curious what others with an M1 are seeing.

Same errors from me, still haven't gotten in too deep but wanted to second what @adamsachs is seeing so far 👍🏽

@ThomasLaPiana
Copy link
Contributor Author

aha: pymssql/pymssql#533

@ThomasLaPiana
Copy link
Contributor Author

@adamsachs @SteveDMurphy can y'all try again? I added some more deps that might fix it...it's basically dependency whackamole at this point I think

@adamsachs
Copy link
Contributor

unfortunately still seem to be getting similar errors...

src/fides/api/service/connectors/base_connector.py:60: in client
    self.db_client = self.create_client()
src/fides/api/service/connectors/sql_connector.py:169: in create_client
    return create_engine(
<string>:2: in create_engine
    ???
/opt/fides/lib/python3.10/site-packages/sqlalchemy/util/deprecations.py:309: in warned
    return fn(*args, **kwargs)
/opt/fides/lib/python3.10/site-packages/sqlalchemy/engine/create.py:560: in create_engine
    dbapi = dialect_cls.dbapi(**dbapi_args)
/opt/fides/lib/python3.10/site-packages/sqlalchemy/dialects/mssql/pymssql.py:81: in dbapi
    module = __import__("pymssql")
/opt/fides/lib/python3.10/site-packages/pymssql/__init__.py:3: in <module>
    from ._pymssql import *
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   ImportError: libsybdb.so.5: cannot open shared object file: No such file or directory

src/pymssql/_pymssql.pyx:1: ImportError

@ThomasLaPiana
Copy link
Contributor Author

unfortunately still seem to be getting similar errors...

src/fides/api/service/connectors/base_connector.py:60: in client
    self.db_client = self.create_client()
src/fides/api/service/connectors/sql_connector.py:169: in create_client
    return create_engine(
<string>:2: in create_engine
    ???
/opt/fides/lib/python3.10/site-packages/sqlalchemy/util/deprecations.py:309: in warned
    return fn(*args, **kwargs)
/opt/fides/lib/python3.10/site-packages/sqlalchemy/engine/create.py:560: in create_engine
    dbapi = dialect_cls.dbapi(**dbapi_args)
/opt/fides/lib/python3.10/site-packages/sqlalchemy/dialects/mssql/pymssql.py:81: in dbapi
    module = __import__("pymssql")
/opt/fides/lib/python3.10/site-packages/pymssql/__init__.py:3: in <module>
    from ._pymssql import *
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   ImportError: libsybdb.so.5: cannot open shared object file: No such file or directory

src/pymssql/_pymssql.pyx:1: ImportError

Hmmm dang. Ok, the feedback loop is a little too slow to solve this efficiently, let's try to grab some time and pair on this sync so we can get this finalized or close it

@Kelsey-Ethyca
Copy link
Contributor

before merging this can we investigate the longevity of pymssql because it might be deprecated soon?

@ThomasLaPiana
Copy link
Contributor Author

ThomasLaPiana commented Jun 9, 2023

before merging this can we investigate the longevity of pymssql because it might be deprecated soon?

Where does it look like it will be deprecated soon? Or do you mean abandoned?

I just looked at the repo and mailing list, and while it isn't the most active project it isn't really "dead" in my opinion?

@Kelsey-Ethyca
Copy link
Contributor

I was wrong. In 2019 it was stated that pymssql would be discontinued but that was obviously reversed and active today as you stated. I was kind of hoping we could get some guarantee of continued support. Don't let me hold you up any longer pymssql/pymssql#668

Copy link
Contributor

@SteveDMurphy SteveDMurphy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working locally for me, 🤞🏽 that checks pass as well 🤞🏽

@ThomasLaPiana
Copy link
Contributor Author

ThomasLaPiana commented Jun 16, 2023

it looks like the merge super broke stuff 😢

Update: figured it out, new migration went into the old folder, moved it to the new one

@ThomasLaPiana ThomasLaPiana merged commit df3a624 into main Jun 16, 2023
@ThomasLaPiana ThomasLaPiana deleted the ThomasLaPiana-replace-pyodbc-pymssql branch June 16, 2023 07:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run unsafe ci checks Runs fides-related CI checks that require sensitive credentials
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate pymssql vs pyodbc for MSSQL Support
4 participants