This repository has been archived by the owner on Nov 30, 2022. It is now read-only.

[#557] MSSQL discovery script #581

Merged
merged 9 commits into from
Jun 3, 2022

Conversation

seanpreston
Contributor

@seanpreston seanpreston commented May 31, 2022

Purpose

This PR adds a script to explore the schemas of databases within an MSSQL master instance.

Checklist

  • Update CHANGELOG.md file
    • Merge in main so the most recent CHANGELOG.md file is being appended to
    • Add description within the Unreleased section in an appropriate category. Add a new category from the list at the top of the file if the needed one isn't already there.
    • Add a link to this PR at the end of the description with the PR number as the text. example: #1
  • Applicable documentation updated (guides, quickstart, postman collections, tutorial, fidesdemo, database diagram).
  • If docs updated (select one):
    • documentation complete, or draft/outline provided (tag docs-team to complete/review on this branch)
    • documentation issue created (tag docs-team to complete issue separately)
  • Good unit test/integration test coverage
  • This PR contains a DB migration. If checked, the reviewer should confirm with the author that the down_revision correctly references the previous migration before merging
  • The Run Unsafe PR Checks label has been applied, and checks have passed, if this PR touches any external services

Ticket

Fixes #557

@@ -0,0 +1,70 @@
import sqlalchemy

MASTER_MSSQL_URL = ""
Contributor Author

This is intentionally blank, so that anyone using this script can add in creds before runtime

db_name = db_name[0]
try:
    columns = engine.execute(
        f"SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE FROM {db_name}.INFORMATION_SCHEMA.COLUMNS;"
Contributor

I can’t think of any way this would be set outside this function, but if it were, it seems vulnerable to SQL injection.

I don’t see a way to get to it from the outside, so I’m probably being overly paranoid.
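
For illustration, here is a minimal sketch of one way to harden the interpolation, assuming the database name is only ever used as a T-SQL identifier. `quote_mssql_identifier` and `build_columns_query` are hypothetical helpers, not part of the script in this PR. The sketch relies on T-SQL bracket-delimited identifiers, where a closing bracket inside the name is escaped by doubling it.

```python
def quote_mssql_identifier(name: str) -> str:
    """Wrap a name in T-SQL bracket delimiters, doubling any closing bracket.

    Hypothetical helper: rejects empty names and names longer than 128
    characters, the maximum length of a SQL Server identifier.
    """
    if not name or len(name) > 128:
        raise ValueError(f"invalid identifier: {name!r}")
    return "[" + name.replace("]", "]]") + "]"


def build_columns_query(db_name: str) -> str:
    """Build the column-discovery query with the database name quoted."""
    return (
        "SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE "
        f"FROM {quote_mssql_identifier(db_name)}.INFORMATION_SCHEMA.COLUMNS;"
    )


if __name__ == "__main__":
    # A name containing "]" cannot break out of the delimiters.
    print(build_columns_query("sales]; DROP TABLE x; --"))
```

Since the database names here presumably come from the server itself, this is defence in depth rather than a fix for a known exploit.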

flagged_datatypes.add(data_type)
flagged_columns.append(f"{table}.{column}: {data_type}")

print(f"{len(set(all_columns))} columns found")
Contributor

If you want a count of all columns why use a set? This would be all unique columns rather than all columns.

Contributor Author

This was to de-dupe the list, since all columns should be unique.

Contributor

@sanders41 sanders41 Jun 1, 2022

They might not be unique across tables. What if there is table1.unsupported_field and table2.unsupported_field? Then won't the count be 1, but really there are 2 from two different tables?
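
A small sketch of the scenario described above, using made-up rows (the table names, column names, and data types are illustrative only and do not come from the script). It compares de-duplicating on the bare column name with de-duplicating on the (table, column) pair.

```python
# Hypothetical rows shaped like (TABLE_NAME, COLUMN_NAME, DATA_TYPE).
rows = [
    ("table1", "unsupported_field", "sql_variant"),
    ("table2", "unsupported_field", "sql_variant"),
    ("table1", "id", "int"),
]

# De-duplicating on the column name alone merges columns from different tables.
by_name = {column for _table, column, _data_type in rows}

# Keying on (table, column) keeps one entry per actual column.
by_table_and_column = {(table, column) for table, column, _data_type in rows}

print(len(by_name))              # 2
print(len(by_table_and_column))  # 3
```

If the stored strings already include the table name, as the `f"{table}.{column}: {data_type}"` entries do, the set only removes true duplicates and the count is unaffected.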

flagged_columns.append(f"{table}.{column}: {data_type}")

print(f"{len(set(all_columns))} columns found")
print(f"{len(set(flagged_columns))} columns flagged")
Contributor

Same here, why a set? Should this say unique columns?

        f"SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE FROM {db_name}.INFORMATION_SCHEMA.COLUMNS;"
    ).all()
except Exception:
    # print(f"Access to {db_name}'s tables denied.")
Contributor

Was this commented out line left intentionally?



# NB. These are connection secrets, never ever commit these
USER = ""
Contributor

Maybe add an option to read these from environment variables to make it less likely to commit them.
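
A minimal sketch of that suggestion, assuming environment variables named `MSSQL_USER`, `MSSQL_PASSWORD`, `MSSQL_HOST`, and `MSSQL_PORT`. Those names, the `mssql_url_from_env` helper, and the URL shape are all hypothetical; the script's actual URL template is not shown in this conversation.

```python
import os
from urllib.parse import quote_plus

# Hypothetical variable names; adjust to whatever the deployment uses.
REQUIRED_VARS = ("MSSQL_USER", "MSSQL_PASSWORD", "MSSQL_HOST")


def mssql_url_from_env(environ=os.environ) -> str:
    """Build a connection URL from environment variables instead of literals."""
    missing = [key for key in REQUIRED_VARS if not environ.get(key)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")

    # URL-encode credentials so characters like "@" or "/" do not break the URL.
    user = quote_plus(environ["MSSQL_USER"])
    password = quote_plus(environ["MSSQL_PASSWORD"])
    host = environ["MSSQL_HOST"]
    port = environ.get("MSSQL_PORT", "1433")  # 1433 is SQL Server's default port

    # Assumed URL shape for illustration only.
    return f"mssql+pyodbc://{user}:{password}@{host}:{port}/master"
```

With this approach nothing secret lives in the source file, so there is nothing to accidentally commit; the trade-off is that the script fails early if the variables are not set.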

@sanders41
Contributor

sanders41 commented Jun 2, 2022

@seanpreston your changes look good. A bunch of tests are failing, but they all look like frontend tests. I don't think they are relevant to your PR?

@seanpreston seanpreston changed the title 557 mssql discovery [#557] MSSQL discovery script Jun 3, 2022
@seanpreston
Contributor Author

> bunch of tests are failing, but they all look like frontend tests. I don't think they are relevant to your PR?

Those failures are left over from the GitHub Actions downtime.

@sanders41 sanders41 merged commit 71492f1 into main Jun 3, 2022
@sanders41 sanders41 deleted the 557-mssql-discovery branch June 3, 2022 12:20
sanders41 pushed a commit that referenced this pull request Sep 22, 2022
* adds script to discover mssql datastore compatibility

* make prints consistent

* updates changelog

* add URL template

* add warning

* store columns correctly

* move uncomitted secrets to another file to add to .gitignore

* remove empty secrets file, add to gitignore

* add comment explaining lack of secrets
Development

Successfully merging this pull request may close these issues.

[Spike] Investigate MSSQL Datastores
2 participants