[BUG] Translate mssql to tsql in read_sql scan (#2330)
When running `read_sql` against SQL Server using a SQLAlchemy
connection, for example:
```
import daft
import sqlalchemy

# user, password, host, and database are placeholders for your own settings.
connection_url = sqlalchemy.engine.URL.create(
    "mssql+pyodbc",
    username=user,
    password=password,
    host=host,
    port=1433,
    database=database,
    query={
        "driver": "ODBC Driver 18 for SQL Server",
    },
)


def create_conn():
    return sqlalchemy.create_engine(connection_url).connect()


df = daft.read_sql("SELECT * FROM test_data", create_conn)
```

The query errors with `Unsupported dialect: mssql, please refer to the
documentation for supported dialects`.

This is because SQLGlot, the library that `read_sql` uses for query
construction, does not recognize `mssql` as a dialect. It instead
recognizes `tsql` (Transact-SQL), which is the name of the SQL dialect
for Microsoft SQL Server:
https://learn.microsoft.com/en-us/sql/t-sql/language-reference?view=sql-server-ver16

This PR adds a translation step during SQL query construction that maps
`mssql` to `tsql` before the dialect is validated, which fixes this issue.
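
As a rough illustration of that translation step, here is a minimal standalone sketch. It is not the code in this commit: the change itself extends an existing `if`/`elif` chain, whereas this sketch uses a lookup table instead. The names `DIALECT_ALIASES`, `SUPPORTED_DIALECTS`, and `normalize_dialect` are hypothetical, and the supported set is a small stand-in for the values of `sqlglot.Dialects`, not the real list.

```python
# Hypothetical alias table: SQLAlchemy dialect name -> SQLGlot dialect name.
DIALECT_ALIASES = {
    "postgresql": "postgres",
    "mssql": "tsql",
}

# Illustrative stand-in for the values of sqlglot.Dialects; not the full list.
SUPPORTED_DIALECTS = {"postgres", "tsql", "mysql", "sqlite"}


def normalize_dialect(name: str) -> str:
    """Translate a SQLAlchemy dialect name, then check that it is supported."""
    # Names without an alias pass through unchanged.
    target = DIALECT_ALIASES.get(name, name)
    if target not in SUPPORTED_DIALECTS:
        raise ValueError(
            f"Unsupported dialect: {target}, please refer to the "
            "documentation for supported dialects"
        )
    return target


print(normalize_dialect("mssql"))
print(normalize_dialect("postgresql"))
```

Translating before validating is the important ordering: if the check ran first, `mssql` would be rejected before it could be mapped to `tsql`.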

NOTE:
- This PR was tested locally against a Docker instance of Azure SQL
Edge.
colin-ho authored May 31, 2024
1 parent 209a4e0 commit 0ba9a19
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions daft/sql/sql_scan.py
@@ -260,6 +260,9 @@ def _construct_sql_query(
 # sqlglot does not support "postgresql" dialect, it only supports "postgres"
 if target_dialect == "postgresql":
     target_dialect = "postgres"
+# sqlglot does not recognize "mssql" as a dialect, it instead recognizes "tsql", which is the SQL dialect for Microsoft SQL Server
+elif target_dialect == "mssql":
+    target_dialect = "tsql"

 if not any(target_dialect == supported_dialect.value for supported_dialect in sqlglot.Dialects):
     raise ValueError(