[FEAT] Allow sql alchemy connection factory as input to read_sql #2071

colin-ho · 2024-04-02T00:14:47Z

Closes #2072

Support a SQLAlchemy connection factory as input to `read_sql` (same as pandas).

A SQLAlchemy connection is useful because it exposes the dialect, driver, and URL, which will fit in nicely with our partitioning and predicate pushdowns.
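For reference, SQLAlchemy URLs follow the `dialect+driver://username:password@host:port/database` form, and a SQLAlchemy `Connection` exposes the same information through `conn.engine.dialect.name`, `conn.engine.driver`, and `conn.engine.url`. When only a URL string is available, a reader has to recover these pieces itself. The following is a minimal stdlib sketch, assuming SQLAlchemy-style URLs. `parse_db_url` and `DBUrlInfo` are hypothetical names, not Daft APIs.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class DBUrlInfo(NamedTuple):
    """Hypothetical container for the parts of a database URL."""
    dialect: str
    driver: Optional[str]
    host: Optional[str]
    port: Optional[int]
    database: str


def parse_db_url(url: str) -> DBUrlInfo:
    """Hypothetical helper: split a SQLAlchemy-style URL
    (dialect+driver://user:password@host:port/database) into its parts."""
    parsed = urlparse(url)
    # "postgresql+psycopg2" -> ("postgresql", "+", "psycopg2"); no "+" -> empty driver
    dialect, _, driver = parsed.scheme.partition("+")
    return DBUrlInfo(
        dialect=dialect,
        driver=driver or None,
        host=parsed.hostname,
        port=parsed.port,
        # Strip exactly one leading slash, so "sqlite:////abs/path.db" stays absolute.
        database=parsed.path[1:],
    )


info = parse_db_url("postgresql+psycopg2://user:pw@localhost:5432/mydb")
print(info.dialect, info.driver, info.host, info.port, info.database)
```

The dialect is the piece that matters most for pushdowns, since it decides which SQL syntax the generated queries may use.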

codecov · 2024-04-02T02:37:36Z

Codecov Report

Attention: Patch coverage is 22.80702% with 88 lines in your changes missing coverage. Please review.

Project coverage is 84.97%. Comparing base (9ccdc48) to head (64cc1ba).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2071      +/-   ##
==========================================
- Coverage   85.27%   84.97%   -0.30%     
==========================================
  Files          68       68              
  Lines        7258     7293      +35     
==========================================
+ Hits         6189     6197       +8     
- Misses       1069     1096      +27

| Files | Coverage Δ | |
|---|---|---|
| daft/io/_sql.py | `52.38% <60.00%> (-0.57%)` | ⬇️ |
| daft/table/table_io.py | `88.61% <33.33%> (ø)` | |
| daft/sql/sql_scan.py | `30.08% <11.76%> (+0.08%)` | ⬆️ |
| daft/sql/sql_connection.py | `22.47% <22.47%> (ø)` | |
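The report's percentages are hits divided by tracked lines, so they can be checked from the diff counts. In this sketch the patch figure uses 26 hit lines out of 114 changed lines; that split is inferred from the 22.80702% / 88-missed figures rather than stated in the report, and `coverage_pct` is a hypothetical helper.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Percentage of tracked lines that the test suite executed."""
    return 100.0 * hits / lines


base = coverage_pct(6189, 7258)  # main (9ccdc48)
head = coverage_pct(6197, 7293)  # PR head (64cc1ba)
# Patch coverage: 88 changed lines missed; 26 hit lines is an inference.
patch = coverage_pct(26, 26 + 88)

print(f"base {base:.2f}%, head {head:.2f}%, delta {head - base:+.2f}%, patch {patch:.5f}%")
```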

jaychia

Nice, but I think we might want to wrap `self.conn` in our own `ConnectionFactory` abstraction so that we can avoid lots of "matching" with `isinstance(conn, str)` in the rest of the code. Lmk your thoughts?

daft/io/_sql.py

daft/sql/sql_scan.py

colin-ho · 2024-04-09T22:14:34Z

> Nice, but I think we might want to wrap `self.conn` in our own `ConnectionFactory` abstraction so that we can avoid lots of "matching" with `isinstance(conn, str)` in the rest of the code. Lmk your thoughts?

Consolidated the "matching" into a `SQLConnection` object, which handles all functionality that has to distinguish a URL from a connection factory, such as retrieving the dialect and executing SQL.
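A minimal sketch of that idea, assuming the wrapper resolves "URL or factory" once in its constructor. This `SQLConnection` is a hypothetical stand-in, not Daft's actual class: it uses stdlib `sqlite3` connections in place of SQLAlchemy (a SQLAlchemy 2.x connection would also need `sqlalchemy.text(sql)` before `execute`), and its URL path only understands `sqlite://` URLs.

```python
from __future__ import annotations

import sqlite3
from typing import Any, Callable, List, Tuple, Union
from urllib.parse import urlparse

ConnectionFactory = Callable[[], Any]


class SQLConnection:
    """Hypothetical sketch: decides once whether `conn` is a URL or a factory,
    so the rest of the reader never needs isinstance(conn, str) checks."""

    def __init__(self, conn: Union[str, ConnectionFactory]) -> None:
        if isinstance(conn, str):
            self._url: str | None = conn
            self._factory: ConnectionFactory | None = None
        elif callable(conn):
            self._url = None
            self._factory = conn
        else:
            raise ValueError(f"conn must be a URL string or a connection factory, got {type(conn)!r}")

    @property
    def dialect(self) -> str:
        if self._url is not None:
            # "postgresql+psycopg2://..." -> "postgresql"
            return urlparse(self._url).scheme.split("+")[0]
        conn = self._factory()
        try:
            # SQLAlchemy connections expose `.dialect.name`; plain DB-API
            # connections such as sqlite3 do not, hence the fallback.
            return getattr(getattr(conn, "dialect", None), "name", "unknown")
        finally:
            conn.close()

    def execute(self, sql: str) -> List[Tuple[Any, ...]]:
        conn = self._connect()
        try:
            return [tuple(row) for row in conn.execute(sql).fetchall()]
        finally:
            conn.close()

    def _connect(self) -> Any:
        if self._factory is not None:
            return self._factory()
        parsed = urlparse(self._url)
        if parsed.scheme != "sqlite":
            # A real reader would hand other URLs to a dedicated driver.
            raise NotImplementedError(f"sketch only handles sqlite:// URLs, got {self._url!r}")
        # "sqlite:///data.db" -> "data.db"; "sqlite://" -> in-memory database
        return sqlite3.connect(parsed.path[1:] or ":memory:")


from_url = SQLConnection("sqlite://")
from_factory = SQLConnection(lambda: sqlite3.connect(":memory:"))
print(from_url.execute("SELECT 1 + 1"), from_factory.execute("SELECT 2 * 3"))
```

Keeping the dispatch in one class means callers only ever see `dialect` and `execute`. Note that probing the dialect through a factory opens a connection, so a real implementation would probably want to cache the result.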

jaychia

Looking good! Just some questions/nits

daft/sql/sql_connection.py

jaychia · 2024-04-10T20:12:47Z

daft/sql/sql_connection.py

+        try:
+            return self._execute_sql_query(sql)
+        except RuntimeError as e:
+            if limit is not None:
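The excerpt above retries the query without its limit when the limited version fails. A minimal sketch of that pattern, assuming the executor raises `RuntimeError` when the database rejects `LIMIT` (as the excerpt's `except RuntimeError` suggests); `execute_with_limit_fallback` and its `execute` callback are hypothetical names.

```python
from typing import Callable, List, Optional, Tuple

Rows = List[Tuple]


def execute_with_limit_fallback(
    execute: Callable[[str], Rows], sql: str, limit: Optional[int] = None
) -> Rows:
    """Hypothetical sketch: push LIMIT down, retry without it if rejected."""
    if limit is None:
        return execute(sql)
    try:
        # Ask the database to return at most `limit` rows.
        return execute(f"SELECT * FROM ({sql}) AS subquery LIMIT {limit}")
    except RuntimeError:
        # The database rejected LIMIT: run the full query and truncate
        # client-side. This can transfer far more rows than needed.
        return execute(sql)[:limit]
```

The fallback path fetches the entire result set before truncating it, which is what makes the retry expensive.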


Oh dang, what is the use-case for retrying without a limit? Sounds pretty expensive.

Some databases don't support `LIMIT`: https://stackoverflow.com/questions/2832013/can-you-name-a-single-popular-database-that-doesnt-support-limit-statement

However, this shouldn't be necessary once we use a SQL generation library to help build our queries.
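The retry becomes unnecessary if the limit is rendered in syntax the target dialect accepts. A hand-rolled sketch of what a SQL generation library (such as sqlglot) would handle, assuming SQLAlchemy-style dialect names like `mssql` and `oracle`; `apply_limit` is a hypothetical helper that covers only these three cases.

```python
def apply_limit(sql: str, limit: int, dialect: str) -> str:
    """Hypothetical helper: wrap `sql` so at most `limit` rows come back,
    using row-limiting syntax the target dialect understands."""
    if not isinstance(limit, int) or limit < 0:
        raise ValueError("limit must be a non-negative integer")
    if dialect == "mssql":
        # SQL Server has no LIMIT; it uses TOP, and derived tables need an alias.
        return f"SELECT TOP {limit} * FROM ({sql}) AS subquery"
    if dialect == "oracle":
        # Oracle 12c+ supports FETCH FIRST; Oracle rejects AS before table aliases.
        return f"SELECT * FROM ({sql}) subquery FETCH FIRST {limit} ROWS ONLY"
    # PostgreSQL, MySQL, SQLite, and many others accept LIMIT.
    return f"SELECT * FROM ({sql}) AS subquery LIMIT {limit}"


print(apply_limit("SELECT * FROM t", 5, "mssql"))
```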

daft/sql/sql_scan.py

jaychia · 2024-04-10T20:18:47Z

tests/integration/sql/test_sql.py


 import daft
 from daft.context import set_execution_config
 from tests.conftest import assert_df_equals
 from tests.integration.sql.conftest import TEST_TABLE_NAME


+@pytest.fixture(scope="session", params=["url", "conn"])
+def db_conn(request, test_db):


Do we need to fully parametrize every test with url vs conn? Or can we just have one dedicated test to ensure that passing in a conn instead of a URL works as expected?

Just a little concerned that it might bloat our tests, without really giving us much more coverage since we're just passing in a SQL statement in either case.

Yeah, I guess we don't: the SQLAlchemy connection path is already tested via the Trino connections, since Trino isn't supported by ConnectorX. I'll remove this parametrization and add a dedicated test.


github-actions bot added the `enhancement` (New feature or request) label Apr 2, 2024

colin-ho marked this pull request as ready for review April 2, 2024 17:15

colin-ho requested a review from samster25 April 2, 2024 17:16

jaychia self-requested a review April 8, 2024 23:21

jaychia reviewed Apr 9, 2024


daft/io/_sql.py (resolved)

daft/sql/sql_scan.py (outdated, resolved)

daft/sql/sql_scan.py (outdated, resolved)

daft/sql/sql_scan.py (resolved)

colin-ho force-pushed the colin/read_sql_refactor branch from fc51e80 to 55bc9e5 April 9, 2024 21:37

colin-ho requested a review from jaychia April 9, 2024 22:13

jaychia approved these changes Apr 10, 2024


colin-ho added 15 commits April 10, 2024 16:08

allow sqlalchemy conn

5741c2f

fix type checking

92f4ffd

add another type checking check

e9fb31a

add test case for bad conn

6a14b6d

missed one

a5fcde2

missed one

aca62ef

cleanup + improve docs

a9c32a5

dont need to use url from conn factory

64a2690

reorder test url

1724874

refactor

5b026c5

pass dialect instead of predicate sql, makes things easier for sqlglot later

494acb4

sql connection class

6aa7e8a

use try catch instead

1103c66

add attr check as well

0fa185e

nits

df73dd0

colin-ho force-pushed the colin/read_sql_refactor branch from f80f0eb to df73dd0 April 10, 2024 23:08

fix import

64cc1ba

colin-ho merged commit 3e73d74 into main Apr 11, 2024
30 of 31 checks passed

colin-ho deleted the colin/read_sql_refactor branch April 11, 2024 00:27
