[FEAT] Allow sql alchemy connection factory as input to read_sql #2071
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2071 +/- ##
==========================================
- Coverage 85.27% 84.97% -0.30%
==========================================
Files 68 68
Lines 7258 7293 +35
==========================================
+ Hits 6189 6197 +8
- Misses 1069 1096 +27
Nice, but I think we might want to wrap `self.conn` in our own `ConnectionFactory` abstraction so that we can avoid lots of "matching" with `isinstance(conn, str)` in the rest of the code. Lmk your thoughts?
Force-pushed from fc51e80 to 55bc9e5.
Consolidated the "matching" in a `SQLConnection` object, which handles any functionality that deals with either URLs or connection factories, such as retrieving the dialect, executing SQL, etc.
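For readers following along, here is a minimal sketch of what such a wrapper could look like. It is not the actual class from this PR: the method names, the URL parsing, and the assumption that the factory returns an object exposing `.dialect.name` and `.close()` (as a SQLAlchemy `Connection` does) are all illustrative.

```python
from typing import Callable, Union
from urllib.parse import urlparse


class SQLConnection:
    """Hypothetical wrapper that hides the url-vs-factory distinction."""

    def __init__(self, conn: Union[str, Callable[[], object]]) -> None:
        if not isinstance(conn, str) and not callable(conn):
            raise TypeError("conn must be a URL string or a connection factory")
        self._conn = conn

    def is_url(self) -> bool:
        # The only place where isinstance(conn, str) "matching" happens.
        return isinstance(self._conn, str)

    def dialect(self) -> str:
        if self.is_url():
            # "postgresql+psycopg2://..." has scheme "postgresql+psycopg2";
            # the part before "+" is taken as the dialect name.
            return urlparse(self._conn).scheme.split("+")[0]
        # Assumption: the factory returns a connection-like object with
        # `.dialect.name` and `.close()`.
        connection = self._conn()
        try:
            return connection.dialect.name
        finally:
            connection.close()
```

With something like this, the rest of the code can call `SQLConnection(conn).dialect()` without caring which kind of input the user passed.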
Looking good! Just some questions/nits
try:
    return self._execute_sql_query(sql)
except RuntimeError as e:
    if limit is not None:
Oh dang, what is the use case for retrying without a `limit`? Sounds pretty expensive.
Some DBs don't support `LIMIT`: https://stackoverflow.com/questions/2832013/can-you-name-a-single-popular-database-that-doesnt-support-limit-statement
However, this shouldn't be necessary once we use a SQL generation library to help build our queries.
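To make the fallback concrete, here is a rough sketch of the retry idea. The function name, the `execute` callable, and the way the limited query is built are assumptions for illustration, not the code in this PR.

```python
from typing import Callable, List, Optional, Tuple

Row = Tuple[object, ...]


def read_with_limit_fallback(
    execute: Callable[[str], List[Row]],
    sql: str,
    limit: Optional[int] = None,
) -> List[Row]:
    """Try a LIMIT query first; on failure, rerun unlimited and truncate."""
    if limit is None:
        return execute(sql)

    # Assumed way of applying the limit: wrap the user query in a subquery.
    limited_sql = f"SELECT * FROM ({sql}) AS subquery LIMIT {limit}"
    try:
        return execute(limited_sql)
    except RuntimeError:
        # The database may not understand LIMIT. Fall back to the full
        # query and cut the result down on the client side.
        rows = execute(sql)
        return rows[:limit]
```

The fallback path reads the whole result before truncating, which is the cost raised in the review comment above.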
tests/integration/sql/test_sql.py (outdated)

import daft
from daft.context import set_execution_config
from tests.conftest import assert_df_equals
from tests.integration.sql.conftest import TEST_TABLE_NAME


@pytest.fixture(scope="session", params=["url", "conn"])
def db_conn(request, test_db):
Do we need to fully parametrize every test with url vs conn? Or can we just have one dedicated test to ensure that passing in a conn instead of a URL works as expected?
Just a little concerned that it might bloat our tests, without really giving us much more coverage since we're just passing in a SQL statement in either case.
Yeah I guess we don't, since the sqlalchemy connection path is also tested via the Trino connections as it's not supported by ConnectorX. Will remove this parametrization and add a dedicated test
Force-pushed from f80f0eb to df73dd0.
Closes #2072
Support a SQLAlchemy connection factory as input (same as pandas).
A SQLAlchemy connection is nice because it gives info on the dialect, driver, and URL, which will fit in nicely with our partitioning + predicate pushdowns.