Presto Adapter (#1106, #1229, #1230) #1245
- fix sql docstring, which is just wrong.
- extract the default seed materialization to one that takes a chunk size param
- add NUMBERS constant for all number types dbt should support inserting
- update Column definition to allow for "varchar"
- seeds, views, tables, and ephemeral models all implemented
- ConnectionManager methods and credentials
- Adapter.date_function
- macros
- give presto a small chunk size for seeds
- manually cascade drop_schema
- stub out some stuff that does not exist in presto
- stub out incremental materializations for now (delete is not useful)
- stub out query cancellation for now
Some tiny comments here, will review more completely after poking around with this branch for a little bit :)
queries = [q.rstrip(';') for q in sqlparse.split(sql)]

for individual_query in queries:
    # hack -- after the last ';', remove comments and don't run
I think there were issues with this hack. For one, I think this splits on semicolons found inside of SQL comments. Passable for the moment, I think, but let's consider whether it makes sense to handle this differently.
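To make the concern concrete, here is a rough, standard-library-only sketch of one way to split statements so that semicolons inside `--` comments, `/* */` comments, and single-quoted strings are ignored. `split_statements` is a hypothetical helper, not something in dbt or sqlparse, and it does not try to cover every SQL dialect (no dollar-quoting, no quoted identifiers).

```python
from typing import List


def split_statements(sql: str) -> List[str]:
    """Split on ';' only when it is outside comments and '...' strings.

    Hypothetical helper for illustration; not dbt's implementation.
    """
    statements: List[str] = []
    current: List[str] = []
    i, n = 0, len(sql)
    while i < n:
        two = sql[i:i + 2]
        ch = sql[i]
        if two == "--":
            # Line comment: copy everything up to (not including) the newline.
            end = sql.find("\n", i)
            end = n if end == -1 else end
            current.append(sql[i:end])
            i = end
        elif two == "/*":
            # Block comment: copy through the closing '*/' (or to the end).
            end = sql.find("*/", i + 2)
            end = n if end == -1 else end + 2
            current.append(sql[i:end])
            i = end
        elif ch == "'":
            # String literal: '' inside the literal is an escaped quote.
            j = i + 1
            while j < n:
                if sql[j] == "'":
                    if sql[j + 1:j + 2] == "'":
                        j += 2
                        continue
                    break
                j += 1
            current.append(sql[i:j + 1])
            i = j + 1
        elif ch == ";":
            # A real statement terminator.
            statements.append("".join(current).strip())
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    statements.append("".join(current).strip())
    # Drop empty fragments, e.g. the one after a trailing ';'.
    return [s for s in statements if s]
```

A fragment that contains only a comment (for example, a comment after the last `;`) is still returned as a statement here, so the "remove comments and don't run" step would still be needed on top of this.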
I like the idea of pulling the logic out into the SQLAdapter. We need to do this basically everywhere
I was able to test this out on EMR Presto with both the Hive metastore and AWS Glue. In both cases, this code worked basically as expected! There is one caveat though -- you must set the following configs when launching your Presto cluster:
hive.metastore-cache-ttl=0s
hive.metastore-refresh-interval=5s
hive.allow-drop-table=true
hive.allow-rename-table=true
Let's be sure to document these configs for our eventual release.
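As a starting point for that documentation, here is a small sketch that renders the four settings above either as a properties fragment or as an EMR configuration object. The classification name `presto-connector-hive` is an assumption on my part and should be checked against the EMR docs for the release in use; the function names are made up for illustration.

```python
import json

# The four settings quoted above, as key/value pairs for the Hive connector.
HIVE_CONNECTOR_SETTINGS = {
    "hive.metastore-cache-ttl": "0s",
    "hive.metastore-refresh-interval": "5s",
    "hive.allow-drop-table": "true",
    "hive.allow-rename-table": "true",
}


def as_properties(settings):
    """Render the settings as lines for a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


def emr_configurations(settings):
    """Build an EMR-style configurations list.

    Assumption: "presto-connector-hive" is the classification that maps to
    the Hive connector's properties file. Verify before relying on it.
    """
    return [
        {
            "Classification": "presto-connector-hive",
            "Properties": dict(settings),
        }
    ]


if __name__ == "__main__":
    print(as_properties(HIVE_CONNECTOR_SETTINGS))
    print(json.dumps(emr_configurations(HIVE_CONNECTOR_SETTINGS), indent=2))
```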
I do think that we need to do more stress testing here to feel really comfortable about deploying this code, but I feel pretty good about getting this merged and then making small tweaks as needed.
Between the quality of this code (it looks great!) and some initial testing in a realistic prod environment, I am happy to say LGTM, 🚢, 🎉
Resolves #1106, #1229, #1230
Implement a Presto adapter. Presto required some massaging of the SQL adapter class because it has some novel properties: the local client does its own transaction and session management, the cursor doesn't have a status available, and queries do not appear to execute until results are fetched.
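The last two properties can be illustrated with a small wrapper. This is only a sketch of the general pattern, not the code in this PR: `LazyFakeCursor` is a stand-in that mimics a cursor which does no work until results are fetched, and `EagerCursor` is a hypothetical wrapper that fetches immediately and reports a fixed status string because the underlying cursor has none.

```python
class LazyFakeCursor:
    """Stand-in for a client cursor that only runs a query on fetch."""

    def __init__(self):
        self.executed = []    # queries that have actually run
        self._pending = None  # query submitted but not yet run

    def execute(self, sql):
        # Nothing runs yet; the query is only recorded.
        self._pending = sql

    def fetchall(self):
        # Fetching is what triggers execution.
        if self._pending is not None:
            self.executed.append(self._pending)
            self._pending = None
        return []


class EagerCursor:
    """Hypothetical wrapper: run the query at execute() time."""

    def __init__(self, cursor):
        self._cursor = cursor
        self._rows = None

    def execute(self, sql):
        self._cursor.execute(sql)
        # Force execution now and keep the rows for later callers.
        self._rows = self._cursor.fetchall()
        # The wrapped cursor exposes no status, so report a constant.
        return "OK"

    def fetchall(self):
        return self._rows if self._rows is not None else []
```

Wrapping the cursor this way means DDL such as `create table ... as` takes effect when `execute` returns, at the cost of buffering all rows in memory.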
Working:
Not Working/incomplete:
Things to consider that might make plugin development easier: