Presto Adapter (#1106, #1229, #1230) #1245

beckjake · 2019-01-17T00:08:13Z

Implement a presto adapter. Presto required some massaging of the SQL adapter class as it has some novel properties - the local client does its own transaction+session management, the cursor doesn't have a status available, and queries do not appear to execute until results are fetched.

Working:

Views/tables/seeds
error handling
configuration
Linux tests, including integration tests against a local docker-compose setup on Linux.
Transactions, apparently, though I suspect there are bugs here
The various required macros
A nice little script that generates a bunch of boilerplate for you

Not Working/incomplete:

archives probably don't work and also don't raise an exception about it (see Presto: raise not implemented exceptions for unsupported functionality #1228)
incremental models don't work and raise an exception about it
I don't think the integration tests on Windows will work, and I think getting them to work on appveyor will be unpleasant, at best
I can't come up with a nice way to test query cancellation and I'm not sure I trust it. ctrl+c does work locally but that doesn't mean queries are actually canceled.
never tested against a real database
only "no auth required" has been tested. I don't even know how to test kerberos!
never tested with s3 backing instead of hdfs

Things to consider that might make plugin development easier:

Should schema.py's column definitions get rolled into adapters and subclasses like connection management/adapter implementations? It seems wrong to have to add database-specific type information to core in support of a plugin!
commit/rollback/begin transaction are not all defined in the same place. We canonically do rollback via a method but begin/commit transactions via queries. That seems odd and caught me by surprise when I was implementing this.

- fix sql docstring, which is just wrong. - extract the default seed materialization to one that takes a chunk size param - add NUMBERS constant for all number types dbt should support inserting - update Column definition to allow for "varchar"

- seeds, views, tables, and ephemeral models all implemented - ConnectionManager methods and credentials - Adapter.date_function - macros - give presto a small chunk size for seeds - manually cascade drop_schema - stub out some stuff that does not exist in presto - stub out incremental materializations for now (delete is not useful) - stub out query cancellation for now

drewbanin

Some tiny comments here, will review more completely after poking around with this branch for a little bit :)

drewbanin · 2019-01-17T16:25:20Z

plugins/presto/dbt/adapters/presto/connections.py

+        queries = [q.rstrip(';') for q in sqlparse.split(sql)]
+
+        for individual_query in queries:
+            # hack -- after the last ';', remove comments and don't run


I think there were issues with this hack. For one, I think this splits on semicolons found inside of sql comments. Passable for the moment I think, but let's consider if it makes sense to handle this differently

I like the idea of pulling the logic out into the SQLAdapter. We need to do this basically everywhere

drewbanin

I was able to test this out on EMR Presto with both the Hive metastore, as well as with AWS Glue. In both cases, this code worked basically as expected! There is one caveat though -- you must set the following configs when launching your Presto cluster:

hive.metastore-cache-ttl=0s
hive.metastore-refresh-interval = 5s
hive.allow-drop-table=true
hive.allow-rename-table=true

Let's be sure to document these configs for our eventual release.

I do think that we need to do more stress testing here to feel really comfortable about deploying this code, but I feel pretty good about getting this merged and then making small tweaks as needed.

Between the quality of this code (it looks great!) and some initial testing in a realistic prod environment, I am happy to say LGTM, , 🚢, 🎉

Jacob Beck added 8 commits January 16, 2019 16:46

initial work on an adapter building script

f6278d5

Create a docker-compose container system for presto testing

da2d7ea

more work on the create script

016afd4

ran the create script

27842f4

Presto testo

85389af

make cancellation work, pretty ugly and no clue how to test it

491d593

beckjake changed the title ~~Presto Adapter (#1106, #1229, #1230) [WIP]~~ Presto Adapter (#1106, #1229, #1230) Jan 17, 2019

drewbanin self-requested a review January 17, 2019 16:20

drewbanin reviewed Jan 17, 2019

View reviewed changes

beckjake mentioned this pull request Jan 17, 2019

presto notimplemented errors (#1228) #1246

Merged

drewbanin approved these changes Jan 21, 2019

View reviewed changes

beckjake merged commit 477699a into dev/stephen-girard Jan 23, 2019

beckjake deleted the feature/presto-adapter branch February 22, 2019 20:21

kzarzycki mentioned this pull request May 7, 2019

support hive as an adapter #559

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Presto Adapter (#1106, #1229, #1230) #1245

Presto Adapter (#1106, #1229, #1230) #1245

beckjake commented Jan 17, 2019 •

edited

Loading

drewbanin left a comment

drewbanin Jan 17, 2019

drewbanin Jan 17, 2019

drewbanin left a comment

Presto Adapter (#1106, #1229, #1230) #1245

Presto Adapter (#1106, #1229, #1230) #1245

Conversation

beckjake commented Jan 17, 2019 • edited Loading

drewbanin left a comment

Choose a reason for hiding this comment

drewbanin Jan 17, 2019

Choose a reason for hiding this comment

drewbanin Jan 17, 2019

Choose a reason for hiding this comment

drewbanin left a comment

Choose a reason for hiding this comment

beckjake commented Jan 17, 2019 •

edited

Loading