support presto as an adapter #1106

Closed
drewbanin opened this issue Nov 2, 2018 · 1 comment
Labels
adapter_plugins (Issues relating to third-party adapter plugins), help_wanted (Trickier changes, with a clear starting point, good for previous/experienced contributors)

Comments

@drewbanin
Contributor

Feature

Feature description

dbt should work with Presto. Someone familiar with Presto should fill out this survey and paste the results in here.

Who will this benefit?

Presto users, users that want to query cross-database.

@drewbanin added the help_wanted (Trickier changes, with a clear starting point, good for previous/experienced contributors), good_first_issue (Straightforward + self-contained changes, good for new contributors!), and adapter labels Nov 12, 2018
@drewbanin
Contributor Author

New adapter questionnaire, via the New Adapter Information Sheet

Questions

  1. What options are there for connecting to the warehouse? Eg: ODBC, a python module, etc

Use PyHive, a DB-API compatible module for connecting to Hive and Presto.

  1. Is there a consensus around which of the options in question 1 is the most featureful/best supported/most mature/etc option?

Yeah - use PyHive.
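
A minimal sketch of what connecting through PyHive might look like. `presto.connect` and the DB-API cursor come from PyHive; the `profile` dict, its keys, and both helper names are hypothetical and not part of dbt or PyHive.

```python
def connection_kwargs(profile):
    """Map a hypothetical dbt-style profile dict onto PyHive connect() keyword arguments."""
    return {
        "host": profile["host"],
        "port": profile.get("port", 8080),
        "username": profile.get("user"),
        "catalog": profile.get("catalog", "hive"),
        "schema": profile.get("schema", "default"),
    }


def open_cursor(profile):
    """Open a DB-API cursor against Presto (needs PyHive installed with its presto extra)."""
    # Imported lazily so connection_kwargs() can be used and tested without PyHive.
    from pyhive import presto

    return presto.connect(**connection_kwargs(profile)).cursor()


# Usage, assuming a Presto coordinator is reachable on localhost:
# cursor = open_cursor({"host": "localhost", "catalog": "hive", "schema": "analytics"})
# cursor.execute("select 1")
# print(cursor.fetchall())
```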

  1. Does the warehouse support namespaces/schemas/datasets (or similar)?
    Yes, schemas

  2. Can schemas/namespaces be created using SQL? Eg: create schema my_schema

CREATE SCHEMA [ IF NOT EXISTS ] schema_name

via https://prestodb.io/docs/current/sql/create-schema.html

  1. Does the warehouse support logical databases?
    Yes, these are called "catalogs".

A Presto catalog contains schemas and references a data source via a connector. For example, you can configure a JMX catalog to provide access to JMX information via the JMX connector. When you run a SQL statement in Presto, you are running it against one or more catalogs. Other examples of catalogs include the Hive catalog to connect to a Hive data source.

Catalogs are defined in properties files stored in the Presto configuration directory.

via https://prestodb.io/docs/current/overview/concepts.html
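
Since a relation in Presto lives at catalog.schema.table, an adapter would probably need to render three-part names. A rough sketch, assuming double-quoted identifiers; the function names here are made up for illustration.

```python
def quote(identifier):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def render_relation(catalog, schema, table):
    """Render a fully qualified catalog.schema.table name."""
    return ".".join(quote(part) for part in (catalog, schema, table))


def create_schema_sql(catalog, schema):
    """Build a CREATE SCHEMA statement scoped to a specific catalog."""
    return "CREATE SCHEMA IF NOT EXISTS {}.{}".format(quote(catalog), quote(schema))
```

For example, `render_relation("hive", "analytics", "orders")` gives `"hive"."analytics"."orders"`.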

  1. Does the warehouse support standard-ish SQL? Are there any noteworthy caveats?
    Yes, Presto was designed with ANSI SQL compliance in mind.

  2. Are transactions supported?
    I think so? Presto has START TRANSACTION, COMMIT, and ROLLBACK statements. The START TRANSACTION statement takes an isolation level, and I believe that the SERIALIZABLE isolation level will mimic Redshift's implementation of transactions. Unsure if there's a different/better option for modeling workloads, but worth investigating.
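
A sketch of how the transaction statements might be wrapped around a model run. It only assumes a DB-API style cursor with an `execute` method; the `transaction` helper is hypothetical, and whether the client actually carries one transaction across separate statements is untested here.

```python
from contextlib import contextmanager


@contextmanager
def transaction(cursor, isolation_level="SERIALIZABLE"):
    """Run the enclosed statements between START TRANSACTION and COMMIT.

    Issues ROLLBACK and re-raises if the enclosed block raises.
    """
    cursor.execute("START TRANSACTION ISOLATION LEVEL {}".format(isolation_level))
    try:
        yield cursor
    except Exception:
        cursor.execute("ROLLBACK")
        raise
    else:
        cursor.execute("COMMIT")


# Usage, with any DB-API cursor:
# with transaction(cursor) as cur:
#     cur.execute("insert into my_schema.my_table select * from my_schema.staging")
```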

  3. Can tables be created with create table schema.table as (...)?
    Yes: https://prestodb.io/docs/current/sql/create-table-as.html

  4. Can views be created with create view schema.view as (...)?
    Yes: https://prestodb.io/docs/current/sql/create-view.html

  5. Can tables/views be renamed with alter table {table_name} rename to {new_name}?
    Yes: https://prestodb.io/docs/current/sql/alter-table.html

  6. Does the warehouse support insert statements? Any caveats?
    Yes: https://prestodb.io/docs/current/sql/insert.html

  7. Does the warehouse support delete statements? Any caveats?
    Yes: https://teradata.github.io/presto/docs/141t/sql/delete.html

Caveat:

Only Hive Connector currently supports DELETE and it works for the queries where one or more partitions are deleted entirely.

  1. Does the warehouse support update statements? Any caveats?
    Don't think so. Could it be replicated with a delete + insert?
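
A rough sketch of the delete + insert idea. Given the DELETE caveat above, it assumes the predicate is on a partition column so that whole partitions are replaced; the function and argument names are hypothetical.

```python
def update_as_delete_insert(target, source, partition_column, partition_value):
    """Return the two statements that replace one partition of `target` with rows from `source`."""
    # Escape single quotes so the value is a valid SQL string literal.
    literal = "'" + str(partition_value).replace("'", "''") + "'"
    predicate = "{} = {}".format(partition_column, literal)
    return [
        "DELETE FROM {} WHERE {}".format(target, predicate),
        "INSERT INTO {} SELECT * FROM {} WHERE {}".format(target, source, predicate),
    ]
```

The two statements are not atomic unless they run inside a transaction, so a failure between them would leave the partition empty.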

  2. Does the warehouse support merge statements? Any caveats?
    Nope

  3. Does the warehouse support drop table/drop view statements? Any caveats?
    Yes: https://prestodb.io/docs/current/sql/drop-table.html

  4. Does the warehouse support truncate statements? Any caveats?
    Yes: https://prestodb.io/docs/current/sql/drop-view.html

  5. Does the warehouse support temporary tables? Any caveats?
    Doesn't look like it

  6. Can queries be cancelled?

runtime.kill_query(query_id, message)

via https://prestodb.io/docs/current/connector/system.html
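
A sketch of building the cancel call. It assumes the procedure is invoked with `CALL` under the `system` catalog and accepts named arguments; the helper name and default message are made up, and older Presto versions may not accept the `message` argument.

```python
def kill_query_sql(query_id, message="cancelled by dbt"):
    """Build a CALL statement for the runtime.kill_query procedure."""

    def literal(value):
        # Escape single quotes so the value is a valid SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    return "CALL system.runtime.kill_query(query_id => {}, message => {})".format(
        literal(query_id), literal(message)
    )
```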

  1. Are views bound to the relations they select from? Ie. do drop table statements require a ...cascade argument?
    No

  2. Does the warehouse support querying for existing relations (ie. their existence and type)? Is this via an API call or a SELECT statement? Eg: select * from information_schema.tables.

Possibly. Is the output of show tables structured? https://prestodb.io/docs/current/sql/show-tables.html

  1. Does the warehouse support querying for the columns in a relation? Is this via an API call or a SELECT statement? Eg: select * from information_schema.columns

Possibly. Is the output of show columns structured? https://prestodb.io/docs/current/sql/show-columns.html
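
An alternative to parsing `show tables` / `show columns` output: Presto exposes an `information_schema` in each catalog, so plain SELECT statements may be enough. A sketch of the two queries, with hypothetical helper names; which tables and columns a given connector populates should be checked against a real cluster.

```python
def _literal(value):
    """Render a Python string as a SQL string literal."""
    return "'" + value.replace("'", "''") + "'"


def list_relations_sql(catalog, schema):
    """Query the names and types (table vs. view) of relations in one schema."""
    return (
        "SELECT table_name, table_type "
        "FROM {}.information_schema.tables "
        "WHERE table_schema = {}"
    ).format(catalog, _literal(schema))


def list_columns_sql(catalog, schema, table):
    """Query the column names and types of one relation, in declaration order."""
    return (
        "SELECT column_name, data_type "
        "FROM {}.information_schema.columns "
        "WHERE table_schema = {} AND table_name = {} "
        "ORDER BY ordinal_position"
    ).format(catalog, _literal(schema), _literal(table))
```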

  1. Can columns be added and removed using DDL? Eg: alter table add column <name> <type>?
    Yes: https://prestodb.io/docs/current/sql/alter-table.html

  2. Does the warehouse support non-standard performance configurations? Ie. clustering, partitioning, sort/dist keys, etc. What are they, and how are they used? Can they be supplied in create table as statements?

Don't think so! This is surprising, but I can't find any good info on the topic.

  1. Which column types does the warehouse support? Are text types varchars (with sizes) or unsized string columns? Are numeric types (with fixed precision) supported? Is there a different type for timestamps with timezones? Any caveats?
    Types: https://prestodb.io/docs/current/language/types.html

  2. Does the warehouse support column-level constraints (eg. unique, not null, foreign key, primary key)? Are they enforced? Can they be defined in create table as statements?
    Nope

@drewbanin added this to the Stephen Girard milestone Nov 28, 2018
@drewbanin removed the good_first_issue label Dec 5, 2018
@beckjake self-assigned this Jan 9, 2019
@jtcohen6 added the adapter_plugins label Jul 19, 2022