- Supports dbt version
1.0.*
- Supports Seeds
- Correctly detects views and their columns
- Support incremental models
- Support two incremental update strategies:
insert_overwrite
andappend
- Does not support the use of
unique_key
- Support two incremental update strategies:
- Only supports Athena engine 2
pip install git+https://github.com/SOVALINUX/dbt-athena
To start, you will need an S3 bucket, for instance my-staging-bucket
and an Athena database:
CREATE DATABASE IF NOT EXISTS analytics_dev
COMMENT 'Analytics models generated by dbt (development)'
LOCATION 's3://my-staging-bucket/'
WITH DBPROPERTIES ('creator'='Foo Bar', 'email'='[email protected]');
Notes:
- Take note of your AWS region code (e.g.
us-west-2
oreu-west-2
, etc.). - You can also use AWS Glue to create and manage Athena databases.
This plugin does not accept any credentials directly. Instead, credentials are determined automatically based on aws cli
/boto3
conventions and
stored login info. You can configure the AWS profile name to use via aws_profile_name
. Checkout DBT profile configuration below for details.
A dbt profile can be configured to run against AWS Athena using the following configuration:
Option | Description | Required? | Example |
---|---|---|---|
s3_staging_dir | S3 location to store Athena query results and metadata | Required | s3://bucket/dbt/ |
region_name | AWS region of your Athena instance | Required | eu-west-1 |
schema | Specify the schema (Athena database) to build models into (lowercase only) | Required | dbt |
database | Specify the database (Data catalog) to build models into (lowercase only) | Required | awsdatacatalog |
poll_interval | Interval in seconds to use for polling the status of query results in Athena | Optional | 5 |
aws_profile_name | Profile to use from your AWS shared credentials file. | Optional | my-profile |
work_group | Identifier of Athena workgroup | Optional | my-custom-workgroup |
num_retries | Number of times to retry a failing query | Optional | 3 |
Example profiles.yml entry:
athena:
target: dev
outputs:
dev:
type: athena
s3_staging_dir: s3://athena-query-results/dbt/
region_name: eu-west-1
schema: dbt
database: awsdatacatalog
aws_profile_name: my-profile
work_group: my-workgroup
Additional information
threads
is supporteddatabase
andcatalog
can be used interchangeably
Zero-Downtime Tables
Starting from adapter version 1.0.3
there is a way to do dbt run
without downtime on table update
It creates a new CTAS table with name ctas_{{model.name}}_{{timestamp}}
+ creates or updates extra view {{model.name}}
that selects from this CTAS
It works in the following way:
- Enable zero-downtime for tables
a) Add vartable_zero_downtime
to thedbt_project.yml
vars:
table_zero_downtime: true
b) Alternatively add tag table_zero_downtime
to the specific model with table
materialization
{{config(tags=['table_zero_downtime'])}}
- Cleaning up stale objects like ctas a) We've added a complimentary set of macroses to cleanup objects that not exist in Git Checkout these links:
- https://github.com/SOVALINUX/dbt-utils/blob/main/macros/sql/delete_stale_objects.sql
- https://github.com/SOVALINUX/athena-utils/blob/main/macros/dbt_utils/sql/delete_stale_objects.sql
And
on-run-end
hook I can do the following trick:
on-run-end: "{% do athena_utils.delete_stale_ctas_run_end([target.schema, generate_schema_name('some_extra_schema', '')], False, '') %}"
b) For our projects for development purposes I've added wrapper for on-run-end
hook that will not trigger on single model run on developer machines
If you ever going to add this connector to the Docker, please use Dockerfile from dbt v1.1 or higher https://github.com/dbt-labs/dbt-core/blob/1.1.latest/docker/Dockerfile
external_location
(default=none
)- The location where Athena saves your table in Amazon S3
- If
none
then it will default to{s3_staging_dir}/tables
- If you are using a static value, when your table/partition is recreated underlying data will be cleaned up and overwritten by new data
partitioned_by
(default=none
)- An array list of columns by which the table will be partitioned
- Limited to creation of 100 partitions (currently)
bucketed_by
(default=none
)- An array list of columns to bucket data
bucket_count
(default=none
)- The number of buckets for bucketing your data
format
(default='parquet'
)- The data format for the table
- Supports
ORC
,PARQUET
,AVRO
,JSON
, orTEXTFILE
write_compression
(default=none
)- The compression type to use for any storage format that allows compression to be specified. To see which options are available, check out CREATE TABLE AS
field_delimiter
(default=none
)- Custom field delimiter, for when format is set to
TEXTFILE
- Custom field delimiter, for when format is set to
More information: CREATE TABLE AS
Support for incremental models:
- Support two incremental update strategies with partitioned tables:
insert_overwrite
andappend
- Does not support the use of
unique_key
Due to the nature of AWS Athena, not all core dbt functionality is supported. The following features of dbt are not implemented on Athena:
- Snapshots
-
Quoting is not currently supported
- If you need to quote your sources, escape the quote characters in your source definitions:
version: 2 sources: - name: my_source tables: - name: first_table identifier: "first table" # Not like that - name: second_table identifier: "\"second table\"" # Like this
-
Tables, schemas and database should only be lowercase
-
Only supports Athena engine 2
First, install the adapter and its dependencies using make
(see Makefile):
make install_deps
Next, configure the environment variables in dev.env to match your Athena development environment. Finally, run the tests using make
:
make run_tests