Replace references to tables to relations. Add include param to union #178

Merged
merged 6 commits into from
Dec 11, 2019
85 changes: 54 additions & 31 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -213,7 +213,7 @@ models:
```

#### relationships_where ([source](macros/schema_tests/relationships_where.sql))
This test validates the referential integrity between two tables (same as the core relationships schema test) with an added predicate to filter out some rows from the test. This is useful to exclude records such as test entities, rows created in the last X minutes/hours to account for temporary gaps due to ETL limitations, etc.
This test validates the referential integrity between two relations (same as the core relationships schema test) with an added predicate to filter out some rows from the test. This is useful to exclude records such as test entities, rows created in the last X minutes/hours to account for temporary gaps due to ETL limitations, etc.

Usage:
```yaml
Expand Down Expand Up @@ -345,24 +345,33 @@ Usage:

...
```
#### get_relations_by_prefix
> This replaces the `get_tables_by_prefix` macro. Note that the `get_tables_by_prefix` macro will
be deprecated in a future release of this package.

#### get_tables_by_prefix ([source](macros/sql/get_tables_by_prefix.sql))
This macro returns a list of tables that match a given prefix, with an optional
exclusion pattern. It's particularly handy paired with `union_tables`.

Usage:
Returns a list of [Relations](https://docs.getdbt.com/docs/api-variable#section-relation)
that match a given prefix, with an optional exclusion pattern. It's particularly
handy paired with `union_relations`.
**Usage:**
```
-- Returns a list of tables that match schema.prefix%
{% set tables = dbt_utils.get_tables_by_prefix('schema', 'prefix') %}
-- Returns a list of relations that match schema.prefix%
{% set relations = dbt_utils.get_relations_by_prefix('my_schema', 'my_prefix') %}

-- Returns a list of tables as above, excluding any with underscores
{% set tables = dbt_utils.get_tables_by_prefix('schema', 'prefix', '%_%') %}
-- Returns a list of relations as above, excluding any that end in `deprecated`
{% set relations = dbt_utils.get_relations_by_prefix('my_schema', 'my_prefix', '%deprecated') %}

-- Example using the union_tables macro
{% set event_tables = dbt_utils.get_tables_by_prefix('events', 'event_') %}
{{ dbt_utils.union_tables(tables = event_tables) }}
-- Example using the union_relations macro
{% set event_relations = dbt_utils.get_relations_by_prefix('events', 'event_') %}
{{ dbt_utils.union_relations(relations = event_relations) }}
```

**Args:**
* `schema` (required): The schema to inspect for relations.
* `prefix` (required): The prefix of the table/view name (case-insensitive).
* `exclude` (optional): Exclude any relations that match this pattern.
* `database` (optional, default = `target.database`): The database to inspect
for relations.
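The prefix and exclusion matching described above can be modeled outside the warehouse. The following is a minimal Python sketch, assuming the patterns follow case-insensitive SQL `LIKE` semantics (`%` matches any run of characters, `_` matches exactly one character). The helper names `like_to_regex` and `filter_by_prefix` are hypothetical and are not part of dbt-utils.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into a case-insensitive regex (assumed semantics)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def filter_by_prefix(names, prefix, exclude=""):
    """Hypothetical stand-in for the macro: keep names matching prefix%, drop names matching exclude."""
    keep = like_to_regex(prefix + "%")
    drop = like_to_regex(exclude) if exclude else None
    return [
        name
        for name in names
        if keep.fullmatch(name) and not (drop and drop.fullmatch(name))
    ]


names = ["event_clicks", "event_views", "event_old_deprecated", "users"]
print(filter_by_prefix(names, "EVENT_"))
print(filter_by_prefix(names, "event_", "%deprecated"))
print(filter_by_prefix(names, "event_", "%_%"))
```

Under these assumed semantics, the pattern `'%_%'` matches every non-empty name, because `_` is a single-character wildcard rather than a literal underscore. A pattern such as `'%deprecated'` excludes only the names that end in `deprecated`.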

#### group_by ([source](macros/sql/groupby.sql))
This macro builds a group by statement for fields 1...N

Expand All @@ -381,18 +390,33 @@ select
from {{ref('my_model')}}
```

#### union_tables ([source](macros/sql/union.sql))
This macro implements an "outer union." The list of relations provided to this macro will be unioned together, and any columns exclusive to a subset of these tables will be filled with `null` where not present. The `column_override` argument is used to explicitly assign the column type for a set of columns. The `source_column_name` argument is used to change the name of the`_dbt_source_table` field.
#### union_relations ([source](macros/sql/union.sql))
> This replaces the `union_tables` macro. Note that the `union_tables` macro will
be deprecated in a future release of this package.

Usage:
This macro unions together an array of [Relations](https://docs.getdbt.com/docs/api-variable#section-relation),
even when columns have differing orders in each Relation, and/or some columns are
missing from some relations. Any columns exclusive to a subset of these
relations will be filled with `null` where not present. A new column
(`_dbt_source_relation`) is also added to indicate the source for each record.

**Usage:**
```
{{ dbt_utils.union_tables(
tables=[ref('table_1'), ref('table_2')],
column_override={"some_field": "varchar(100)"},
exclude=["some_other_field"],
source_column_name='custom_source_column_name'
{{ dbt_utils.union_relations(
relations=[ref('my_model'), source('my_source', 'my_table')],
exclude=["_loaded_at"]
) }}
```
**Args:**
* `relations` (required): An array of [Relations](https://docs.getdbt.com/docs/api-variable#section-relation).
* `exclude` (optional): A list of column names that should be excluded from
the final query.
* `include` (optional): A list of column names that should be included in the
final query. Note the `include` and `exclude` parameters are mutually exclusive.
* `column_override` (optional): A dictionary of explicit column type overrides,
e.g. `{"some_field": "varchar(100)"}`.
* `source_column_name` (optional, `default="_dbt_source_relation"`): The name of
the column that records the source of this row.
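To make the "outer union" behavior and the `include`/`exclude` arguments concrete, here is a rough Python model that operates on in-memory rows rather than database relations. It is a sketch under simplifying assumptions: columns are discovered from the rows themselves (the macro reads them from the relation's schema), and column types and `column_override` are ignored. The function name `outer_union` is hypothetical.

```python
def outer_union(relations, include=(), exclude=(), source_column_name="_dbt_source_relation"):
    """relations: dict mapping a relation name to a list of row dicts (illustrative only)."""
    if include and exclude:
        # Mirrors the documented rule that the two arguments are mutually exclusive.
        raise ValueError("include and exclude are mutually exclusive")

    def keep(col):
        if exclude and col in exclude:
            return False
        if include and col not in include:
            return False
        return True

    # Build the column superset in first-seen order.
    superset = []
    for rows in relations.values():
        for row in rows:
            for col in row:
                if keep(col) and col not in superset:
                    superset.append(col)

    # Emit every row with the full superset, filling gaps with None (SQL null).
    unioned = []
    for name, rows in relations.items():
        for row in rows:
            record = {source_column_name: name}
            for col in superset:
                record[col] = row.get(col)
            unioned.append(record)
    return unioned


relations = {
    "my_model": [{"id": 1, "x": "p", "_loaded_at": "2019-12-11"}],
    "my_table": [{"id": 2, "y": "q"}],
}
for record in outer_union(relations, exclude=["_loaded_at"]):
    print(record)
```

In this model, the row from `my_model` gets `y` set to `None` and the row from `my_table` gets `x` set to `None`, while `_loaded_at` is dropped from both.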

#### generate_series ([source](macros/sql/generate_series.sql))
This macro implements a cross-database mechanism to generate an arbitrarily long list of numbers. Specify the maximum number you'd like in your list and it will create a 1-indexed SQL result set.
Expand Down Expand Up @@ -464,7 +488,7 @@ This macro "un-pivots" a table from wide format to long format. Functionality is
Usage:
```
{{ dbt_utils.unpivot(
table=ref('table_name'),
relation=ref('table_name'),
cast_to='datatype',
exclude=[<list of columns to exclude from unpivot>],
remove=[<list of columns to remove>],
Expand All @@ -473,7 +497,7 @@ Usage:
) }}
```

Example:
**Usage:**

Input: orders

Expand All @@ -493,14 +517,13 @@ Example:
| 2017-03-01 | processing | size | S |
| 2017-03-01 | processing | color | red |

Arguments:

- table: Table name, required
- cast_to: The data type to cast the unpivoted values to, default is varchar
- exclude: A list of columns to exclude from the unpivot operation but keep in the resulting table.
- remove: A list of columns to remove from the resulting table.
- field_name: column name in the resulting table for field
- value_name: column name in the resulting table for value
**Args:**
- `relation`: The [Relation](https://docs.getdbt.com/docs/api-variable#section-relation) to unpivot.
- `cast_to`: The data type to cast the unpivoted values to, default is varchar
- `exclude`: A list of columns to exclude from the unpivot operation but keep in the resulting table.
- `remove`: A list of columns to remove from the resulting table.
- `field_name`: The name of the column in the resulting table that holds the source column names.
- `value_name`: The name of the column in the resulting table that holds the unpivoted values.
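The roles of `exclude`, `remove`, `field_name`, and `value_name` can be illustrated with a small Python model of the wide-to-long reshaping. This is a sketch over in-memory rows, not the macro itself: the function name `unpivot_rows` and the `cast` callable (standing in for `cast_to`) are hypothetical, and columns are taken from the first row rather than from the relation's schema.

```python
def unpivot_rows(rows, exclude=(), remove=(), field_name="field_name", value_name="value", cast=str):
    """Reshape wide rows (list of dicts) to long format, one output row per unpivoted column."""
    if not rows:
        return []
    excluded = {c.lower() for c in exclude}
    removed = {c.lower() for c in remove}
    columns = list(rows[0])

    # Columns in `exclude` are kept as-is; columns in `remove` are dropped entirely.
    kept = [c for c in columns if c.lower() in excluded and c.lower() not in removed]
    # Everything else is unpivoted into (field_name, value_name) pairs.
    pivoted = [c for c in columns if c.lower() not in excluded and c.lower() not in removed]

    long_rows = []
    for col in pivoted:  # one "select ... union all" block per unpivoted column
        for row in rows:
            record = {c: row[c] for c in kept}
            record[field_name] = col
            value = row[col]
            record[value_name] = cast(value) if value is not None else None
            long_rows.append(record)
    return long_rows


orders = [{"date": "2017-03-01", "status": "processing", "size": "S", "color": "red"}]
for record in unpivot_rows(orders, exclude=["date", "status"]):
    print(record)
```

With the single input row above, the model produces one row for `size` and one for `color`, each carrying the untouched `date` and `status` values.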

---
### Web
Expand Down
@@ -0,0 +1,4 @@
{{ config(materialized = 'table') }}

{% set relations = dbt_utils.get_relations_by_prefix(target.schema, 'data_events_') %}
{{ dbt_utils.union_relations(relations) }}
@@ -1,14 +1,4 @@
{{config( materialized = 'table')}}


{% if target.type == 'snowflake' %}

{% set tables = dbt_utils.get_tables_by_prefix((target.schema | upper), 'data_events_') %}
{{ dbt_utils.union_tables(tables) }}

{% else %}

{% set tables = dbt_utils.get_tables_by_prefix(target.schema, 'data_events_') %}
{{ dbt_utils.union_tables(tables) }}

{% endif %}
{% set tables = dbt_utils.get_tables_by_prefix(target.schema, 'data_events_') %}
{{ dbt_utils.union_tables(tables) }}
5 changes: 2 additions & 3 deletions integration_tests/models/sql/test_union_base.sql
@@ -1,6 +1,5 @@

{{ dbt_utils.union_tables([
{{ dbt_utils.union_relations([
ref('data_union_table_1'),
ref('data_union_table_2')]
) }}

) }}
2 changes: 1 addition & 1 deletion integration_tests/models/sql/test_unpivot.sql
Expand Up @@ -23,7 +23,7 @@ select

from (
{{ dbt_utils.unpivot(
table=ref('data_unpivot'),
relation=ref('data_unpivot'),
cast_to=dbt_utils.type_string(),
exclude=exclude,
remove='name',
Expand Down
@@ -1,4 +1,4 @@
{% macro get_tables_by_prefix(schema, prefix, exclude='', database=target.database) %}
{% macro get_relations_by_prefix(schema, prefix, exclude='', database=target.database) %}

{%- call statement('get_tables', fetch_result=True) %}

Expand All @@ -14,11 +14,20 @@
{%- set tbl_relation = api.Relation.create(database, row.table_schema, row.table_name) -%}
{%- do tbl_relations.append(tbl_relation) -%}
{%- endfor -%}

{{ return(tbl_relations) }}
{%- else -%}
{{ return([]) }}
{%- endif -%}

{% endmacro %}

{% macro get_tables_by_prefix(schema, prefix, exclude='', database=target.database) %}
{% if execute %}
{{ log("Warning: the `get_tables_by_prefix` macro is no longer supported and will be deprecated in a future release of dbt-utils. Use the `get_relations_by_prefix` macro instead", info=True) }}
{% endif %}


{{ return(dbt_utils.get_relations_by_prefix(schema, prefix, exclude, database)) }}

{% endmacro %}
54 changes: 37 additions & 17 deletions macros/sql/union.sql
@@ -1,42 +1,52 @@
{% macro union_tables(tables, column_override=none, exclude=none, source_column_name=none) -%}
{% macro union_relations(relations, column_override=none, include=[], exclude=[], source_column_name=none) -%}

{%- if exclude and include -%}
{{ exceptions.raise_compiler_error("Both an exclude and include list were provided to the `union` macro. Only one is allowed") }}
{%- endif -%}

{#-- Prevent querying of db in parsing mode. This works because this macro does not create any new refs. #}
{%- if not execute -%}
{{ return('') }}
{% endif %}

{%- set exclude = exclude if exclude is not none else [] %}
{%- set column_override = column_override if column_override is not none else {} %}
{%- set source_column_name = source_column_name if source_column_name is not none else '_dbt_source_table' %}
{%- set source_column_name = source_column_name if source_column_name is not none else '_dbt_source_relation' %}

{%- set table_columns = {} %}
{%- set relation_columns = {} %}
{%- set column_superset = {} %}

{%- for table in tables -%}
{%- for relation in relations -%}

{%- set _ = table_columns.update({table: []}) %}
{%- do relation_columns.update({relation: []}) %}

{%- do dbt_utils._is_relation(table, 'union_tables') -%}
{%- set cols = adapter.get_columns_in_relation(table) %}
{%- do dbt_utils._is_relation(relation, 'union_relations') -%}
{%- set cols = adapter.get_columns_in_relation(relation) %}
{%- for col in cols -%}

{%- if col.column not in exclude %}
{#- If an exclude list was provided and the column is in the list, do nothing #}
{%- if exclude and col.column in exclude %}

{#- If an include list was provided and the column is not in the list, do nothing -#}
{%- elif include and col.column not in include %}

{# update the list of columns in this table #}
{%- set _ = table_columns[table].append(col.column) %}
{#- Otherwise add the column to the column superset #}
{% else %}

{# update the list of columns in this relation #}
{%- do relation_columns[relation].append(col.column) %}

{%- if col.column in column_superset -%}

{%- set stored = column_superset[col.column] %}
{%- if col.is_string() and stored.is_string() and col.string_size() > stored.string_size() -%}

{%- set _ = column_superset.update({col.column: col}) %}
{%- do column_superset.update({col.column: col}) %}

{%- endif %}

{%- else -%}

{%- set _ = column_superset.update({col.column: col}) %}
{%- do column_superset.update({col.column: col}) %}

{%- endif -%}

Expand All @@ -47,26 +57,36 @@

{%- set ordered_column_names = column_superset.keys() %}

{%- for table in tables -%}
{%- for relation in relations -%}

(
select

cast({{ dbt_utils.string_literal(table) }} as {{ dbt_utils.type_string() }}) as {{ source_column_name }},
cast({{ dbt_utils.string_literal(relation) }} as {{ dbt_utils.type_string() }}) as {{ source_column_name }},

{% for col_name in ordered_column_names -%}

{%- set col = column_superset[col_name] %}
{%- set col_type = column_override.get(col.column, col.data_type) %}
{%- set col_name = adapter.quote(col_name) if col_name in table_columns[table] else 'null' %}
{%- set col_name = adapter.quote(col_name) if col_name in relation_columns[relation] else 'null' %}
cast({{ col_name }} as {{ col_type }}) as {{ col.quoted }} {% if not loop.last %},{% endif %}
{%- endfor %}

from {{ table }}
from {{ relation }}
)

{% if not loop.last %} union all {% endif %}

{%- endfor %}

{%- endmacro %}

{% macro union_tables(tables, column_override=none, include=[], exclude=[], source_column_name='_dbt_source_table') -%}

{% if execute %}
{{ log("Warning: the `union_tables` macro is no longer supported and will be deprecated in a future release of dbt-utils. Use the `union_relations` macro instead", info=True) }}
{% endif %}

{{ return(dbt_utils.union_relations(tables, column_override, include, exclude, source_column_name)) }}

{% endmacro %}
28 changes: 20 additions & 8 deletions macros/sql/unpivot.sql
@@ -1,18 +1,30 @@
{#
Pivot values from columns to rows. Similar to pandas DataFrame melt() function.

Example Usage: {{ unpivot(table=ref('users'), cast_to='integer', exclude=['id','created_at']) }}
Example Usage: {{ unpivot(relation=ref('users'), cast_to='integer', exclude=['id','created_at']) }}

Arguments:
table: Relation object, required.
relation: Relation object, required.
cast_to: The datatype to cast all unpivoted columns to. Default is varchar.
exclude: A list of columns to keep but exclude from the unpivot operation. Default is none.
remove: A list of columns to remove from the resulting table. Default is none.
field_name: Destination table column name for the source table column names.
value_name: Destination table column name for the pivoted values
#}

{% macro unpivot(table, cast_to='varchar', exclude=none, remove=none, field_name='field_name', value_name='value') -%}
{% macro unpivot(relation=none, cast_to='varchar', exclude=none, remove=none, field_name='field_name', value_name='value', table=none) -%}

{% if execute and table %}
{{ log("Warning: the `unpivot` macro no longer accepts a `table` parameter. This parameter will be deprecated in a future release of dbt-utils. Use the `relation` parameter instead", info=True) }}
{% endif %}

{% if relation and table %}
{{ exceptions.raise_compiler_error("Error: both the `relation` and `table` parameters were provided to `unpivot` macro. Choose one only (we recommend `relation`).") }}
{% elif not relation and table %}
{% set relation=table %}
{% elif not relation and not table %}
{{ exceptions.raise_compiler_error("Error: argument `relation` is required for `unpivot` macro.") }}
{% endif %}

{%- set exclude = exclude if exclude is not none else [] %}
{%- set remove = remove if remove is not none else [] %}
Expand All @@ -21,14 +33,14 @@ Arguments:

{%- set table_columns = {} %}

{%- set _ = table_columns.update({table: []}) %}
{%- do table_columns.update({relation: []}) %}

{%- do dbt_utils._is_relation(table, 'unpivot') -%}
{%- set cols = adapter.get_columns_in_relation(table) %}
{%- do dbt_utils._is_relation(relation, 'unpivot') -%}
{%- set cols = adapter.get_columns_in_relation(relation) %}

{%- for col in cols -%}
{%- if col.column.lower() not in remove|map('lower') and col.column.lower() not in exclude|map('lower') -%}
{% set _ = include_cols.append(col) %}
{% do include_cols.append(col) %}
{%- endif %}
{%- endfor %}

Expand All @@ -42,7 +54,7 @@ Arguments:
cast('{{ col.column }}' as {{ dbt_utils.type_string() }}) as {{ field_name }},
cast({{ col.column }} as {{ cast_to }}) as {{ value_name }}

from {{ table }}
from {{ relation }}

{% if not loop.last -%}
union all
Expand Down