Use built-in adapter functionality for datatypes #586

jtcohen6 · 2022-05-12T10:53:46Z

resolves #598

This is a:

documentation update
bug fix with no breaking changes
new functionality
a breaking change ??? (maybe???)

All pull requests from community contributors should target the main branch (default).

Description & motivation

Follow-up to TODOs left from #577. Experiment with using api.Column.translate_type for our existing type_* macros.

Feels good:

This is code defined within the adapter plugin, and already used for internal purposes.
For the most part, it "just works" as a drop-in replacement. There are a few gaps in the core / adapter methods where we could encode slightly more of this logic.
We can preserve dispatch on all legacy dbt_utils macros, so that a project / package maintainer can still intervene / override if needed.

What feels less good:

We had some explicit behaviors here previously, that are now implicit. Do they still work as before?
What's the right interface for package authors to access this functionality? Do they want to call {{ api.Column.translate_type('string') }}, {{ get_data_type('string') }}, or {{ type_string() }}?
Within dbt-core: We should aim to reconcile / consolidate the agate type conversion methods with Column class type translation.

Checklist

dbeatty10 · 2022-05-12T22:31:06Z

This turned out awesome! 💪

We had some explicit behaviors here previously, that are now implicit. Do they still work as before?

Is there anything we can do to make this feel more good? e.g., like adding in some kind of unit testing?

What's the right interface for package authors to access this functionality? Do they want to call {{ api.Column.translate_type('string') }}, {{ get_data_type('string') }}, or {{ type_string() }}?

It seems like package authors will be able to use any of those 3 no matter what, right? But we are just try to choose which of those we want to encourage?

Will the package authors have a variable in hand (like my_value = 'string')? If so, then either of these options make sense:

{{ get_data_type('string') }}
{{ api.Column.translate_type('string') }}

Otherwise, this is nice and pithy:

{{ type_string() }}

jtcohen6 · 2022-05-17T10:00:43Z

Is there anything we can do to make this feel more good? e.g., like adding in some kind of unit testing?

Definitely! I'll look into how we might do this

But we are just try to choose which of those we want to encourage?

Correct! Which of these do we want to encourage, in a world where they actually all do the same thing? Just a question of the right syntactic sugar to sprinkle in.

jtcohen6 · 2022-06-02T09:45:27Z

I've added tests for these data types (and removed the previous placeholders). These tests check both:

new (redefined) type_* macros
previous type_* macro definitions, manually readded in each Legacy test case

Two hesitations with the current implementation:

I'm doing a "more lenient" comparison for the legacy version of Postgres type__string(). Previously this returned varchar, now it returns text. Those are true aliases within Postgres, but (for some complex reasons) dbt doesn't perfectly map between them today. (Related: [CT-636] [Bug] Postgres unlimited varchar default to varchar(256) dbt-core#5238) I opted to make the test xfail, rather than truly pass or truly fail.
What should type_timestamp do on BigQuery? The most straightforward answer is, return a timestamp, and that's what it does today. But on other databases, timestamp means (by default) timestamp without time zone. On BigQuery that's datetime, whereas timestamp always means timestamp with time zone. Other macros (e.g. dateadd) return datetime values on BigQuery. I was able to get this working and these tests passing as is, and I think it's simplest for everyone if timestamp just means timestamp. But I also think we might do well to encourage folks to use more-precise types (datetime, timestampntz, timestamptz) that have clear and consistent analogues across databases.

All of the actual changes for this PR are happening in this repo only, for now. Things we could do in core/plugins to make the code here slightly cleaner:

Move the "syntactic sugar" of get_data_type(x) into dbt-core — it's just a wrapper for api.Column.translate_type(x), which is already there
Update the BigQueryColumn.numeric_type function to never return parametrized numeric types. They're technically supported on BQ, but a real nightmare to use.

Not sure if ready for final review, but possibly another look!

jtcohen6 · 2022-06-16T10:44:14Z

macros/cross_db_utils/datatypes.sql

 {% macro default__type_numeric() %}
-    numeric(28, 6)
+    {{ return(api.Column.numeric_type("numeric", 28, 6)) }}


TODO: SparkSQL wants this to be called decimal instead of numeric. Investigate whether that works on other standard DBs, or if we should use translate_type for it

dbeatty10

I feel good about this when you do.

run_functional_test.sh

run_test.sh

dbeatty10 · 2022-06-17T17:43:28Z

macros/cross_db_utils/datatypes.sql

+{%- macro get_data_type(dtype) -%}
+  {# if there is no translation for 'dtype', it just returns 'dtype' #}
+  {{ return(api.Column.translate_type(dtype)) }}
+{%- endmacro -%}
+
 {# string  -------------------------------------------------     #}

 {%- macro type_string() -%}
  {{ return(adapter.dispatch('type_string', 'dbt_utils')()) }}
 {%- endmacro -%}

 {% macro default__type_string() %}
-    string
-{% endmacro %}
-
-{%- macro redshift__type_string() -%}
-    varchar
-{%- endmacro -%}
-
-{% macro postgres__type_string() %}
-    varchar
-{% endmacro %}
-
-{% macro snowflake__type_string() %}
-    varchar
+    {{ return(dbt_utils.get_data_type("string")) }}


I would feel good about pushing both get_data_type(dtype) and type_{X} into dbt-core.

But I'm not sure which risks or maintenance burden it would impose to have both.

They each seem to have their use-cases:

type_{X} is both pithy and clear but doesn't accept a variable dtype

get_data_type(dtype) is a little more verbose but does accept a variable dtype

If we could only choose one option to push into dbt-core, I would choose type_{X} because:

it is compact and clear

we can always utilize the api.Column.translate_type(dtype) syntax for cases when we need a variable dtype

Okay, I think you make a compelling case in favor of type_{X} macros!

api.Column.translate_type(dtype) will accept ANY input string, even api.Column.translate_type('fake_type_xyz'). It will translate that input string if it's recognized, or just return the input string if it isn't.

So, there is benefit to the "stronger typing" achieved by macros that have the standard type right in the name, even if (behind-the-scenes) it still just shells out to api.Column.translate_type.

tests/functional/data_type/base_data_type_macro.py

jtcohen6 · 2022-06-30T12:20:30Z

I removed cross_db_utils/base_cross_db_macro.py and cross_db_utils/fixture_cross_db_macro.py because I realized they were unused. Not exactly related to this PR, though, so I can kick out that change if preferred.

dbeatty10 · 2022-06-30T13:42:53Z

Looks great to me!

@jtcohen6 Do these sound like the right next steps?

Merge Move data type macros into dbt-core dbt-core#5428 (you)
Update requirements.txt in this PR (you or me)
Merge this PR (me)
~~Open PR to resolve docs.getdbt.com #1644 (me), review (you), merge (me)~~
Review Cross database type_{X} macros docs.getdbt.com#1648 (you) and merge (me)

jtcohen6 · 2022-06-30T17:42:39Z

@dbeatty10 That sounds right to me! After merging the dbt-core PR, there are also the companion PRs in adapter repos. All of these run the tests added in this PR. Some have small substantive changes.

jtcohen6 · 2022-06-30T18:05:48Z

dev-requirements.txt

-git+https://github.com/dbt-labs/dbt-bigquery.git
+git+https://github.com/dbt-labs/dbt-core.git@jerco/data-type-macros#egg=dbt-core&subdirectory=core
+git+https://github.com/dbt-labs/dbt-core.git@jerco/data-type-macros#egg=dbt-tests-adapter&subdirectory=tests/adapter
+git+https://github.com/dbt-labs/dbt-bigquery.git@jerco/data-type-macros


Now that the dbt-core PR is merged, only the dbt-bigquery PR (dbt-labs/dbt-bigquery#214) is actually blocking this one, since the Redshift + Snowflake PRs don't have substantive changes (just inheriting tests).

TODO: Add back imports from main. There was no reason to remove these.

git+https://github.com/dbt-labs/dbt-redshift.git git+https://github.com/dbt-labs/dbt-snowflake.git

@jtcohen6 Just pushed these in a commit and CI is running now.

Also added back this one along with dbt-redshift and dbt-snowflake:

git+https://github.com/dbt-labs/dbt-core.git#egg=dbt-postgres&subdirectory=plugins/postgres

Please let me know if I shouldn't have added that one in -- obviously easy to pull it back out.

This was referenced May 12, 2022

Experiment with functional testing #577

Merged

[CT-636] [Bug] Postgres unlimited varchar default to varchar(256) dbt-labs/dbt-core#5238

Closed

jtcohen6 force-pushed the experiment/datatypes branch from 32a2409 to 9a245dc Compare May 31, 2022 11:35

This was referenced Jul 25, 2022

[CT-895] [Spike] Consolidate current_timestamp & associates dbt-labs/dbt-core#5521

Closed

[CT-706] Consolidate data type methods in adapter interface dbt-labs/dbt-core#5317

Open

jtcohen6 marked this pull request as ready for review June 2, 2022 09:46

jtcohen6 requested a review from dbeatty10 June 3, 2022 12:14

jtcohen6 mentioned this pull request Jun 3, 2022

Lift + shift for cross-db macros dbt-labs/dbt-core#5298

Merged

3 tasks

jtcohen6 commented Jun 16, 2022

View reviewed changes

dbeatty10 approved these changes Jun 17, 2022

View reviewed changes

jtcohen6 added 8 commits June 30, 2022 12:46

Use built-in adapter functionality for datatypes

fb466fc

Experiment with adding functional tests

33b9d1b

Try to fix pip install

3bea8fa

Missing pytest plugin

b7123b4

Refactor tests, passing on BQ

d9ad9e8

Refactor again

b092c6b

Revert bq target changes

ab54d5d

Migrate pieces into core + plugins

6f19550

jtcohen6 force-pushed the experiment/datatypes branch from 0c97f43 to 6f19550 Compare June 30, 2022 11:49

jtcohen6 mentioned this pull request Jun 30, 2022

Move data type macros into dbt-core dbt-labs/dbt-core#5428

Merged

6 tasks

jtcohen6 added 3 commits June 30, 2022 13:59

Revert accidental changes to bash files

7e8a682

Some code cleanup

cf69b40

Restore needed import. Add speedup

3068f35

dbeatty10 mentioned this pull request Jun 30, 2022

Add type_{X} macros to listing of cross-database macros dbt-labs/docs.getdbt.com#1644

Closed

1 task

dbeatty10 mentioned this pull request Jun 30, 2022

Cross database type_{X} macros dbt-labs/docs.getdbt.com#1648

Merged

4 tasks

jtcohen6 commented Jun 30, 2022

View reviewed changes

Add back imports from main branches

122085f

dbeatty10 merged commit dcd85fb into main Jul 5, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use built-in adapter functionality for datatypes #586

Use built-in adapter functionality for datatypes #586

jtcohen6 commented May 12, 2022 •

edited

Loading

dbeatty10 commented May 12, 2022

jtcohen6 commented May 17, 2022

jtcohen6 commented Jun 2, 2022 •

edited

Loading

jtcohen6 Jun 16, 2022

dbeatty10 left a comment

dbeatty10 Jun 17, 2022

jtcohen6 Jun 30, 2022

jtcohen6 commented Jun 30, 2022

dbeatty10 commented Jun 30, 2022 •

edited

Loading

jtcohen6 commented Jun 30, 2022

jtcohen6 Jun 30, 2022 •

edited

Loading

dbeatty10 Jul 5, 2022

Use built-in adapter functionality for datatypes #586

Use built-in adapter functionality for datatypes #586

Conversation

jtcohen6 commented May 12, 2022 • edited Loading

Description & motivation

Checklist

dbeatty10 commented May 12, 2022

jtcohen6 commented May 17, 2022

jtcohen6 commented Jun 2, 2022 • edited Loading

jtcohen6 Jun 16, 2022

Choose a reason for hiding this comment

dbeatty10 left a comment

Choose a reason for hiding this comment

dbeatty10 Jun 17, 2022

Choose a reason for hiding this comment

jtcohen6 Jun 30, 2022

Choose a reason for hiding this comment

jtcohen6 commented Jun 30, 2022

dbeatty10 commented Jun 30, 2022 • edited Loading

jtcohen6 commented Jun 30, 2022

jtcohen6 Jun 30, 2022 • edited Loading

Choose a reason for hiding this comment

dbeatty10 Jul 5, 2022

Choose a reason for hiding this comment

jtcohen6 commented May 12, 2022 •

edited

Loading

jtcohen6 commented Jun 2, 2022 •

edited

Loading

dbeatty10 commented Jun 30, 2022 •

edited

Loading

jtcohen6 Jun 30, 2022 •

edited

Loading