Applying BigQuery labels through DBT #1942

Closed
talcherrehub opened this issue Nov 20, 2019 · 7 comments
Labels
bigquery enhancement New feature or request good_first_issue Straightforward + self-contained changes, good for new contributors!
Milestone

Comments

@talcherrehub

Describe the feature

I would like to be able to apply labels to BigQuery tables that are created during the DBT process.
BQ tables have the labels feature (see attached image):
image
The Label feature is very useful for version control and other desired attributes.
It would be extremely valuable to be able to control this feature by using Config in my DBT SQL files.

Who will this benefit?

Anyone using the BigQuery Label feature will benefit from the ability to control it using DBT.

@talcherrehub talcherrehub added enhancement New feature or request triage labels Nov 20, 2019
@ciscodebs

ciscodebs commented Nov 21, 2019

I want this too!

I'm also interested in having config tags be the labels.

Example 1
dbt tag of 'nightly'
BigQuery label = 'nightly'

Example 2
dbt tag of 'schedule=nightly'
BigQuery label = 'schedule:nightly'
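
The two examples above could be sketched roughly as follows. This is only an illustration of the proposed mapping, not anything dbt does: the function name `tags_to_labels` is hypothetical, and representing a bare tag as a key with an empty value is an assumption.

```python
def tags_to_labels(tags):
    """Map dbt-style tag strings to a dict of label key/value pairs.

    Hypothetical helper: 'schedule=nightly' becomes {'schedule': 'nightly'},
    and a bare tag such as 'nightly' becomes a key with an empty value
    (an assumption about how value-less labels would be represented).
    """
    labels = {}
    for tag in tags:
        # partition() splits on the first '=' only; value is '' if absent
        key, _, value = tag.partition("=")
        labels[key] = value
    return labels
```

With this sketch, `tags_to_labels(["nightly", "schedule=nightly"])` gives `{"nightly": "", "schedule": "nightly"}`.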

Additional notes and thoughts:

  • Duplicate label keys are not allowed on either Redshift or BigQuery. What happens if a table or view config has duplicate keys with conflicting values?
  • When updating existing tables and views, would we want dbt to clear out existing labels first? What if there are labels not applied by dbt? Something to think about.
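
To make the two questions above concrete, here is a minimal sketch of one possible policy. Both function names and the `clear_existing` flag are hypothetical; the thread does not settle which behavior dbt should adopt.

```python
def find_conflicts(*sources):
    """Return the keys that appear with different values across the dicts."""
    seen = {}
    conflicts = set()
    for source in sources:
        for key, value in source.items():
            if key in seen and seen[key] != value:
                conflicts.add(key)
            seen.setdefault(key, value)
    return sorted(conflicts)


def merge_labels(existing, configured, clear_existing=False):
    """Combine labels already on a table with labels from the model config.

    clear_existing=True drops every label that is not in the config;
    otherwise labels applied outside dbt are kept, and configured values
    win when a key is present in both.
    """
    if clear_existing:
        return dict(configured)
    merged = dict(existing)
    merged.update(configured)
    return merged
```

Under this sketch, a conflicting key could be reported as an error before anything is written, and the choice between clearing and preserving existing labels becomes a single explicit switch.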

Here are the BigQuery label requirements: https://cloud.google.com/bigquery/docs/labels-intro#requirements
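
For illustration, a rough validator along these lines might look like the sketch below. The limits in it (at most 64 labels, keys of 1 to 63 characters starting with a lowercase letter, values of 0 to 63 characters, only lowercase letters, digits, underscores and dashes) are my reading of the linked page and should be checked against it. The sketch is also ASCII-only, which is a simplification, since the docs allow international characters. The name `validate_labels` is hypothetical.

```python
import re

# Assumed limits; verify against the BigQuery label requirements page.
MAX_LABELS = 64
KEY_RE = re.compile(r"[a-z][a-z0-9_-]{0,62}")   # 1-63 chars, starts with a letter
VALUE_RE = re.compile(r"[a-z0-9_-]{0,63}")      # 0-63 chars, may be empty


def validate_labels(labels):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    if len(labels) > MAX_LABELS:
        errors.append(f"too many labels: {len(labels)} > {MAX_LABELS}")
    for key, value in labels.items():
        if not KEY_RE.fullmatch(key):
            errors.append(f"invalid key: {key!r}")
        if not VALUE_RE.fullmatch(value):
            errors.append(f"invalid value for {key!r}: {value!r}")
    return errors
```

Because a Python dict cannot hold the same key twice, the uniqueness requirement is satisfied automatically once the labels are in a dict.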

I also included the Redshift label requirements to preserve future compatibility: https://docs.aws.amazon.com/redshift/latest/mgmt/amazon-redshift-tagging.html

@drewbanin
Contributor

See also #1947

cool idea @talcherrehub! Let's do it :)

@ciscodebs I think the Redshift tags are different -- those tags apply to Redshift clusters, i.e. they're something you'd set on the cluster itself, not on tables in the cluster. So, this is BQ-only functionality which actually makes this easier to implement :)

I think the idea of persisting dbt's tags as BQ tags is pretty elegant. You buy that @talcherrehub and @kconvey?

@kconvey
Contributor

kconvey commented Nov 26, 2019

I tend to think using dbt tags for BigQuery labels is overly restrictive, and overloads the tag/label config option. BigQuery labels seem like they'd naturally support further processing post-dbt, which may not have anything to do with dbt labels. Putting BQ labels into dbt tags probably requires that all models wind up with their dbt tag as a label, which may have only existed as a tag for model selection in dbt, adding noise to BQ labels simply to avoid another config option in your dbt_project.yml. It makes more sense to me to be explicit in dbt_project.yml for what are two related, but ultimately different concepts. Where there is strong overlap, I would hope that yaml anchors and the like could potentially save you from repeating yourself.

There is also the issue that BQ labels are ultimately key-value pairs, where dbt tags currently are not. It would be a bit odd to add empty values for dbt tags just to meet a BQ syntax requirement, especially when dbt tags are used across adapters.

I am all in favor of a more elegant solution than this, but it still seems like a useful feature worth supporting even if there is some overlap between tag/label. Eager to hear some other thoughts!

@drewbanin
Contributor

drewbanin commented Nov 27, 2019

Yeah, I buy that, thanks for the cogent argument @kconvey :)

Let's go ahead and support a BQ-specific config, labels, which should accept a dictionary.
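
As a sketch of what accepting a dictionary could translate to, the snippet below renders a dict as the `OPTIONS(labels=[...])` clause that BigQuery DDL accepts on `CREATE TABLE`. The function name `labels_option` is hypothetical and this is not how dbt itself is implemented. It also assumes keys and values are already valid label strings, so no quoting or escaping is attempted.

```python
def labels_option(labels):
    """Render a dict of labels as a BigQuery DDL OPTIONS clause.

    Assumes keys and values contain only characters that are legal in
    BigQuery labels, so they can be placed inside double quotes as-is.
    Items are sorted so the output is deterministic.
    """
    pairs = ", ".join(
        f'("{key}", "{value}")' for key, value in sorted(labels.items())
    )
    return f"OPTIONS(labels=[{pairs}])"
```

For example, `labels_option({"schedule": "nightly", "contains_pii": "yes"})` returns `OPTIONS(labels=[("contains_pii", "yes"), ("schedule", "nightly")])`.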

@drewbanin drewbanin added the good_first_issue Straightforward + self-contained changes, good for new contributors! label Nov 27, 2019
@ciscodebs

Sounds good to me. I also wanted to point out that BigQuery accepts label keys without values. These are referred to as tags in their docs, which isn't confusing at all. I thought this might be worth bringing up just in case...

https://cloud.google.com/bigquery/docs/adding-labels#adding_a_tag

@drewbanin
Contributor

good to know @ciscodebs - thanks for the additional info!

@drewbanin drewbanin added this to the 0.15.1 milestone Dec 18, 2019
@drewbanin
Contributor

closed by #1964


4 participants