Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model aliasing #651

Closed
wants to merge 13 commits into from
Closed

Model aliasing #651

wants to merge 13 commits into from

Conversation

abelsonlive
Copy link
Contributor

@abelsonlive abelsonlive commented Feb 9, 2018

WHAT

This P.R. implements model aliasing RE: #637.

WHY

To enable more control over how dbt materializes tables.

HOW

This P.R. adds an alias parameter to the model config. If not present, this parameter will default to the name of the model (AKA the parsed file path of the model). This implementation is preferable because it maintains the use of the filename in ref calls, thereby enabling dbt users to have distinct model names, but non-distinct table names.

The one nasty thing here is that now all materializations reference model['alias'] when constructing table names (I suspect this might also break some stuff in dbt-utils). It may be preferable to add a table key to a node so that these references are easier to read.

@drewbanin drewbanin self-requested a review February 9, 2018 02:03
@drewbanin
Copy link
Contributor

Couple of things from my first pass:

  1. You can make this work for BigQuery by swapping name with alias here
  2. The output in the CLI does not reflect the alias for the model. You can find that code here. There are a couple of other instances where we use name that we can replace with alias.

This is so good! It functionally works flawlessly as far as I can tell.

Going to stress test this some more over the weekend, but so far, so good :)

@@ -45,7 +45,7 @@ def __init__(self, model, adapter, profile):
def wrap_with_profile_and_model_name(self, fn):
def wrapped(*args, **kwargs):
args = (self.profile,) + args
kwargs['model_name'] = self.model.get('name')
kwargs['model_name'] = self.model.get('alias')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we want to do this. The adapters use this model_name arg to name their database connection in the pool. I think this line might make it possible for two different models to grab the same connection, which would probably be undesirable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i think this might be causing bugs on my branch...

@@ -55,7 +55,7 @@ def type(self):

def commit(self):
return self.adapter.commit_if_has_connection(
self.profile, self.model.get('name'))
self.profile, self.model.get('alias'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here. This arg is used to find the connection in the default adapter.

@@ -318,6 +322,8 @@ def is_blocking_dependency(node):
def get_materialization(node):
return node.get('config', {}).get('materialized')

def get_alias(node):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@abelsonlive
Copy link
Contributor Author

RE: logging. I was able to fix the lines you mentioned, but i'm still seeing this output:

12:44:20 | Concurrency: 1 threads (target='prod')
12:44:20 |
12:44:20 | 1 of 1 START table model foo.model_name [RUN]
12:45:35 | 1 of 1 OK created table model foo.alias [SELECT in 75.23s]
12:45:35 |
12:45:35 | Finished running 1 table models in 75.57s.

Where the START line prints the model name. Any idea of where that is?

@drewbanin
Copy link
Contributor

drewbanin commented Feb 13, 2018

@abelsonlive ah, yeah, good catch.

The model name is injected in the describe_node function of these NodeRunner classes. There are a few describe_node functions in this file, but I think only the one inModelRunner needs to be updated!

@drewbanin
Copy link
Contributor

drewbanin commented Feb 19, 2018

@abelsonlive are you blocked on anything here? I can help resolve the merge conflicts if you'd like

@abelsonlive
Copy link
Contributor Author

abelsonlive commented Feb 22, 2018

@drewbanin yeah, it seems like even when I change name -> alias in the suggested location in node_runner.py, that I still get the same logger output.

@drewbanin
Copy link
Contributor

@abelsonlive interesting...

In that same describe_node function you can do self.node['config']['alias'], which correctly returns the alias. Let me try to figure out why self.node['alias'] isn't updated accordingly. Might be indicative of some bigger oversight

dbt/utils.py Outdated
@@ -39,11 +40,14 @@ def __init__(self, profile, adapter, node, use_temp=False):
self.node = node
self.schema = node.get('schema')
self.name = node.get('name')
# set alias, defaults to name
self.alias = get_alias(node)
self.node['alias'] = get_alias(node)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh! Let's not mutate the node here!

I think node['alias'] should be set over here instead. The schema attribute is set on line 222 and 240 in that file. I think we'll want to do something similar here for alias.

This should also fix the issue with the START line log output I believe

Copy link
Contributor Author

@abelsonlive abelsonlive Feb 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing that line of code caused my initial tests to fail

@abelsonlive
Copy link
Contributor Author

abelsonlive commented Feb 28, 2018

Alright this should be good to go now! Finally got the proper log output. Good look on using self.node['config']['alias']

@drewbanin
Copy link
Contributor

@abelsonlive amazing! I think we should probably figure out how to get self.node['alias'] working before merging this. That's something I can definitely do, as this code looks really good and is pretty much ready to roll notwithstanding the config thing.

Let's try to get this PR in for the 0.9.3 release around a month or so from now. I can spend some time figuring out what's going wrong here right after we drop the 0.9.2 release :)

Thanks for contributing!!

@cmcarthur cmcarthur added this to the 0.10.2 milestone Apr 26, 2018
@jon-rtr jon-rtr mentioned this pull request May 12, 2018
@drewbanin drewbanin mentioned this pull request Jun 18, 2018
@drewbanin drewbanin closed this Jun 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants