dbt docs has extra, incorrect column names when model name and schema match a source table #1708

Closed
1 of 5 tasks
tjenkins opened this issue Aug 28, 2019 · 1 comment · Fixed by #1774
Labels
bug (Something isn't working), dbt-docs ([dbt feature] documentation site, powered by metadata artifacts)

Comments

@tjenkins

Describe the bug

The model documentation generated by dbt is sometimes inaccurate and includes additional columns that aren't actually present in the table/view.

Steps To Reproduce

1. Define a dbt source.
2. Create a model and configure it with a target schema and alias that match one of the tables in your source (alternatively, give the model the same name as the source table instead of using an alias).
3. Build the model, then run dbt compile, dbt docs generate, and dbt docs serve.

The model's documentation will include the model's own columns as well as the columns from the source. The source documentation will likewise include columns from the model.

Expected behavior

The model documentation should only include the columns from that model itself.
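
To make the expectation concrete, here is a minimal Python sketch of the symptom. It is not dbt's actual catalog code: the row layout, the `group_columns` helper, the database names, and every column name other than LONELY_COL are assumptions made for illustration. Grouping catalog rows by schema and table alone merges the two relations, while grouping by database, schema, and table keeps them apart.

```python
from collections import defaultdict

# Hypothetical catalog rows: (database, schema, table, column).
# LONELY_COL is the model's only column; the source columns are made up.
CATALOG_ROWS = [
    ("raw_db", "prod", "users", "ID"),
    ("raw_db", "prod", "users", "EMAIL"),
    ("analytics_db", "prod", "users", "LONELY_COL"),
]


def group_columns(rows, include_database):
    """Group column names by relation key (hypothetical helper, not dbt's API)."""
    grouped = defaultdict(list)
    for database, schema, table, column in rows:
        if include_database:
            key = (database, schema, table)
        else:
            key = (schema, table)
        grouped[key].append(column)
    return dict(grouped)


# Keyed on schema and table only: both relations collapse into one entry,
# so the model appears to have the source's columns and vice versa.
merged = group_columns(CATALOG_ROWS, include_database=False)

# Keyed on database, schema, and table: each relation keeps its own columns.
separate = group_columns(CATALOG_ROWS, include_database=True)
```

With the database in the key, the model's entry contains only LONELY_COL, which is the behavior I would expect from the generated docs.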

dbt seems to expect that every "schema"."table" relation is unique, regardless of database. For example, you will get a compilation error if you have two models configured as follows, even though they target different databases:

model 1:

{{config(
        alias='users',
        schema='prod',
        database='db_1'
)}}

model 2:

{{config(
        alias='users',
        schema='prod',
        database='db_2'
)}}

dbt will not throw a compilation error, however, if a model is configured with the same "schema"."table" relation as a dbt source (but in a different database, of course).

dbt should either not allow configuring a model to have the same "schema"."table" as a source, or it should include database name in its expectation of relation uniqueness. The latter seems preferable if the goal is for dbt projects to be usable across multiple logical databases (e.g., with Snowflake).
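
A minimal sketch of the second option, written in Python under stated assumptions: the `Relation` tuple and the `find_duplicates` helper are hypothetical and are not part of dbt's API. The point is only that the uniqueness key changes from (schema, table) to (database, schema, table).

```python
from collections import Counter
from typing import NamedTuple


class Relation(NamedTuple):
    """Hypothetical stand-in for a model or source relation."""
    database: str
    schema: str
    identifier: str


def find_duplicates(relations, include_database=True):
    """Return the relation keys that occur more than once (hypothetical helper)."""
    def key(rel):
        parts = (rel.schema, rel.identifier)
        if include_database:
            parts = (rel.database,) + parts
        return parts

    counts = Counter(key(rel) for rel in relations)
    return sorted(k for k, n in counts.items() if n > 1)


# The two example models from above: same schema and alias, different databases.
relations = [
    Relation("db_1", "prod", "users"),
    Relation("db_2", "prod", "users"),
]

# Schema and table only: the two relations are reported as a clash.
clash_without_db = find_duplicates(relations, include_database=False)

# With the database included, they are treated as distinct relations.
clash_with_db = find_duplicates(relations, include_database=True)
```

The same key would apply to sources, so a model and a source that share "schema"."table" in different databases would no longer be confused with each other.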

Screenshots and log output

Source schema.yml and model .sql
Screen Shot 2019-08-27 at 10 59 34 PM

Generated documentation for model (should only have LONELY_COL)
Screen Shot 2019-08-27 at 11 02 06 PM

Generated documentation for source (should not have LONELY_COL)
Screen Shot 2019-08-27 at 11 02 18 PM

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

0.14.0

The operating system you're using:
macOS Mojave

The output of python --version:
2.7.10

tjenkins added the bug and triage labels Aug 28, 2019
drewbanin removed the triage label Aug 28, 2019
@drewbanin
Contributor

drewbanin commented Aug 28, 2019

Thanks for this really thorough writeup @tjenkins! I totally agree with you: dbt should include the database name in its check for duplicate relation names. The fix here might also require code changes in the https://github.com/fishtown-analytics/dbt-docs repo. I'm going to queue this up for our 0.15.0 release.

drewbanin added the dbt-docs label Aug 28, 2019
drewbanin added this to the Louisa May Alcott milestone Aug 28, 2019
beckjake added a commit that referenced this issue Sep 20, 2019
…-comparing-catalog

Include the database when deciding if two tables are the same (#1708)