Include the database when deciding if two tables are the same (#1708) #1774

beckjake · 2019-09-20T16:54:24Z

The problem here was that dbt didn't use the table's database from the catalog when deciding whether a table was "the same" as one in the manifest. As a result, when two tables shared a schema and name but lived in different databases, we combined both tables' column information into each entry.
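To illustrate the failure mode, here is a minimal sketch that groups catalog rows by a table key with and without the database. The row field names (`table_database`, `table_schema`, `table_name`, `column_name`) and the sample data are assumptions for illustration, and `group_columns` is a hypothetical helper, not dbt's actual code.

```python
from collections import defaultdict

# Hypothetical catalog rows: one row per column. A source in the "raw" database
# and a model in the "analytics" database share the same schema and table name.
ROWS = [
    {"table_database": "raw", "table_schema": "jaffle", "table_name": "orders", "column_name": "id"},
    {"table_database": "raw", "table_schema": "jaffle", "table_name": "orders", "column_name": "amount_cents"},
    {"table_database": "analytics", "table_schema": "jaffle", "table_name": "orders", "column_name": "id"},
    {"table_database": "analytics", "table_schema": "jaffle", "table_name": "orders", "column_name": "amount"},
]


def key_without_database(row):
    # Old behavior: a table is identified by schema and name only.
    return (row["table_schema"], row["table_name"])


def key_with_database(row):
    # Fixed behavior: the database is part of the table's identity.
    return (row["table_database"], row["table_schema"], row["table_name"])


def group_columns(rows, key_fn):
    """Collect column names per table, using key_fn to decide which rows belong together."""
    tables = defaultdict(list)
    for row in rows:
        tables[key_fn(row)].append(row["column_name"])
    return dict(tables)


print(group_columns(ROWS, key_without_database))
# {('jaffle', 'orders'): ['id', 'amount_cents', 'id', 'amount']}
print(group_columns(ROWS, key_with_database))
# {('raw', 'jaffle', 'orders'): ['id', 'amount_cents'], ('analytics', 'jaffle', 'orders'): ['id', 'amount']}
```

Without the database in the key, both tables collapse into one entry whose column list mixes source and model columns.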

While I was in here, I converted this code to use type hints, dataclasses, hologram, and all that good stuff. We can now generate an actual JSON schema for our output, and when reading the code you can actually tell what types things are.
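As a rough idea of why typed dataclasses make schema generation possible, the sketch below derives a small JSON Schema from a flat dataclass using only the standard library. This is a hand-rolled stand-in, not hologram's API, and `TableMetadata` with its fields is hypothetical rather than dbt's real catalog type.

```python
import dataclasses
import json
from typing import Optional, get_type_hints


# Hypothetical table-metadata type; field names are illustrative only.
@dataclasses.dataclass
class TableMetadata:
    type: str
    database: str
    schema: str
    name: str
    comment: Optional[str] = None


_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def json_schema(cls):
    """Build a tiny JSON Schema from a flat dataclass of scalar or Optional[scalar] fields."""
    hints = get_type_hints(cls)
    properties, required = {}, []
    for field in dataclasses.fields(cls):
        tp = hints[field.name]
        args = getattr(tp, "__args__", ())
        if type(None) in args:  # Optional[X] is Union[X, None]
            inner = next(a for a in args if a is not type(None))
            properties[field.name] = {"type": [_JSON_TYPES[inner], "null"]}
        else:
            properties[field.name] = {"type": _JSON_TYPES[tp]}
        # Fields without a default are required in the schema.
        if field.default is dataclasses.MISSING and field.default_factory is dataclasses.MISSING:
            required.append(field.name)
    return {
        "type": "object",
        "properties": properties,
        "required": required,
        "additionalProperties": False,  # exact field list: no extra keys allowed
    }


print(json.dumps(json_schema(TableMetadata), indent=2))
```

A library like hologram handles nested types, lists, and validation; this sketch only covers flat scalar fields.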

I also noticed that we iterated over the whole manifest once per catalog entry, so I changed it to build and use a lookup table instead. Now we only iterate over the manifest once when linking unique IDs to catalog entries, which should speed things up a bit on very large projects.
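A minimal sketch of the lookup-table approach, assuming manifest nodes can be reduced to dicts with `unique_id`, `database`, `schema`, and `identifier` keys. `build_lookup`, `link_catalog`, and the sample unique IDs are hypothetical names for illustration. Building the map is one pass over the manifest; each catalog entry then costs a single dict lookup instead of a full manifest scan.

```python
from typing import Dict, Iterable, List, Mapping, Tuple

TableKey = Tuple[str, str, str]  # (database, schema, identifier)


def build_lookup(nodes: Iterable[Mapping[str, str]]) -> Dict[TableKey, List[str]]:
    """Walk the manifest once, mapping each relation to the unique IDs that resolve to it."""
    lookup: Dict[TableKey, List[str]] = {}
    for node in nodes:
        key = (node["database"], node["schema"], node["identifier"])
        # A list, in case more than one node resolves to the same relation.
        lookup.setdefault(key, []).append(node["unique_id"])
    return lookup


def link_catalog(
    catalog_keys: Iterable[TableKey], lookup: Mapping[TableKey, List[str]]
) -> Dict[TableKey, List[str]]:
    """Attach unique IDs to catalog entries with O(1) dict lookups."""
    return {key: lookup.get(key, []) for key in catalog_keys}


# Hypothetical nodes: same schema and identifier, different databases.
NODES = [
    {"unique_id": "source.jaffle.raw.orders", "database": "raw", "schema": "jaffle", "identifier": "orders"},
    {"unique_id": "model.jaffle.orders", "database": "analytics", "schema": "jaffle", "identifier": "orders"},
]

LOOKUP = build_lookup(NODES)
print(link_catalog([("raw", "jaffle", "orders"), ("analytics", "jaffle", "stale")], LOOKUP))
# {('raw', 'jaffle', 'orders'): ['source.jaffle.raw.orders'], ('analytics', 'jaffle', 'stale'): []}
```

Because the database is part of the key, the source and the model land in separate entries.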

I am pretty sure I managed to keep the output format the same, so no need to update dbt docs for this.

I made dbt a bit pickier about the table/column metadata fields coming from adapter.get_catalog(): there is now an exact list of required fields. That's mostly because it makes JSON schema generation easier, but mypy is also happier this way.
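One way to enforce an exact field list is to reject both unknown and missing keys before constructing a dataclass. The sketch below assumes a hypothetical `ColumnMetadata` type and a hypothetical `strict_from_dict` helper; neither is dbt's actual code, and value types are not checked here.

```python
import dataclasses
from typing import Any, Dict, Optional


@dataclasses.dataclass(frozen=True)
class ColumnMetadata:
    # Hypothetical field list for illustration; not dbt's actual catalog schema.
    type: str
    index: int
    name: str
    comment: Optional[str] = None


def strict_from_dict(cls, data: Dict[str, Any]):
    """Build a dataclass instance, rejecting unknown keys and missing required ones."""
    fields = {f.name: f for f in dataclasses.fields(cls)}
    unknown = set(data) - set(fields)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    missing = [
        name
        for name, f in fields.items()
        if name not in data
        and f.default is dataclasses.MISSING
        and f.default_factory is dataclasses.MISSING
    ]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return cls(**data)


print(strict_from_dict(ColumnMetadata, {"type": "integer", "index": 1, "name": "id"}))
for bad in ({"type": "integer", "name": "id"}, {"type": "integer", "index": 1, "name": "id", "owner": "x"}):
    try:
        strict_from_dict(ColumnMetadata, bad)
    except ValueError as exc:
        print(exc)
# missing required fields: ['index']
# unexpected fields: ['owner']
```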

I also changed the unit tests to be a bit less unit-y, but they now test what we actually care about: that catalog dict results as input produce the correct structured JSON output when we write to disk.

…log generation
Convert catalog intermediate structure into something more useful
Make comparing manifests to catalogs faster by generating an explicit identifier to id mapping
Make the identifier to unique ID mapping include databases
Convert catalog to use dataclasses/hologram types
Fix unit tests to test what we actually care about
No changes to integration tests means no need to change dbt docs, hooray

drewbanin

Nice work! Great that no updates are required to the docs site with this change. Leaving the database out of the comparison was a pretty big oversight when we introduced sources, but glad that we're fixing it now!

cla-bot bot added the cla:yes label Sep 20, 2019

beckjake force-pushed the fix/include-database-comparing-catalog branch from 6c81562 to a9bb1aa September 20, 2019 16:54

beckjake requested a review from drewbanin September 20, 2019 17:51

drewbanin approved these changes Sep 20, 2019


beckjake merged commit f6406c9 into dev/louisa-may-alcott Sep 20, 2019

beckjake deleted the fix/include-database-comparing-catalog branch September 20, 2019 20:16

drewbanin mentioned this pull request Nov 1, 2019

dbt docs generate adding both source and model column names (source and model share name) #1883 (Closed)
