Include the database when deciding if two tables are the same (#1708) #1774
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes #1708
The problem there was that dbt didn't use the table database in the catalog as part of deciding if a table was "the same" as one in the manifest. That resulted in us combining both tables' column information into each entry.
While I was in here, I converted it to use type hints and dataclasses and hologram and all that good stuff, so we now can generate an actual json schema for our output and while reading the code you can actually tell what types things are...
I also noticed that we iterated over the whole manifest once per catalog entry, and converted it to generate and use a lookup table, so now we only iterate over the manifest once when linking up IDs to catalog entries. On very large projects this should speed things up a bit.
I am pretty sure I managed to keep the output format the same, so no need to update dbt docs for this.
I made dbt a bit more picky about table/column metadata fields coming from
adapter.get_catalog()
: they now have an exact list of what's required. Mostly because it makes json schema generation easier, but also mypy is happier this way.I also changed the unit tests to be a bit less unit-y but also to test what we actually care about (input of catalog dict results -> correct structured json output when we write to disk)