Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix docs generation for cross-db sources in REDSHIFT RA3 node #3408

Merged
merged 8 commits into from
Jun 9, 2021

Conversation

kostek-pl
Copy link
Contributor

@kostek-pl kostek-pl commented May 31, 2021

resolves #3236(partially)

This PR is related to #3236. It fixes documentation generation for cross-db objects used as SOURCES in models. However it not resolves full cross-db support as it's not possible at this moment due to AWS cross-db restrictions.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot
Copy link

cla-bot bot commented May 31, 2021

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: kostkaw.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @kostek-pl! I'm in favor of the overall concept. There's a little bit of reorganization needed in order to get the tests passing.

Comment on lines 114 to 118
def _get_catalog_schemas(self, manifest):
# postgres/redshift only allow one database (the main one)
# postgres/redshift(not ra3) only allow one database (the main one)
schemas = super()._get_catalog_schemas(manifest)
try:
return schemas.flatten()
return schemas.flatten(allow_duplicates=self.config.credentials.ra3_node)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postgres credentials don't have an ra3_node property, so this is raising an error and breaking tests. I think the right move here is to reimplement _get_catalog_schemas within redshift/impl.py, including this RA3-specific logic.

@@ -433,13 +433,14 @@ def search(
for schema in schemas:
yield information_schema_name, schema

def flatten(self):
def flatten(self, allow_duplicates: bool=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a naming nit: this change isn't so much about duplicates per se; it's about allowing multiple databases at all. In the past, the only reason we'd expect to see multiple databases on Postgres/Redshift was if a metadata query accidentally returned duplicates.

@cla-bot
Copy link

cla-bot bot commented Jun 1, 2021

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: kostkaw.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@kostek-pl
Copy link
Contributor Author

Thank you very much @jtcohen6 for the review and tips.
I've updated code according to your suggestions, hope that it's done in the way you expected.

@jtcohen6
Copy link
Contributor

jtcohen6 commented Jun 2, 2021

Thanks @kostek-pl! The change here is looking good to me. Since we don't have an RA3 cluster available for automated tests, could you confirm that it resolves the issue for you locally? Namely, that you're able to generate docs that include sources from another database.

A number of tests were failing because they were missing secrets, on account of running from the kostek-pl CircleCI org instead of the fishtown-analytics CI org. I re-ran in our org, everything is passing except the linting step, which is just a handful of pep8 errors:

core/dbt/adapters/base/relation.py:436:53: E252 missing whitespace around parameter equals
core/dbt/adapters/base/relation.py:436:54: E252 missing whitespace around parameter equals
plugins/redshift/dbt/adapters/redshift/impl.py:70:1: W293 blank line contains whitespace
plugins/redshift/dbt/adapters/redshift/impl.py:73:16: E121 continuation line under-indented for hanging indent
plugins/redshift/dbt/adapters/redshift/impl.py:89:14: W291 trailing whitespace
  • Could you remove this repo from your CircleCI org, so that it will run in our org instead?
  • It looks like some of your commits have been co-authored by kostkaw <[email protected]>, which which hasn't signed the CLA. Could you amend those commits to be singularly authored by the email associated with your @kostek-pl account?

@cla-bot cla-bot bot added the cla:yes label Jun 7, 2021
@kostek-pl kostek-pl temporarily deployed to Postgres June 7, 2021 07:33 Inactive
@kostek-pl kostek-pl temporarily deployed to Redshift June 7, 2021 07:33 Inactive
@kostek-pl kostek-pl temporarily deployed to Redshift June 7, 2021 07:33 Inactive
@kostek-pl kostek-pl temporarily deployed to Bigquery June 7, 2021 07:33 Inactive
@kostek-pl kostek-pl temporarily deployed to Bigquery June 7, 2021 07:33 Inactive
@kostek-pl kostek-pl temporarily deployed to Snowflake June 7, 2021 07:33 Inactive
@kostek-pl kostek-pl temporarily deployed to Snowflake June 7, 2021 07:33 Inactive
@kostek-pl kostek-pl temporarily deployed to Postgres June 7, 2021 08:03 Inactive
@kostek-pl kostek-pl temporarily deployed to Redshift June 7, 2021 08:03 Inactive
@kostek-pl kostek-pl temporarily deployed to Redshift June 7, 2021 08:03 Inactive
@kostek-pl kostek-pl temporarily deployed to Bigquery June 7, 2021 08:03 Inactive
@kostek-pl kostek-pl temporarily deployed to Bigquery June 7, 2021 08:03 Inactive
@kostek-pl kostek-pl temporarily deployed to Snowflake June 7, 2021 08:03 Inactive
@kostek-pl kostek-pl temporarily deployed to Snowflake June 7, 2021 08:03 Inactive
@kostek-pl kostek-pl temporarily deployed to Postgres June 7, 2021 10:15 Inactive
@kostek-pl kostek-pl temporarily deployed to Redshift June 7, 2021 10:15 Inactive
@kostek-pl kostek-pl temporarily deployed to Redshift June 7, 2021 10:15 Inactive
@kostek-pl
Copy link
Contributor Author

@jtcohen6 thank you a lot for the remarks.
I've amended previous commits and got rid of my corporate email, removed CircleCi from my project and adjusted the code to be consistent with flake8.

Yes, I can confirm that This PR resolves the problem locally with docs generating.
To be more precise I've created 2 models, the first one reads(SOURCE) from the table that is created
in another database(sources) and the second one reads(REF) the first one which is on current database(dev).

with RA3_node not set or set to false I get an error:

Error sending message, disabling tracking
Compilation Error
Cross-db references allowed only in redshift RA3.* node. Got {'sources', 'dev'}

when RA3_node is set to true then docs are generated

example used:

profile:

[...]
target: dev
  outputs:
    dev:
      type: redshift
      ra3_node: true
[...]

schema:

version: 2

sources:
  - name: sources
    database: sources
    schema: public
    tables:
      - name: cross_db_test_table

models:
    - name: cross_db_step1

    - name: cross_db_step2

cross_db_step1:

select * from {{ source('sources', 'cross_db_test_table') }}

cross_db_step2:

select * from {{ ref('cross_db_step1') }}

result:

image

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great @kostek-pl!

Could you add a changelog entry, and add yourself to the list of contributors? We've just released 0.20.0rc1, and this PR represents net-new functionality, so it should really go under a new section dbt next (i.e. v0.21). If we end up putting out another release candidate for v0.20, however, I'll consider sneaking it in :)

@kostek-pl kostek-pl temporarily deployed to Postgres June 9, 2021 07:20 Inactive
@kostek-pl kostek-pl temporarily deployed to Bigquery June 9, 2021 07:20 Inactive
@kostek-pl kostek-pl temporarily deployed to Bigquery June 9, 2021 07:20 Inactive
@kostek-pl kostek-pl temporarily deployed to Snowflake June 9, 2021 07:20 Inactive
@kostek-pl kostek-pl temporarily deployed to Snowflake June 9, 2021 07:20 Inactive
@kostek-pl kostek-pl temporarily deployed to Redshift June 9, 2021 07:20 Inactive
@kostek-pl kostek-pl temporarily deployed to Redshift June 9, 2021 07:20 Inactive
Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @kostek-pl!

@jtcohen6 jtcohen6 merged commit eb4ad44 into dbt-labs:develop Jun 9, 2021
@jtcohen6 jtcohen6 mentioned this pull request Jun 9, 2021
5 tasks
@jtcohen6
Copy link
Contributor

If we end up putting out another release candidate for v0.20, however, I'll consider sneaking it in :)

Update: We will be putting out v0.20.0rc2 to fix some bugs and installation issues with v0.20.0rc1. That said, we're going to keep the scope of v0.21 narrow, and be disciplined about putting out a first prerelease within 6-8 weeks of v0.20.0-final. So I'm going to stay disciplined here, too, and keep the change in v0.21.

kwigley pushed a commit that referenced this pull request Jun 24, 2021
* Fix docs generating for cross-db sources

* Code reorganization

* Code adjustments according to flake8

* Error message adjusted to be more precise

* CHANGELOG update
kwigley pushed a commit that referenced this pull request Jun 24, 2021
* Fix docs generating for cross-db sources

* Code reorganization

* Code adjustments according to flake8

* Error message adjusted to be more precise

* CHANGELOG update
kwigley pushed a commit that referenced this pull request Jun 28, 2021
* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408)

* Fix docs generating for cross-db sources

* Code reorganization

* Code adjustments according to flake8

* Error message adjusted to be more precise

* CHANGELOG update

* add static analysis info to parsing data

* update changelog

* don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being

* fix partial parsing case

Co-authored-by: kostek-pl <[email protected]>
kwigley pushed a commit that referenced this pull request Jun 28, 2021
* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408)

* Fix docs generating for cross-db sources

* Code reorganization

* Code adjustments according to flake8

* Error message adjusted to be more precise

* CHANGELOG update

* add static analysis info to parsing data

* update changelog

* don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being

* fix partial parsing case

Co-authored-by: kostek-pl <[email protected]>
kwigley pushed a commit that referenced this pull request Jun 29, 2021
* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408)

* Fix docs generating for cross-db sources

* Code reorganization

* Code adjustments according to flake8

* Error message adjusted to be more precise

* CHANGELOG update

* add static analysis info to parsing data

* update changelog

* don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being

* fix partial parsing case

Co-authored-by: kostek-pl <[email protected]>
kwigley pushed a commit that referenced this pull request Jun 29, 2021
…) (#3500)

* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408)

* Fix docs generating for cross-db sources

* Code reorganization

* Code adjustments according to flake8

* Error message adjusted to be more precise

* CHANGELOG update

* add static analysis info to parsing data

* update changelog

* don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being

* fix partial parsing case

Co-authored-by: kostek-pl <[email protected]>

Co-authored-by: kostek-pl <[email protected]>
iknox-fa pushed a commit that referenced this pull request Feb 8, 2022
* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408)

* Fix docs generating for cross-db sources

* Code reorganization

* Code adjustments according to flake8

* Error message adjusted to be more precise

* CHANGELOG update

* add static analysis info to parsing data

* update changelog

* don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being

* fix partial parsing case

Co-authored-by: kostek-pl <[email protected]>

automatic commit by git-black, original commits:
  4d24656
iknox-fa pushed a commit that referenced this pull request Feb 8, 2022
* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408)

* Fix docs generating for cross-db sources

* Code reorganization

* Code adjustments according to flake8

* Error message adjusted to be more precise

* CHANGELOG update

* add static analysis info to parsing data

* update changelog

* don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being

* fix partial parsing case

Co-authored-by: kostek-pl <[email protected]>

automatic commit by git-black, original commits:
  4d24656
  a76ec42
iknox-fa pushed a commit that referenced this pull request Feb 8, 2022
* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408)

* Fix docs generating for cross-db sources

* Code reorganization

* Code adjustments according to flake8

* Error message adjusted to be more precise

* CHANGELOG update

* add static analysis info to parsing data

* update changelog

* don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being

* fix partial parsing case

Co-authored-by: kostek-pl <[email protected]>

automatic commit by git-black, original commits:
  2cc0579
  4d24656
iknox-fa pushed a commit that referenced this pull request Feb 8, 2022
* Fix docs generation for cross-db sources in REDSHIFT RA3 node (#3408)

* Fix docs generating for cross-db sources

* Code reorganization

* Code adjustments according to flake8

* Error message adjusted to be more precise

* CHANGELOG update

* add static analysis info to parsing data

* update changelog

* don't use `meta`! need better separation between dbt internal objects and external facing data. hacked an internal field on the manifest to save off this parsing info for the time being

* fix partial parsing case

Co-authored-by: kostek-pl <[email protected]>

automatic commit by git-black, original commits:
  4d24656
  7563b99
  aadf3c7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cross db queries in redshift
2 participants