Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use of column alias in snowflake external table seems wrong #315

Closed
1 of 3 tasks
DennisWagner opened this issue Sep 6, 2024 · 4 comments · Fixed by #318
Closed
1 of 3 tasks

Use of column alias in snowflake external table seems wrong #315

DennisWagner opened this issue Sep 6, 2024 · 4 comments · Fixed by #318
Labels
bug Something isn't working triage

Comments

@DennisWagner
Copy link

DennisWagner commented Sep 6, 2024

Describe the bug

When using the alias for a column in a Snowflake external table, the alias used both as a column name as well as within the column expression

Steps to reproduce

  1. Define external table in sources.yml
  2. Set alias for one column
  3. dbt run-operation stage_external_sources --vars "ext_full_refresh: true"

sources.yml:

version: 2

sources:

  • name: datalake
    schema: datalake

    tables:

    • name: external_table_test
      external:
      location: '@datalake.batch_load_stage/'
      file_format: datalake.parquet_format
      auto_refresh: true
      integration:
      columns:
      • name: FILECREATETIME
        alias: created_at_utc
        data_type: TIMESTAMP_NTZ

Expected results

I expected the alias to be used in either the expression, or the column name - not both places.

create or replace external table .datalake.external_table_test( created_at_utc TIMESTAMP_NTZ as ((case when is_null_value(value:FILECREATETIME) or lower(value:FILECREATETIME) = 'null' then null else value:FILECREATETIMEend)::TIMESTAMP_NTZ) ) location = @datalake.batch_load_stage/ auto_refresh = True

Actual results

The alias is used in the expression and as the column name in the table resulting in the column always returning null values.

Screenshots and log output

Output from log:
create or replace external table .datalake.external_table_test( created_at_utc TIMESTAMP_NTZ as ((case when is_null_value(value:created_at_utc) or lower(value:created_at_utc) = 'null' then null else value:created_at_utc end)::TIMESTAMP_NTZ) ) location = @datalake.batch_load_stage/ auto_refresh = True

System information

The contents of your packages.yml file:
packages:

  • package: dbt-labs/dbt_utils
    version: 1.1.1
  • package: dbt-labs/dbt_external_tables
    version: 0.9.0
  • package: dbt-labs/codegen
    version: 0.11.0

Which database are you using dbt with?

  • redshift
  • snowflake
  • other (specify: ____________)

The output of dbt --version:
2024.8.235 (from dbt cloud using versionless)

The operating system you're using:
dbt cloud

The output of python --version:
N/A

Additional context

@kyleburke-meq
Copy link
Contributor

kyleburke-meq commented Sep 17, 2024

@DennisWagner
Are you using an expression and an alias? Cause if you're defining an expression you wouldn't want to use an alias.

@DennisWagner
Copy link
Author

DennisWagner commented Sep 18, 2024

@kyleburke-meq
No, I only used name and alias, but when that did not work as expected, I used name and expression. Unfortunately, that makes me have to replicate the logic in the package for null handling, which I would like to avoid for simply renaming a column.

@kyleburke-meq
Copy link
Contributor

@DennisWagner I believe I found the issue and have put in a PR to fix it: #318

@cakkinep FYI

@DennisWagner
Copy link
Author

Wauv, that was fast. Thanks a lot for your help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants