Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbt does not appropriately quote column names for incremental models and seed files. #1847

Closed
1 of 5 tasks
jthandy opened this issue Oct 22, 2019 · 5 comments · Fixed by #1888
Closed
1 of 5 tasks
Labels
bug Something isn't working

Comments

@jthandy
Copy link
Member

jthandy commented Oct 22, 2019

Describe the bug

dbt doesn't cooperate with lower- or mixed-case column names on Snowflake, regardless of quoting settings currently. While sometimes this "just works", there are two areas of the product that I've run into in testing that exhibit issues when mixed casing is used:

  1. dbt seed does not respect quotes in column names inside of a CSV file, creating the columns as unquoted.
  2. Both incremental materialization strategies (merge and delete+insert) list out the columns using introspection and do not quote the column list. Therefore, if any of the columns in an incremental model are mixed- or lower-case, incremental updates will always fail with an invalid identifier error.

I also tested snapshotting on top of the same test dataset I was using, and in all cases I could identify, column-level quoting was applied correctly and dbt produced the intended results.

Steps To Reproduce

Here's my test scenario:

create or replace table analytics.DBT_JTHANDY."QuotingSource" ("col_A" integer,"col_B" integer);
insert into analytics.DBT_JTHANDY."QuotingSource" ("col_A", "col_B") values
(1, 2),
(3,4),
(5,6)

--quoting_test.sql
{{config(
    materialized = 'incremental',
    unique_key = '"col_A"',
    incremental_strategy = "delete+insert"
    )}}

select "col_A", "col_B"
from {{ref('QuotingSource')}}
"col_A","col_B"
1,2
3,4
5,6

Expected behavior

I would expect that all columns that are quoted when their column name is provided to be referenced by dbt using their quoted column name.

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.14.2
latest version: 0.14.3

The operating system you're using: OSX

The output of python --version: Python 2.7.10

@jthandy jthandy added bug Something isn't working triage labels Oct 22, 2019
@jthandy
Copy link
Member Author

jthandy commented Oct 22, 2019

Two quick editorial notes:

  1. of the two problems (seeds and incremental models) identified here, I believe that incremental models are a significantly bigger problem. Fixing seeds would be good, but the fact that this is broken on incremental models is a much bigger problem IMO.
  2. I haven't looked at the source code at all, but I attempted to test the areas of the product where I could anticipate that dbt might be writing column-level SQL statements out. I may have missed some. It seems like this is an area that we haven't looked at explicitly in the past, so it would be useful for someone actually working on this to determine whether or not there are any other potential areas of the code base that this could impact that I missed.

@drewbanin drewbanin removed the triage label Oct 22, 2019
@drewbanin drewbanin added this to the Louisa May Alcott milestone Oct 24, 2019
beckjake added a commit that referenced this issue Nov 5, 2019
Fix quoting on columns for seeds/incremental models (#1847)
@coisnepe
Copy link

coisnepe commented Feb 3, 2022

I'm actually running into the same issue, also documented in this SO question. My table is built flawlessly but the incremental runs fail because the engine "uppercases" a column name that's not fully uppercased in my database:

The query:

{{
  config(
    materialized='incremental',
    unique_key='event_ID'
  )
}}

SELECT
    {{ dbt_utils.star(from=ref('staging_pg_ahoy_events'), relation_alias='events', prefix='event_') }},
    {{ dbt_utils.star(from=ref('staging_pg_ahoy_visits'), relation_alias='visits', prefix='visit_') }}
FROM
    {{ ref('staging_pg_ahoy_events') }} AS events
LEFT JOIN {{ ref('staging_pg_ahoy_visits') }} AS visits ON events.visit_id = visits.id
{% if is_incremental() %}
    WHERE "event_ID" >= (SELECT max("event_ID") FROM {{ this }})
{% endif %}

The error:

14:07:57    000904 (42000): SQL compilation error: error line 4 at position 12
14:07:57    invalid identifier 'DBT_INTERNAL_SOURCE.EVENT_ID'

The only fix I've found so far is by upper casing the prefix via dbt_utils, but that's more of a workaround…

@dvd-p
Copy link

dvd-p commented Sep 19, 2022

Experiencing the same invalid identifier error with snowflake and dbt incremental run, as described by @coisnepe.
@drewbanin Would it be possible to reopen this ticket or would you advise creating a new one - if there is none already?

@leahwicz
Copy link
Contributor

@dvd-p would you be able to open a new issue please? It just would make it easier to look at this with your specific scenario and details to make sure we are able to reproduce the error correctly

@tonygaul-hd
Copy link

Came here as this bug is still active using the Snowflake adapter. I did have success by implementing this workaround in the config (using @coisnepe example).
unique_key='"event_ID"'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants