
Model pre_hook is not executed on ODBC, but it is for HTTP #256

Closed
dinedal opened this issue Nov 13, 2021 · 7 comments
Labels: bug (Something isn't working), Stale

Comments


dinedal commented Nov 13, 2021

Describe the bug

We use a model with a config block like so:

{{ config(
  materialized='incremental',
  unique_key='requestId',
  schema="staging",
  pre_hook=["set spark.sql.caseSensitive=true"]
) }}

When running with a profile setup like this:

    dev:
      type: spark
      method: http
      schema: dev
      host: [host]
      port: 443
      token: [token]
      cluster: [cluster]
      threads: 1 

Everything works as expected.

When we run with the profile set up like this:

    odbctest:
      type: spark
      method: odbc
      driver: /Library/simba/spark/lib/libsparkodbc_sbu.dylib
      host: [host]
      token: [token]
      cluster: [cluster]
      schema: dev

We get the following error:

  ('42000', '[42000] [Simba][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: Ambiguous reference to fields

Steps To Reproduce

Simple reproduction:

  1. Setup a connection to a cluster via HTTP
  2. Have a model with a pre_hook that is not syntactically correct ( set spark.sql.caseSensitive=trrue )
  3. Verify model's pre_hook fails with an exception ( java.lang.IllegalArgumentException: spark.sql.caseSensitive should be boolean, but was trrue )
  4. Setup a connection to the same cluster via ODBC
  5. Verify model's pre_hook does not execute because no exception occurs

More complex way to reproduce:

  1. Setup a connection to a cluster via HTTP
  2. Have a model with a pre_hook
  3. Verify model's pre_hook executes
  4. Setup a connection to the same cluster via ODBC
  5. Verify model's pre_hook does not execute

Expected behavior

I expect ODBC connections to follow the behavior documented at https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook: model pre_hooks should execute consistently regardless of the connection method.


System information

The output of dbt --version:

installed version: 0.21.0
   latest version: 0.21.0

Up to date!

Plugins:
  - bigquery: 0.21.0
  - snowflake: 0.21.0
  - redshift: 0.21.0
  - postgres: 0.21.0
  - spark: 0.21.0

The operating system you're using:
Mac OS X 12.0.1, Ubuntu

The output of python --version:
Python 3.8.6

@dinedal added the bug (Something isn't working) and triage labels on Nov 13, 2021
jtcohen6 (Contributor) commented:

@dinedal Unfortunately, this is a known limitation for set statements in particular: #141 (comment)

I think this must have to do with pyodbc creating a new connection/session for each query, rather than persisting the same connection/session across multiple queries.
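As a toy illustration of that suspicion (this is not the actual dbt-spark or pyodbc code, just a mock of the general principle): a SET command is session-scoped, so if each statement runs on a fresh connection/session, a pre_hook's SET silently has no effect on later statements.

```python
# Hypothetical sketch: session-scoped config is lost when every
# statement gets its own session, which is the suspected ODBC behavior.

class MockSparkSession:
    """Toy stand-in for a Spark SQL session with session-scoped config."""

    def __init__(self):
        # Cluster-level default, inherited by every new session.
        self.conf = {"spark.sql.caseSensitive": "false"}

    def execute(self, sql):
        # A SET statement only mutates THIS session's config.
        if sql.lower().startswith("set "):
            key, value = sql[4:].split("=", 1)
            self.conf[key.strip()] = value.strip()
        return self.conf.copy()

# Persistent session (like the HTTP method): the pre_hook's SET sticks.
session = MockSparkSession()
session.execute("set spark.sql.caseSensitive=true")
after_hook = session.execute("select 1")
print(after_hook["spark.sql.caseSensitive"])  # true

# Fresh session per statement (suspected ODBC behavior): the SET is lost.
MockSparkSession().execute("set spark.sql.caseSensitive=true")
fresh = MockSparkSession().execute("select 1")
print(fresh["spark.sql.caseSensitive"])  # false
```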

In the meantime, there are two viable workarounds to set your spark config (spark.sql.caseSensitive=true):

  1. Use the server_side_parameters profile config (new in v0.21; see #201, "Add support for ODBC Server Side Parameters")
  2. Configure the setting at the cluster level (see the Databricks docs)

Unfortunately, neither workaround will give you granular, model-specific behavior. The first should at least give you the ability to change behavior between different invocations, such as by setting its value to an env var.
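As a sketch of workaround 1 (assuming the server_side_parameters syntax added in #201; the exact key names and env_var usage should be checked against the dbt-spark docs, and DBT_SPARK_CASE_SENSITIVE is a hypothetical variable name), the ODBC profile above could be extended like:

```yaml
odbctest:
  type: spark
  method: odbc
  driver: /Library/simba/spark/lib/libsparkodbc_sbu.dylib
  host: [host]
  token: [token]
  cluster: [cluster]
  schema: dev
  server_side_parameters:
    # Applied when the connection is established, so it should survive
    # the per-statement session behavior described above. Reading from
    # an env var lets you change the value between invocations.
    "spark.sql.caseSensitive": "{{ env_var('DBT_SPARK_CASE_SENSITIVE', 'true') }}"
```

This applies to the whole invocation, not per model, which is the granularity limitation noted below.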

I'm not sure if there's anything we can do to fix this. I think we absolutely need to document the limitation and workarounds here: https://docs.getdbt.com/reference/resource-configs/spark-configs

@jtcohen6 jtcohen6 removed the triage label Nov 16, 2021

dinedal commented Nov 17, 2021

I actually tried option 1 and it was also ignored. Option 2 is my current workaround as well, but it has its own disadvantages.

I think we absolutely need to document the limitation and workarounds

Considering that the Spark profile docs ( https://docs.getdbt.com/reference/warehouse-profiles/spark-profile ) say:

    odbc is the preferred method when connecting to Databricks.

Yes, I would agree: if ODBC is the preferred method, it should be documented that not all dbt functionality works as expected over it. I would suggest changing the preferred method to HTTP, but over HTTP anything other than threads: 1 had issues, so it's not perfect either.

github-actions bot commented:

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the Stale label May 17, 2022

dinedal commented May 17, 2022

Issue remains

@github-actions github-actions bot removed the Stale label May 18, 2022
github-actions bot commented:

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 14, 2022

dinedal commented Nov 14, 2022

Issue still remains

@github-actions github-actions bot removed the Stale label Nov 15, 2022
github-actions bot commented:

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
