
Databricks Support #100

Open
phillem15 opened this issue Apr 11, 2023 · 12 comments

Comments

@phillem15

Is your feature request related to a problem? Please describe.
It is not related to a problem.

Describe the solution you'd like
I would like for there to be support for Databricks.

Describe alternatives you've considered
I have considered forking this repo and adding support for Databricks.

Additional context

@clausherther
Contributor

Hi @phillem15! I'd love to offer support for Spark and Databricks in dbt-date and dbt-expectations. It's not so much a matter of writing compatible SQL as a matter of being able to test against these platforms. We are currently only able to run CI/CD against Postgres, BigQuery and Snowflake. As a result, we've deferred support for non-core platforms (like Spark or MS SQL) in the past to the utils packages for those platforms, which act as shims.
Would you be able to help set up integration testing for Databricks?

@phillem15
Author

Hi @clausherther, that makes total sense. I'd be happy to help! It looks like you are using CircleCI for this? I haven't used that tool before, but if you could point me in the right direction, I'd be happy to learn more and help get it set up.

@clausherther
Contributor

My concern is really around hosting a Databricks instance to run the tests on (and ideally also a Spark instance). I currently maintain the BigQuery and Snowflake instances we test on (Postgres is created on the fly during CI/CD).
I'd have to look into free or low-cost options for those, and there's also the question of access. I'm currently the only one with access to the CI/CD test databases, and I'm not sure how we'd manage shared ownership.

@clausherther
Contributor

Another issue I've run into in the past is that Databricks Community Edition clusters get dropped after 2 hours of inactivity, so for CI/CD to work, one would have to create a cluster programmatically.

@fuselessmatt

fuselessmatt commented Apr 20, 2023

We didn't realise it didn't support Databricks and have been using 0.5.7 for months. I tried upgrading to 0.7.2 today and haven't noticed any errors so far.

This is without adding the recommended shim package, because we didn't realise it was expected:

For other platforms, you will have to include a shim package for the platform, such as spark-utils, or tsql-utils.

Maybe we are just using the package to a limited extent. Is anyone aware of anything not working?

@fuselessmatt

Regarding integration tests: since Spark SQL at least is open source, wouldn't it be possible to instantiate Spark on the CI/CD worker and run the tests there?

@alxsbn

alxsbn commented Jun 6, 2024

Still not planned?

@clausherther
Contributor

Not until someone can figure out how to hook up our CI tests to Databricks without needing a paid account. Last I checked, this wasn't doable.

@clausherther
Contributor

(To be clear, we've had dbt-spark support for a while, just not Databricks.)

@phillem15
Author

phillem15 commented Jun 6, 2024 via email

@clausherther
Contributor

Good to hear. Just keep in mind that since we can't test changes on Databricks, future releases may break Databricks compatibility.

@alxsbn

alxsbn commented Jun 7, 2024

@clausherther I'll ping them to see how they can support OSS projects.

Labels: None yet
Projects: None yet
Development: No branches or pull requests
4 participants