fix bigquery connector #1111

Merged (1 commit) on Sep 27, 2022

pawel-big-lebowski (Collaborator)

Signed-off-by: Pawel Leszczynski [email protected]

Problem

Spark integration fails with spark-bigquery-connector >=0.25.0

Closes: #1105

Solution

  • make the Spark integration work with the latest connector
  • implement the change so that older connector versions keep working as well

Note: All schema changes require discussion. Please link the issue for context.

  • Your change modifies the core OpenLineage model
  • Your change modifies one or more OpenLineage facets

If you're contributing a new integration, please specify the scope of the integration and how/where it has been tested (e.g., Apache Spark integration supports S3 and GCS filesystem operations, tested with AWS EMR).

Checklist

  • You've signed-off your work
  • Your pull request title follows our guidelines
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've updated the CHANGELOG.md with details about your change under the "Unreleased" section (if relevant, depending on the change, this may not be necessary)
  • You've versioned the core OpenLineage model or facets according to SchemaVer (if relevant)
  • You've added a header to source files (if relevant)

@pawel-big-lebowski pawel-big-lebowski added the kind:bug (Something isn't working), area:integration/spark and tool:bigquery (Google BigQuery) labels Sep 23, 2022
@pawel-big-lebowski pawel-big-lebowski marked this pull request as draft September 23, 2022 07:48
@pawel-big-lebowski pawel-big-lebowski force-pushed the spark/fix-bigquery-connector branch 3 times, most recently from 46cfdea to 0cb1821 Compare September 23, 2022 14:38
@boring-cyborg boring-cyborg bot added the area:documentation Improvements or additions to documentation label Sep 23, 2022
Signed-off-by: Pawel Leszczynski <[email protected]>
*
* @return
*/
private Optional<String> getBigQueryTableName(BigQueryRelation relation) {
Collaborator Author

This is the main change of the PR: try both tableName and getTableName on BigQueryRelation so that the table name can be retrieved regardless of the connector version.
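
The comment above describes probing for either accessor at runtime. A minimal sketch of that idea using plain Java reflection follows; it is not the code from this PR. The class TableNameLookup, the helper findTableName, and the two stand-in relation classes are hypothetical names invented for illustration, and which connector version exposes which method is deliberately left unstated.

```java
import java.lang.reflect.Method;
import java.util.Optional;

// Hypothetical illustration only: not the OpenLineage implementation.
final class TableNameLookup {

    // Stand-in for a relation class that exposes tableName() (assumed shape).
    public static class RelationWithTableName {
        public String tableName() {
            return "project.dataset.table_a";
        }
    }

    // Stand-in for a relation class that exposes getTableName() (assumed shape).
    public static class RelationWithGetTableName {
        public String getTableName() {
            return "project.dataset.table_b";
        }
    }

    // Tries each candidate accessor in turn and returns the first non-null result.
    static Optional<String> findTableName(Object relation) {
        for (String candidate : new String[] {"tableName", "getTableName"}) {
            try {
                Method method = relation.getClass().getMethod(candidate);
                Object value = method.invoke(relation);
                if (value != null) {
                    return Optional.of(value.toString());
                }
            } catch (ReflectiveOperationException e) {
                // Accessor missing or not callable on this class; try the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(findTableName(new RelationWithTableName()));
        System.out.println(findTableName(new RelationWithGetTableName()));
        System.out.println(findTableName(new Object()));
    }
}
```

Returning Optional mirrors the signature shown in the diff, so a caller can skip the dataset when neither accessor is available instead of failing the whole run.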

@@ -40,7 +40,6 @@ repositories {
archivesBaseName='openlineage-spark-spark3'

ext {
bigqueryVersion = '0.21.1'
Collaborator Author

The bigquery dependencies are not needed for the spark2, spark3 and spark32 subprojects. Instead of upgrading them, I removed them.

@pawel-big-lebowski pawel-big-lebowski marked this pull request as ready for review September 26, 2022 07:28
@pawel-big-lebowski pawel-big-lebowski merged commit 2f3b199 into main Sep 27, 2022
@pawel-big-lebowski pawel-big-lebowski deleted the spark/fix-bigquery-connector branch September 27, 2022 11:24