[PROPOSAL OpenLineage#2161] Add a Registry of Producers and Consumers in OpenLineage (OpenLineage#2228)

* add proposal for OpenLineage registry

Signed-off-by: Julien Le Dem <[email protected]>
Signed-off-by: Julien Le Dem <julien.ledem@nlb-int-svc-proxy-06a01ef5ff9ffd47.elb.us-gov-west-1.amazonaws.com>

* fix headers

Signed-off-by: Julien Le Dem <[email protected]>
Signed-off-by: Julien Le Dem <julien.ledem@nlb-int-svc-proxy-06a01ef5ff9ffd47.elb.us-gov-west-1.amazonaws.com>

* improve clarity

Signed-off-by: Julien Le Dem <[email protected]>
Signed-off-by: Julien Le Dem <julien.ledem@nlb-int-svc-proxy-06a01ef5ff9ffd47.elb.us-gov-west-1.amazonaws.com>

* Proposed requirements and expanded spec for facet registrations (#2237)

* add proposal for OpenLineage registry

Signed-off-by: Julien Le Dem <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* fix headers

Signed-off-by: Julien Le Dem <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* improve clarity

Signed-off-by: Julien Le Dem <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Added a requirement
Trying out a commit in a new branch...Hopefully I did it correctly :D

Signed-off-by: Sheeri K. Cabral <[email protected]>

* Added an entry for "core" to propose changes to the spec with a "known" starting point.

This entry:
---
root_doc_URL: "https://openlineage.io/spec/facets/"
produced_facets: [
  "ol:core:1-0-0/ColumnLineageDatasetFacet.json",
  "ol:core:1-0-1/ColumnLineageDatasetFacet.json",
  "ol:core:1-0-0/DataQualityAssertionsDatasetFacet.json"
]
---

indicates that the documentation for the produced facets is at:
https://openlineage.io/spec/facets/1-0-0/ColumnLineageDatasetFacet.json
https://openlineage.io/spec/facets/1-0-1/ColumnLineageDatasetFacet.json
https://openlineage.io/spec/facets/1-0-0/DataQualityAssertionsDatasetFacet.json
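
The mapping from a facet identifier to its documentation URL can be sketched as below. This is only an illustration inferred from the example above, not a rule stated by the proposal: it assumes the identifier has the shape `ol:<owner>:<version>/<file>` and that everything after the second colon is appended to `root_doc_URL`. The function name `facet_doc_url` is hypothetical.

```python
def facet_doc_url(root_doc_url: str, facet_id: str) -> str:
    """Build the documentation URL for one registry facet identifier.

    Assumption (inferred from the example entry, not from the spec):
    the identifier looks like "ol:<owner>:<version>/<file>", and the
    part after the second colon is a path relative to root_doc_url.
    """
    # Raises ValueError if the identifier does not have three parts.
    _prefix, _owner, relative_path = facet_id.split(":", 2)
    return root_doc_url.rstrip("/") + "/" + relative_path


# The "core" entry from this commit message.
ROOT_DOC_URL = "https://openlineage.io/spec/facets/"
PRODUCED_FACETS = [
    "ol:core:1-0-0/ColumnLineageDatasetFacet.json",
    "ol:core:1-0-1/ColumnLineageDatasetFacet.json",
    "ol:core:1-0-0/DataQualityAssertionsDatasetFacet.json",
]

urls = [facet_doc_url(ROOT_DOC_URL, facet) for facet in PRODUCED_FACETS]
```

With the "core" entry, `urls` holds the three documentation links listed above.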

Signed-off-by: Sheeri K. Cabral <[email protected]>

* Proposing a new formatting of producer/consumer objects instead of an array of strings.

I have modified the "core" example, to show the translation between old and new format.
Added an "egeria" example which is both a producer and consumer.

- Producer and consumer doc root URLs may differ, so they are set inside the producer/consumer object
- It is assumed that the documentation link only applies to the facets owned by the same entity
  - e.g. egeria owns NewCustomFacet so the docs are at the Egeria doc URL
  - egeria does not own ColumnLineageDatasetFacet so there are no docs at the Egeria doc URL (unless it's extended by egeria?)
- "sample_URL" is where the examples/tests can be found (recommended but not required)
- "owner" is added for clarification. egeria produces their own custom facet plus one facet from core in this example
- spec_versions array for compatibility
- use_cases to better create the documentation page
- since there are "consumer" and "producer" objects, "facets" replaces "produced_facets" and "consumed_facets"
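
A rough sketch of how the object format described above might be modelled, to make the ownership rule concrete. The field names `root_doc_URL`, `sample_URL`, `owner`, `spec_versions`, `use_cases` and `facets` come from the list above; everything else is an assumption for illustration only: the class names, the placement of `spec_versions` on the role object, the `example.org` URL, and the helper `documented_facets`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Facet:
    name: str   # e.g. "ColumnLineageDatasetFacet"
    owner: str  # entity that owns the facet, e.g. "core" or "egeria"


@dataclass
class Role:
    """A "producer" or "consumer" object (hypothetical shape)."""
    root_doc_URL: str
    facets: List[Facet]
    sample_URL: Optional[str] = None  # recommended but not required
    spec_versions: List[str] = field(default_factory=list)
    use_cases: List[str] = field(default_factory=list)


def documented_facets(entity: str, role: Role) -> List[Facet]:
    """Facets whose docs live under role.root_doc_URL.

    Follows the stated assumption that the documentation link only
    applies to facets owned by the same entity.
    """
    return [facet for facet in role.facets if facet.owner == entity]


# egeria produces its own custom facet plus one facet from core.
egeria_producer = Role(
    root_doc_URL="https://example.org/egeria/docs/",  # placeholder URL
    facets=[
        Facet(name="NewCustomFacet", owner="egeria"),
        Facet(name="ColumnLineageDatasetFacet", owner="core"),
    ],
)

owned = documented_facets("egeria", egeria_producer)
```

Here `owned` contains only `NewCustomFacet`, since `ColumnLineageDatasetFacet` is owned by core and would be documented at the core doc URL instead.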

Signed-off-by: Sheeri K. Cabral <[email protected]>

* adding another proposed requirement

Signed-off-by: Sheeri K. Cabral <[email protected]>

* added note about accuracy of registry entries

Signed-off-by: Sheeri K. Cabral <[email protected]>

* added Acceptance guidelines

Signed-off-by: Sheeri K. Cabral <[email protected]>

* added a note about reserving names for the future.

Signed-off-by: Sheeri K. Cabral <[email protected]>

* Update registry.md

Removing open questions that we don't have an answer to

Signed-off-by: Sheeri K. Cabral <[email protected]>

* bump 0.30.0 release date (#2002)

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare for release 0.30.0

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare next development version 1.0.0-SNAPSHOT

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* ci: remove java macos arm parser build (#2003)

Signed-off-by: Maciej Obuchowski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare for release 0.30.1

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare next development version 1.0.0-SNAPSHOT

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* fix changelog (#2005)

* fix changelog

Signed-off-by: Michael Robinson <[email protected]>

* add missing change

Signed-off-by: Michael Robinson <[email protected]>

---------

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Remove $ref facets from core spec. (#1997)

Move from boon to jv.

Add test facets.

Add pre-commit usage guide.

Change facets versions bump to REVISION level.

Signed-off-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* airflow: convert lineage from legacy file definition (#2006)

* airflow: convert lineage from legacy file definition

Signed-off-by: Maciej Obuchowski <[email protected]>

* Update integration/airflow/openlineage/airflow/extractors/converters.py

Co-authored-by: JDarDagran <[email protected]>

---------

Signed-off-by: Maciej Obuchowski <[email protected]>
Co-authored-by: JDarDagran <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Fix error message to avoid confusion (#2001)

Signed-off-by: Mars Lan <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* docs: add file transport documentation (#2008)

Signed-off-by: Alexandre Bergere <[email protected]>
Co-authored-by: Alexandre Bergere <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* airflow: make sure we cannot fail in thread despite direct execution (#2010)

Signed-off-by: Maciej Obuchowski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Change log level to DEBUG when extractor isn't found (#2012)

There isn't much a user can do when an extractor is not even available for them to use. So changing this to DEBUG makes more sense IM

Signed-off-by: Kaxil Naik <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.gradle.test-retry from 1.5.3 to 1.5.4 in /integration/spark (#2021)

Bumps org.gradle.test-retry from 1.5.3 to 1.5.4.

---
updated-dependencies:
- dependency-name: org.gradle.test-retry
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.junit.jupiter:junit-jupiter in /client/java (#2022)

Bumps [org.junit.jupiter:junit-jupiter](https://github.com/junit-team/junit5) from 5.9.3 to 5.10.0.
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.9.3...r5.10.0)

---
updated-dependencies:
- dependency-name: org.junit.jupiter:junit-jupiter
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump junit5Version from 5.9.3 to 5.10.0 in /integration/flink (#2015)

Bumps `junit5Version` from 5.9.3 to 5.10.0.

Updates `org.junit.jupiter:junit-jupiter` from 5.9.3 to 5.10.0
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.9.3...r5.10.0)

Updates `org.junit.jupiter:junit-jupiter-params` from 5.9.3 to 5.10.0
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.9.3...r5.10.0)

---
updated-dependencies:
- dependency-name: org.junit.jupiter:junit-jupiter
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: org.junit.jupiter:junit-jupiter-params
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.junit:junit-bom from 5.9.3 to 5.10.0 in /integration/flink (#2016)

Bumps [org.junit:junit-bom](https://github.com/junit-team/junit5) from 5.9.3 to 5.10.0.
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.9.3...r5.10.0)

---
updated-dependencies:
- dependency-name: org.junit:junit-bom
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* http, snowflake: stop using reusable session by default, do not send full event on snowflake complete (#2025)

Signed-off-by: Maciej Obuchowski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* add missing changes to changelog for 1.0.0 release (#2027)

* add missing changes to changelog for 1.0.0 release

Signed-off-by: Michael Robinson <[email protected]>

* add missing changes to changelog for 1.0.0 release continued

Signed-off-by: Michael Robinson <[email protected]>

---------

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare for release 1.0.0

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare next development version 1.1.0-SNAPSHOT

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [SPARK] filter unwanted events (#1987)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.junit:junit-bom from 5.9.3 to 5.10.0 in /integration/spark (#2032)

Bumps [org.junit:junit-bom](https://github.com/junit-team/junit5) from 5.9.3 to 5.10.0.
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.9.3...r5.10.0)

---
updated-dependencies:
- dependency-name: org.junit:junit-bom
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [SPARK] merge into delta integration test (#2026)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [FLINK] read configuration from flink conf (#2033)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump psycopg2-binary from 2.9.6 to 2.9.7 in /integration/airflow (#2031)

Bumps [psycopg2-binary](https://github.com/psycopg/psycopg2) from 2.9.6 to 2.9.7.
- [Changelog](https://github.com/psycopg/psycopg2/blob/master/NEWS)
- [Commits](https://github.com/psycopg/psycopg2/compare/2.9.6...2.9.7)

---
updated-dependencies:
- dependency-name: psycopg2-binary
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Don't use database as fallback when no schema parsed. (#2023)

Signed-off-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* add javadoc to the java client (#2004)

* add javadoc to the java client

Signed-off-by: Julien Le Dem <[email protected]>

* change for compiler version compat

Signed-off-by: Julien Le Dem <[email protected]>

* fix javadoc

Signed-off-by: Julien Le Dem <[email protected]>

---------

Signed-off-by: Julien Le Dem <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* spark: fix wrong naming of JDBC datasets (#2035)

* spark: jdbc namespaces should not have database in them

Signed-off-by: Maciej Obuchowski <[email protected]>

* tests tests

Signed-off-by: Maciej Obuchowski <[email protected]>

---------

Signed-off-by: Maciej Obuchowski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* ci: bump Go image version used in root CI job (#2047)

Signed-off-by: Maciej Obuchowski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [SPARK] append output dataset name to job name (#2036)

* [SPARK] append output dataset name to job name

Signed-off-by: Pawel Leszczynski <[email protected]>

* [SPARK] use dot separator within job name parts

Signed-off-by: Pawel Leszczynski <[email protected]>

---------

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Add codespell pre-commit hook. (#2011)

* Add codespell pre-commit hook.

Fix misspellings.

Signed-off-by: Jakub Dardzinski <[email protected]>

* Shorten allow list.

Signed-off-by: Jakub Dardzinski <[email protected]>

---------

Signed-off-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [SPARK] upgrade latest supported version to 3.4.1 (#2057)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump junit5Version from 5.9.3 to 5.10.0 in /integration/spark (#2018)

Bumps `junit5Version` from 5.9.3 to 5.10.0.

Updates `org.junit.jupiter:junit-jupiter` from 5.9.3 to 5.10.0
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.9.3...r5.10.0)

Updates `org.junit.jupiter:junit-jupiter-params` from 5.9.3 to 5.10.0
- [Release notes](https://github.com/junit-team/junit5/releases)
- [Commits](https://github.com/junit-team/junit5/compare/r5.9.3...r5.10.0)

---
updated-dependencies:
- dependency-name: org.junit.jupiter:junit-jupiter
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: org.junit.jupiter:junit-jupiter-params
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [SPARK] replace dbfs init scripts (#2055)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [FLINK] fix a bug when getting schema for KafkaSink (#2042)

* fix a bug when getting schema for KafkaSink

Signed-off-by: pentium3 <[email protected]>

* fix a bug when getting schema for KafkaSink

Signed-off-by: pentium3 <[email protected]>

---------

Signed-off-by: pentium3 <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bug/fix ignored event adaptive spark plan databricks (#2061)

* removed/adaptive_spark_plan from excludedNodes of DatabricksEventFilter. See https://github.com/OpenLineage/OpenLineage/issues/2058

Signed-off-by: Abdallah Terrab <[email protected]>

* added Databricks integration tests testNarrowTransformation and testWideTransformation related to
https://github.com/OpenLineage/OpenLineage/issues/2058

Signed-off-by: Abdallah Terrab <[email protected]>

* gradlew :app:spotlessApply

Signed-off-by: Abdallah Terrab <[email protected]>

* gradlew spotlessApply

Signed-off-by: Abdallah Terrab <[email protected]>

* gradlew spotlessApply
w/ java version "1.8.0_381"

Signed-off-by: Abdallah Terrab <[email protected]>

---------

Signed-off-by: Abdallah Terrab <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* update changelog for 1.1.0 (#2062)

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare for release 1.1.0

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare next development version 1.2.0-SNAPSHOT

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump io.confluent:kafka-schema-registry-client in /integration/flink (#2068)

Bumps [io.confluent:kafka-schema-registry-client](https://github.com/confluentinc/schema-registry) from 7.4.1 to 7.5.0.
- [Commits](https://github.com/confluentinc/schema-registry/compare/v7.4.1...v7.5.0)

---
updated-dependencies:
- dependency-name: io.confluent:kafka-schema-registry-client
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump testcontainersVersion from 1.18.1 to 1.19.0 in /integration/spark (#2064)

Bumps `testcontainersVersion` from 1.18.1 to 1.19.0.

Updates `org.testcontainers:junit-jupiter` from 1.18.1 to 1.19.0
- [Release notes](https://github.com/testcontainers/testcontainers-java/releases)
- [Changelog](https://github.com/testcontainers/testcontainers-java/blob/main/CHANGELOG.md)
- [Commits](https://github.com/testcontainers/testcontainers-java/compare/1.18.1...1.19.0)

Updates `org.testcontainers:postgresql` from 1.18.1 to 1.19.0
- [Release notes](https://github.com/testcontainers/testcontainers-java/releases)
- [Changelog](https://github.com/testcontainers/testcontainers-java/blob/main/CHANGELOG.md)
- [Commits](https://github.com/testcontainers/testcontainers-java/compare/1.18.1...1.19.0)

Updates `org.testcontainers:mockserver` from 1.18.1 to 1.19.0
- [Release notes](https://github.com/testcontainers/testcontainers-java/releases)
- [Changelog](https://github.com/testcontainers/testcontainers-java/blob/main/CHANGELOG.md)
- [Commits](https://github.com/testcontainers/testcontainers-java/compare/1.18.1...1.19.0)

Updates `org.testcontainers:kafka` from 1.18.1 to 1.19.0
- [Release notes](https://github.com/testcontainers/testcontainers-java/releases)
- [Changelog](https://github.com/testcontainers/testcontainers-java/blob/main/CHANGELOG.md)
- [Commits](https://github.com/testcontainers/testcontainers-java/compare/1.18.1...1.19.0)

---
updated-dependencies:
- dependency-name: org.testcontainers:junit-jupiter
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: org.testcontainers:postgresql
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: org.testcontainers:mockserver
  dependency-type: direct:production
  update-type: version-update:semver-minor
- dependency-name: org.testcontainers:kafka
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.xerial:sqlite-jdbc in /integration/spark (#2065)

Bumps [org.xerial:sqlite-jdbc](https://github.com/xerial/sqlite-jdbc) from 3.42.0.0 to 3.42.0.1.
- [Release notes](https://github.com/xerial/sqlite-jdbc/releases)
- [Changelog](https://github.com/xerial/sqlite-jdbc/blob/master/CHANGELOG)
- [Commits](https://github.com/xerial/sqlite-jdbc/compare/3.42.0.0...3.42.0.1)

---
updated-dependencies:
- dependency-name: org.xerial:sqlite-jdbc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.codehaus.groovy:groovy-all in /integration/flink (#2070)

Bumps [org.codehaus.groovy:groovy-all](https://github.com/apache/groovy) from 3.0.18 to 3.0.19.
- [Commits](https://github.com/apache/groovy/commits)

---
updated-dependencies:
- dependency-name: org.codehaus.groovy:groovy-all
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump io.confluent:kafka-avro-serializer in /integration/flink (#2069)

Bumps [io.confluent:kafka-avro-serializer](https://github.com/confluentinc/schema-registry) from 7.4.1 to 7.5.0.
- [Commits](https://github.com/confluentinc/schema-registry/compare/v7.4.1...v7.5.0)

---
updated-dependencies:
- dependency-name: io.confluent:kafka-avro-serializer
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* update jackson (#2071)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.openapi.generator from 6.6.0 to 7.0.0 in /client/java (#2066)

Bumps org.openapi.generator from 6.6.0 to 7.0.0.

---
updated-dependencies:
- dependency-name: org.openapi.generator
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Ci/fix pre commit step (#2072)

* Change python to python3.8.

Signed-off-by: Jakub Dardzinski <[email protected]>

* Bump cache version due to change in cimg image.

Signed-off-by: Jakub Dardzinski <[email protected]>

---------

Signed-off-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [SPARK] fix RDD missing inputs (#2039)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump com.diffplug.spotless from 6.20.0 to 6.21.0 in /integration/flink (#2080)

Bumps com.diffplug.spotless from 6.20.0 to 6.21.0.

---
updated-dependencies:
- dependency-name: com.diffplug.spotless
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.xerial:sqlite-jdbc in /integration/spark (#2077)

Bumps [org.xerial:sqlite-jdbc](https://github.com/xerial/sqlite-jdbc) from 3.42.0.1 to 3.43.0.0.
- [Release notes](https://github.com/xerial/sqlite-jdbc/releases)
- [Changelog](https://github.com/xerial/sqlite-jdbc/blob/master/CHANGELOG)
- [Commits](https://github.com/xerial/sqlite-jdbc/compare/3.42.0.1...3.43.0.0)

---
updated-dependencies:
- dependency-name: org.xerial:sqlite-jdbc
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump com.github.tomakehurst:wiremock in /integration/flink (#2079)

Bumps [com.github.tomakehurst:wiremock](https://github.com/wiremock/wiremock) from 2.27.2 to 3.0.1.
- [Release notes](https://github.com/wiremock/wiremock/releases)
- [Commits](https://github.com/wiremock/wiremock/compare/2.27.2...3.0.1)

---
updated-dependencies:
- dependency-name: com.github.tomakehurst:wiremock
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [FLINK] don't send RUNNING events after COMPLETE (#2075)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* http: use non-deprecated apiKey if loading it from env variables (#2029)

Signed-off-by: Maciej Obuchowski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump pytest from 7.4.0 to 7.4.1 in /integration/airflow (#2081)

Bumps [pytest](https://github.com/pytest-dev/pytest) from 7.4.0 to 7.4.1.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/7.4.0...7.4.1)

---
updated-dependencies:
- dependency-name: pytest
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* fix: serde filtering (#2044)

* fix: serde filtering

Signed-off-by: Xiang Li <[email protected]>

* Rewrite lambda function.

Signed-off-by: Jakub Dardzinski <[email protected]>

---------

Signed-off-by: Xiang Li <[email protected]>
Signed-off-by: Jakub Dardzinski <[email protected]>
Co-authored-by: Xiang Li <[email protected]>
Co-authored-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Remove `sqlparser` main dependency in ifaces. (#2090)

Signed-off-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* spark: publish ProcessingEngineRunFacet (#2089)

* spark: publish ProcessingEngineRunFacet

Previously, the Spark integration published a custom facet named
'SparkVersion'. As it is custom, it isn't defined in the OpenLineage
spec. However, the OpenLineage spec defines a ProcessingEngineRunFacet
that is meant to capture details about the thing that runs a job.

This change introduces this in the form of the
'SparkProcessingEngineRunFacetBuilder' and
'SparkProcessingEngineRunFacetBuilderDelegate'.

As the names suggest, these classes are meant to create and populate
ProcessingEngineRunFacet.

The delegate exists because two code paths interact with the run
facets: one within the RddExecutionContext and the other in the
SparkSqlExecutionContext, albeit in a very roundabout way.

The delegate is the object that actually constructs the facet, whilst
the builder provides an adapter that uses the CustomFacetBuilder
interface.

Yes, it's a hacky way of doing it and may need to be changed in the
future. For now though, it's good enough.

Closes: https://github.com/OpenLineage/OpenLineage/issues/2086
Signed-off-by: Damien Hawes <[email protected]>

* spark: Deprecated the SparkVersionFacet, alongside the version-facet.json

Signed-off-by: Damien Hawes <[email protected]>

* spark: Used a mocked SparkContext instead inside SparkProcessingEngineFacetBuilderTest

Signed-off-by: Damien Hawes <[email protected]>

* changelog: Updated the changelog indicating that the SparkVersionFacet will be removed in 1.4.0

Signed-off-by: Damien Hawes <[email protected]>

---------

Signed-off-by: Damien Hawes <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [SPARK][FLINK] Unify dataset naming from URI objects. (#2083)

* [SPARK][FLINK] Unify dataset naming from URI objects.

Signed-off-by: Pawel Leszczynski <[email protected]>

* [SPARK] move DatasetIdentifier to openlineage-java

Signed-off-by: Pawel Leszczynski <[email protected]>

---------

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* fix ol proxy chart (#2091)

Signed-off-by: Harel Shein <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.slf4j:slf4j-simple from 2.0.7 to 2.0.9 in /client/java (#2096)

Bumps org.slf4j:slf4j-simple from 2.0.7 to 2.0.9.

---
updated-dependencies:
- dependency-name: org.slf4j:slf4j-simple
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [SPARK] verify dataset naming on databricks and limit amount of events sent (#2076)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [CI] fix circle CI caches (#2101)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Capture clusterAllTags variable from databricks (#2099)

* capture clusterAllTags var

Signed-off-by: anirudh.shrinivason <[email protected]>

* Update changelog

Signed-off-by: anirudh.shrinivason <[email protected]>

* Changelog nit

Signed-off-by: anirudh.shrinivason <[email protected]>

---------

Signed-off-by: anirudh.shrinivason <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* python: fix custom http transport TokenProvider (#2100)

* (init)

Signed-off-by: John Lukenoff <[email protected]>

* unlint

Signed-off-by: John Lukenoff <[email protected]>

---------

Signed-off-by: John Lukenoff <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [SPARK] fix failing S3 test on main (#2102)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* fix: Support parsing dbt dbt_project.yml without target-path (#2106)

As of dbt v1.5, usage of target-path in the dbt_project.yml file has been deprecated in favor of a CLI flag or env var. It will be removed in a future version. See dbt-labs/dbt-core#6882

Docs: https://docs.getdbt.com/reference/project-configs/target-path

This change allows users to run DbtLocalArtifactProcessor in dbt projects that don't declare target-path
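
A minimal sketch of the fallback idea, not the actual code of this change: when `target-path` is absent from the parsed dbt_project.yml, fall back to dbt's default directory `target`. The function name `resolve_target_path`, the lookup of the `DBT_TARGET_PATH` environment variable, and the precedence order are assumptions made for illustration.

```python
import os
from typing import Any, Mapping, Optional

DEFAULT_TARGET_PATH = "target"  # dbt's default artifact directory


def resolve_target_path(
    project: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the artifact directory for a parsed dbt_project.yml.

    Assumed precedence: environment variable, then the (deprecated)
    target-path key, then the default. A missing key no longer fails.
    """
    env = os.environ if env is None else env
    return (
        env.get("DBT_TARGET_PATH")
        or project.get("target-path")
        or DEFAULT_TARGET_PATH
    )


# A project file without target-path resolves to the default directory.
path_without_key = resolve_target_path({"name": "jaffle_shop"}, env={})
# A project file that still declares target-path keeps working.
path_with_key = resolve_target_path({"target-path": "out"}, env={})
```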

Fix: #2093

Signed-off-by: Tatiana Al-Chueyr <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* docs: Add openlineage-integration-common PyPI links (#2108)

Signed-off-by: Tatiana Al-Chueyr <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* update changelog for 1.2.0 (#2111)

* update changelog for 1.2.0

---------

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* update changelog (#2112)

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare for release 1.2.0

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare next development version 1.3.0-SNAPSHOT

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [CI] fix checksum for release sql java (#2114)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* update changelog for 1.2.1 (#2119)

* update changelog for 1.2.1

Signed-off-by: Michael Robinson <[email protected]>

* revert changelog change

Signed-off-by: Michael Robinson <[email protected]>

---------

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare for release 1.2.1

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare next development version 1.3.0-SNAPSHOT

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* revert bump: org.openapi.generator required JDK 11 (#2113)

Signed-off-by: Maciej Obuchowski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* update changelog for 1.2.2 (#2120)

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare for release 1.2.2

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare next development version 1.3.0-SNAPSHOT

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Update the openlineage-java client's documentation (#2123)

- Added missing transports documentation to the README.md
- Streamlined descriptions for clarity and consistency across all transport types.
- Organized configuration details and examples for better readability.
- Highlighted key notes and behaviors for each transport.

Signed-off-by: Damien Hawes <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.projectlombok:lombok in /integration/spark (#2125)

Bumps [org.projectlombok:lombok](https://github.com/projectlombok/lombok) from 1.18.28 to 1.18.30.
- [Changelog](https://github.com/projectlombok/lombok/blob/master/doc/changelog.markdown)
- [Commits](https://github.com/projectlombok/lombok/compare/v1.18.28...v1.18.30)

---
updated-dependencies:
- dependency-name: org.projectlombok:lombok
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.projectlombok:lombok in /integration/flink (#2128)

Bumps [org.projectlombok:lombok](https://github.com/projectlombok/lombok) from 1.18.28 to 1.18.30.
- [Changelog](https://github.com/projectlombok/lombok/blob/master/doc/changelog.markdown)
- [Commits](https://github.com/projectlombok/lombok/compare/v1.18.28...v1.18.30)

---
updated-dependencies:
- dependency-name: org.projectlombok:lombok
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.openapi.generator from 6.6.0 to 7.0.1 in /client/java (#2126)

Bumps org.openapi.generator from 6.6.0 to 7.0.1.

---
updated-dependencies:
- dependency-name: org.openapi.generator
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.projectlombok:lombok from 1.18.28 to 1.18.30 in /client/java (#2127)

Bumps [org.projectlombok:lombok](https://github.com/projectlombok/lombok) from 1.18.28 to 1.18.30.
- [Changelog](https://github.com/projectlombok/lombok/blob/master/doc/changelog.markdown)
- [Commits](https://github.com/projectlombok/lombok/compare/v1.18.28...v1.18.30)

---
updated-dependencies:
- dependency-name: org.projectlombok:lombok
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [CI] modify CircleCi resource class for macos (#2133)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.gradle.test-retry from 1.5.4 to 1.5.5 in /integration/spark (#2116)

Bumps org.gradle.test-retry from 1.5.4 to 1.5.5.

---
updated-dependencies:
- dependency-name: org.gradle.test-retry
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* add timers to ol emit calls (#1845)

Add tests for stats.

Signed-off-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump pytest from 7.4.1 to 7.4.2 in /integration/airflow (#2094)

Bumps [pytest](https://github.com/pytest-dev/pytest) from 7.4.1 to 7.4.2.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/7.4.1...7.4.2)

---
updated-dependencies:
- dependency-name: pytest
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* dbt: Add SQLSERVER to supported dbt profile types (#2136)

* Add SQLSERVER to supported dbt profile types

Signed-off-by: Erik Alfthan <[email protected]>

* Signed commit

Signed-off-by: Erik Alfthan <[email protected]>

* Update integration/common/openlineage/common/provider/dbt/processor.py

Co-authored-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Erik Alfthan <[email protected]>

---------

Signed-off-by: Erik Alfthan <[email protected]>
Co-authored-by: Erik Alfthan <[email protected]>
Co-authored-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* #2130 Add columns as schema facet for airflow.lineage.Table (if defined) (#2138)

* #2130 Add columns as schema facet for airflow.lineage.Table (if defined)

Signed-off-by: Erik Alfthan <[email protected]>

* #2130 Format import - revert black formatting on untouched test

Signed-off-by: Erik Alfthan <[email protected]>

* #2130 Format import - revert black formatting on untouched function

Signed-off-by: Erik Alfthan <[email protected]>

* Apply ruff sort

Signed-off-by: Erik Alfthan <[email protected]>

---------

Signed-off-by: Erik Alfthan <[email protected]>
Co-authored-by: Erik Alfthan <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Updated the README.md of the sql parser (#2140)

This just surfaces the list of supported dialects to make it more accessible to readers.

Signed-off-by: Damien Hawes <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Add more graceful logging when no OL provider installed. (#2141)

Signed-off-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Fix find-links path in tox. (#2139)

Signed-off-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump psycopg2-binary from 2.9.7 to 2.9.8 in /integration/airflow (#2146)

Bumps [psycopg2-binary](https://github.com/psycopg/psycopg2) from 2.9.7 to 2.9.8.
- [Changelog](https://github.com/psycopg/psycopg2/blob/master/NEWS)
- [Commits](https://github.com/psycopg/psycopg2/compare/2.9.7...2.9.8)

---
updated-dependencies:
- dependency-name: psycopg2-binary
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Spark: Fixed scheme preservation bug in PathUtils (#2142)

Summary:

Fixed a bug in the PathUtils' prepareDatasetIdentifierFromDefaultTablePath(CatalogTable) method, ensuring the correct scheme preservation from the CatalogTable's location.

Details:

Previously, when generating a DatasetIdentifier from a CatalogTable's default path, the scheme (like "hdfs") could be incorrectly set to "file". This fix addresses the issue, ensuring that the proper scheme from the CatalogTable's location is always preserved.

Impact:

This fix ensures the accuracy and correctness of the DatasetIdentifier's namespace.

Testing:

A unit test was added in PathUtilsTest#testFromCatalogTableShouldReturnADatasetIdentifierWithTheActualScheme

Issue: https://github.com/OpenLineage/OpenLineage/issues/2132

Signed-off-by: Damien Hawes <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump com.diffplug.spotless from 6.21.0 to 6.22.0 in /integration/flink (#2145)

Bumps com.diffplug.spotless from 6.21.0 to 6.22.0.

---
updated-dependencies:
- dependency-name: com.diffplug.spotless
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.apache.avro:avro from 1.11.2 to 1.11.3 in /integration/flink (#2144)

Bumps org.apache.avro:avro from 1.11.2 to 1.11.3.

---
updated-dependencies:
- dependency-name: org.apache.avro:avro
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [SPARK] support for Spark 3.5 (#2118)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Update the changelog (#2148)

* Updates the changelog.

---------

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare for release 1.3.0

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare next development version 1.4.0-SNAPSHOT

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Reverts org.openapi.generator version to enable Java client release. (#2152)

* Reverts org.openapi.generator version to enable java client release.

Signed-off-by: Michael Robinson <[email protected]>

* Configures dependabot to ignore org.openapi.generator.

Signed-off-by: Michael Robinson <[email protected]>

* Updates changelog for 1.3.1.

Signed-off-by: Michael Robinson <[email protected]>

---------

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare for release 1.3.1

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare next development version 1.4.0-SNAPSHOT

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [Flink] expand iceberg source types (#2149)

Signed-off-by: Zhenqiu Huang <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump testcontainersVersion from 1.19.0 to 1.19.1 in /integration/spark (#2154)

Bumps `testcontainersVersion` from 1.19.0 to 1.19.1.

Updates `org.testcontainers:junit-jupiter` from 1.19.0 to 1.19.1
- [Release notes](https://github.com/testcontainers/testcontainers-java/releases)
- [Changelog](https://github.com/testcontainers/testcontainers-java/blob/main/CHANGELOG.md)
- [Commits](https://github.com/testcontainers/testcontainers-java/compare/1.19.0...1.19.1)

Updates `org.testcontainers:postgresql` from 1.19.0 to 1.19.1
- [Release notes](https://github.com/testcontainers/testcontainers-java/releases)
- [Changelog](https://github.com/testcontainers/testcontainers-java/blob/main/CHANGELOG.md)
- [Commits](https://github.com/testcontainers/testcontainers-java/compare/1.19.0...1.19.1)

Updates `org.testcontainers:mockserver` from 1.19.0 to 1.19.1
- [Release notes](https://github.com/testcontainers/testcontainers-java/releases)
- [Changelog](https://github.com/testcontainers/testcontainers-java/blob/main/CHANGELOG.md)
- [Commits](https://github.com/testcontainers/testcontainers-java/compare/1.19.0...1.19.1)

Updates `org.testcontainers:kafka` from 1.19.0 to 1.19.1
- [Release notes](https://github.com/testcontainers/testcontainers-java/releases)
- [Changelog](https://github.com/testcontainers/testcontainers-java/blob/main/CHANGELOG.md)
- [Commits](https://github.com/testcontainers/testcontainers-java/compare/1.19.0...1.19.1)

---
updated-dependencies:
- dependency-name: org.testcontainers:junit-jupiter
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.testcontainers:postgresql
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.testcontainers:mockserver
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: org.testcontainers:kafka
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.scala-lang.modules:scala-collection-compat_2.12 (#2157)

Bumps [org.scala-lang.modules:scala-collection-compat_2.12](https://github.com/scala/scala-collection-compat) from 2.1.2 to 2.11.0.
- [Release notes](https://github.com/scala/scala-collection-compat/releases)
- [Commits](https://github.com/scala/scala-collection-compat/compare/v2.1.2...v2.11.0)

---
updated-dependencies:
- dependency-name: org.scala-lang.modules:scala-collection-compat_2.12
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Allow setting client's endpoint via environment variable (#2151)

* Allow setting client's endpoint via environment variable

Signed-off-by: Mars Lan <[email protected]>

* Fix lint errors

Signed-off-by: Mars Lan <[email protected]>

---------

Signed-off-by: Mars Lan <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.gradle.test-retry from 1.5.5 to 1.5.6 in /integration/spark (#2156)

Bumps org.gradle.test-retry from 1.5.5 to 1.5.6.

---
updated-dependencies:
- dependency-name: org.gradle.test-retry
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump psycopg2-binary from 2.9.8 to 2.9.9 in /integration/airflow (#2155)

Bumps [psycopg2-binary](https://github.com/psycopg/psycopg2) from 2.9.8 to 2.9.9.
- [Changelog](https://github.com/psycopg/psycopg2/blob/master/NEWS)
- [Commits](https://github.com/psycopg/psycopg2/compare/2.9.8...2.9.9)

---
updated-dependencies:
- dependency-name: psycopg2-binary
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [SPARK] add debug facet to help resolving Spark integration issues (#2147)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.apache.kafka:kafka-clients from 3.5.1 to 3.6.0 in /client/java (#2172)

Bumps org.apache.kafka:kafka-clients from 3.5.1 to 3.6.0.

---
updated-dependencies:
- dependency-name: org.apache.kafka:kafka-clients
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump io.confluent:kafka-avro-serializer in /integration/flink (#2174)

Bumps [io.confluent:kafka-avro-serializer](https://github.com/confluentinc/schema-registry) from 7.5.0 to 7.5.1.
- [Commits](https://github.com/confluentinc/schema-registry/compare/v7.5.0...v7.5.1)

---
updated-dependencies:
- dependency-name: io.confluent:kafka-avro-serializer
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.apache.kafka:kafka-clients in /integration/spark (#2171)

Bumps org.apache.kafka:kafka-clients from 3.5.1 to 3.6.0.

---
updated-dependencies:
- dependency-name: org.apache.kafka:kafka-clients
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Enable nessie rest catalog (#2165)

Add a new case to the if-else statement in the getDatasetIdentifier() method to handle Nessie catalogs.
#2084

Signed-off-by: WINKJUL <[email protected]>
Co-authored-by: WINKJUL <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump io.confluent:kafka-schema-registry-client in /integration/flink (#2173)

Bumps [io.confluent:kafka-schema-registry-client](https://github.com/confluentinc/schema-registry) from 7.5.0 to 7.5.1.
- [Commits](https://github.com/confluentinc/schema-registry/compare/v7.5.0...v7.5.1)

---
updated-dependencies:
- dependency-name: io.confluent:kafka-schema-registry-client
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Updates changelog for 1.4.0. (#2178)

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare for release 1.4.0

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare next development version 1.5.0-SNAPSHOT

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Remove `deref()` from openlineage-sql impl. (#2179)

Fix mispell in changelog.

Fix `unwrap_or_else`.

Signed-off-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Updates changelog for 1.4.1. (#2180)

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare for release 1.4.1

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare next development version 1.5.0-SNAPSHOT

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [SPARK] migrate RddExecutionContext to PlanUtils (#2181)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Skip redaction on `ColumnLineageDatasetFacetFieldsAdditionalInputFields`. (#2177)

Signed-off-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump com.amazonaws:amazon-kinesis-producer in /client/java (#2196)

Bumps [com.amazonaws:amazon-kinesis-producer](https://github.com/awslabs/amazon-kinesis-producer) from 0.15.7 to 0.15.8.
- [Release notes](https://github.com/awslabs/amazon-kinesis-producer/releases)
- [Changelog](https://github.com/awslabs/amazon-kinesis-producer/blob/master/CHANGELOG.md)
- [Commits](https://github.com/awslabs/amazon-kinesis-producer/compare/v0.15.7...v0.15.8)

---
updated-dependencies:
- dependency-name: com.amazonaws:amazon-kinesis-producer
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump com.github.davidmc24.gradle.plugin:gradle-avro-plugin (#2193)

Bumps [com.github.davidmc24.gradle.plugin:gradle-avro-plugin](https://github.com/davidmc24/gradle-avro-plugin) from 1.8.0 to 1.9.1.
- [Release notes](https://github.com/davidmc24/gradle-avro-plugin/releases)
- [Changelog](https://github.com/davidmc24/gradle-avro-plugin/blob/master/CHANGES.md)
- [Commits](https://github.com/davidmc24/gradle-avro-plugin/compare/1.8.0...1.9.1)

---
updated-dependencies:
- dependency-name: com.github.davidmc24.gradle.plugin:gradle-avro-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump com.github.davidmc24.gradle.plugin.avro in /integration/flink (#2195)

Bumps [com.github.davidmc24.gradle.plugin.avro](https://github.com/davidmc24/gradle-avro-plugin) from 1.8.0 to 1.9.1.
- [Release notes](https://github.com/davidmc24/gradle-avro-plugin/releases)
- [Changelog](https://github.com/davidmc24/gradle-avro-plugin/blob/master/CHANGES.md)
- [Commits](https://github.com/davidmc24/gradle-avro-plugin/compare/1.8.0...1.9.1)

---
updated-dependencies:
- dependency-name: com.github.davidmc24.gradle.plugin.avro
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [SPARK] fix duplicate COMPLETE events (#2103)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [Flink] support Flink cassandra lineage (#2175)

Signed-off-by: Zhenqiu Huang <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* athena: change dataset name to its location (#2167)

* fix ci

athena: change dataset name to its location

Signed-off-by: dkt-sophie-ly <[email protected]>

* add s3 location in symlink facets

Signed-off-by: dkt-sophie-ly <[email protected]>

* add test for athena extractor

Signed-off-by: dkt-sophie-ly <[email protected]>

* fix ci

Signed-off-by: dkt-sophie-ly <[email protected]>

* constraint airflow version

Signed-off-by: dkt-sophie-ly <[email protected]>

* small fix

Signed-off-by: dkt-sophie-ly <[email protected]>

* fix typo in version

Signed-off-by: dkt-sophie-ly <[email protected]>

---------

Signed-off-by: dkt-sophie-ly <[email protected]>
Co-authored-by: dkt-sophie-ly <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* minor changes to spark/README.md file to avoid (#2202)

hick ups in first time setup of spark integration

Signed-off-by: savan navalgi <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.apache.logging.log4j:log4j-slf4j-impl in /integration/flink (#2205)

Bumps org.apache.logging.log4j:log4j-slf4j-impl from 2.20.0 to 2.21.0.

---
updated-dependencies:
- dependency-name: org.apache.logging.log4j:log4j-slf4j-impl
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump com.typesafe:config from 1.4.2 to 1.4.3 in /integration/flink (#2206)

Bumps [com.typesafe:config](https://github.com/lightbend/config) from 1.4.2 to 1.4.3.
- [Release notes](https://github.com/lightbend/config/releases)
- [Changelog](https://github.com/lightbend/config/blob/main/NEWS.md)
- [Commits](https://github.com/lightbend/config/compare/v1.4.2...v1.4.3)

---
updated-dependencies:
- dependency-name: com.typesafe:config
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.xerial:sqlite-jdbc in /integration/spark (#2207)

Bumps [org.xerial:sqlite-jdbc](https://github.com/xerial/sqlite-jdbc) from 3.43.0.0 to 3.43.2.1.
- [Release notes](https://github.com/xerial/sqlite-jdbc/releases)
- [Changelog](https://github.com/xerial/sqlite-jdbc/blob/master/CHANGELOG)
- [Commits](https://github.com/xerial/sqlite-jdbc/compare/3.43.0.0...3.43.2.1)

---
updated-dependencies:
- dependency-name: org.xerial:sqlite-jdbc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [SPARK] write scala integration test (#2188)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [SPARK] support databricks 13.3. (#2185)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Update fluentd proxy to validate against 2.0 spec. (#2213)

Add unit tests to CI.

Signed-off-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Add always step. (#2182)

Signed-off-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Loosen attrs and requests versions. (#2107)

Remove unnecessary dependency in openlineage-airflow.

Signed-off-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [SPARK] fix bitnami image hash (#2216)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.apache.logging.log4j:log4j-slf4j-impl in /integration/flink (#2217)

Bumps org.apache.logging.log4j:log4j-slf4j-impl from 2.21.0 to 2.21.1.

---
updated-dependencies:
- dependency-name: org.apache.logging.log4j:log4j-slf4j-impl
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Add script to dev for generating release docs for the website (#2219)

* Adds script for generating release doc.

Signed-off-by: Michael Robinson <[email protected]>

* Adds docstring about purpose of script.

Signed-off-by: Michael Robinson <[email protected]>

---------

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Render yaml configs lazily. (#2221)

Signed-off-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare for release 1.5.0

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Prepare next development version 1.6.0-SNAPSHOT

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Updates the changelog for 1.5.0. (#2224)

Signed-off-by: Michael Robinson <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Bump org.xerial:sqlite-jdbc in /integration/spark (#2231)

Bumps [org.xerial:sqlite-jdbc](https://github.com/xerial/sqlite-jdbc) from 3.43.2.1 to 3.43.2.2.
- [Release notes](https://github.com/xerial/sqlite-jdbc/releases)
- [Changelog](https://github.com/xerial/sqlite-jdbc/blob/master/CHANGELOG)
- [Commits](https://github.com/xerial/sqlite-jdbc/compare/3.43.2.1...3.43.2.2)

---
updated-dependencies:
- dependency-name: org.xerial:sqlite-jdbc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* upgrade gradle and jackson (#2233)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Pin yq version. (#2235)

Signed-off-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* revert #2216 as is no longer required (#2234)

Signed-off-by: Pawel Leszczynski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* [FLINK] add option for flink job listener to read from flink conf (#2229)

* allow flink job listener to read config values from stream execution environment

Signed-off-by: ensctom <[email protected]>

* add changelog and update flink readme

Signed-off-by: ensctom <[email protected]>

---------

Signed-off-by: ensctom <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* spec: add clarity to snowflake naming docs (#2223)

Signed-off-by: David Goss <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* add Kafka to naming schema (#2226)

Add missing Kafka to Naming.md

Signed-off-by: Maciej Obuchowski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Update README.md (#2236)

Add latest spark version supported
---------

Signed-off-by: Julien Le Dem <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* Run always workflow really always. (#2238)

Fix spelling.

Signed-off-by: Jakub Dardzinski <[email protected]>
Signed-off-by: Sheeri K. Cabral <[email protected]>

* dagster: support dagster 1.5.x (#2220)

* fix(dagster): fix missing imports in conftest.py

fix missing imports in conftest.py
---
Signed-off-by: George T. C., Lai <[email protected]>
Closes #2043
Signed-off-by: George T. C. Lai <[email protected]>

* fix(dagster): fix missing argument and AttributeError

EventRecordsFilter accepts an event_type as mandatory argument so that
we have to get event records for each event_type.

---

Signed-off-by:
George T. C., Lai <[email protected]>

Closes #2043

Signed-off-by: George T. C. Lai <[email protected]>

* test(dagster): correct tests for utils.py

correct tests for utils.py

---

Signed-off-by: George T. C., Lai
<[email protected]>

Closes #2043

Signed-off-by: George T. C. Lai <[email protected]>

* test(dagster): correct tests for sensor evaluation

correct tests for sensor evaluation

---

Signed-off-by: George T. C., Lai
<[email protected]>

Closes #2043

Signed-off-by: George T. C. Lai <[email protected]>

* fix(dagster): fix missing arguments for sensor

sensor factory method now accepts additional event_type with default set
PIPELINE_EVENTS and STEP_EVENTS for filtering event
records

---

Signed-off-by: George T. C., Lai <[email protected]>

Closes #2043

Signed-off-by: George T. C. Lai <[email protected]>

* docs(dagster): correct requirements for dagster version

correct requirements for Dagster version to 0.15.0+

---

Signed-off-by:
George T. C., Lai <[email protected]>

Closes #2043

Signed-off-by: George T. C. Lai <[email protected]>

* fix(dagster): support for Dagster version >=1.0.0

support for Dagster version >=1.…
1 parent 0865206 commit e53a776
Showing 1 changed file with 267 additions and 0 deletions.
267 changes: 267 additions & 0 deletions proposals/2161/registry.md
@@ -0,0 +1,267 @@
# OpenLineage registry proposal
## Goal
- Allow third parties to register their implementations or custom extensions to make them easy to discover.
- Shorten the “producer” and “schema URL” values.

## Concepts needing a registry
Producers:

- Custom facet prefix to registry of facet schemas
- Producer URI to full URL of producer documentation
- Facets produced
- Facet URI to full facet schema URL

Consumers:

- URL to documentation of facets understood
- Facets consumed

Requirements:

- Producers can create and evolve their custom facets without requiring approval from the OpenLineage project.
- Producers and Consumers can update the list of the facets they produce or consume without requiring approval from the OpenLineage project.
- Consumers can independently discover and support custom facets.
- OpenLineage users can easily explore which producers and/or consumers best meet their compatibility needs.
- URIs should be short (producer URI, facet schema URL).
- A registered name can be both a producer and a consumer.

## Proposal

### Name registration and shorter URIs
Each consumer or producer entity can claim a name, defined in the registry: “{name}”.
Each registered entity provides a URL pointing to its documentation.
The registered name is used to shorten the “producer” and “schemaUrl” fields in facets.

### Core facets
As part of the creation of the registry, the core facets under "spec/facets" will be moved to the registry as well, under the "core" name. They will follow the same constraints as all the other facets in the registry. The "core" name is used to shorten the URIs, for example: "ol:core:{FacetName}".

## Acceptance Guidelines
To claim a name, an entity must have either documentation or a test/sample. "Reserving" a name prior to public functionality is discouraged.

Corresponding values to be used:

- Custom facet prefix = `"${name}"`
- Producer URI prefix = `"ol:${name}"`
- Schema URI: `"ol:${name}:${path}"` => `"${schema url prefix}/${path}"`
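
The expansion rule for schema URIs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `expand_schema_uri` and `SCHEMA_URL_PREFIXES` are hypothetical names, and the lookup table is hard-coded here, whereas a real implementation would presumably derive it from the registry.

```python
from typing import Mapping

# Hypothetical lookup table: registered name -> schema URL prefix.
# The "core" value mirrors where core facets are published today; a real
# implementation would load this mapping from the registry instead.
SCHEMA_URL_PREFIXES = {
    "core": "https://openlineage.io/spec/facets",
}


def expand_schema_uri(uri: str, prefixes: Mapping[str, str]) -> str:
    """Expand 'ol:{name}:{path}' to '{schema url prefix}/{path}'."""
    scheme, _, rest = uri.partition(":")
    if scheme != "ol":
        return uri  # not a shortened registry URI (e.g. a full URL): leave unchanged
    name, sep, path = rest.partition(":")
    if not sep or not name or not path:
        raise ValueError(f"malformed registry URI: {uri!r}")
    if name not in prefixes:
        raise KeyError(f"unregistered name: {name!r}")
    return prefixes[name].rstrip("/") + "/" + path


print(expand_schema_uri("ol:core:1-0-1/ColumnLineageDatasetFacet.json", SCHEMA_URL_PREFIXES))
# prints https://openlineage.io/spec/facets/1-0-1/ColumnLineageDatasetFacet.json
```

Full URLs pass through unchanged, so events that still carry a long `schemaUrl` would keep working alongside shortened ones.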

### CI and Documentation

#### CI validation
- The registry is consistent:
  - Custom facet prefixes match the registered name.
  - The registry has the required fields.
- Linting.
- Custom facet schemas are valid and validated against examples.
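
As a rough illustration of the consistency checks, the sketch below validates a single registry entry. It assumes the field names used in the examples of this proposal (`producer`, `consumer`, `root_doc_URL`, `produced_facets`, `consumed_facets`); the function name `validate_entry` and the exact set of rules are hypothetical, and schema validation against examples is not covered.

```python
from typing import List, Set

# (role, key holding the list of facet URIs); names taken from the
# examples in this proposal, not from a finalized registry schema.
ROLES = (("producer", "produced_facets"), ("consumer", "consumed_facets"))


def validate_entry(name: str, entry: dict, registered_names: Set[str]) -> List[str]:
    """Return a list of human-readable problems found in one registry entry."""
    errors = []
    if not any(role in entry for role, _ in ROLES):
        errors.append(f"{name}: needs a producer or a consumer section")
    for role, facets_key in ROLES:
        section = entry.get(role)
        if section is None:
            continue
        if not section.get("root_doc_URL"):
            errors.append(f"{name}.{role}: missing root_doc_URL")
        for uri in section.get(facets_key, []):
            parts = uri.split(":", 2)  # expected shape: ol:{name}:{path}
            if len(parts) != 3 or parts[0] != "ol" or not parts[2]:
                errors.append(f"{name}.{role}: malformed facet URI {uri!r}")
            elif parts[1] not in registered_names:
                errors.append(f"{name}.{role}: unknown name in {uri!r}")
    return errors


airflow = {
    "producer": {
        "root_doc_URL": "https://airflow/doc",
        "produced_facets": [
            "ol:airflow:AirflowRunFacet.json",
            "ol:core:1-0-0/DatasetVersionDatasetFacet.json",
        ],
    }
}
print(validate_entry("airflow", airflow, {"airflow", "core"}))  # prints []
```

Returning a list of problems rather than raising on the first one lets a CI job report every inconsistency in a single run.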

#### The registry will be used to publish documentation in CI
- A page similar to our ecosystem page that lists producers and consumers, with links to their documentation and the facets they support.
- Custom facets published at their schema URL, like the current core facets, for example `https://openlineage.io/spec/facets/1-0-1/ColumnLineageDatasetFacet.json`.
- Documentation generated from the JSON schemas of the facets and published on openlineage.io.
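
The ecosystem listing could be generated from the registry along these lines. This is a minimal sketch, assuming the registry has already been loaded into a dictionary keyed by registered name and using the field names from the examples in this proposal; `render_ecosystem_table` is a hypothetical helper, not an existing tool.

```python
# (role, key holding the list of facet URIs), as in this proposal's examples.
ROLES = (("producer", "produced_facets"), ("consumer", "consumed_facets"))


def render_ecosystem_table(registry: dict) -> str:
    """Render one markdown table row per (name, role) found in the registry."""
    lines = [
        "| Name | Role | Documentation | Facets |",
        "| --- | --- | --- | --- |",
    ]
    for name in sorted(registry):
        for role, facets_key in ROLES:
            section = registry[name].get(role)
            if section is None:
                continue  # this name does not act in this role
            doc_url = section.get("root_doc_URL", "")
            facets = ", ".join(section.get(facets_key, []))
            lines.append(f"| {name} | {role} | {doc_url} | {facets} |")
    return "\n".join(lines)


registry = {
    "manta": {"consumer": {"root_doc_URL": "https://manta.com/doc", "consumed_facets": []}},
    "airflow": {
        "producer": {
            "root_doc_URL": "https://airflow/doc",
            "produced_facets": ["ol:airflow:AirflowRunFacet.json"],
        }
    },
}
print(render_ecosystem_table(registry))
```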

#### Additional documentation
We can create a documentation page per use case to document which facets can be used in which case:

- Compliance with privacy laws (GDPR, CCPA, …)
- Compliance with Banking regulation (BCBS-239)
- Data reliability, data quality
- Data discovery, data catalog
- Data Governance
- Data Lineage

We should have a page explaining how a custom facet can be promoted to a core facet through the OpenLineage Proposal process.

### The Registry

We propose a self-contained registry hosted in the OpenLineage repository.

**TL;DR: The registry defines consumer and producer names and contains all their custom facets.**

Access control is delegated using `CODEOWNERS` files: each participant in the ecosystem owns their own folder.

Structure:

```
OpenLineage
  CODEOWNERS          <- Delegate approval to the owners of each spec/registry/{Name}/ folder
  spec/registry/
    {Name}/           <- custom facet schemas are stored in this folder
                      <- and respect the same rules as spec/facets
      registry.json:  <- one file per participant
        Producer:
          Producer root doc URL: https://…
          Produced facets:
            { facets: [ "{URI}", "{URI}", … ]}
        Consumer:
          Consumer root doc URL: https://…
          Consumed facets:
            { facets: [ "{URI}", "{URI}", … ]}
      /facets/        <- where custom facet schemas are stored
      /facets/examples/{FacetName}/{number}.json <- where facet examples are stored
```
Facet examples are currently in [spec/tests](https://github.com/OpenLineage/OpenLineage/blob/main/spec/tests/ColumnLineageDatasetFacet/1.json)
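
To make the delegation concrete, the sketch below generates one `CODEOWNERS` line per registry folder, using the standard `<path pattern> <owner> ...` line format. The helper name and the owner handle are hypothetical placeholders; the proposal does not say whether these lines would be generated or maintained by hand.

```python
from typing import Dict, List


def codeowners_lines(owners_by_name: Dict[str, List[str]]) -> List[str]:
    """Build one CODEOWNERS line per registered name: '<folder> <owner> ...'."""
    lines = []
    for name in sorted(owners_by_name):
        owners = " ".join(owners_by_name[name])
        lines.append(f"/spec/registry/{name}/ {owners}")
    return lines


# The owner handle below is a made-up placeholder, not a real team.
for line in codeowners_lines({"airflow": ["@example-org/airflow-maintainers"]}):
    print(line)
# prints /spec/registry/airflow/ @example-org/airflow-maintainers
```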

Examples:

In `OpenLineage/spec/registry/`

```
airflow/
  registry.json
    {
      "producer": {
        "root_doc_URL": "https://airflow/doc",
        "produced_facets": [
          "ol:airflow:AirflowRunFacet.json",
          "ol:core:1-0-0/DatasetVersionDatasetFacet.json"
        ]
      }
    }
  facets/AirflowRunFacet.json
```
```
core/
  registry.json
    {
      "producer": {
        "root_doc_URL": "https://openlineage.io/spec/facets/",
        "sample_URL": "https://github.com/OpenLineage/OpenLineage/tree/main/spec/tests/",
        "facets": {
          "ColumnLineageDatasetFacet.json": {
            "owner": "core"
          },
          "DataQualityAssertionsDatasetFacet.json": {
            "owner": "core"
          }
        }
      }
    }
```
```
egeria/
  registry.json
    {
      "producer": {
        "root_doc_URL": … ,
        "sample_URL": … ,
        "facets": {
          "ColumnLineageDatasetFacet.json": {
            "owner": "core"
          },
          "NewCustomFacet.json": {
            "owner": "egeria"
          }
        }
      },
      "consumer": {
        "root_doc_URL": … ,
        "facets": {
          "NewCustomFacet.json": {
            "owner": "egeria"
          }
        }
      }
    }
```
```
manta/
  registry.json
    {
      "consumer": {
        "root_doc_URL": "https://manta.com/doc",
        "consumed_facets": [ … ]
      }
    }
```

These files get published on openlineage.io just like the official spec:

- `https://openlineage.io/spec/2-0-1/OpenLineage.json`
- `https://openlineage.io/spec/facets/1-0-1/ColumnLineageDatasetFacet.json`

The Airflow producer should now use:

- Custom facet prefix = `airflow`
- Producer URI prefix = `ol:airflow`
- Schema URI prefix: `ol:airflow:`
- Schema URI: `ol:airflow:AirflowRunFacet.json` => `https://github.com/apache/airflow/providers/openlineage/schemas/AirflowRunFacet.json`

Pros:

- Once the name is registered, producers and consumers define codeowners to approve changes to the registry (documentation location, custom facets, …)
- CI can guarantee that the changes to the registry do not make it inconsistent.
- Producers do not need to host and maintain their own subset of the registry.
- Publication is automated and consistent with the current core spec.

Cons:

- Registered entities must maintain their list of codeowners to guarantee that they can keep updating their own definition.


### Alternative considered
Minimizing what is in the central registry and referring to externally hosted artifacts:

**TL;DR: A central registry that defines only names for producers and consumers but externalizes custom facets and what is consumed/produced.**

The OpenLineage repository contains a single `registry.json` file in `spec/registry`, structured as follows:

```
OpenLineage/spec/registry/registry.json:
  "{Name}"
    Producer:
      Producer root doc URL: https://…
      Schema URL prefix: actual longer URL where schemas are found: https://…
      Produced facets:
        URL to a json doc containing the list of facet schemas produced
          { facets: [ "{URI}", "{URI}", … ]}
    Consumer:
      Consumer root doc URL: https://…
      Consumed facets:
        URL to a json doc containing the list of facet schemas consumed
          { facets: [ "{URI}", "{URI}", … ]}
```

Example:

`registry.json`

```
{
  "airflow": {
    "producer": {
      "root_doc_URL": "https://airflow/doc",
      "schema_URL_prefix": "https://github.com/apache/airflow/providers/openlineage/schemas/",
      "produced_facets": "https://github.com/apache/airflow/providers/openlineage/produced_facets.json"
    }
  },
  "manta": {
    "consumer": {
      "root_doc_URL": "https://manta.com/doc",
      "consumed_facets": "https://manta.com/consumedFacets.json"
    }
  }
}
```

`https://github.com/apache/airflow/providers/openlineage/produced_facets.json`

```
{
  "facets": [
    "ol:airflow:AirflowRunFacet.json",
    "ol:core:1-0-0/DatasetVersionDatasetFacet.json"
  ]
}
```
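
Under this alternative, anyone reading the registry needs a second lookup to learn which facets a producer emits. The sketch below illustrates that indirection. It is an assumption-laden example: `load_produced_facets` is a hypothetical helper, and the `fetch` callable stands in for an HTTP client so the example runs without network access.

```python
import json
from typing import Callable, List


def load_produced_facets(registry: dict, name: str, fetch: Callable[[str], str]) -> List[str]:
    """Follow the external pointer stored in the central registry.

    `fetch` takes a URL and returns the response body as text. It is
    injected so callers can plug in any HTTP client (or a stub, as below).
    """
    url = registry[name]["producer"]["produced_facets"]
    document = json.loads(fetch(url))  # fails if the external file is missing or invalid
    return document["facets"]


registry = {
    "airflow": {
        "producer": {
            "root_doc_URL": "https://airflow/doc",
            "produced_facets": "https://github.com/apache/airflow/providers/openlineage/produced_facets.json",
        }
    }
}


def fake_fetch(url: str) -> str:
    # Stub standing in for a real HTTP GET; returns the example document above.
    return json.dumps({"facets": ["ol:airflow:AirflowRunFacet.json",
                                  "ol:core:1-0-0/DatasetVersionDatasetFacet.json"]})


print(load_produced_facets(registry, "airflow", fake_fetch))
```

Because the second document lives outside the core repository, any check built on it depends on the external host being reachable and up to date.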
Pros:

- Once the name is registered, producers and consumers fully own the lifecycle of their facets without relying on any core repo interaction

Cons:

- Producers need to host and maintain their own subset of the registry.
- The core repo CI cannot guarantee consistency of the registry, as it relies on external references.
- The core repo cannot guarantee accuracy of registry entries (e.g. does an entity actually consume/produce the facets they say they do?).
