Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OpenLineage] Remove redundant operator information from facets #38264

Conversation

kacpermuda
Copy link
Contributor

@kacpermuda kacpermuda commented Mar 18, 2024

Problem

We encountered a case where a customer received an overly large OpenLineage START event, exceeding 2MB. This was traced back to an operator with unusually long arguments and attributes.

Solution

I propose refining the operator's attribute inclusion logic in facets. Instead of excluding known unimportant or large attributes, we should selectively include only those known to be important or compact. This approach ensures that custom operator attributes with substantial data do not inflate the event size.

I also deprecated two facets that contain duplicate information from AirflowRunFacet. We should probably remove them in the future.

Similar PR in openlineage-airflow


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@kacpermuda kacpermuda force-pushed the ol/remove-redundant-task-information-from-facets branch from a7ac157 to 8c26dd6 Compare March 18, 2024 15:15
@kacpermuda kacpermuda requested a review from potiuk as a code owner March 18, 2024 15:15
@kacpermuda kacpermuda force-pushed the ol/remove-redundant-task-information-from-facets branch 2 times, most recently from 4843355 to e8124d8 Compare March 18, 2024 15:54
Copy link
Contributor

@JDarDagran JDarDagran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work with deprecation and making get_unknown_source_attribute_run_facet in utils 🚀

@kacpermuda kacpermuda force-pushed the ol/remove-redundant-task-information-from-facets branch 4 times, most recently from 3b09779 to ef156e3 Compare March 26, 2024 16:04
@kacpermuda kacpermuda force-pushed the ol/remove-redundant-task-information-from-facets branch from ef156e3 to 1bedc07 Compare April 4, 2024 07:37
@kacpermuda kacpermuda force-pushed the ol/remove-redundant-task-information-from-facets branch from 1bedc07 to 8f0708d Compare April 4, 2024 08:30
@mobuchowski mobuchowski merged commit ecd6995 into apache:main Apr 4, 2024
40 checks passed
@kacpermuda kacpermuda deleted the ol/remove-redundant-task-information-from-facets branch April 8, 2024 16:23
utkarsharma2 pushed a commit to astronomer/airflow that referenced this pull request Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants