[AIRFLOW-6685] ThresholdCheckOperator #7353

Merged: turbaszek merged 4 commits into apache:master on Mar 30, 2020

alexzlue
Contributor

@alexzlue alexzlue commented Feb 3, 2020

This PR adds a new operator to the CheckOperator file that lets users perform a threshold data quality check.

ThresholdCheckOperator checks a single-value SQL result against a threshold range and fails the task if the value is outside this range. The lower and upper bounds of the threshold can each be defined as either a numeric value or a SQL statement that returns a numeric value.
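
To make the described behaviour concrete, here is a minimal sketch of the check logic outside Airflow. It is not the code from this PR: the helper names `resolve_threshold` and `threshold_check`, the `orders` table, the use of `sqlite3` as the database, and the inclusive bounds are all assumptions made for illustration.

```python
import sqlite3
from numbers import Real


def resolve_threshold(conn, threshold):
    """Return a float from either a numeric literal or a SQL statement."""
    if isinstance(threshold, Real):
        return float(threshold)
    # Assumption: a string threshold is a SQL statement yielding one numeric cell.
    row = conn.execute(threshold).fetchone()
    if row is None or row[0] is None:
        raise ValueError(f"Threshold query returned no value: {threshold}")
    return float(row[0])


def threshold_check(conn, sql, min_threshold, max_threshold):
    """Raise ValueError if the single value returned by `sql` is out of range."""
    row = conn.execute(sql).fetchone()
    if row is None or row[0] is None:
        raise ValueError(f"Check query returned no value: {sql}")
    value = float(row[0])
    low = resolve_threshold(conn, min_threshold)
    high = resolve_threshold(conn, max_threshold)
    # Assumption: both bounds are inclusive.
    if not low <= value <= high:
        raise ValueError(f"{value} is outside [{low}, {high}]")
    return value


# Hypothetical data set used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?)", [(10.0,), (20.0,), (30.0,)])

# Numeric lower bound, SQL upper bound.
checked = threshold_check(
    conn,
    "SELECT AVG(amount) FROM orders",
    min_threshold=5,
    max_threshold="SELECT MAX(amount) FROM orders",
)
```

In an Airflow task, raising an exception is what marks the task as failed; the sketch uses `ValueError` only to stay free of Airflow imports.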


Issue link: AIRFLOW-6685

Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
  • Unit tests coverage for changes (not needed for documentation changes)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

* For document-only changes commit message can start with [AIRFLOW-XXXX].


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@boring-cyborg

boring-cyborg bot commented Feb 3, 2020

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst).
Here are some useful points:

  • Pay attention to the quality of your code (flake8, pylint and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://apache-airflow-slack.herokuapp.com/

@alexzlue alexzlue force-pushed the AIRFLOW-6685 branch 7 times, most recently from 5c3b6bc to e4d2930 Compare February 4, 2020 23:54
@codecov-io

codecov-io commented Feb 5, 2020

Codecov Report

Merging #7353 into master will increase coverage by 0.17%.
The diff coverage is 96.66%.

@@            Coverage Diff             @@
##           master    #7353      +/-   ##
==========================================
+ Coverage   86.35%   86.52%   +0.17%     
==========================================
  Files         871      874       +3     
  Lines       40627    41841    +1214     
==========================================
+ Hits        35083    36203    +1120     
- Misses       5544     5638      +94
Impacted Files                                         Coverage Δ
airflow/operators/check_operator.py                    93.33% <96.66%> (+0.74%) ⬆️
airflow/kubernetes/volume_mount.py                     44.44% <0%> (-55.56%) ⬇️
airflow/kubernetes/volume.py                           52.94% <0%> (-47.06%) ⬇️
airflow/kubernetes/pod_launcher.py                     47.18% <0%> (-45.08%) ⬇️
...viders/cncf/kubernetes/operators/kubernetes_pod.py  69.38% <0%> (-24.23%) ⬇️
airflow/kubernetes/refresh_config.py                   50.98% <0%> (-23.53%) ⬇️
airflow/utils/sqlalchemy.py                            88.67% <0%> (-7.99%) ⬇️
airflow/api/common/experimental/__init__.py            92.59% <0%> (-7.41%) ⬇️
airflow/config_templates/airflow_local_settings.py     65.38% <0%> (-6.36%) ⬇️
airflow/stats.py                                       85.29% <0%> (-5.19%) ⬇️
... and 25 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2195bb4...b18c8f5. Read the comment docs.

@alexzlue alexzlue requested a review from mik-laj February 5, 2020 01:11
@alexzlue alexzlue force-pushed the AIRFLOW-6685 branch 2 times, most recently from f6c003b to e85e036 Compare February 5, 2020 19:58
@eladkal
Contributor

eladkal commented Feb 13, 2020

In general, the operators in this PR sound like an enhancement of CheckOperator.

  1. Why does it need to be in a new file?
  2. Don't CheckOperator and BaseDataQualityOperator have the same functionality?

@alexzlue
Contributor Author

@eladkal Thanks for bringing this to mind. I do see that there is some functionality that I have that CheckOperator does not have. I will work on merging some of my work into this file then.

@alexzlue alexzlue changed the title [AIRFLOW-6685] Data Quality Check operators [AIRFLOW-6685] ThresholdCheckOperator Feb 14, 2020

def push(self, meta_data):
    """
    Optional: Send data check info and metadata to an external database.
Contributor

How can this be set?

Contributor Author

When inheriting from this class, push can be overridden.
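
As a rough illustration of that answer, the sketch below overrides `push` in a subclass. `ThresholdCheckSketch` is a hypothetical stand-in written for this example, not the operator from this PR; the metadata keys, the no-op default, and the choice to call `push` before failing are assumptions.

```python
class ThresholdCheckSketch:
    """Hypothetical stand-in for the operator; not the Airflow class."""

    def __init__(self, min_threshold, max_threshold):
        self.min_threshold = min_threshold
        self.max_threshold = max_threshold

    def push(self, meta_data):
        """Optional hook. Assumed default for this sketch: do nothing."""

    def execute(self, result):
        meta_data = {
            "result": result,
            "min_threshold": self.min_threshold,
            "max_threshold": self.max_threshold,
            "within_threshold": self.min_threshold <= result <= self.max_threshold,
        }
        # Assumption: metadata is pushed before the check fails.
        self.push(meta_data)
        if not meta_data["within_threshold"]:
            raise ValueError(f"{result} is outside the threshold range")
        return result


class RecordingThresholdCheck(ThresholdCheckSketch):
    """Subclass that overrides push to store the metadata somewhere else."""

    def __init__(self, min_threshold, max_threshold):
        super().__init__(min_threshold, max_threshold)
        self.records = []  # stands in for an external database

    def push(self, meta_data):
        self.records.append(meta_data)


check = RecordingThresholdCheck(min_threshold=0, max_threshold=10)
check.execute(5)
```

A real subclass would replace the list with a write to whatever store the team uses for data quality results.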

Contributor

@eladkal eladkal left a comment

Looks good.

@turbaszek turbaszek merged commit 4c6ae18 into apache:master Mar 30, 2020
@boring-cyborg

boring-cyborg bot commented Mar 30, 2020

Awesome work, congrats on your first merged pull request!

kaxil pushed a commit that referenced this pull request Apr 1, 2020
* [AIRFLOW-6685] Data Quality Check operators

* removed .get_connection to get hook in get_sql_value

* added tests for get_sql_value

* threshold check operator and tests added to checkoperator file

(cherry picked from commit 4c6ae18)