[AIRFLOW-4811] Implement GCP DLP' Hook and Operators #5539
Alternative implementation is available: |
1c99ca2 to 24a2c4a
@mik-laj Any ideas about the pylint failure in dataproc_operator? Can we just disable C0302 for that file?
@ryanyuan Yes. You can disable this warning for this file. We do not have a good way to deal with this problem.
@ryanyuan It seems that there are syntax errors in ecb67a2#diff-a67b10df84accccf123dba4c948582fdR37 and ecb67a2#diff-3e59770742eafcf9f0447065d33017fcR64.
@zzlbuaa Good catch. Cheers!
@ryanyuan @mik-laj create_dlp_job is an asynchronous call (it returns while the job is still pending or running), and the job can take seconds, minutes, or hours to run depending on the size of the inspected data. One feature we would like is a RunDlpJobOperator that creates a dlpJob and keeps polling its status via get_dlp_job until the job is done, canceled, or failed. Do you have any suggestions on where the actual looping and status polling should live: in a hook function or in the operator? I have that operator in my implementation, and it does the actual looping in a hook function: For more info about dlpJob: https://cloud.google.com/dlp/docs/inspecting-storage
@zzlbuaa Nice suggestion! I would put that logic in the hook and expose an optional parameter, true by default, in both the hook and the operator so that users can decide whether they want to wait for the job to finish.
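The wait-or-return behaviour discussed here could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `wait_for_job`, `wait_until_finished`, `poll_interval`, `JobFailedError` and the `FakeClient` stand-in are hypothetical names, and job states are modelled as plain strings (PENDING, RUNNING, DONE, CANCELED, FAILED) rather than the client library's enum.

```python
import time


class JobFailedError(Exception):
    """Raised when a job ends in a non-successful or unexpected state."""


def wait_for_job(get_job, job_id, wait_until_finished=True, poll_interval=0.0):
    """Fetch a job and, optionally, poll until it reaches a terminal state.

    `get_job` is a hypothetical callable that takes a job ID and returns
    a dict with a "state" key; it stands in for a hook's get_dlp_job.
    """
    job = get_job(job_id)
    if not wait_until_finished:
        # Caller opted out of waiting: return whatever state the job is in.
        return job
    while True:
        state = job["state"]
        if state == "DONE":
            return job
        if state in ("CANCELED", "FAILED"):
            raise JobFailedError(f"Job {job_id} ended in state {state}")
        if state not in ("PENDING", "RUNNING"):
            raise JobFailedError(f"Job {job_id} is in unexpected state {state}")
        time.sleep(poll_interval)
        job = get_job(job_id)


class FakeClient:
    """Test double that replays a fixed sequence of job states."""

    def __init__(self, states):
        self._states = iter(states)

    def get_job(self, job_id):
        return {"name": job_id, "state": next(self._states)}


client = FakeClient(["PENDING", "RUNNING", "DONE"])
finished = wait_for_job(client.get_job, "job-1")
```

Keeping the loop in one function and passing the flag through from the operator would let both entry points share the same behaviour; a real implementation would likely also want a timeout.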
c51f875 to 369c295
@mik-laj PTAL
Today I escalated the question about which library to use for this service to a Google employee. If I do not get a response within 3 working days, I will do a review.
I got a reply. New operators should use google-cloud-python where possible, so I will review this PR.
Hello. I see you are adding a lot of operators and GCP integrations.
I am also available at: [email protected]
@mik-laj I just sent you an email.
Creates a job trigger to run DLP actions such as scanning storage for sensitive
information on a set schedule.

:param project_id: (Optional) Google Cloud Platform project ID where the
The current code reads the project ID only from the parameters. Am I wrong? Would it be worth adding support for reading the project ID from the configuration?
client = self.get_conn()

if not project_id:
This looks like dead code, because the fallback_to_default_project_id decorator
prevents this part of the code from executing.
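To illustrate why such a check becomes unreachable, here is a simplified stand-in for a decorator of this kind. It is a sketch written for this comment, not Airflow's actual implementation: the `default_project_id` attribute, `MissingProjectError` and `FakeHook` are assumptions made for the example.

```python
import functools


class MissingProjectError(Exception):
    """Raised when no project ID is passed and no default is configured."""


def fallback_to_default_project_id(func):
    """Simplified stand-in: fill in project_id from the hook or raise."""

    @functools.wraps(func)
    def wrapper(self, *args, project_id=None, **kwargs):
        project_id = project_id or self.default_project_id
        if not project_id:
            raise MissingProjectError("project_id is required")
        # From here on, project_id is guaranteed to be truthy.
        return func(self, *args, project_id=project_id, **kwargs)

    return wrapper


class FakeHook:
    """Hypothetical hook with an optional default project."""

    def __init__(self, default_project_id=None):
        self.default_project_id = default_project_id

    @fallback_to_default_project_id
    def get_dlp_job(self, dlp_job_id, project_id=None):
        if not project_id:
            # Unreachable: the decorator has already raised in this case.
            raise MissingProjectError("never raised from here")
        return f"projects/{project_id}/dlpJobs/{dlp_job_id}"


path = FakeHook(default_project_id="my-project").get_dlp_job("job-1")
```

Under these assumptions the inner `if not project_id` branch can never run, so it could be dropped wherever the decorator is applied.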
:param organization_id: (Optional) The organization ID. Required to set this
    field if parent resource is an organization.
:type organization_id: str
:param project_id: (Optional) Google Cloud Platform project ID where the
The current code reads the project ID only from the parameters. Am I wrong? Would it be worth adding support for reading the project ID from the configuration?
client = self.get_conn()

if not project_id:
This looks like dead code, because the fallback_to_default_project_id decorator
prevents this part of the code from executing.
)
def test_get_dlp_job_without_dlp_job_id(self, _):
    with self.assertRaises(AirflowException):
        self.hook.get_dlp_job(dlp_job_id=None)
This test checks for the absence of project_id.
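One reading of this comment is that, with no project_id supplied, the exception may come from the missing project rather than from the missing dlp_job_id. A sketch of how the two cases could be separated is below. `FakeDlpHook`, its error messages and the local `AirflowException` class are stand-ins invented for the example, not the hook or tests in this PR.

```python
import unittest


class AirflowException(Exception):
    """Local stand-in for airflow.exceptions.AirflowException."""


class FakeDlpHook:
    """Hypothetical hook that validates project_id before dlp_job_id."""

    def get_dlp_job(self, dlp_job_id, project_id=None):
        if not project_id:
            raise AirflowException("project_id is required")
        if not dlp_job_id:
            raise AirflowException("dlp_job_id is required")
        return {"name": f"projects/{project_id}/dlpJobs/{dlp_job_id}"}


class TestGetDlpJob(unittest.TestCase):
    def setUp(self):
        self.hook = FakeDlpHook()

    def test_get_dlp_job_without_dlp_job_id(self):
        # Pass a project explicitly so only the job ID check can fail.
        with self.assertRaisesRegex(AirflowException, "dlp_job_id"):
            self.hook.get_dlp_job(dlp_job_id=None, project_id="test-project")

    def test_get_dlp_job_without_project_id(self):
        # Pass a job ID so only the project check can fail.
        with self.assertRaisesRegex(AirflowException, "project_id"):
            self.hook.get_dlp_job(dlp_job_id="job-1", project_id=None)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestGetDlpJob)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Matching on the message, as `assertRaisesRegex` does here, is one way to make sure the test fails for the intended reason.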
I found a few very minor problems. I prepared the commit. Can you check if the fixes suit you?
curl https://termbin.com/rmts | git am
I apologize for the long review time; I have had a lot of Airflow work recently. I have changed how I organize my work so that I will have more time for community reviews.
@mik-laj It looks like your patch is the same as my commit. Did you export the wrong commit?
Yes, I exported the wrong commit.
If we reach a quick consensus, I will try to add these changes to the 1.10.4 release.
@mik-laj Cheers, I will have a look now.
@mik-laj |
@ryanyuan Yes. We only need it when there is no decorator. |
Implement GCP DLP' Hook and Operators
We are waiting for Travis. :-D
@ryanyuan Can you add type hints in a separate PR? I have accepted the current version because type hints are not fully supported in 1.10.4. In version 2.0 I would like all hooks and operators to have type hints.
@mik-laj No problem. I will work on that.
Implement GCP DLP' Hook and Operators (cherry picked from commit 6ef0e37)
Implement GCP DLP' Hook and Operators
Make sure you have checked all steps below.
Jira
Description
Implement GCP DLP' Hook and Operators using google.cloud.dlp_v2 from Google Cloud Client Libraries for Python.
Tests
tests.contrib.operators.test_gcp_dlp_operator.py
tests.contrib.operators.test_gcp_dlp_operator_system.py
tests.contrib.hooks.test_gcp_dlp_hook.py
Commits
Documentation
Code Quality
flake8