[AIRFLOW-5168] Fix Dataproc Operators & add tests #5928
Conversation
Force-pushed from e3f39af to 78076bc
        self.assertEqual(task.template_fields,
                         ['cluster_name', 'project_id', 'region'])
I think it would be better to test if elements of template_fields are attributes of the instance. Then changing an attribute name without changing the template field would result in a test failure. WDYT?
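A minimal sketch of the suggested check, assuming a generic operator shape (FakeOperator and its attribute values are illustrative stand-ins, not the real Dataproc operator):

```python
import unittest


class FakeOperator:
    """Illustrative stand-in for a Dataproc operator."""
    template_fields = ['cluster_name', 'project_id', 'region']

    def __init__(self):
        self.cluster_name = 'test-cluster'
        self.project_id = 'test-project-id'
        self.region = 'global'


class TemplateFieldsTest(unittest.TestCase):
    def test_template_fields_are_attributes(self):
        task = FakeOperator()
        for field in task.template_fields:
            # Renaming an attribute without updating template_fields
            # makes this assertion fail.
            self.assertTrue(hasattr(task, field),
                            'template field %r is not an attribute' % field)
```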
ti.render_template() should take care of that case. It would fail if it doesn't find an attribute listed in template_fields.
@@ -660,15 +656,13 @@ def __init__(self,
                 dataproc_jars=None,
                 gcp_conn_id='google_cloud_default',
                 delegate_to=None,
                 labels=None,
Now that this is here, I don't think we can remove it without breaking anyone who got it working on 1.10.4.
I will cherry-pick the actual commit that added labels
Will cherry-pick 4d58f36 into v1-10-test after this is merged.
Oh, right, labels never worked.
Small comments, great job on the test coverage 👍
@@ -385,7 +393,7 @@ def test_create_cluster(self):
            requestId=mock.ANY,
            body={
                'projectId': 'test-project-id',
You can replace this one with GCP_PROJECT_ID
@@ -492,6 +522,27 @@ def test_update_cluster(self):
        })
        hook.wait.assert_called_once_with(self.operation)

    def test_render_template(self):
This test is a bit silly, maybe we can check if we can actually template a field.
They are actually checking the templating :D
DataprocClusterScaleOperator is passed GCP_PROJECT_TEMPLATED, CLUSTER_NAME_TEMPLATED etc., which are defined at the top of the file and contain Jinja templated strings. And while asserting we check GCP_PROJECT & CLUSTER_NAME, which contain the rendered values.
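A rough sketch of the pattern being described, using plain jinja2 as a stand-in for what ti.render_template() does per field (the DAG_ID value and FakeDag class are assumptions for illustration):

```python
from jinja2 import Template

DAG_ID = 'test-dag-id'

# Templated input handed to the operator, and the expected rendered value.
CLUSTER_NAME_TEMPLATED = 'airflow_{{ dag.dag_id }}_cluster'
CLUSTER_NAME = 'airflow_{}_cluster'.format(DAG_ID)


class FakeDag:
    """Minimal stand-in exposing the attribute the template references."""
    dag_id = DAG_ID


# Roughly what rendering a single template field does.
rendered = Template(CLUSTER_NAME_TEMPLATED).render(dag=FakeDag())
assert rendered == CLUSTER_NAME
```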
class DataProcSparkOperatorTest(unittest.TestCase):
    # Unit test for the DataProcSparkOperator
    def setUp(self):
Maybe we can replace these setUp's with a fixture. In the end they are not really setting up anything, just setting a global variable. Creating a fixture is nicer, and we can reuse it in multiple tests:
@pytest.fixture
def dag():
    return DAG(
        DAG_ID,
        default_args={
            'owner': 'airflow',
            'start_date': DEFAULT_DATE,
        },
        schedule_interval='@daily'
    )
And then in the actual test:
def test_render_template(self, dag):
    task = DataProcSparkOperator(
        dag=dag
    )
WDYT?
Interesting, didn't know about @pytest.fixture. I will create a separate PR for that, as we already have some tests in master, and then cherry-pick to v1-10-test. This PR is for v1-10-test.
TASK_ID = 'test-dataproc-operator'
CLUSTER_NAME = 'test-cluster-name'
CLUSTER_NAME = 'airflow_{}_cluster'.format(DAG_ID)
CLUSTER_NAME_TEMPLATED = 'airflow_{{ dag.dag_id }}_cluster'
@Fokko here is the PR targeted to 1.10.4. Creating a new PR for master.
Make sure you have checked all steps below.
Jira
Description
2 issues in 1.10.4:
(1) Invalid template field. It was resolved by #5751.
(2) An issue with the add_labels field, explained below: there is a commit to master which adds label support; it changes both the hook and the base operator: 4d58f36#diff-bc55ad8b749b7a136f51ffabfdbaf13dL684
Then in the 1.10.4 branch there is a massive commit for GCP DLP: 222c6ac
This seems to have leaked only the operator part of the label change: 222c6ac#diff-bc55ad8b749b7a136f51ffabfdbaf13dR693
This then manifests as this error when trying to execute a Dataproc job: [2019-08-09 14:32:02,971] {taskinstance.py:1047} ERROR - _DataProcJobBuilder instance has no attribute 'add_labels'
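The failure mode can be reproduced in isolation. The class below is a simplified stand-in for the 1.10.4 job builder, not the real _DataProcJobBuilder; it just demonstrates why calling a method that only exists in the master version raises AttributeError:

```python
class JobBuilderWithoutLabels:
    """Simplified stand-in for the 1.10.4 job builder, which lacks
    add_labels() because only the operator half of the label change
    leaked into the branch."""

    def __init__(self):
        self.job = {'job': {'labels': {}}}


builder = JobBuilderWithoutLabels()
try:
    # The leaked operator code calls add_labels(), which does not exist
    # on the 1.10.4 builder, so job submission blows up at runtime.
    builder.add_labels({'airflow-version': 'v1-10-4'})
except AttributeError as err:
    print('Fails just like on 1.10.4:', err)
```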
Tests
Add tests to cover both issues
Commits
Documentation
Code Quality
flake8