Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored waiting function for Tableau Jobs #17034

Merged
merged 11 commits into from
Jul 21, 2021

Conversation

ciancolo
Copy link
Contributor

Hello all,
as suggested in PR #16937 I created a new PR with the following changes:

  1. Creation function to get job status and to check the completion of a Tableau job in TableauHook
  2. Refactored waiting check, removing the TableauJobStatusSensor, in TableauRefreshWorkbookOperator
  3. Refactored job status check in TableauJobStatusSensor
  4. Updated tests

Thank you

airflow/providers/amazon/CHANGELOG.rst Outdated Show resolved Hide resolved
airflow/providers/amazon/CHANGELOG.rst Outdated Show resolved Hide resolved
Comment on lines 162 to 174
while finish_code == TableauJobFinishCode.PENDING:
finish_code = self.get_job_status(job_id=job_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we risk here to overwhelm the Tableau API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you're right especially for a long refresh. But now I am a little bit confused, the original request was to remove the sensor from the operator and to implement the waiting code in the hook. But why just not use the sensor and the already implemented poke function instead to create a new one?

What do you think?

Copy link
Contributor

@eladkal eladkal Jul 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the issue of waiting for a specific status in other operators.
For example:
EC2StartInstanceOperator allows to pass check_interval to the wait_for_statefunction in the hook:

ec2_hook.wait_for_state(
instance_id=self.instance_id,
target_state="running",
check_interval=self.check_interval,
)

That way we provide the user the power to decide how much time to wait.

BTW I think it would be nice to change the waiting_until_succeeded to a genericwait_for_state like it's implemented in the EC2Hook that way it will allow more flexibility and users may use it for further custom operators if they need to. For our use case we can wait tillTableauJobFinishCode.SUCCESS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Now it's more clear. I will implement the changes.
Thank you.

@uranusjr
Copy link
Member

uranusjr commented Jul 15, 2021

Many whitespace changes besides amazon.rst are also irrelavant (or even wrong, like removing the blank line after the first line in a docstring). It’d be best if you could revert all of those.

@ciancolo
Copy link
Contributor Author

@uranusjr @eladkal Yeah, I'm sorry, I had a problem in docs-builds, I'm tried to fix it. I will just remove the changes. It was very strange because I didn't modify the Amazon docs before the docs-build error.

@ciancolo ciancolo force-pushed the feature-job-waiting-tableau-hook branch from 36ee4e6 to 8c0c9ac Compare July 16, 2021 11:00
@ciancolo
Copy link
Contributor Author

I pushed the requested changes.

Just a comment, the function wait_for_state returns a bool because it is not certain that the desired state is reached for a particular job. For example, if the refresh job fails, that job will never reach the SUCCESS state. So the function returns True in case the desired state is reached False otherwise.

Comment on lines 198 to 210
# Test SUCCESS
mock_tableau_server.jobs.get_by_id.return_value.finish_code = 0
with TableauHook(tableau_conn_id='tableau_test_password') as tableau_hook:
tableau_hook.server = mock_tableau_server
jobs_status = tableau_hook.get_job_status(job_id='j1')
assert jobs_status == TableauJobFinishCode.SUCCESS

# Test ERROR
mock_tableau_server.jobs.get_by_id.return_value.finish_code = 1
with TableauHook(tableau_conn_id='tableau_test_password') as tableau_hook:
tableau_hook.server = mock_tableau_server
jobs_status = tableau_hook.get_job_status(job_id='j1')
assert jobs_status == TableauJobFinishCode.ERROR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets also add test case for CANCELED
Also we have a repeated pattern here probably better to use parameterized test

Comment on lines 259 to 277
side_effect=[
TableauJobFinishCode.PENDING,
TableauJobFinishCode.PENDING,
TableauJobFinishCode.ERROR,
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why PENDING twice?

Copy link
Contributor Author

@ciancolo ciancolo Jul 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to simulate a more realistic case, just to be more secure that everything was ok.

I read wrong before. You're right, in this case, it is not necessary.

airflow/providers/tableau/hooks/tableau.py Outdated Show resolved Hide resolved
@ciancolo ciancolo force-pushed the feature-job-waiting-tableau-hook branch from 8c0c9ac to e79c63f Compare July 19, 2021 09:05
@ciancolo
Copy link
Contributor Author

Pushed requested changes.

Just a comment, I didn't parametrize the test_wait_for_state function because I think it is more readable and clear in the current status, with the different cases separate.

Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good in the sense that it should do what you want it to do, but I’m not really familiar with Tebleau to comment further.

@eladkal eladkal self-requested a review July 20, 2021 06:17
Copy link
Contributor

@eladkal eladkal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LTGM
please fix/explain the duplication (there were 4 tests with the duplicated values you addressed only 1 of them)

Other than that looks OK

@eladkal
Copy link
Contributor

eladkal commented Jul 20, 2021

@potiuk since you approved the PR that we splitted out from. Do you have any further comments?
If not I'm happy to merge.

@eladkal eladkal dismissed uranusjr’s stale review July 21, 2021 12:08

issues addressed

@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label Jul 21, 2021
@github-actions
Copy link

The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

@eladkal eladkal merged commit 29b6be8 into apache:main Jul 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers okay to merge It's ok to merge this PR as it does not require more tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants