Cancel BigQuery job if block_until_done call times out or is interrupted #1699

Merged

codyjlin
Contributor

@codyjlin codyjlin commented Jul 9, 2021

Signed-off-by: Cody Lin [email protected]

What this PR does / why we need it:

If a user aborts a to_bigquery run (for example with Ctrl-C), the BigQuery job should also be cancelled. This PR also fixes the retry logic used to poll the bq_job state and raises a more informative exception on timeout.
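
The polling-with-timeout behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `JobStillRunning`, `wait_until_done`, and the `get_state` callable are hypothetical names, and the job state is supplied by a stand-in function rather than a real BigQuery client.

```python
import time


class JobStillRunning(Exception):
    """Hypothetical error: the job had not finished when the timeout elapsed."""

    def __init__(self, job_id):
        super().__init__(f"Job {job_id} is still running after the timeout")
        self.job_id = job_id


def wait_until_done(get_state, job_id, timeout=1.0, poll_interval=0.01):
    """Poll get_state(job_id) until it returns "DONE" or the timeout elapses.

    get_state is an assumed callable standing in for a lookup of the job state.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_state(job_id) == "DONE":
            return
        if time.monotonic() >= deadline:
            # Raise a specific exception so callers can tell a timeout
            # apart from other failures.
            raise JobStillRunning(job_id)
        time.sleep(poll_interval)


# Usage: a fake state source that reports DONE on the third poll.
states = iter(["PENDING", "RUNNING", "DONE"])
wait_until_done(lambda job_id: next(states), "job-123")
```

Raising a dedicated exception type, rather than a generic error, lets the retry logic react to "still running" specifically and leaves other errors to propagate unchanged.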

Which issue(s) this PR fixes:

(Not large enough for issue)

Does this PR introduce a user-facing change?:

NONE

@feast-ci-bot
Collaborator

Hi @codyjlin. Thanks for your PR.

I'm waiting for a feast-dev member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@codecov-commenter

codecov-commenter commented Jul 9, 2021

Codecov Report

Merging #1699 (648937b) into master (703c4be) will increase coverage by 0.16%.
The diff coverage is 80.00%.

@@            Coverage Diff             @@
##           master    #1699      +/-   ##
==========================================
+ Coverage   83.32%   83.48%   +0.16%     
==========================================
  Files          76       76              
  Lines        6794     6795       +1     
==========================================
+ Hits         5661     5673      +12     
+ Misses       1133     1122      -11     
Flag               Coverage Δ
integrationtests   83.41% <80.00%> (+0.16%) ⬆️
unittests          69.81% <30.00%> (+0.01%) ⬆️

Impacted Files                                      Coverage Δ
sdk/python/feast/infra/offline_stores/bigquery.py   81.01% <76.47%> (+4.87%) ⬆️
sdk/python/feast/errors.py                          72.61% <100.00%> (+1.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 703c4be...648937b. Read the comment docs.

@codyjlin
Contributor Author

codyjlin commented Jul 9, 2021

/kind bug

@mavysavydav
Collaborator

If the notebook server/kernel crashes, is shut down, or is interrupted, that doesn't count as a keyboard interrupt, right?

@codyjlin codyjlin force-pushed the cancel-bq-job-on-keyboard-interrupt branch from ef1640d to 4247379 Compare July 12, 2021 20:14
Collaborator

@MattDelac MattDelac left a comment

Good catch about KeyboardInterrupt 🙌
LGTM

@MattDelac
Collaborator

Why did you change everything? I feel that the previous commit was much better... 🤔

@codyjlin codyjlin force-pushed the cancel-bq-job-on-keyboard-interrupt branch from a475530 to 1e7d852 Compare July 13, 2021 19:36
@MattDelac MattDelac self-requested a review July 13, 2021 21:37
sdk/python/feast/infra/offline_stores/bigquery.py (outdated)
_wait_until_done(job_id=job_id)
try:
    _wait_until_done(job_id=job_id)
except KeyboardInterrupt:
Member

Just wondering, would it make more sense to have this in a finally block instead of an except KeyboardInterrupt? That may also handle SIGTERM/SIGKILL.

Collaborator

Good call. We could have a finally statement that sends an asynchronous call to cancel the BQ job in case something goes wrong 👍

Contributor Author

Added a finally clause to cover other cases that need cancellation (including timeout and KeyboardInterrupt).
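
The finally-based approach discussed in this thread can be sketched as below. This is an illustrative outline under stated assumptions, not the merged implementation: `FakeJob` and `FakeClient` are hypothetical stand-ins (the `cancel_job` method and `state` attribute only mirror the naming of a BigQuery-style client), and the `wait` callable represents whatever polling routine is in use.

```python
class FakeJob:
    """Hypothetical stand-in for a job handle with an id and a state."""

    def __init__(self, job_id, state="RUNNING"):
        self.job_id = job_id
        self.state = state


class FakeClient:
    """Hypothetical stand-in for a client; it only records cancellations."""

    def __init__(self):
        self.cancelled = []

    def cancel_job(self, job_id):
        self.cancelled.append(job_id)


def block_until_done(client, job, wait):
    """Wait for the job; cancel it if the wait ends while it is unfinished."""
    try:
        wait(job)
    finally:
        # Runs on normal return, on a timeout exception, and on
        # KeyboardInterrupt, so one check covers all three paths.
        if job.state in ("PENDING", "RUNNING"):
            client.cancel_job(job.job_id)


def interrupted_wait(job):
    # Simulates the user pressing Ctrl-C while the wait is in progress.
    raise KeyboardInterrupt


client = FakeClient()
try:
    block_until_done(client, FakeJob("job-123"), interrupted_wait)
except KeyboardInterrupt:
    pass  # the interrupt still propagates after the cleanup has run
```

A note on signals: Python turns SIGINT into KeyboardInterrupt by default, so the finally block runs on Ctrl-C. With the default SIGTERM disposition the process exits without unwinding the stack, so the cleanup only runs if a handler has been installed that raises an exception. SIGKILL cannot be caught at all, so no in-process cleanup is possible in that case.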

Cody Lin added 3 commits July 14, 2021 12:30
Member

@achals achals left a comment

/lgtm

Collaborator

@MattDelac MattDelac left a comment

This is great. Thanks for putting in the effort 🙏

@achals
Member

achals commented Jul 14, 2021

@codyjlin just need to fix lint, and then we should be good to go!

Signed-off-by: Cody Lin <[email protected]>
@feast-ci-bot feast-ci-bot removed the lgtm label Jul 14, 2021
@codyjlin
Contributor Author

@codyjlin just need to fix lint, and then we should be good to go!

Ah sorry, I should have run lint locally first. Thanks @achals and @MattDelac for all the comments!

@codyjlin codyjlin changed the title from "Cancel BigQuery job if to_bigquery call is cancelled by user" to "Cancel BigQuery job if block_until_done call times out or is interrupted" Jul 14, 2021
Member

@achals achals left a comment

/lgtm

@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: achals, codyjlin, MattDelac

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@feast-ci-bot feast-ci-bot merged commit 88489d9 into feast-dev:master Jul 14, 2021
@codyjlin codyjlin deleted the cancel-bq-job-on-keyboard-interrupt branch July 14, 2021 21:31
8bit-pixies pushed a commit to 8bit-pixies/feast that referenced this pull request Jul 16, 2021
…ted (feast-dev#1699)

* Cancel job if to_bigquery is cancelled by user

Signed-off-by: Cody Lin <[email protected]>

* cancel job in _upload_entity_df_into_bq as well

Signed-off-by: Cody Lin <[email protected]>

* Fix _is_done logic?

Signed-off-by: Cody Lin <[email protected]>

* make cancel job code more readable

Signed-off-by: Cody Lin <[email protected]>

* move KeyboardInterrupt catch outside retry logic; fix retry logic

Signed-off-by: Cody Lin <[email protected]>

* make block_until_done public; add custom exception for BQJobStillRunning

Signed-off-by: Cody Lin <[email protected]>

* fix retry logic to catch specific exception

Signed-off-by: Cody Lin <[email protected]>

* Make retry params configurable; use finally clause to catch more cancellation cases

Signed-off-by: Cody Lin <[email protected]>

* Modify docstring

Signed-off-by: Cody Lin <[email protected]>

* Typo in docstring

Signed-off-by: Cody Lin <[email protected]>

* Fix lint

Signed-off-by: Cody Lin <[email protected]>
Signed-off-by: CS <[email protected]>
6 participants