[AIRFLOW-6091] Add flushing in execute method for BigQueryCursor #6683

Merged on Dec 14, 2019 (1 commit)

Contributor

@zuku1985 zuku1985 commented Nov 27, 2019

If you execute multiple queries with one cursor, the results of
earlier queries are flushed, so the results of the most recent
execute can be read without any issues.

Make sure you have checked all steps below.

Jira

  • [x] My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-6091
    • In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX]; code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • [x] Here are some details about my PR, including screenshots of any UI changes:

My PR addresses an issue where the results buffer and the reading flags are not cleared in some cases. When many queries are executed with one cursor, this results in no data being returned, or even in data from a previous job being returned.
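
A minimal sketch of the failure mode and the fix, using a toy cursor rather than the real BigQueryCursor. The class name `SketchCursor`, the dict-based backend, and the attribute names are assumptions made for illustration only; the actual hook code may differ.

```python
from typing import Any, Dict, List, Optional


class SketchCursor:
    """Toy cursor for illustration; NOT the real BigQueryCursor."""

    def __init__(self, fake_backend: Dict[str, List[Any]]) -> None:
        # Hypothetical backend: maps a SQL string to its result rows.
        self._backend = fake_backend
        self.job_id: Optional[str] = None
        self.buffer: List[Any] = []
        self.all_pages_loaded = False

    def flush_results(self) -> None:
        # Drop buffered rows and reset the reading flags.
        self.job_id = None
        self.buffer = []
        self.all_pages_loaded = False

    def execute(self, sql: str) -> None:
        # The idea of the fix: clear state left over from the previous query
        # before starting a new one.
        self.flush_results()
        self.job_id = sql  # the SQL text doubles as a job id in this sketch

    def fetchone(self) -> Optional[Any]:
        # Load rows lazily, once per job, then serve them from the buffer.
        if not self.buffer and not self.all_pages_loaded and self.job_id is not None:
            self.buffer = list(self._backend[self.job_id])
            self.all_pages_loaded = True
        return self.buffer.pop(0) if self.buffer else None


cursor = SketchCursor({"q1": [1, 2], "q2": [3]})
cursor.execute("q1")
first = cursor.fetchone()   # row from q1; one q1 row is still buffered
cursor.execute("q2")
second = cursor.fetchone()  # row from q2, because execute flushed the buffer
```

In this sketch, removing the `flush_results()` call from `execute` reproduces both symptoms: the second `fetchone()` would return the leftover row from `q1`, and if `q1` had been read to the end, the stale `all_pages_loaded` flag would make `q2` appear to return no data.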

Tests

  • [x] My PR adds the following unit tests OR does not need testing for this extremely good reason:

My PR does not add any functionality to test

Commits

  • [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • [x] In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what they do
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to an appropriate release

@zuku1985 zuku1985 force-pushed the wp_AIRFLOW_6091 branch 2 times, most recently from ace59d0 to fcf0c63 on November 28, 2019 15:44
@zuku1985
Contributor Author

zuku1985 commented Dec 2, 2019

@nuclearpinguin any hints on what I should do to fix the failing tests? I've run out of ideas :/

@potiuk
Member

potiuk commented Dec 2, 2019

I restarted the failing job. It's a flaky "run_on_kill" test that @nuclearpinguin is solving with the upcoming #6472 change.

@codecov-io

codecov-io commented Dec 2, 2019

Codecov Report

Merging #6683 into master will decrease coverage by 0.13%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #6683      +/-   ##
==========================================
- Coverage   84.45%   84.31%   -0.14%     
==========================================
  Files         676      676              
  Lines       38310    38314       +4     
==========================================
- Hits        32354    32304      -50     
- Misses       5956     6010      +54
Impacted Files Coverage Δ
airflow/gcp/hooks/bigquery.py 70.88% <100%> (+0.17%) ⬆️
airflow/kubernetes/volume_mount.py 44.44% <0%> (-55.56%) ⬇️
airflow/kubernetes/volume.py 52.94% <0%> (-47.06%) ⬇️
airflow/kubernetes/pod_launcher.py 45.25% <0%> (-46.72%) ⬇️
airflow/kubernetes/refresh_config.py 50.98% <0%> (-23.53%) ⬇️
...rflow/contrib/operators/kubernetes_pod_operator.py 78.2% <0%> (-20.52%) ⬇️
airflow/jobs/backfill_job.py 91.59% <0%> (-0.29%) ⬇️
airflow/jobs/scheduler_job.py 89.26% <0%> (+1.17%) ⬆️
airflow/utils/sqlalchemy.py 96.61% <0%> (+6.77%) ⬆️
airflow/utils/dag_processing.py 87.8% <0%> (+7.04%) ⬆️
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5483ae4...44bdd67.

@mik-laj mik-laj added the provider:google Google (including GCP) related issues label Dec 3, 2019
@mik-laj
Member

mik-laj commented Dec 4, 2019

Can you add tests to prevent regression?

@zuku1985
Contributor Author

zuku1985 commented Dec 6, 2019

I will need some hints on how to do it.

@mik-laj
Member

mik-laj commented Dec 6, 2019

The best way to learn this is to read the test code for operators and hooks of other GCP services. All services except BigQuery share one style and one way of being tested. It is also useful to know unittest.mock.
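
As a rough illustration of the mock-based style mentioned above, a regression test could look something like the sketch below. `SketchCursor` and its attributes are hypothetical stand-ins defined only so the example runs on its own; a real test would import the actual cursor class and patch whichever method submits the query.

```python
import unittest
from unittest import mock


class SketchCursor:
    """Hypothetical stand-in for the cursor under test."""

    def __init__(self):
        self.job_id = None
        self.buffer = []
        self.all_pages_loaded = False

    def run_query(self, sql):
        # In real code this would talk to the remote service; tests mock it.
        raise RuntimeError("would call the remote service")

    def flush_results(self):
        self.job_id = None
        self.buffer = []
        self.all_pages_loaded = False

    def execute(self, sql):
        self.flush_results()
        self.job_id = self.run_query(sql)


class TestExecuteFlushes(unittest.TestCase):
    @mock.patch.object(SketchCursor, "run_query", return_value="job-2")
    def test_execute_clears_previous_state(self, mock_run_query):
        cursor = SketchCursor()
        # Simulate leftovers from an earlier, fully read query.
        cursor.buffer = ["stale row"]
        cursor.all_pages_loaded = True

        cursor.execute("SELECT 1")

        mock_run_query.assert_called_once_with("SELECT 1")
        self.assertEqual(cursor.buffer, [])
        self.assertFalse(cursor.all_pages_loaded)
        self.assertEqual(cursor.job_id, "job-2")
```

The test seeds the cursor with stale state, calls `execute`, and then checks that the buffer and flags were reset, so it would fail if the flush were ever removed from `execute`.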

@TobKed
Contributor

TobKed commented Dec 11, 2019

I agree with @mik-laj. Analysing other tests is the best way to learn; preferably play with them locally, break things, and see what happens. I am currently improving the BigQuery hook tests; feel free to check them out here: #6777.
Information on how to set up the Breeze environment and run tests locally can be found in the files BREEZE.rst and TESTING.rst.

@zuku1985 zuku1985 changed the title [AIRFLOW-6019] Add flushing in execute method for BigQueryCursor [AIRFLOW-6091] Add flushing in execute method for BigQueryCursor Dec 13, 2019
@potiuk potiuk merged commit 0cf9598 into apache:master Dec 14, 2019
potiuk pushed a commit that referenced this pull request Dec 14, 2019
(cherry picked from commit 0cf9598)
kaxil pushed a commit that referenced this pull request Dec 17, 2019
(cherry picked from commit 0cf9598)
ashb pushed a commit that referenced this pull request Dec 19, 2019
(cherry picked from commit 0cf9598)
galuszkak pushed a commit to FlyrInc/apache-airflow that referenced this pull request Mar 5, 2020
…che#6683)
Labels
provider:google Google (including GCP) related issues