Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-5730] Enable get_pandas_df on PinotDbApiHook #6399

Merged
merged 2 commits into from
Nov 15, 2019

Conversation

sekikn
Copy link
Contributor

@sekikn sekikn commented Oct 24, 2019

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"
    • https://issues.apache.org/jira/browse/AIRFLOW-5730
    • In case you are fixing a typo in the documentation you can prepend your commit with [AIRFLOW-XXX], code changes always need a Jira issue.
    • In case you are proposing a fundamental code change, you need to create an Airflow Improvement Proposal (AIP).
    • In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.

Description

  • Here are some details about my PR, including screenshots of any UI changes:

Currently, DruidDbApiHook and PinotDbApiHook disable their get_pandas_df
methods by raising NotImplementedError. But they actually work as
inherited from DbApiHook. This PR enables them.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

TestDruidHook.test_get_pandas_df in tests/hooks/test_druid_hook.py
TestPinotDbApiHook.test_get_pandas_df in tests/contrib/hooks/test_pinot_hook.py

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards incompatible changes, please leave a note in the Updating.md so we can assign it to a appropriate release

@sekikn
Copy link
Contributor Author

sekikn commented Oct 24, 2019

Sorry, I submitted a wrong PR mistakenly. Will update it later.

@OmerJog
Copy link
Contributor

OmerJog commented Oct 24, 2019

Same issue as #6057 ?

@sekikn
Copy link
Contributor Author

sekikn commented Oct 24, 2019

Oh, Thanks! I missed that issue. I'll change my JIRA issue and this PR only about Pinot.

Currently, PinotDbApiHook is disabling its get_pandas_df
method by raising NotImplementedError. But it actually
works as inherited from DbApiHook. This PR enables it.
@sekikn sekikn changed the title [AIRFLOW-5730] Enable get_pandas_df on Druid and Pinot DbApiHooks [AIRFLOW-5730] Enable get_pandas_df on PinotDbApiHook Oct 24, 2019
@sekikn
Copy link
Contributor Author

sekikn commented Oct 24, 2019

Updated the PR. The CI failure seems to be a transient one. The newly added test works on my local environment, as follows:

$ git diff upstream/master 
diff --git a/airflow/contrib/hooks/pinot_hook.py b/airflow/contrib/hooks/pinot_hook.py
index e617f8e9b..0864b3584 100644
--- a/airflow/contrib/hooks/pinot_hook.py
+++ b/airflow/contrib/hooks/pinot_hook.py
@@ -90,8 +90,5 @@ class PinotDbApiHook(DbApiHook):
     def set_autocommit(self, conn, autocommit):
         raise NotImplementedError()
 
-    def get_pandas_df(self, sql, parameters=None):
-        raise NotImplementedError()
-
     def insert_rows(self, table, rows, target_fields=None, commit_every=1000):
         raise NotImplementedError()
diff --git a/tests/contrib/hooks/test_pinot_hook.py b/tests/contrib/hooks/test_pinot_hook.py
index 72cee01ea..489af15aa 100644
--- a/tests/contrib/hooks/test_pinot_hook.py
+++ b/tests/contrib/hooks/test_pinot_hook.py
@@ -34,6 +34,7 @@ class TestPinotDbApiHook(unittest.TestCase):
         self.conn.conn_type = 'http'
         self.conn.extra_dejson = {'endpoint': 'pql'}
         self.cur = mock.MagicMock()
+        self.conn.cursor.return_value = self.cur
         self.conn.__enter__.return_value = self.cur
         self.conn.__exit__.return_value = None
 
@@ -74,3 +75,14 @@ class TestPinotDbApiHook(unittest.TestCase):
         result_sets = [('row1',), ('row2',)]
         self.cur.fetchone.return_value = result_sets[0]
         self.assertEqual(result_sets[0], self.db_hook().get_first(statement))
+
+    def test_get_pandas_df(self):
+        statement = 'SQL'
+        column = 'col'
+        result_sets = [('row1',), ('row2',)]
+        self.cur.description = [(column,)]
+        self.cur.fetchall.return_value = result_sets
+        df = self.db_hook().get_pandas_df(statement)
+        self.assertEqual(column, df.columns[0])
+        for i in range(len(result_sets)):  # pylint: disable=consider-using-enumerate
+            self.assertEqual(result_sets[i][0], df.values.tolist()[i][0])
$ ./run-tests tests.contrib.hooks.test_pinot_hook:TestPinotDbApiHook.test_get_pandas_df
AIRFLOW__CORE__SQL_ALCHEMY_CONN not set - using default
Airflow home: /home/sekikn
Airflow root: /home/sekikn/repos/airflow
Home of the user: /home/sekikn


Skipping initializing of the DB as it was initialized already

You can re-initialize the database by adding --with-db-init flag when running tests

KRB5_KTNAME variable is empty - no kerberos intialisation

Starting the tests with arguments: tests.contrib.hooks.test_pinot_hook:TestPinotDbApiHook.test_get_pandas_df

.
----------------------------------------------------------------------
Ran 1 test in 0.289s

OK
[2019-10-25 07:20:36,090] {settings.py:200} DEBUG - Disposing DB connection pool (PID 3587)

@OmerJog
Copy link
Contributor

OmerJog commented Oct 28, 2019

@sekikn tests did not run yet due to flake8 error:
tests/dags/test_impersonation_custom.py:53:1: E302 expected 2 blank lines, found 1
@potiuk how is it possible to get flake8 error on file that wasn't changed in the PR?

@sekikn
Copy link
Contributor Author

sekikn commented Nov 8, 2019

Thanks @OmerJog for taking care of this PR, rebased on master to run the CI again :)

@codecov-io
Copy link

codecov-io commented Nov 8, 2019

Codecov Report

Merging #6399 into master will decrease coverage by 0.3%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6399      +/-   ##
==========================================
- Coverage   84.08%   83.77%   -0.31%     
==========================================
  Files         635      635              
  Lines       36849    36847       -2     
==========================================
- Hits        30983    30868     -115     
- Misses       5866     5979     +113
Impacted Files Coverage Δ
airflow/contrib/hooks/pinot_hook.py 91.6% <ø> (+0.62%) ⬆️
airflow/operators/mysql_operator.py 100% <0%> (ø) ⬆️
airflow/operators/mysql_to_hive.py 100% <0%> (ø) ⬆️
airflow/kubernetes/volume_mount.py 44.44% <0%> (-55.56%) ⬇️
airflow/kubernetes/volume.py 52.94% <0%> (-47.06%) ⬇️
airflow/kubernetes/pod_launcher.py 45.25% <0%> (-46.72%) ⬇️
airflow/kubernetes/kube_client.py 33.33% <0%> (-41.67%) ⬇️
...rflow/contrib/operators/kubernetes_pod_operator.py 70.14% <0%> (-28.36%) ⬇️
airflow/utils/sqlalchemy.py 93.22% <0%> (ø) ⬆️
airflow/jobs/local_task_job.py 85% <0%> (-5%) ⬇️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 813cd43...f5973d8. Read the comment docs.

@kaxil kaxil merged commit 8f1a585 into apache:master Nov 15, 2019
kaxil pushed a commit that referenced this pull request Dec 17, 2019
ashb pushed a commit that referenced this pull request Dec 17, 2019
ashb pushed a commit that referenced this pull request Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants