chore: Convert typings to mypy #311
Codecov Report
@@ Coverage Diff @@
## master #311 +/- ##
==========================================
+ Coverage 74.19% 74.23% +0.03%
==========================================
Files 105 105
Lines 4449 4502 +53
Branches 405 407 +2
==========================================
+ Hits 3301 3342 +41
- Misses 1041 1050 +9
- Partials 107 110 +3
Continue to review full report at Codecov.
All tests now pass, after fixing a few lint errors, one error in bigquery caused by the larger-than-typical surgery required, and one error in @feng-tao I realized after opening that I actually could have deferred all of the stylistic
Thanks @dorianj, this is exciting! I will be mostly OOO and will look at this in more detail once I'm back. I think having a single PR is fine.
 http = httplib2.Http()
 authed_http = google_auth_httplib2.AuthorizedHttp(credentials, http=http)
 self.bigquery_service = build('bigquery', 'v2', http=authed_http, cache_discovery=False)
 self.logging_service = build('logging', 'v2', http=authed_http, cache_discovery=False)
-self.iter = iter(self._iterate_over_tables())
+self.iter: Iterator[Any] = iter([])
why empty list?
^^ @dorianj
I see, so it is defined in the base class and will get overridden in each BQ extractor?
correct
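The pattern confirmed in this thread can be sketched roughly as below: the base class declares `self.iter` once, typed and initialised with an empty iterator, and each subclass replaces it with its real iterator. The class names, the `init` signature, and the table names are simplified stand-ins assumed for illustration, not the actual databuilder code.

```python
from typing import Any, Iterator, Optional


class BaseExtractor:
    """Hypothetical stand-in for the shared BigQuery base class."""

    def init(self) -> None:
        # Declare the attribute once, with its type, using an empty iterator
        # as a harmless placeholder.
        self.iter: Iterator[Any] = iter([])

    def extract(self) -> Optional[Any]:
        # Return the next record, or None once the iterator is exhausted.
        return next(self.iter, None)


class TableExtractor(BaseExtractor):
    """Hypothetical subclass that supplies the real iterator."""

    def init(self) -> None:
        super().init()
        # Replace the placeholder with the subclass-specific generator.
        self.iter = iter(self._iterate_over_tables())

    def _iterate_over_tables(self) -> Iterator[Any]:
        # Illustrative data only.
        yield from ['table_a', 'table_b']
```

With this layout the base class no longer reaches into a method that only subclasses define, and an un-overridden base instance simply yields nothing.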
@dorianj let me know if it is ready to review. Would like to get it shipped first and let others rebase based on the Slack conv.
@feng-tao should be ready for review -- let me know if anything is missing
LGTM from a skim through; posted a few quick questions, WDYT?
@@ -25,12 +25,11 @@ class BigQueryTableUsageExtractor(BaseBigQueryExtractor):
 for referencedTables in the response.
 """
 TIMESTAMP_KEY = 'timestamp'
-_DEFAULT_SCOPES = ('https://www.googleapis.com/auth/cloud-platform',)
+_DEFAULT_SCOPES = ['https://www.googleapis.com/auth/cloud-platform']
['https://www.googleapis.com/auth/cloud-platform',] ?
I've previously only seen trailing commas used for multiline lists (since it makes the diff prettier), but happy to change if that's the normal style
Either way, I'll look into making flake8 check this -- I'm not partial to the decision, but would like to make sure it's enforced going forward
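For context on why the trailing comma mattered in the old tuple form but is only a style choice in the new list form, here is a small sketch. The variable names are made up for illustration; only the scope URL comes from the diff above.

```python
from typing import List, Tuple

# A one-element tuple needs the trailing comma; without it the parentheses
# are just grouping and the value is a plain str.
scopes_tuple: Tuple[str, ...] = ('https://www.googleapis.com/auth/cloud-platform',)
not_a_tuple = ('https://www.googleapis.com/auth/cloud-platform')

# For a list the trailing comma is purely stylistic: both spellings build
# the same one-element list.
scopes_list: List[str] = ['https://www.googleapis.com/auth/cloud-platform']
scopes_list_comma: List[str] = ['https://www.googleapis.com/auth/cloud-platform',]
```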
count = 0
for entry in self._retrieve_records():
    count += 1
    if count % self.pagesize == 0:
        LOGGER.info('Aggregated {} records'.format(count))

    if entry is None:
do we need this check for mypy?
Correct; happy to add a comment as such if it's not clear. Could also cast entry `as any` to force an exception if None. I'm new to python types so not sure what's more pythonic :)
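As a rough illustration of the trade-off in this thread, the sketch below shows how an explicit `is None` check lets mypy narrow an `Optional` value before it is used. `retrieve_records` and `collect_users` are hypothetical stand-ins, not the extractor's real methods.

```python
from typing import Dict, Iterator, List, Optional


def retrieve_records() -> Iterator[Optional[Dict[str, str]]]:
    """Hypothetical record source that may yield None."""
    yield {'user': 'a'}
    yield None
    yield {'user': 'b'}


def collect_users() -> List[str]:
    users: List[str] = []
    for entry in retrieve_records():
        if entry is None:
            # After this check mypy narrows entry from
            # Optional[Dict[str, str]] to Dict[str, str].
            continue
        users.append(entry['user'])
    return users
```

The alternative mentioned above, `typing.cast`, only affects the type checker: it performs no runtime check, so a `None` entry would surface later as a `TypeError` at the subscript rather than being skipped.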
from databuilder.rest_api.rest_api_query import RestApiQuery


def sort_widgets(widgets):
Any reason why the function was moved from top to bottom?
Yes: we use the classes defined in this file as inputs and outputs to these utility functions, so need to define those classes first for it to typecheck.
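A minimal sketch of the ordering constraint described here, assuming an interpreter that evaluates annotations eagerly when the `def` statement runs (as the Python 3.6/3.7 targets of this PR do). `Widget` and its `position` field are hypothetical stand-ins for the classes in that file.

```python
from typing import List


class Widget:
    """Hypothetical stand-in for a class defined in the same module."""

    def __init__(self, position: int) -> None:
        self.position = position


# With eager annotation evaluation, Widget must already exist when this def
# runs. Placing the function above the class would raise NameError unless
# the annotation were quoted, e.g. List['Widget'].
def sort_widgets(widgets: List[Widget]) -> List[Widget]:
    return sorted(widgets, key=lambda w: w.position)
```

Moving the helpers below the classes avoids string annotations altogether, which is one reasonable way to keep the signatures readable.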
Thanks for the review! I answered the comments, also rebased to make sure it's also tested on Python 3.7 and won't break
@dorianj the PR LGTM. Could you also rebase the PR on master, as I just added a GitHub Action to fix the PyPI upload issue with Travis? Thanks.
This version is a few ahead of the other amundsen services, but includes better defaults. Signed-off-by: Dorian Johnson <[email protected]>
`warn_no_return` and `strict_optional` are now default-on. Also try disabling `ignore_missing_imports`. Signed-off-by: Dorian Johnson <[email protected]>
This was either messed up by com2ann, or was incorrect before. Signed-off-by: Dorian Johnson <[email protected]>
I edit with trim_trailing_white_space_on_save enabled in Sublime, editing anything else in this file will cause this sensitive test to break. This isn't the prettiest way to fix this, but I think this is enough of a gotcha that the previous status quo is not acceptable. Very open to feedback if there's a better way or established pattern to fix this footgun. Signed-off-by: Dorian Johnson <[email protected]>
This was tricky because the code conflates the objects themselves and their names. We also have classes named `TableMetadata` and `ColumnMetadata` which shadow those in the Cassandra package Signed-off-by: Dorian Johnson <[email protected]>
`_iterate_over_tables` was attached to the super class but reached into methods that only the subclass had. Move that over, and add best-guess typings to the rest. This class is really a mess type wise, there's a ton of dict-typing going on, should clean it up later. Signed-off-by: Dorian Johnson <[email protected]>
Am going to unwind this later Signed-off-by: Dorian Johnson <[email protected]>
I fixed the original incorrectly, reading the `or` as an `and`, which thankfully caused a test failure. However, I believe the original code which tested `isinstance(exception, HTTPError)` to be incorrect -- we don't need that guard if we're just grabbing the value regardless. Signed-off-by: Dorian Johnson <[email protected]>
@feng-tao sure thing, rebased onto master
nvm, actually `make test` should cover the mypy test.
thanks for the hard work!
Thanks @dorianj for making these changes. This is great. Looking forward to your next set of changes :)
* commit 'e14b33e776929f8b020f1c6fec75d0fb83687693': (23 commits)
  Fix Athena sample DAG (amundsen-io#341)
  fix: Update postgres_sample_dag to set table extract job as upstream for elastic search publisher (amundsen-io#340)
  chore: mypy cleanup (convert last comment types, remove noqa imports) (amundsen-io#338)
  chore: Convert typings to mypy (amundsen-io#311)
  chore: replace all references of Lyft repo with Amundsen (amundsen-io#323)
  feat: add github actions for databuilder (amundsen-io#336)
  build: fix broken tests in Python 3.7, test in CI (amundsen-io#334)
  fix(deps): Unpin attrs (amundsen-io#332)
  ci: add dependabot config (amundsen-io#330)
  Change repo name in travis file (amundsen-io#324)
  tests: add mock for bigquery auth (amundsen-io#313)
  feat: allow hive sql to be provided as config (amundsen-io#312)
  chore: remove python2 (amundsen-io#310)
  chore: update deps for databuilder (amundsen-io#309)
  fix: cypher statement param issue in Neo4jStalenessRemovalTask (amundsen-io#307)
  fix: Added missing job tag key in hive_sample_dag.py (amundsen-io#308)
  feat: enhance glue extractor (amundsen-io#306)
  fix: Fix sql for missing columns and mysql based dialects (#550) (amundsen-io#305)
  docs: Fix broken doc link to dashboard_execution model (amundsen-io#296)
  chore: apply license headers to all the source files (amundsen-io#304)
  ...

# Conflicts:
#   README.md
#   databuilder/extractor/kafka_source_extractor.py
#   databuilder/publisher/neo4j_csv_publisher.py
#   docs/models.md
#   example/scripts/sample_data_loader.py
#   setup.py
ref: amundsen-io/amundsen#560.
Summary of Changes
Convert (most) existing typings from comment to annotation style, and add missing type annotations wherever they exist. Just the same as the other Amundsen repos, we now require functions to declare their argument and return types.
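For readers unfamiliar with the two styles, here is a small sketch of the conversion described above. The functions `get_owner_old` and `get_owner` are invented for illustration and do not appear in the repository.

```python
from typing import Dict, Optional


# Comment style: the signature lives in a "# type:" comment that mypy reads
# but the interpreter ignores.
def get_owner_old(tables, name):
    # type: (Dict[str, str], str) -> Optional[str]
    return tables.get(name)


# Annotation style: the same signature expressed inline, which is also
# visible at runtime through __annotations__.
def get_owner(tables: Dict[str, str], name: str) -> Optional[str]:
    return tables.get(name)
```

Both forms behave identically at runtime; the annotation form is what tools such as com2ann (mentioned in one of the commit messages) produce from the comment form.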
- `TypeVar`s would make the typings more accurate, but it was Any before, and so Any it will stay for now.
- `Any`s
- `Iterator` where callers were expecting to be able to call `len`, meaning the type signature was incorrect. I fixed these.
- `cast`s, instead I refactored those sections to use `try` blocks

Some things I intend to do immediately after this lands (since they impact so many files, I wanted to do it later to avoid big merge conflicts):
- `# noqa: F401` comments

And some things we should do piece-by-piece but I don't plan to open immediately:
- `Union[..., None]` that can be safely replaced with `Optional`, but I didn't replace all of them.

Tests
- `Any`, since mocks are challenging to type and the benefits are small

Documentation
No documentation changes, the user-facing experience is transparent since `mypy` is called from `make test` and CI

CheckList
Make sure you have checked all steps below to ensure a timely review.
- `make test`
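The `Iterator`/`len` mismatch mentioned in the summary can be illustrated with a short sketch. The function names and data below are hypothetical; the point is only that a value annotated as `Iterator` does not support `len()`, so a caller that needs a length wants a sized type such as `List`.

```python
from typing import Iterator, List


def table_names_iter() -> Iterator[str]:
    """Hypothetical function annotated as returning an Iterator."""
    return iter(['a', 'b'])


def table_names_list() -> List[str]:
    """Same data, annotated with a type that supports len()."""
    return ['a', 'b']


def count_tables() -> int:
    # len(table_names_iter()) would be rejected by mypy and raises TypeError
    # at runtime, because Iterator does not define __len__.
    return len(table_names_list())
```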