Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Convert typings to mypy #311

Merged
merged 28 commits into from
Aug 18, 2020
Merged

chore: Convert typings to mypy #311

merged 28 commits into from
Aug 18, 2020

Conversation

dorianj
Copy link
Contributor

@dorianj dorianj commented Aug 7, 2020

ref: amundsen-io/amundsen#560.

Summary of Changes

Convert (most) existing typings from comment to annotation style, and add missing type annotations wherever they exist. Just the same as the other Amundsen repos, we now require functions to declare their argument and return types.

  • I generally favored the minimal change necessary to get tests to pass. I didn't TODO any because I felt the added noise would be worse than too-weak typings
  • There were some legit bugs: formatting strings that weren't complete, etc. Tried to use my best judgement for these.
  • There are some spots where adding TypeVars would make the typings more accurate, but it was Any before, and so Any it will stay for now.
  • There were a lot of missing type signatures in args and return values. For the vast majority, I tracked down the correct type and included it. I didn't add many "bad" Anys
  • There were a lot of places with incorrect types. For example, there were some places labeled Iterator where callers were expecting to be able to call len, meaning the type signature was incorrect. I fixed these.
  • I had to add some None guards assertions to tests so that the compiler knows the nullness of some variables and doesn't complain about expected and actual type mismatches
  • I didn't add any casts, instead I refactored those sections to use try blocks
  • The BigQuery inheritance chain got a bit mixed up, and needed some surgery, I split that into a commit
  • There was some dead code that referenced non-existent class members. I removed it. I'm assuming there's not any crazy dynamism going on (even if there was, those methods would have thrown)
  • I organized the commits for development sanity, not for review. I believe the easiest way to review for both correctness and style is to review the combined diff. But it's a huge diff, and I know it's going to be a huge pain to review carefully. I'm happy to reorganize the commits in any way that makes it easier to review this beast.
  • I also don't know any way to make this a smaller PR, but open to feedback. I considered structuring it as a progressive sequence with some checks disabled, then enable the check and fix violations. But this was a lot faster, and I felt like the test suite was good enough, and the existing type annotations complete enough, and the failure modes obvious enough, that the combined risk was low enough to merit doing the far-faster option of doing it all at once. But if this is too much to digest, I can break it up differently.

Some things I intend to do immediately after this lands (since they impact so many files, I wanted to do it later to avoid big merge conflicts):

  • Re-enable unused import option
  • Remove unused import # noqa: F401 comments
  • Conversion of remaining comment types into annotations

And some things we should do piece-by-piece but I don't plan to open immediately:

  • There were some lingering Union[..., None] that can be safely replaced with Optional, but I didn't replace all of them.

Tests

  • Added mypy to Travis CI
  • Many existing tests were modified to add typings. I mostly just used Any, since mocks are challenging to type and the benefits are small

Documentation

No documentation changes, the user-facing experience is transparent since mypy is called from make test and CI

CheckList

Make sure you have checked all steps below to ensure a timely review.

  • PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  • PR includes a summary of changes.
  • PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
  • PR passes make test

@dorianj
Copy link
Contributor Author

dorianj commented Aug 10, 2020

Rebased onto master, I'm working on getting the test pass in amundsen-io/amundsen#605, will update this once I figure that out. done

@codecov-commenter
Copy link

codecov-commenter commented Aug 10, 2020

Codecov Report

Merging #311 into master will increase coverage by 0.03%.
The diff coverage is 82.35%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #311      +/-   ##
==========================================
+ Coverage   74.19%   74.23%   +0.03%     
==========================================
  Files         105      105              
  Lines        4449     4502      +53     
  Branches      405      407       +2     
==========================================
+ Hits         3301     3342      +41     
- Misses       1041     1050       +9     
- Partials      107      110       +3     
Impacted Files Coverage Δ
.../mode_analytics/mode_dashboard_charts_extractor.py 0.00% <0.00%> (ø)
...e_analytics/mode_dashboard_executions_extractor.py 0.00% <0.00%> (ø)
...shboard/mode_analytics/mode_dashboard_extractor.py 0.00% <0.00%> (ø)
...ode_dashboard_last_modified_timestamp_extractor.py 0.00% <0.00%> (ø)
..._dashboard_last_successful_executions_extractor.py 0.00% <0.00%> (ø)
...d/mode_analytics/mode_dashboard_owner_extractor.py 0.00% <0.00%> (ø)
...mode_analytics/mode_dashboard_queries_extractor.py 0.00% <0.00%> (ø)
...d/mode_analytics/mode_dashboard_usage_extractor.py 0.00% <0.00%> (ø)
...rd/mode_analytics/mode_dashboard_user_extractor.py 0.00% <0.00%> (ø)
...r/dashboard/mode_analytics/mode_dashboard_utils.py 0.00% <0.00%> (ø)
... and 85 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 381a417...c91ee4a. Read the comment docs.

@dorianj
Copy link
Contributor Author

dorianj commented Aug 10, 2020

All tests now pass, after fixing a few lint errors, one error in bigquery caused by the larger-than-typical surgery required, and one error in HttpFailureSkipOnStatus due to me misunderstanding the original intent of the function.

@feng-tao I realized after opening that I actually could have deferred all of the stylistic com2ann stuff until later. I have a few other ideas that could also allow me to break this into a sequence of smaller PRs. I'm more than happy to do that if it would let you land this more easily/safely/faster. But if reviewing in this format works for you, I'll leave it as is.

Copy link
Member

@feng-tao feng-tao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @dorianj , this is exciting! I will be mostly OOO, and look at more in detail once back. I think having a single PR is fine.


http = httplib2.Http()
authed_http = google_auth_httplib2.AuthorizedHttp(credentials, http=http)
self.bigquery_service = build('bigquery', 'v2', http=authed_http, cache_discovery=False)
self.logging_service = build('logging', 'v2', http=authed_http, cache_discovery=False)
self.iter = iter(self._iterate_over_tables())
self.iter: Iterator[Any] = iter([])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why empty list?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see, so it is defined in the based class which will get overrided in each bq extractor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct

@feng-tao
Copy link
Member

@dorianj let me know if it is ready to review. Would like to get it shipped first and let others rebase based on the slack conv.

@dorianj
Copy link
Contributor Author

dorianj commented Aug 18, 2020

@feng-tao should be ready for review -- let me know if anything is missing

Copy link
Member

@feng-tao feng-tao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm from the skim through, posts a few qqs, wdyt

@@ -25,12 +25,11 @@ class BigQueryTableUsageExtractor(BaseBigQueryExtractor):
for referencedTables in the response.
"""
TIMESTAMP_KEY = 'timestamp'
_DEFAULT_SCOPES = ('https://www.googleapis.com/auth/cloud-platform',)
_DEFAULT_SCOPES = ['https://www.googleapis.com/auth/cloud-platform']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

['https://www.googleapis.com/auth/cloud-platform',] ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've previously only seen trailing commas used for multiline lists (since it makes the diff prettier), but happy to change if that's the normal style

Either way, I'll look into making flake8 check this -- I'm not partial to the decision, but would like to make sure it's enforced going forward

count = 0
for entry in self._retrieve_records():
count += 1
if count % self.pagesize == 0:
LOGGER.info('Aggregated {} records'.format(count))

if entry is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this check for mypy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct; happy to add a comment as such if it's not clear. Could also cast entry as any to force an exception if None. I'm new to python types so not sure what's more pythonic :)

from databuilder.rest_api.rest_api_query import RestApiQuery


def sort_widgets(widgets):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason with the function moved from top to bottom?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes: we use the classes defined in this file as inputs and outputs to these utility functions, so need to define those classes first for it to typecheck.

@dorianj
Copy link
Contributor Author

dorianj commented Aug 18, 2020

Thanks for the review! I answered the comments, also rebased to make sure it's also tested on Python 3.7 and won't break master when merged. Will push changes tomorrow in response to questions as needed

@feng-tao
Copy link
Member

@dorianj the pr lgtm, could you also rebase the pr with master as I just add github action to fix the pypi upload issue with travis? thanks.

@feng-tao feng-tao closed this Aug 18, 2020
@feng-tao feng-tao reopened this Aug 18, 2020
This version is a few ahead of the other amundsen services, but includes
better defaults.

Signed-off-by: Dorian Johnson <[email protected]>
warn_no_return and strict_optional are now default-on

Also try disabling ignore_missing_imports

Signed-off-by: Dorian Johnson <[email protected]>
This was either messed up by com2ann, or was incorrect before.

Signed-off-by: Dorian Johnson <[email protected]>
I edit with trim_trailing_white_space_on_save enabled in Sublime, editing
anything else in this file will cause this sensitive test to break.

This isn't the prettiest way to fix this, but I think this is enough of a
gotcha that the previous status quo is not acceptable.

Very open to feedback if there's a better way or established pattern to fix
this footgun.

Signed-off-by: Dorian Johnson <[email protected]>
This was tricky because the code conflates the objects themselves and their
names. We also have classes named `TableMetadata` and `ColumnMetadata` which
shadow those in the Cassandra package

Signed-off-by: Dorian Johnson <[email protected]>
`_iterate_over_tables` was attached to the super class but reached into
methods that only the subclass had. Move that over, and add best-guess
typings to the rest.

This class is really a mess type wise, there's a ton of dict-typing going on,
should clean it up later.

Signed-off-by: Dorian Johnson <[email protected]>
Am going to unwind this later

Signed-off-by: Dorian Johnson <[email protected]>
I fixed the original incorrectly, reading the `or` as an `and`, which
thankfully caused a test failure.

However, I believe the original code which tested `isinstance(exception, HTTPError)`
to be incorrect -- we don't need that guard if we're just grabbing the value
regardless.

Signed-off-by: Dorian Johnson <[email protected]>
@dorianj
Copy link
Contributor Author

dorianj commented Aug 18, 2020

@feng-tao sure thing, rebased onto master

@feng-tao
Copy link
Member

feng-tao commented Aug 18, 2020

nvm, actually make test should cover the mypy test.

Copy link
Member

@feng-tao feng-tao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the hard work!

@feng-tao feng-tao merged commit b32fb65 into amundsen-io:master Aug 18, 2020
@Alagappan
Copy link
Contributor

Thanks @dorianj for making these changes. This is great. Looking forward to your next set of changes :)

@dorianj dorianj deleted the dj-mypy-typings branch August 18, 2020 20:59
jerryzhu2007 pushed a commit to kylg/amundsendatabuilder that referenced this pull request Aug 20, 2020
* commit 'e14b33e776929f8b020f1c6fec75d0fb83687693': (23 commits)
  Fix Athena sample DAG (amundsen-io#341)
  fix: Update postgres_sample_dag to set table extract job as upstream for elastic search publisher (amundsen-io#340)
  chore: mypy cleanup (convert last comment types, remove noqa imports) (amundsen-io#338)
  chore: Convert typings to mypy (amundsen-io#311)
  chore: replace all references of Lyft repo with Amundsen (amundsen-io#323)
  feat: add github actions for databuilder (amundsen-io#336)
  build: fix broken tests in Python 3.7, test in CI (amundsen-io#334)
  fix(deps): Unpin attrs (amundsen-io#332)
  ci: add dependabot config (amundsen-io#330)
  Change repo name in travis file (amundsen-io#324)
  tests: add mock for bigquery auth (amundsen-io#313)
  feat: allow hive sql to be provided as config (amundsen-io#312)
  chore: remove python2 (amundsen-io#310)
  chore: update deps for databuilder (amundsen-io#309)
  fix: cypher statement param issue in Neo4jStalenessRemovalTask (amundsen-io#307)
  fix: Added missing job tag key in hive_sample_dag.py (amundsen-io#308)
  feat: enhance glue extractor (amundsen-io#306)
  fix: Fix sql for missing columns and mysql based dialects (#550) (amundsen-io#305)
  docs: Fix broken doc link to dashboard_execution model (amundsen-io#296)
  chore: apply license headers to all the source files (amundsen-io#304)
  ...

# Conflicts:
#	README.md
#	databuilder/extractor/kafka_source_extractor.py
#	databuilder/publisher/neo4j_csv_publisher.py
#	docs/models.md
#	example/scripts/sample_data_loader.py
#	setup.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants