Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support for Connection/Hook discovery from providers #12466

Merged

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Nov 19, 2020

This PR extends providers discovery with the mechanism
of retrieving mapping of connections from type to hook.

Fixes #12456

Depends on #12512 so only check the last commit


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@boring-cyborg boring-cyborg bot added provider:Apache provider:google Google (including GCP) related issues labels Nov 19, 2020
@potiuk potiuk force-pushed the add-connection-discovery-to-providers branch 3 times, most recently from 61023df to 157fb07 Compare November 19, 2020 00:17
@potiuk potiuk force-pushed the add-connection-discovery-to-providers branch 2 times, most recently from 2aecb5a to a8b2e03 Compare November 19, 2020 01:38
@potiuk potiuk linked an issue Nov 19, 2020 that may be closed by this pull request
@potiuk
Copy link
Member Author

potiuk commented Nov 19, 2020

I will still have to make some doc generation fixes, but at least it shoudl be good to review the approach

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

airflow/providers_manager.py Outdated Show resolved Hide resolved
airflow/providers_manager.py Outdated Show resolved Hide resolved
@potiuk potiuk force-pushed the add-connection-discovery-to-providers branch 2 times, most recently from 8c996be to 5fbef7e Compare November 22, 2020 18:48
@potiuk
Copy link
Member Author

potiuk commented Nov 22, 2020

Hey @ashb - we can talk about it tomorrow. I understand (I think) your point about connection parameters being part of Hook - but I woudl love to see if we can do it in a way that will be nice to discover the connections from providers and make it easy for non-provider (casual) users to possibly use the same kind of functionality.

@potiuk
Copy link
Member Author

potiuk commented Nov 22, 2020

HEy @ashb -> that is another case for the "extra links" -> this one is simpler because we really just need the class names and nothing more. And I believe this is something similar you wanted to do for Connections via #12466 - but let's talk about it tomorrow

@potiuk potiuk force-pushed the add-connection-discovery-to-providers branch 2 times, most recently from 84c716f to cdb3cfc Compare November 22, 2020 22:00
@potiuk potiuk changed the title Adds support for Connection discovery from providers Adds support for Hook discovery from providers Nov 23, 2020
@potiuk potiuk force-pushed the add-connection-discovery-to-providers branch 4 times, most recently from 34c7b2d to 2a0a42e Compare November 23, 2020 05:58
@ashb
Copy link
Member

ashb commented Nov 29, 2020

Ah, you've just done the existing ones, cool. Didn't compare that - I had a list of providers open on the other screen was all.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

The PR needs to run all tests because it modifies core of Airflow! Please rebase it to latest master or ask committer to re-run it!

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Nov 29, 2020
This PR extends providers discovery with the mechanism
of retrieving mapping of connections from type to hook.

Fixes apache#12456
@potiuk potiuk force-pushed the add-connection-discovery-to-providers branch from a84b218 to a6cf883 Compare November 29, 2020 10:40
@potiuk
Copy link
Member Author

potiuk commented Nov 29, 2020

Just one last fixup - we had cyclic dependency Connection -> provider_manager -> plugins_manager -> Hook -> Connections. Solved it by extracting entrypoint_with_dist to a separate util module (as this is an util used by both plugins_manager and providers_manager now).

@mik-laj
Copy link
Member

mik-laj commented Nov 29, 2020

Some of the existing ones are also not DbApiHook either - docker, gcpssh, grpc for example.

We didn't have clear rules on when to add classes, so sometimes classes were added but then never used. Actually in Airflow code, this is only used by operators that use DbApiHook. I did not observe any other use cases.

@ashb
Copy link
Member

ashb commented Nov 29, 2020

Just one last fixup - we had cyclic dependency Connection -> provider_manager -> plugins_manager -> Hook -> Connections. Solved it by extracting entrypoint_with_dist to a separate util module (as this is an util used by both plugins_manager and providers_manager now).

Did something really detect that as a cycle? Cos importing connection.py would not cause the provider_manager to import anything from plugins_manager -- only running code would do that, not simply importing. Therefore that is not a cyclic import.

(I'm okay with moving it to a separate module anyway, I was just being a little bit lazy last night when adding it. I'm still surprised that this was claimed to be a cycle though.)

airflow/plugins_manager.py Outdated Show resolved Hide resolved
@ashb
Copy link
Member

ashb commented Nov 29, 2020

Some of the existing ones are also not DbApiHook either - docker, gcpssh, grpc for example.

We didn't have clear rules on when to add classes, so sometimes classes were added but then never used. Actually in Airflow code, this is only used by operators that use DbApiHook. I did not observe any other use cases.

Yeah, right now it's not. We've been talking about populating the Connections form in the UI with this information, at which point we'll have to add everything. But you are both right in saying one thing at a time.

@potiuk
Copy link
Member Author

potiuk commented Nov 29, 2020

Did something really detect that as a cycle? Cos importing connection.py would not cause the provider_manager to import anything from plugins_manager -- only running code would do that, not simply importing. Therefore that is not a cyclic import.

Yes. Pylint did and fail.

@ashb
Copy link
Member

ashb commented Nov 29, 2020

Did something really detect that as a cycle? Cos importing connection.py would not cause the provider_manager to import anything from plugins_manager -- only running code would do that, not simply importing. Therefore that is not a cyclic import.

Yes. Pylint did and fail.

pylint-dev/pylint#850 - anyway, Not a "real" cyclic import (or at least not on triggered at import time) but anyway, moving it out is cleanler

@potiuk
Copy link
Member Author

potiuk commented Nov 29, 2020

Therefore that is not a cyclic import.

Of course not cyclic import. Cyclic imports are detected by python parser so there is no point static-checking them. Pylint detects cyclic dependencies between modules instead.

It's a cyclic dependency between modules, not a cyclic import (just to reiterate what I keep on repeating).

Those are different things and this one is easy (and natural) to solve. If two modules are using a common function, it's better to extract it to a separate module, rather than having one module depended on the other where the dependency could be reverted.

Both providers_manager and plugins_manager are at the same "logical" level, so there is no reason why providers manager should import anything from plugins_manager. Why not plugins manager importing the same function from providers manager ?

So extracting it to common module makes perfect sense and is "cleaner" architecture - and this is what pylint complains about.

@ashb
Copy link
Member

ashb commented Nov 29, 2020

Both providers_manager and plugins_manager are at the same "logical" level, so there is no reason why providers manager should import anything from plugins_manager. Why not plugins manager importing the same function from providers manager ?

So extracting it to common module makes perfect sense and is "cleaner" architecture - and this is what pylint complains about.

Yes, in this case I agree it is cleaner.

My general complaint with this pylint check is that in general I disagree that it detects actual problems.

@potiuk
Copy link
Member Author

potiuk commented Nov 29, 2020

My general complaint with this pylint check is that in general I disagree that it detects actual problems.

I'll stick to my opinion that it actually detects bad "smells" in code architecture and solving them leads to nicer code that has less hidden, unnecessary dependencies and is easier to maintain as the code maintains much better "single responsibility" principle and avoids "god object" pattern. https://en.wikipedia.org/wiki/God_object. While using local imports might work, those dependencies (like in this case) are there, and make it more difficult to refactor and understand the code.

But that's only my opinion not a universal truth, of course, coming from much better structured and stronger typing languages.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@potiuk potiuk removed the full tests needed We need to run full set of tests for this PR to merge label Nov 29, 2020
@potiuk potiuk force-pushed the add-connection-discovery-to-providers branch from 1021552 to 034c19e Compare November 29, 2020 13:02
@potiuk potiuk merged commit 2037303 into apache:master Nov 29, 2020
@potiuk potiuk deleted the add-connection-discovery-to-providers branch November 29, 2020 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AIP-8 - Provider packages provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement connection-to-hook registering in providers
5 participants