Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement reading provider information from packages/sources #12512

Merged

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Nov 20, 2020

This PR implements discovering and readin provider information from
packages (using entry_points) and - if found - from local
provider yaml files for the built-in airflow providers,
when they are found in the airflow.provider packages.
The provider.yaml files - if found - take precedence over the
package-provided ones.

Add displaying provider information in CLI

Closes: #12470


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@potiuk
Copy link
Member Author

potiuk commented Nov 20, 2020

Hey @ashb @mik-laj -> this is the "base" version of the ProvidersManager - updated to include both options:

  • running directly from the sources (in which case provider information is read directly from provider.yaml file) - dev
  • run-time version that reads the information from package entry_points - production

It also includes CLI that you can query the providers installed.

Reading from packages is much slower than from file, but it is bearable (~2-3 seconds on my machine). most of it is resolving and checking package dependencies, not reading the content. Also - currently - two packages (snowflake and amazon) fail to register because they have "ContextualVersionConflict" which I hope we will eventually resolve (#10854, #10854)

You can also try it out yourself - there is a new 'providers' cli command displaying the installed providers:

Screenshot from 2020-11-20 17-56-45

Screenshot from 2020-11-20 17-57-09

Works both - in dev env (with local sources) and prod env (with packages)

@potiuk potiuk force-pushed the implement-providers-runtime-dev branch from a3e9242 to 8ac737d Compare November 20, 2020 19:30
airflow/providers_manager.py Outdated Show resolved Hide resolved
@potiuk potiuk force-pushed the implement-providers-runtime-dev branch from 8ac737d to 33c17c0 Compare November 21, 2020 14:09
@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@potiuk potiuk force-pushed the implement-providers-runtime-dev branch from 33c17c0 to 5973ebf Compare November 21, 2020 18:35
@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@potiuk potiuk force-pushed the implement-providers-runtime-dev branch 2 times, most recently from d0439de to b91e0b8 Compare November 22, 2020 07:52
@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@potiuk potiuk force-pushed the implement-providers-runtime-dev branch 2 times, most recently from 596d28f to 5c95086 Compare November 22, 2020 08:54
@potiuk potiuk mentioned this pull request Nov 22, 2020
@potiuk potiuk force-pushed the implement-providers-runtime-dev branch from 5c95086 to 50246bf Compare November 22, 2020 10:51
@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@potiuk
Copy link
Member Author

potiuk commented Nov 22, 2020

Hey @ashb. It's green - this one is "pure" provider discovery - without connections/extra links - it just contains the discovery and restructured provider package tests to show better what's happening.

Right now what happens:

  1. for backport providers I install airflow 1.10.12 (soon 1.10.13) install all the provider.whl and provider.tgz packages and check if all classes from airlfow.providers are importable

  2. for regular providers, I prepare now "airflow" wheel package and install it from there (not from sources directly) and install all the providers from packages (whl or .tgz - I run two checks), I check if all providers are importable and then I run the new airflow providers list command to get all the providers and see if we see the expected number of providers as output.

This does not work now because we have import version conflict problems, but once we solve it, it should work - we have 59 out of 60 now - so for now I just skip exit in the number check.

See here: https://github.com/apache/airflow/runs/1438201616?check_suite_focus=true#step:9:4436

I will continue on rebasing the others and we can discuss tomorrow how close we can get to support the cases that you have in mind (with plugins) not only the providers. I think we can find some good future-proof solutions now, I simply do not have the capacity to also implement plugins at the same time, but I think if we implement it correctly, we can have a follow up PR implementing the "plugin" case you mentioned as well, without some heavy modifications.

@potiuk potiuk force-pushed the implement-providers-runtime-dev branch 4 times, most recently from 5e80cc6 to aa2f739 Compare November 23, 2020 06:17
@ashb
Copy link
Member

ashb commented Nov 23, 2020

I think this is all we'd have to change to remove the yaml.safe_load from in the generated provieer_info.

It won't be the prettiest, but we can fix that by running black on it after generating the template.

I tested this and this is what the provider info for amazon package looks okay, if horrible without black formatting.

@ashb
Copy link
Member

ashb commented Nov 23, 2020

I think this is all we'd have to change to remove the yaml.safe_load from in the generated provieer_info.
It won't be the prettiest, but we can fix that by running black on it after generating the template.

I tested this and this is what the provider info for amazon package looks okay, if horrible without black formatting.

Formatting with black is quite straight forward:

+    import black
     with open(get_provider_file_path, "wt") as get_provider_file:
-        get_provider_file.write(get_provider_content)
+        get_provider_file.write(black.format_str(get_provider_content, mode=black.FileMode()))
 

(In black >= 0.20.8b0, that has been renamed to just Mode, but the version in my container was older than that.)

It now looks like:

def get_provider_info():
    {
        "package-name": "apache-airflow-providers-amazon",
        "name": "Amazon",
        "description": "Amazon integration (including `Amazon Web Services (AWS) <https://aws.amazon.com/>`__).\n",
        "versions": ["1.0.0b2"],
        "integrations": [
            {

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@potiuk potiuk force-pushed the implement-providers-runtime-dev branch 3 times, most recently from e5f5ebf to ed93e40 Compare November 23, 2020 16:45
@potiuk
Copy link
Member Author

potiuk commented Nov 23, 2020

I will be running some more tests but I think everything addressed @ashb/@mik-laj

@potiuk
Copy link
Member Author

potiuk commented Nov 23, 2020

It look like this one is good to go. Some quarantined tests failed. I am re-running it.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@potiuk potiuk force-pushed the implement-providers-runtime-dev branch from ed93e40 to 6c1eef2 Compare November 24, 2020 14:23
@github-actions
Copy link

The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

@potiuk
Copy link
Member Author

potiuk commented Nov 25, 2020

@ashb -> I would love to merge that one, so that I can rebase and finish the other two :)

@potiuk potiuk force-pushed the implement-providers-runtime-dev branch from 6c1eef2 to dadda14 Compare November 25, 2020 16:30
@ashb
Copy link
Member

ashb commented Nov 25, 2020

I hadn't been looking cos of all the failing tests. Is it ready now?

@potiuk
Copy link
Member Author

potiuk commented Nov 25, 2020

I hadn't been looking cos of all the failing tests. Is it ready now?

Yep. Rebased it and but last time it passed all tests.

@potiuk
Copy link
Member Author

potiuk commented Nov 26, 2020

gentle reminder @ash :)

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One tiny change -- preemptively approving.

airflow/providers_manager.py Show resolved Hide resolved
Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs rebasing on Master but LGTM otherwise

This PR implements discovering and readin provider information from
packages (using entry_points) and - if found - from local
provider yaml files for the built-in airflow providers,
when they are found in the airflow.provider packages.
The provider.yaml files - if found - take precedence over the
package-provided ones.

Add displaying provider information in CLI

Closes: apache#12470
@potiuk potiuk force-pushed the implement-providers-runtime-dev branch from 0bfb26e to 4f36865 Compare November 27, 2020 15:50
@potiuk potiuk merged commit 41a699a into apache:master Nov 27, 2020
@potiuk potiuk deleted the implement-providers-runtime-dev branch November 27, 2020 17:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Display providers via cli
4 participants