Improves documentation regarding providers and custom connections 2 #13410

Merged
89 changes: 75 additions & 14 deletions docs/apache-airflow-providers/index.rst
@@ -24,12 +24,12 @@ Provider packages
Provider packages context
'''''''''''''''''''''''''

Unlike Apache Airflow 1.10, the Airflow 2.0 is delivered in multiple, separate, but connected packages.
Unlike Apache Airflow 1.10, Airflow 2.0 is delivered in multiple, separate but connected packages.
The core of the Airflow scheduling system is delivered as the ``apache-airflow`` package and there are around
60 provider packages which can be installed separately as so-called "Airflow Provider packages".
Those provider packages are separated per-provider (for example ``amazon``, ``google``, ``salesforce``
etc.) Those packages are available as ``apache-airflow-providers`` packages - separately per each provider
(for example there is an ``apache-airflow-providers-amazon`` or ``apache-airflow-providers-google`` package.
etc.). Those packages are available as ``apache-airflow-providers`` packages - separately per each provider
(for example there is an ``apache-airflow-providers-amazon`` or ``apache-airflow-providers-google`` package).

You can install those provider packages separately in order to interface with a given provider. For those
providers that have corresponding extras, the provider packages (latest version from PyPI) are installed
@@ -72,26 +72,25 @@ Separate provider packages provide the possibilities that were not available in
Extending Airflow Connections and Extra links via Providers
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Providers can not only deliver operators, hooks, sensor, transfer operators to communicate with
Providers can not only deliver operators, hooks, sensors, and transfer operators to communicate with a
multitude of external systems, but they can also extend Airflow. Airflow has several extension capabilities
that can be used by providers. Airflow automatically discovers which providers add those additional
capabilities and, once you install a provider package and restart Airflow, those become automatically
available to Airflow users.

The capabilities are:

* Adding Extra Links to operators delivered by the provider.
See :doc:`apache-airflow:howto/define_extra_link`
for description of what extra links are and examples of provider registering an operator with extra links
* Adding Extra Links to operators delivered by the provider. See :doc:`apache-airflow:howto/define_extra_link`
for a description of what extra links are and examples of a provider registering an operator with extra links.

* Adding custom connection types, extending connection form and handling custom form field behaviour for the
connections defined by the provider. See :doc:`apache-airflow:howto/connection` for description of
connections defined by the provider. See :doc:`apache-airflow:howto/connection` for a description of
connections and which capabilities of custom connections you can define.

How to create your own provider
"""""""""""""""""""""""""""""""
'''''''''''''''''''''''''''''''

Adding provider to Airflow is just a matter of building a Python package and adding the right meta-data to
Adding a provider to Airflow is just a matter of building a Python package and adding the right meta-data to
the package. We use the standard Python mechanism for defining
`entry points <https://docs.python.org/3/library/importlib.metadata.html#entry-points>`_. Your package
needs to define the ``apache_airflow_provider`` entry point, which has to point to a callable
@@ -111,7 +110,7 @@ your own purpose) but the two important fields from the extensibility point of v
:doc:`apache-airflow:howto/connection` for more details.


When your providers are installed you can query the installed providers and their capabilities with
When your providers are installed you can query the installed providers and their capabilities with the
``airflow providers`` command. This way you can verify if your providers are properly recognized and whether
they define the extensions properly. See :doc:`cli-and-env-variables-ref` for details of available CLI
sub-commands.
@@ -178,17 +177,79 @@ Creating your own providers
**When I write my own provider, do I need to do anything special to make it available to others?**

You do not need to do anything special besides creating the ``apache_airflow_provider`` entry point
returning properly formatted meta-data (dictionary with ``extra-links`` and ``hook-class-names`` fields.
returning properly formatted meta-data (dictionary with ``extra-links`` and ``hook-class-names`` fields).

Anyone who runs airflow in an environment that has your Python package installed will be able to use the
package as a provider package.

**What do I need to do to turn a package into a provider?**

You need to do the following to turn an existing Python package into a provider (see below for examples):

* Add the ``apache_airflow_provider`` entry point in the ``setup.cfg`` - this tells airflow where to get
the required provider metadata
* Create the function that you refer to in the first step as part of your package: this function returns a
dictionary that contains all meta-data about your provider package; see also the ``provider.yaml``
files in the community-managed provider packages for examples

Example ``setup.cfg``:

.. code-block:: cfg

  [options.entry_points]
  # the function get_provider_info is defined in myproviderpackage.somemodule
  apache_airflow_provider=
    provider_info=myproviderpackage.somemodule:get_provider_info

Example ``myproviderpackage/somemodule.py``:

.. code-block:: Python

  def get_provider_info():
      return {
          "package-name": "my-package-name",
          "name": "name",
          "description": "a description",
          "hook-class-names": [
              "myproviderpackage.hooks.source.SourceHook",
          ],
          "versions": ["1.0.0"],
      }
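
Before packaging, it can help to sanity-check the dictionary that ``get_provider_info`` returns. The following is a minimal sketch and not part of Airflow: the helper ``check_provider_info`` is hypothetical, and the list of required keys is an assumption taken from the example above rather than from Airflow's own validation rules.

```python
import re

# Assumption: these keys mirror the example above, not Airflow's official schema.
ASSUMED_REQUIRED_KEYS = ("package-name", "name", "description", "versions")

# A dotted path such as "myproviderpackage.hooks.source.SourceHook".
DOTTED_PATH = re.compile(r"^[A-Za-z_]\w*(\.[A-Za-z_]\w*)+$")


def get_provider_info():
    # Same shape as the example metadata function above.
    return {
        "package-name": "my-package-name",
        "name": "name",
        "description": "a description",
        "hook-class-names": ["myproviderpackage.hooks.source.SourceHook"],
        "versions": ["1.0.0"],
    }


def check_provider_info(info):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = [
        f"missing key: {key}" for key in ASSUMED_REQUIRED_KEYS if key not in info
    ]
    for path in info.get("hook-class-names", []):
        if not DOTTED_PATH.match(path):
            problems.append(f"not a dotted class path: {path!r}")
    return problems
```

Calling ``check_provider_info(get_provider_info())`` on the example metadata yields an empty list; a hook entry without a module path, such as ``"SourceHook"``, is reported as a problem.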

**How do provider packages work under the hood?**

In the end, there will be (at least) three components to your airflow installation with custom connection types:

* The installation itself (ideally you have a ``venv`` where you installed airflow with ``pip install apache-airflow``)
* The ``apache-airflow`` package
* Your own ``myproviderpackage`` package that is independent of ``apache-airflow`` or your airflow installation, which
can be a local Python package (that you install via ``pip install -e /path/to/my-package``) or a normal pip package
(``pip install myproviderpackage``) (or any other type of Python package)

In the ``myproviderpackage`` package you need to add the entry point and provide the appropriate metadata as described above.
If you have done that, airflow does the following at runtime:

* loop through ALL packages installed in your environment / ``venv``
* for each package, check whether its installed metadata (generated from the ``[options.entry_points]`` section of
``setup.cfg``) declares an ``apache_airflow_provider`` entry point; if so, get the value for ``provider_info``, e.g. ``myproviderpackage.somemodule:get_provider_info``
* that value works like an import statement: ``myproviderpackage.somemodule:get_provider_info`` translates to something like
``from myproviderpackage.somemodule import get_provider_info``, and the ``get_provider_info`` that is being imported should be a
callable, i.e. a function
* this function should return a dictionary with metadata
* if you have custom connection types as part of your package, that metadata will include a field called ``hook-class-names``,
which should be a list of strings naming your custom hooks - those strings should also be in an import-like format, e.g.
``myproviderpackage.hooks.source.SourceHook`` means that there is a class ``SourceHook`` in ``myproviderpackage/hooks/source.py``
- airflow then imports these hooks and looks for the functions ``get_ui_field_behaviour`` and ``get_connection_form_widgets``
(both optional) as well as the attributes ``conn_type`` and ``hook_name`` to create the custom connection type in the airflow UI
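
The discovery steps above can be approximated with the standard library alone. This is a simplified sketch under stated assumptions, not Airflow's actual implementation: the helper names ``resolve_entry_point_value``, ``iter_entry_points`` and ``discover_provider_info`` are hypothetical, and real discovery does additional validation that is omitted here.

```python
import importlib
from importlib import metadata


def resolve_entry_point_value(value):
    """Turn a string like 'package.module:attr' into the object it names."""
    module_name, _, attr = value.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def iter_entry_points(group):
    """Return the entry points registered under ``group`` in this environment."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        return list(eps.select(group=group))
    return list(eps.get(group, []))  # Python 3.8 and 3.9 return a dict


def discover_provider_info(group="apache_airflow_provider"):
    """Map each discovered provider to the metadata its callable returns."""
    found = {}
    for entry_point in iter_entry_points(group):
        get_info = entry_point.load()  # imports the module and fetches the callable
        info = get_info()
        # Fall back to the entry point name if the metadata has no package name.
        found[info.get("package-name", entry_point.name)] = info
    return found
```

In an environment where ``myproviderpackage`` is installed, ``discover_provider_info()`` would be expected to contain its metadata; each string in ``hook-class-names`` would then be imported in a similar way, by splitting off the class name and importing the module.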

**Should I named my provider specifically or should it be created in ``airflow.providers`` package?**
**Should I name my provider specifically or should it be created in ``airflow.providers`` package?**

We have quite a number (>70) of providers managed by the community and we are going to maintain them
together with Apache Airflow. All those providers have a well-defined structure and follow the
naming conventions we defined and they are all in the ``airflow.providers`` package. If your intention is
to contribute your provider, then you should follow those conventions and make a PR to Apache Airflow
to contribute to it. But you are free to use any package name as long as there are no conflicts with other
names,so preferably choose package that is in your "domain".
names, so preferably choose a package name that is in your "domain".

**Is there a convention for a connection id and type?**

21 changes: 11 additions & 10 deletions docs/apache-airflow/howto/connection.rst
@@ -327,23 +327,24 @@ an secrets backend to retrieve connections. For more details see :doc:`/security
Custom connection types
-----------------------

Airflow allows to define custom connection types - including modification of the add/edit form for the
connections. Custom connection types are defined in community maintained providers, but also you can add
custom providers, that can add their own connection types. See :doc:`apache-airflow-providers:index`
for description on how to add your own connection type via custom providers.
Airflow allows the definition of custom connection types, including modifications of the add/edit form
for the connections. Custom connection types are defined in community maintained providers, but you
can also add a custom provider that adds custom connection types. See :doc:`apache-airflow-providers:index`
for a description of how to add custom providers.

The custom connection types are defined via Hooks delivered by the providers. The Hooks can implement
methods defined in the protocol :class:`~airflow.hooks.base_hook.DiscoverableHook`. Note that your custom
Hook should not derive from the class, the class is merely there to document expectations about class
fields and methods that your Hook might define.
methods defined in the protocol class :class:`~airflow.hooks.base_hook.DiscoverableHook`. Note that your
custom Hook should not derive from this class; it is a dummy example that documents expectations
regarding the class fields and methods that your Hook might define. Another good example is
:py:class:`~airflow.providers.jdbc.hooks.jdbc.JdbcHook`.

By implementing those method in the hooks of yours and exposing them via ``hook-class-names`` array in
By implementing those methods in your hooks and exposing them via the ``hook-class-names`` array in
the provider meta-data you can customize Airflow by:

* Adding custom connection type
* Adding custom connection types
* Adding automated Hook creation from the connection type
* Adding custom form widget to display and edit custom "extra" parameters in your connection URL
* Hiding fields that are not used for your connection
* Adding placeholders showing examples of how fields should be formatted

You can read more about details how to add custom connection type in the :doc:`apache-airflow-providers:index`
You can read more about how to add custom provider packages in :doc:`apache-airflow-providers:index`.
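
As a rough illustration of such a Hook, the sketch below uses plain Python with no Airflow imports. The class ``SourceHook`` and its values are hypothetical, a real Hook would normally build on Airflow's base hook, and the dictionary keys (``hidden_fields``, ``relabeling``, ``placeholders``) are an assumption based on the capabilities listed above, so check them against ``DiscoverableHook`` and ``JdbcHook`` for your Airflow version. The optional ``get_connection_form_widgets`` method is left out because it returns form field objects from the web framework.

```python
class SourceHook:
    """Hypothetical hook that describes a custom 'source' connection type."""

    conn_type = "source"  # value stored as the connection type
    hook_name = "Source"  # label shown for this connection type in the UI

    @staticmethod
    def get_ui_field_behaviour():
        """Describe how the standard connection form fields should behave."""
        return {
            # Standard fields this connection type does not use.
            "hidden_fields": ["port", "schema"],
            # New labels for standard fields.
            "relabeling": {"host": "Source URL"},
            # Example values shown in empty fields.
            "placeholders": {"host": "https://source.example.com"},
        }
```

Such a class would be exposed by listing its import path, for example ``myproviderpackage.hooks.source.SourceHook``, in the ``hook-class-names`` array of the provider meta-data.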