Use Pip 21.* to install airflow officially
PIP 20.2.4 was so far the only officially supported installation
mechanism for Airflow, as there were some problems with conflicting
dependencies (which were ignored by previous versions of PIP).

This change attempts to solve this by removing a [gcp] extra
from `apache-beam` which turns out to be the major source of
the problem - as it contains requirements to the old version of
google client libraries (but apparently only used for tests).

The "apache-beam" provider might, however, need the [gcp] extra
for other components, so in order not to break backwards
compatibility, another approach is used.

Instead of adding [gcp] as extra in the apache-beam extra,
the apache.beam provider's [google] extra is extended with
'apache-beam[gcp]' additional requirement so that whenever the
provider is installed, the apache-beam with [gcp] extra is installed
as well.
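
The approach described above can be sketched roughly as follows. This is
only an illustration, not the code from this commit: the function name
`merge_additional_extras` and the shape of its arguments are assumptions,
modelled on the `additional-extras` entry added to `provider.yaml` and on
the `extras_require` values shown in the provider changelogs.

```python
from typing import Dict, List


def merge_additional_extras(
    base_extras: Dict[str, List[str]],
    additional_extras: Dict[str, str],
) -> Dict[str, List[str]]:
    """Return a copy of base_extras with each additional requirement appended.

    Hypothetical helper: base_extras maps an extra name to its requirements,
    additional_extras maps an extra name to one extra requirement, mirroring
    the ``additional-extras`` mapping in provider.yaml.
    """
    # Copy the lists so the caller's dictionary is not modified.
    merged = {name: list(reqs) for name, reqs in base_extras.items()}
    for extra_name, requirement in additional_extras.items():
        requirements = merged.setdefault(extra_name, [])
        # Avoid listing the same requirement twice.
        if requirement not in requirements:
            requirements.append(requirement)
    return merged


# The apache.beam provider's [google] extra, extended with apache-beam[gcp].
beam_extras = merge_additional_extras(
    {"google": ["apache-airflow-providers-google"]},
    {"google": "apache-beam[gcp]"},
)
print(beam_extras)
```

With this shape, installing the provider with its `google` extra pulls in
both the `google` provider and `apache-beam[gcp]`, while a plain install of
the provider pulls in neither.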
potiuk committed Apr 25, 2021
1 parent a3b0a27 commit 1b1f96b
Showing 32 changed files with 172 additions and 173 deletions.
20 changes: 2 additions & 18 deletions CONTRIBUTING.rst
@@ -553,15 +553,7 @@ Airflow dependencies

.. note::

On November 2020, new version of PIP (20.3) has been released with a new, 2020 resolver. This resolver
might work with Apache Airflow as of 20.3.3, but it might lead to errors in installation. It might
depend on your choice of extras. In order to install Airflow you might need to either downgrade
pip to version 20.2.4 ``pip install --upgrade pip==20.2.4`` or, in case you use Pip 20.3,
you need to add option ``--use-deprecated legacy-resolver`` to your pip install command.

While ``pip 20.3.3`` solved most of the ``teething`` problems of 20.3, this note will remain here until we
set ``pip 20.3`` as official version in our CI pipeline where we are testing the installation as well.
Due to those constraints, only ``pip`` installation is currently officially supported.
Only ``pip`` installation is currently officially supported.

While they are some successes with using other tools like `poetry <https://python-poetry.org/>`_ or
`pip-tools <https://pypi.org/project/pip-tools/>`_, they do not share the same workflow as
@@ -788,15 +780,7 @@ Pinned constraint files

.. note::

On November 2020, new version of PIP (20.3) has been released with a new, 2020 resolver. This resolver
might work with Apache Airflow as of 20.3.3, but it might lead to errors in installation. It might
depend on your choice of extras. In order to install Airflow you might need to either downgrade
pip to version 20.2.4 ``pip install --upgrade pip==20.2.4`` or, in case you use Pip 20.3,
you need to add option ``--use-deprecated legacy-resolver`` to your pip install command.

While ``pip 20.3.3`` solved most of the ``teething`` problems of 20.3, this note will remain here until we
set ``pip 20.3`` as official version in our CI pipeline where we are testing the installation as well.
Due to those constraints, only ``pip`` installation is currently officially supported.
Only ``pip`` installation is officially supported.

While they are some successes with using other tools like `poetry <https://python-poetry.org/>`_ or
`pip-tools <https://pypi.org/project/pip-tools/>`_, they do not share the same workflow as
10 changes: 1 addition & 9 deletions CONTRIBUTORS_QUICK_START.rst
@@ -167,15 +167,7 @@ Setup Airflow with Breeze and PyCharm

.. note::

On November 2020, new version of PIP (20.3) has been released with a new, 2020 resolver. This resolver
might work with Apache Airflow as of 20.3.3, but it might lead to errors in installation. It might
depend on your choice of extras. In order to install Airflow you might need to either downgrade
pip to version 20.2.4 ``pip install --upgrade pip==20.2.4`` or, in case you use Pip 20.3,
you need to add option ``--use-deprecated legacy-resolver`` to your pip install command.

While ``pip 20.3.3`` solved most of the ``teething`` problems of 20.3, this note will remain here until we
set ``pip 20.3`` as official version in our CI pipeline where we are testing the installation as well.
Due to those constraints, only ``pip`` installation is currently officially supported.
Only ``pip`` installation is currently officially supported.

While they are some successes with using other tools like `poetry <https://python-poetry.org/>`_ or
`pip-tools <https://pypi.org/project/pip-tools/>`_, they do not share the same workflow as
2 changes: 1 addition & 1 deletion Dockerfile
@@ -44,7 +44,7 @@ ARG AIRFLOW_GID="50000"

ARG PYTHON_BASE_IMAGE="python:3.6-slim-buster"

ARG AIRFLOW_PIP_VERSION=20.2.4
ARG AIRFLOW_PIP_VERSION=21.1

# By default PIP has progress bar but you can disable it.
ARG PIP_PROGRESS_BAR="on"
2 changes: 1 addition & 1 deletion Dockerfile.ci
@@ -212,7 +212,7 @@ ARG AIRFLOW_PRE_CACHED_PIP_PACKAGES="true"
# By default in the image, we are installing all providers when installing from sources
ARG INSTALL_PROVIDERS_FROM_SOURCES="true"
ARG INSTALL_FROM_PYPI="true"
ARG AIRFLOW_PIP_VERSION=20.2.4
ARG AIRFLOW_PIP_VERSION=21.1
# Setup PIP
# By default PIP install run without cache to make image smaller
ARG PIP_NO_CACHE_DIR="true"
12 changes: 2 additions & 10 deletions IMAGES.rst
@@ -172,15 +172,7 @@ This will build the image using command similar to:
.. note::

On November 2020, new version of PIP (20.3) has been released with a new, 2020 resolver. This resolver
might work with Apache Airflow as of 20.3.3, but it might lead to errors in installation. It might
depend on your choice of extras. In order to install Airflow you might need to either downgrade
pip to version 20.2.4 ``pip install --upgrade pip==20.2.4`` or, in case you use Pip 20.3,
you need to add option ``--use-deprecated legacy-resolver`` to your pip install command.

While ``pip 20.3.3`` solved most of the ``teething`` problems of 20.3, this note will remain here until we
set ``pip 20.3`` as official version in our CI pipeline where we are testing the installation as well.
Due to those constraints, only ``pip`` installation is currently officially supported.
Only ``pip`` installation is currently officially supported.

While they are some successes with using other tools like `poetry <https://python-poetry.org/>`_ or
`pip-tools <https://pypi.org/project/pip-tools/>`_, they do not share the same workflow as
@@ -632,7 +624,7 @@ The following build arguments (``--build-arg`` in docker build command) can be u
| ``ADDITIONAL_RUNTIME_APT_ENV`` | | Additional env variables defined |
| | | when installing runtime deps |
+------------------------------------------+------------------------------------------+------------------------------------------+
| ``AIRFLOW_PIP_VERSION`` | ``20.2.4`` | PIP version used. |
| ``AIRFLOW_PIP_VERSION`` | ``21.1`` | PIP version used. |
+------------------------------------------+------------------------------------------+------------------------------------------+
| ``PIP_PROGRESS_BAR`` | ``on`` | Progress bar for PIP installation |
+------------------------------------------+------------------------------------------+------------------------------------------+
8 changes: 0 additions & 8 deletions INSTALL
@@ -28,14 +28,6 @@ java -jar apache-rat.jar -E ./.rat-excludes -d .
python3 -m venv PATH_TO_YOUR_VENV
source PATH_TO_YOUR_VENV/bin/activate

NOTE!!

On November 2020, new version of PIP (20.3) has been released with a new, 2020 resolver. This resolver
does not yet work with Apache Airflow and might lead to errors in installation - depends on your choice
of extras. In order to install Airflow you need to either downgrade pip to version 20.2.4
``pip install --upgrade pip==20.2.4`` or, in case you use Pip 20.3, you need to add option
``--use-deprecated legacy-resolver`` to your pip install command.

# [required] building and installing by pip (preferred)
pip install .

20 changes: 2 additions & 18 deletions LOCAL_VIRTUALENV.rst
@@ -63,15 +63,7 @@ Extra Packages

.. note::

On November 2020, new version of PIP (20.3) has been released with a new, 2020 resolver. This resolver
might work with Apache Airflow as of 20.3.3, but it might lead to errors in installation. It might
depend on your choice of extras. In order to install Airflow you might need to either downgrade
pip to version 20.2.4 ``pip install --upgrade pip==20.2.4`` or, in case you use Pip 20.3,
you need to add option ``--use-deprecated legacy-resolver`` to your pip install command.

While ``pip 20.3.3`` solved most of the ``teething`` problems of 20.3, this note will remain here until we
set ``pip 20.3`` as official version in our CI pipeline where we are testing the installation as well.
Due to those constraints, only ``pip`` installation is currently officially supported.
Only ``pip`` installation is currently officially supported.

While they are some successes with using other tools like `poetry <https://python-poetry.org/>`_ or
`pip-tools <https://pypi.org/project/pip-tools/>`_, they do not share the same workflow as
@@ -137,15 +129,7 @@ To create and initialize the local virtualenv:

.. note::

On November 2020, new version of PIP (20.3) has been released with a new, 2020 resolver. This resolver
might work with Apache Airflow as of 20.3.3, but it might lead to errors in installation. It might
depend on your choice of extras. In order to install Airflow you might need to either downgrade
pip to version 20.2.4 ``pip install --upgrade pip==20.2.4`` or, in case you use Pip 20.3,
you need to add option ``--use-deprecated legacy-resolver`` to your pip install command.

While ``pip 20.3.3`` solved most of the ``teething`` problems of 20.3, this note will remain here until we
set ``pip 20.3`` as official version in our CI pipeline where we are testing the installation as well.
Due to those constraints, only ``pip`` installation is currently officially supported.
Only ``pip`` installation is currently officially supported.

While they are some successes with using other tools like `poetry <https://python-poetry.org/>`_ or
`pip-tools <https://pypi.org/project/pip-tools/>`_, they do not share the same workflow as
9 changes: 1 addition & 8 deletions README.md
@@ -149,14 +149,7 @@ correct Airflow tag/version/branch and Python versions in the URL.

NOTE!!!

On November 2020, new version of PIP (20.3) has been released with a new, 2020 resolver. This resolver
might work with Apache Airflow as of 20.3.3, but it might lead to errors in installation. It might
depend on your choice of extras. In order to install Airflow reliably, you might need to either downgrade
pip to version 20.2.4 `pip install --upgrade pip==20.2.4` or, in case you use Pip 20.3,
you might need to add option] `--use-deprecated legacy-resolver` to your pip install command.
While `pip 20.3.3` solved most of the `teething` problems of 20.3, this note will remain here until we
set `pip 20.3` as official version in our CI pipeline where we are testing the installation as well.
Due to those constraints, only `pip` installation is currently officially supported.
Only `pip` installation is currently officially supported.

While they are some successes with using other tools like [poetry](https://python-poetry.org) or
[pip-tools](https://pypi.org/project/pip-tools), they do not share the same workflow as
7 changes: 0 additions & 7 deletions UPDATING.md
@@ -1787,13 +1787,6 @@ you should use `pip install apache-airflow[apache.atlas]`.

NOTE!

On November 2020, new version of PIP (20.3) has been released with a new, 2020 resolver. This resolver
does not yet work with Apache Airflow and might lead to errors in installation - depends on your choice
of extras. In order to install Airflow you need to either downgrade pip to version 20.2.4
`pip install --upgrade pip==20.2.4` or, in case you use Pip 20.3, you need to add option
`--use-deprecated legacy-resolver` to your pip install command.


If you want to install integration for Microsoft Azure, then instead of

```
4 changes: 4 additions & 0 deletions airflow/provider.yaml.schema.json
@@ -191,6 +191,10 @@
"items": {
"type": "string"
}
},
"additional-extras": {
"type": "object",
"description": "Additional extras that the provider should have"
}
},
"additionalProperties": false,
58 changes: 58 additions & 0 deletions airflow/providers/apache/beam/CHANGELOG.rst
@@ -19,6 +19,64 @@
Changelog
---------

2.0.0
.....

Breaking changes
~~~~~~~~~~~~~~~~

Integration with the ``google`` provider
````````````````````````````````````````

In 2.0.0 version of the provider we've changed the way of integrating with the ``google`` provider.
The previous versions of both providers caused conflicts when trying to install them together
using PIP > 20.2.4. The conflict is not detected by PIP 20.2.4 and below but it was there and
the version of ``Google BigQuery`` python client was not matching on both sides. As the result, when
both ``apache.beam`` and ``google`` provider were installed, some features of the ``BigQuery`` operators
might not work properly. This was caused by ``apache-beam`` client not yet supporting the new google
python clients when ``apache-beam[gcp]`` extra was used. The ``apache-beam[gcp]`` extra is used
by ``Dataflow`` operators and while they might work with the newer version of the ``Google BigQuery``
python client, it is not guaranteed.

This version introduces additional extra requirement for the ``apache.beam`` extra of the ``google`` provider
and symmetrically the additional requirement for the ``google`` extra of the ``apache.beam`` provider.
Both ``google`` and ``apache.beam`` provider do not use those extras by default, but you can specify
them when installing the providers. The consequence of that is that some functionality of the ``Dataflow``
operators might not be available.

Unfortunately the only ``complete`` solution to the problem is for the ``apache.beam`` to migrate to the
new (>=2.0.0) Google Python clients.

This is the extra for the ``google`` provider:

.. code-block:: python

    extras_require={
        ...
        'apache.beam': ['apache-airflow-providers-apache-beam', 'apache-beam[gcp]'],
        ....
    },

And likewise this is the extra for the ``apache.beam`` provider:

.. code-block:: python

    extras_require={'google': ['apache-airflow-providers-google', 'apache-beam[gcp]']},

You can still run this with PIP version <= 20.2.4 and go back to the previous behaviour:

.. code-block:: shell

    pip install apache-airflow-providers-google['apache.beam']

or

.. code-block:: shell

    pip install apache-airflow-providers-apache-beam['google']

But be aware that some ``BigQuery`` operators functionality might not be available in this case.

1.0.1
.....

8 changes: 0 additions & 8 deletions airflow/providers/apache/beam/README.md
@@ -41,14 +41,6 @@ are in `airflow.providers.apache.beam` python package.

## Installation

NOTE!

On November 2020, new version of PIP (20.3) has been released with a new, 2020 resolver. This resolver
does not yet work with Apache Airflow and might lead to errors in installation - depends on your choice
of extras. In order to install Airflow you need to either downgrade pip to version 20.2.4
`pip install --upgrade pip==20.2.4` or, in case you use Pip 20.3, you need to add option
`--use-deprecated legacy-resolver` to your pip install command.

You can install this package on top of an existing airflow 2.* installation via
`pip install apache-airflow-providers-apache-beam`

4 changes: 4 additions & 0 deletions airflow/providers/apache/beam/provider.yaml
@@ -22,6 +22,7 @@ description: |
`Apache Beam <https://beam.apache.org/>`__.
versions:
- 2.0.0
- 1.0.1
- 1.0.0

@@ -41,3 +42,6 @@ hooks:
- integration-name: Apache Beam
python-modules:
- airflow.providers.apache.beam.hooks.beam

additional-extras:
google: apache-beam[gcp]
2 changes: 1 addition & 1 deletion airflow/providers/apache/hive/transfers/mssql_to_hive.py
@@ -15,7 +15,7 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# pylint: disable=no-member
"""This module contains operator to move data from MSSQL to Hive."""

from collections import OrderedDict
58 changes: 58 additions & 0 deletions airflow/providers/google/CHANGELOG.rst
@@ -19,6 +19,64 @@
Changelog
---------

3.0.0
.....

Breaking changes
~~~~~~~~~~~~~~~~

Integration with the ``apache.beam`` provider
`````````````````````````````````````````````

In 3.0.0 version of the provider we've changed the way of integrating with the ``apache.beam`` provider.
The previous versions of both providers caused conflicts when trying to install them together
using PIP > 20.2.4. The conflict is not detected by PIP 20.2.4 and below but it was there and
the version of ``Google BigQuery`` python client was not matching on both sides. As the result, when
both ``apache.beam`` and ``google`` provider were installed, some features of the ``BigQuery`` operators
might not work properly. This was caused by ``apache-beam`` client not yet supporting the new google
python clients when ``apache-beam[gcp]`` extra was used. The ``apache-beam[gcp]`` extra is used
by ``Dataflow`` operators and while they might work with the newer version of the ``Google BigQuery``
python client, it is not guaranteed.

This version introduces additional extra requirement for the ``apache.beam`` extra of the ``google`` provider
and symmetrically the additional requirement for the ``google`` extra of the ``apache.beam`` provider.
Both ``google`` and ``apache.beam`` provider do not use those extras by default, but you can specify
them when installing the providers. The consequence of that is that some functionality of the ``Dataflow``
operators might not be available.

Unfortunately the only ``complete`` solution to the problem is for the ``apache.beam`` to migrate to the
new (>=2.0.0) Google Python clients.

This is the extra for the ``google`` provider:

.. code-block:: python

    extras_require={
        ...
        'apache.beam': ['apache-airflow-providers-apache-beam', 'apache-beam[gcp]'],
        ....
    },

And likewise this is the extra for the ``apache.beam`` provider:

.. code-block:: python

    extras_require={'google': ['apache-airflow-providers-google', 'apache-beam[gcp]']},

You can still run this with PIP version <= 20.2.4 and go back to the previous behaviour:

.. code-block:: shell

    pip install apache-airflow-providers-google['apache.beam']

or

.. code-block:: shell

    pip install apache-airflow-providers-apache-beam['google']

But be aware that some ``BigQuery`` operators functionality might not be available in this case.

2.2.0
.....

4 changes: 4 additions & 0 deletions airflow/providers/google/provider.yaml
@@ -29,6 +29,7 @@ description: |
- `Google Workspace <https://workspace.google.pl/>`__ (formerly Google Suite)
versions:
- 3.0.0
- 2.2.0
- 2.1.0
- 2.0.0
@@ -742,3 +743,6 @@ extra-links:
- airflow.providers.google.cloud.operators.bigquery.BigQueryConsoleLink
- airflow.providers.google.cloud.operators.bigquery.BigQueryConsoleIndexableLink
- airflow.providers.google.cloud.operators.mlengine.AIPlatformConsoleLink

additional-extras:
apache.beam: apache-beam[gcp]
1 change: 1 addition & 0 deletions airflow/providers/microsoft/mssql/hooks/mssql.py
@@ -15,6 +15,7 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# pylint: disable=no-member
"""Microsoft SQLServer hook module"""

import pymssql
9 changes: 0 additions & 9 deletions dev/provider_packages/PROVIDER_INDEX_TEMPLATE.rst.jinja2
@@ -47,15 +47,6 @@ are in ``{{FULL_PACKAGE_NAME}}`` python package.
Installation
------------

.. note::

On November 2020, new version of PIP (20.3) has been released with a new, 2020 resolver. This resolver
does not yet work with Apache Airflow and might lead to errors in installation - depends on your choice
of extras. In order to install Airflow you need to either downgrade pip to version 20.2.4
``pip install --upgrade pip==20.2.4`` or, in case you use Pip 20.3, you need to add option
``--use-deprecated legacy-resolver`` to your pip install command.


You can install this package on top of an existing airflow 2.* installation via
``pip install {{PACKAGE_PIP_NAME}}``
{%- if PIP_REQUIREMENTS %}