
[AIRFLOW-1467] Dynamic pooling via allowing tasks to use more than one pool slot (depending upon the need) #7160

Merged Jan 19, 2020 (18 commits)

Conversation

lokeshlal (Contributor)

PR contains changes in pool and task instance to provide functionality to tasks to use more than one pool slot.

  • Added pool_capacity field in TaskInstance (pool_capacity is defaulted to 1, to maintain the current behavior)
  • Added pool_capacity in baseoperator
  • Modified pools functionality to calculate the used/queued/occupied/available slots
  • Modified pool_slots_available_dep.py to check against task pool_capacity field instead of 1
  • Modified test case in file test_pool_slots_available_dep.py to include pool_capacity in the Mock object
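The behaviour this PR enables can be pictured with a small standalone sketch (illustrative only, not Airflow's actual Pool model): pool accounting where each active task instance may occupy more than one slot.

```python
# Illustrative sketch (not Airflow's actual implementation) of pool accounting
# when each running/queued task instance may occupy more than one slot.

def open_slots(pool_size, slots_per_task):
    """slots_per_task lists how many slots each active task instance
    occupies (the new per-task field, defaulting to 1)."""
    return pool_size - sum(slots_per_task)

# A pool of 5 slots with a 1-slot task and a 3-slot task has 1 slot open.
print(open_slots(5, [1, 3]))  # -> 1
```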

codecov-io commented Jan 14, 2020

Codecov Report

Merging #7160 into master will decrease coverage by 0.99%.
The diff coverage is 100%.


@@           Coverage Diff            @@
##           master    #7160    +/-   ##
========================================
- Coverage   85.41%   84.42%    -1%     
========================================
  Files         753      753            
  Lines       39685    39693     +8     
========================================
- Hits        33898    33509   -389     
- Misses       5787     6184   +397
Impacted Files Coverage Δ
airflow/models/pool.py 97.36% <ø> (ø) ⬆️
airflow/models/baseoperator.py 96.28% <100%> (+0.02%) ⬆️
airflow/models/taskinstance.py 94.96% <100%> (+0.03%) ⬆️
airflow/ti_deps/deps/pool_slots_available_dep.py 100% <100%> (ø) ⬆️
airflow/operators/mysql_operator.py 0% <0%> (-100%) ⬇️
airflow/operators/mysql_to_hive.py 0% <0%> (-100%) ⬇️
...flow/providers/apache/cassandra/hooks/cassandra.py 21.51% <0%> (-72.16%) ⬇️
airflow/kubernetes/volume_mount.py 44.44% <0%> (-55.56%) ⬇️
airflow/api/auth/backend/kerberos_auth.py 28.16% <0%> (-54.93%) ⬇️
...irflow/contrib/operators/redis_publish_operator.py 50% <0%> (-50%) ⬇️
... and 16 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5abce47...3a5351d.

lokeshlal (Contributor Author)

Hello @dimberman, could you please review this PR? It is the same PR as the earlier #6975; in this one I have updated the migration head as suggested by @potiuk. Thank you.

ashb (Member) left a comment

The other thing you will need to do is add support for this column in the Serialized DAG format.

We should add this as an (optional) field in airflow/serialization/schema.json, and most things should be handled already. The important thing to test is that the existing "ground truth" DAG in tests/serialization/test_dag_serialization.py has this field set correctly when it is deserialized, without having to update the JSON blob -- that ensures this will behave itself when upgrading. Please add tests covering that.
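The round-trip property the review asks for can be sketched in miniature (a hedged illustration, not Airflow's real serializer): an optional field absent from an old JSON blob must come back with its default after deserialization.

```python
# Toy serializer sketch: optional fields omitted from old blobs must
# deserialize to their defaults (names here are illustrative).
import json

DEFAULTS = {"pool_slots": 1}

def serialize(task):
    # Store only values that differ from the default, like an optional field.
    return json.dumps({k: v for k, v in task.items() if v != DEFAULTS.get(k)})

def deserialize(blob):
    task = dict(DEFAULTS)
    task.update(json.loads(blob))
    return task

old_blob = "{}"  # a pre-upgrade blob that never knew about pool_slots
print(deserialize(old_blob)["pool_slots"])  # -> 1
```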

@@ -194,6 +195,11 @@ def __init__(self, task, execution_date, state=None):

self.queue = task.queue
self.pool = task.pool
if hasattr(task, 'pool_capacity'):
Member

This hasattr check shouldn't be needed -- you've added the attribute to BaseOperator.

@@ -194,6 +195,11 @@ def __init__(self, task, execution_date, state=None):

self.queue = task.queue
self.pool = task.pool
if hasattr(task, 'pool_capacity'):
self.pool_capacity = task.pool_capacity
if task.pool_capacity < 1:
Member

This check should be done when creating/setting it on the Task/operator, not here.

Contributor Author

Moved the check to BaseOperator.
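Validating at operator-construction time, as the review suggests, means TaskInstance can read the attribute unconditionally. A minimal sketch (class and parameter names illustrative, not Airflow's actual code):

```python
# Hedged sketch of validating the slot count in the operator constructor,
# so task instances can rely on the attribute existing and being >= 1.

class BaseOperatorSketch:
    def __init__(self, task_id, pool_slots=1):
        if pool_slots < 1:
            raise ValueError(
                "pool_slots for task %s must be >= 1" % task_id)
        self.task_id = task_id
        self.pool_slots = pool_slots
```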


if open_slots <= 0:
if open_slots <= (ti.pool_capacity - 1):
yield self._failing_status(
reason=("Not scheduling since there are %s open slots in pool %s",
Member

Extend this message to say how many slots we are looking for.

Contributor Author

Modified the message to:

reason=("Not scheduling since there are %s open slots in pool %s "
        "and require %s pool slots",
        open_slots, pool_name, ti.pool_slots)
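The check and its extended message can be sketched as a small standalone function (illustrative only, not Airflow's actual PoolSlotsAvailableDep; the final parameter name pool_slots is used):

```python
# Standalone sketch of the pool dep check: fail when the pool cannot fit
# all the slots the task instance requires.

def pool_dep_reason(open_slots, pool_name, pool_slots):
    """Return a failure reason when the pool cannot fit the task, else None."""
    if open_slots <= pool_slots - 1:
        return ("Not scheduling since there are %s open slots in pool %s "
                "and require %s pool slots" % (open_slots, pool_name, pool_slots))
    return None
```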

self.assertFalse(PoolSlotsAvailableDep().is_met(ti=ti))

@patch('airflow.models.Pool.open_slots', return_value=1)
# pylint: disable=unused-argument
def test_pooled_task_pass(self, mock_open_slots):
ti = Mock(pool='test_pool')
ti = Mock(pool='test_pool', pool_capacity=1)
Member

Isn't 1 the default, meaning most of these changes in tests aren't needed?

Contributor Author

Removed pool_capacity from most places, except a few where it is absolutely required.

@@ -178,6 +178,9 @@ class derived from this one results in the creation of a task object,
:param pool: the slot pool this task should run in, slot pools are a
way to limit concurrency for certain tasks
:type pool: str
:param pool_capacity: the number of pool slots this task should use (>= 1)
Member

Capacity is the overall size of the pool, and in the logs etc. we talk about slots ("Not scheduling since there are %s open slots in pool %s"), so do you think this would be better named pool_slots?

WDYT @tooptoop4 ?

Contributor Author

Yes, pool_slots sounds more reasonable. Renamed pool_capacity to pool_slots.

ashb (Member) commented Jan 14, 2020

(sorry to add more work for you after the first PR was merged!)

potiuk (Member) commented Jan 14, 2020

The other thing you will need to do is add support for this column in the Serialized DAG format.

We should add this as an (optional) field in airflow/serialization/schema.json, and most things should be handled already. The important thing to test is that the existing "ground truth" DAG in tests/serialization/test_dag_serialization.py has this field set correctly when it is deserialized, without having to update the JSON blob -- that ensures this will behave itself when upgrading. Please add tests covering that.

I have added an automated test covering this case, so that in the future it will be apparent that you should do it: #7162

lokeshlal (Contributor Author)

Thank you, everyone, for the direction. I am looking at the suggestions one by one.

lokeshlal (Contributor Author) commented Jan 14, 2020

The other thing you will need to do is add support for this column in the Serialized DAG format.
We should add this as an (optional) field in airflow/serialization/schema.json, and most things should be handled already. The important thing to test is that the existing "ground truth" DAG in tests/serialization/test_dag_serialization.py has this field set correctly when it is deserialized, without having to update the JSON blob -- that ensures this will behave itself when upgrading. Please add tests covering that.

I have added an automated test covering this case, so that in the future it will be apparent that you should do it: #7162

I have added pool_slots in schema.json. I will wait for your inputs on #7162. Just to clarify: the test should create a DAG, deserialize it, and check it against the "ground truth" dict, and pool_slots needs to be added in the ground-truth dict as well — correct?
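For reference, the addition might look like the following fragment of `airflow/serialization/schema.json` (the surrounding structure is abbreviated and the exact nesting is an assumption, not a copy of the real file):

```json
{
  "operator": {
    "properties": {
      "pool_slots": { "type": "number" }
    }
  }
}
```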

potiuk (Member) commented Jan 14, 2020

I will wait for your inputs on #7162.

Thanks for your patience @lokeshlal -> I think we would like to wait with that for @dimberman or @kaxil, who know the serialisation part best, and agree with them on how to handle the scenario of an added field in BaseOperator.

It has happened for the first time since we introduced serialisation, so we do not have a fully hashed-out scenario!

Kaxil is now travelling to India and has some holidays, so it might take some time (a few days maybe) until we synchronise. If you can bear with us a little longer, that would be great!

Thanks for the contribution BTW. It looks great!

kaxil self-requested a review January 15, 2020 21:43

kaxil (Member) commented Jan 15, 2020

Thanks for waiting @lokeshlal, I am currently on leave but will be reviewing it very soon :)

potiuk (Member) commented Jan 16, 2020

So just update the JSON, @lokeshlal. Here is the current message that you will get once we merge #7162; please follow it and let us know if the message is clear:

ACTION NEEDED! PLEASE READ THIS CAREFULLY AND CORRECT TESTS CAREFULLY

Some fields were added to the BaseOperator! Please add them to the list 
above and make sure that you add support for DAG serialization - 
you should add the field to `airflow/serialization/schema.json` - 
they should have correct type defined there.

Note that we do not support versioning yet so you should
only add optional fields to BaseOperator.

lokeshlal (Contributor Author)

Thanks @potiuk - I have already updated the schema.json file with the pool_slots field. Plus, this field needs to be added to the test case you added for #7162 ... correct?

lokeshlal requested a review from ashb January 17, 2020 04:09
potiuk (Member) left a comment

The #7162 test is merged now - so please rebase, observe the test failing ;) and then fix it, @lokeshlal

kaxil mentioned this pull request Apr 5, 2020
kaxil added a commit to astronomer/airflow that referenced this pull request Feb 25, 2021
closes apache#13799

Without it, the migration from 1.10.14 to 2.0.0 can fail with the following error for old TIs:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/airflow/jobs/scheduler_job.py", line 1275, in _execute
    self._run_scheduler_loop()
  File "/usr/local/lib/python3.6/dist-packages/airflow/jobs/scheduler_job.py", line 1377, in _run_scheduler_loop
    num_queued_tis = self._do_scheduling(session)
  File "/usr/local/lib/python3.6/dist-packages/airflow/jobs/scheduler_job.py", line 1533, in _do_scheduling
    num_queued_tis = self._critical_section_execute_task_instances(session=session)
  File "/usr/local/lib/python3.6/dist-packages/airflow/jobs/scheduler_job.py", line 1132, in _critical_section_execute_task_instances
    queued_tis = self._executable_task_instances_to_queued(max_tis, session=session)
  File "/usr/local/lib/python3.6/dist-packages/airflow/utils/session.py", line 62, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/airflow/jobs/scheduler_job.py", line 1034, in _executable_task_instances_to_queued
    if task_instance.pool_slots > open_slots:
TypeError: '>' not supported between instances of 'NoneType' and 'int'
```

The workaround was to run manually:

```
UPDATE task_instance SET pool_slots = 1 WHERE pool_slots IS NULL;
```

This commit adds a DB migration to change the value to 1 for records with a NULL value, and makes the column NOT NULLABLE.

This bug was caused by apache#7160
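The backfill step of that fix can be demonstrated with a small self-contained example (using sqlite3 purely for illustration; the actual fix is an Alembic migration in Airflow, which additionally makes the column NOT NULL):

```python
# Demonstration of the backfill: default NULL pool_slots rows to 1 so the
# scheduler's integer comparison cannot hit a NULL/None value.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (task_id TEXT, pool_slots INTEGER)")
conn.execute(
    "INSERT INTO task_instance VALUES ('old_ti', NULL), ('new_ti', 2)")

# The workaround/migration statement:
conn.execute("UPDATE task_instance SET pool_slots = 1 WHERE pool_slots IS NULL")

rows = dict(conn.execute("SELECT task_id, pool_slots FROM task_instance"))
print(rows)  # -> {'old_ti': 1, 'new_ti': 2}
```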
kaxil added a commit that referenced this pull request Feb 25, 2021
ashb pushed a commit that referenced this pull request Mar 19, 2021
(cherry picked from commit f763b7c)
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 16, 2021
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 17, 2021
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 23, 2021
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Nov 27, 2021
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Mar 10, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jun 4, 2022
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jul 9, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Aug 27, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 4, 2022
aglipska pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 7, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Dec 7, 2022
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jan 27, 2023
kosteev pushed a commit to kosteev/composer-airflow-test-copybara that referenced this pull request Sep 12, 2024
kosteev pushed a commit to kosteev/composer-airflow-test-copybara that referenced this pull request Sep 12, 2024
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 17, 2024
5 participants