Avoid sharing session with RenderedTaskInstanceFields write and delete #9993

Merged (1 commit) on Jul 25, 2020

22quinn (Contributor) commented Jul 25, 2020

Sharing a session with RenderedTaskInstanceFields (RTIF) leads to an idle-in-transaction timeout error when DAG serialization is enabled and the task's running duration exceeds the database's idle-in-transaction timeout setting.

The shared session was introduced in #6788.

In many production databases, the idle-in-transaction timeout is not unlimited. For example, if the timeout is set to 1 hour, any task that runs for more than 1 hour will raise an exception even if the actual task succeeds.
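The mechanism can be illustrated outside Airflow. The following is a minimal sketch, not Airflow's code: it uses the standard-library `sqlite3` module as a stand-in for the metadata database, and the `rtif` table, `write_rendered_fields`, and both `run_with_*` helpers are hypothetical names made up for illustration. It contrasts a write on a shared connection, which leaves a transaction open while the task runs, with a write on a dedicated short-lived connection that commits before the task starts.

```python
import sqlite3
from contextlib import contextmanager


def create_schema(db_path):
    """Create the hypothetical rtif table used by this sketch."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE rtif (task_id TEXT, rendered TEXT)")
    conn.commit()
    conn.close()


@contextmanager
def short_lived_connection(db_path):
    """Yield a dedicated connection that is committed and closed on exit."""
    conn = sqlite3.connect(db_path)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


def write_rendered_fields(conn, task_id, rendered):
    """Stand-in for the RTIF write; the INSERT implicitly begins a transaction."""
    conn.execute(
        "INSERT INTO rtif (task_id, rendered) VALUES (?, ?)", (task_id, rendered)
    )


def run_with_shared_connection(conn, task):
    """Problematic pattern: the write's transaction stays open while the task runs."""
    write_rendered_fields(conn, "sleep_py", "{}")
    open_during_task = conn.in_transaction  # transaction is idle but not committed
    task()
    conn.commit()
    return open_during_task


def run_with_separate_connection(conn, db_path, task):
    """Alternative pattern: the write commits on its own connection before the task."""
    with short_lived_connection(db_path) as write_conn:
        write_rendered_fields(write_conn, "sleep_py", "{}")
    open_during_task = conn.in_transaction  # nothing pending on the long-lived connection
    task()
    return open_during_task
```

With a long-running `task`, `run_with_shared_connection` reports an open transaction for the whole task duration, which is the window in which a server-side idle-in-transaction timeout can fire. `run_with_separate_connection` reports none, because the write was committed and its connection closed beforehand.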

Procedure to reproduce the bug and isolate the issue:

Use this docker-compose file:

version: "3"
services:
  webserver:
    image: apache/airflow:1.10.11
    volumes:
      - logs:/opt/airflow/logs
    environment:
      - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://postgres:airflow@postgres/airflow
      - AIRFLOW__CORE__EXECUTOR=LocalExecutor
      - AIRFLOW__CORE__STORE_SERIALIZED_DAGS=True
      - AIRFLOW__CORE__STORE_DAG_CODE=True
      - AIRFLOW__CORE__LOGGING_LEVEL=DEBUG
    depends_on:
      - postgres
    command: webserver
    ports:
      - "8080:8080"
  scheduler:
    image: apache/airflow:1.10.11
    volumes:
      - logs:/opt/airflow/logs
      - ./dags/:/opt/airflow/dags/:ro
    environment:
      - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://postgres:airflow@postgres/airflow
      - AIRFLOW__CORE__EXECUTOR=LocalExecutor
      - AIRFLOW__CORE__STORE_SERIALIZED_DAGS=True
      - AIRFLOW__CORE__STORE_DAG_CODE=True
      - AIRFLOW__CORE__LOGGING_LEVEL=DEBUG
    depends_on:
      - postgres
    command: scheduler
  postgres:
    image: library/postgres:10.7
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    volumes:
      - /dev/urandom:/dev/random   # Required to get non-blocking entropy source
    ports:
      - "5432:5432"
    command: postgres -c 'idle_in_transaction_session_timeout=60000'   # 1 minute timeout

volumes:
  logs:

and put this DAG in the dags folder:

from datetime import datetime
import time
import logging

from airflow.models.dag import DAG
from airflow.operators.python_operator import PythonOperator

logging.getLogger('sqlalchemy.engine').setLevel(logging.DEBUG)


def sleep():
    logging.info("sleep")
    # Sleep just past the 60-second idle_in_transaction_session_timeout set in docker-compose.
    time.sleep(61)


with DAG(
    dag_id="sleep",
    schedule_interval="0 0 * * *",
    catchup=False,
    default_args={
        "start_date": datetime(2020, 1, 1)
    },
) as dag:
    sleep_task = PythonOperator(task_id="sleep_py", python_callable=sleep)

Run the following:

docker-compose up -d postgres 
docker-compose run --rm webserver initdb
docker-compose up -d

Enable the DAG in the webserver at http://localhost:8080/ and wait about 1 minute for the task log to show:

{python_operator.py:114} INFO - Done. Returned value was: None
{taskinstance.py:1150} ERROR - (psycopg2.errors.IdleInTransactionSessionTimeout) terminating connection due to idle-in-transaction timeout
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Search for BEGIN (implicit) in the log. The BEGIN (implicit) for rendered_task_instance_fields does not have a corresponding COMMIT, so that transaction stays open and idle while the task runs.
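The log search can be sketched as a small helper. This is an illustrative sketch under stated assumptions, not part of the PR: `find_unclosed_begins` is a hypothetical name, and it assumes the lines come from a single connection and that transaction boundaries appear as log lines ending in `BEGIN (implicit)`, `COMMIT`, or `ROLLBACK`. Logs that interleave several connections would need to be split per connection first.

```python
def find_unclosed_begins(lines):
    """Return (line_number, text) for each BEGIN (implicit) with no COMMIT or
    ROLLBACK before the next BEGIN (implicit) or the end of the input."""
    unclosed = []
    pending = None  # the most recent BEGIN that has not been closed yet
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip()
        if line.endswith("BEGIN (implicit)"):
            if pending is not None:
                unclosed.append(pending)
            pending = (number, line)
        elif line.endswith("COMMIT") or line.endswith("ROLLBACK"):
            pending = None
    if pending is not None:
        unclosed.append(pending)
    return unclosed
```

Passing an open log file to the helper returns the line numbers to inspect. The statements that follow each reported `BEGIN (implicit)` show which table the dangling transaction touched.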



kaxil self-requested a review July 25, 2020 17:44

kaxil (Member) commented Jul 25, 2020

Thanks @zikun

22quinn deleted the fix-idle-in-transaction-timeout branch July 26, 2020 07:00
kaxil pushed a commit that referenced this pull request Aug 12, 2020
(cherry picked from commit ffcd060)
kaxil pushed a commit that referenced this pull request Aug 15, 2020
(cherry picked from commit ffcd060)
kaxil pushed a commit to astronomer/airflow that referenced this pull request Sep 11, 2020
(cherry picked from commit ffcd060)
(cherry picked from commit 21066c2)
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this pull request Mar 5, 2021
(cherry picked from commit ffcd060)