DAG loading very slow in Graph view when using Dynamic Tasks #27483

Closed
2 tasks done
jose-workpath opened this issue Nov 3, 2022 · 8 comments · Fixed by #29791
Labels
affected_version:2.4 Issues Reported for 2.4 area:webserver Webserver related Issues good first issue kind:bug This is a clearly a bug

Comments

@jose-workpath
Contributor

jose-workpath commented Nov 3, 2022

Apache Airflow version

2.4.2

What happened

The web UI is very slow when loading the Graph view on DAGs that have a large number of expansions in the mapped tasks.
The problem is very similar to the one described in #23786 (resolved), but for the Graph view instead of the grid view.

It takes around 2-3 minutes to load DAGs that have ~1k expansions. With the default Airflow settings, the web server worker times out before the page loads. One can configure web_server_worker_timeout to increase the timeout.
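As a rough sketch of that workaround: Airflow reads configuration overrides from environment variables named `AIRFLOW__{SECTION}__{KEY}`, and `web_server_worker_timeout` lives in the `[webserver]` section. The helper name `airflow_env_var` and the value of 600 seconds below are illustrative assumptions, not recommendations from this issue.

```python
import os


def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name Airflow reads for a config option.

    Hypothetical helper for illustration; it only formats the name.
    """
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# 600 seconds is an arbitrary illustrative value, not a tested recommendation.
timeout_var = airflow_env_var("webserver", "web_server_worker_timeout")
os.environ[timeout_var] = "600"
print(timeout_var, "=", os.environ[timeout_var])
```

Setting the variable inside a Python process only affects that process. In a Docker-Compose deployment it would need to be set in the webserver service's environment before the webserver starts. This only hides the timeout; the Graph view remains slow.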

What you think should happen instead

The web UI should take a reasonable amount of time to load the Graph view after the DAG run is finished.

How to reproduce

Same way as in #23786: create a mapped task that expands into a large number of task instances and run it. The Graph view will then take a very long time to load and will eventually time out.

You can use this code to generate multiple dags with 2^x expansions. After running the DAGs you should notice how slow it is when attempting to open the Graph view of the DAGs with the largest number of expansions.

from datetime import datetime
from airflow.models import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import PythonOperator

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
}

initial_scale = 7
max_scale = 12
scaling_factor = 2

for scale in range(initial_scale, max_scale + 1):
    dag_id = f"dynamic_task_mapping_{scale}"
    with DAG(
        dag_id=dag_id,
        default_args=default_args,
        catchup=False,
        schedule_interval=None,
        start_date=datetime(1970, 1, 1),
        render_template_as_native_obj=True,
    ) as dag:
        start = EmptyOperator(task_id="start")

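        # Expand into scaling_factor**scale mapped task instances,
        # each receiving a single positional argument.
        mapped = PythonOperator.partial(
            task_id="mapped",
            python_callable=lambda m: print(m),
        ).expand(
            op_args=[[x] for x in range(scaling_factor**scale)]
        )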

        end = EmptyOperator(task_id="end")

        start >> mapped >> end
    globals()[dag_id] = dag
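
For reference, the number of mapped task instances each generated DAG ends up with can be tabulated without Airflow. This is a small sketch that mirrors the constants of the script above; the `expansions` dictionary is only an illustrative name.

```python
initial_scale = 7
max_scale = 12
scaling_factor = 2

# Map each generated dag_id to the number of mapped task instances it expands into.
expansions = {
    f"dynamic_task_mapping_{scale}": scaling_factor**scale
    for scale in range(initial_scale, max_scale + 1)
}

for dag_id, count in expansions.items():
    print(f"{dag_id}: {count} mapped task instances")
```

The loop yields six DAGs, from 128 expansions (dynamic_task_mapping_7) up to 4096 (dynamic_task_mapping_12). The ~1k case mentioned earlier corresponds to dynamic_task_mapping_10.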

Operating System

MacOS Version 12.6 (Apple M1)

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==4.0.0
apache-airflow-providers-common-sql==1.2.0
apache-airflow-providers-ftp==3.1.0
apache-airflow-providers-http==4.0.0
apache-airflow-providers-imap==3.0.0
apache-airflow-providers-sqlite==3.2.1

Deployment

Docker-Compose

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@jose-workpath jose-workpath added area:core kind:bug This is a clearly a bug labels Nov 3, 2022
@boring-cyborg

boring-cyborg bot commented Nov 3, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

@uranusjr uranusjr added area:webserver Webserver related Issues and removed area:core labels Nov 3, 2022
@uranusjr
Member

uranusjr commented Nov 3, 2022

Marking this as webserver instead of UI since #23813 seems to indicate this can be resolved by only backend changes. Pull requests are welcomed.

@jose-workpath
Contributor Author

jose-workpath commented Nov 3, 2022

I can take a look at #23813 next week and see if I can solve this issue in a similar way.

@arley-wilches

Hi there,

I am facing the same issue and I just have about 100 mapped tasks.
Is there anything I can do to help?
logs? screenshots?

@jose-workpath
Contributor Author

jose-workpath commented Feb 27, 2023

Hey! Sorry, I deprioritised this issue because we changed the logic we were using in my organisation to avoid using many mapped tasks (which turned out to be a better design choice IMO tbh), but I am looking into it again.

I have already confirmed this is still an issue in the latest version of Airflow. I will open a PR for it this week from my personal GitHub account.

@arley-wilches

Hi Jose.

What do you mean by "we changed the logic to avoid using many mapped tasks"?
Does it mean we cannot use several mapped tasks?

Thank you

@jose-workpath
Contributor Author

jose-workpath commented Feb 27, 2023

@arley-wilches Oops! Sorry, I didn't explain myself well there. I meant that in my organisation we changed the logic we were using in our DAGs, a decision that had nothing to do with this issue.

At the moment you can use as many mapped tasks as you want in Airflow (up to a configurable threshold); it will just take a long time (or even fail) to load the Graph view in the web UI.

@arley-wilches

@jose-workpath I get it.

In any case, will a fix come at some point?

Is there anything I can help with?

4 participants