Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using TaskGroup without context manager (Graph view visual bug) #14864

Closed
wolfier opened this issue Mar 17, 2021 · 3 comments · Fixed by #23071
Closed

Using TaskGroup without context manager (Graph view visual bug) #14864

wolfier opened this issue Mar 17, 2021 · 3 comments · Fixed by #23071
Labels
affected_version:2.0 Issues Reported for 2.0 area:TaskGroup area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues kind:bug This is a clearly a bug

Comments

@wolfier
Copy link
Contributor

wolfier commented Mar 17, 2021

Apache Airflow version: 2.0.0

Kubernetes version (if you are using kubernetes) (use kubectl version): n/a

What happened:
When I do not use the context manager for the task group and instead call the add function to add the tasks, those tasks show up on the Graph view.
Screen Shot 2021-03-17 at 2 06 17 PM

However, when I click on the task group item on the Graph UI, it will fix the issue. When I close the task group item, the tasks will not be displayed as expected.
Screen Shot 2021-03-17 at 2 06 21 PM

What you expected to happen:
I expected the tasks inside the task group to not display on the Graph view.
Screen Shot 2021-03-17 at 3 17 34 PM

How to reproduce it:
Render this DAG in Airflow

from airflow.models import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.dummy import DummyOperator
from airflow.utils.task_group import TaskGroup
from datetime import datetime
with DAG(dag_id="example_task_group", start_date=datetime(2021, 1, 1), tags=["example"], catchup=False) as dag:
    start = BashOperator(task_id="start", bash_command='echo 1; sleep 10; echo 2;')
    tg = TaskGroup("section_1", tooltip="Tasks for section_1")
    task_1 = DummyOperator(task_id="task_1")
    task_2 = BashOperator(task_id="task_2", bash_command='echo 1')
    task_3 = DummyOperator(task_id="task_3")
    tg.add(task_1)
    tg.add(task_2)
    tg.add(task_3)
@wolfier wolfier added the kind:bug This is a clearly a bug label Mar 17, 2021
@vikramkoka vikramkoka added area:UI Related to UI/UX. For Frontend Developers. affected_version:2.0 Issues Reported for 2.0 labels Mar 22, 2021
@ryanahamilton
Copy link
Contributor

I was able to reproduce this with the supplied sample DAG.

@yuqian90 any ideas on this? You probably know this graph view task group logic fairly well after your work on #14661.

@yuqian90
Copy link
Contributor

yuqian90 commented Mar 24, 2021

Hi @ryanahamilton and @wolfier , I can explain why this is happening, but I'll need some suggestions from you and others on how to improve this.
There are two ways of putting a task into a TaskGroup:

  1. Create the task inside the contextmanager of the TaskGroup;
  2. Pass the task_group as an argument to the constructor of the task.

The reproducing example is doing neither of these. Instead it's calling TaskGroup.add after the task has already been created. At the time of creating the task, there was no TaskGroup contextmanager nor was the task_group specified. So the tasks were already added to the root task_group of the dag, resulting in this inconsistent behaviour.

To fix this, you just need to use a contextmanager. Or do this:

from airflow.models import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.dummy import DummyOperator
from airflow.utils.task_group import TaskGroup
from datetime import datetime
with DAG(dag_id="example_task_group", start_date=datetime(2021, 1, 1), tags=["example"], catchup=False) as dag:
    start = BashOperator(task_id="start", bash_command='echo 1; sleep 10; echo 2;')
    tg = TaskGroup("section_1", tooltip="Tasks for section_1")
    task_1 = DummyOperator(task_id="task_1", task_group=tg)
    task_2 = BashOperator(task_id="task_2", bash_command='echo 1', task_group=tg)
    task_3 = DummyOperator(task_id="task_3", task_group=tg)

Some ideas for improvement, not sure what's best:

  1. Add a check somewhere to make sure a task is not added to more than one TaskGroup. Where should such a check be?
  2. Make TaskGroup.add a private method. The reason it currently looks like a public method is that it's called in places such as the constructor of BaseOperator. But strictly speaking, users of Airflow should not be encouraged to call TaskGroup.add directly because it only does what the name says under strict pre-conditions (i.e. during the construction of the task and not after).

@ryanahamilton ryanahamilton added the area:webserver Webserver related Issues label Apr 12, 2021
@eladkal
Copy link
Contributor

eladkal commented Feb 8, 2022

Add a check somewhere to make sure a task is not added to more than one TaskGroup. Where should such a check be?

We should definitely prevent adding task to more than 1 task group.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affected_version:2.0 Issues Reported for 2.0 area:TaskGroup area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues kind:bug This is a clearly a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants