fix 422 invalid value error caused by long k8s pod name #13299
@@ -367,24 +367,6 @@ def _annotations_to_key(self, annotations: Dict[str, str]) -> Optional[TaskInsta

return TaskInstanceKey(dag_id, task_id, execution_date, try_number)

@staticmethod
def _make_safe_pod_id(safe_dag_id: str, safe_task_id: str, safe_uuid: str) -> str:
Removing this since the method is not used anywhere.
Force-pushed from ea91ff0 to 498382e.
The Workflow run is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.
Force-pushed from 0359279 to 1e54988.
Is the k8s image test supposed to be flaky? I am seeing failures from it in my other, unrelated PRs as well as on master.
Not THAT flaky, I think. This looks like a legit problem.
But yeah... looks like master has the same problems :(
============ 32 failed, 23 passed, 2 warnings in 474.24s (0:07:54) =============
Something to fix in master then :(
Hopefully this one will fix it: #13316
Should be fixed with the latest fix, @houqp :). Can you please rebase and check?
Nice, this probably closes #13189
The main issue has been solved, but I think the two remaining failures need to be fixed in this PR, @houqp :(
The Workflow run is cancelling this PR. Building images for the PR has failed. Follow the workflow link to check the reason.
Force-pushed from 4a18216 to ec156bd.
@potiuk Looks like the kind cluster is not picking up the code change in my branch. I am able to reproduce this locally with breeze, and the task pod names are created with
For any executor code change, what other changes do I need to make in order to get the kind cluster to pick up my code?
This should work out of the box, @houqp. There was a recent change, though, where the production images are built from packages rather than directly from sources. But the packages are prepared locally from the PR sources, so it should, in principle, work fine. I will take a look anyway. Kind request: this change, #13323, should vastly speed up analysing it. It introduces grouping of the logs, so it will be much easier to analyse any problems. If you can take a look, we merge it, and you rebase your change on top, it will be much easier to analyse.
@ashb I double-checked the watcher code; it's using the scheduler job id label as the filter:
@dimberman I would appreciate your review on this as well :)
Jumping in on this since I've been looking into the long pod name problem as well 👋 Will
@grepthat Yes, it will still work because it only uses
@houqp In airflow/airflow/executors/kubernetes_executor.py Lines 628 to 631 in 1ec6312
The airflow/airflow/executors/kubernetes_executor.py Lines 605 to 607 in 1ec6312
Will this not cause problems?
@grepthat I see what you mean. Yes, the pod_ids dict construction code will need to be changed to use label-safe values as well.
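For context on "label-safe": a Kubernetes label value must be at most 63 characters and, if non-empty, must begin and end with an alphanumeric character, with only `-`, `_`, `.` and alphanumerics in between. A minimal sketch of sanitizing a value under those rules is below. The helper name `make_label_safe` and the hash-suffix truncation strategy are assumptions for illustration, not necessarily what this PR implements.

```python
import hashlib
import re

MAX_LABEL_LEN = 63
# Anything outside this set is not allowed in a Kubernetes label value.
_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_.-]")


def make_label_safe(value: str) -> str:
    """Hypothetical helper: coerce an arbitrary string into a valid k8s label value."""
    # Drop disallowed characters, then strip '-', '_' and '.' from both ends
    # so the value starts and ends with an alphanumeric character.
    safe = _INVALID_CHARS.sub("", value).strip("-_.")
    if len(safe) > MAX_LABEL_LEN:
        # Truncate, but append a short hash of the original value so that
        # different long inputs sharing a prefix stay distinguishable.
        digest = hashlib.sha256(value.encode()).hexdigest()[:9]
        safe = safe[: MAX_LABEL_LEN - len(digest) - 1].rstrip("-_.") + "-" + digest
    return safe
```

The hash suffix matters because plain truncation would map two long task ids with a common 63-character prefix to the same label value, which would break any lookup keyed on that value.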
K8s pod names follow the DNS_SUBDOMAIN naming convention, which can be broken down into one or more DNS_LABELs separated by `.`. While the max length of a pod name (DNS_SUBDOMAIN) is 253, each label component (DNS_LABEL) of the name cannot be longer than 63. Pod names generated by the k8s executor currently contain only one label, which means the total effective name length cannot be greater than 63. This patch concatenates the uuid to pod_id using `.` to generate the pod name, thus extending the max name length to 63 + len(uuid).
Reference: https://github.com/kubernetes/kubernetes/blob/release-1.1/docs/design/identifiers.md
Relevant discussion: kubernetes/kubernetes#79351 (comment)
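The naming rule described in this commit message can be illustrated with a short sketch. It assumes the RFC 1123 rules as stated above: each `.`-separated label is at most 63 lowercase alphanumerics or `-`, starting and ending with an alphanumeric, and the whole name is at most 253 characters. The helpers `is_dns_label`, `is_dns_subdomain` and `make_pod_name` are hypothetical names, and `uuid4().hex` stands in for whatever uuid the executor actually generates. Counting the separating dot, the maximum length works out to 63 + 1 + len(uuid).

```python
import re
from uuid import uuid4

DNS_LABEL_MAX = 63
DNS_SUBDOMAIN_MAX = 253
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_dns_label(s: str) -> bool:
    """True if s is a single RFC 1123 label (no dots, at most 63 chars)."""
    return len(s) <= DNS_LABEL_MAX and _DNS_LABEL.fullmatch(s) is not None


def is_dns_subdomain(s: str) -> bool:
    """True if s is one or more valid labels joined by '.', within 253 chars."""
    return len(s) <= DNS_SUBDOMAIN_MAX and all(is_dns_label(p) for p in s.split("."))


def make_pod_name(safe_pod_id: str, uuid_hex: str) -> str:
    """Hypothetical: join a label-safe pod id and a uuid with '.'."""
    # The id is capped at one full DNS label; the uuid becomes a second label,
    # so the name is no longer limited to 63 characters in total.
    pod_id = safe_pod_id[:DNS_LABEL_MAX].rstrip("-")
    return f"{pod_id}.{uuid_hex}"


# Example: an over-long id still yields a valid two-label pod name.
long_id = "process-long-task-" + "x" * 80
pod_name = make_pod_name(long_id, uuid4().hex)
print(len(pod_name), is_dns_subdomain(pod_name))  # 96 True
```

Under this scheme only the uuid label guarantees uniqueness, so truncating the first label is safe as long as the uuid is unique per pod.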
@grepthat @ashb @kaxil @dimberman @brandondtb I pushed a commit that adds more label sanitizing; ready for another round of review.
Ping @dimberman
@dimberman Can you take a look at this one?
@grepthat Can you please take a look too? :) Thanks
The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it onto the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
@kaxil @houqp Looks good 👍 I checked this on a test DAG with a long task name (via nested Task Groups). Attached is the DAG for reference:
process_long_taskname.py
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.utils.task_group import TaskGroup

dag = DAG(
    'process_long_task',
    default_args={
        'owner': 'airflow',
        'depends_on_past': False,
        'retries': 0,
        'start_date': datetime(1970, 1, 1),
        'retry_delay': timedelta(seconds=30),
    },
    description='',
    schedule_interval=None,
    catchup=False,
)

# Five levels of nested TaskGroups; each level prefixes its group_id onto the
# ids below it, which produces a very long task_id for the innermost task.
TG_survey00000 = TaskGroup(
    "TG_survey00000",
    tooltip="",
    dag=dag,
)
TG_incremental_adjustment_survey00000_f608c63d9b = TaskGroup(
    "TG_incremental_adjustment_survey00000_f608c63d9b",
    tooltip="",
    parent_group=TG_survey00000,
    dag=dag,
)
TG_msac_10_survey00000_1c1a34cf10 = TaskGroup(
    "TG_msac_10_survey00000_1c1a34cf10",
    tooltip="",
    parent_group=TG_incremental_adjustment_survey00000_f608c63d9b,
    dag=dag,
)
TG_bundle_adjuster_786931747d = TaskGroup(
    "TG_bundle_adjuster_786931747d",
    tooltip="",
    parent_group=TG_msac_10_survey00000_1c1a34cf10,
    dag=dag,
)
TG_color_0_521cd0b3f7 = TaskGroup(
    "TG_color_0_521cd0b3f7",
    tooltip="",
    parent_group=TG_bundle_adjuster_786931747d,
    dag=dag,
)

# Echo a message, then sleep for 10 seconds so the worker pod stays up briefly.
T_finalize_5b57782bb2 = BashOperator(
    task_id='T_finalize_5b57782bb2',
    bash_command='echo "executing nested task" && sleep 10',
    dag=dag,
    task_group=TG_color_0_521cd0b3f7,
)
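To see why this DAG exercises the limit: with TaskGroup's default `prefix_group_id=True`, a task nested inside groups gets a task_id made of every enclosing group id joined with `.`. The plain-Python sketch below (it does not import Airflow; `full_task_id` is a hypothetical helper) recomputes that id for the DAG above and compares its length with the 63-character DNS label limit.

```python
from typing import List

# Group ids from the test DAG, outermost first, followed by the task's own id.
NESTED_IDS = [
    "TG_survey00000",
    "TG_incremental_adjustment_survey00000_f608c63d9b",
    "TG_msac_10_survey00000_1c1a34cf10",
    "TG_bundle_adjuster_786931747d",
    "TG_color_0_521cd0b3f7",
    "T_finalize_5b57782bb2",
]
DNS_LABEL_MAX = 63


def full_task_id(ids: List[str]) -> str:
    """Join group ids and the task id with '.', as prefix_group_id=True does."""
    return ".".join(ids)


task_id = full_task_id(NESTED_IDS)
print(len(task_id), len(task_id) > DNS_LABEL_MAX)
```

Even before sanitizing (the uppercase letters and `_` are not valid in a DNS label either), the raw id is far longer than a single 63-character label, so a pod name derived from it cannot fit without truncation.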
K8s pod names follow the DNS_SUBDOMAIN naming convention, which can be broken down into one or more DNS_LABELs separated by `.`. While the max length of a pod name (DNS_SUBDOMAIN) is 253, each label component (DNS_LABEL) of the name cannot be longer than 63. Pod names generated by the k8s executor currently contain only one label, which means the total effective name length cannot be greater than 63. This patch concatenates the uuid to pod_id using `.` to generate the pod name, thus extending the max name length to 63 + len(uuid).
Reference: https://github.com/kubernetes/kubernetes/blob/release-1.1/docs/design/identifiers.md
Relevant discussion: kubernetes/kubernetes#79351 (comment)
(cherry picked from commit 862443f)