[frontend] Error on Kubeflow Pipelines Dashboard: 'No Healthy Upstream', '503 UC upstream_reset_before_response_started{connection_termination}' #11260

Closed
ChaeSeungJi opened this issue Oct 1, 2024 · 4 comments

Comments

@ChaeSeungJi

Environment

How did you deploy Kubeflow Pipelines (KFP)?

KFP version: 1.19.1

Kubernetes version
image

Steps to reproduce

  • Deployed Kubeflow Pipelines (KFP) using Kubeflow 1.9.1 Release Candidate 1.
  • The installation process completed successfully without any errors.
  • After the installation, I opened the Kubeflow dashboard.
  • When accessing the Kubeflow Pipelines dashboard, an error occurred.
    Other components like Notebook, Volume, etc., are working fine, but the issue is specific to the KFP dashboard.

Expected result

  • ml-pipeline-ui error: "Error: failed to retrieve list of pipelines. Click Details for more information."

image

  • Details: "An error occurred: upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: delayed connect error: 111"
    image

  • When refreshing the page (F5), the following screen appears: no healthy upstream

image

Materials and Reference

  • ml-pipeline logs
I1001 10:35:43.871571       7 interceptor.go:37] /api.PipelineService/ListPipelinesV1 handler finished
I1001 10:35:43.962205       7 interceptor.go:29] /api.RunService/ListRunsV1 handler starting
I1001 10:35:43.962262       7 resource_manager.go:1690] Getting user identity
I1001 10:35:43.962283       7 resource_manager.go:1711] User: [email protected], ResourceAttributes: &ResourceAttributes{Namespace:kubeflow-user-example-com,Verb:list,Group:pipelines.kubeflow.org,Version:v1beta1,Resource:runs,Subresource:,Name:,}
I1001 10:35:43.962310       7 resource_manager.go:1712] Authorizing request
I1001 10:35:43.969834       7 resource_manager.go:1755] Authorized user '[email protected]': &ResourceAttributes{Namespace:kubeflow-user-example-com,Verb:list,Group:pipelines.kubeflow.org,Version:v1beta1,Resource:runs,Subresource:,Name:,}
I1001 10:35:43.973333       7 interceptor.go:37] /api.RunService/ListRunsV1 handler finished
I1001 10:36:08.324576       7 interceptor.go:29] /kubeflow.pipelines.backend.api.v2beta1.PipelineService/ListPipelines handler starting
I1001 10:36:08.327806       7 interceptor.go:37] /kubeflow.pipelines.backend.api.v2beta1.PipelineService/ListPipelines handler finished
I1001 10:36:08.360273       7 interceptor.go:29] /kubeflow.pipelines.backend.api.v2beta1.PipelineService/ListPipelines handler starting
I1001 10:36:08.360340       7 resource_manager.go:1690] Getting user identity
I1001 10:36:08.360361       7 resource_manager.go:1711] User: [email protected], ResourceAttributes: &ResourceAttributes{Namespace:kubeflow-user-example-com,Verb:list,Group:pipelines.kubeflow.org,Version:v1beta1,Resource:pipelines,Subresource:,Name:,}
I1001 10:36:08.360382       7 resource_manager.go:1712] Authorizing request
I1001 10:36:08.363527       7 resource_manager.go:1755] Authorized user '[email protected]': &ResourceAttributes{Namespace:kubeflow-user-example-com,Verb:list,Group:pipelines.kubeflow.org,Version:v1beta1,Resource:pipelines,Subresource:,Name:,}
I1001 10:36:08.366665       7 interceptor.go:37] /kubeflow.pipelines.backend.api.v2beta1.PipelineService/ListPipelines handler finished
  • ml-pipeline(istio-proxy) logs
[2024-10-01T10:37:34.157Z] "GET /apis/v1beta1/healthz HTTP/1.1" 200 - via_upstream - "-" 0 96 0 0 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)" "2bbf19f2-49c9-4360-82b7-7380296b07e4" "10.106.183.67:8888" "192.168.52.177:8888" inbound|8888|| 127.0.0.6:49739 192.168.52.177:8888 192.168.217.73:37648 - default
[2024-10-01T10:35:43.859Z] "- - -" 0 - - - "-" 2700 4440 114505 - "-" "-" "-" "-" "10.3.129.237:6443" outbound|443||kubernetes.default.svc.cluster.local 192.168.52.177:59564 10.96.0.1:443 192.168.52.177:59130 - -
[2024-10-01T10:37:39.154Z] "GET /apis/v1beta1/healthz HTTP/1.1" 200 - via_upstream - "-" 0 96 0 0 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)" "40d3f28e-f60f-415d-a83b-0b483418db00" "10.106.183.67:8888" "192.168.52.177:8888" inbound|8888|| 127.0.0.6:41605 192.168.52.177:8888 192.168.217.73:47786 - default
[2024-10-01T10:37:39.154Z] "GET /apis/v1beta1/healthz HTTP/1.1" 200 - via_upstream - "-" 0 96 0 0 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)" "c779661f-49c9-46b3-8f37-d8d277873152" "10.106.183.67:8888" "192.168.52.177:8888" inbound|8888|| 127.0.0.6:49739 192.168.52.177:8888 192.168.217.73:37648 - default
[2024-10-01T10:35:43.864Z] "- - -" 0 - - - "-" 5475 37098 120008 - "-" "-" "-" "-" "192.168.52.178:3306" outbound|3306||mysql.kubeflow.svc.cluster.local 192.168.52.177:33288 10.96.191.178:3306 192.168.52.177:39610 - -
  • ml-pipeline-ui logs
GET /apis/v1beta1/healthz
GET /apis/v1beta1/healthz
GET /apis/v1beta1/healthz
GET /apis/v1beta1/healthz
GET /apis/v1beta1/healthz
GET /apis/v1beta1/healthz
GET /apis/v1beta1/healthz
GET /apis/v1beta1/healthz
GET /apis/v1beta1/healthz
GET /apis/v1beta1/healthz
GET /pipeline/
GET /pipeline/static/js/main.b980985e.js
GET /pipeline/static/css/main.e10b3034.css
GET /pipeline/system/project-id
GET /pipeline/apis/v1beta1/healthz
GET /pipeline/apis/v2beta1/pipelines?page_token=&page_size=10&sort_by=created_at%20desc&filter=
Proxied request:  /apis/v2beta1/pipelines?page_token=&page_size=10&sort_by=created_at%20desc&filter=
GET /pipeline/system/cluster-name
GET /pipeline/apis/v2beta1/pipelines?namespace=kubeflow-user-example-com&page_token=&page_size=10&sort_by=created_at%20desc&filter=
Proxied request:  /apis/v2beta1/pipelines?namespace=kubeflow-user-example-com&page_token=&page_size=10&sort_by=created_at%20desc&filter=

/server/node_modules/node-fetch/lib/index.js:1491
                        reject(new FetchError(`request to ${request.url} failed, reason: ${err.message}`, 'system', err));
                               ^
FetchError: request to http://metadata/computeMetadata/v1/project/project-id failed, reason: getaddrinfo ENOTFOUND metadata
    at ClientRequest.<anonymous> (/server/node_modules/node-fetch/lib/index.js:1491:11)
    at ClientRequest.emit (node:events:517:28)
    at Socket.socketErrorListener (node:_http_client:501:9)
    at Socket.emit (node:events:517:28)
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  type: 'system',
  errno: 'ENOTFOUND',
  code: 'ENOTFOUND'
}

Node.js v18.18.2
  • ml-pipeline-ui(istio-proxy) logs
[2024-10-01T10:36:08.021Z] "GET /pipeline/static/css/main.e10b3034.css HTTP/1.1" 200 - via_upstream - "-" 0 14493 5 4 "192.168.64.246" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" "58461348-0b7c-4a85-8fcf-72e0e2c78072" "inl.test:32692" "192.168.217.73:3000" inbound|3000|| 127.0.0.6:44173 192.168.217.73:3000 192.168.64.246:0 - default
[2024-10-01T10:36:08.015Z] "GET /pipeline/static/js/main.b980985e.js HTTP/1.1" 200 - via_upstream - "-" 0 4120994 26 8 "192.168.64.246" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" "85419e0a-4f50-96c3-b4e9-8060a636d949" "inl.test:32692" "192.168.217.73:3000" inbound|3000|| 127.0.0.6:44035 192.168.217.73:3000 192.168.64.246:0 - default
[2024-10-01T10:36:08.327Z] "GET /apis/v1beta1/healthz HTTP/1.1" 200 - via_upstream - "-" 0 96 2 2 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)" "6cbd9bfd-af63-478d-8ddb-b252884570b1" "10.106.183.67:8888" "192.168.52.177:8888" outbound|8888||ml-pipeline.kubeflow.svc.cluster.local 192.168.217.73:37664 10.106.183.67:8888 192.168.217.73:55642 - default
[2024-10-01T10:36:08.312Z] "GET /pipeline/apis/v1beta1/healthz HTTP/1.1" 200 - via_upstream - "-" 0 291 17 17 "192.168.64.246" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" "95641d97-c788-4b82-80ff-dec317fb7d8a" "inl.test:32692" "192.168.217.73:3000" inbound|3000|| 127.0.0.6:44035 192.168.217.73:3000 192.168.64.246:0 - default
[2024-10-01T10:36:08.327Z] "GET /pipeline/apis/v2beta1/pipelines?page_token=&page_size=10&sort_by=created_at%20desc&filter= HTTP/1.1" 200 - via_upstream - "-" 0 883 6 6 "192.168.64.246" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" "fa1cc37a-6504-4044-be57-f4fc7283abba" "10.106.183.67:8888" "192.168.52.177:8888" outbound|8888||ml-pipeline.kubeflow.svc.cluster.local 192.168.217.73:47786 10.106.183.67:8888 192.168.64.246:0 - default
[2024-10-01T10:36:08.313Z] "GET /pipeline/apis/v2beta1/pipelines?page_token=&page_size=10&sort_by=created_at%20desc&filter= HTTP/1.1" 200 - via_upstream - "-" 0 883 21 20 "192.168.64.246" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" "fa1cc37a-6504-4044-be57-f4fc7283abba" "inl.test:32692" "192.168.217.73:3000" inbound|3000|| 127.0.0.6:44173 192.168.217.73:3000 192.168.64.246:0 - default
[2024-10-01T10:36:08.363Z] "GET /pipeline/apis/v2beta1/pipelines?namespace=kubeflow-user-example-com&page_token=&page_size=10&sort_by=created_at%20desc&filter= HTTP/1.1" 200 - via_upstream - "-" 0 2 9 9 "192.168.64.246" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" "1469974e-9638-4c8c-b485-a25948020125" "10.106.183.67:8888" "192.168.52.177:8888" outbound|8888||ml-pipeline.kubeflow.svc.cluster.local 192.168.217.73:47786 10.106.183.67:8888 192.168.64.246:0 - default
[2024-10-01T10:36:08.312Z] "GET /pipeline/system/project-id HTTP/1.1" 503 UC upstream_reset_before_response_started{connection_termination} - "-" 0 95 69 - "192.168.64.246" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" "5215f428-d320-4cf6-b07f-5156818e6cd6" "inl.test:32692" "192.168.217.73:3000" inbound|3000|| 127.0.0.6:45291 192.168.217.73:3000 192.168.64.246:0 - default
[2024-10-01T10:36:08.313Z] "GET /pipeline/system/cluster-name HTTP/1.1" 503 UC upstream_reset_before_response_started{connection_termination} - "-" 0 95 68 - "192.168.64.246" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" "551d34d9-3a65-456a-92a2-18331db6e9c2" "inl.test:32692" "192.168.217.73:3000" inbound|3000|| 127.0.0.6:49221 192.168.217.73:3000 192.168.64.246:0 - default
[2024-10-01T10:36:08.360Z] "GET /pipeline/apis/v2beta1/pipelines?namespace=kubeflow-user-example-com&page_token=&page_size=10&sort_by=created_at%20desc&filter= HTTP/1.1" 503 UC upstream_reset_before_response_started{connection_termination} - "-" 0 95 21 - "192.168.64.246" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" "1469974e-9638-4c8c-b485-a25948020125" "inl.test:32692" "192.168.217.73:3000" inbound|3000|| 127.0.0.6:44035 192.168.217.73:3000 192.168.64.246:0 - default
[2024-10-01T10:36:08.384Z] "GET /pipeline/system/project-id HTTP/1.1" 503 UF upstream_reset_before_response_started{remote_connection_failure,delayed_connect_error:_111} - "delayed_connect_error:_111" 0 152 0 - "192.168.64.246" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" "5215f428-d320-4cf6-b07f-5156818e6cd6" "inl.test:32692" "192.168.217.73:3000" inbound|3000|| - 192.168.217.73:3000 192.168.64.246:0 - default
[2024-10-01T10:36:08.390Z] "GET /pipeline/system/project-id HTTP/1.1" 503 UF upstream_reset_before_response_started{remote_connection_failure,delayed_connect_error:_111} - "delayed_connect_error:_111" 0 152 0 - "192.168.64.246" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" "5215f428-d320-4cf6-b07f-5156818e6cd6" "inl.test:32692" "192.168.217.73:3000" inbound|3000|| - 192.168.217.73:3000 192.168.64.246:0 - default
[2024-10-01T10:36:08.395Z] "GET /pipeline/system/cluster-name HTTP/1.1" 503 UF upstream_reset_before_response_started{remote_connection_failure,delayed_connect_error:_111} - "delayed_connect_error:_111" 0 152 0 - "192.168.64.246" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" "551d34d9-3a65-456a-92a2-18331db6e9c2" "inl.test:32692" "192.168.217.73:3000" inbound|3000|| - 192.168.217.73:3000 192.168.64.246:0 - default
[2024-10-01T10:36:08.402Z] "GET /pipeline/apis/v2beta1/pipelines?namespace=kubeflow-user-example-com&page_token=&page_size=10&sort_by=created_at%20desc&filter= HTTP/1.1" 503 UF upstream_reset_before_response_started{remote_connection_failure,delayed_connect_error:_111} - "delayed_connect_error:_111" 0 152 0 - "192.168.64.246" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" "1469974e-9638-4c8c-b485-a25948020125" "inl.test:32692" "192.168.217.73:3000" inbound|3000|| - 192.168.217.73:3000 192.168.64.246:0 - default
[2024-10-01T10:36:08.433Z] "GET /pipeline/system/cluster-name HTTP/1.1" 503 UF upstream_reset_before_response_started{remote_connection_failure,delayed_connect_error:_111} - "delayed_connect_error:_111" 0 152 0 - "192.168.64.246" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" "551d34d9-3a65-456a-92a2-18331db6e9c2" "inl.test:32692" "192.168.217.73:3000" inbound|3000|| - 192.168.217.73:3000 192.168.64.246:0 - default
[2024-10-01T10:36:08.441Z] "GET /pipeline/apis/v2beta1/pipelines?namespace=kubeflow-user-example-com&page_token=&page_size=10&sort_by=created_at%20desc&filter= HTTP/1.1" 503 UF upstream_reset_before_response_started{remote_connection_failure,delayed_connect_error:_111} - "delayed_connect_error:_111" 0 152 0 - "192.168.64.246" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" "1469974e-9638-4c8c-b485-a25948020125" "inl.test:32692" "192.168.217.73:3000" inbound|3000|| - 192.168.217.73:3000 192.168.64.246:0 - default
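
A note on reading the istio-proxy access logs above: in Envoy's access-log format, the field after the status code is the response flag. UC means the upstream connection was terminated, UF means the upstream connection failed, and error 111 is ECONNREFUSED on Linux. Every 503 entry targets the inbound listener on port 3000 (the ml-pipeline-ui container), which is consistent with the Node.js process exiting after the FetchError shown in the ml-pipeline-ui logs. The following is a minimal Python sketch, not part of the original report, that tallies status codes and response flags from such lines. The regular expression and the `summarize` helper are illustrative assumptions and only cover the leading fields of the format seen here.

```python
import re
from collections import Counter

# Matches only the leading fields of the access-log lines shown in this issue:
# [timestamp] "request line" status response_flags response_details ...
LINE_RE = re.compile(
    r'^\[(?P<ts>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d+) (?P<flags>\S+) (?P<details>\S+)'
)

# Meanings of the two Envoy response flags that appear in this thread.
FLAG_MEANINGS = {
    "UC": "upstream connection termination",
    "UF": "upstream connection failure",
}


def summarize(lines):
    """Count (status, response_flags) pairs; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        counts[(int(match["status"]), match["flags"])] += 1
    return counts


# Lines shortened from the ml-pipeline-ui (istio-proxy) logs above.
sample = [
    '[2024-10-01T10:36:08.312Z] "GET /pipeline/system/project-id HTTP/1.1" 503 UC '
    'upstream_reset_before_response_started{connection_termination} - "-" 0 95 69 -',
    '[2024-10-01T10:36:08.384Z] "GET /pipeline/system/project-id HTTP/1.1" 503 UF '
    'upstream_reset_before_response_started{remote_connection_failure,delayed_connect_error:_111} '
    '- "delayed_connect_error:_111" 0 152 0 -',
    '[2024-10-01T10:36:08.312Z] "GET /pipeline/apis/v1beta1/healthz HTTP/1.1" 200 - '
    'via_upstream - "-" 0 291 17 17',
]

summary = summarize(sample)
for (status, flags), count in sorted(summary.items()):
    print(status, flags, FLAG_MEANINGS.get(flags, ""), count)
```

To run it against a saved log file, pass the open file object to `summarize` instead of `sample`.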

What I've Tried

kubeflow/kubeflow#5561
#4469
kubeflow/kubeflow#5271


@rimolive
Member

rimolive commented Oct 3, 2024

@ChaeSeungJi We have just released https://github.com/kubeflow/manifests/releases/tag/v1.9.1-rc.2. Can you take a look and see if this release fixes your issue?

@utsumi-fj

I have the same error in v1.9.1-rc.2.

Environment:

  • k8s: 1.30.5
  • kustomize: v5.2.1
  • installation command: kustomize build example | kubectl apply -f -

After running a pipeline from a Jupyter notebook and clicking the "Experiment details" link, as shown here,

pipeline_error1

the following error occurs.

pipeline_error2

Clicking sidebar menu items such as Pipelines also produces an error:

pipeline_error3

@utsumi-fj

utsumi-fj commented Oct 22, 2024

This may be the same issue as #11247. In my environment, #11321 resolved it.
To try that PR's change, I edited the ml-pipeline-ui deployment as follows, and the error disappeared.

kubectl edit deployment ml-pipeline-ui -n kubeflow
# Add the environment variable DISABLE_GKE_METADATA and set it to "true".
#    spec:
#      containers:
#      - env:
#        - name: DISABLE_GKE_METADATA
#          value: "true"
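
The same edit can also be applied non-interactively with a strategic merge patch. This is a minimal sketch, assuming the container inside the deployment is also named `ml-pipeline-ui`. That name is not stated in this thread, so verify it with `kubectl get deployment ml-pipeline-ui -n kubeflow -o yaml`. The `build_env_patch` helper is hypothetical. The script only prints the `kubectl patch` command, so it can be reviewed before it is run.

```python
import json
import shlex


def build_env_patch(container, name, value):
    """Build a strategic merge patch that sets one env var on one container."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        # "name" is the merge key, so only this container is touched.
                        {"name": container, "env": [{"name": name, "value": value}]}
                    ]
                }
            }
        }
    }


# Container name "ml-pipeline-ui" is an assumption; see the note above.
patch = build_env_patch("ml-pipeline-ui", "DISABLE_GKE_METADATA", "true")

command = [
    "kubectl", "patch", "deployment", "ml-pipeline-ui",
    "-n", "kubeflow",
    "--type", "strategic",
    "-p", json.dumps(patch),
]

# Print a shell-safe command line instead of executing it.
printable = " ".join(shlex.quote(part) for part in command)
print(printable)
```

The value is passed as the string "true" rather than a boolean because Kubernetes environment variable values must be strings.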

@ChaeSeungJi
Author

@utsumi-fj Thank you! It works well.
