VK pod error "Ping Failed with exit code: -1" #325

Closed
mginfn opened this issue Nov 4, 2024 · 2 comments

Comments


mginfn commented Nov 4, 2024

On Kubernetes side, VK Pod (interlink container) logs:

time="2024-11-04T12:30:55Z" level=info msg="Pinging: https://x.y.z.v:30443/pinglink"
time="2024-11-04T12:30:55Z" level=error msg="Ping Failed with exit code: -1"

On InterLink API Server side, oauth2-proxy logs:

2024/11/04 12:30:55 http: TLS handshake error from 172.20.0.1:56384: remote error: tls: bad certificate

As a result, the VK node never becomes available.
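
The "remote error: tls: bad certificate" line on the oauth2-proxy side usually means the client (here the VK pod) rejected the certificate presented by the server. A minimal way to check that from any machine that can reach the API server is sketched below. The helper names `ping_url` and `tls_handshake` are hypothetical and not part of interLink; only the `/pinglink` path, the address and the port come from the logs above.

```python
import socket
import ssl
from urllib.parse import urlsplit


def ping_url(address: str, port: int, path: str = "/pinglink") -> str:
    """Build a URL shaped like the one in the 'Pinging:' log line."""
    return f"{address.rstrip('/')}:{port}{path}"


def tls_handshake(url: str, verify: bool, timeout: float = 5.0) -> str:
    """Attempt only the TLS handshake and report 'ok' or the failure reason."""
    parts = urlsplit(url)
    ctx = ssl.create_default_context()
    if not verify:
        # Diagnostic only: turns off hostname and certificate checks.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection(
            (parts.hostname, parts.port or 443), timeout=timeout
        ) as sock:
            with ctx.wrap_socket(sock, server_hostname=parts.hostname):
                return "ok"
    except ssl.SSLCertVerificationError as exc:
        return f"certificate rejected: {exc.verify_message}"
    except (ssl.SSLError, OSError) as exc:
        return f"connection failed: {exc}"
```

Calling `tls_handshake(ping_url("https://x.y.z.v", 30443), verify=True)` and then again with `verify=False` separates the two cases: if only the verified call reports a rejected certificate, the proxy certificate is not trusted by the client (for example because it is self-signed), which would be consistent with the error reported here.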

Environment

Helm chart versions:

  • interlink-0.3.25 (installed through OCI repository)
  • interlink-0.3.28 (installed through GitHub self-hosted repository)

Pod image versions:

  • ghcr.io/intertwin-eu/interlink/virtual-kubelet-inttw:latest
  • ghcr.io/intertwin-eu/interlink/virtual-kubelet-inttw-refresh:latest

API Server version:

  • 0.3.1-patch2

Logs, stacktrace, or other symptoms

values.yaml:

nodeName: ivk

interlink:
  address: https://x.y.z.v
  port: 30443

virtualNode:
  CPUs: 10
  MemGiB: 256
  Pods: 10
  HTTPProxies:
    HTTP: null
    HTTPs: null

OAUTH:
  TokenURL: https://github.com/login/oauth/access_token
  ClientID: abc
  ClientSecret: abc
  RefreshToken: abc
  GrantType: authorization_code
  Audience: 

interlink.log:

time="2024-11-04T12:30:45Z" level=info msg="Loading InterLink config from /root/.interlink/config/InterLinkConfig.yaml"
time="2024-11-04T12:30:45Z" level=info msg="{http://localhost 30080 http://172.20.0.3 30400 true true false ~/.interlink}"
time="2024-11-04T12:30:45Z" level=info msg="interLink version: 0.3.1-patch2"

Full oauth2-proxy.log:

[2024/11/04 12:30:45] [proxy.go:89] mapping path "/" => upstream "http://localhost:30080"
[2024/11/04 12:30:45] [oauthproxy.go:161] Skipping JWT tokens from configured OIDC issuer: ""
[2024/11/04 12:30:45] [oauthproxy.go:171] OAuthProxy configured for GitHub Client ID: abc
[2024/11/04 12:30:45] [oauthproxy.go:177] Cookie settings: name:_oauth2_proxy secure(https):true httponly:true expiry:168h0m0s domains: path:/ samesite: refresh:disabled
[2024/11/04 12:30:45] [oauthproxy.go:498] Skipping auth - Method: * | Path: '*'
2024/11/04 12:30:55 http: TLS handshake error from 172.20.0.1:56384: remote error: tls: bad certificate

Full pod logs:

time="2024-11-04T12:41:00Z" level=info msg=statusLoop
time="2024-11-04T12:41:02Z" level=debug msg="404 request not found" uri=/metrics/resource vars="map[]"
time="2024-11-04T12:41:05Z" level=info msg="No pods to monitor, waiting for the next loop to start"
time="2024-11-04T12:41:05Z" level=info msg="statusLoop=end"
time="2024-11-04T12:41:05Z" level=info msg=statusLoop
time="2024-11-04T12:41:10Z" level=info msg="No pods to monitor, waiting for the next loop to start"
time="2024-11-04T12:41:10Z" level=info msg="statusLoop=end"
time="2024-11-04T12:41:10Z" level=info msg=statusLoop
W1104 12:41:14.584429       1 reflector.go:539] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: failed to list *v1.Secret: field label not supported: spec.nodeName
E1104 12:41:14.584631       1 reflector.go:147] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229: Failed to watch *v1.Secret: failed to list *v1.Secret: field label not supported: spec.nodeName
time="2024-11-04T12:41:15Z" level=info msg="No pods to monitor, waiting for the next loop to start"
time="2024-11-04T12:41:15Z" level=info msg="statusLoop=end"
time="2024-11-04T12:41:15Z" level=info msg=statusLoop
time="2024-11-04T12:41:17Z" level=debug msg="404 request not found" uri=/metrics/resource vars="map[]"
time="2024-11-04T12:41:18Z" level=debug msg="404 request not found" uri=/metrics/cadvisor vars="map[]"
time="2024-11-04T12:41:20Z" level=info msg="No pods to monitor, waiting for the next loop to start"
time="2024-11-04T12:41:20Z" level=info msg="statusLoop=end"
time="2024-11-04T12:41:20Z" level=info msg=statusLoop
time="2024-11-04T12:41:25Z" level=info msg="No pods to monitor, waiting for the next loop to start"
time="2024-11-04T12:41:25Z" level=info msg="statusLoop=end"
time="2024-11-04T12:41:25Z" level=info msg=statusLoop
time="2024-11-04T12:41:26Z" level=info msg="InterlingURL: https://x.y.z.v"
time="2024-11-04T12:41:26Z" level=info msg="Pinging: https://x.y.z.v:30443/pinglink"
time="2024-11-04T12:41:26Z" level=error msg="Ping Failed with exit code: -1"
time="2024-11-04T12:41:26Z" level=info msg=endNodeLoop
time="2024-11-04T12:41:26Z" level=debug msg="Received node status update"
time="2024-11-04T12:41:26Z" level=debug msg="got node from api server"
time="2024-11-04T12:41:26Z" level=debug msg="Generated three way patch" error="<nil>" patch="{\"metadata\":{\"annotations\":{\"virtual-kubelet.io/last-applied-node-status\":\"{\\\"capacity\\\":{\\\"cpu\\\":\\\"10\\\",\\\"memory\\\":\\\"256Gi\\\",\\\"nvidia.com/gpu\\\":\\\"0\\\",\\\"pods\\\":\\\"10\\\"},\\\"allocatable\\\":{\\\"cpu\\\":\\\"10\\\",\\\"memory\\\":\\\"256Gi\\\",\\\"nvidia.com/gpu\\\":\\\"0\\\",\\\"pods\\\":\\\"10\\\"},\\\"conditions\\\":[{\\\"type\\\":\\\"Ready\\\",\\\"status\\\":\\\"False\\\",\\\"lastHeartbeatTime\\\":\\\"2024-11-04T12:41:26Z\\\",\\\"lastTransitionTime\\\":\\\"2024-11-04T12:41:26Z\\\",\\\"reason\\\":\\\"KubeletPending\\\",\\\"message\\\":\\\"kubelet is pending.\\\"},{\\\"type\\\":\\\"OutOfDisk\\\",\\\"status\\\":\\\"False\\\",\\\"lastHeartbeatTime\\\":\\\"2024-11-04T12:41:26Z\\\",\\\"lastTransitionTime\\\":\\\"2024-11-04T12:41:26Z\\\",\\\"reason\\\":\\\"KubeletHasSufficientDisk\\\",\\\"message\\\":\\\"kubelet has sufficient disk space available\\\"},{\\\"type\\\":\\\"MemoryPressure\\\",\\\"status\\\":\\\"False\\\",\\\"lastHeartbeatTime\\\":\\\"2024-11-04T12:41:26Z\\\",\\\"lastTransitionTime\\\":\\\"2024-11-04T12:41:26Z\\\",\\\"reason\\\":\\\"KubeletHasSufficientMemory\\\",\\\"message\\\":\\\"kubelet has sufficient memory available\\\"},{\\\"type\\\":\\\"DiskPressure\\\",\\\"status\\\":\\\"False\\\",\\\"lastHeartbeatTime\\\":\\\"2024-11-04T12:41:26Z\\\",\\\"lastTransitionTime\\\":\\\"2024-11-04T12:41:26Z\\\",\\\"reason\\\":\\\"KubeletHasNoDiskPressure\\\",\\\"message\\\":\\\"kubelet has no disk pressure\\\"},{\\\"type\\\":\\\"NetworkUnavailable\\\",\\\"status\\\":\\\"True\\\",\\\"lastHeartbeatTime\\\":\\\"2024-11-04T12:41:26Z\\\",\\\"lastTransitionTime\\\":\\\"2024-11-04T12:41:26Z\\\",\\\"reason\\\":\\\"RouteCreated\\\",\\\"message\\\":\\\"RouteController created a 
route\\\"}],\\\"addresses\\\":[{\\\"type\\\":\\\"InternalIP\\\",\\\"address\\\":\\\"10.244.1.22\\\"}],\\\"daemonEndpoints\\\":{\\\"kubeletEndpoint\\\":{\\\"Port\\\":10250}},\\\"nodeInfo\\\":{\\\"machineID\\\":\\\"\\\",\\\"systemUUID\\\":\\\"\\\",\\\"bootID\\\":\\\"\\\",\\\"kernelVersion\\\":\\\"\\\",\\\"osImage\\\":\\\"\\\",\\\"containerRuntimeVersion\\\":\\\"\\\",\\\"kubeletVersion\\\":\\\"0.3.1-patch2\\\",\\\"kubeProxyVersion\\\":\\\"\\\",\\\"operatingSystem\\\":\\\"linux\\\",\\\"architecture\\\":\\\"virtual-kubelet\\\"}}\"},\"creationTimestamp\":null},\"status\":{\"$setElementOrder/conditions\":[{\"type\":\"Ready\"},{\"type\":\"OutOfDisk\"},{\"type\":\"MemoryPressure\"},{\"type\":\"DiskPressure\"},{\"type\":\"NetworkUnavailable\"}],\"conditions\":[{\"lastHeartbeatTime\":\"2024-11-04T12:41:26Z\",\"lastTransitionTime\":\"2024-11-04T12:41:26Z\",\"type\":\"Ready\"},{\"lastHeartbeatTime\":\"2024-11-04T12:41:26Z\",\"lastTransitionTime\":\"2024-11-04T12:41:26Z\",\"type\":\"OutOfDisk\"},{\"lastHeartbeatTime\":\"2024-11-04T12:41:26Z\",\"lastTransitionTime\":\"2024-11-04T12:41:26Z\",\"type\":\"MemoryPressure\"},{\"lastHeartbeatTime\":\"2024-11-04T12:41:26Z\",\"lastTransitionTime\":\"2024-11-04T12:41:26Z\",\"type\":\"DiskPressure\"},{\"lastHeartbeatTime\":\"2024-11-04T12:41:26Z\",\"lastTransitionTime\":\"2024-11-04T12:41:26Z\",\"type\":\"NetworkUnavailable\"}]}}"
time="2024-11-04T12:41:26Z" level=debug msg="updated node status in api server" node.Status.Conditions="[{Ready False 2024-11-04 12:41:26 +0000 UTC 2024-11-04 12:41:26 +0000 UTC KubeletPending kubelet is pending.} {OutOfDisk False 2024-11-04 12:41:26 +0000 UTC 2024-11-04 12:41:26 +0000 UTC KubeletHasSufficientDisk kubelet has sufficient disk space available} {MemoryPressure False 2024-11-04 12:41:26 +0000 UTC 2024-11-04 12:41:26 +0000 UTC KubeletHasSufficientMemory kubelet has sufficient memory available} {DiskPressure False 2024-11-04 12:41:26 +0000 UTC 2024-11-04 12:41:26 +0000 UTC KubeletHasNoDiskPressure kubelet has no disk pressure} {NetworkUnavailable True 2024-11-04 12:41:26 +0000 UTC 2024-11-04 12:41:26 +0000 UTC RouteCreated RouteController created a route} {PIDPressure Unknown 2024-11-04 12:18:40 +0000 UTC 2024-11-04 12:21:25 +0000 UTC NodeStatusNeverUpdated Kubelet never posted node status.}]" node.resourceVersion=11735
time="2024-11-04T12:41:30Z" level=info msg="No pods to monitor, waiting for the next loop to start"
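
Since the relevant error is buried among the status-loop and debug lines, a small filter can help when reading dumps like the one above. This is only a sketch that assumes the `key=value` / `key="quoted value"` layout visible in these logs; `parse_line` and `error_entries` are hypothetical helper names, and lines in other formats (such as the `W1104`/`E1104` reflector lines) are simply ignored by the filter.

```python
import re

# key=value pairs where the value is either a double-quoted string
# (with backslash escapes) or a run of non-space characters.
_FIELD = re.compile(r'([\w.]+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line: str) -> dict:
    """Split one log line into a dict of its key=value fields."""
    fields = {}
    for key, raw in _FIELD.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            # Drop the surrounding quotes and unescape embedded quotes.
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def error_entries(lines):
    """Return the parsed fields of every line logged at level=error."""
    parsed = (parse_line(line) for line in lines)
    return [fields for fields in parsed if fields.get("level") == "error"]
```

Applied to the pod log above, `error_entries` would keep the "Ping Failed with exit code: -1" entry and drop the `statusLoop` and debug lines.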
@dciangot
Collaborator

I think this is solved, right @mginfn? By using insecure HTTP from the VK to oauth2-proxy; the Helm config is here:
https://github.com/interTwin-eu/interlink-helm-chart/blob/main/interlink/values.yaml#L24

@mginfn
Author

mginfn commented Nov 12, 2024

Yes @dciangot, by using insecure HTTP the issue is solved. Thanks

@mginfn mginfn closed this as completed Nov 12, 2024