Waved panicked #2185

Closed
mwysokin opened this issue Nov 6, 2023 · 15 comments

@mwysokin

mwysokin commented Nov 6, 2023

Wave SDK Version, OS

0.26.2, Kubernetes (Managed Cloud)

Actual behavior

A Wave app crashed, but for some reason the container stayed up. This caused an outage for at least one customer.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x77b60a]

goroutine 9288 [running]:
github.com/h2oai/wave.(*App).send(0xc0001aa000, {0x0, 0x0}, 0xc0001781e0, {0xb07f40, 0x2a, 0x2a})
	/home/runner/work/wave/wave/app.go:112 +0x54a
github.com/h2oai/wave.(*App).forward(0xc0001aa000, {0x0?, 0x0?}, 0x0?, {0xb07f40?, 0xc0004e0050?, 0x8ef900?})
	/home/runner/work/wave/wave/app.go:89 +0x2f
github.com/h2oai/wave.(*Broker).resetClients.func1(0xc0004e8390?)
	/home/runner/work/wave/wave/broker.go:214 +0x36
created by github.com/h2oai/wave.(*Broker).resetClients
	/home/runner/work/wave/wave/broker.go:213 +0x98

A very similar panic happened at least once before: #1949

@mwysokin mwysokin added the bug Bug in code label Nov 6, 2023
@mturoci mturoci added the server Related to server label Nov 7, 2023
@dulajra
Contributor

dulajra commented Nov 7, 2023

@mturoci Can we know if the port (10101) is still open even though waved has crashed? In the MLOps Wave app, we ping the TCP port as the container's health check, so if the port is still open, the container will still be detected as healthy.

cc: @ShehanIshanka

@mturoci
Collaborator

mturoci commented Nov 8, 2023

> Can we know if the port (10101) is still open even though waved has crashed?

You can check, but I would be surprised if that was the case.

Why would your app crash when connecting to waved while the health check still passes?

@dulajra
Contributor

dulajra commented Nov 8, 2023

> You can check, but I would be surprised if that was the case.

Are there any steps to reproduce it locally or on a dev environment?

@mturoci mturoci self-assigned this Nov 9, 2023
@mturoci
Collaborator

mturoci commented Dec 7, 2023

Closing because I was not able to repro. This seems like a Keycloak misconfiguration.

The panic happens because the token is nil, which should never happen according to the docs. That makes me believe the root cause is an auth provider misconfiguration of some sort.
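
As an illustration of the failure mode described above, the sketch below shows a nil guard that returns an error instead of dereferencing a missing token. The `Token` and `Session` types and the `accessToken` function are hypothetical stand-ins, not Wave's actual types or the change that eventually closed this issue.

```go
package main

import (
	"errors"
	"fmt"
)

// Token and Session are hypothetical stand-ins, not Wave's actual types.
type Token struct {
	AccessToken string
}

type Session struct {
	Token *Token // may be nil if the auth provider returned no token
}

var errNoToken = errors.New("session has no token")

// accessToken returns the session's access token. When the session or its
// token is nil, it returns errNoToken instead of dereferencing a nil pointer.
func accessToken(s *Session) (string, error) {
	if s == nil || s.Token == nil {
		return "", errNoToken
	}
	return s.Token.AccessToken, nil
}

func main() {
	// A session without a token yields an error rather than a panic.
	_, err := accessToken(&Session{})
	fmt.Println(err)
}
```

With a guard like this, a misconfigured auth provider would surface as a logged error for one session rather than a process-wide crash.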

Feel free to reopen in case you manage to repro.

@gabrielstar

It also happened on our dev instances: https://h2oai.slack.com/archives/C068QB11XV4/p1702298998164059

@dulajra
Contributor

dulajra commented Jan 9, 2024

Now it's happening on cloud-qa too: https://h2oai.slack.com/archives/G01C9KKQLAC/p1704455231835909

@codyharris-h2o-ai

Seeing this in 23.10.0 testing as well

@mturoci
Collaborator

mturoci commented Jan 15, 2024

@codyharris-h2o-ai what app?
@dulajra the link is dead

@mwysokin
Author

mwysokin commented Jan 15, 2024

Just FYI: the debug version of Wave has been deployed in both MC and cloud-qa.

@mwysokin
Author

@codyharris-h2o-ai If you see it during release testing, maybe you could use this image instead: "gcr.io/vorvan/h2oai/mlops-wave-app-standalone:0.62.1-resourcefix-debugpanic". It is for debugging purposes only and shouldn't be shipped as part of the release. @mturoci kindly implemented some additional logic to help with debugging.

@codyharris-h2o-ai

@mturoci the mlops wave ui

I'm not sure how often we're running into this

@codyharris-h2o-ai

codyharris-h2o-ai commented Jan 30, 2024

Another customer is seeing this in their production environment with their first-party app.

image

@dulajra
Contributor

dulajra commented Feb 22, 2024

Another occurrence of this on internal.dedicated: https://h2oai.slack.com/archives/C8MA5HGUU/p1708600075172279

wave-app.log

@codyharris-h2o-ai

@dulajra, which version of Wave? We have seen positive results using 1.0.2

@mturoci
Collaborator

mturoci commented Apr 16, 2024

Closed in #2246. Feel free to reopen if it appears in recent versions.
