"CI Build and Test" is failing for the latest envoy-dev image #625

Closed
renuka-fernando opened this issue Jan 21, 2023 · 9 comments

@renuka-fernando
Contributor

Description

"CI Build and Test" is failing on the main branch with the latest envoy-dev image.

The build passes when the image is pinned to an older one (7 days old): envoyproxy/envoy-dev:2ce96952de8431564c9c548a6f0773e21c1884de.

Logs from failing CI

env XDS=xds build/integration.sh
Envoy log: envoy.xds.log
2023/01/21 04:27:01 upstream listening HTTP/1.1 on 18080
2023/01/21 04:27:01 access log server listening on 18090
2023/01/21 04:27:01 management server listening on 18000
2023/01/21 04:27:01 waiting for the first request...
2023/01/21 04:27:01 gateway listening HTTP/1.1 on 18001
2023/01/21 04:27:01 stream 1 open for type.googleapis.com/envoy.service.runtime.v3.Runtime
2023/01/21 04:27:01 initial snapshot {Xds:xds Version: UpstreamPort:18080 BasePort:9000 NumClusters:4 NumHTTPListeners:2 NumScopedHTTPListeners:2 NumVHDSHTTPListeners:0 NumTCPListeners:2 NumRuntimes:1 TLS:false NumExtension:1 currentPort:0}
2023/01/21 04:27:01 executing sequence updates=3 request=5
2023/01/21 04:27:01 update snapshot v0
2023/01/21 04:27:01 request batch 0, ok 0, failed 6, pass false
2023/01/21 04:27:01 stream 2 open for type.googleapis.com/envoy.config.cluster.v3.Cluster
2023/01/21 04:27:01 stream 3 open for type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment
2023/01/21 04:27:01 stream 4 open for type.googleapis.com/envoy.config.listener.v3.Listener
2023/01/21 04:27:01 stream 5 open for type.googleapis.com/envoy.config.route.v3.RouteConfiguration
2023/01/21 04:27:01 stream 6 open for type.googleapis.com/envoy.config.route.v3.ScopedRouteConfiguration
2023/01/21 04:27:01 stream 7 open for type.googleapis.com/envoy.config.route.v3.RouteConfiguration
2023/01/21 04:27:01 stream 8 open for type.googleapis.com/envoy.config.route.v3.RouteConfiguration
2023/01/21 04:27:01 stream 9 open for type.googleapis.com/envoy.config.route.v3.RouteConfiguration
2023/01/21 04:27:02 request batch 1, ok 6, failed 0, pass true
2023/01/21 04:27:02 request batch 2, ok 6, failed 0, pass true
2023/01/21 04:27:03 request batch 3, ok 6, failed 0, pass true
2023/01/21 04:27:03 request batch 4, ok 6, failed 0, pass true
2023/01/21 04:27:04 [echo2023-01-21T04:27:02Z] 127.0.0.1:9003 / http 200 926250ad-92d2-4029-b76f-a2711cb3f372 cluster-v0-1
2023/01/21 04:27:04 [echo2023-01-21T04:27:02Z] 127.0.0.1:9001 / http 200 3adc54d7-9759-4f88-99fb-f9669e11b4e2 cluster-v0-1
2023/01/21 04:27:04 [echo2023-01-21T04:27:02Z] 127.0.0.1:9003 / http 200 df5141b7-1779-4e80-8c12-b4de1fdf4fc7 cluster-v0-1
2023/01/21 04:27:04 [echo2023-01-21T04:27:02Z] 127.0.0.1:9000 / http 200 06f21915-5f94-43f0-9390-2b53ae9b41bd cluster-v0-0
2023/01/21 04:27:04 [echo2023-01-21T04:27:02Z] 127.0.0.1:9002 / http 200 b295c65f-9ded-47aa-94b5-3465bf14772e cluster-v0-0
2023/01/21 04:27:04 [echo2023-01-21T04:27:02Z] 127.0.0.1:9000 / http 200 1ec78e6f-773e-48e9-bfb0-30e4e7ac51a0 cluster-v0-0
2023/01/21 04:27:04 [echo2023-01-21T04:27:02Z] 127.0.0.1:9002 / http 200 59f52eb2-5e6f-4987-bc13-0216da4ce5c8 cluster-v0-0
2023/01/21 04:27:04 [echo2023-01-21T04:27:02Z] 127.0.0.1:9001 / http 200 f82a9cba-124d-4bf8-834a-5ece05b20330 cluster-v0-1
2023/01/21 04:27:04 [echo2023-01-21T04:27:03Z] 127.0.0.1:9002 / http 200 31252d8a-9c1f-4993-981c-4da89407aee1 cluster-v0-0
2023/01/21 04:27:04 [echo2023-01-21T04:27:03Z] 127.0.0.1:9000 / http 200 0b049301-ca6d-4953-9442-1a407c344a97 cluster-v0-0
2023/01/21 04:27:04 [echo2023-01-21T04:27:03Z] 127.0.0.1:9003 / http 200 c083f19a-8bca-4d9a-ae20-a12b53b409f0 cluster-v0-1
2023/01/21 04:27:04 [echo2023-01-21T04:27:03Z] 127.0.0.1:9002 / http 200 8e256ccf-a977-4824-8485-b063ad8badda cluster-v0-0
2023/01/21 04:27:04 [echo2023-01-21T04:27:03Z] 127.0.0.1:9003 / http 200 e7f2735e-d2a7-4b1d-b586-bcc41d6687bf cluster-v0-1
2023/01/21 04:27:04 [echo2023-01-21T04:27:03Z] 127.0.0.1:9001 / http 200 7a28f321-5ddf-440c-b6e5-3066227530d2 cluster-v0-1
2023/01/21 04:27:04 [echo2023-01-21T04:27:03Z] 127.0.0.1:9000 / http 200 10cea829-7764-45e7-b912-e47b727a90f7 cluster-v0-0
2023/01/21 04:27:04 [echo2023-01-21T04:27:03Z] 127.0.0.1:9001 / http 200 42297c23-83f2-42de-874e-cb554e918297 cluster-v0-1
2023/01/21 04:27:04 server callbacks fetches=0 requests=22
2023/01/21 04:27:04 update snapshot v1
2023/01/21 04:27:04 stream 8 of node test-id closed
2023/01/21 04:27:04 stream 10 open for type.googleapis.com/envoy.config.route.v3.RouteConfiguration
2023/01/21 04:27:04 stream 11 open for type.googleapis.com/envoy.config.route.v3.RouteConfiguration
2023/01/21 04:27:04 stream 9 of node test-id closed
2023/01/21 04:27:04 request batch 0, ok 1, failed 5, pass false
2023/01/21 04:27:04 stream 12 open for type.googleapis.com/envoy.config.route.v3.RouteConfiguration
2023/01/21 04:27:04 stream 13 open for type.googleapis.com/envoy.config.route.v3.RouteConfiguration
2023/01/21 04:27:04 stream 3 of node test-id closed
2023/01/21 04:27:04 stream 6 of node test-id closed
2023/01/21 04:27:04 stream 13 of node test-id closed
2023/01/21 04:27:04 stream 11 of node test-id closed
2023/01/21 04:27:04 stream 4 of node test-id closed
2023/01/21 04:27:04 stream 10 of node test-id closed
2023/01/21 04:27:04 stream 12 of node test-id closed
2023/01/21 04:27:04 stream 2 of node test-id closed
2023/01/21 04:27:04 stream 5 of node test-id closed
2023/01/21 04:27:04 stream 7 of node test-id closed
2023/01/21 04:27:04 stream 1 of node test-id closed
2023/01/21 04:27:04 request batch 1, ok 0, failed 6, pass false
2023/01/21 04:27:05 request batch 2, ok 0, failed 6, pass false
2023/01/21 04:27:05 request batch 3, ok 0, failed 6, pass false
2023/01/21 04:27:06 request batch 4, ok 0, failed 6, pass false
2023/01/21 04:27:06 server callbacks fetches=0 requests=43
2023/01/21 04:27:06 failed all requests in a run 1
build/integration.sh: line 46:  7363 Segmentation fault      (core dumped) ( ${ENVOY} -c sample/bootstrap-${XDS}.yaml --drain-time-s 1 -l debug 2> ${ENVOY_LOG} )
build/integration.sh: line 40: kill: (7363) - No such process
make: *** [Makefile:68: integration.xds] Error 1
make: *** [Makefile:95: docker_tests] Error 2
Error: Process completed with exit code 2.
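To read a log like the one above at a glance, the `request batch` lines can be tallied per `update snapshot` line. The following is a minimal sketch, assuming the log format shown above. The names `summarize_batches` and `run_passed` are hypothetical and not part of go-control-plane, and the pass rule (a run fails only when every batch fails) is inferred from the "failed all requests in a run" message rather than taken from the test harness source.

```python
import re
from collections import defaultdict

# Patterns match the log lines shown in this issue; other formats are not handled.
SNAPSHOT_RE = re.compile(r"update snapshot (v\d+)")
BATCH_RE = re.compile(r"request batch (\d+), ok (\d+), failed (\d+), pass (true|false)")


def summarize_batches(log_text):
    """Group 'request batch' results under the most recent 'update snapshot' line."""
    summary = defaultdict(list)
    current = None
    for line in log_text.splitlines():
        snapshot = SNAPSHOT_RE.search(line)
        if snapshot:
            current = snapshot.group(1)
            continue
        batch = BATCH_RE.search(line)
        if batch and current is not None:
            index, ok, failed, passed = batch.groups()
            summary[current].append((int(index), int(ok), int(failed), passed == "true"))
    return dict(summary)


def run_passed(batches):
    """Assumption: a run fails only if every batch in it failed."""
    return any(passed for _, _, _, passed in batches)
```

Applied to the log above, this would report one failing batch followed by four passing ones for v0, and five failing batches for v1, which is the run that ends with the Envoy segmentation fault.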
@sunjayBhatia
Member

Hm, looks like the recent change to the CI image Dockerfile didn't change the fact that we were using envoy-dev:latest. This is unfortunate regardless; we'll likely have to do some digging through the envoy-dev images to see which change is causing this (and maybe use a known-good image in the meantime).

@sunjayBhatia
Member

first bad image: sha256:50e612ddc515be353a90b3d29a0aff3b259b84c7dad62769261eec6e78917894

go-control-plane commit: 6696b59

CI build link

mirror commit of Envoy @ 40fb636fb3ba7d502625614ed613d4e97e140b3e: envoyproxy/envoy@40fb636

@sunjayBhatia
Member

previous good commit: 335df8c

Mirror commit of Envoy @ fb48a7d2d41e6237640d73d5ec39d103feb8e73e: envoyproxy/envoy@fb48a7d

@sunjayBhatia
Member

comparison of the commits: envoyproxy/envoy@fb48a7d...40fb636
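With a known-good and a known-bad endpoint, the commits in between can be narrowed down by binary search. The sketch below is illustrative only and is not necessarily how the culprit was located in this thread. `first_bad` and the `is_bad` callback are hypothetical names. It assumes the candidates are ordered oldest to newest, the first is good, the last is bad, and every candidate after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad(candidates: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest candidate for which is_bad() is True.

    Assumes candidates[0] is known good and candidates[-1] is known bad,
    so neither endpoint is re-tested.
    """
    lo, hi = 0, len(candidates) - 1  # invariant: candidates[lo] good, candidates[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid
        else:
            lo = mid
    return candidates[hi]


# Placeholder identifiers; a real run would list the commits (or image tags)
# between the good and bad endpoints and have is_bad run the integration test.
commits = ["c0", "c1", "c2", "c3", "c4"]
known_bad = {"c3", "c4"}
culprit = first_bad(commits, lambda c: c in known_bad)
```

Each call to `is_bad` would correspond to one integration test run, so the number of runs grows with the logarithm of the range rather than its length.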

@sunjayBhatia
Member

Looks like 97a7f000c75251d99f521be42ccfdacbb063eab1 in Envoy is the culprit: envoyproxy/envoy@97a7f00

@nezdolik
Member

Hi @sunjayBhatia, are you getting the segfault in go-control-plane or in Envoy?

@nezdolik
Member

Reproduced the issue while running exactly the same test with go-control-plane; I can confirm it's Envoy that is segfaulting:

[2023-01-25 14:14:55.362][13080][debug][config] [source/common/config/grpc_mux_impl.cc:210] Resuming discovery requests for type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment
[2023-01-25 14:14:55.362][13080][critical][backtrace] [./source/server/backtrace.h:104] Caught Segmentation fault, suspect faulting address 0x556467264040
[2023-01-25 14:14:55.363][13080][critical][backtrace] [./source/server/backtrace.h:91] Backtrace (use tools/stack_decode.py to get line numbers):
[2023-01-25 14:14:55.363][13080][critical][backtrace] [./source/server/backtrace.h:92] Envoy version: 03192bfea4e75c1ae887b57061a75bde9f45bb9c/1.26.0-dev/Clean/RELEASE/BoringSSL

We will have to revert. Meanwhile, I will be investigating and writing a corresponding integration test for Envoy. We have not yet run into this issue in our setup with the Java control plane.
cc @ggreenway @mattklein123 @adisuissa
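To check which process crashed, the Envoy log can be scanned for the backtrace lines quoted above. This is a minimal sketch, assuming the exact message wording shown in those lines. `crash_info` is a hypothetical helper and not part of Envoy or go-control-plane.

```python
import re

# Patterns follow the wording of the backtrace lines quoted in this comment.
SEGV_RE = re.compile(r"Caught Segmentation fault, suspect faulting address (0x[0-9a-fA-F]+)")
VERSION_RE = re.compile(r"Envoy version: (\S+)")


def crash_info(log_text):
    """Return the faulting address and Envoy version if the log records a segfault, else None."""
    segv = SEGV_RE.search(log_text)
    if segv is None:
        return None
    version = VERSION_RE.search(log_text)
    return {
        "address": segv.group(1),
        "version": version.group(1) if version else None,
    }
```

A `None` result from the Envoy log would point the search back toward the control plane side of the test.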

@sunjayBhatia
Member

Hm, my usual workflow for decoding the stack trace isn't working so hot, but since you're working on it @nezdolik, I will defer to you!

ggreenway added a commit to ggreenway/envoy that referenced this issue Jan 25, 2023
This reverts commit 97a7f00.

There was a crash reported by go-control-plane attributed to this change:
envoyproxy/go-control-plane#625

Signed-off-by: Greg Greenway <[email protected]>
ggreenway added a commit to envoyproxy/envoy that referenced this issue Jan 26, 2023
This reverts commit 97a7f00.

There was a crash reported by go-control-plane attributed to this change:
envoyproxy/go-control-plane#625

Signed-off-by: Greg Greenway <[email protected]>
@renuka-fernando
Contributor Author

Hi all,

The build is passing now. Thank you all for the prompt response. I am closing this issue.

VishalDamgude pushed a commit to freshworks/envoy that referenced this issue Feb 2, 2023
…y#25157)

This reverts commit 97a7f00.

There was a crash reported by go-control-plane attributed to this change:
envoyproxy/go-control-plane#625

Signed-off-by: Greg Greenway <[email protected]>