unable to create new native thread #207

Open
yuanbw opened this issue Dec 6, 2023 · 0 comments

yuanbw commented Dec 6, 2023

I used KubeFATE to deploy two simulated online serving clusters, as follows:
guest
kubectl get pods -n fate-serving-10005
NAME                              READY   STATUS    RESTARTS   AGE
serving-admin-744f988bc-2mh2l     1/1     Running   0          16h
serving-proxy-59957b497d-vztml    1/1     Running   0          16h
serving-redis-7fbb959b6c-bxcqt    1/1     Running   0          16h
serving-server-65bccf659b-bqd6t   1/1     Running   0          16h
serving-zookeeper-0               1/1     Running   0          16h

host
kubectl get pods -n fate-serving-10006
NAME                              READY   STATUS    RESTARTS   AGE
serving-admin-69975d8d54-qhf7t    1/1     Running   0          16h
serving-proxy-59bbb6b4fb-49xrw    1/1     Running   0          16h
serving-redis-6894c69dfc-b4cbf    1/1     Running   0          16h
serving-server-56dc9dd5b8-qpwkr   1/1     Running   0          16h
serving-zookeeper-0               1/1     Running   0          16h

Functional tests passed, but during performance testing at 100-200 QPS, the serving-proxy on both guest and host reported "unable to create new native thread":
guest:
2023-12-06 01:39:29,930 [ERROR] c.w.a.f.s.c.b.GrpcConnectionPool(GrpcConnectionPool.java:103) - grpc channel 10.73.99.153:30106 status is TRANSIENT_FAILURE
2023-12-06 01:39:39,930 [ERROR] c.w.a.f.s.c.b.GrpcConnectionPool(GrpcConnectionPool.java:103) - grpc channel 10.73.99.153:30106 status is TRANSIENT_FAILURE
2023-12-06 01:39:49,930 [ERROR] c.w.a.f.s.c.b.GrpcConnectionPool(GrpcConnectionPool.java:103) - grpc channel 10.73.99.153:30106 status is TRANSIENT_FAILURE
2023-12-06 01:39:58,790 [INFO ] i.g.n.s.i.g.n.N.connections(NettyServerTransport.java:203) - Transport failed
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method) ~[?:1.8.0_192]
    at java.lang.Thread.start(Thread.java:717) ~[?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) ~[?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367) ~[?:1.8.0_192]
    at org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor.execute(ThreadPoolTaskExecutor.java:336) ~[spring-context-5.3.20.jar:5.3.20]
    at io.grpc.internal.SerializingExecutor.schedule(SerializingExecutor.java:102) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.internal.SerializingExecutor.execute(SerializingExecutor.java:95) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.internal.ServerImpl$ServerTransportListenerImpl.streamCreatedInternal(ServerImpl.java:643) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.internal.ServerImpl$ServerTransportListenerImpl.streamCreated(ServerImpl.java:465) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler.onHeadersRead(NettyServerHandler.java:476) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler.access$1000(NettyServerHandler.java:106) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler$FrameListener.onHeadersRead(NettyServerHandler.java:856) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:409) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onHeadersRead(Http2InboundFrameLogger.java:65) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader$1.processFragment(DefaultHttp2FrameReader.java:450) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readHeadersFrame(DefaultHttp2FrameReader.java:457) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:253) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:159) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:173) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:378) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:438) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]

host:
2023-12-05 15:06:57,352 [INFO ] c.w.a.f.s.p.r.r.BaseServingRouter(BaseServingRouter.java:69) - caseid 1701788817352 get route info 10.42.5.237:8000
2023-12-05 15:06:57,354 [INFO ] c.w.a.f.s.p.u.FederatedModelUtils(FederatedModelUtils.java:59) - get model route key by version: 216 namespace: host#10006#guest-10005#host-10006#model tablename: 202312050912098047800, key : 202312050912098047800&host#10006#guest-10005#host-10006#model
2023-12-05 15:06:57,354 [INFO ] c.w.a.f.s.p.r.r.ZkServingRouter(ZkServingRouter.java:64) - try to find zk ,serving:ab548b8776d2bbb24dc3cfb3a901e255:unaryCall, result [grpc://10.42.5.237:8000/serving/ab548b8776d2bbb24dc3cfb3a901e255/unaryCall?router_mode=ALL_ALLOWED&timestamp=1701767649580&version=216]
2023-12-05 15:06:57,354 [INFO ] c.w.a.f.s.p.r.r.BaseServingRouter(BaseServingRouter.java:69) - caseid 1701788817353 get route info 10.42.5.237:8000
2023-12-05 18:28:33,102 [INFO ] i.g.n.s.i.g.n.N.connections(NettyServerTransport.java:203) - Transport failed
io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception: Unexpected HTTP/1.x request: GET /
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception.connectionError(Http2Exception.java:108) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$PrefaceDecoder.readClientPrefaceString(Http2ConnectionHandler.java:302) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$PrefaceDecoder.decode(Http2ConnectionHandler.java:239) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:438) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]
2023-12-05 18:28:33,191 [INFO ] i.g.n.s.i.g.n.N.connections(NettyServerTransport.java:203) - Transport failed
io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception: Unexpected HTTP/1.x request: GET /
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception.connectionError(Http2Exception.java:108) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$PrefaceDecoder.readClientPrefaceString(Http2ConnectionHandler.java:302) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$PrefaceDecoder.decode(Http2ConnectionHandler.java:239) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:438) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]
2023-12-05 18:30:03,152 [INFO ] i.g.n.s.i.g.n.N.connections(NettyServerTransport.java:203) - Transport failed
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method) ~[?:1.8.0_192]
    at java.lang.Thread.start(Thread.java:717) ~[?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) ~[?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367) ~[?:1.8.0_192]
    at org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor.execute(ThreadPoolTaskExecutor.java:336) ~[spring-context-5.3.20.jar:5.3.20]
    at io.grpc.internal.SerializingExecutor.schedule(SerializingExecutor.java:102) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.internal.SerializingExecutor.execute(SerializingExecutor.java:95) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.internal.ServerImpl$ServerTransportListenerImpl.streamCreatedInternal(ServerImpl.java:643) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.internal.ServerImpl$ServerTransportListenerImpl.streamCreated(ServerImpl.java:465) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler.onHeadersRead(NettyServerHandler.java:476) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler.access$1000(NettyServerHandler.java:106) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler$FrameListener.onHeadersRead(NettyServerHandler.java:856) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:409) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onHeadersRead(Http2InboundFrameLogger.java:65) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader$1.processFragment(DefaultHttp2FrameReader.java:450) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readHeadersFrame(DefaultHttp2FrameReader.java:457) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:253) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:159) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:173) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:378) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:438) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]
2023-12-05 18:30:33,130 [INFO ] i.g.n.s.i.g.n.N.connections(NettyServerTransport.java:203) - Transport failed
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method) ~[?:1.8.0_192]
    at java.lang.Thread.start(Thread.java:717) ~[?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) ~[?:1.8.0_192]
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367) ~[?:1.8.0_192]
    at org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor.execute(ThreadPoolTaskExecutor.java:336) ~[spring-context-5.3.20.jar:5.3.20]
    at io.grpc.internal.SerializingExecutor.schedule(SerializingExecutor.java:102) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.internal.SerializingExecutor.execute(SerializingExecutor.java:95) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.internal.ServerImpl$ServerTransportListenerImpl.streamCreatedInternal(ServerImpl.java:643) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.internal.ServerImpl$ServerTransportListenerImpl.streamCreated(ServerImpl.java:465) ~[grpc-core-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler.onHeadersRead(NettyServerHandler.java:476) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler.access$1000(NettyServerHandler.java:106) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler$FrameListener.onHeadersRead(NettyServerHandler.java:856) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onHeadersRead(DefaultHttp2ConnectionDecoder.java:409) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onHeadersRead(Http2InboundFrameLogger.java:65) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader$1.processFragment(DefaultHttp2FrameReader.java:450) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readHeadersFrame(DefaultHttp2FrameReader.java:457) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:253) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:159) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:173) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:378) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:438) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) ~[grpc-netty-shaded-1.45.1.jar:1.45.1]
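
To see when thread exhaustion began on each side and how it relates to the other failures, the serving-proxy logs can be summarized per entry. This is only a minimal sketch: it assumes every entry begins with the `yyyy-MM-dd HH:mm:ss,SSS [LEVEL]` prefix seen in the excerpts above, and the function name `summarize_log` and the category labels are hypothetical.

```python
import re
from collections import Counter

# Timestamp and level prefix as it appears in the excerpts,
# e.g. "2023-12-06 01:39:58,790 [INFO ]".
ENTRY_START = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[(\w+)\s*\]"
)


def summarize_log(text):
    """Split raw log text into entries and count the failure kinds seen.

    Returns (counts, first_oom) where first_oom is the timestamp of the
    first "unable to create new native thread" entry, or None.
    """
    starts = list(ENTRY_START.finditer(text))
    counts = Counter()
    first_oom = None
    for i, match in enumerate(starts):
        # An entry runs until the next timestamp prefix (or end of text),
        # so multi-line stack traces stay attached to their entry.
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        entry = text[match.start():end]
        if "unable to create new native thread" in entry:
            counts["native_thread_oom"] += 1
            if first_oom is None:
                first_oom = match.group(1)
        elif "TRANSIENT_FAILURE" in entry:
            counts["transient_failure"] += 1
        elif "Unexpected HTTP/1.x request" in entry:
            counts["http1_request"] += 1
    return counts, first_oom


sample = (
    "2023-12-06 01:39:49,930 [ERROR] c.w.a.f.s.c.b.GrpcConnectionPool"
    "(GrpcConnectionPool.java:103) - grpc channel 10.73.99.153:30106 "
    "status is TRANSIENT_FAILURE "
    "2023-12-06 01:39:58,790 [INFO ] i.g.n.s.i.g.n.N.connections"
    "(NettyServerTransport.java:203) - Transport failed "
    "java.lang.OutOfMemoryError: unable to create new native thread"
)
counts, first_oom = summarize_log(sample)
print(dict(counts), first_oom)
```

Running it over the full guest and host log files (rather than the short sample) would show whether the `TRANSIENT_FAILURE` entries on the guest start before or after the host's first `OutOfMemoryError`.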

The proxy configuration and route tables of the two parties are:
`[fate@yp-tgppc-ppc01 ~]$ kubectl get cm serving-proxy-config -n fate-serving-10005 -o yaml
apiVersion: v1
data:
  application.properties: |
    #
    # Copyright 2019 The FATE Authors. All Rights Reserved.
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    # http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    #
    # coordinator same as Party ID
    coordinator=10005
    server.port=8059
    #inference.service.name=serving
    #random, consistent
    #routeType=random
    #route.table=/data/projects/fate-serving/serving-proxy/conf/route_table.json
    #auth.file=/data/projects/fate-serving/serving-proxy/conf/auth_config.json
    # zk router
    #useZkRouter=true
    zk.url=serving-zookeeper:2181
    useZkRouter=true
    # zk acl
    #acl.enable=false
    #acl.username=
    #acl.password=
    # intra-partyid port
    #proxy.grpc.intra.port=8879
    # inter-partyid port
    #proxy.grpc.inter.port=8869

    # grpc
    # only support PLAINTEXT, TLS(we use Mutual TLS here), if use TSL authentication
    #proxy.grpc.inter.negotiationType=PLAINTEXT
    # only needs to be set when negotiationType is TLS
    #proxy.grpc.inter.CA.file=/data/projects/fate-serving/serving-proxy/conf/ssl/ca.crt
    # negotiated client side certificates
    #proxy.grpc.inter.client.certChain.file=/data/projects/fate-serving/serving-proxy/conf/ssl/client.crt
    #proxy.grpc.inter.client.privateKey.file=/data/projects/fate-serving/serving-proxy/conf/ssl/client.pem
    # negotiated server side certificates
    #proxy.grpc.inter.server.certChain.file=/data/projects/fate-serving/serving-proxy/conf/ssl/server.crt
    #proxy.grpc.inter.server.privateKey.file=/data/projects/fate-serving/serving-proxy/conf/ssl/server.pem

    #proxy.grpc.inference.timeout=3000
    #proxy.grpc.inference.async.timeout=1000
    #proxy.grpc.unaryCall.timeout=3000
    proxy.grpc.threadpool.coresize=5000
    proxy.grpc.threadpool.maxsize=10000
    proxy.grpc.threadpool.queuesize=1000
    #proxy.async.timeout=5000
    proxy.async.coresize=1000
    proxy.async.maxsize=10000
    #proxy.grpc.batch.inference.timeout=10000

  route_table.json: |
    {
      "route_table": {
        "default": {
          "default": [
            {
              "ip": "serving-proxy",
              "port": 8869
            }
          ]
        },
        "10006": {
          "default": [
            {
              "ip": "10.73.99.153",
              "port": "30106"
            }
          ]
        },
        "10005": {
          "default": [
            {
              "ip": "serving-proxy",
              "port": 8059
            }
          ],
          "serving": [
            {
              "ip": "serving-server",
              "port": 8000
            }
          ]
        }
      },
      "permission": {
        "default_allow": true
      }
    }
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: fate-serving-10005
    meta.helm.sh/release-namespace: fate-serving-10005
  creationTimestamp: "2023-12-05T09:10:02Z"
  labels:
    app.kubernetes.io/managed-by: Helm
    cluster: fate-serving
    fateMoudle: serving-proxy
    name: fate-serving-9999
    owner: kubefate
    partyId: "10005"
  name: serving-proxy-config
  namespace: fate-serving-10005
  resourceVersion: "116781084"
  selfLink: /api/v1/namespaces/fate-serving-10005/configmaps/serving-proxy-config
  uid: f9c36c83-c661-4148-8449-9a23a86ccf47
[fate@yp-tgppc-ppc01 ~]$ kubectl get cm serving-proxy-config -n fate-serving-10006 -o yaml
apiVersion: v1
data:
  application.properties: |
    #
    # Copyright 2019 The FATE Authors. All Rights Reserved.
    #
    # Licensed under the Apache License, Version 2.0 (the "License");
    # you may not use this file except in compliance with the License.
    # You may obtain a copy of the License at
    #
    # http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    #
    # coordinator same as Party ID
    coordinator=10006
    server.port=8059
    #inference.service.name=serving
    #random, consistent
    #routeType=random
    #route.table=/data/projects/fate-serving/serving-proxy/conf/route_table.json
    #auth.file=/data/projects/fate-serving/serving-proxy/conf/auth_config.json
    # zk router
    #useZkRouter=true
    zk.url=serving-zookeeper:2181
    useZkRouter=true
    # zk acl
    #acl.enable=false
    #acl.username=
    #acl.password=
    # intra-partyid port
    #proxy.grpc.intra.port=8879
    # inter-partyid port
    #proxy.grpc.inter.port=8869

    # grpc
    # only support PLAINTEXT, TLS(we use Mutual TLS here), if use TSL authentication
    #proxy.grpc.inter.negotiationType=PLAINTEXT
    # only needs to be set when negotiationType is TLS
    #proxy.grpc.inter.CA.file=/data/projects/fate-serving/serving-proxy/conf/ssl/ca.crt
    # negotiated client side certificates
    #proxy.grpc.inter.client.certChain.file=/data/projects/fate-serving/serving-proxy/conf/ssl/client.crt
    #proxy.grpc.inter.client.privateKey.file=/data/projects/fate-serving/serving-proxy/conf/ssl/client.pem
    # negotiated server side certificates
    #proxy.grpc.inter.server.certChain.file=/data/projects/fate-serving/serving-proxy/conf/ssl/server.crt
    #proxy.grpc.inter.server.privateKey.file=/data/projects/fate-serving/serving-proxy/conf/ssl/server.pem

    #proxy.grpc.inference.timeout=3000
    #proxy.grpc.inference.async.timeout=1000
    #proxy.grpc.unaryCall.timeout=3000
    proxy.grpc.threadpool.coresize=5000
    proxy.grpc.threadpool.maxsize=10000
    proxy.grpc.threadpool.queuesize=1000
    #proxy.async.timeout=5000
    proxy.async.coresize=1000
    proxy.async.maxsize=10000
    #proxy.grpc.batch.inference.timeout=10000

  route_table.json: |
    {
      "route_table": {
        "default": {
          "default": [
            {
              "ip": "serving-proxy",
              "port": 8869
            }
          ]
        },
        "10005": {
          "default": [
            {
              "ip": "10.73.99.153",
              "port": "30096"
            }
          ]
        },
        "10006": {
          "default": [
            {
              "ip": "serving-proxy",
              "port": 8059
            }
          ],
          "serving": [
            {
              "ip": "serving-server",
              "port": 8000
            }
          ]
        }
      },
      "permission": {
        "default_allow": true
      }
    }
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: fate-serving-10006
    meta.helm.sh/release-namespace: fate-serving-10006
  creationTimestamp: "2023-12-05T09:02:25Z"
  labels:
    app.kubernetes.io/managed-by: Helm
    cluster: fate-serving
    fateMoudle: serving-proxy
    name: fate-serving-9999
    owner: kubefate
    partyId: "10006"
  name: serving-proxy-config
  namespace: fate-serving-10006
  resourceVersion: "116776658"
  selfLink: /api/v1/namespaces/fate-serving-10006/configmaps/serving-proxy-config
  uid: 6f9c1943-3fb0-4c10-9ec2-f629b3c82186`
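
For reference, the thread-pool settings in the two ConfigMaps above can be turned into a rough thread budget. The sketch below is only an estimate under assumptions that are not confirmed by the FATE Serving source: that `proxy.grpc.threadpool.*` and `proxy.async.*` map directly onto the core and maximum sizes of two separate `ThreadPoolExecutor` instances, and that each thread reserves the 1 MiB default stack of a 64-bit Linux HotSpot JVM (no `-Xss` override). The helper names `parse_properties` and `thread_budget` are hypothetical.

```python
def parse_properties(text):
    """Parse the uncommented key=value lines of a Java .properties snippet."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def thread_budget(props, stack_kib=1024):
    """Estimate thread counts and reserved stack space for both pools.

    stack_kib is the ASSUMED per-thread stack size in KiB (1024 KiB is
    the usual default on 64-bit Linux HotSpot when -Xss is not set).
    """
    core = (int(props["proxy.grpc.threadpool.coresize"])
            + int(props["proxy.async.coresize"]))
    peak = (int(props["proxy.grpc.threadpool.maxsize"])
            + int(props["proxy.async.maxsize"]))
    kib_per_gib = 1024 * 1024
    return {
        "core_threads": core,
        "max_threads": peak,
        # Reserved virtual address space, not resident memory.
        "core_stack_gib": core * stack_kib / kib_per_gib,
        "max_stack_gib": peak * stack_kib / kib_per_gib,
    }


# The uncommented thread-pool lines from the ConfigMaps above.
proxy_properties = """
proxy.grpc.threadpool.coresize=5000
proxy.grpc.threadpool.maxsize=10000
proxy.grpc.threadpool.queuesize=1000
#proxy.async.timeout=5000
proxy.async.coresize=1000
proxy.async.maxsize=10000
"""

budget = thread_budget(parse_properties(proxy_properties))
print(budget)
```

With these values the estimate is 6000 core threads and up to 20000 threads in total. A Java `ThreadPoolExecutor` starts a new thread for each submitted task until the core size is reached, even when existing workers are idle, so sustained load may push the proxy toward the core count fairly quickly. That could exceed a per-container or per-user thread limit well before heap memory runs out.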

Notes:
1. The routes above are the defaults generated by KubeFATE when installing FATE Serving 2.1.6.
2. The K8s node has 48 cores / 64 GB / 597 GB.
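
Since the node itself has plenty of memory, "unable to create new native thread" may be caused by a thread or PID limit rather than by RAM. A minimal sketch for collecting the limits that commonly apply on Linux, meant to be run inside the serving-proxy container. The file paths are the usual procfs and cgroup locations and may not all exist in a given image, in which case the value is reported as `None`; the function names are hypothetical.

```python
import resource
from pathlib import Path


def read_limit(path):
    """Return the integer stored in path, or None if the file is missing,
    unreadable, or holds a non-numeric value such as "max" (unlimited)."""
    try:
        raw = Path(path).read_text().strip()
    except OSError:
        return None
    return int(raw) if raw.isdigit() else None


def thread_limits():
    """Collect process and thread limits visible from the current process."""
    nproc_soft = None
    if hasattr(resource, "RLIMIT_NPROC"):
        soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
        if soft != resource.RLIM_INFINITY:
            nproc_soft = soft
    return {
        # Per-user process/thread limit (what `ulimit -u` reports).
        "rlimit_nproc_soft": nproc_soft,
        # System-wide limits.
        "kernel_threads_max": read_limit("/proc/sys/kernel/threads-max"),
        "kernel_pid_max": read_limit("/proc/sys/kernel/pid_max"),
        # Container-level PID limits (cgroup v2 and v1 locations).
        "cgroup_v2_pids_max": read_limit("/sys/fs/cgroup/pids.max"),
        "cgroup_v1_pids_max": read_limit("/sys/fs/cgroup/pids/pids.max"),
    }


limits = thread_limits()
for name, value in limits.items():
    print(f"{name}: {value if value is not None else 'unlimited or unavailable'}")
```

If any of the reported limits is below the number of threads the proxy is configured to create, that limit is a likely candidate for the error.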
