Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RemoveCachedBodyFilter does not release direct memory correctly resulting in a memory leak #2969

Closed
caixunshi opened this issue Jun 1, 2023 · 16 comments

Comments

@caixunshi
Copy link
Contributor

Describe the bug
This problem still exists in the latest 3.1.1 version,In RemoveCachedBodyFilter, the cached DataBuffer is removed and the release method is called. However, when the DataBuffer object is referenced multiple times, there may be a memory leak.

Sample
When the ServerWebExchangeUtils.cacheRequestBody method is called in multiple Filters, a ServerHttpRequest object decorated multiple times will be generated. When reading data, the retain method will be called multiple times, and eventually the release method in RemoveCachedBodyFilter cannot release the direct memory. However, we cannot restrict users from calling the ServerWebExchangeUtils.cacheRequestBody method multiple times, so we should ensure that the DataBuffer can be released correctly in the RemoveCachedBodyFilter class.

reproduce
Add these two filters to the project:

@Component
public class TestGlobalFilter1 implements GlobalFilter, Ordered {
    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        return ServerWebExchangeUtils.cacheRequestBody(exchange, (serverHttpRequest) -> {
            return chain.filter(exchange.mutate().request(serverHttpRequest).build());
        });
    }

    @Override
    public int getOrder() {
        return HIGHEST_PRECEDENCE;
    }
}
@Component
public class TestGlobalFilter2 implements GlobalFilter, Ordered {
    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        return ServerWebExchangeUtils.cacheRequestBody(exchange, (serverHttpRequest) -> {
            return chain.filter(exchange.mutate().request(serverHttpRequest).build());
        });
    }

    @Override
    public int getOrder() {
        return HIGHEST_PRECEDENCE;
    }
}

Then use the pressure measurement tool to test for a period of time, and an error will appear:

Caused by: io.netty.handler.codec.EncoderException: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 1040187399, max: 1056309248)
        at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:104)
        at io.netty.channel.CombinedChannelDuplexHandler.write(CombinedChannelDuplexHandler.java:346)
        at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
        at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709)
        at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792)
        at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702)
        at io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:304)
        at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
        at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709)
        at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792)
        at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702)
        at io.netty.handler.timeout.WriteTimeoutHandler.write(WriteTimeoutHandler.java:112)
        at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
        at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
        at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790)
        at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758)
        at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:808)
        at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025)
        at io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:306)
        at reactor.netty.http.HttpOperations.lambda$then$3(HttpOperations.java:205)
        at reactor.netty.FutureMono$DeferredFutureMono.subscribe(FutureMono.java:100)
        at reactor.core.publisher.Mono.subscribe(Mono.java:4400)
        at reactor.core.publisher.FluxConcatIterable$ConcatIterableSubscriber.onComplete(FluxConcatIterable.java:147)
        at reactor.core.publisher.FluxConcatIterable.subscribe(FluxConcatIterable.java:60)
        at reactor.core.publisher.MonoFromFluxOperator.subscribe(MonoFromFluxOperator.java:81)
        at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:52)
        at reactor.netty.http.client.HttpClientConnect$HttpIOHandlerObserver.onStateChange(HttpClientConnect.java:424)
        at reactor.netty.ReactorNetty$CompositeConnectionObserver.onStateChange(ReactorNetty.java:677)
        at reactor.netty.resources.DefaultPooledConnectionProvider$DisposableAcquire.onStateChange(DefaultPooledConnectionProvider.java:184)
        at reactor.netty.resources.DefaultPooledConnectionProvider$PooledConnection.onStateChange(DefaultPooledConnectionProvider.java:440)
        at reactor.netty.channel.ChannelOperationsHandler.channelActive(ChannelOperationsHandler.java:62)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:209)
        at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelActive(CombinedChannelDuplexHandler.java:412)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelActive(ChannelInboundHandlerAdapter.java:69)
        at io.netty.channel.CombinedChannelDuplexHandler.channelActive(CombinedChannelDuplexHandler.java:211)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:209)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.channelActive(DefaultChannelPipeline.java:1398)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:230)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:216)
        at io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:895)
        at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.fulfillConnectPromise(AbstractEpollChannel.java:658)
        at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:691)
        at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:567)
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:470)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)

Adding this parameter can make errors appear faster:

-XX:MaxDirectMemorySize=24M
@spencergibb
Copy link
Member

spencergibb commented Mar 8, 2024

Can you give me some direction on the number of requests, how many parallel runners and rough times it takes? I'm unable to reproduce this with the latest supported Spring Boot and Cloud.

@spring-cloud-issues
Copy link

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

@caixunshi
Copy link
Contributor Author

Can you give me some direction on the number of requests, how many parallel runners and rough times it takes? I'm unable to reproduce this with the latest supported Spring Boot and Cloud.

Can you give me some direction on the number of requests, how many parallel runners and rough times it takes? I'm unable to reproduce this with the latest supported Spring Boot and Cloud.

The problem arises when the user needs to read the requestBody data multiple times. The ServerWebExchangeUtils.cacheRequestBody method is used multiple times, as described in the issue. To simulate the user's behavior, two custom filters need to be added. Additionally, the -XX:MaxDirectMemorySize=24M parameter is included in the startup arguments to limit the size of direct memory. I used the JMeter tool with a thread count of 8, and the error occurs after approximately 10 seconds.

@spencergibb
Copy link
Member

To simulate the user's behavior, two custom filters need to be added.

Can you provide that?

nkonev added a commit to nkonev/spring-cloud-gateway-reactive-lb-consul that referenced this issue Mar 20, 2024
@nkonev
Copy link
Contributor

nkonev commented Mar 20, 2024

@spencergibb
I managed to reproduce it on my Fedora Linux Machine with Apache Benchmark (ab)

Please use branch gh2969

Instructions are in readme under "Test" label

@spencergibb
Copy link
Member

Using Spring Boot 3.2.3 and Spring Cloud 2023.0.1-SNAPSHOT the error goes away.

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <java.version>17</java.version>
        <spring.boot.version>2.6.3</spring.boot.version>
        <!--<spring-cloud.version>2023.0.0</spring-cloud.version>-->
        <spring-cloud.version>2023.0.1-SNAPSHOT</spring-cloud.version>
    </properties>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-dependencies</artifactId>
                <version>${spring.boot.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
            <dependency>
                <groupId>org.springframework.cloud</groupId>
                <artifactId>spring-cloud-dependencies</artifactId>
                <version>${spring-cloud.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

@nkonev

This comment was marked as outdated.

@nkonev
Copy link
Contributor

nkonev commented Mar 20, 2024

@spencergibb

There is a mistake - in xml you provided Spring Boot version is 2.6.3 and this downgrade really fixes the issue.

But if I set 3.2.3 - the error persists.

FYI on the https://spring.io/projects/spring-cloud is written that Spring Cloud 2023.0.x is compatible with Spring Boot 3.2.x

@spencergibb
Copy link
Member

Thanks for noticing.

@spencergibb
Copy link
Member

Just the one time is all I see, correct?

@nkonev
Copy link
Contributor

nkonev commented Mar 21, 2024

Not sure if I understood you, if it was question to me.
If you ask me about how many errors I have -- the answer will be - very many. The errors appear only under load, produced by ab. If I make requests with low rate - I won't see errors.

@spencergibb
Copy link
Member

Yes, sorry about the wording. I only see the error one time, but I'm on an M1 mac, which might make a difference.

@caixunshi
Copy link
Contributor Author

To simulate the user's behavior, two custom filters need to be added.

Can you provide that?

regarding the demo for two custom filters, it has been provided in the issue description.

@spencergibb
Copy link
Member

Can you try with 4.1.2-SNAPSHOT (after the build is done in 30 minutes or so)? I tried with the project supplied here and did not get the error any longer.

@caixunshi
Copy link
Contributor Author

Can you try with 4.1.2-SNAPSHOT (after the build is done in 30 minutes or so)? I tried with the project supplied here and did not get the error any longer.

Can you try with 4.1.2-SNAPSHOT (after the build is done in 30 minutes or so)? I tried with the project supplied here and did not get the error any longer.

I have tested it and this submission indeed solves the problem. However, I don't think it's sufficient because if users manually set CACHED_REQUEST_BODY_ATTR instead of using ServerWebExchangeUtils#cacheRequestBody method, there is still no guarantee of proper release. We can ensure that DataBuf is correctly released by using RemoveCachedBodyFilter. Please refer to the following PR for more details:
#2971

@spencergibb
Copy link
Member

Thanks for trying and pointing out the PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants