### Alluxio Version
305

### Describe the bug
We observe a direct memory buffer leak in the worker JVM, for example:
```
2023-10-23 06:58:15,873 ERROR [data-server-tcp-socket-worker-8](RPCMessageDecoder.java:48) - Error in decoding message.
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 4194304 byte(s) of direct memory (used: 10733224215, max: 10737418240)
at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:845)
at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:774)
at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:701)
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:676)
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:215)
at io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:197)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:139)
at io.netty.buffer.PoolArena.reallocate(PoolArena.java:302)
at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:122)
at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:305)
at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:280)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1103)
at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:105)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:288)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.lang.Thread.run(Thread.java:750)
```
The size of the JVM direct memory cap is irrelevant; we observe similar OOMs with different cap sizes. With a 1G direct memory cap, the full 1G is used up after reading two 16MB files several times, even though the total amount of data read is well under 1G. Note that the OOM surfaces in the RPC decoder, presumably because the pooled direct arena has already been exhausted by buffers leaked elsewhere. This is very reproducible.
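To confirm that the growth is in Netty's own accounting (the counter that trips the `OutOfDirectMemoryError` above), one can poll it between read iterations. A minimal sketch, assuming Netty 4.1; `DirectMemoryProbe` is a hypothetical helper of ours, and `PlatformDependent` is technically Netty-internal API:

```java
import io.netty.util.internal.PlatformDependent;

// Minimal probe: poll Netty's internal direct-memory counter (the one that
// trips OutOfDirectMemoryError) between read iterations to confirm that
// usage grows monotonically instead of returning to a baseline.
public final class DirectMemoryProbe {
    public static void log(String label) {
        long used = PlatformDependent.usedDirectMemory(); // -1 if accounting is disabled
        long max = PlatformDependent.maxDirectMemory();
        System.out.printf("%s: netty direct memory used=%d / max=%d%n", label, used, max);
    }
}
```

Calling `DirectMemoryProbe.log(...)` before and after each batch of reads should show usage climbing toward the cap rather than settling back down.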
The following config is proven to directly trigger the leak:

```properties
# In this mode, the worker cache is copied into the read buffer to serve the read request;
# the read buffer is direct memory, so this reproduces the leak.
alluxio.worker.network.netty.file.transfer=MAPPED
```

If you do NOT use the MAPPED mode, no leak is observed.
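To pinpoint which buffers leak (rather than only observing that memory grows), Netty's built-in leak detector can be raised to PARANOID before any buffers are allocated. A sketch, equivalent to passing `-Dio.netty.leakDetection.level=paranoid` to the worker JVM; the class name is ours:

```java
import io.netty.util.ResourceLeakDetector;

// PARANOID tracks every allocation, so each ByteBuf that is garbage-collected
// without release() is reported together with its allocation stack trace.
public final class EnableLeakDetection {
    public static void main(String[] args) {
        ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);
        // ... then run the MAPPED-mode read workload described above ...
    }
}
```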
### To Reproduce
See above: enable the MAPPED transfer mode and repeatedly read the two 16MB files until the direct memory cap is hit.
### Expected behavior
Direct memory used to serve a read should be released when the read completes; repeated reads of the same files should not exhaust the direct memory pool.
### Urgency
High for deployments using MAPPED transfer: the worker exhausts its direct memory during ordinary read workloads and then fails to decode further RPC messages.
### Are you planning to fix it
Yes.
### What changes are proposed in this pull request?
resolves #18324
Disclaimer: I might have monkey-typed this fix, but I still do not know anything about buffer ref counting. This fix does NOT make me the owner of this state machine.
pr-link: #18323
change-id: cid-eb5bde353c08d3d9bdd39da5b9caf13681bae495
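The actual change is in the linked PR. As a generic illustration of the ref-counting discipline the disclaimer alludes to (not the Alluxio patch itself; `MappedTransferSketch` and `copyToDirect` are hypothetical names), the usual Netty rule is that a pooled `ByteBuf` has exactly one owner, and that owner must `release()` it exactly once, including on error paths:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;

// Illustrative only, not the actual Alluxio patch: copying a mapped region
// into a pooled direct ByteBuf hands ownership to the caller, who must
// release() it exactly once -- otherwise the pooled chunk is never returned
// to the arena and the direct-memory counter only ever grows.
final class MappedTransferSketch {
    static ByteBuf copyToDirect(MappedByteBuffer mapped, int len) {
        ByteBuf buf = PooledByteBufAllocator.DEFAULT.directBuffer(len);
        try {
            ByteBuffer src = mapped.duplicate();
            src.limit(len);
            buf.writeBytes(src); // copy out of the page cache
            return buf;          // ownership transfers to the caller
        } catch (Throwable t) {
            buf.release();       // do not leak on the error path
            throw t;
        }
    }
}
```

In the MAPPED path the buffer crosses handler boundaries on its way to the wire, which is exactly where a missed `release()` (or an unmatched `retain()`) is easy to introduce.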
ssz1997 pushed a commit to ssz1997/alluxio that referenced this issue on Dec 15, 2023.