commandBuffer performs an expensive queue operation in channelInactive? #960

Closed
shrimptttttttttt opened this issue Jan 8, 2019 · 9 comments
Labels
status: waiting-for-feedback We need additional information before we can continue

Comments

@shrimptttttttttt

Hi all,
In branches below 4.5.x, channelInactive adds all queued commands to the command buffer. If the network is very unstable and QPS/traffic is very high, the command buffer grows huge, and those objects cannot be reclaimed by GC, so our client application spends all its time in full GC. I see that you removed the commandBuffer.addAll call in channelInactive. Was it removed for the reason I describe? Thanks for your attention, and I look forward to your reply.

@mp911de
Collaborator

mp911de commented Jan 8, 2019

Lettuce retains commands on disconnect to re-trigger them on reconnect. We applied some optimizations to command re-queueing; you might want to check out Lettuce 5 as well.

If that's what you're talking about, then everything works as designed and you can close this ticket. If you mean something different, please elaborate.

@mp911de mp911de added the status: waiting-for-feedback We need additional information before we can continue label Jan 8, 2019
@shrimptttttttttt
Author

shrimptttttttttt commented Jan 8, 2019

@Override
    public void channelInactive(ChannelHandlerContext ctx) throws Exception {

        if (debugEnabled) {
            logger.debug("{} channelInactive()", logPrefix());
        }

        try {
            writeLock.lock();
            setStateIfNotClosed(LifecycleState.DISCONNECTED);

            if (redisChannelHandler != null) {

                if (debugEnabled) {
                    logger.debug("{} deactivating channel handler", logPrefix());
                }

                setStateIfNotClosed(LifecycleState.DEACTIVATING);
                redisChannelHandler.deactivated();
            }

            setStateIfNotClosed(LifecycleState.DEACTIVATED);

            // Shift all commands to the commandBuffer so the queue is empty.
            // Allows to run onConnect commands before executing buffered commands
            commandBuffer.addAll(queue);
            queue.removeAll(commandBuffer);

        } finally {
            writeLock.unlock();
        }

        if (buffer != null) {
            rsm.reset();
            buffer.clear();
        }

        if (debugEnabled) {
            logger.debug("{} channelInactive() done", logPrefix());
        }
        super.channelInactive(ctx);
    }

Above is the 3.5.x branch's code. commandBuffer.addAll(queue) is called in channelInactive without any queue size check, so the commandBuffer can grow very large if channelInactive keeps being called. In the 4.5.x branch, however, channelInactive no longer calls commandBuffer.addAll(queue), as shown below:

@Override
    public void channelInactive(ChannelHandlerContext ctx) throws Exception {

        if (debugEnabled) {
            logger.debug("{} channelInactive()", logPrefix());
        }

        if (channel != null && ctx.channel() != channel) {
            logger.debug("{} My channel and ctx.channel mismatch. Propagating event to other listeners.", logPrefix());
            super.channelInactive(ctx);
            return;
        }

        synchronized (stateLock) {
            try {
                lockWritersExclusive();
                setState(LifecycleState.DISCONNECTED);

                if (redisChannelHandler != null) {

                    if (debugEnabled) {
                        logger.debug("{} deactivating channel handler", logPrefix());
                    }

                    setState(LifecycleState.DEACTIVATING);
                    redisChannelHandler.deactivated();
                }

                rebuildQueue();
                setState(LifecycleState.DEACTIVATED);

                channel = null;
            } finally {
                unlockWritersExclusive();
            }
        }

        rsm.reset();

        if (debugEnabled) {
            logger.debug("{} channelInactive() done", logPrefix());
        }
        super.channelInactive(ctx);
    }
  1. What is the reason you removed commandBuffer.addAll(queue) from channelInactive?

  2. Will the commandBuffer.addAll(queue) call in channelInactive in versions lower than 4.4.x cause heap-size issues?

@mp911de
Collaborator

mp911de commented Jan 8, 2019

If channelInactive is constantly called and the queue is large, then you have other issues to solve first (e.g. resolve the overload scenario and remove the cause of the frequent disconnects).

Lettuce 4 has moved the queue rebuild into rebuildQueue(). This happens via Queue.poll and Queue.add.
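
For illustration only, here is a minimal sketch of that poll/add style of re-queueing (not the actual Lettuce 4 rebuildQueue() implementation): each command is moved exactly once, whereas with plain ArrayDeque-style queues the old addAll(queue) plus queue.removeAll(commandBuffer) rescans the buffer once per queued element.

import java.util.ArrayDeque;
import java.util.Queue;

// Minimal sketch of poll/add re-queueing; not Lettuce's actual code.
public class RebuildQueueSketch {

    // Drain 'queue' into 'commandBuffer' one element at a time.
    static <T> void rebuild(Queue<T> queue, Queue<T> commandBuffer) {
        T command;
        while ((command = queue.poll()) != null) {
            commandBuffer.add(command);
        }
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>();
        Queue<String> commandBuffer = new ArrayDeque<>();
        queue.add("GET key1");
        queue.add("SET key2 value2");

        rebuild(queue, commandBuffer);

        System.out.println(queue.size());         // 0
        System.out.println(commandBuffer.size()); // 2
    }
}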

Will commandBuffer.addAll(queue); … cause heap size issue?

Commands are not duplicated; they are retained until they get canceled or executed. If you run out of memory because you have too many queued commands, then you need to solve that issue first. Moving commands between queues requires additional memory, but only a fraction of the memory already allocated for the commands themselves.

What is the actual question? What are you trying to solve?

@shrimptttttttttt
Author

Sorry, I have to correct my question. The issue I actually have is this:
Our client application keeps running full GC all the time. I took a heap dump and found that the ClientResource's transportBuffer holds 900,000+ (90w+) Command objects.

And I am wondering why...

@shrimptttttttttt
Author

Even after we stop the traffic, the client keeps running full GC until it OOMs and cannot recover. It seems like the worker thread has stopped consuming the transportBuffer.

@mp911de
Collaborator

mp911de commented Jan 8, 2019

I suggest limiting the request queue via ClientOptions (see https://github.com/lettuce-io/lettuce-core/wiki/Client-Options). Additionally, you can set disconnectedBehavior to reject commands while disconnected. Both settings will protect you from accepting too many command objects.
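
Assuming Lettuce 5 (io.lettuce.core packages; the same options exist under com.lambdaworks.redis in Lettuce 4), the configuration could look roughly like this:

import io.lettuce.core.ClientOptions;
import io.lettuce.core.RedisClient;

public class BoundedQueueConfig {

    public static void main(String[] args) {
        RedisClient client = RedisClient.create("redis://localhost:6379");

        client.setOptions(ClientOptions.builder()
                .requestQueueSize(4096) // bound the number of queued commands
                .disconnectedBehavior(ClientOptions.DisconnectedBehavior.REJECT_COMMANDS) // fail fast while disconnected
                .build());
    }
}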

@shrimptttttttttt
Author

shrimptttttttttt commented Jan 8, 2019

We have configured the ClientOptions max queue size as 4096, but the transportBuffer still holds 900,000+ (90w+) Commands. We use 3.5.0.Final, by the way.

We also found that we get a ConcurrentModificationException, as below:

java.util.ConcurrentModificationException: null
        at java.util.ArrayDeque.delete(ArrayDeque.java:554) ~[na:1.8.0_172]
        at java.util.ArrayDeque.removeFirstOccurrence(ArrayDeque.java:375) ~[na:1.8.0_172]
        at java.util.ArrayDeque.remove(ArrayDeque.java:752) ~[na:1.8.0_172]
        at com.lambdaworks.redis.protocol.CommandHandler.queueCommand(CommandHandler.java:428) ~[lettuce-3.5.0.Final.jar:na]
        at com.lambdaworks.redis.protocol.CommandHandler.writeSingleCommand(CommandHandler.java:372) ~[lettuce-3.5.0.Final.jar:na]
        at com.lambdaworks.redis.protocol.CommandHandler.write(CommandHandler.java:355) ~[lettuce-3.5.0.Final.jar:na]

We know the problem above is fixed in 3.5.1, but I want to figure out the reasons while still on 3.5.0.Final.

  1. Why does transportBuffer hold 900,000+ Commands when we have already configured the max queue size as 4096? Because the transportBuffer holds so many Commands, our client keeps running full GC without reclaiming enough memory and eventually OOMs.
  2. When we stop the traffic, our client still can't recover. It seems like the thread consuming the transportBuffer has stopped working.
  3. Are 1 and 2 related to the transportBuffer.remove failure?

I have some ideas I'd like to share, and I hope you can give feedback.

  1. The ClientOptions max queue size check below has a precondition:
if (commandBuffer.size() + queue.size() >= clientOptions.getRequestQueueSize()) {
                throw new RedisException("Request queue size exceeded: " + clientOptions.getRequestQueueSize()
                        + ". Commands are not accepted until the queue size drops.");
            }

The precondition is that the command must first be removed from the transportBuffer and added to the queue successfully:

private void writeSingleCommand(ChannelHandlerContext ctx, RedisCommand<K, V, ?> command, ChannelPromise promise)
            throws Exception {

        if (command.isCancelled()) {
            transportBuffer.remove(command);
            return;
        }

        queueCommand(command, promise);
        ctx.write(command, promise);
    }

But what if transportBuffer.remove(command) throws a ConcurrentModificationException? As the code above shows, the exception propagates and the Command is never added to the queue, so it stays only in the transportBuffer.

Thus the queue size check below cannot work properly. Under high load with mGet(1024 keys), the transportBuffer holds more and more Commands until OOM.

if (commandBuffer.size() + queue.size() >= clientOptions.getRequestQueueSize()) {
                throw new RedisException("Request queue size exceeded: " + clientOptions.getRequestQueueSize()
                        + ". Commands are not accepted until the queue size drops.");
            }
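
To make this concrete, here is a toy model of the scenario I am describing (not Lettuce's code; the names and numbers are only illustrative): the size check only counts queue and commandBuffer, so when the handoff out of transportBuffer fails, commands accumulate in a collection the limit never sees.

import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of the suspected failure mode; not Lettuce code.
public class QueueLimitGapSketch {

    static final int REQUEST_QUEUE_SIZE = 4096;

    static Queue<String> transportBuffer = new ArrayDeque<>();
    static Queue<String> queue = new ArrayDeque<>();
    static Queue<String> commandBuffer = new ArrayDeque<>();

    static void writeCommand(String command, boolean handoffFails) {
        // the limit only looks at commandBuffer + queue
        if (commandBuffer.size() + queue.size() >= REQUEST_QUEUE_SIZE) {
            throw new IllegalStateException("Request queue size exceeded");
        }
        transportBuffer.add(command);
        if (handoffFails) {
            // simulates transportBuffer.remove(command) throwing a
            // ConcurrentModificationException: the command stays in transportBuffer
            return;
        }
        transportBuffer.remove(command);
        queue.add(command);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000_000; i++) {
            writeCommand("GET key" + i, true); // handoff always fails here
        }
        // The limit never triggers because transportBuffer is not counted.
        System.out.println("transportBuffer size: " + transportBuffer.size()); // 1000000
        System.out.println("queue size: " + queue.size());                     // 0
    }
}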

That is my theory for question 1:

  1. Why does transportBuffer hold 900,000+ Commands when we have already configured the max queue size as 4096? Because the transportBuffer holds so many Commands, our client keeps running full GC without reclaiming enough memory and eventually OOMs.

As for question 2:

  2. When we stop the traffic, our client still can't recover. It seems like the thread consuming the transportBuffer has stopped working.

I think this may be connected to #788:

Lettuce can't account for VM errors. If an Error is thrown, then the whole driver gets into an undefined state.

So the worker thread stops consuming the transportBuffer, and that's why we can't recover even after we stop the traffic.

Am I right?

Thank you for your help; I look forward to your feedback.
Have a nice day.

@mp911de
Collaborator

mp911de commented Jan 9, 2019

Lettuce 3.5 has been out of maintenance for quite a long time and is no longer supported. Given the number of bugs we fixed around queue limiting in Lettuce 4 and 5, it's likely that these bugs still exist in Lettuce 3.5.

The actual problem with Lettuce 3.5 command queueing is that operations happen on different threads and that ArrayDeque is not bounded. This allows for conditions in which commands can still be added although the queue should be bounded.
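
As a rough illustration of that point (a standalone sketch, not Lettuce code): hammering a plain java.util.ArrayDeque from two threads without synchronization can surface a ConcurrentModificationException like the one in the stack trace above, or silently corrupt the deque, and nothing limits its size.

import java.util.ArrayDeque;

// Sketch only: java.util.ArrayDeque is neither thread-safe nor bounded.
// The outcome is timing-dependent; a given run may or may not throw.
public class ArrayDequeRaceSketch {

    public static void main(String[] args) throws InterruptedException {
        ArrayDeque<Integer> deque = new ArrayDeque<>();

        Thread writer = new Thread(() -> {
            try {
                for (int i = 0; i < 1_000_000; i++) {
                    deque.add(i); // grows without any bound
                }
            } catch (RuntimeException e) {
                // the unsynchronized race can break the writer side too
                System.err.println("writer failed: " + e.getClass().getSimpleName());
            }
        });

        Thread remover = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                try {
                    deque.removeFirstOccurrence(i); // scans while the writer mutates
                } catch (RuntimeException e) {
                    // typically ConcurrentModificationException, but corrupted
                    // internal state can surface as other runtime exceptions too
                    System.err.println(e.getClass().getSimpleName() + " at i=" + i);
                }
            }
        });

        writer.start();
        remover.start();
        writer.join();
        remover.join();

        System.out.println("elements left in deque: " + deque.size());
    }
}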

I strongly advise upgrading to Lettuce 5.

@mp911de
Collaborator

mp911de commented Feb 7, 2019

Closing as there's nothing left to do here. If you would like us to look at this issue, please provide the requested information and we will re-open the issue.

@mp911de mp911de closed this as completed Feb 7, 2019