
[Broker] Fix race condition in invalidating ledger cache entries #10480

Conversation

@lhotari (Member) commented May 4, 2021

Fixes #10433

Motivation

See #10433. There's a rare race condition in invalidating ledger cache entries stored in RangeCache.

Modifications

  • Add a separate invalidate method for invalidating EntryImpl ledger cache entries. This prevents race conditions in invalidation.

@lhotari lhotari marked this pull request as draft May 4, 2021 12:36
@eolivelli (Contributor) left a comment:

@lhotari I left a few ideas.
I am not saying you are on the wrong track, but I am not sure we are really fixing the problem.

@@ -90,7 +90,7 @@ public Value get(Key key) {
         try {
             value.retain();
             return value;
-        } catch (Throwable t) {
+        } catch (IllegalReferenceCountException e) {
Contributor:

Probably we should log something here; this case must not happen.

Member Author (@lhotari):

Yes, logging would be useful to get more information. I'm just wondering if it should be done only at debug level, since it's not a real problem: it's part of the expected behavior that this can sometimes happen.
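For illustration only, a minimal sketch of what a debug-level log in that catch block could look like (the class and field names below are made up, not the actual RangeCache code):

import io.netty.util.IllegalReferenceCountException;
import io.netty.util.ReferenceCounted;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CacheGetSketch<Key extends Comparable<Key>, Value extends ReferenceCounted> {
    private static final Logger log = LoggerFactory.getLogger(CacheGetSketch.class);
    private final ConcurrentNavigableMap<Key, Value> entries = new ConcurrentSkipListMap<>();

    public Value get(Key key) {
        Value value = entries.get(key);
        if (value == null) {
            return null;
        }
        try {
            value.retain();
            return value;
        } catch (IllegalReferenceCountException e) {
            // Expected when the entry is concurrently evicted: the retain lost the race,
            // so treat it as a cache miss and only log at debug level.
            if (log.isDebugEnabled()) {
                log.debug("Entry at key {} was already released, treating as a cache miss", key, e);
            }
            return null;
        }
    }
}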

@@ -113,7 +113,7 @@ public Value get(Key key) {
         try {
             value.retain();
             values.add(value);
-        } catch (Throwable t) {
+        } catch (IllegalReferenceCountException e) {
Contributor:

Probably we should log something here; this case must not happen.

Contributor:

It's expected to happen and it's fine when it happens. It just indicates the entry is being evicted when we're trying to access it. If the retain succeeds, the operation was successful, otherwise the entry is already gone.

-            removedSize += weighter.getSize(value);
-            value.release();
+            long entrySize = weighter.getSize(value);
+            if (value.invalidate()) {
Contributor:

I am afraid we are only hiding the problem.
There must be some coordination around these entries, some clear protocol about who is the owner of the entry.

When we get to this point we must be sure that the refcount on the value is valid, otherwise it is always a hazard.

I believe the right protocol is that, before calling weighter.getSize(value), we should try to acquire the entry, and in case of failure we can ignore the entry.

Value value = entry.getValue();
if (value.tryAcquire()) {
    ++removedEntries;
    removedSize += weighter.getSize(value);
    value.release(); // this refers to the value.tryAcquire()
}

When we remove the entry we must hold some kind of write lock over it, which prevents double releases.
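As a rough sketch of that protocol (tryAcquire() is not an existing EntryImpl/RangeCache method, so it is emulated here with retain() and the exception as the failure signal):

import io.netty.util.IllegalReferenceCountException;
import io.netty.util.ReferenceCounted;
import java.util.function.ToLongFunction;

// Sketch only: the entry is "acquired" before it is touched, and skipped on failure.
final class EvictionAcquireSketch {
    static boolean tryAcquire(ReferenceCounted value) {
        try {
            value.retain();
            return true;
        } catch (IllegalReferenceCountException e) {
            return false; // already deallocated, ignore this entry
        }
    }

    // Touch the entry (e.g. to weigh it) only while holding our own ref-count.
    static long sizeIfAlive(ReferenceCounted value, ToLongFunction<ReferenceCounted> weighter) {
        if (!tryAcquire(value)) {
            return 0;
        }
        try {
            return weighter.applyAsLong(value);
        } finally {
            value.release(); // releases the tryAcquire above, not the cache-owned ref-count
        }
    }
}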

@merlimat (Contributor) left a comment:

@lhotari Great finding. I agree with the assessment that multiple invalidations happening at the same time are causing this.

I think the correct fix here would be to ensure that there's only one eviction happening at a given point in time, so that we avoid touching an entry whose ref-count is potentially 0.

For that, we'd need to make sure that, in RangeCache, removeRange(), evictLeastAccessedEntries() and evictLEntriesBeforeTimestamp() are either called with a mutex (like clear() already does) or at least from the same single thread.
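A rough sketch of that idea, with illustrative names only (not the actual RangeCache code):

import java.util.concurrent.locks.ReentrantLock;

// Serialize all eviction/removal paths behind one lock, so at most one thread
// can be releasing cache-owned ref-counts at any given time.
final class SerializedEvictionSketch {
    private final ReentrantLock evictionLock = new ReentrantLock();

    long removeRange(long first, long last) {
        evictionLock.lock();
        try {
            return doRemoveRange(first, last);
        } finally {
            evictionLock.unlock();
        }
    }

    long evictLeastAccessedEntries(long sizeToFree) {
        evictionLock.lock();
        try {
            return doEvictLeastAccessed(sizeToFree);
        } finally {
            evictionLock.unlock();
        }
    }

    // The actual removal/eviction logic would live in these private methods.
    private long doRemoveRange(long first, long last) { return 0; }
    private long doEvictLeastAccessed(long sizeToFree) { return 0; }
}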


release();
return true;
}
return false;
Contributor:

I don't think this is the correct approach. If the issue is that we're using an already released buffer, we should fix that instead.

This will avoid decrementing the ref-count more than once, but it will not prevent the 2nd thread from accessing an entry whose ref-count was already at 0.

Member Author (@lhotari):

This is a safety measure to prevent races in invalidating the entries. I agree that the issues in releasing must be fixed. Adding a separate method for invalidation would help detect when the problem is caused by invalidating the entry twice. Some logging could be added to detect cases where a race in invalidation causes a "double release".

Currently, it seems that the problems we are seeing could occur only when there's a race in invalidation. At a quick glance, there don't seem to be other code paths where the entry is released but not retained as part of the same "flow".

Would this justify adding some extra protection against races in invalidation?

Member Author (@lhotari):

This will avoid decrementing the ref-count more than once, but it will not prevent the 2nd thread from accessing an entry whose ref-count was already at 0.

Yes, that's a good point. I'm thinking of a solution where invalidation would be a completely separate operation, triggered when the reference count is 1 or gets back to 1. Another protection would be a change in logic so that a release operation which would drop the reference count from 1 to 0 is rejected completely. That would prevent bugs caused by release being called too many times. Such cases could be logged and fixed if those bugs exist.
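To make that idea a bit more concrete, here is a hedged sketch of such a policy (a made-up class, not EntryImpl's actual implementation; the deferred "gets back to 1" case is left out for brevity):

import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a ref-count policy where a plain release() can never perform the final
// deallocation: only an explicit invalidate() may take the count from 1 to 0.
final class GuardedRefCountSketch {
    private final AtomicInteger refCnt = new AtomicInteger(1);

    boolean retain() {
        while (true) {
            int current = refCnt.get();
            if (current <= 0) {
                return false; // already invalidated
            }
            if (refCnt.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    void release() {
        while (true) {
            int current = refCnt.get();
            if (current <= 1) {
                // Reject (and possibly log) a release that would drop the count to 0:
                // a buggy extra release can no longer hand the entry back to the recycler.
                return;
            }
            if (refCnt.compareAndSet(current, current - 1)) {
                return;
            }
        }
    }

    boolean invalidate() {
        // The final 1 -> 0 transition happens exactly once, and only here.
        if (refCnt.compareAndSet(1, 0)) {
            deallocate();
            return true;
        }
        return false;
    }

    private void deallocate() {
        // return buffers / recycle the instance
    }
}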

removedSize += weighter.getSize(value);
value.release();
++removedEntries;
long entrySize = weighter.getSize(value);
Contributor:

e.g. in the case of a concurrent invalidate, the value is already invalid here

@dlg99 (Contributor) commented May 4, 2021

@lhotari I agree with @merlimat on " If the issue is that we're using an already released buffer, we should fix that instead."
It could be caused by, e.g., an implementation of the ReadEntriesCallback/ReadEntryCallback/whatever other callback that is expected to properly release the entry. Or, as you mentioned in the issue, it could be netty/netty#10986

Another issue: do we have a repro?
Have we confirmed that upgrading Netty to a version with the fix for netty/netty#10986 doesn't help?
And that this fix does help?

@merlimat (Contributor) commented May 4, 2021

Actually, I'm not 100% sure that having invalidations called by multiple threads could lead to the issue. In all cases the entries are removed from the ConcurrentSkipList before getting released, and the guarantee there is that the removal is atomic.

@lhotari (Member, Author) commented May 5, 2021

Actually, I'm not 100% sure that having invalidations called by multiple threads could lead to the issue. In all cases the entries are removed from the ConcurrentSkipList before getting released, and the guarantee there is that the removal is atomic.

This is true. Therefore, the rationale for adding a separate method for invalidation would be to treat the changes as a safety measure and as a way to detect the source of the problem, as explained in my previous comment.

@lhotari (Member, Author) commented May 5, 2021

@lhotari I agree with @merlimat on "If the issue is that we're using an already released buffer, we should fix that instead."

Yes I agree on this, I'll try to dig deeper. :)

It could be caused by, e.g., an implementation of the ReadEntriesCallback/ReadEntryCallback/whatever other callback that is expected to properly release the entry.

Good point. I'll track those release calls.

Or, as you mentioned in the issue, it could be netty/netty#10986

Pulsar might not be impacted in cases where AbstractCASReferenceCounted base class is used. Here's AbstractCASReferenceCounted release logic:

private boolean release0(int decrement) {
    for (;;) {
        int refCnt = this.refCnt;
        if (refCnt < decrement) {
            throw new IllegalReferenceCountException(refCnt, -decrement);
        }
        if (refCntUpdater.compareAndSet(this, refCnt, refCnt - decrement)) {
            if (refCnt == decrement) {
                deallocate();
                return true;
            }
            return false;
        }
    }
}

Other issue is: do we have a repro?

No. I assume it's a rare issue since there aren't many reports about it. It might be possible to achieve a repro at some kind of unit/integration test level using JCStress, but it's hard to estimate the effort required to achieve a repro.
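For what it's worth, a rough sketch of what a JCStress probe for a double-release race could look like (this is not an existing test, only an illustration built on a plain Netty ref-counted object):

import io.netty.util.AbstractReferenceCounted;
import io.netty.util.IllegalReferenceCountException;
import io.netty.util.ReferenceCounted;
import org.openjdk.jcstress.annotations.Actor;
import org.openjdk.jcstress.annotations.Expect;
import org.openjdk.jcstress.annotations.JCStressTest;
import org.openjdk.jcstress.annotations.Outcome;
import org.openjdk.jcstress.annotations.State;
import org.openjdk.jcstress.infra.results.II_Result;

// Two threads race to release the same entry whose ref-count is 1.
// Exactly one of them should perform the final release.
@JCStressTest
@Outcome(id = "1, 0", expect = Expect.ACCEPTABLE, desc = "actor1 did the final release")
@Outcome(id = "0, 1", expect = Expect.ACCEPTABLE, desc = "actor2 did the final release")
@Outcome(id = "1, 1", expect = Expect.FORBIDDEN, desc = "double deallocation")
@State
public class DoubleReleaseStress {
    static final class Entry extends AbstractReferenceCounted {
        @Override
        protected void deallocate() {
            // the real EntryImpl would return buffers to the pool / recycler here
        }

        @Override
        public ReferenceCounted touch(Object hint) {
            return this;
        }
    }

    private final Entry entry = new Entry(); // ref-count starts at 1

    private int releaseQuietly() {
        try {
            return entry.release() ? 1 : 0;
        } catch (IllegalReferenceCountException e) {
            return 0; // lost the race, which is the acceptable outcome
        }
    }

    @Actor
    public void actor1(II_Result r) {
        r.r1 = releaseQuietly();
    }

    @Actor
    public void actor2(II_Result r) {
        r.r2 = releaseQuietly();
    }
}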

Have we confirmed that upgrading Netty to a version with the fix for netty/netty#10986 doesn't help?

No. The bug report is about concurrent calls to .recycle(). As mentioned earlier, that is prevented by AbstractCASReferenceCounted in Pulsar in most cases.

@lhotari (Member, Author) commented May 6, 2021

I've been trying to spot locations where the entry could get released multiple times but still continue to be used.

This looks risky:

// invalidate all entries related to ledger from the cache (it might happen if entry gets corrupt
// (entry.data is already deallocate due to any race-condition) so, invalidate cache and next time read from
// the bookie)
invalidateAllEntries(lh.getId());
callback.readEntryFailed(createManagedLedgerException(t), ctx);

Together with

public synchronized void readEntryFailed(ManagedLedgerException mle, Object ctx) {
    log.warn("[{}][{}] Error while replaying entries", ledger.getName(), name, mle);
    if (exception.compareAndSet(null, mle)) {
        // release the entries just once, any further read success will release the entry straight away
        entries.forEach(Entry::release);
    }

@merlimat (Contributor) commented May 6, 2021

As mentioned earlier, that is prevented by AbstractCASReferenceCounted in Pulsar in most cases.

At this point, we can actually get rid of AbstractCASReferenceCounted. It was added in #2995 as a temporary measure to work around a change in behavior in Netty. The Netty issue was then fixed in 4.1.32 and we don't need the special treatment anymore.

The change

try {
    asyncReadEntry0(lh, position, callback, ctx);
} catch (Throwable t) {
    log.warn("failed to read entries for {}-{}", lh.getId(), position, t);
    // invalidate all entries related to ledger from the cache (it might happen if entry gets corrupt
    // (entry.data is already deallocate due to any race-condition) so, invalidate cache and next time read from
    // the bookie)
    invalidateAllEntries(lh.getId());
    callback.readEntryFailed(createManagedLedgerException(t), ctx);
}
was added as a way to handle the same bug, but I agree that it's really dangerous in that we don't really know what went wrong, as in where the corruption happened...

As for the 2nd part (the Cursor.readEntryFailed), that seems ok to me. The entries there are the partial entries that were already read before. They might be coming from the cache or directly from bookies. In any case, the ref-count was increased when they came out of the cache, so we're required to release (and it should be safe to do so).

@lhotari (Member, Author) commented May 7, 2021

At this point, we can actually get rid of AbstractCASReferenceCounted. It was added in #2995 as a temporary measure to work around a change in behavior in Netty. The Netty issue was then fixed in 4.1.32 and we don't need the special treatment anymore.

I'm thinking of replacing it with something that would give extra protection against bugs. Let's see what it evolves into. I'll push changes to this PR once there's something presentable.

was added as a way to handle the same bug, but I agree that it's really dangerous in that we don't really know what went wrong, as in where the corruption happened...

As for the 2nd part (the Cursor.readEntryFailed), that seems ok to me. The entries there are the partial entries that were already

About these 2 lines of code together:

     invalidateAllEntries(lh.getId()); 
     callback.readEntryFailed(createManagedLedgerException(t), ctx); 

The problem here seems to be that invalidateAllEntries will call .release() and then the callback.readEntryFailed will also call .release() for the same set of entries, and this leads to the "double release", which can cause the entry to be returned to the recycler while there's still one more outstanding user of the entry. Once it's returned to the recycler, it can be picked up for some other use at the same time. After that, the original outstanding user of the entry calls .release(), and that could lead to the NPE that was reported. Makes sense?
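To illustrate the hazard in isolation with a plain Netty buffer (not Pulsar's EntryImpl):

import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

// Stand-alone illustration of the "double release" hazard described above.
public class DoubleReleaseDemo {
    public static void main(String[] args) {
        ByteBuf buf = PooledByteBufAllocator.DEFAULT.buffer(16);
        buf.writeLong(42L);

        buf.retain();  // an outstanding reader still holds a reference (refCnt == 2)

        buf.release(); // e.g. the cache invalidation path
        buf.release(); // e.g. the failure callback releasing the same entry again;
                       // refCnt is now 0 and the buffer goes back to the pool even though
                       // the reader above still believes it holds a valid reference

        // Any further use by that reader is now a use-after-free style bug: it may read
        // recycled memory, throw IllegalReferenceCountException, or surface later as a
        // seemingly unrelated NPE, depending on timing.
    }
}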

@merlimat (Contributor) commented May 7, 2021

The problem here seems to be that invalidateAllEntries will call .release() and then the callback.readEntryFailed will also call .release() for the same set of entries and this leads to the "double release"

I don't think there's a double release because:

  1. invalidateAllEntries is invalidating, thus removing and releasing the entries kept in cache
  2. callback.readEntryFailed is releasing entries that were already out of the cache. We're just releasing the additional ref-count that was added when the entry came out of the cache (since we're not going to use the entry).

The main traits of the entry cache are:

  1. The cache has ownership of the entry and keeps 1 ref-count when the entry is cached.
  2. To grab an entry from the cache, we try to retain() (which might throw if the entry is being deallocated)
  3. To invalidate, entries have to first get removed from the map and then release 1 ref-count
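Putting those three traits together, a minimal sketch (illustrative names only, not the actual RangeCache implementation) could look like this:

import io.netty.util.IllegalReferenceCountException;
import io.netty.util.ReferenceCounted;
import java.util.concurrent.ConcurrentSkipListMap;

public class CacheOwnershipSketch<K extends Comparable<K>, V extends ReferenceCounted> {
    private final ConcurrentSkipListMap<K, V> entries = new ConcurrentSkipListMap<>();

    // 1. The caller hands over one ref-count; the cache owns it while the entry is cached.
    public boolean put(K key, V value) {
        // if this fails, the caller keeps its ref-count and remains responsible for releasing it
        return entries.putIfAbsent(key, value) == null;
    }

    // 2. Readers grab an entry by retaining it; a failure means it is being deallocated.
    public V get(K key) {
        V value = entries.get(key);
        if (value == null) {
            return null;
        }
        try {
            value.retain();
            return value;
        } catch (IllegalReferenceCountException e) {
            return null; // the entry was evicted concurrently, treat as a miss
        }
    }

    // 3. Invalidation first removes the entry from the map, then drops the cache's ref-count,
    //    so only the thread that wins the removal performs that release.
    public boolean invalidate(K key) {
        V value = entries.remove(key);
        if (value != null) {
            value.release();
            return true;
        }
        return false;
    }
}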

@lhotari lhotari closed this Jun 21, 2021
@lhotari (Member, Author) commented May 28, 2024

A better fix is #22789, which doesn't contain the problems that were in this PR a few years ago.

Linked issue: NPE in broker: EntryImpl.getLength() (#10433)