Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make serialisation in FileManager::saveToFile thread-safe #4025

Merged
merged 5 commits into from
Mar 23, 2020

Conversation

stejbac
Copy link
Contributor

@stejbac stejbac commented Mar 4, 2020

Add toProtoMessageSynchronized() default method to PersistableEnvelope, which performs (blocking) protobuf serialisation in the user thread, regardless of the calling thread. This should prevent data races like the ConcurrentModificationException observed in #3752, under the reasonable assumption that shared persistable objects are only mutated in the user thread.

Also add a ThreadedPersistableEnvelope sub-interface overriding the default method above, to let objects which are expensive to serialise (like DaoStateStore) be selectively serialised in the save-file-task-X thread as before, but directly synchronised with each mutating operation. As most objects are cheap to serialise, this avoids a noticeable performance drop without having to track down every mutating method for each store.

In all cases but one, classes implementing ThreadedPersistableEnvelope are stores like TradeStatistic2Store, with a single ConcurrentHashMap field. These require no further synchronisation, since the map entries are immutable, so the only mutating operations are map.put(..) calls which are already synchronised with map reads. (Even if map.values().stream() sees updates at different keys happen out-of-order, it should be benign, e.g. trade statistics might be persisted out-of-order if two trades come along at almost exactly the same time.)

The remaining case is DaoStateStore, which is only ever reset or modified via a single persist(..) call with a cloned DaoState instance and hash chain from DaoStateSnapshotService, so there is no aliasing risk from the various DAO state mutations done in DaoStateService and elsewhere.

This should fix #3752.

--

Adding some temporary benchmark logging to PersistableEnvelope shows the time spent in each saveToFile protobuf conversion forced to run on the user thread with the PR changes, from my Bisq node (running on mainnet) from startup to shutdown:

Serialisation_Benchmarks.txt

As can be seen in the filtered log, the serialisation of PreferencesPayload, UserPayload and TradableList took 100-140ms (which might cause noticeable jitter on the user thread) the first time they were persisted upon application startup. However, subsequent serialisations of the same objects seem to occur much faster (presumably after the protobuf code has had a chance to warm up), with no (single-threaded) PersistableEnvelope object taking longer than around 20ms to serialise once the application has started.

The assumed expensive-to-serialise objects like DaoStateStore and those managed by MapStoreService all implement ThreadedPersistableEnvelope, so don't appear in the above benchmark and are converted directly on each save-file-task-X thread as before.

Minor change for consistency: narrow the signature of some remaining
such methods, which have return type 'PersistableEnvelope'.

(This excludes some other cases with return type 'NetworkEnvelope'.)
Add toProtoMessageSynchronized() default method to PersistableEnvelope,
which performs (blocking) protobuf serialisation in the user thread,
regardless of the calling thread. This should prevent data races like
the ConcurrentModificationException observed in bisq-network#3752, under the
reasonable assumption that shared persistable objects are only mutated
in the user thread.

Also add a ThreadedPersistableEnvelope sub-interface overriding the
default method above, to let objects which are expensive to serialise
(like DaoStateStore) be selectively serialised in the 'save-file-task-X'
thread as before, but directly synchronised with each mutating op. As
most objects are cheap to serialise, this avoids a noticeable perf drop
without having to track down every mutating method for each store.

In all cases but one, classes implementing ThreadedPersistableEnvelope
are stores like TradeStatistic2Store, with a single ConcurrentHashMap
field. These require no further serialisation, since the map entries are
immutable, so the only mutating operations are map.put(..) calls which
are already synchronised with map reads. (Even if map.values().stream()
sees updates @ different keys happen out-of-order, it should be benign.)

The remaining case is DaoStateStore, which is only ever reset or
modified via a single persist(..) call with a cloned DaoState instance
and hash chain from DaoStateSnapshotService, so there is no aliasing
risk from the various DAO state mutations done in DaoStateService and
elsewhere.

This should fix bisq-network#3752.
@freimair
Copy link
Member

freimair commented Mar 4, 2020

the part about performance sounds awfully familiar to me. see #4003

@sqrrm
Copy link
Member

sqrrm commented Mar 7, 2020

I would like to get this PR into the next release if possible. It touches core DAO code so @ManfredKarrer needs to review it. I think it looks good from what I've seen so far.

Copy link
Member

@ManfredKarrer ManfredKarrer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK - but would prefer to see the suggested changes and want to emphasise that it has to be tested very well. Its certainly a high risk PR. But great to see some progress in that area! Thanks @stejbac !

I would suggest renaming toProtoMessageSynchronized to toPersistableMessage.
Also would suggest to make it more explicit and flexible by adding the new behaviour in a UserThreadMappedPersistableEnvelope interface and keep PersistableEnvelope as non-safe-thread but fast interface with the default method:
default Message toPersistableMessage() { return toProtoMessage(); }

If this is applied it has to be carefully checked for each persisted object which interface is most suitable and check with profiling how the changes compare. There are some generic classes like PersistableList which need to be considered as well (at the moment its all covered as PersistableEnvelope is used for the new method.

There are several classes implementing PersistableEnvelope like SignedWitness but I think that is wrong as they are not persisted directly but wrapped in a storage class. I have not checked more closely but might be worth to clean that up in a separate PR.

I am not sure about the performance impacts when using UserThread mapping. Any delay > 20 ms is visually visible in the UI as lagging.

I would suggest to create 3 categories for all persisted objects:

  • Big objects -> performance is critical, use threaded version (DaoState,...)
  • Small objects, correctness is critical -> use UserThread mapping (SequenceNrMap,...)
  • Other uncritical -> use non-thread safe option in dedicated thread (NavigationPath)

The uncritical ones could be kept as non-threadsafe IMO.

For further improvements the delays between write operations should be checked, there might be some optimisation possible, specially for those which have frequent write operations.

From the DAO perspective I don't see any obvious risk with that change, but it should be tested very well specially with DaoStateHashes and the monitoring for the 3 different hash chains.

For the most problematic objects (DaoState, TradeStatistics, AccountAgeWitness) I suggest to start to look into a different design as suggested in the related issues/PRs (e.g. use a historical data structure which is not handled by the P2PDataStorage and only loaded in memory on demand). For DaoState that approach might not work, but some solution to make it more scalable is required here as well.


/**
* Interface for the outside envelope object persisted to disk.
*/
public interface PersistableEnvelope extends Envelope {

default Message toProtoMessageSynchronized() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the method name a bit confusing as it is only synchronized in ThreadedPersistableEnvelope but here we use mapping to userThread. I would recommend using toPersistableMessage as stated above.


/**
* Interface for the outside envelope object persisted to disk.
*/
public interface PersistableEnvelope extends Envelope {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe its better to use a specific interface here as well like in ThreadedPersistableEnvelope? Might make it more explicit that we alter the basic non-thread-safe but fast behaviour. We could use then toPersistableMessage as method name for all 3 cases and the interface marks which style is actually used. Would make it also a bit more distinct from the generic toProtoMessage method and shows its only relevant for persisting an envelope.

Make PersistableEnvelope method name less confusing, as it is only
synchronised in ThreadedPersistableEnvelope.
Make the default toPersistableMessage() method of PersistableEnvelope
simply delegate to Proto.toProtoMessage for speed, so that stores can
explicitly implement (Threaded|UserThreadMapped)PersistableEnvelope if
they actually need concurrency control.

As part of this, make PeerList implement PersistableEnvelope directly
instead of extending PersistableList, as it is non-critical & cloned on
the user thread prior to storage anyway, so doesn't need be thread-safe.
In this way, only PaymentAccountList & small DAO-related stores extend
PersistableList, so they can all be made user-thread-mapped.

After this change, the only concrete store classes not implementing
(Threaded|UserThreadMapped)PersistableEnvelope are:

  AccountAgeWitness, BlindVotePayload, ProposalPayload, SignedWitness,
  TradeStatistics2, NavigationPath & PeerList

The first five appear to erroneously implement PersistableEnvelope and
can be cleaned up in a separate commit. The last two are non-critical.

(Make NavigationPath.path an immutable list, for slightly better thread
safety anyway - that way it will never be observed half-constructed.)
@stejbac
Copy link
Contributor Author

stejbac commented Mar 10, 2020

I've just pushed changes to address some of the issues pointed out above. I renamed toProtoMessageSynchronized and added UserThreadMappedPersistableEnvelope as suggested. There was another commit I intended to push to serialise SequenceNumberMap on the user thread, by making it implement the new interface instead of ThreadedPersistableEnvelope. However, benchmarking the code in the same way as above showed that it adds considerable overhead:

Serialisation_Benchmarks_with_SequenceNumberMap.txt

I think it would be better to make it continue to implement ThreadedPersistableEnvelope and rely on the concurrency guarantees of ConcurrentHashMap if possible. (That should effectively synchronise reads and updates of the optional map value, at each potential key independently.)

@ManfredKarrer
Copy link
Member

ManfredKarrer commented Mar 10, 2020

@stejbac Thanks for the implementation of the suggested changes!

Could we make a list of all potential persisted objects (some are only created in rare cases like FailedTrades) to check for each persisted object if the adequate interface is applied? Would be also great to see a performance comparison before/after change (hope I don't get too annoying with those requests but I think as its a bit of a risky change its better to be extra careful here).

I just found another issue (no actual problem but potential risk for future changes):
VoteWithProposalTxIdList (as well as MeritList) is extending PersistableList which is now using the UserThreadMappedPersistableEnvelope. This list is using the PersistableList only for serialisation but not persistence (see comment in class). This should be fixed anyway in a more correct way (e.g. using a dedicated List only for that purpose). As we don't call the toPersistableMessage method here no behaviour change is introduced, but if anyone would start further changes to PersistableList we might inherit a potential critical change, which can easily lead then to a consensus problem (the serialised protoBuf data are used for consensus). I suggest to clean that up either with the PR (or better with a new one) so that this risk by abusing the PersistableList here is removed. A potential problematic change could be if one changes the list in PersistableList from ArrayList to any other list type which might produce a different protoBuf serialization result. As it is not obvious that PersistableList is used for consensus critical code here this should be changed to remove that risk.

@stejbac
Copy link
Contributor Author

stejbac commented Mar 11, 2020

Earlier, I (manually) worked out a type hierarchy of everything extending / implementing PersistableEnvelope after the proposed changes:

Screenshot from 2020-03-11 07-23-24
(Actuallty PersistableList isn't an abstract class, but probably could be.)

I was intending to clean up the classes incorrectly implementing PersistableEnvelope in a separate PR, as suggested.

It looks like every toProtoMessage and fromProto method of each class in the DAO package extending PersistableList collects items into a new list using a stream, anyway, when mapping the input list items to/from proto.* objects. So I would have thought that only the iteration order of the PersistableList.getList() items matters for proto conversion.

@ManfredKarrer
Copy link
Member

ManfredKarrer commented Mar 11, 2020

Thanks for the list.
Regarding VoteWithProposalTxIdList and MeritList: Yes I also assume there is no issue but I think its more safe to use a dedicated class to avoid potential future issues. Those classes implement also the ConsensusCritical marker interface.

Also the PersistableNetworkPayloadList and its related classes (PersistableNetworkPayloadListService) could be cleaned up and removed (in separate PR). It was long ago used for converting db files.
PersistableHashMap is also not used.

Thanks again for working on that!

@ripcurlx ripcurlx added the is:priority PR or issue marked with this label is up for compensation label Mar 11, 2020
@ripcurlx
Copy link
Contributor

Regarding VoteWithProposalTxIdList and MeritList: Yes I also assume there is no issue but I think its more safe to use a dedicated class to avoid potential future issues. Those classes implement also the ConsensusCritical marker interface.

@ManfredKarrer Is this a requirement for this PR to get merged? If not please NACK or if it is up to be merged ACK from your side to make it clear for the maintainers 😄 . Thanks!

@ManfredKarrer
Copy link
Member

ManfredKarrer commented Mar 21, 2020

@ripcurlx Thanks for reminding on that. I recommend to not change the behaviour of those 2 classes (not using UserThreadMappedPersistableEnvelope as they would do now from the base class). So until that is changed I need to NACK that PR. Beside that utACK.

Provide UserThreadMappedPersistableList subclass for persistable lists
which need to implement UserThreadMappedPersistableEnvelope, instead of
putting the interface on the base class.

Make the (non-storage) classes MeritList and VoteWithProposalTxIdList
keep the original PersistableList superclass, deriving the remaining
subclasses of PersistableList from the new class instead. In this way,
further persistence-related changes are less likely to inadvertently
alter the behaviour of those two consensus-critical classes.

Removing the superfluous PersistableEnvelope interface from the two
classes (via the base class) will be done in a separate PR.
@stejbac
Copy link
Contributor Author

stejbac commented Mar 22, 2020

I've just made another commit to remove the UserThreadMappedPersistableEnvelope interface from the two classes, as recommended by @ManfredKarrer. After the change, the interface still lies on all the remaining classes extending PersistableList, where I believe it's actually needed (except possibly PaymentAccountList, which appears to only have a hypothetical race setting paymentAccount.selectedTradeCurrency when using the non-thread-safe persistence interface).

This does, however, include three other ConsensusCritical classes, changing the original behaviour (to fix the bug): MyProposalList, MyBlindVoteList and BallotList.

Copy link
Contributor

@ripcurlx ripcurlx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK - based on #4025 (comment)

Tested it myself on Regtest (Arbitration, Voting Cycle, Trade, Account Creation) and I didn't see any issues so far.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
is:priority PR or issue marked with this label is up for compensation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ConcurrentModificationException FileManager::saveToFile
5 participants