fix concurrent access to self.cache in nosqldict #121

ldruschk · 2021-05-30T11:23:57Z

No description provided.

ldruschk · 2021-05-30T11:29:03Z

note that the tests do not actually check for the concurrent access issue at the moment, since testing for race conditions is hard to do in a deterministic fashion. Although something like https://github.com/idlesign/pytest-race might be worth a look in the future

src/enochecker/nosqldict.py

Trolldemorted · 2021-05-30T19:16:26Z

note that the tests do not actually check for the concurrent access issue at the moment

Should we try to lock and raise an error if it already is locked? This way we are hiding problems we ought to fix, aren't we?

ldruschk · 2021-05-30T19:20:43Z

note that the tests do not actually check for the concurrent access issue at the moment

Should we try to lock and raise an error if it already is locked? This way we are hiding problems we ought to fix, aren't we?

What problem would we be hiding? Why would we not want the NoSqlDict to be thread-safe?

domenukk · 2021-05-30T20:33:28Z

I think it's fine with a Lock. Checkers generally shouldn't spawn their own threads, I guess that's Bennis point (?) but it's rather irrelevant here

Trolldemorted · 2021-05-30T21:37:19Z

What problem would we be hiding? Why would we not want the NoSqlDict to be thread-safe?

Before this commit we had a checker that wrote to the database in racy ways and thus threw internal errors, now we have a checker that writes to the database in racy ways and it does not throw internal errors.

Instead of a loud error this checker now has things missing in the database, doesn't it? Or do we guarantee an eventual write?

domenukk · 2021-05-30T22:53:22Z

Ah you mean like a concurrent read, read, write, write? Hm depends on the checker I guess, but yeah could be an issue 🤔

ldruschk · 2021-05-31T08:56:25Z

I think you are mixing up two different problems here.

The original issue was caused by sharing a NoSqlDict object between two different method invocations for the same team. Provided this are two methods which write to the database, e.g. putflag and putnoise, this could cause issues when one method is already finished (i.e. it calls the persist() function), while the other one calls __setitem__ with a different key, which lead to a new key being added to self.cache while the persist()-function was iterating over self.cache.items(), which lead to the exception we saw in Matrix.

This alone could have been adressed by the change here: https://github.com/enowars/enochecker/pull/121/files#diff-45f139d1a77768033431ba68bcc89a15d478cb43d8511f45194cb48a71870267R181

As a result, NoSqlDicts are no longer shared between different instances of the Enochecker-class.

In addition, the introduction of locks makes the NoSqlDict itself thread-safe in general, meaning concurrent accesses to the internal variables, e.g. self.cache are prevented. Accessing the functions of the NoSqlDict from alternating threads now works as expected.

The second issue, which I believe @domenukk is referring to as well, is that when two different threads (or even completely separate checker instances using the same database) are accessing the database entry with the same key at the same time. In that case, thread A might read an object from the DB, thread B reads the same object, thread A writes its changes and thread B writes its changes, thus overwriting the changes from thread A.

If we want to address this, we could look at something like optimistic concurrency control. In that case, the write operation of thread B would fail if it detects that the underlying document has been modified between the read and write in thread B. In that case it would be up to the client application to retry that operation.

The question is: how would we do this in an automated fashion? If we notice a conflict in thread B, how can we resolve that? The easiest way would be to simlpy repeat the checker execution from the point where we read from the database, thus applying all of our changes to the modified object. Does that make sense in the context of enochecker? Of course not.

However, in our case, this will never occur in real circumstances. Provided the checkers use the taskChainId as the key, there will be no conflicts between different invocations for different chains. According to our specification, a second method invocation for the same taskChainId will only occur once the previous invocation has finished.

Of course we could implement OCC for the NoSqlDict and throw exceptions that will not be handled by the checkers, but I am not seeing how this would address any problem that we are likely to encounter in real life. And regardless of that, this is a problem which we will be seeing with all checker libraries once we actually start running the checkers in a distributed fashion. Or at least we would be seeing them, if we wouldn't exclude the case where that could occur in the specification. (Of course people might sitll use different keys than the taskChainId, but that will probably cause bigger issue than those related to concurrent access to the DB).

ldruschk added 2 commits May 30, 2021 13:22

fix concurrent access to self.cache in nosqldict

04c29c4

bump version to 0.4.2

57af36c

ldruschk requested a review from domenukk May 30, 2021 11:23

add -d to docker run in CI

ae2ac20

domenukk reviewed May 30, 2021

View reviewed changes

src/enochecker/nosqldict.py Show resolved Hide resolved

domenukk merged commit 6a3016d into main May 30, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix concurrent access to self.cache in nosqldict #121

fix concurrent access to self.cache in nosqldict #121

ldruschk commented May 30, 2021

ldruschk commented May 30, 2021

Trolldemorted commented May 30, 2021

ldruschk commented May 30, 2021

domenukk commented May 30, 2021

Trolldemorted commented May 30, 2021

domenukk commented May 30, 2021

ldruschk commented May 31, 2021

fix concurrent access to self.cache in nosqldict #121

fix concurrent access to self.cache in nosqldict #121

Conversation

ldruschk commented May 30, 2021

ldruschk commented May 30, 2021

Trolldemorted commented May 30, 2021

ldruschk commented May 30, 2021

domenukk commented May 30, 2021

Trolldemorted commented May 30, 2021

domenukk commented May 30, 2021

ldruschk commented May 31, 2021