Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix concurrent access to self.cache in nosqldict #121

Merged
merged 3 commits into from
May 30, 2021

Conversation

ldruschk
Copy link
Member

No description provided.

@ldruschk ldruschk requested a review from domenukk May 30, 2021 11:23
@ldruschk
Copy link
Member Author

note that the tests do not actually check for the concurrent access issue at the moment, since testing for race conditions is hard to do in a deterministic fashion. Although something like https://github.com/idlesign/pytest-race might be worth a look in the future

@Trolldemorted
Copy link
Member

note that the tests do not actually check for the concurrent access issue at the moment

Should we try to lock and raise an error if it already is locked? This way we are hiding problems we ought to fix, aren't we?

@ldruschk
Copy link
Member Author

note that the tests do not actually check for the concurrent access issue at the moment

Should we try to lock and raise an error if it already is locked? This way we are hiding problems we ought to fix, aren't we?

What problem would we be hiding? Why would we not want the NoSqlDict to be thread-safe?

@domenukk domenukk merged commit 6a3016d into main May 30, 2021
@domenukk
Copy link
Member

I think it's fine with a Lock. Checkers generally shouldn't spawn their own threads, I guess that's Bennis point (?) but it's rather irrelevant here

@Trolldemorted
Copy link
Member

What problem would we be hiding? Why would we not want the NoSqlDict to be thread-safe?

Before this commit we had a checker that wrote to the database in racy ways and thus threw internal errors, now we have a checker that writes to the database in racy ways and it does not throw internal errors.

Instead of a loud error this checker now has things missing in the database, doesn't it? Or do we guarantee an eventual write?

@domenukk
Copy link
Member

Ah you mean like a concurrent read, read, write, write? Hm depends on the checker I guess, but yeah could be an issue 🤔

@ldruschk
Copy link
Member Author

I think you are mixing up two different problems here.

The original issue was caused by sharing a NoSqlDict object between two different method invocations for the same team. Provided this are two methods which write to the database, e.g. putflag and putnoise, this could cause issues when one method is already finished (i.e. it calls the persist() function), while the other one calls __setitem__ with a different key, which lead to a new key being added to self.cache while the persist()-function was iterating over self.cache.items(), which lead to the exception we saw in Matrix.

This alone could have been adressed by the change here: https://github.com/enowars/enochecker/pull/121/files#diff-45f139d1a77768033431ba68bcc89a15d478cb43d8511f45194cb48a71870267R181

As a result, NoSqlDicts are no longer shared between different instances of the Enochecker-class.

In addition, the introduction of locks makes the NoSqlDict itself thread-safe in general, meaning concurrent accesses to the internal variables, e.g. self.cache are prevented. Accessing the functions of the NoSqlDict from alternating threads now works as expected.

The second issue, which I believe @domenukk is referring to as well, is that when two different threads (or even completely separate checker instances using the same database) are accessing the database entry with the same key at the same time. In that case, thread A might read an object from the DB, thread B reads the same object, thread A writes its changes and thread B writes its changes, thus overwriting the changes from thread A.

If we want to address this, we could look at something like optimistic concurrency control. In that case, the write operation of thread B would fail if it detects that the underlying document has been modified between the read and write in thread B. In that case it would be up to the client application to retry that operation.

The question is: how would we do this in an automated fashion? If we notice a conflict in thread B, how can we resolve that? The easiest way would be to simlpy repeat the checker execution from the point where we read from the database, thus applying all of our changes to the modified object. Does that make sense in the context of enochecker? Of course not.

However, in our case, this will never occur in real circumstances. Provided the checkers use the taskChainId as the key, there will be no conflicts between different invocations for different chains. According to our specification, a second method invocation for the same taskChainId will only occur once the previous invocation has finished.

Of course we could implement OCC for the NoSqlDict and throw exceptions that will not be handled by the checkers, but I am not seeing how this would address any problem that we are likely to encounter in real life. And regardless of that, this is a problem which we will be seeing with all checker libraries once we actually start running the checkers in a distributed fashion. Or at least we would be seeing them, if we wouldn't exclude the case where that could occur in the specification. (Of course people might sitll use different keys than the taskChainId, but that will probably cause bigger issue than those related to concurrent access to the DB).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants