
json.JSONDecoder causes Environment.begin to throw ReadersFullError #345

Open
automorphis opened this issue Sep 14, 2023 · 2 comments

automorphis commented Sep 14, 2023

Affected Operating Systems

  • Ubuntu 22.04, run through WSL on Windows 10
  • Windows 10

Affected py-lmdb Version

1.3.0

py-lmdb Installation Method

sudo pip install lmdb

Using bundled or distribution-provided LMDB library?

Bundled

Distribution name and LMDB library version

0.9.29

Machine "free -m" output

               total        used        free      shared  buff/cache   available
Mem:           12456          88       12287           0          81       12180
Swap:           4096           0        4096

Describe Your Problem

It took me quite some time to localize this problem and write a minimal reproducible example. The json.JSONDecoder class in the Python standard library interacts badly with LMDB, and I cannot understand why.

The json.JSONDecoder class is a little idiosyncratic, in that you change the default decoding behavior by passing your own function to JSONDecoder.__init__ via the object_hook parameter. Frequently you need object_hook to call the instance method JSONDecoder.decode, so it makes sense for object_hook to be an instance method itself.

The trouble is, if object_hook points to an instance method, then calling Environment.begin enough times inexplicably, eventually, raises ReadersFullError. If object_hook points to a plain function (not an instance method), no such error is raised.

Example:

import json, lmdb, pathlib

class BadDecoder(json.JSONDecoder):

    def __init__(self, txn):
        self.txn = txn
        # object_hook is a bound method, so the decoder references itself
        # (and, through self.txn, the transaction).
        super().__init__(object_hook=self.obj_hook1)

    def obj_hook1(self, obj):  # object_hook receives each decoded dict
        return obj


class GoodDecoder(json.JSONDecoder):

    def __init__(self, txn):
        self.txn = txn
        super().__init__(object_hook=obj_hook2)


def obj_hook2(obj):
    return obj


if __name__ == "__main__":

    num_queries = 100000 # big enough, usually crashes before this number
    db_path = pathlib.Path.home() / "pylmdb_json_mre"
    db_path.mkdir(exist_ok=True)
    db = lmdb.open(str(db_path))
    i = -1

    try:
        for i in range(num_queries):
            with db.begin() as txn:
                decoder = GoodDecoder(txn)
    except lmdb.ReadersFullError:
        print(i)
        raise

    i = -1

    try:
        for i in range(num_queries):
            with db.begin() as txn:
                decoder = BadDecoder(txn)
    except lmdb.ReadersFullError:
        print(i)
        raise

Errors/exceptions Encountered

1597
Traceback (most recent call last):
  File "/mnt/c/Users/mlane/OneDrive/PycharmProjects/cornifer/scripts/bug_report.py", line 44, in <module>
    with db.begin() as txn:
lmdb.ReadersFullError: mdb_txn_begin: MDB_READERS_FULL: Environment maxreaders limit reached
jnwatson (Owner) commented
This is an object lifetime problem that is exposing a subtle py-lmdb bug. Keeping a reference to txn in BadDecoder causes delayed finalization of txn.
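The delayed finalization is consistent with a reference cycle: storing a bound method on the instance (as BadDecoder does with object_hook) makes the instance reference itself, so it is reclaimed only when the cyclic garbage collector runs, not promptly by reference counting, and anything it holds (such as txn) lingers with it. A minimal stdlib-only sketch (hypothetical names, no LMDB needed):

```python
import gc
import weakref

class CycleHolder:
    def __init__(self):
        # Bound method -> self -> attribute -> bound method: a cycle.
        self.hook = self.on_object

    def on_object(self, obj):
        return obj

holder = CycleHolder()
ref = weakref.ref(holder)
del holder

alive_before_gc = ref() is not None   # the cycle keeps the object alive
gc.collect()
alive_after_gc = ref() is not None    # the collector has now freed it
print(alive_before_gc, alive_after_gc)
```

GoodDecoder avoids the cycle because its hook is a module-level function that does not reference the decoder instance.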

As a temporary workaround, I found that adding decoder.txn = None after decoder = BadDecoder(txn) fixes it.

The performant solution is to create the txn context outside the loop.

I will investigate further.

automorphis (Author) commented

I've encountered ReadersFullError in a different context from the one I posted above. The error occurs if I read from and write to a single Environment using more than one process. I didn't even try to isolate the error, but I did manage a workaround, which is basically "turn it off and on again".

If Environment.begin() throws ReadersFullError, then I do a "soft reset" in the process where the error occurred: I call Environment.close() followed immediately by a call to lmdb.open(). In my test cases, doing soft resets worked if I ran on only two processes. (A soft reset wasn't needed at all for a single process.)

A soft reset can fail in one of two ways: either lmdb.open() throws ReadersFullError, or the first subsequent call to Environment.begin() does. If a soft reset fails in any process, I do a "hard reset":

  1. Every (alive) process calls Environment.close(),
  2. Every process waits at a multiprocessing.Event,
  3. Once every alive process is waiting, a single process deletes the lockfile (lock.mdb),
  4. The same process calls lmdb.open (which recreates the lockfile),
  5. The remaining processes call lmdb.open.
