MDB_CORRUPTED when multiprocessing #269
One clarification: when I said number of pages used, I meant the
The only time I've seen this sort of thing happen is when file locking was accidentally bypassed, for example by sharing a file between Docker containers, or by opening the database with lock=False. The general concept of what you're doing (multiple processes) is definitely something I've successfully done. I'd be happy to track this to ground, but I need something reproducible. I know that is easier said than done. I've had success with using logging statements on every database action, and then writing a program that can replay those actions. Good luck.
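As a rough illustration of that logging idea (the log path, format and helper names here are made up, and only put is shown):

```python
import json
import lmdb

LOG_PATH = '/tmp/lmdb-actions.log'   # hypothetical location

def logged_put(env, key, value):
    # Record the action before performing it, so a separate script can
    # replay the exact sequence of writes later.
    with open(LOG_PATH, 'a') as log:
        log.write(json.dumps({'op': 'put', 'key': key.hex(),
                              'value': value.hex()}) + '\n')
    with env.begin(write=True) as txn:
        txn.put(key, value)

def replay(env, log_path):
    # Re-execute the logged actions against a fresh environment.
    with open(log_path) as fh:
        for line in fh:
            rec = json.loads(line)
            if rec['op'] == 'put':
                with env.begin(write=True) as txn:
                    txn.put(bytes.fromhex(rec['key']),
                            bytes.fromhex(rec['value']))
```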
Hi Nic, thanks for your answer! I am not doing anything bizarre with the DB or the environment. I am using

The bus error crashes are easily reproduced here. At least one condition that seems to trigger them is when one process opens the database and uses the

The first write operation immediately hits a

For completeness, I have tried not using buffers, and these bus error crashes are easily reproducible. On the other hand, the full DB corruption is quite difficult to reproduce, as it seems to depend on the timings and writing patterns of each process. I will try to make a reproducible test, but I am not sure how feasible it is.
This whole area seems vaguely familiar. I did run across an issue on Windows once upon a time where a failure in

It would be useful to get a stack trace of the crash. You can get that by running your process under gdb or enabling core dumps (e.g.
This is the top of the stack trace when the Bus error happens; sadly, it does not show much about the critical code path:
That's useful. The next step is installing py-lmdb from source and doing it again.
That should get me line numbers at least.
Thanks for the info, I am quite rusty with C and extensions these days! :) After a few attempts, I managed to get a better stacktrace. I also compiled with -O0 so nothing is optimised out. This is the Bus error crash:
One thing I noticed while testing this is that there is definitely a bug in the persistence of map_size changes. Here is a minimal reproducible PoC, using the freshly compiled library:
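A minimal sketch of that kind of check, assuming the standard py-lmdb API (the path and sizes here are illustrative, not the original PoC):

```python
import lmdb

PATH = '/tmp/mapsize-persistence'

env = lmdb.open(PATH, map_size=10 * 1024 * 1024)
env.set_mapsize(50 * 1024 * 1024)            # grow the map
with env.begin(write=True) as txn:           # commit something so a new
    txn.put(b'k', b'v')                      # meta page gets written
print('before close:', env.info()['map_size'])
env.close()

env = lmdb.open(PATH)                        # reopen with the default size
print('after reopen:', env.info()['map_size'])
env.close()
```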
I'm not 100% convinced either way that the mapsize not sticking is a problem. Ultimately, the size of the file is a function of the OS mmap function. If there's no write to the latter part it might be OK? At the same time, I do agree it looks fishy. This whole area is problematic. See #96. Also see: https://github.com/jnwatson/py-lmdb/blob/master/lib/mdb.c#L4056.

Do you happen to be calling set_mapsize while there's an open transaction in the same process? A real bug would be if there's a problem when calling set_mapsize while another process has a transaction. It certainly isn't reasonable to require users to implement external inter-process locking just to increase the map size.
Back to your call trace, the error looks pretty basic. It is looking for the root of the btree and crashing, which probably means there's a disconnect between what's being mapped in and where the root is. I'm definitely interested in getting to the root of this problem. I'm going to work tonight on seeing if I can exploit what you found about the map size decreasing to cause a crash between two processes. If I launch a second process after the map size increase, but before the decrease, open a write transaction in the second process, then let the first process environment close, maybe that will crash?
The size of the file is not an issue, but the fact that the mapsize reported is just the file size means that I end up with different processes having different ideas of the mapsize if one process starts after another has done a resize. One scenario that I am not sure is handled correctly: if this newly started process tries to write and its mapsize is exactly the file size, it will trigger a MDB_MAP_FULL immediately and resize; but at the same time the first process was operating with a different map_size, which might be smaller or bigger than the new mapsize of the second process. What happens there?
No, I implemented a locking mechanism similar to the one suggested in #96. I keep a count of active transactions; when I detect a MDB_MAP_FULL, I use a barrier to wait for all transactions to finish, then close the database, open it again and resize, and only then can new transactions start.
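For concreteness, a single-process sketch of that pattern, assuming the standard py-lmdb API; the class and method names are invented for illustration, and the cross-process coordination discussed in #96 is not shown:

```python
import threading
import lmdb

class ResizingEnv:
    """Illustrative wrapper: count transactions, drain them on MDB_MAP_FULL,
    then reopen the environment with a larger map_size."""

    def __init__(self, path, map_size=10 * 1024 * 1024):
        self._path = path
        self._map_size = map_size
        self._cond = threading.Condition()
        self._active = 0          # transactions currently in flight
        self._resizing = False    # True while a resize is in progress
        self._env = lmdb.open(path, map_size=map_size)

    def _enter(self):
        with self._cond:
            while self._resizing:          # block new transactions
                self._cond.wait()
            self._active += 1

    def _exit(self):
        with self._cond:
            self._active -= 1
            self._cond.notify_all()

    def _grow(self):
        with self._cond:
            if self._resizing:             # someone else is resizing
                while self._resizing:
                    self._cond.wait()
                return
            self._resizing = True
            while self._active > 0:        # drain in-flight transactions
                self._cond.wait()
            self._env.close()
            self._map_size *= 2
            self._env = lmdb.open(self._path, map_size=self._map_size)
            self._resizing = False
            self._cond.notify_all()

    def put(self, key, value):
        while True:
            self._enter()
            try:
                with self._env.begin(write=True) as txn:
                    txn.put(key, value)
                return
            except lmdb.MapFullError:
                pass                       # grow below, then retry
            finally:
                self._exit()
            self._grow()
```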
I was thinking about that while trying to find a workaround for this. It would be very difficult to do properly. The bus error crashes do seem to be caused by something like that, though I am not sure yet whether it is because the other process has a transaction, or whether it also happens while idling.
I really appreciate your help! This is driving me nuts, as I have a whole thing written on top of LMDB, and now I can't find a way to ensure it will not lose all the data. Please, let me know if I can help in any way.
One thing to note is that in these latter tests I did not see the mapsize decreasing; it looked more like it not being persisted, with the file size being the only indication of it. OTOH, the logs from the big crash with data corruption DO indicate that somehow the mapsize was decreased below the last page used.
I have spent some more time on this today looking at the C code. Two things struck me:
Well, number 3 can only happen if you have write_map set. I was unable to repro anything. Looking at some of my code that does the same thing, there is a value that you can pass as a parameter to set_mapsize.

I'm afraid all that is left is printf debugging. You can comment a line out here and you'll get a bunch of stuff printed out that might help figure out the order of operations.
Yes, I have it set. Is that a bad idea? From the docs, and considering how I am using it, it seemed safe. More importantly, unless I am missing something, there is a race condition there where you could be truncating a file that has just been extended by another process. It is a bit difficult for me to follow, but from what I see

In any case, it does not change the behaviour I see with the short test I wrote two days ago, but it does seem to prevent the bus errors.
Not even the bus error? Sorry, I thought you had already. I will work on getting a repro test. You were able to reproduce the mapsize shrinking after reopen, right?
Oh, I was not expecting that, and I see that the mapsize gets corrected after doing it... Now, isn't this a bug?
Thanks for the pointer, I see now why defining constants in

BTW, I forgot to show what I am doing in case it helps. This is the wrapper class I wrote: https://gist.github.com/NightTsarina/bd42736531a52c2843ee3e6d0d8b0ae2
Writemap should be safe, but turning it off does help characterize the problem. I don't think the use of

See http://www.lmdb.tech/doc/group__mdb.html#gaa2506ec8dab3d969b0e609cd82e619e5 for details on the upstream set_mapsize API: "This function may be called with a size of zero to adopt the new size." I need to document this in the Python API. The other details of that call might be important too. I'll take a look at your code later.
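A minimal sketch of that zero-size call, assuming py-lmdb forwards the argument to mdb_env_set_mapsize unchanged (the path and size are illustrative):

```python
import lmdb

env = lmdb.open('/tmp/example-db', map_size=10 * 1024 * 1024)
# Another process may have grown the map since this handle was created;
# per the upstream docs quoted above, a size of zero re-adopts the size
# currently recorded in the database.
env.set_mapsize(0)
print(env.info()['map_size'])
```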
One more question: you're not sharing the lmdb file across different Docker containers, or between the inside and the outside of a container, right? I don't believe file locking works correctly across a container boundary.
OK, I will try to get you a way to reproduce the bus error. About the API: I had read it, but did not expect having to call it with zero just after opening the DB! In any case, I still need to wait for all transactions to finish if I want to use it after a MDB_MAP_RESIZED, so it does not change much of what I need to do. And no, no containers are in use at all. I think they do not affect file locking unless you are using some bizarre FUSE tricks, but that is not the case here.
IIRC, the problem with file locking was on the Mac, between the host and the container (running different OSes). I look forward to the repro.
Sorry for the delay, I finally have a script to reproduce this problem very consistently. It took me way longer to manage errors, dead processes and whatnot than actually reproducing the problem :) This is the script: https://gist.github.com/NightTsarina/e6bb8734bc15f333ccdcd6ae7d31893f

I have run this multiple times on my laptop running Debian unstable with linux 5.9.0-1-amd64; I have tried compiling the lib directly from your repo, and I made the script so it can take two parameters: whether to use

On the other hand, if I set
When running under
This gives me a lot to work with. On the bright side, running it with 0 1 seems to be reliable. I'll investigate everything else.
With lots of debug prints, I can confirm the ftruncate issue you identified. One process is nuking the last part of the database file out from under the other. I'm still on it.
Thank you for your patience. It is clear to me now that you've known about the problem for a while.
No, thank you for taking the time to investigate this 🥰
New minimal repro:
OK. I have at least one problem characterized. Ultimately, the meta page (with the map size) is being saved and read correctly. The problem is that the in-memory value ignores it if the new value is not before the end of the current data (see https://github.com/jnwatson/py-lmdb/blob/master/lib/mdb.c#L4416). It then ftruncates the file.

The problem is that the first process then uses its cached in-memory (in env) version, so it will happily try to write past the end of the file. This can happen any time you open an environment or call set_mapsize.

This doesn't explain why it still fails when

As long as the smaller one doesn't end up writing the meta page with the smaller value, we don't have a loss of data. I will continue to explore this direction. The question is whether it is possible for the smaller-value process to successfully complete a transaction (without getting a

This is clearly an upstream problem. I'll need to put together a repro in C.
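A hedged sketch of the two-process scenario described above (the path, sizes and timings are illustrative; it only prints what each process believes the map size to be, and does not attempt to trigger the corruption itself):

```python
import multiprocessing
import time
import lmdb

PATH = '/tmp/mapsize-mismatch'

def grower():
    # Grow well past the old 10 MiB py-lmdb default, then keep the
    # environment open so its cached map size stays large.
    env = lmdb.open(PATH, map_size=64 * 1024 * 1024)
    with env.begin(write=True) as txn:
        for i in range(5000):
            txn.put(b'%08d' % i, b'x' * 4096)
    time.sleep(3)
    print('grower map_size:', env.info()['map_size'])
    env.close()

def late_opener():
    time.sleep(1)
    env = lmdb.open(PATH)   # default map_size, smaller than the data on disk
    print('late opener map_size:', env.info()['map_size'])
    env.close()

if __name__ == '__main__':
    procs = [multiprocessing.Process(target=f) for f in (grower, late_opener)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```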
Hi,
Ah, I see, that makes sense. It should probably be checking whether the on-disk mapsize is bigger than the configured one.
Wouldn't it also be a problem when the second process commits a transaction and persists the smaller map_size?
The issue with
Yeah, it seems so. Please let me know if I can help somehow!
Yes. I need to confirm whether this is possible.
The meta page is always read, but lmdb then chooses to ignore it, preferring to minimize the file size to what's actually used. I need to confirm that the failure is an mmap, then an ftruncate, then a write to the old end of the file (that behavior is actually undefined in POSIX). There is a default value (https://github.com/jnwatson/py-lmdb/blob/master/lib/mdb.c#L615), but it is only applied when a database is first created.
I spoke too soon. You are absolutely on the right path regarding the default mapsize. This is actually a bug in something I control. py-lmdb provides a default value (the same value lmdb itself defaults to when creating a database). This might be completely fixed if I change that default value to 0.
And that fixes it completely. All four combinations of your repro pass once I changed the default in py-lmdb. What we have here is two bugs, one in py-lmdb and one in lmdb. The first bug is that py-lmdb passes a non-zero default value of map_size.

That bug triggers a second bug in the underlying lmdb, where opening a database with a map_size smaller than the data already on disk can end up truncating the file out from under another process.
Upstream bug report: https://bugs.openldap.org/show_bug.cgi?id=9397
Hi Nic,

OTOH, I had completely missed the message where you say removing the default value avoids this bug completely! If I understand your code correctly, it would be enough to pass map_size=0.
Making map_size=0 significantly reduces the window where a problem could happen, but it doesn't eliminate it. The patch I'll submit upstream will narrow that further, but it can't eliminate the problem without explicit locking. My immediate plan is to implement the external locking in py-lmdb, since it doesn't seem like upstream is amenable.
The same error occurs when py-lmdb 1.2.1 is built on the linux_ppc64le platform (py3.6-py3.9):
This is not actually the same issue. Please open a new issue.
Hi, apologies in advance for making use of this thread! I think it's the same issue but with a different configuration, so I am not sure. Here is the script which can reproduce the Bus errors quite easily (

Summary:
Crash (ubuntu 18.04/python 3.7.5/lmdb 1.2.1)
With

I am looking forward to hearing your thoughts on this. Thank you in advance!
See my comment here: #301 (comment)

Essentially, multiprocessing is being too smart and recycling processes that still have open environment handles. You're relying on the garbage collector to close the DB. Try an explicit close().
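A sketch of that suggestion under a pool-based setup (the path, payload and pool size are placeholders): open the environment inside the worker and close it explicitly instead of leaving it to the garbage collector.

```python
import multiprocessing
import os
import lmdb

PATH = '/tmp/pool-db'

def worker(i):
    # Open inside the child and close explicitly, so a recycled pool
    # process never holds on to a stale environment handle.
    env = lmdb.open(PATH, map_size=1 << 30)
    try:
        with env.begin(write=True) as txn:
            txn.put(b'key-%d-%d' % (os.getpid(), i), b'value')
    finally:
        env.close()

if __name__ == '__main__':
    with multiprocessing.Pool(4) as pool:
        pool.map(worker, range(16))
```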
Thank you Nic for your quick reply, and apologies for the delay (different timezones). I have looked into your suggestion and I have tried the following things:
I can confirm the issue still persists. It happens when a new process opens the environment while the writer is doing a resize. For now, we have decided to go with a big mapsize (1 TB) to avoid frequent map resizing. We also have a few questions about the configuration below; sorry for these (if there is a better channel for communicating with you, please let me know and I will use that).

Plan: we want to use async writing to make requests faster, avoid writing for every request, and reduce IOPS.
I read the documentation and I have the following questions:
With the above configuration,
Thank you in advance!
sync=False is not compatible with multiple processes accessing the data simultaneously. |
Okay, thank you!
Is this the same as, or related to, the bug filed here? https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244493 It seems like there was a recent fix for it.
Affected Operating Systems: Linux Debian testing (bullseye)
Affected py-lmdb Version: 1.0.0
py-lmdb Installation Method: apt
Using bundled or distribution-provided LMDB library? Distribution
Distribution name and LMDB library version: (0, 9, 24)
Machine "free -m" output:
Other important machine info: All pretty standard; not using containers, only the cgroups that systemd/apache2 might configure. Filesystem is ext4.
Describe Your Problem
Hi,
I know this might be an upstream issue, but it is difficult to tell. It is a difficult-to-reproduce scenario, but the outcome is very serious: once this happened, my application stopped working, and I found no way of fixing the database except by re-loading all the data from scratch (mdb_dump/mdb_load did not seem to handle it properly and gave no debugging details).

So my application is built around py-lmdb. I have my own Environment and Transaction wrapping classes that allow me to ensure only one Environment instance is created, and to keep track of all the opened transactions. Those classes are also able to detect MDB_MAP_FULL and MDB_MAP_RESIZED, and when either of those happens, they wait for all transactions to finish, close the environment, and reopen it (with an increased map size in the first case, and with the detected map size in the second case).
This was working like a charm with multiple threads, until I started also using multiple processes. A few times, one process would fill the DB, close it and resize it, which would make the other process detect the map being increased under its feet and reopen the DB. But then a sequence of events happened that ended with the database corrupted.
This is an extract of the logs of both processes (process 1 is a WSGI application, process 2 is a CLI application), in which I see that the map_size has not been properly persisted (although I always commit a write transaction after opening the DB), and that the map_size was decreased below the last page used.

Process 1
Process 2
So, any clues? Is there some extra precaution I should be taking when doing these complicated dances? Or is this just some obscure bug I stepped on? Any help would be appreciated.