-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Question about set_key performance #199
Comments
Wow. I think you've uncovered a pretty significant problem. This would be a problem for any iterator where values=False. From a user-facing API perspective, I don't think it makes sense to make the user decide whether to preload or not. The right place to preload is actually when they retrieve the data. We should mark in the cursor the first time we preload (and clear when the position is moved) so we don't do it twice for the same key. Good catch. I'm going to work on this later Monday. |
Great, that sounds better than what I had in mind! On a related note, the user-facing API has a |
I like both your ideas. The only issue I see is working out the interaction between get and dupsort DBs. |
I have a branch up. #202. It doesn't work on Windows right now. It would be fantastic if you could see if this helps your performance issue. I'm scratching my head thinking of a way to actually evaluate whether this worked. |
You can test whether a page was pulled into cache using mincore(): mincore = ctypes.CDLL(None).mincore
mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_void_p]
bitset = bytearray(1) # 8 bits, room to track status of 8 pages
somevalue = "foo"
assert 0 == mincore(
ctypes.cast(somevalue, ctypes.c_void_p).value & ~0xfff, # round down to pagesize
4096,
(ctypes.c_char*1).from_buffer(bitset),
)
print("page is in memory?", bitset[0] & 0x01) It should be possible to extract the underlying pointer for buffer() objects returned when edit: seems ctypes won't do it directly, but it's still possible |
Thanks. I can confirm that it's a lot faster with this change. My very rudimentary test was to create a new db and populate it with a number of large-ish values, each of which is just a chunk of random noise (I went with 100 x 32MB), then open the env again in a new process and do a |
OK. I'm all set. David, would you mind reviewing? |
Fix is merged to master and released in 0.96. |
This is a question, not a problem per se. I'm not sure of the correct etiquette, so please accept my apologies if I've got it wrong.
My use case involves repeatedly calling
set_key
on a Cursor object, for a large number of keys, purely in order to determine whether or not each such key exists within the database. The values associated with the keys may be large, and I don't want to read them.The invocation of
preload
inside the_cursor_get_c
function (added by @dw in this commit) results in every page of the value being touched, potentially generating a number of page faults. This is clearly undesirable if the caller has no intention of actually reading the values.My question is whether it would be possible to allow the caller to "opt out" of this behaviour on a per-call (or per-Cursor) basis? I'd be happy to attempt the change myself and submit a PR, but obviously there's not a lot of point if you think it's a terrible idea (or have an alternative suggestion).
Many thanks.
The text was updated successfully, but these errors were encountered: