
Improve MonoDict and TripleDict data structures #13387

Closed
nbruin opened this issue Aug 22, 2012 · 70 comments
@nbruin
Contributor

nbruin commented Aug 22, 2012

In #715 and #12313, some variants of WeakKeyRef data structures are introduced, used in caching for, among other things, uniqueness of parents and coercions. In #12313 comment 162 there is some data on how these data structures can be improved.

Apply:

Depends on #11521
Depends on #12313

CC: @simon-king-jena @jpflori

Component: memleak

Author: Nils Bruin

Reviewer: Simon King

Merged: sage-5.8.beta2

Issue created by migration from https://trac.sagemath.org/ticket/13387

@nbruin nbruin added this to the sage-5.6 milestone Aug 22, 2012
@nbruin
Contributor Author

nbruin commented Aug 22, 2012

First shot at improvements of MonoDict and TripleDict data structures

@nbruin
Contributor Author

nbruin commented Aug 22, 2012

comment:1

Attachment: trac_13387.patch.gz

Initial tests show only a doctest failure, due to a doctest change introduced by #12313 in sage/categories/modules_with_basis.py. We'll see what the bots think.

@nbruin
Contributor Author

nbruin commented Aug 23, 2012

comment:3

Excellent work! Looks promising. I tested IdKeyDict against the revamped classes here with the same tests you posted there. The MonoDict wins of course:

sage: %timeit K in W
625 loops, best of 3: 38.1 µs per loop
sage: %timeit K in M
625 loops, best of 3: 112 ns per loop
sage: %timeit K in I
625 loops, best of 3: 160 ns per loop
sage: %timeit K2 in W
625 loops, best of 3: 680 ns per loop
sage: %timeit K2 in M
625 loops, best of 3: 64.1 ns per loop
sage: %timeit K2 in I
625 loops, best of 3: 104 ns per loop
sage: %timeit W[K]
625 loops, best of 3: 36.7 µs per loop
sage: %timeit M[K]
625 loops, best of 3: 99.2 ns per loop
sage: %timeit I[K]
625 loops, best of 3: 146 ns per loop

but for TripleDict it already seems a reasonable choice:

sage: %timeit (K,K,True) in I
625 loops, best of 3: 392 ns per loop
sage: %timeit (K,K,True) in T
625 loops, best of 3: 451 ns per loop
sage: %timeit (K2,K,True) in I
625 loops, best of 3: 133 ns per loop
sage: %timeit (K2,K,True) in T
625 loops, best of 3: 541 ns per loop
sage: 
sage: %timeit I[K,K,True]
625 loops, best of 3: 366 ns per loop
sage: %timeit T[K,K,True]
625 loops, best of 3: 432 ns per loop

I don't think I'll ever understand modern CPUs ... (that, or TripleDict can be further tuned; it was a first try). I think I did use some ideas that might be useful for you, though.

Anyway, at least the iteration needs some work (nice that Cython generators now work!):

sage: from sage.structure.idkey_dict import IdKeyDict
sage: I = IdKeyDict(3,53, threshold=0.7)
sage: L = []
sage: for p in prime_range(10000):
....:         L.append(GF(p)['x','y'])
....: 
sage: for i,K in enumerate(L):
....:         I[K,K,True] = i
....: 
sage: from collections import Counter
sage: Counter([k in I for k,v in I.iteritems()])
Counter({False: 1229})
sage: Counter([(k,k,True) in I for k in L])
Counter({True: 1229})
sage: Counter([(k[3](),k[4](),k[5]) in I for k,v in I.iteritems()])
Counter({True: 1229})

but as you see, that's easily fixed. You just have to extract the actual keys. Doing it this way might report errors a bit better than with the unsafe casting of k[:3] (dead weakrefs etc.)

sage: Counter([(id(k[3]()),id(k[4]()),id(k[5])) == k[:3] for k,v in I.iteritems()])
Counter({True: 1229})

Pitfall, by the way:

sage: import weakref
sage: W=weakref.ref(ZZ)
sage: weakref.ref(W)
TypeError: cannot create weak reference to 'weakref' object

So:

sage: from sage.structure.idkey_dict import IdKeyDict
sage: import weakref
sage: class D(object): pass
....: 
sage: d = D()
sage: W = weakref.KeyedRef(d,None,None)
sage: I = IdKeyDict(1,53, threshold=0.7)
sage: I[W]= 10
sage: I[W]
10
sage: del d
sage: len(I)
1
sage: I[W]
KeyError: <weakref at 0x5f6c4d0; dead>
sage: len(I)
0
sage: W
<weakref at 0x5f6c4d0; dead>

Note that my key, W, still exists! Also note that an IdKeyDict would accept mutable (unhashable) objects. That might be a selling point beyond its "weak" and speed properties.

So we should document that one should not use KeyedRefs as key elements, and probably forbid them. For key construction, after weakreffing has failed, a quick isinstance(k, KeyedRef) check shouldn't be so bad (that's not the code path we really care about anyway).
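A guard along those lines could look like the following minimal sketch. The helper name make_key_component and the exact fallback behaviour are hypothetical, not Sage's actual key-construction code:

```python
import weakref

def make_key_component(k):
    """Hypothetical key-construction helper: hold the key weakly if
    possible, fall back to a strong reference otherwise, but refuse
    weakrefs/KeyedRefs outright (they cannot themselves be weakly
    referenced and would later look like dead keys)."""
    try:
        return weakref.ref(k)
    except TypeError:
        # KeyedRef is a weakref.ref subclass, so it also lands here;
        # forbid it explicitly instead of silently storing it strongly.
        if isinstance(k, weakref.ref):
            raise TypeError("weakrefs may not be used as key elements")
        return k  # strong fallback for non-weakrefable objects (ints, ...)
```

With this, ordinary objects are stored weakly, non-weakrefable values like ints are kept strongly, and a KeyedRef key raises immediately instead of producing the KeyError-on-live-key surprise shown above.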

@nbruin
Contributor Author

nbruin commented Jan 3, 2013

comment:4

I noticed this python-ideas thread. So there seems to be some interest in dictionaries keyed by id elsewhere too (although they're not mentioning a weak key dict).

@simon-king-jena
Member

comment:5

I see a logistical problem: the aim of #13904 is to fix some problems introduced with #715. It should be a quick solution, since otherwise #715 will be unmerged. However, the ticket here also depends on #12313, which has a positive review but isn't merged yet. So, that's too slow.

Therefore, I suggest that #13904 shall use the ideas from here, but only applied to TripleDict, so that #715 can remain in Sage.

I think it is worthwhile to provide the IdKeyDict code in its full generality (that's an attachment I made in #12313), but I am not sure; perhaps that's for a different ticket. Then #12313 should be tackled, and probably again done as in IdKeyDict (but perhaps with specialised code).

In other words, I suggest splitting this ticket (one part done in #13904 ASAP, the other later in #12313), so that this ticket would then be marked as a duplicate.

Do you agree?

@simon-king-jena simon-king-jena removed this from the sage-5.6 milestone Jan 3, 2013
@nbruin
Contributor Author

nbruin commented Jan 3, 2013

comment:6

Replying to @simon-king-jena:

I see a logistical problem: the aim of #13904 is to fix some problems introduced with #715. It should be a quick solution, since otherwise #715 will be unmerged. However, the ticket here also depends on #12313, which has a positive review but isn't merged yet. So, that's too slow.

Therefore, I suggest that #13904 shall use the ideas from here, but only applied to TripleDict, so that #715 can remain in Sage.

I think it is worthwhile to provide the IdKeyDict code in its full generality (that's an attachment I made in #12313), but I am not sure; perhaps that's for a different ticket. Then #12313 should be tackled, and probably again done as in IdKeyDict (but perhaps with specialised code).

In other words, I suggest splitting this ticket (one part done in #13904 ASAP, the other later in #12313), so that this ticket would then be marked as a duplicate.

Do you agree?

There is nothing wrong with TripleDict.

EDIT: This used to say "except for an eraser problem we don't know how to fix. We should solve that for now by forbidding the problematic keys in the documentation. In other words: 'Don't use TripleDict unless you know what you're doing'. Then TripleDict doesn't have any known bugs and we can take our time to think about how to make TripleDict more generally applicable." There is no such problem as far as we know now, though (required reading: Modules/gc_weakref.txt).

Memory management, deallocation, and weakref callbacks are genuinely hard ...

@jpflori

jpflori commented Jan 4, 2013

comment:7

I'd prefer to get #13904 in quickly, then #12313 as it is (or with minimal changes to avoid bugs), and then introduce this one, which deals with improvements rather than fixes.

@nbruin
Contributor Author

nbruin commented Jan 28, 2013

Attachment: trac_13387-rebased.patch.gz

Rebased to apply cleanly over #12313 (28 Jan 2013)

@nbruin
Contributor Author

nbruin commented Jan 28, 2013

comment:8

Concerning the timing reported on #12313:300, with the patch here we get

sage: M=sage.structure.coerce_dict.MonoDict(23)
sage: M[ZZ]=1
sage: %timeit _=M[ZZ]
625 loops, best of 3: 240 ns per loop

which is an improvement of 91 ns over the 331 ns reported there. Since a normal dictionary takes 110 ns for this operation, and this patch avoids the internal use of another dictionary lookup in _refcache, it seems the gain is indeed basically purely the removal of _refcache.

We now have

sage: x=-20
sage: def test():
....:            for n in xrange(10**7):
....:              _=QQ(x)
....: 
sage: time test()
Time: CPU 7.92 s, Wall: 7.95 s

That is a bit of a gain over the Time: CPU 8.53 s, Wall: 8.57 s we have without #13387, but still quite a bit worse than the Time: CPU 2.97 s, Wall: 2.98 s we had prior to #12313. The difference is that the dictionary that stores the conversion and coercion maps turned into a weak-key dict (MonoDict), which is necessarily slower in key lookup than a normal dict, because there's an extra indirection level in the keys.
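For readers following along, the semantics under discussion can be modelled in a few lines of plain Python. This is a toy sketch only: the class name MiniMonoDict is made up, and the real MonoDict is an open-addressed Cython hash table with buckets, not a wrapper around dict; but it shows the identity-keyed lookup and the extra indirection through a weakref that a plain dict lookup doesn't pay:

```python
import weakref

class MiniMonoDict:
    """Toy model: keys compared by identity, held weakly where possible,
    entries removed by a weakref callback when the key dies."""

    def __init__(self):
        self._data = {}  # id(key) -> (ref-or-strong-holder, value)

    def __setitem__(self, key, value):
        kid = id(key)
        try:
            # callback erases the entry as soon as the key is collected
            ref = weakref.ref(key, lambda r, kid=kid: self._data.pop(kid, None))
        except TypeError:
            ref = lambda k=key: k  # strong fallback for non-weakrefable keys
        self._data[kid] = (ref, value)

    def __getitem__(self, key):
        ref, value = self._data[id(key)]
        if ref() is not key:  # id reuse by a new object: not our key
            raise KeyError(key)
        return value

    def __contains__(self, key):
        try:
            self[key]
        except KeyError:
            return False
        return True

    def __len__(self):
        return len(self._data)
```

The lookup has to dereference the stored weakref and compare identities, which is exactly the indirection level mentioned above.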

Concerning Jeroen's original report, we now have (#12313 + #13387)

sage: def test(RR):
....:         for d in range(-20,0):
....:             if abs(RR(quadratic_L_function__numerical(1, d, 10000) - quadratic_L_function__exact(1, d))) > 0.001:
....:                 print "Oops!  We have a problem at d = ", d, "    exact = ", RR(quadratic_L_function__exact(1, d)), "    numerical = ", RR(quadratic_L_function__numerical(1, d))
....: 
sage: time test(RealField(50))
Time: CPU 1.81 s, Wall: 1.82 s

versus reference timing

Time: CPU 1.63 s, Wall: 1.64 s

To compare, on this machine, the same test with just #12313 takes

Time: CPU 2.07 s, Wall: 2.08 s

so the improvements here do significantly reduce the regression originally reported by Jeroen.

@nbruin
Contributor Author

nbruin commented Jan 28, 2013

comment:9

Incidentally, doing a del on a bucket slice is indeed safe: Python defers DECREF activity until the slice has been removed from the list. Also, since the slices we delete are no longer than 8 entries, the temporary references are kept on the stack (faster!). See Objects/listobject.c:614. That means the deletion routines as proposed should be safe, also in view of callbacks.
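That claim is easy to check from Python itself. A toy experiment, assuming CPython's reference-counting semantics: an object torn down by del L[i:j] already finds itself detached from the list when its finalizer runs.

```python
class Probe:
    """Records, at deallocation time, whether it is still inside its list."""

    def __init__(self, container, log):
        self.container = container
        self.log = log

    def __del__(self):
        # If del-on-a-slice did NOT defer the DECREFs, we would still be
        # reachable from the list at this point.
        self.log.append(self in self.container)

log = []
L = []
L.append(Probe(L, log))
del L[0:1]  # detaches the slice first; DECREFs (and __del__) run afterwards
```

After the del, log contains a single False: the probe was already gone from the list before its __del__ ran, which is what makes slice deletion safe against re-entrant callbacks inspecting the bucket.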

The only scenario that would lead to undue deletions via a callback is if a callback on a key with id I got delayed until a new object with the same id I was stored in the same dictionary. However, that would require an object to be deallocated before its callbacks are executed or scrapped, which seems like a real violation of protocol.

For the truly paranoid, we could check in the Eraser classes that the weakref corresponding to the callback is indeed NULL-ed. However, I'd be shocked if that were necessary.

@simon-king-jena
Member

comment:10

Somewhere I had posted a general IdKeyDict, with a fixed key length that can be chosen at creation time, weak keys (where possible) and comparison by identity. But I can't find it. Can you help me?

@nbruin
Contributor Author

nbruin commented Feb 4, 2013

comment:11

It's where everything is: #12313 (idkey_dict attachment)

@nbruin
Contributor Author

nbruin commented Feb 5, 2013

Attachment: trac_13387-fast-weakref-calls.patch.gz

Use PyWeakref_GetObject

@nbruin
Contributor Author

nbruin commented Feb 5, 2013

comment:12

Yippee!! Using PyWeakref_GetObject instead of calling weakrefs is indeed a little faster:

sage: M=sage.structure.coerce_dict.MonoDict(23)
sage: M[ZZ]=1
sage: %timeit _=M[ZZ]
625 loops, best of 3: 211 ns per loop
sage: x=-20
sage: def test():
....:         for n in xrange(10**7):
....:             _=QQ(x)
....:  
sage: time test()
Time: CPU 7.76 s, Wall: 7.79 s
sage: def test(RR):
....:             for d in range(-20,0):
....:                 if abs(RR(quadratic_L_function__numerical(1, d, 10000) - quadratic_L_function__exact(1, d))) > 0.001:
....:                     print "Oops!  We have a problem at d = ", d, "    exact = ", RR(quadratic_L_function__exact(1, d)), "    numerical = ", RR(quadratic_L_function__numerical(1, d))
....: 
sage: time test(RealField(50))
Time: CPU 1.76 s, Wall: 1.77 s

So we shave off a little more (something like 30 ns, i.e., some 15% of a simple MonoDict lookup).

Apply trac_13387-rebased.patch trac_13387-fast-weakref-calls.patch

TODO: Further improvement: in the coercion framework, hardwire the use of MonoDict, provide get and set cdef methods and call those to eliminate further method lookup overhead.

@nbruin
Contributor Author

nbruin commented Feb 5, 2013

comment:13

I think the patches on this ticket already have value, so I'm refiling under "sage-feature". We'd already benefit from merging these patches as-is. That said, there is probably room for some easy further improvements, so we can hold off on merging unless someone deems this ticket needs to be a dependency of another one that needs merging.

@nbruin
Contributor Author

nbruin commented Feb 9, 2013

comment:14

TripleDict has cdef get and set, which are called directly in coercion. This is faster than relying on the slotted __getitem__ and __setitem__ calls (which are still a bit better than completely generic method lookup). Implementing this for MonoDict and ensuring that it's used for _convert_from_hash and _coerce_from_hash speeds things up to the extent that we seem to lose any speed regression relative to Sage before #12313:

sage: def test(RR):
....:         for d in range(-20,0):
....:                 if abs(RR(quadratic_L_function__numerical(1, d, 10000) - quadratic_L_function__exact(1, d))) > 0.001:
....:                         print "Oops!  We have a problem at d = ", d, "    exact = ", RR(quadratic_L_function__exact(1, d)), "    numerical = ", RR(quadratic_L_function__numerical(1, d))
....:
sage: %time test(RealField(50))
CPU times: user 1.50 s, sys: 0.01 s, total: 1.51 s
Wall time: 1.51 s

and

sage: x=-20
sage: def test():
....:            for n in xrange(10**7):
....:                  _=QQ(x)
....:          
sage: %time test()
CPU times: user 3.70 s, sys: 0.01 s, total: 3.71 s
Wall time: 3.71 s

I have noticed that these timings can vary quite a bit between compiles and branch names. I guess loops this tight are sensitive to cache alignment and fortunate code placement.

Apply trac_13387-rebased.patch trac_13387-fast-weakref-calls.patch trac_13387-cdef_monodict_access.patch

@nbruin
Contributor Author

nbruin commented Feb 9, 2013

Attachment: trac_13387-cdef_monodict_access.patch.gz

cdef get and set access to MonoDict to speed up coercion lookups

@nbruin
Contributor Author

nbruin commented Feb 9, 2013

comment:15

Hm. It seems MonoDict had a _get_buckets method with a doctest referring to TripleDict. Since MonoDict and TripleDict now both have cdef get and set methods with incompatible signatures, we cannot have TripleDict inherit from MonoDict anymore, so I've just copied the definition and adapted the doctest. _get_buckets is only for debugging anyway.

@simon-king-jena
Member

comment:16

Replying to @nbruin:

Hm. Seems MonoDict had a _get_buckets method with a doctest referring to TripleDict. Since both MonoDict and TripleDict now have cdef get and set methods with incompatible signatures, we cannot have TripleDict inherit from MonoDict anymore,

I think this is why I originally dropped get/set for MonoDict.

What about having different names for the getters and setters (mget/mset versus tget/tset)? It would avoid the duplication, but would certainly not be very clean.

@simon-king-jena
Member

comment:17

By the way, what patches are to be applied? The following?

Apply trac_13387-rebased.patch trac_13387-fast-weakref-calls.patch trac_13387-cdef_monodict_access.patch

@nbruin
Contributor Author

nbruin commented Feb 9, 2013

comment:18

Replying to @simon-king-jena:

What about having different names for the getters and setters (mget/mset versus tget/tset)? It would avoid the duplication, but would certainly not be very clean.

I think it's worse than the current situation. The reality is that MonoDict and TripleDict only share some data attribute names and a debug routine; they are each implemented separately. Code sharing here invites bugs through unintended side-effects of seemingly local code changes (as demonstrated here) more than it saves work, so I think keeping the classes independent is better. If it were up to me I'd also make the Eraser classes independent, but that's just a small issue. In the end, both are just unrolled special cases of IdKeyDict (which, as we found, might be a contender for TripleDict but not for MonoDict), so we don't have them for code cleanliness but for speed.

The patch order is correct, as in the Apply statement before. Note that the patchbot results page sorts reports by the time local to the bot, so they are not necessarily in reverse chronological order. It did get a "tests pass" with the current patch sequence.

@simon-king-jena
Member

Combined patch of previous work, rel 5.8.beta0

@simon-king-jena
Member

comment:43

Attachment: trac_13387-combined-rel5.8.b0.patch.gz

If patchbot starts with sage-5.8.beta0, it should first apply one patch from #12313 (namely the one that reverts the usage of a callback for the weak reference to homsets), and then

Apply trac_13387-combined-rel5.8.b0.patch trac_13387-guard_GC.patch

@simon-king-jena
Member

comment:44

I am puzzled. The patchbot is applying the correct patches in the correct order to the correct Sage version, yet almost all patches fail to apply.

For the record, I have

trac_12313-revert_callback_from_11521.patch
trac_13387-combined-rel5.8.b0.patch
trac_13387-guard_GC.patch

on top of sage-5.8.beta0.

@simon-king-jena
Member

comment:45

To be on the safe side, I deleted the three patches and downloaded them again from trac. They applied fine. So I think the error is on the patchbot's side.

@simon-king-jena
Member

comment:46

Argh. The patch keeps failing to apply! Could it be that the "official" version of 5.8.beta0 is different from the one that I had downloaded?

@nbruin
Contributor Author

nbruin commented Feb 24, 2013

Changed work issues from rebase to none

@nbruin
Contributor Author

nbruin commented Feb 24, 2013

comment:47

I don't see why you think rebasing is necessary. The previous patchbot logs already show that the ticket here applies cleanly on 5.8.beta0 and that its tests pass.

The fact that #12313 fails to apply with its currently stated dependencies is expected: The patchbot doesn't understand "merge with". However, if patches apply here and tests pass, that's an implicit validation of #12313 (with the understanding that this ticket needs to be merged after it).

I'm also refiling this as a "defect" rather than an enhancement: Some of the fixes here fix real bugs (although we might not have seen them triggered).

Let's try again, since the "guard_GC" patch didn't get picked up by the bot before:

Apply trac_13387-combined.patch trac_13387-guard_GC.patch

@simon-king-jena
Member

Attachment: trac_13387-guard_gc.patch.gz

Guard against GC during dict resize

@simon-king-jena
Member

comment:48

IIRC, patchbot won't apply patches that have an uppercase letter in the name.

Hence, I renamed your second patch.

Apply trac_13387-combined.patch trac_13387-guard_gc.patch

@simon-king-jena
Member

comment:49

What is the startup_modules script complaining about?

@nbruin
Contributor Author

nbruin commented Feb 24, 2013

comment:50

Replying to @simon-king-jena:

What is the startup_modules script complaining about?

It's complaining that there seems to be a new module now, sage.structures.gc. I don't know why that's a bad thing. It's a direct consequence of the import gc statement that I added, which I need for gc.enable() and gc.disable(). This is in a routine that should rarely execute, and its expensive inner loop doesn't contain these statements, so I don't think the Python method calls are going to make a difference. Furthermore, I wasn't able to find a C-level interface to this functionality: I've checked the source of the gc module, and the underlying C routines are explicitly declared static, so our only access seems to be via method lookup.

If someone has a recommended way of accessing this functionality without importing gc, that's fine with me. Moving the import gc statement to runtime, say into one of the __init__ methods, is a bad idea: several of these dictionaries get instantiated every time a parent's coercion framework is initialized (which can be pretty often). A documented C-level API for turning gc on and off would be fine, though.
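The guard pattern being defended here is essentially the following sketch. The helper name resize_buckets and the list-of-buckets layout are invented for illustration; the real code guards the Cython resize loop of MonoDict/TripleDict:

```python
import gc

def resize_buckets(old_buckets, new_size):
    """Rehash entries into new buckets with cyclic GC switched off, so no
    collection (and hence no GC-triggered weakref callback) can mutate the
    table while entries are being moved."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        new_buckets = [[] for _ in range(new_size)]
        for bucket in old_buckets:
            for key_id, value in bucket:
                # toy hashing: bucket index from the stored key id
                new_buckets[key_id % new_size].append((key_id, value))
        return new_buckets
    finally:
        if was_enabled:
            gc.enable()  # always restore the previous GC state
```

The try/finally ensures the collector comes back on even if the rehash raises, which is the property the guard_gc patch needs.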

@simon-king-jena
Member

Reviewer: Simon King

@simon-king-jena
Member

comment:51

With sage-5.8.beta0 plus #14159 plus #12313 plus #13387, make ptest works fine for me. The ideas behind the patches from here are well-documented in comments in the code. The precautions taken to prevent trouble with garbage collection happening in the wrong moment seem reasonable to me. And we have seen that the performance is fine as well.

Hence, it is a positive review.

@simon-king-jena
Member

comment:52

PS: From my perspective, it is no problem that gc is imported during Sage startup.

@jdemeyer

Merged: sage-5.8.beta2

@jdemeyer

comment:54

Please see #14254 for a blocker follow-up.
