Don't "inverted per-object mapping" representations for WeakMap/WeakSet still need GC integration? #1657

SamB · 2019-08-07T01:09:27Z

In https://tc39.es/ecma262/#sec-weakmap-objects, there is a note that says:

WeakMap and WeakSets are intended to provide mechanisms for dynamically associating state with an object in a manner that does not “leak” memory resources if, in the absence of the WeakMap or WeakSet, the object otherwise became inaccessible and subject to resource reclamation by the implementation's garbage collection mechanisms. This characteristic can be achieved by using an inverted per-object mapping of weak map instances to keys. Alternatively each weak map may internally store its key to value mappings but this approach requires coordination between the WeakMap or WeakSet implementation and the garbage collector.

(Emphasis mine.)

This would seem to imply that, when using an inverted per-object mapping of weak map instances to keys, there is no need for coordination with the garbage collector. And, if the only type of leak you're worried about is the type characterized here, I guess that's true, but … What if the keys remain accessible longer than the WeakMap/WeakSet objects? Won't we then, in the absence of coordination with the garbage collector, just end up leaking the WeakMap/WeakSet objects until all of their respective keys are collected?

I mean, maybe this is obvious to most readers of the spec, but it's the kind of thing I generally assume isn't actually safe to leave unsaid.

Then again, maybe there's some crazy olegian scheme for implementing weak maps behind the garbage collector's back using delimited continuations, zippers, and comonads, but I sort of doubt it.

allenwb · 2019-08-07T02:36:07Z

Agreed, the inverted representation is not a full implementation of an ephemeron table. It can still have circularity induced leaks.

Somewhere in the old tc39 bugzilla data dump there is an issue where I explain this.

Some TC39 members and/or engine implementors may still think the inverted approach is good enough. As a old-time GC guy, I consider a leaky GC to be a buggy GC,

erights · 2019-08-07T02:41:41Z

This characteristic can be achieved by using an inverted per-object mapping of weak map instances to keys.

Thanks for catching this. It is a severe bug. It is likely due to an oversimplification of historical controversies, though I don't know the actual history of the text you quote.

I have long advocated for the inverted mapping, so that the case that is by far more common could be more efficient: when the weakmap lifetime exceeds the key lifetime. But the obligation is symmetric. Whichever representation you choose, you must do full ephemeron collection for the case not optimized by that representation.

Today, zero of the browser engines do the right thing. v8, FF, and JSC all use the non-inverted representation, making WeakMaps so terribly and needlessly expensive that too many people have learned to avoid them. But at least they are correct.

Chakra does the inverted representation, making the common case efficient. However, they do not backstop it with ephemeron collection. Since the spec can only mandate conservative collection, this is not observably incorrect, but it does violate expectations.

erights · 2019-08-07T02:49:13Z

Membranes is the use case that motivated both weakmaps and proxies in the first place. Across a membrane, there are two gc issues:

While the membrane stays up, do garbage cycles that cross the membrane get collected?
When the membrane is revoked, does the unreachable subgraph within the membrane get collected?

The only way to solve both problems without violating encapsulation or introducing observable non-determinism is with full ephemeron collection.

v8, FF, and JSC, using the non-inverted representation, collect the uncommon case: membrane revocation, cheaply. But they do succeed at collecting membrane crossing cycles by ephemeron collection.

Chakra, by using the inverted representation, collects membrane crossing cycles cheaply. But because they don't backstop it with ephemeron collection, membrane revocation leaves behind an uncollectable unreachable subgraph.

erights · 2019-08-07T02:57:04Z

Here's the way I think about the underlying symmetry of the problem:

Normal GC treats each meet in the reference graph as an OR. If A and B both point at C, then C is reachable if A is reachable OR if B is reachable.

WeakMaps introduce AND meets into the reference graph. After

const m = new WeakMap([[k, v]]);

v is reachable if k is reachable AND if m is reachable. We can invert the representation because AND is commutative. We must still backstop the other case with ephemeron collection because AND is commutative.

rossberg · 2019-08-07T15:48:57Z

I have long advocated for the inverted mapping, so that the case that is by far more common could be more efficient:

Mark, do you have something to back up the assumption that it is far more common? And more efficient? Because I’m not sure I’m convinced. There are disadvantages to the inverted representation. For example, it requires internal mutation of objects and their representations that would otherwise be immutable or even frozen. That can lead to deoptimisations of unrelated code and other costly secondary effects, including more complex implementation.

devsnek · 2019-08-07T16:22:30Z

I think mark is saying the common case is that a weakmap typically outlives its keys, which generally matches how people use them to store relationships and/or metadata for instances of a certain type of object that they own. The most common thing I use weakmaps used for is this:

const map = new WeakMap();
class MyClass {
  constructor() {
    map.set(this, my secret metadata);
  }
}

I can't speak to the more efficient part.

tabatkins · 2019-08-07T17:37:59Z

Oooh, just learning about this, quite interesting.

So yeah, as far as I can tell, the cost of ephemeron collection is relative to the populations of the target objects (lots of sources weakreffing a small number of targets is cheaper than the opposite) and to the relative lifetimes of the source and target (short-lived object weakreffing long-lived objects is cheaper than the opposite). So the inverted case (the key weakrefs the weakmap) should be better on both of those axises - programs will usually have a much smaller # of weakmaps than keys, and keys will generally be shorter-lived than the weakmaps (essentially by definition of the core weakmap use-case).

There might very well be second-order effects, as @rossberg states, that could overwhelm the cost savings of the inverted representation, but that's something that needs to be measured; we can't determine it outright.

erights · 2019-08-07T17:57:22Z

Thanks @tabatkins and @devsnek .

@rossberg , what they said ;)

A possible source of evidence is Chakra, which uses the inverted representation.

jridgewell · 2019-08-07T17:59:15Z

Complete agreement with @erights here. Every time I've used a WeakMap, the wm has been held by the class definition. This guarantees that the wm lives at least as long (and probably much longer) than any instance of the class.

zenparsing · 2019-08-07T19:16:09Z

Actually, Chakra is able to collect WM values when the WM is collected. It does this by maintaining a list of weak references to the keys; it iterates this list on finalization. The main implementation tradeoff in Chakra is that WM usage can change the internal type of the key object, which can affect caches.

In a different universe, we chose to create a parallel API to enable the programmer to explicitly select the inverted representation. Alas, we chose # instead.

Regardless, the two relevant sentences in the spec (emphasized in the OP) should probably be dropped.

mhofman mentioned this issue Apr 2, 2020

WeakMap is cleaned up too aggressively engine262/engine262#97

Closed

acaly mentioned this issue Apr 12, 2021

ConditionalWeakTable causes a memory leak if one of their values references the table dotnet/runtime#12255

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't "inverted per-object mapping" representations for WeakMap/WeakSet still need GC integration? #1657

Don't "inverted per-object mapping" representations for WeakMap/WeakSet still need GC integration? #1657

SamB commented Aug 7, 2019

allenwb commented Aug 7, 2019

erights commented Aug 7, 2019

erights commented Aug 7, 2019

erights commented Aug 7, 2019

rossberg commented Aug 7, 2019 via email

devsnek commented Aug 7, 2019 •

edited

Loading

tabatkins commented Aug 7, 2019

erights commented Aug 7, 2019

jridgewell commented Aug 7, 2019

zenparsing commented Aug 7, 2019

Don't "inverted per-object mapping" representations for WeakMap/WeakSet still need GC integration? #1657

Don't "inverted per-object mapping" representations for WeakMap/WeakSet still need GC integration? #1657

Comments

SamB commented Aug 7, 2019

allenwb commented Aug 7, 2019

erights commented Aug 7, 2019

erights commented Aug 7, 2019

erights commented Aug 7, 2019

rossberg commented Aug 7, 2019 via email

devsnek commented Aug 7, 2019 • edited Loading

tabatkins commented Aug 7, 2019

erights commented Aug 7, 2019

jridgewell commented Aug 7, 2019

zenparsing commented Aug 7, 2019

devsnek commented Aug 7, 2019 •

edited

Loading