Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't "inverted per-object mapping" representations for WeakMap/WeakSet still need GC integration? #1657

Open
SamB opened this issue Aug 7, 2019 · 10 comments

Comments

@SamB
Copy link

SamB commented Aug 7, 2019

In https://tc39.es/ecma262/#sec-weakmap-objects, there is a note that says:

WeakMap and WeakSets are intended to provide mechanisms for dynamically associating state with an object in a manner that does not “leak” memory resources if, in the absence of the WeakMap or WeakSet, the object otherwise became inaccessible and subject to resource reclamation by the implementation's garbage collection mechanisms. This characteristic can be achieved by using an inverted per-object mapping of weak map instances to keys. Alternatively each weak map may internally store its key to value mappings but this approach requires coordination between the WeakMap or WeakSet implementation and the garbage collector.

(Emphasis mine.)

This would seem to imply that, when using an inverted per-object mapping of weak map instances to keys, there is no need for coordination with the garbage collector. And, if the only type of leak you're worried about is the type characterized here, I guess that's true, but … What if the keys remain accessible longer than the WeakMap/WeakSet objects? Won't we then, in the absence of coordination with the garbage collector, just end up leaking the WeakMap/WeakSet objects until all of their respective keys are collected?

I mean, maybe this is obvious to most readers of the spec, but it's the kind of thing I generally assume isn't actually safe to leave unsaid.

Then again, maybe there's some crazy olegian scheme for implementing weak maps behind the garbage collector's back using delimited continuations, zippers, and comonads, but I sort of doubt it.

@allenwb
Copy link
Member

allenwb commented Aug 7, 2019

Agreed, the inverted representation is not a full implementation of an ephemeron table. It can still have circularity induced leaks.

Somewhere in the old tc39 bugzilla data dump there is an issue where I explain this.

Some TC39 members and/or engine implementors may still think the inverted approach is good enough. As a old-time GC guy, I consider a leaky GC to be a buggy GC,

@erights
Copy link

erights commented Aug 7, 2019

This characteristic can be achieved by using an inverted per-object mapping of weak map instances to keys.

Thanks for catching this. It is a severe bug. It is likely due to an oversimplification of historical controversies, though I don't know the actual history of the text you quote.

I have long advocated for the inverted mapping, so that the case that is by far more common could be more efficient: when the weakmap lifetime exceeds the key lifetime. But the obligation is symmetric. Whichever representation you choose, you must do full ephemeron collection for the case not optimized by that representation.

Today, zero of the browser engines do the right thing. v8, FF, and JSC all use the non-inverted representation, making WeakMaps so terribly and needlessly expensive that too many people have learned to avoid them. But at least they are correct.

Chakra does the inverted representation, making the common case efficient. However, they do not backstop it with ephemeron collection. Since the spec can only mandate conservative collection, this is not observably incorrect, but it does violate expectations.

@erights
Copy link

erights commented Aug 7, 2019

Membranes is the use case that motivated both weakmaps and proxies in the first place. Across a membrane, there are two gc issues:

  • While the membrane stays up, do garbage cycles that cross the membrane get collected?
  • When the membrane is revoked, does the unreachable subgraph within the membrane get collected?

The only way to solve both problems without violating encapsulation or introducing observable non-determinism is with full ephemeron collection.

v8, FF, and JSC, using the non-inverted representation, collect the uncommon case: membrane revocation, cheaply. But they do succeed at collecting membrane crossing cycles by ephemeron collection.

Chakra, by using the inverted representation, collects membrane crossing cycles cheaply. But because they don't backstop it with ephemeron collection, membrane revocation leaves behind an uncollectable unreachable subgraph.

@erights
Copy link

erights commented Aug 7, 2019

Here's the way I think about the underlying symmetry of the problem:

Normal GC treats each meet in the reference graph as an OR. If A and B both point at C, then C is reachable if A is reachable OR if B is reachable.

WeakMaps introduce AND meets into the reference graph. After

const m = new WeakMap([[k, v]]);

v is reachable if k is reachable AND if m is reachable. We can invert the representation because AND is commutative. We must still backstop the other case with ephemeron collection because AND is commutative.

@rossberg
Copy link
Member

rossberg commented Aug 7, 2019 via email

@devsnek
Copy link
Member

devsnek commented Aug 7, 2019

I think mark is saying the common case is that a weakmap typically outlives its keys, which generally matches how people use them to store relationships and/or metadata for instances of a certain type of object that they own. The most common thing I use weakmaps used for is this:

const map = new WeakMap();
class MyClass {
  constructor() {
    map.set(this, my secret metadata);
  }
}

I can't speak to the more efficient part.

@tabatkins
Copy link

Oooh, just learning about this, quite interesting.

So yeah, as far as I can tell, the cost of ephemeron collection is relative to the populations of the target objects (lots of sources weakreffing a small number of targets is cheaper than the opposite) and to the relative lifetimes of the source and target (short-lived object weakreffing long-lived objects is cheaper than the opposite). So the inverted case (the key weakrefs the weakmap) should be better on both of those axises - programs will usually have a much smaller # of weakmaps than keys, and keys will generally be shorter-lived than the weakmaps (essentially by definition of the core weakmap use-case).

There might very well be second-order effects, as @rossberg states, that could overwhelm the cost savings of the inverted representation, but that's something that needs to be measured; we can't determine it outright.

@erights
Copy link

erights commented Aug 7, 2019

Thanks @tabatkins and @devsnek .

@rossberg , what they said ;)

A possible source of evidence is Chakra, which uses the inverted representation.

@jridgewell
Copy link
Member

Complete agreement with @erights here. Every time I've used a WeakMap, the wm has been held by the class definition. This guarantees that the wm lives at least as long (and probably much longer) than any instance of the class.

@zenparsing
Copy link
Member

Actually, Chakra is able to collect WM values when the WM is collected. It does this by maintaining a list of weak references to the keys; it iterates this list on finalization. The main implementation tradeoff in Chakra is that WM usage can change the internal type of the key object, which can affect caches.

In a different universe, we chose to create a parallel API to enable the programmer to explicitly select the inverted representation. Alas, we chose # instead.

Regardless, the two relevant sentences in the spec (emphasized in the OP) should probably be dropped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants