Improve ram usage #243

Menduist · 2021-11-26T10:50:22Z

This PR modifies the async flow to improve RAM usage of async transformation.
Results, about half of ram consumption in nimbus:

(obviously, the RSS being sentient and having is own mind, it isn't divided by two, but actual consumption is)

How

The main goal is to rely as little as possible on mark & sweep, and rely instead on reference counting.
Reference counting fails on cyclic references, and also in other cases: nim-lang/Nim#19193

So first step, add a workaround in await to avoid 19193.

Second step, more tricky, avoid cyclic reference of the closure/iterator.

Before this PR:

proc testInner(): Future[seq[int]] {.stackTrace: off, gcsafe.} =
  var chronosInternalRetFuture = newFuture[seq[int]]("testInner")
  iterator testInner_436207618(): FutureBase {.closure, gcsafe,
      raises: [Defect, CatchableError, Exception].} =
    {.push, warning[resultshadowed]: off.}
    var result: seq[int]
    {.pop.}
    block:
      var innerVal = newSeq[int](1000000)
      complete(chronosInternalRetFuture, newSeq[int](1000000))
      return nil
    complete(chronosInternalRetFuture, result)

  bind finished
  var nameIterVar`gensym0 = testInner_436207618
  {.push, stackTrace: off.}
  var testInner_continue_436207619: proc (udata`gensym0: pointer) {.gcsafe,
      raises: [Defect].}
  testInner_continue_436207619 = proc (udata`gensym0: pointer) {.gcsafe,
      raises: [Defect].} =
    try:
      if not (nameIterVar`gensym0.finished()):
        var next`gensym0 = nameIterVar`gensym0()
        while (not next`gensym0.isNil()) and next`gensym0.finished():
          next`gensym0 = nameIterVar`gensym0()
          if nameIterVar`gensym0.finished():
            break
        if next`gensym0 == nil:
          if not (chronosInternalRetFuture.finished()):
            const
              msg`gensym0 = "Async procedure (&" & "testInner" &
                  ") yielded `nil`, " &
                  "are you await\'ing a `nil` Future?"
            raiseAssert msg`gensym0
        else:
          next`gensym0.addCallback(testInner_continue_436207619)
    except CancelledError:
      chronosInternalRetFuture.cancelAndSchedule()
    except CatchableError as exc`gensym0:
      chronosInternalRetFuture.fail(exc`gensym0)
    except Exception as exc`gensym0:
      if exc`gensym0 of Defect:
        raise (ref Defect)(exc`gensym0)
      chronosInternalRetFuture.fail((ref ValueError)(msg: exc`gensym0.msg,
          parent: exc`gensym0))
  testInner_continue_436207619(nil)
  {.pop.}
  return chronosInternalRetFuture

A closure iterator and a _continue proc is generated inside the main proc.
Everything has a reference to everything: the closure, the continue, and the Future, which is bad.

After this PR:

proc testInner(): Future[seq[int]] {.stackTrace: off, gcsafe.} =
  iterator testInner_436207618(chronosInternalRetFuture: Future[seq[int]]): FutureBase {.
      closure, gcsafe, raises: [Defect, CatchableError, Exception].} =
    {.push, warning[resultshadowed]: off.}
    var result: seq[int]
    {.pop.}
    block:
      var innerVal = newSeq[int](1000000)
      complete(chronosInternalRetFuture, newSeq[int](1000000))
      return nil
    complete(chronosInternalRetFuture, result)

  var resultFuture = newFuture[seq[int]]("testInner")
  resultFuture.closure = testInner_436207618
  futureContinue(resultFuture)
  return resultFuture

The future is generated at the last minute to be sure that no one has a reference to it.
The "continue" is extracted somewhere else, and doesn't rely on closure environement (@stefantalpalaru I guess it broke stacktraces, could use a hand here)

The future is in charge of holding a reference to the closure, no one else. It means the closure can be freed easily when no longer needed, and no weird cycle here.
So, the iterator needs to take the future as argument. And we need to be very careful not to store it in the closure environment, or we're back to a cyclic ref. Maybe we can block it in some way? Not sure

For reference, here is the futureContinue:

{.push stackTrace: off.}
proc futureContinue*[T](fut: Future[T]) {.gcsafe, raises: [Defect].} =
  # Used internally by async transformation
  try:
    if not(fut.closure.finished()):
      var next = fut.closure(fut)
      # Continue while the yielded future is already finished.
      while (not next.isNil()) and next.finished():
        next = fut.closure(fut)
        if fut.closure.finished():
          break

      if next == nil:
        if not(fut.finished()):
          # Can't get strName anymore
          const msg = "Async procedure yielded `nil`, " &
                      "are you await'ing a `nil` Future?"
          raiseAssert msg
      else:
        next.addCallback(proc (arg: pointer) {.gcsafe, raises: [Defect].} =
          futureContinue(fut))
  except CancelledError:
    fut.cancelAndSchedule()
  except CatchableError as exc:
    fut.fail(exc)
  except Exception as exc:
    if exc of Defect:
      raise (ref Defect)(exc)

    fut.fail((ref ValueError)(msg: exc.msg, parent: exc))
{.pop.}

The addCallback relies on a new closure, which is unfortunate, but passing the fut as addCallback argument doesn't work: addCallback except a pointer, so it's not counted for GC, and everything breaks.

My test for this is:

proc testInner(): Future[seq[int]] {.async.} =
  var innerVal = newSeq[int](1000000)
  return newSeq[int](1000000)

proc testOuter() {.async.} =
  for itera in 0..<10000:
    discard await testInner()

GC_disableMarkAndSweep() # disable cycle collection, leave non cycle collection on

waitFor(testOuter())

With master it crashes, with this branch, it takes a bit of memory but eventually works

It's is better than the test in the original issue, because it allocates memory both in the return value, and in the iterator.
But it's not very CI-able, so not sure how to test this automatically

Menduist · 2021-11-26T10:53:49Z

chronos/asyncfutures2.nim

+    when defined(chronosStrictException):
+      closure*: iterator(f: Future[T]): FutureBase {.raises: [Defect, CatchableError], gcsafe.}
+    else:
+      closure*: iterator(f: Future[T]): FutureBase {.raises: [Defect, CatchableError, Exception], gcsafe.}


I would have put it in FutureBase, but:
We need to give the future as argument of iterator, so we need to know it's type.

We could give a FutureBase instead, and cast inside the iterator, but since we need to be very careful not to have a reference to the future inside the iterator, that seem risky to me

Menduist · 2021-11-26T10:54:19Z

chronos/asyncfutures2.nim

+  except Exception as exc:
+    if exc of Defect:
+      raise (ref Defect)(exc)
+    fut.fail((ref ValueError)(msg: exc.msg,


This will never be reached with strictExceptions, but no need to remove it, afaik

stefantalpalaru · 2021-11-26T13:51:44Z

Very impressive!

I guess we're stuck with closures, if we want to do async, right? We can't fake them by manually building our own struct with copies of the local vars we actually need and pass it as a param to a regular proc?

(obviously, the RSS being sentient and having is own mind, it isn't divided by two, but actual consumption is)

You're probably seeing the high-water mark from a big initial allocation while loading the DAG from the db. Nim's GC will not return memory to the OS, after it's done with it.

chronos/asyncfutures2.nim

Menduist · 2021-11-26T14:32:29Z

There are two things which are very distinct:

Closures

proc a(i: int) =
  return proc {.closure.} = i + 5

This will put a variables & parameters into a separate object instead of the stack, and give this object as a hidden parameter to the closure when needed.
I guess this could be done in "userspace"? But the compiler seems to do this in a pretty good way

Closure iterators

iterator a(i: int): int {.closure.} =
  let b = i + 5
  yield b
  dec b
  yield b

Now, the iterator variables (not parameters) will be put into one of theses hidden object, but as a bonus, the compiler will butcher the proc into a state machine, to allow it to be resumable at each yield (which in our case, is each await)
So, let's say into something like that:

type a_env = object
  b: int
  state: int
proc a(env: a_env, i: int): int =
  switch env.state:
  case 0:
    env.b = i + 5
    env.state = 1
    return env.b
  case 1:
    dec env.b
    env.state = finished
    return env.b

This transformation is fairly complex, because you basically need to rewrite every flow control construction into a state machine (conditions, loops, exceptions, etc etc)
And the compiler has gone fairly far as to how to do this properly, but there are still issues obviously. So we could re-do it, and some are trying (nim-lang/Nim#15655, and I think CPS is another attempt at it), but the compiler has a big head start, and it's no trivial task

To sum up, this PR is still based on a closure iterator for the async transformation, and the "continue callback" uses a closure

Menduist · 2021-11-26T14:50:47Z

Just noticed that while doing that I squeezed out some code for FutureVars in the continue, not sure what it was (every test are still passing), if someone has some info on that:

nim-chronos/chronos/asyncmacro2.nim

Line 91 in 37c62af

futureVarCompletions

arnetheduck · 2021-11-26T18:17:54Z

FutureVars in the continue

https://forum.nim-lang.org/t/8047#51452

Menduist · 2021-11-29T11:09:08Z

Thanks to @cheatfate input, we managed to remove the closure from the "continue" proc with GC_ref and GC_unref.
This has been running smoothly over the week-end on the test server.

FutureVars don't work with this PR, but they also don't work with master, so I would move their fix to another PR, instead of blocking this one

Backtraces look almost the same as before, so I think this is ready for review/merge :)

Menduist · 2021-11-29T11:13:13Z

Here is a PR for FutureVar: #245

zah · 2021-11-29T12:53:37Z

chronos/asyncfutures2.nim

+proc internalContinue[T](fut: pointer) {.gcsafe, raises: [Defect].} =
+  let asFut = cast[Future[T]](fut)
+  futureContinue(asFut)
+  GC_unref(asFut)


It makes more sense to place the GC_unref call immediately after we restore the asFut variable.
This eliminates the risk of not reaching this line due to an exception from futureContinue

From what I can see, the variable is never assigned (no assgnRef) in the generated C, so if we unref, it may be freed during the futureContinue because the refcount = 0

N_LIB_PRIVATE N_NIMCALL(void, internalContinue__tOi1GGiApxKdBYtRcYwB7A)(void* fut) { tyObject_FuturecolonObjectType___GXFSekg1U8JRoedGa2vBSA* asFut; asFut = ((tyObject_FuturecolonObjectType___GXFSekg1U8JRoedGa2vBSA*) (fut)); futureContinue__XuNTB7fHwBI8KII0qEQaCw_2(asFut); if (asFut) { nimGCunref(asFut); } }

It won't be freed because it exists on the stack. The conservative stack scanning will discover it.

I'm pretty sure the stack scanning only happens during M&S. If you have a refcount = 0, it will be freed no matter what.

https://github.com/nim-lang/Nim/blob/0d0c249074d6a1041de16108dc247396efef5513/lib/system/gc.nim#L249-L250
Adds it to the ZCT if the refcount = 0, and the ZCT will be collected a bit later

I can test it, but it may be hard to catch that

ZTC means "zero count table". This is where the objects start their life before they are assigned to a field of a heap allocated object and also where they end up once their ref count drops to zero. Most stack variables remain with ref count = 0 throughout their lifetime. Once an object is in the ZTC, it's a candidate for collection, but collection will only happen if a stack scan confirms that there are no remaining references to the object on the stack.

This approach is known in the literature as "deferred reference counting" and it's what Nim's default GC employs:
https://en.wikipedia.org/wiki/Reference_counting#Dealing_with_inefficiency_of_updates

The goal is to avoid all the intermediate addRef and decRef calls while the stack variables are being passed around and copied. You count only the assignments to heap locations.

Oh right, sorry, I saw all the stack scanning happening after the
# ---------------- cycle collector
line, so I imagined that stack scanning only happened during M&S

But indeed, the ZCT collection happens with the stack marked, so this should work, I'll try it out today :)

It's indeed stable with GC_unref before continue, thanks for the explanation!

zah · 2021-11-29T12:59:44Z

chronos/asyncmacro2.nim

+        baseType
+    # Do not change this code to `quote do` version because `instantiationInfo`
+    # will be broken for `newFuture()` call.
+    outerProcBody.add(


Is there any reason why you can't use macros.quote here?

No idea, I just moved that from above

chronos/asyncfutures2.nim

chronos/asyncmacro2.nim

chronos/asyncfutures2.nim

arnetheduck

I'm very glad this mystery has been solved :)

Menduist added 2 commits November 25, 2021 16:58

Remove cycle v3

64de3f3

minor cleanup

58bec3a

Menduist requested review from arnetheduck and cheatfate November 26, 2021 10:50

Menduist linked an issue Nov 26, 2021 that may be closed by this pull request

Future are cyclically referenced in the memory #203

Closed

Menduist commented Nov 26, 2021

View reviewed changes

Menduist mentioned this pull request Nov 26, 2021

fast path for writes vacp2p/nim-libp2p#659

Merged

cheatfate reviewed Nov 26, 2021

View reviewed changes

chronos/asyncfutures2.nim Show resolved Hide resolved

copied back the continuation

9c6c4dc

remove closure in continue

3e02eab

Menduist marked this pull request as ready for review November 29, 2021 11:09

zah reviewed Nov 29, 2021

View reviewed changes

arnetheduck reviewed Nov 29, 2021

View reviewed changes

chronos/asyncfutures2.nim Outdated Show resolved Hide resolved

arnetheduck reviewed Nov 29, 2021

View reviewed changes

chronos/asyncmacro2.nim Outdated Show resolved Hide resolved

Menduist added 2 commits November 29, 2021 14:07

Restore 'yield nil' message

6ff227c

Fix comment variables

d8b15ec

arnetheduck reviewed Nov 29, 2021

View reviewed changes

chronos/asyncfutures2.nim Show resolved Hide resolved

Menduist added 2 commits November 29, 2021 15:18

Release closure eagerly

4cc37b2

cleanup & unref earlier

f4ce39a

stefantalpalaru mentioned this pull request Dec 6, 2021

DAG loading: don't preload summaries in memory status-im/nimbus-eth2#3164

Closed

Merge branch 'master' into removecycle3

e82b136

arnetheduck approved these changes Dec 10, 2021

View reviewed changes

Menduist merged commit 7dc58d4 into master Dec 10, 2021

Menduist deleted the removecycle3 branch December 10, 2021 10:19

Menduist mentioned this pull request Dec 10, 2021

Bump chronos (Improve ram usage) status-im/nimbus-eth2#3180

Merged

KonradStaniec mentioned this pull request Feb 11, 2022

Cannot instantiate Future got: <T> but expected: <T> after bumping nim-chronos status-im/nimbus-eth1#963

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve ram usage #243

Improve ram usage #243

Menduist commented Nov 26, 2021 •

edited

Loading

Menduist Nov 26, 2021 •

edited

Loading

Menduist Nov 26, 2021

stefantalpalaru commented Nov 26, 2021

Menduist commented Nov 26, 2021

Menduist commented Nov 26, 2021

arnetheduck commented Nov 26, 2021

Menduist commented Nov 29, 2021

Menduist commented Nov 29, 2021

zah Nov 29, 2021

Menduist Nov 29, 2021 •

edited

Loading

zah Nov 29, 2021

Menduist Nov 29, 2021

zah Nov 30, 2021

Menduist Nov 30, 2021

Menduist Nov 30, 2021

zah Nov 29, 2021

Menduist Nov 29, 2021

arnetheduck left a comment

Improve ram usage #243

Improve ram usage #243

Conversation

Menduist commented Nov 26, 2021 • edited Loading

How

Menduist Nov 26, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stefantalpalaru commented Nov 26, 2021

Menduist commented Nov 26, 2021

Menduist commented Nov 26, 2021

arnetheduck commented Nov 26, 2021

Menduist commented Nov 29, 2021

Menduist commented Nov 29, 2021

Choose a reason for hiding this comment

Menduist Nov 29, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

arnetheduck left a comment

Choose a reason for hiding this comment

Menduist commented Nov 26, 2021 •

edited

Loading

Menduist Nov 26, 2021 •

edited

Loading

Menduist Nov 29, 2021 •

edited

Loading