Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ram usage #243

Merged
merged 9 commits into from
Dec 10, 2021
Merged

Improve ram usage #243

merged 9 commits into from
Dec 10, 2021

Conversation

Menduist
Copy link
Contributor

@Menduist Menduist commented Nov 26, 2021

This PR modifies the async flow to improve RAM usage of async transformation.
Results, about half of ram consumption in nimbus:
image

(obviously, the RSS being sentient and having is own mind, it isn't divided by two, but actual consumption is)

How

The main goal is to rely as little as possible on mark & sweep, and rely instead on reference counting.
Reference counting fails on cyclic references, and also in other cases: nim-lang/Nim#19193

So first step, add a workaround in await to avoid 19193.

Second step, more tricky, avoid cyclic reference of the closure/iterator.

Before this PR:

proc testInner(): Future[seq[int]] {.stackTrace: off, gcsafe.} =
  var chronosInternalRetFuture = newFuture[seq[int]]("testInner")
  iterator testInner_436207618(): FutureBase {.closure, gcsafe,
      raises: [Defect, CatchableError, Exception].} =
    {.push, warning[resultshadowed]: off.}
    var result: seq[int]
    {.pop.}
    block:
      var innerVal = newSeq[int](1000000)
      complete(chronosInternalRetFuture, newSeq[int](1000000))
      return nil
    complete(chronosInternalRetFuture, result)

  bind finished
  var nameIterVar`gensym0 = testInner_436207618
  {.push, stackTrace: off.}
  var testInner_continue_436207619: proc (udata`gensym0: pointer) {.gcsafe,
      raises: [Defect].}
  testInner_continue_436207619 = proc (udata`gensym0: pointer) {.gcsafe,
      raises: [Defect].} =
    try:
      if not (nameIterVar`gensym0.finished()):
        var next`gensym0 = nameIterVar`gensym0()
        while (not next`gensym0.isNil()) and next`gensym0.finished():
          next`gensym0 = nameIterVar`gensym0()
          if nameIterVar`gensym0.finished():
            break
        if next`gensym0 == nil:
          if not (chronosInternalRetFuture.finished()):
            const
              msg`gensym0 = "Async procedure (&" & "testInner" &
                  ") yielded `nil`, " &
                  "are you await\'ing a `nil` Future?"
            raiseAssert msg`gensym0
        else:
          next`gensym0.addCallback(testInner_continue_436207619)
    except CancelledError:
      chronosInternalRetFuture.cancelAndSchedule()
    except CatchableError as exc`gensym0:
      chronosInternalRetFuture.fail(exc`gensym0)
    except Exception as exc`gensym0:
      if exc`gensym0 of Defect:
        raise (ref Defect)(exc`gensym0)
      chronosInternalRetFuture.fail((ref ValueError)(msg: exc`gensym0.msg,
          parent: exc`gensym0))
  testInner_continue_436207619(nil)
  {.pop.}
  return chronosInternalRetFuture

A closure iterator and a _continue proc is generated inside the main proc.
Everything has a reference to everything: the closure, the continue, and the Future, which is bad.

After this PR:

proc testInner(): Future[seq[int]] {.stackTrace: off, gcsafe.} =
  iterator testInner_436207618(chronosInternalRetFuture: Future[seq[int]]): FutureBase {.
      closure, gcsafe, raises: [Defect, CatchableError, Exception].} =
    {.push, warning[resultshadowed]: off.}
    var result: seq[int]
    {.pop.}
    block:
      var innerVal = newSeq[int](1000000)
      complete(chronosInternalRetFuture, newSeq[int](1000000))
      return nil
    complete(chronosInternalRetFuture, result)

  var resultFuture = newFuture[seq[int]]("testInner")
  resultFuture.closure = testInner_436207618
  futureContinue(resultFuture)
  return resultFuture

The future is generated at the last minute to be sure that no one has a reference to it.
The "continue" is extracted somewhere else, and doesn't rely on closure environement (@stefantalpalaru I guess it broke stacktraces, could use a hand here)

The future is in charge of holding a reference to the closure, no one else. It means the closure can be freed easily when no longer needed, and no weird cycle here.
So, the iterator needs to take the future as argument. And we need to be very careful not to store it in the closure environment, or we're back to a cyclic ref. Maybe we can block it in some way? Not sure

For reference, here is the futureContinue:

{.push stackTrace: off.}
proc futureContinue*[T](fut: Future[T]) {.gcsafe, raises: [Defect].} =
  # Used internally by async transformation
  try:
    if not(fut.closure.finished()):
      var next = fut.closure(fut)
      # Continue while the yielded future is already finished.
      while (not next.isNil()) and next.finished():
        next = fut.closure(fut)
        if fut.closure.finished():
          break

      if next == nil:
        if not(fut.finished()):
          # Can't get strName anymore
          const msg = "Async procedure yielded `nil`, " &
                      "are you await'ing a `nil` Future?"
          raiseAssert msg
      else:
        next.addCallback(proc (arg: pointer) {.gcsafe, raises: [Defect].} =
          futureContinue(fut))
  except CancelledError:
    fut.cancelAndSchedule()
  except CatchableError as exc:
    fut.fail(exc)
  except Exception as exc:
    if exc of Defect:
      raise (ref Defect)(exc)

    fut.fail((ref ValueError)(msg: exc.msg, parent: exc))
{.pop.}

The addCallback relies on a new closure, which is unfortunate, but passing the fut as addCallback argument doesn't work: addCallback except a pointer, so it's not counted for GC, and everything breaks.

My test for this is:

proc testInner(): Future[seq[int]] {.async.} =
  var innerVal = newSeq[int](1000000)
  return newSeq[int](1000000)

proc testOuter() {.async.} =
  for itera in 0..<10000:
    discard await testInner()

GC_disableMarkAndSweep() # disable cycle collection, leave non cycle collection on

waitFor(testOuter())

With master it crashes, with this branch, it takes a bit of memory but eventually works

It's is better than the test in the original issue, because it allocates memory both in the return value, and in the iterator.
But it's not very CI-able, so not sure how to test this automatically

@Menduist Menduist linked an issue Nov 26, 2021 that may be closed by this pull request
when defined(chronosStrictException):
closure*: iterator(f: Future[T]): FutureBase {.raises: [Defect, CatchableError], gcsafe.}
else:
closure*: iterator(f: Future[T]): FutureBase {.raises: [Defect, CatchableError, Exception], gcsafe.}
Copy link
Contributor Author

@Menduist Menduist Nov 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have put it in FutureBase, but:
We need to give the future as argument of iterator, so we need to know it's type.

We could give a FutureBase instead, and cast inside the iterator, but since we need to be very careful not to have a reference to the future inside the iterator, that seem risky to me

except Exception as exc:
if exc of Defect:
raise (ref Defect)(exc)
fut.fail((ref ValueError)(msg: exc.msg,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will never be reached with strictExceptions, but no need to remove it, afaik

@stefantalpalaru
Copy link
Contributor

Very impressive!

I guess we're stuck with closures, if we want to do async, right? We can't fake them by manually building our own struct with copies of the local vars we actually need and pass it as a param to a regular proc?

(obviously, the RSS being sentient and having is own mind, it isn't divided by two, but actual consumption is)

You're probably seeing the high-water mark from a big initial allocation while loading the DAG from the db. Nim's GC will not return memory to the OS, after it's done with it.

@Menduist
Copy link
Contributor Author

There are two things which are very distinct:

  • Closures
proc a(i: int) =
  return proc {.closure.} = i + 5

This will put a variables & parameters into a separate object instead of the stack, and give this object as a hidden parameter to the closure when needed.
I guess this could be done in "userspace"? But the compiler seems to do this in a pretty good way

  • Closure iterators
iterator a(i: int): int {.closure.} =
  let b = i + 5
  yield b
  dec b
  yield b

Now, the iterator variables (not parameters) will be put into one of theses hidden object, but as a bonus, the compiler will butcher the proc into a state machine, to allow it to be resumable at each yield (which in our case, is each await)
So, let's say into something like that:

type a_env = object
  b: int
  state: int
proc a(env: a_env, i: int): int =
  switch env.state:
  case 0:
    env.b = i + 5
    env.state = 1
    return env.b
  case 1:
    dec env.b
    env.state = finished
    return env.b

This transformation is fairly complex, because you basically need to rewrite every flow control construction into a state machine (conditions, loops, exceptions, etc etc)
And the compiler has gone fairly far as to how to do this properly, but there are still issues obviously. So we could re-do it, and some are trying (nim-lang/Nim#15655, and I think CPS is another attempt at it), but the compiler has a big head start, and it's no trivial task

To sum up, this PR is still based on a closure iterator for the async transformation, and the "continue callback" uses a closure

@Menduist
Copy link
Contributor Author

Just noticed that while doing that I squeezed out some code for FutureVars in the continue, not sure what it was (every test are still passing), if someone has some info on that:

futureVarCompletions

@arnetheduck
Copy link
Member

FutureVars in the continue

https://forum.nim-lang.org/t/8047#51452

@Menduist
Copy link
Contributor Author

Thanks to @cheatfate input, we managed to remove the closure from the "continue" proc with GC_ref and GC_unref.
This has been running smoothly over the week-end on the test server.

FutureVars don't work with this PR, but they also don't work with master, so I would move their fix to another PR, instead of blocking this one

Backtraces look almost the same as before, so I think this is ready for review/merge :)

@Menduist Menduist marked this pull request as ready for review November 29, 2021 11:09
@Menduist
Copy link
Contributor Author

Here is a PR for FutureVar: #245

proc internalContinue[T](fut: pointer) {.gcsafe, raises: [Defect].} =
let asFut = cast[Future[T]](fut)
futureContinue(asFut)
GC_unref(asFut)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes more sense to place the GC_unref call immediately after we restore the asFut variable.
This eliminates the risk of not reaching this line due to an exception from futureContinue

Copy link
Contributor Author

@Menduist Menduist Nov 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can see, the variable is never assigned (no assgnRef) in the generated C, so if we unref, it may be freed during the futureContinue because the refcount = 0

N_LIB_PRIVATE N_NIMCALL(void, internalContinue__tOi1GGiApxKdBYtRcYwB7A)(void* fut) {
        tyObject_FuturecolonObjectType___GXFSekg1U8JRoedGa2vBSA* asFut;
        asFut = ((tyObject_FuturecolonObjectType___GXFSekg1U8JRoedGa2vBSA*) (fut));
        futureContinue__XuNTB7fHwBI8KII0qEQaCw_2(asFut);
        if (asFut) { nimGCunref(asFut); }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't be freed because it exists on the stack. The conservative stack scanning will discover it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure the stack scanning only happens during M&S. If you have a refcount = 0, it will be freed no matter what.

https://github.com/nim-lang/Nim/blob/0d0c249074d6a1041de16108dc247396efef5513/lib/system/gc.nim#L249-L250
Adds it to the ZCT if the refcount = 0, and the ZCT will be collected a bit later

I can test it, but it may be hard to catch that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ZTC means "zero count table". This is where the objects start their life before they are assigned to a field of a heap allocated object and also where they end up once their ref count drops to zero. Most stack variables remain with ref count = 0 throughout their lifetime. Once an object is in the ZTC, it's a candidate for collection, but collection will only happen if a stack scan confirms that there are no remaining references to the object on the stack.

This approach is known in the literature as "deferred reference counting" and it's what Nim's default GC employs:
https://en.wikipedia.org/wiki/Reference_counting#Dealing_with_inefficiency_of_updates

The goal is to avoid all the intermediate addRef and decRef calls while the stack variables are being passed around and copied. You count only the assignments to heap locations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, sorry, I saw all the stack scanning happening after the
# ---------------- cycle collector
line, so I imagined that stack scanning only happened during M&S

But indeed, the ZCT collection happens with the stack marked, so this should work, I'll try it out today :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's indeed stable with GC_unref before continue, thanks for the explanation!

baseType
# Do not change this code to `quote do` version because `instantiationInfo`
# will be broken for `newFuture()` call.
outerProcBody.add(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason why you can't use macros.quote here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea, I just moved that from above

Copy link
Member

@arnetheduck arnetheduck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very glad this mystery has been solved :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Future are cyclically referenced in the memory
5 participants