
Atomic memory allocator #521

Closed · wants to merge 2 commits into from

Conversation

@nidin (Contributor) commented Feb 28, 2019

Atomic allocator for shared memory

nidin and others added 2 commits (February 28, 2019 15:21): Atomic allocator for shared memory
@dcodeIO (Member) commented Feb 28, 2019

Would it be feasible, as an alternative, to create a common wrapper around any existing memory allocator, given that the interface is always the same? That is, guarantee through a lock that at most one thread at a time executes an allocation or free operation whenever such an operation is attempted?

@nidin (Contributor, Author) commented Feb 28, 2019

We need a shared global variable, either in the global namespace or in the memory namespace. If memory.shared is true, we can use atomic.cmpxchg to allocate; otherwise we allocate the normal way.
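
A minimal sketch of that idea, assuming a hypothetical compile-time memory.shared constant and invented allocate_atomic/allocate_unsynchronized helpers standing in for the two code paths:

function memory_allocate(size: usize): usize {
  if (memory.shared) {
    // shared memory: take the atomic.cmpxchg-guarded path
    return allocate_atomic(size);
  }
  // single-threaded memory: no synchronization needed
  return allocate_unsynchronized(size);
}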

@dcodeIO (Member) commented Feb 28, 2019

My thought process was that, if we had a common wrapper, we could provide for example allocator/tlsf.atomic, allocator/buddy.atomic and allocator/arena.atomic, each using the respective memory manager plus the common atomic wrapper that sits between the user and the memory manager. I'm not sure about the locking overhead, though; it might be significant when locking each attempt to allocate/free. Not sure.

@nidin (Contributor, Author) commented Feb 28, 2019 via email

@dcodeIO (Member) commented Feb 28, 2019

So, what if we'd designate, let's say, memory offset 8 to hold a value indicating whether any thread is currently within either allocate or free? Something like

function memory_allocate(size: usize): usize {
  while (atomic.cmpxchg<i32>(8, 0, 1)) {} // spin until the word at offset 8 flips 0 -> 1
  var ret = original_memory_allocate(size);
  atomic.store<i32>(8, 0); // release the lock (a plain store would not synchronize)
  return ret;
}

Wouldn't that work with any allocator if all threads then used the wrapper?

@nidin (Contributor, Author) commented Feb 28, 2019 via email

@MaxGraey (Member) commented Feb 28, 2019

Maybe it'd be better to use a futex for this long lock section? I mean atomic.wait/notify.

@nidin (Contributor, Author) commented Feb 28, 2019 via email

@dcodeIO (Member) commented Mar 1, 2019

So, according to the example on the threads spec, a working mechanism here could be

function lock(addr: usize): void {
  // try to flip the lock word from 0 (unlocked) to 1 (locked);
  // if it is already locked, sleep until another thread notifies us
  while (atomic.cmpxchg<i32>(addr, 0, 1)) {
    atomic.wait<i32>(addr, 1, -1); // wait while the value is still 1, no timeout
  }
}

function unlock(addr: usize): void {
  atomic.store<i32>(addr, 0);  // release the lock
  atomic.notify<i32>(addr, 1); // wake at most one waiter
}

const MM_LOCK: usize = 8;

function memory_allocate(size: usize): usize {
  lock(MM_LOCK);
  var ret = original_allocate(size);
  unlock(MM_LOCK);
  return ret;
}

function memory_free(addr: usize): void {
  lock(MM_LOCK);
  original_free(addr);
  unlock(MM_LOCK);
}

Now, if shared memory is enabled with the respective compiler flag, the compiler would automatically inject MM_LOCK at offset 8 in static memory, similar to what it does with HEAP_BASE, ensuring that all threads are operating on the same assumptions.

Special care must be taken when initializing a memory allocator, of course. The main thread will usually set it up while worker threads inherit its state; this can be done conditionally, based on isDefined(MM_LOCK), with another lock/unlock step to determine whether to initialize or inherit.
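
A rough sketch of that initialization step, under the assumptions above (MM_LOCK injected by the compiler) plus a hypothetical MM_INIT flag word and original_memory_init function invented for illustration:

const MM_INIT: usize = 12; // hypothetical "already initialized" flag in shared memory

function memory_init(): void {
  if (isDefined(MM_LOCK)) { // compiled with shared memory enabled
    lock(MM_LOCK);
    if (!atomic.load<i32>(MM_INIT)) {
      original_memory_init();        // first thread in (usually main) initializes
      atomic.store<i32>(MM_INIT, 1); // later threads see the flag and inherit
    }
    unlock(MM_LOCK);
  } else {
    original_memory_init(); // single-threaded: initialize directly
  }
}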

@MaxGraey (Member) commented Mar 1, 2019

A more advanced mutex implementation with spin locks:

const SPIN_LOCK_ITER_LIMIT: i32 = 128;

// lock word states: 0 = unlocked, 1 = locked, 2 = locked with (possible) waiters
function mutexLock(addr: usize): void {
  // fast path: spin for a bounded number of iterations
  var stat = 0;
  for (let i = 0; i < SPIN_LOCK_ITER_LIMIT; i++) {
    stat = atomic.cmpxchg<i32>(addr, 0, 1);
    if (!stat) return; // acquired uncontended
  }
  // slow path: mark the lock as contended and sleep on the futex
  if (stat == 1) {
    stat = atomic.xchg<i32>(addr, 2);
  }
  while (stat) {
    atomic.wait<i32>(addr, 2, -1); // sleep while the value is 2, no timeout
    stat = atomic.xchg<i32>(addr, 2);
  }
}

function mutexUnlock(addr: usize): void {
  // if the lock was contended (2), wake one waiter; otherwise a plain release suffices
  if (atomic.xchg<i32>(addr, 0) == 2) {
    atomic.notify<i32>(addr, 1);
  }
}
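
Usage would mirror the earlier wrapper, e.g. mutexLock(MM_LOCK) before calling into the underlying allocator and mutexUnlock(MM_LOCK) afterwards.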

@dcodeIO (Member) commented Mar 1, 2019

What does it do? Spinlock first and if that doesn't work, wait, in order to reduce context switches?

@MaxGraey (Member) commented Mar 1, 2019

It just does a spin lock limited to 128 iterations and then falls back to the atomic.wait/notify (futex) approach after the iteration limit. So it can lock/unlock faster when possible.

@MaxGraey (Member) commented Mar 1, 2019

Ideally, after each spin iteration we should signal the CPU to yield (sleep(0)), but this is not supported on wasm.

@dcodeIO (Member) commented Mar 1, 2019

I see, that's the usual tradeoff between wasting cycles and switching context then. I wonder how that would compare to a naive wait/notify approach, though: without a way to signal sleep(0), spinning is guaranteed to waste more cycles than on a non-wasm platform, which might, or might not, be more costly than the context switch (in some scenarios). Feels like something to benchmark eventually :)

@MaxGraey (Member) commented Mar 1, 2019

Yeah, we definitely need a benchmark.

@dcodeIO (Member) commented Jun 19, 2019

One more remaining building block for a shared memory manager/GC, apart from locking, appears to be dealing with the fact that the current implementations use globals to store some of their state. If I'm not mistaken, global state (except immutable globals like __heap_base) is not shared and its values differ between threads, so the information stored there must be synchronized somehow, for example by storing it inside the MM/GC control structure in memory.
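
For example, a root pointer that currently lives in a mutable global would have to move to a designated slot in shared linear memory; a sketch with invented names (the MM_ROOT_PTR offset is chosen arbitrarily here):

// before: var ROOT: usize = 0; -- per-thread, invisible to other instances
const MM_ROOT_PTR: usize = 16; // hypothetical fixed offset inside the mm control structure

function getRoot(): usize {
  return atomic.load<usize>(MM_ROOT_PTR);
}

function setRoot(root: usize): void {
  atomic.store<usize>(MM_ROOT_PTR, root);
}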

@nidin (Contributor, Author) commented Jun 20, 2019

Are mutable globals thread-safe?
Or can we use atomic operations on globals?

@jtenner (Contributor) commented Jun 22, 2019

Globals are not shared between instances; there is no way to use them for cross-thread state, whether readonly or mutable, since each instance gets its own copy.

We need to store all shared data in memory shared between the instances.

@dcodeIO (Member) commented May 25, 2020

Closing this PR as part of the 2020 vacuum, as it appears to be outdated. In general, there are still some open questions regarding a thread-safe allocator, in particular whether we should rather think about a more JS-y approach like workers and postMessage, keeping allocation local to each thread.

@dcodeIO closed this May 25, 2020