
New atom table implementation #892

Merged: 17 commits merged into atomvm:master from atom-table-revamp on Dec 24, 2023

Conversation

@bettio (Collaborator) commented Oct 25, 2023

Implement a new atom table that overcomes the shortcomings of the previous implementation.

Some further improvements are not part of this PR and will be addressed in follow-up PRs, such as:

  • Using gperf for default atoms
  • Removing atom_table_get_atom_string
  • Coalescing atom strings that are allocated on the heap into shared buffers, instead of one strdup per string (a rough sketch of the idea follows this list)
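
To illustrate the last point only (this is not part of the PR and not necessarily the design a future PR will pick): a minimal, hypothetical `StringPool` that coalesces heap-allocated atom strings into one growing buffer, handing out offsets rather than pointers because realloc may move the buffer when it grows.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative only: a single growing pool that stores many atom strings,
 * so that N heap-allocated strings cost one allocation (plus occasional
 * realloc) instead of N strdup calls. */
struct StringPool
{
    char *data;
    size_t used;
    size_t capacity;
};

/* Copies `len` bytes of `s` (plus a terminating '\0') into the pool and
 * returns its offset within pool->data, or (size_t) -1 on failure. */
static size_t pool_add(struct StringPool *pool, const char *s, size_t len)
{
    if (pool->used + len + 1 > pool->capacity) {
        size_t new_capacity = pool->capacity ? pool->capacity * 2 : 64;
        while (new_capacity < pool->used + len + 1) {
            new_capacity *= 2;
        }
        char *new_data = realloc(pool->data, new_capacity);
        if (!new_data) {
            return (size_t) -1;
        }
        pool->data = new_data;
        pool->capacity = new_capacity;
    }
    size_t offset = pool->used;
    memcpy(pool->data + offset, s, len);
    pool->data[offset + len] = '\0';
    pool->used += len + 1;
    return offset;
}
```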

Memory benchmark:

With the new atom table:

[...]
Heap summary for capabilities 0x00000004:
  At 0x3ffb70e8 len 167704 free 104160 allocated 58664 min_free 102252
    largest_free_block 102400 alloc_blocks 814 free_blocks 2 total_blocks 816
[...]

Without the new atom table:

[...]
Heap summary for capabilities 0x00000004:
  At 0x3ffb70e8 len 167704 free 96076 allocated 63096 min_free 94252
    largest_free_block 94208 alloc_blocks 1727 free_blocks 2 total_blocks 1729
[...]

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

@bettio bettio force-pushed the atom-table-revamp branch 5 times, most recently from d343d16 to 322e3a8 on October 29, 2023 15:30
@bettio bettio marked this pull request as ready for review October 29, 2023 15:30
@bettio bettio changed the title from "Atom table" to "New atom table implementation" on Oct 29, 2023
@bettio bettio force-pushed the atom-table-revamp branch 4 times, most recently from 5ba4b80 to 25a65ac on October 29, 2023 19:40
@bettio bettio marked this pull request as draft October 30, 2023 00:36
@bettio bettio force-pushed the atom-table-revamp branch 2 times, most recently from fa0c866 to 8a70c75 on November 8, 2023 14:01
@bettio bettio marked this pull request as ready for review November 8, 2023 17:17
Review comments on src/libAtomVM/atom_table.c (resolved)
@bettio bettio force-pushed the atom-table-revamp branch 4 times, most recently from cecc731 to b78ae45 on November 9, 2023 22:33
@bettio bettio requested a review from pguyot November 9, 2023 22:44
@pguyot (Collaborator) commented Nov 10, 2023

If I understand this right, this is a hash table with separate chaining and grouping of entries.

I am a little bit confused about the grouping of entries, which I understand is some kind of optimization. Indeed, I believe malloc is tlsf_malloc on esp32 and probably has an overhead of 4 bytes plus 32-bit alignment.

I would rather not optimize too early here if the plan is to have VM-based atoms handled by gperf and module/dynamic atoms that we would strdup. Indeed, with strdup, we would already lose the grouping benefit and should rather allocate the string together with the pointer to the next entry (if we stick to a hash table with separate chaining). We could also consider other strategies such as realloc.

Is it indeed an optimization trying to minimize the overhead of malloc? If so, could it be supported by a benchmark?

Also, as previously mentioned, we could imagine other optimizations based on the fact that Erlang/OTP by default has a maximum of 2^20 atoms (and on 32-bit platforms we have an index on 2^26), and on the fact that we already process modules with Packbeam and could embed them with some mmap'd structure that could help reduce the RAM footprint.
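
(For readers following the discussion: a minimal sketch, with made-up names, of what "separate chaining with grouping of entries" can look like. It is illustrative only and is not the code under review.)

```c
#include <stdlib.h>

/* One possible layout: chain nodes are carved out of larger groups, so a
 * table with N entries costs a handful of mallocs instead of N of them. */
struct HNode
{
    struct HNode *next; /* next entry in the same bucket chain */
    const char *key;    /* atom string (module data or heap copy) */
    long index;         /* atom index */
};

struct HNodeGroup
{
    struct HNodeGroup *next; /* groups are chained so they can be freed together */
    int avail;               /* nodes still unused in this group */
    struct HNode nodes[];    /* flexible array member: the node storage itself */
};

static struct HNodeGroup *new_node_group(int len)
{
    struct HNodeGroup *group = malloc(sizeof(struct HNodeGroup) + len * sizeof(struct HNode));
    if (group) {
        group->next = NULL;
        group->avail = len;
    }
    return group;
}
```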

@bettio (Collaborator, Author) commented Nov 11, 2023

> If I understand this right, this is a hash table with separate chaining and grouping of entries.
>
> I am a little bit confused about the grouping of entries, which I understand is some kind of optimization. Indeed, I believe malloc is tlsf_malloc on esp32 and probably has an overhead of 4 bytes plus 32-bit alignment.

Yes, it is quite efficient, but still, for 100 atoms that is 400 bytes we can save.

> I would rather not optimize too early here if the plan is to have VM-based atoms handled by gperf and module/dynamic atoms that we would strdup. Indeed, with strdup, we would already lose the grouping benefit and should rather allocate the string together with the pointer to the next entry (if we stick to a hash table with separate chaining). We could also consider other strategies such as realloc.

I don't want to strdup atoms coming from modules: there are quite a lot of them, and that takes memory.
I'd rather keep them as much as possible inside the loaded modules' atom tables.
Roughly, the idea is that when loading a new module, existing pointers are replaced with pointers to atoms in the newly loaded module's atom table. When unloading a module, atoms that are not available in any other module's atom table are duplicated.
By doing this we duplicate only atoms coming from *_to_atom or from (unsafe) loading of external terms.
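
(A rough, self-contained sketch of that unload-time strategy; every name below is made up for illustration and none of this is code from the PR.)

```c
#include <stdlib.h>
#include <string.h>

/* Toy model: each atom table entry records which loaded module currently
 * owns the memory backing its string. */
struct ToyAtomEntry
{
    const char *string; /* points into a module's atom data, or a heap copy */
    int owner_module;   /* id of the owning module, or -1 for a heap copy */
};

static const char *toy_heap_copy(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Called when module `unloading` goes away: entries whose strings live in
 * that module are either repointed to another module that still defines
 * the same atom, or duplicated on the heap as a last resort. */
static void toy_on_module_unload(
    struct ToyAtomEntry *entries, size_t count, int unloading,
    int (*find_other_owner)(const char *atom_string, int unloading),
    const char *(*string_in_module)(const char *atom_string, int module))
{
    for (size_t i = 0; i < count; i++) {
        if (entries[i].owner_module != unloading) {
            continue;
        }
        int other = find_other_owner(entries[i].string, unloading);
        if (other >= 0) {
            /* another loaded module still defines this atom: repoint, no copy */
            entries[i].string = string_in_module(entries[i].string, other);
            entries[i].owner_module = other;
        } else {
            /* last definition of this atom: copy it to the heap
             * (error handling omitted for brevity) */
            entries[i].string = toy_heap_copy(entries[i].string);
            entries[i].owner_module = -1;
        }
    }
}
```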

> Is it indeed an optimization trying to minimize the overhead of malloc? If so, could it be supported by a benchmark?

Yes, we can benchmark it. The main purpose is having a cleaner implementation without the double-table issue, etc.
Also, this implementation saves a lot of memory for the index-to-atom-string table.

> Also, as previously mentioned, we could imagine other optimizations based on the fact that Erlang/OTP by default has a maximum of 2^20 atoms (and on 32-bit platforms we have an index on 2^26), and on the fact that we already process modules with Packbeam and could embed them with some mmap'd structure that could help reduce the RAM footprint.

We can discuss this, but that approach introduces additional complexity and additional assumptions, so a fallback implementation would be required anyway.

@pguyot (Collaborator) commented Nov 11, 2023

> Yes, we can benchmark it. The main purpose is having a cleaner implementation without the double-table issue, etc. Also, this implementation saves a lot of memory for the index-to-atom-string table.

After I fixed the two bugs I found related to rehashing (see my comments above), I ran a quick test against a much simpler version that simply mallocs each HNode (including the string, when copying).

Running test-structs, I get the following values:

alloc_total = 3640 - alloc_count = 224
alloc_total = 4024 - alloc_count = 127

If a malloc has 4 bytes of overhead, we only saved 4 bytes in exchange for a much more complex implementation...

The code used for the benchmark can be found here:
https://gist.github.com/pguyot/00c7c52ed9a719ae75004c254b85ebf8

The benchmark counter was wrongly updated on realloc (group implementation) and wrongly not decremented on free (simpler implementation), but the difference is the same. With the fix, the results are:
alloc_total = 4024 - alloc_count = 16 - alloc_free = 0 -- groups
alloc_total = 3640 - alloc_count = 113 - alloc_free = 111 -- simpler

We have 16 malloc'd blocks with the group implementation and 97 additional ones with the simpler implementation. At 4 bytes of overhead per allocation, that means an extra 388 bytes.
The last group is not entirely filled and groups have some significant overhead, so the group implementation has an additional 384 bytes allocated.
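
(Not pguyot's gist, just a minimal sketch of the kind of counting wrappers such a benchmark relies on; as the discussion above shows, the realloc and free paths are where such counters are easy to get wrong.)

```c
#include <stdio.h>
#include <stdlib.h>

/* Global counters, printed at the end of the run. */
static size_t alloc_total = 0;
static size_t alloc_count = 0;
static size_t free_count = 0;

static void *counting_malloc(size_t size)
{
    alloc_total += size;
    alloc_count++;
    return malloc(size);
}

static void counting_free(void *ptr)
{
    if (ptr) {
        free_count++; /* track frees separately from allocations */
    }
    free(ptr);
}

int main(void)
{
    void *p = counting_malloc(32);
    counting_free(p);
    printf("alloc_total = %zu - alloc_count = %zu - alloc_free = %zu\n",
        alloc_total, alloc_count, free_count);
    return 0;
}
```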

The new atom table addresses a number of issues, such as memory overhead,
race conditions, etc.
This new atom table implements both AtomString-to-index lookup and vice
versa, so locking is simplified.

The old atomshashtable is still required for the modules table.

Signed-off-by: Davide Bettio <[email protected]>
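
(Illustrative layout only, not the actual struct from atom_table.c: one structure serving both lookup directions under a single lock is what makes the locking simpler than keeping two independent tables.)

```c
#include <stddef.h>

struct HNode; /* hash node holding the atom string, its index and a next pointer */

struct AtomTableSketch
{
    struct HNode **buckets;  /* separate-chaining buckets: AtomString -> index */
    struct HNode **by_index; /* dense array: index -> node (and thus the string) */
    size_t bucket_count;
    size_t count;
    /* plus a single rwlock or mutex protecting both views (omitted here) */
};
```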
Further reduce memory consumption and improve performance by adding a
function that inserts multiple atoms at once, which can be used during
module loading.

Signed-off-by: Davide Bettio <[email protected]>
Also add option flags for `AtomTableAlreadyExisting` and
`AtomTableCopyAtom`.

Signed-off-by: Davide Bettio <[email protected]>
Use the improved `atom_table_ensure_atom`, which supports the
`AtomTableCopyAtom` and `AtomTableAlreadyExisting` options and fixes a
possible race condition.

Signed-off-by: Davide Bettio <[email protected]>
Instead of aborting, propagate the error.

Signed-off-by: Davide Bettio <[email protected]>
The avail count can be kept as part of the table's main structure, rather
than keeping one for each node group.

Signed-off-by: Davide Bettio <[email protected]>
Rehash the table when the count is above a certain threshold, which is
computed from the capacity.

Signed-off-by: Davide Bettio <[email protected]>
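
(For illustration, the usual shape of such a check; the numbers are made up and the threshold actually used in the PR may differ.)

```c
#include <stddef.h>

#define LOAD_FACTOR_NUM 3
#define LOAD_FACTOR_DEN 4

/* Grow (rehash into a larger bucket array) once the number of entries
 * exceeds 75% of the current capacity. */
static int should_rehash(size_t count, size_t capacity)
{
    return count > (capacity * LOAD_FACTOR_NUM) / LOAD_FACTOR_DEN;
}
```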
Remove forward declaration by reorganizing declaration order.

Signed-off-by: Davide Bettio <[email protected]>
Do not access anything other than local variables after unlocking.

Signed-off-by: Davide Bettio <[email protected]>
Instead of returning an AtomString outside the atom table, implement a
compare primitive which avoids the need for that.
As a further step, it will be possible to move atom strings to other
locations without worrying about dangling pointers or complex locking
logic.

Signed-off-by: Davide Bettio <[email protected]>
Account for `\0`, so use `<=` instead of just `<`.

Signed-off-by: Davide Bettio <[email protected]>
Instead of returning a pointer to the atom string's actual data, copy the
string to a caller-owned buffer.
This will later make it possible to replace atom strings when loading a
newer module version, without worrying about reference counting or
additional locks.

Additional changes are required, so `atom_table_get_atom_string` will be
around for some more time.

Signed-off-by: Davide Bettio <[email protected]>
globalcontext_insert_atom now just calls
globalcontext_insert_atom_maybe_copy, which calls `atom_table_ensure_atom`.

Signed-off-by: Davide Bettio <[email protected]>
- Handle out of memory in *_to_atom
- Avoid unnecessary malloc
- Fix memory leak

Signed-off-by: Davide Bettio <[email protected]>
All functions should lock and unlock the table themselves, in order to keep
everything maintainable.

Signed-off-by: Davide Bettio <[email protected]>
@bettio (Collaborator, Author) commented Nov 14, 2023

> The code used for the benchmark can be found here: https://gist.github.com/pguyot/00c7c52ed9a719ae75004c254b85ebf8

Just an additional comment here: in order to have a fair comparison we should take into account the complexity of the index-to-string lookup.
With a few changes (and using the same amount of memory) I can provide O(log(node-group-n)) complexity for that operation.
With some additional changes I can lower memory usage by a further n*sizeof(ptr).

I'll provide more information soon about this improvement.
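
(To make that concrete, a hypothetical sketch of how index-to-string lookup can be O(log(number of node groups)): if atoms receive consecutive indices and each group records the first index it holds, a binary search over the groups followed by direct indexing inside the chosen group finds the string. All names below are illustrative.)

```c
#include <stddef.h>

struct GroupRef
{
    long first_index;     /* atom index of the first entry in this group */
    long len;             /* entries used in this group */
    const char **strings; /* atom strings stored in this group, in index order */
};

/* Binary search for the last group whose first_index is <= index, then
 * index directly into that group. Returns NULL when out of range. */
static const char *atom_string_by_index(const struct GroupRef *groups, size_t n_groups, long index)
{
    size_t lo = 0;
    size_t hi = n_groups;
    while (lo + 1 < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (groups[mid].first_index <= index) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    if (n_groups == 0 || index < groups[lo].first_index
        || index >= groups[lo].first_index + groups[lo].len) {
        return NULL;
    }
    return groups[lo].strings[index - groups[lo].first_index];
}
```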

@bettio bettio merged commit 273513f into atomvm:master Dec 24, 2023
84 checks passed