
New atom table implementation #892

Merged: 17 commits merged into atomvm:master from atom-table-revamp on Dec 24, 2023

Conversation

@bettio (Collaborator) commented Oct 25, 2023

Implement a new atom table that overcomes the shortcomings of the previous implementation.

Some further improvements are not part of this PR and will be addressed in follow-up PRs, such as:

  • Using gperf for default atoms
  • Removing atom_table_get_atom_string
  • Coalescing atom strings that are allocated on the heap into shared buffers, instead of one strdup per string (a rough sketch of the idea follows this list)
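
To illustrate the last point only (this is not part of the PR and not necessarily the design a future PR will pick): a minimal, hypothetical `StringPool` that coalesces heap-allocated atom strings into one growing buffer, handing out offsets rather than pointers because realloc may move the buffer when it grows.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative only: a single growing pool that stores many atom strings,
 * so that N heap-allocated strings cost one allocation (plus occasional
 * realloc) instead of N strdup calls. */
struct StringPool
{
    char *data;
    size_t used;
    size_t capacity;
};

/* Copies `len` bytes of `s` (plus a terminating '\0') into the pool and
 * returns its offset within pool->data, or (size_t) -1 on failure. */
static size_t pool_add(struct StringPool *pool, const char *s, size_t len)
{
    if (pool->used + len + 1 > pool->capacity) {
        size_t new_capacity = pool->capacity ? pool->capacity * 2 : 64;
        while (new_capacity < pool->used + len + 1) {
            new_capacity *= 2;
        }
        char *new_data = realloc(pool->data, new_capacity);
        if (!new_data) {
            return (size_t) -1;
        }
        pool->data = new_data;
        pool->capacity = new_capacity;
    }
    size_t offset = pool->used;
    memcpy(pool->data + offset, s, len);
    pool->data[offset + len] = '\0';
    pool->used += len + 1;
    return offset;
}
```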

Memory benchmark:

With the new atom table:

[...]
Heap summary for capabilities 0x00000004:
  At 0x3ffb70e8 len 167704 free 104160 allocated 58664 min_free 102252
    largest_free_block 102400 alloc_blocks 814 free_blocks 2 total_blocks 816
[...]

Without the new atom table:

[...]
Heap summary for capabilities 0x00000004:
  At 0x3ffb70e8 len 167704 free 96076 allocated 63096 min_free 94252
    largest_free_block 94208 alloc_blocks 1727 free_blocks 2 total_blocks 1729
[...]

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

@bettio bettio force-pushed the atom-table-revamp branch 5 times, most recently from d343d16 to 322e3a8 on October 29, 2023 15:30
@bettio bettio marked this pull request as ready for review October 29, 2023 15:30
@bettio bettio changed the title from "Atom table" to "New atom table implementation" on Oct 29, 2023
@bettio bettio force-pushed the atom-table-revamp branch 4 times, most recently from 5ba4b80 to 25a65ac on October 29, 2023 19:40
@bettio bettio marked this pull request as draft October 30, 2023 00:36
@bettio bettio force-pushed the atom-table-revamp branch 2 times, most recently from fa0c866 to 8a70c75 on November 8, 2023 14:01
@bettio bettio marked this pull request as ready for review November 8, 2023 17:17
Review comments on src/libAtomVM/atom_table.c (resolved)
@bettio bettio force-pushed the atom-table-revamp branch 4 times, most recently from cecc731 to b78ae45 on November 9, 2023 22:33
@bettio bettio requested a review from pguyot November 9, 2023 22:44
@pguyot (Collaborator) commented Nov 10, 2023

If I understand this right, this is a hash table with separate chaining and grouping of entries.

I am a little bit confused about the grouping of entries, which I understand is some kind of optimization. Indeed, I believe malloc is tlsf_malloc on esp32 and probably has an overhead of 4 bytes plus 32-bit alignment.

I would rather not optimize too early here if the plan is to have VM-based atoms handled by gperf and module/dynamic atoms that we would strdup. Indeed, with strdup, we would already lose the grouping benefit and should rather allocate the string together with the pointer to the next entry (if we stick to a hash table with separate chaining). We could also consider other strategies such as realloc.

Is it indeed an optimization trying to minimize the overhead of malloc? If so, could it be supported by a benchmark?

Also, as previously mentioned, we could imagine other optimizations based on the fact that Erlang/OTP by default has a maximum of 2^20 atoms (and on 32-bit platforms we have an index on 2^26), and on the fact that we already process modules with Packbeam and could embed them with some mmap'd structure that could help reduce the RAM footprint.
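
(For readers following the discussion: a minimal sketch, with made-up names, of what "separate chaining with grouping of entries" can look like. It is illustrative only and is not the code under review.)

```c
#include <stdlib.h>

/* One possible layout: chain nodes are carved out of larger groups, so a
 * table with N entries costs a handful of mallocs instead of N of them. */
struct HNode
{
    struct HNode *next; /* next entry in the same bucket chain */
    const char *key;    /* atom string (module data or heap copy) */
    long index;         /* atom index */
};

struct HNodeGroup
{
    struct HNodeGroup *next; /* groups are chained so they can be freed together */
    int avail;               /* nodes still unused in this group */
    struct HNode nodes[];    /* flexible array member: the node storage itself */
};

static struct HNodeGroup *new_node_group(int len)
{
    struct HNodeGroup *group = malloc(sizeof(struct HNodeGroup) + len * sizeof(struct HNode));
    if (group) {
        group->next = NULL;
        group->avail = len;
    }
    return group;
}
```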

@bettio (Collaborator, Author) commented Nov 11, 2023

> If I understand this right, this is a hash table with separate chaining and grouping of entries.
>
> I am a little bit confused about the grouping of entries, which I understand is some kind of optimization. Indeed, I believe malloc is tlsf_malloc on esp32 and probably has an overhead of 4 bytes plus 32-bit alignment.

Yes, it is quite efficient, but still, for 100 atoms that is 400 bytes we can save.

> I would rather not optimize too early here if the plan is to have VM-based atoms handled by gperf and module/dynamic atoms that we would strdup. Indeed, with strdup, we would already lose the grouping benefit and should rather allocate the string together with the pointer to the next entry (if we stick to a hash table with separate chaining). We could also consider other strategies such as realloc.

I don't want to strdup atoms coming from modules: there are quite a lot of them, and that takes memory.
I'd rather keep them as much as possible inside the loaded modules' atom tables.
Roughly, the idea is that when loading a new module, existing pointers are replaced with pointers to atoms in the newly loaded module's atom table. When unloading a module, atoms that are not available in any other module's atom table are duplicated.
By doing this we duplicate only atoms coming from *_to_atom or from (unsafe) loading of external terms.
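
(A rough, self-contained sketch of that unload-time strategy; every name below is made up for illustration and none of this is code from the PR.)

```c
#include <stdlib.h>
#include <string.h>

/* Toy model: each atom table entry records which loaded module currently
 * owns the memory backing its string. */
struct ToyAtomEntry
{
    const char *string; /* points into a module's atom data, or a heap copy */
    int owner_module;   /* id of the owning module, or -1 for a heap copy */
};

static const char *toy_heap_copy(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Called when module `unloading` goes away: entries whose strings live in
 * that module are either repointed to another module that still defines
 * the same atom, or duplicated on the heap as a last resort. */
static void toy_on_module_unload(
    struct ToyAtomEntry *entries, size_t count, int unloading,
    int (*find_other_owner)(const char *atom_string, int unloading),
    const char *(*string_in_module)(const char *atom_string, int module))
{
    for (size_t i = 0; i < count; i++) {
        if (entries[i].owner_module != unloading) {
            continue;
        }
        int other = find_other_owner(entries[i].string, unloading);
        if (other >= 0) {
            /* another loaded module still defines this atom: repoint, no copy */
            entries[i].string = string_in_module(entries[i].string, other);
            entries[i].owner_module = other;
        } else {
            /* last definition of this atom: copy it to the heap
             * (error handling omitted for brevity) */
            entries[i].string = toy_heap_copy(entries[i].string);
            entries[i].owner_module = -1;
        }
    }
}
```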

> Is it indeed an optimization trying to minimize the overhead of malloc? If so, could it be supported by a benchmark?

Yes, we can benchmark it. The main purpose is having a cleaner implementation without the double-table issue, etc.
Also, this implementation saves a lot of memory for the index-to-atom-string table.

> Also, as previously mentioned, we could imagine other optimizations based on the fact that Erlang/OTP by default has a maximum of 2^20 atoms (and on 32-bit platforms we have an index on 2^26), and on the fact that we already process modules with Packbeam and could embed them with some mmap'd structure that could help reduce the RAM footprint.

We can discuss this, but that approach introduces additional complexity and additional assumptions, so a fallback implementation would be required anyway.

@pguyot (Collaborator) commented Nov 11, 2023

> Yes, we can benchmark it. The main purpose is having a cleaner implementation without the double-table issue, etc. Also, this implementation saves a lot of memory for the index-to-atom-string table.

After I fixed the two bugs I found related to rehashing (see my comments above), I ran a quick test against a much simpler version that simply mallocs each HNode (including the string, when copying).

Running test-structs, I get the following values:

alloc_total = 3640 - alloc_count = 224
alloc_total = 4024 - alloc_count = 127

If a malloc has 4 bytes of overhead, we only saved 4 bytes in exchange for a much more complex implementation...

The code used for the benchmark can be found here:
https://gist.github.com/pguyot/00c7c52ed9a719ae75004c254b85ebf8

The benchmark counter was wrongly updated on realloc (group implementation) and wrongly not decremented on free (simpler implementation), but the difference is the same. With the fix, the results are:
alloc_total = 4024 - alloc_count = 16 - alloc_free = 0 -- groups
alloc_total = 3640 - alloc_count = 113 - alloc_free = 111 -- simpler

We have 16 malloc'd blocks with the group implementation and 97 additional ones with the simpler implementation. At 4 bytes of overhead per allocation, that means an extra 388 bytes.
The last group is not entirely filled and groups have some significant overhead, so the group implementation has an additional 384 bytes allocated.
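
(Not pguyot's gist, just a minimal sketch of the kind of counting wrappers such a benchmark relies on; as the discussion above shows, the realloc and free paths are where such counters are easy to get wrong.)

```c
#include <stdio.h>
#include <stdlib.h>

/* Global counters, printed at the end of the run. */
static size_t alloc_total = 0;
static size_t alloc_count = 0;
static size_t free_count = 0;

static void *counting_malloc(size_t size)
{
    alloc_total += size;
    alloc_count++;
    return malloc(size);
}

static void counting_free(void *ptr)
{
    if (ptr) {
        free_count++; /* track frees separately from allocations */
    }
    free(ptr);
}

int main(void)
{
    void *p = counting_malloc(32);
    counting_free(p);
    printf("alloc_total = %zu - alloc_count = %zu - alloc_free = %zu\n",
        alloc_total, alloc_count, free_count);
    return 0;
}
```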

The new atom table addresses a number of issues, such as memory overhead,
race conditions, etc.
This new atom table implements both AtomString-to-index lookup and vice
versa, so locking is simplified.

The old atomshashtable is still required for the modules table.

Signed-off-by: Davide Bettio <[email protected]>
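
(Illustrative layout only, not the actual struct from atom_table.c: one structure serving both lookup directions under a single lock is what makes the locking simpler than keeping two independent tables.)

```c
#include <stddef.h>

struct HNode; /* hash node holding the atom string, its index and a next pointer */

struct AtomTableSketch
{
    struct HNode **buckets;  /* separate-chaining buckets: AtomString -> index */
    struct HNode **by_index; /* dense array: index -> node (and thus the string) */
    size_t bucket_count;
    size_t count;
    /* plus a single rwlock or mutex protecting both views (omitted here) */
};
```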
Further reduce memory consumption and improve performance by adding a
function that inserts multiple atoms at once, which can be used during
module loading.

Signed-off-by: Davide Bettio <[email protected]>
Also add option flags for `AtomTableAlreadyExisting` and
`AtomTableCopyAtom`.

Signed-off-by: Davide Bettio <[email protected]>
Use the improved `atom_table_ensure_atom`, which supports the
`AtomTableCopyAtom` and `AtomTableAlreadyExisting` options and fixes a
possible race condition.

Signed-off-by: Davide Bettio <[email protected]>
Instead of aborting, propagate the error.

Signed-off-by: Davide Bettio <[email protected]>
The avail count can be kept as part of the table's main structure, rather
than keeping one for each node group.

Signed-off-by: Davide Bettio <[email protected]>
Rehash the table when the count is above a certain threshold, which is
computed from the capacity.

Signed-off-by: Davide Bettio <[email protected]>
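
(For illustration, the usual shape of such a check; the numbers are made up and the threshold actually used in the PR may differ.)

```c
#include <stddef.h>

#define LOAD_FACTOR_NUM 3
#define LOAD_FACTOR_DEN 4

/* Grow (rehash into a larger bucket array) once the number of entries
 * exceeds 75% of the current capacity. */
static int should_rehash(size_t count, size_t capacity)
{
    return count > (capacity * LOAD_FACTOR_NUM) / LOAD_FACTOR_DEN;
}
```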
Remove forward declaration by reorganizing declaration order.

Signed-off-by: Davide Bettio <[email protected]>
Do not access anything other than local variables after unlocking.

Signed-off-by: Davide Bettio <[email protected]>
Instead of returning an AtomString outside the atom table, implement a
compare primitive which avoids the need for that.
As a further step, it will be possible to move atom strings to other
locations without worrying about dangling pointers or complex locking
logic.

Signed-off-by: Davide Bettio <[email protected]>
Account for `\0`, so use `<=` instead of just `<`.

Signed-off-by: Davide Bettio <[email protected]>
Instead of returning a pointer to the atom string's actual data, copy the
string to a caller-owned buffer.
This will later make it possible to replace atom strings when loading a
newer module version, without worrying about reference counting or
additional locks.

Additional changes are required, so `atom_table_get_atom_string` will be
around for some more time.

Signed-off-by: Davide Bettio <[email protected]>
globalcontext_insert_atom now just calls
globalcontext_insert_atom_maybe_copy, which calls `atom_table_ensure_atom`.

Signed-off-by: Davide Bettio <[email protected]>
- Handle out of memory in *_to_atom
- Avoid unnecessary malloc
- Fix memory leak

Signed-off-by: Davide Bettio <[email protected]>
All functions should lock and unlock the table themselves, in order to keep
everything maintainable.

Signed-off-by: Davide Bettio <[email protected]>
@bettio (Collaborator, Author) commented Nov 14, 2023

> The code used for the benchmark can be found here: https://gist.github.com/pguyot/00c7c52ed9a719ae75004c254b85ebf8

Just an additional comment here: in order to have a fair comparison we should take into account the complexity of the index-to-string lookup.
With a few changes (and using the same amount of memory) I can provide O(log(node-group-n)) complexity for that operation.
With some additional changes I can lower memory usage by a further n*sizeof(ptr).

I'll provide more information soon about this improvement.
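
(To make that concrete, a hypothetical sketch of how index-to-string lookup can be O(log(number of node groups)): if atoms receive consecutive indices and each group records the first index it holds, a binary search over the groups followed by direct indexing inside the chosen group finds the string. All names below are illustrative.)

```c
#include <stddef.h>

struct GroupRef
{
    long first_index;     /* atom index of the first entry in this group */
    long len;             /* entries used in this group */
    const char **strings; /* atom strings stored in this group, in index order */
};

/* Binary search for the last group whose first_index is <= index, then
 * index directly into that group. Returns NULL when out of range. */
static const char *atom_string_by_index(const struct GroupRef *groups, size_t n_groups, long index)
{
    size_t lo = 0;
    size_t hi = n_groups;
    while (lo + 1 < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (groups[mid].first_index <= index) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    if (n_groups == 0 || index < groups[lo].first_index
        || index >= groups[lo].first_index + groups[lo].len) {
        return NULL;
    }
    return groups[lo].strings[index - groups[lo].first_index];
}
```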

@bettio bettio merged commit 273513f into atomvm:master Dec 24, 2023
84 checks passed