-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compose: drop the 65535 node limit #343
Conversation
These are the biggest compose files that I know:
I am curious to know how do you get so much entries in your Compose file. Would you like to explain? |
I'm using a derivative of the Neo2 compose file that has a bit more than 40k entries (with Unicode, without keypad, but with a newer Unicode version hacked in). However, several compose sequences seem to require more than one node, since the warnings start shortly after line 35000. |
Thinking a bit more about it, a binary tree that does not remember its insertion order should not require more than an average 2 pointer bits per node. Maybe there is a way to support large compose files without increasing memory usage at all. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't aware the were files that big, knowing about https://git.neo-layout.org/neo/neo-layout/src/branch/master/Compose/src would make optimizing the compose implementation more fun :)
This PR increases the compose_node
size from 16 bytes to 20 bytes. Let's take an average system, which uses the en_US.UTF-8
compose file (~7000 nodes) and has ~25 GUI apps with Compose loaded. Before the memory usage would be 2.67MiB, after 3.33MiB. That sounds OK to me.
Of course with larger Compose files like the Neo one, let's say 100,000 nodes, the difference becomes larger ~38MiB vs. ~47MiB. But that's not the average case and not too bad either.
I left one comment, but LGTM otherwise.
@@ -75,16 +75,13 @@ | |||
* \0 is so offset 0 points to an empty string). | |||
*/ | |||
|
|||
/* Fits in uint16_t, also a good idea to have some limit. */ | |||
#define MAX_COMPOSE_NODES 65535 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's please keep some limit; I think limits are generally good to have for robustness. How about 1 million?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've raised the limit to 8 million. That way, even a Neo2 compose with all potential Unicode characters (a bit more than 1 million) should fit easily.
The implementation is currently a ternary tree; it was previously a regular trie. It's a pretty fun problem if you can come up with a better data structure. The evaluation parameters are: parsing time, lookup time, memory usage, graceful handling of edge cases. |
I'm not saying that a ternary tree is bad, I'm saying that storing the nodes in insertion order increases memory usage. That said, I'm not aware of an implementation that can do better than 20 bytes per node without O(n^2) parse time (which would obviously be unacceptable). |
In commit 1638409, the number of compose nodes was limited to 65535 to enable "future optimizations", which apparently means slightly reduced memory usage due to fitting in a uint16_t. At this time, it was mentioned that the author was not aware of "any compose files which come close". However, I'm one of the users that actually do require a larger number of nodes for their compose file. Thus, use a uint32_t again and raise the limit significantly.
In commit 1638409, the number of compose nodes was limited to 65535 to enable "future optimizations", which apparently means slightly reduced memory usage due to fitting in a uint16_t. At this time, it was mentioned that the author was not aware of "any compose files which come close".
However, I'm one of the users that actually do require a larger number of nodes for their compose file. Thus, drop the limit and use a uint32_t again.