Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add iterator API for Compose table #367

Merged
merged 1 commit into from
Sep 26, 2023

Conversation

wismill
Copy link
Member

@wismill wismill commented Jul 25, 2023

This is a revival of the compose-traverse branch from @bluetech.

Fixes #366
Could be also useful for #361.

@wismill wismill requested a review from bluetech July 25, 2023 13:07
@wismill wismill added enhancement Indicates new feature requests compose Indicates a need for improvements or additions to Compose handling labels Jul 25, 2023
Copy link
Member

@bluetech bluetech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this up @wismill, looks good!

I still want to read over xkb_compose_table_iterator_next more carefully, but left a few comments in the meantime.

include/xkbcommon/xkbcommon-compose.h Outdated Show resolved Hide resolved
include/xkbcommon/xkbcommon-compose.h Outdated Show resolved Hide resolved
include/xkbcommon/xkbcommon-compose.h Outdated Show resolved Hide resolved
include/xkbcommon/xkbcommon-compose.h Outdated Show resolved Hide resolved
* @since 1.6.0
*/
struct xkb_compose_table_iterator *
xkb_compose_table_iterator_new(struct xkb_compose_table *table);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking that if we ever want to modify this, e.g. add support for preserving the original entries order, then we should add a flags argument to future proof it. BUT, I think that's unlikely enough that it's not worth it. We would just add a xkb_compose_table_iterator_new2 function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the use case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like support for preserving the original entries order. But as I said we can omit the flags now as it's probably not going to be needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand the preserving order part, but where would it be useful? One use case I see is for cleaning/normalizing compose files, but as we do not keep comments nor white spaces, such a hypothetical tool will still have to parse and compare resulting entries with xkbcommon.

tools/compose.c Outdated Show resolved Hide resolved
src/compose/table.c Show resolved Hide resolved
src/compose/table.c Outdated Show resolved Hide resolved
src/compose/table.c Outdated Show resolved Hide resolved
src/compose/table.c Outdated Show resolved Hide resolved
@bluetech
Copy link
Member

I read over the iterator_next code. Nice work, it's not easy to convert recursion to an iterator like that.

I think code can be streamlined a bit to make it more directly analogous to the recursive version. Below is my suggestion. I didn't test it really beyond running your new tests and verifiying that the output stays the same on en_US.UTF8/Compose. Let me know what you think.

diff --git a/src/compose/table.c b/src/compose/table.c
index 1ff26b3..403309e 100644
--- a/src/compose/table.c
+++ b/src/compose/table.c
@@ -312,28 +312,47 @@ xkb_compose_table_iterator_current(struct xkb_compose_table_iterator *iter)
 XKB_EXPORT struct xkb_compose_table_entry *
 xkb_compose_table_iterator_next(struct xkb_compose_table_iterator *iter)
 {
+    /*
+     * This function takes the following recursive traversal function,
+     * and makes it non-recursive and resumable. The iter->cursors stack
+     * is analogous to the call stack, and cursor->direction to the
+     * instruction pointer of a stack frame.
+     *
+     *    traverse(xkb_keysym_t *sequence, size_t sequence_length, uint16_t p) {
+     *        if (!p) return
+     *        // cursor->direction == NODE_LEFT
+     *        node = &darray_item(table->nodes, p)
+     *        traverse(sequence, sequence_length, node->lokid)
+     *        // cursor->direction == NODE_DOWN
+     *        sequence[sequence_length++] = node->keysym
+     *        if (node->is_leaf)
+     *            emit(sequence, sequence_length, node->leaf.keysym, table->utf[node->leaf.utf8])
+     *        else
+     *            traverse(sequence, sequence_length, node->internal.eqkid)
+     *        sequence_length--
+     *        // cursor->direction == NODE_RIGHT
+     *        traverse(sequence, sequence_length, node->hikid)
+     *        // cursor->direction == NODE_UP
+     *    }
+     */
+
     struct xkb_compose_table_iterator_cursor *cursor;
-    struct xkb_compose_table_iterator_cursor new_cursor;
     const struct compose_node *node;
 
-    cursor = &darray_item(iter->cursors, iter->cursors.size - 1);
-    new_cursor.direction = NODE_LEFT;
-
-    while (cursor->direction < NODE_UP) {
+    while (!darray_empty(iter->cursors)) {
+        cursor = &darray_item(iter->cursors, iter->cursors.size - 1);
         node = &darray_item(iter->table->nodes, cursor->node_offset);
 
-        if (cursor->direction == NODE_LEFT) {
-            /* Left node */
+        switch (cursor->direction) {
+        case NODE_LEFT:
             cursor->direction = NODE_DOWN;
             if (node->lokid) {
-                new_cursor.node_offset = node->lokid;
+                struct xkb_compose_table_iterator_cursor new_cursor = {node->lokid, NODE_LEFT};
                 darray_append(iter->cursors, new_cursor);
-                cursor = &darray_item(iter->cursors, iter->cursors.size - 1);
-                continue;
             }
-        }
-        if (cursor->direction == NODE_DOWN) {
-            /* Down node */
+            break;
+
+        case NODE_DOWN:
             cursor->direction = NODE_RIGHT;
             assert (iter->entry.sequence_length <= MAX_LHS_LEN);
             iter->entry.sequence[iter->entry.sequence_length] = node->keysym;
@@ -343,29 +362,23 @@ xkb_compose_table_iterator_next(struct xkb_compose_table_iterator *iter)
                 iter->entry.utf8 = &darray_item(iter->table->utf8, node->leaf.utf8);
                 return &iter->entry;
             } else {
-                new_cursor.node_offset = node->internal.eqkid;
+                struct xkb_compose_table_iterator_cursor new_cursor = {node->internal.eqkid, NODE_LEFT};
                 darray_append(iter->cursors, new_cursor);
-                cursor = &darray_item(iter->cursors, iter->cursors.size - 1);
-                continue;
             }
-        }
-        cursor->direction = NODE_UP;
-        iter->entry.sequence_length--;
-        if (node->hikid) {
-            /* Right node */
-            new_cursor.node_offset = node->hikid;
-            darray_append(iter->cursors, new_cursor);
-            cursor = &darray_item(iter->cursors, iter->cursors.size - 1);
-            continue;
-        } else {
-            /* Look for next node in parents */
-            while (iter->cursors.size > 1) {
-                iter->cursors.size--;
-                cursor = &darray_item(iter->cursors, iter->cursors.size - 1);
-                if (cursor->direction < NODE_UP) {
-                    break;
-                }
+            break;
+
+        case NODE_RIGHT:
+            cursor->direction = NODE_UP;
+            iter->entry.sequence_length--;
+            if (node->hikid) {
+                struct xkb_compose_table_iterator_cursor new_cursor = {node->hikid, NODE_LEFT};
+                darray_append(iter->cursors, new_cursor);
             }
+            break;
+
+        case NODE_UP:
+            iter->cursors.size--;
+            break;
         }
     }
 

@wismill
Copy link
Member Author

wismill commented Sep 16, 2023

Rebased. @bluetech your version is slightly slower in the new benchmark. But I think we can still do better (faster), as the current implementation with “direction” is maybe a bit naive. I have something in the works, I will upload it soon.

@wismill
Copy link
Member Author

wismill commented Sep 17, 2023

@bluetech Let’s continue the experience: I added a new implemention and restore the other ones, guarded by #if macros (look for ITER_IMPL and USE_FOREACH to choose).

Implementation Execution time (s) for 10,000 iterations
Foreach (reference) 1.200935
Iterator 1 (with bit field) 2.425992
Iterator 1 (without bit field) 2.319385
Iterator 2 2.829882
Iterator 3 (with bit field) 2.042240
Iterator 3 (without bit field) 1.683946

As you can see, the latest implementation is the closest to the reference foreach one. Because we need to keep track of the current position, we cannot beat the foreach implementation. I wonder how we could get closer though.

@wismill wismill added this to the 1.6.0 milestone Sep 19, 2023
Copy link
Member

@bluetech bluetech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked the Iterator 2 solution because it corresponds directly to the obviously-correct recursive code, so was easiest to understand and assess correctness.

I haven't read the Iterator 3 solution in detail yet. Seems like its the fastest, though all solutions are plenty fast for the task at hand, so my suggestion is to prefer to maintainability/correctness aspects over performance.

If you think Iterator 3 is cleaner than Iterator 2 then let me know and I'll read it over. If not, my suggestion is to go with Iterator 2. But I'll leave the decision to you :)

include/xkbcommon/xkbcommon-compose.h Show resolved Hide resolved
tools/compose.c Outdated Show resolved Hide resolved
@wismill
Copy link
Member Author

wismill commented Sep 24, 2023

If you think Iterator 3 is cleaner than Iterator 2 then let me know and I'll read it over. If not, my suggestion is to go with Iterator 2. But I'll leave the decision to you :)

@bluetech I understand your argument. My POV is that once I have a correct implementation, I do like to improve it to make it efficient. Implementation 3 runs in about 40% less time that implementation 2, which is not negligible.

I have the feeling that implementation 3 has good perf but is still too verbose. I would appreciate a review, but I know we all have limited time for this project.

So I propose the following: we can go with implementation 2 for now, and iterate on better implementation in a follow-up PR. This way we ensure it will make for 1.6.0 release without delaying it further.

@bluetech
Copy link
Member

So I propose the following: we can go with implementation 2 for now, and iterate on better implementation in a follow-up PR. This way we ensure it will make for 1.6.0 release without delaying it further.

👍

@wismill wismill force-pushed the compose/traverse branch 2 times, most recently from 32f3e18 to 4886df2 Compare September 25, 2023 07:53
@wismill
Copy link
Member Author

wismill commented Sep 25, 2023

@bluetech I removed alternative implementations and squashed all the commits. I left your original commit atm. Are you ok to squash it as well? I used the Co-authored-by in the last commit with your name.

@wismill wismill marked this pull request as ready for review September 25, 2023 07:58
@wismill wismill mentioned this pull request Sep 25, 2023
1 task
Copy link
Member

@bluetech bluetech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, feel free to squash.

include/xkbcommon/xkbcommon-compose.h Show resolved Hide resolved
@wismill
Copy link
Member Author

wismill commented Sep 25, 2023

@bluetech Squashed and rebased.

Remaining items of the original commit’s TODO:

  • Convert tools/compose.c to an xkbcli compose:
    • Better names for the --locale arguments.
    • Add --format (currently only text-v1).
    • Check return value of printfs.
    • Add to xkbcli.
    • Test it in test/tool-option-parsing.py.
    • Write manpage.

I think this belongs to another PR, as this tool was not introduced by the current PR. See #381.

Allow users to iterate the entries in a compose table. This is useful
for other projects which want programmable access to the sequences,
without having to write their own parser.

- New API:
  - `xkb_compose_table_entry_sequence`;
  - `xkb_compose_table_entry_keysym`;
  - `xkb_compose_table_entry_utf8`;
  - `xkb_compose_table_iterator_new`;
  - `xkb_compose_table_iterator_free`;
  - `xkb_compose_table_iterator_next`.
- Add tests in `test/compose.c`.
- Add benchmark for compose traversal.
- `tools/compose.c`:
  - Print entries instead of just validating them.
  - Add `--file` option.
  - TODO: make this tool part of the xkbcli commands.

Co-authored-by: Pierre Le Marre <[email protected]>
Co-authored-by: Ran Benita <[email protected]>
Signed-off-by: Ran Benita <[email protected]>
@wismill
Copy link
Member Author

wismill commented Sep 26, 2023

Reworked the commit message. Will merge when CI is successful.

@wismill wismill merged commit a177013 into xkbcommon:master Sep 26, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compose Indicates a need for improvements or additions to Compose handling enhancement Indicates new feature requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add iteration API for compose
2 participants