Support pushing large number of items in tree iterators #12172

Merged: 2 commits, Apr 21, 2023

Conversation

martin-fleck-at
Contributor

What it does

  • Provide utility to push in a callstack-safe manner
  • Add tests
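
A minimal sketch of what a callstack-safe push helper could look like. The name `pushAllSafe` and the `concat` fallback are assumptions made for illustration; only a fragment of the PR's actual utility appears in this conversation.

```typescript
// Hypothetical helper for illustration; not the PR's actual ArrayUtils code.
function pushAllSafe<T>(array: T[], items: readonly T[]): T[] {
    try {
        // Spread passes every item as a separate call argument, so a very
        // large `items` array can exceed the engine's argument/stack limit.
        array.push(...items);
        return array;
    } catch (error) {
        if (error instanceof RangeError) {
            // concat copies into a new array without spreading arguments.
            return array.concat(items);
        }
        throw error;
    }
}
```

Because the fallback returns a new array, callers have to use the return value (for example `stack = pushAllSafe(stack, items)`) rather than rely on in-place mutation.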

Fixes #12171

How to test

Test cases are included in the PR.

Review checklist

Reminder for reviewers

@martin-fleck-at
Contributor Author

I believe the build failure has nothing to do with my change; it seems to happen on all open PRs.

Contributor

tsmaeder left a comment

If I ask myself: "does this PR make the code better/easier to understand/faster", I'm not sure we're going in the right direction. Do we have measurements that support these performance optimisations?

      stack.push(root);
      while (stack.length > 0) {
          const top = stack.pop()!;
          yield top;
    -     stack.push(...(children(top) || []).filter(include).reverse());
    +     stack = ArrayUtils.pushAll(stack, (children(top) || []).filter(include).reverse());

Contributor

For a piece of micro-optimized code, it seems rather strange to do first a copy of the array, then a filter, then a "reverse".

Contributor Author

I am also not sure why it was done that way but I didn't want to touch existing functionality without fully understanding the logic behind it.

// typically faster but might fail depending on the number of items and the callstack size
array.push(...items);
return array;
} catch (error) {

Contributor

I really would advise against this. Did someone actually ever measure the impact of TreeIterator performance on real world performance in Theia? Is our version even better than just doing a recursive version? VMs usually can optimize tail recursion into iteration anyway these days.

Contributor Author

What do you mean exactly by our version vs. the recursive version? In our scenario we had a table with millions of data sets and everything was still working very quickly.

Contributor

The simplest depth first tree traversal is recursive. We're using a stack here, for some reason. Since I couldn't find any info why we're using a more complicated version of tree traversal here, I feel it may be a case of premature optimization.

Contributor Author

I just tested a little bit locally (nothing scientific for sure) against this recursive version:

    export function* depthFirst<T>(root: T, ...): IterableIterator<T> {
        yield root;
        const sortedChildren = (children(root) || []).filter(include).reverse();
        for (const child of sortedChildren) {
            yield* depthFirst(child, children);
        }
    }

and I'd say that with 5,000,000 items I saw about a 10-15% improvement with the stack version over the recursive version. So I think it is useful to keep it like that for now. It also matches the breadth-first search, so it is not that hard to read.
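
To make the comparison concrete, here is a small sketch of both traversal styles side by side. The function names, the `Item` type and the sample tree are hypothetical. In this sketch the recursive variant iterates children in their original order, since reversing is only needed to compensate for the stack's last-in-first-out behaviour.

```typescript
type ChildrenOf<T> = (node: T) => T[] | undefined;
type Include<T> = (node: T) => boolean;

// Stack-based pre-order traversal: children are reversed before being added
// so that the first child ends up on top of the stack and is visited first.
function* depthFirstStack<T>(root: T, children: ChildrenOf<T>, include: Include<T> = () => true): IterableIterator<T> {
    let stack: T[] = [root];
    while (stack.length > 0) {
        const top = stack.pop()!;
        yield top;
        // filter() returns a new array, so reverse() does not mutate the tree.
        stack = stack.concat((children(top) || []).filter(include).reverse());
    }
}

// Recursive pre-order traversal: the call stack plays the role of `stack`.
function* depthFirstRecursive<T>(root: T, children: ChildrenOf<T>, include: Include<T> = () => true): IterableIterator<T> {
    yield root;
    for (const child of (children(root) || []).filter(include)) {
        yield* depthFirstRecursive(child, children, include);
    }
}

// Illustrative data only.
interface Item {
    id: string;
    kids?: Item[];
}

const sampleTree: Item = {
    id: 'a',
    kids: [
        { id: 'b', kids: [{ id: 'd' }, { id: 'e' }] },
        { id: 'c', kids: [{ id: 'f' }] }
    ]
};

const stackOrder = Array.from(depthFirstStack(sampleTree, node => node.kids)).map(node => node.id);
const recursiveOrder = Array.from(depthFirstRecursive(sampleTree, node => node.kids)).map(node => node.id);
console.log(stackOrder.join(','), recursiveOrder.join(','));
```

Both variants visit the nodes in the same pre-order sequence; they differ only in where the pending nodes are kept (an explicit array versus nested generator frames).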

@martin-fleck-at
Contributor Author

@tsmaeder Thank you for having such a quick look! It's not really a performance optimization and more of a "getting it to work at all" with a large number of items. I did a very quick performance test here: https://jsbench.me/rjle2jyvie. So basically concat was faster than a forEach, but I am not sure what you are suggesting.

@martin-fleck-at martin-fleck-at removed the request for review from msujew February 17, 2023 14:45
@tsmaeder
Contributor

What I'm suggesting is that in real life, it might not really make a difference which is faster. How long are we talking in absolute terms? If a user interaction takes 0.12 instead of 0.08 seconds, it really does not matter and we should simply not optimize for speed. The relative speeds are not really relevant. Do we have to use the spread operator, knowing it might blow the stack?
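
One way to look at absolute rather than relative timings is a rough measurement like the sketch below. The helper names and the item count are hypothetical, and the numbers it prints depend entirely on the engine and machine; it is not the benchmark linked earlier in this conversation.

```typescript
interface Timing {
    label: string;
    ms: number;
    length: number;
}

// Runs one append strategy and records the elapsed wall-clock time.
function timeMs(label: string, run: () => number): Timing {
    const start = Date.now();
    const length = run();
    return { label, ms: Date.now() - start, length };
}

// Appends `count` items to an empty array using three strategies.
function measureAppend(count: number): Timing[] {
    const items: number[] = Array.from({ length: count }, (_, i) => i);
    return [
        timeMs('concat', () => ([] as number[]).concat(items).length),
        timeMs('loop push', () => {
            const target: number[] = [];
            for (const item of items) {
                target.push(item);
            }
            return target.length;
        }),
        timeMs('spread push', () => {
            const target: number[] = [];
            try {
                target.push(...items);
            } catch (error) {
                // Large inputs may exceed the engine's argument limit;
                // -1 marks that the strategy failed for this size.
                return -1;
            }
            return target.length;
        })
    ];
}

for (const result of measureAppend(100000)) {
    console.log(`${result.label}: ${result.ms} ms, length ${result.length}`);
}
```

If all three strategies finish within a few milliseconds for realistic sizes, the choice can be made on robustness alone.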

@martin-fleck-at
Contributor Author

martin-fleck-at commented Feb 17, 2023

@tsmaeder Now I get it, sorry for the confusion! I didn't do any particular measurements in this direction yet, but as a user it really didn't matter whether we use the spread operator on smaller lists or use concat on them in the first place. I didn't try the forEach variant in my use case. So in my opinion, the array probably has to be enormous for the user to notice any difference. However, it may also depend on the size of the nodes in the tree?

So we should remove the util again and simply use one of the methods, right? Do you have any preference which method to use?

@tsmaeder
Contributor

> it may also depend on the size of the nodes in the tree

Probably not: the array should reference objects on the heap (pointers)

I would just remove the spread, unless we can observe noticeably worse performance with "concat".

@martin-fleck-at
Contributor Author

@tsmaeder Thanks again for having a look. I pushed an update where I simply replaced the spread with concat.

Contributor

tsmaeder left a comment

Looks fine to me now.

This was referenced Nov 27, 2023
Successfully merging this pull request may close these issues.

Maximum call stack size exceeded on tree selection
2 participants