Second round of performance profiling #691

Open
Tracked by #385
char0n opened this issue Sep 29, 2021 · 11 comments
Labels
ApiDOM enhancement New feature or request

Comments

@char0n
Member

char0n commented Sep 29, 2021

This issue is specific to apidom-parser-adapter-json, but any optimization that comes out of it could possibly be applied to every package. We seem to spend a lot of time in the syntactic analysis phase, specifically in building minim objects. The building time looks unreasonably long and needs to be inspected and profiled.

 * Parse stage: 112,61 ms
 *   Lexical Analysis phase: 1,42 ms
 *   Syntactic Analysis phase: 85,91 ms
 *     Traversing time: 0,27 ms
 *     Building time: 85,64 ms
 * Refract stage: 18,18 ms

The Node.js built-in profiler is used to identify the code that needs to be optimized.
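
A per-phase breakdown like the one above can be collected with simple wall-clock instrumentation. The following is a minimal sketch, not the instrumentation actually used in ApiDOM: `timePhase`, `lexicalAnalysis` and `syntacticAnalysis` are hypothetical names introduced for illustration, and the two "phases" are trivial stand-ins.

```javascript
// Hypothetical helper: runs `fn` and records its duration under `label`.
function timePhase(label, fn, timings) {
  const start = performance.now(); // global in Node.js >= 16 and in browsers
  const result = fn();
  timings[label] = performance.now() - start;
  return result;
}

// Stand-in phases (assumptions for the demo, not ApiDOM code).
const lexicalAnalysis = (source) => source.split(/\s+/).filter(Boolean);
const syntacticAnalysis = (tokens) => tokens.map((value) => ({ type: 'token', value }));

const timings = {};
const tokens = timePhase('lexical', () => lexicalAnalysis('{ "a" : 1 }'), timings);
const nodes = timePhase('syntactic', () => syntacticAnalysis(tokens), timings);

for (const [label, ms] of Object.entries(timings)) {
  console.log(`${label}: ${ms.toFixed(2)} ms`);
}
```

Phase timers only show where the time goes at a coarse level. To find the hot functions inside a phase, Node.js can be started with `--prof` and the resulting log summarized with `node --prof-process`.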

Refs #385

@char0n char0n added enhancement New feature or request ApiDOM labels Sep 29, 2021
@char0n char0n self-assigned this Sep 29, 2021
@char0n char0n changed the title from "* Parse stage: 112,61 ms * Lexical Analysis phase: 1,42 ms * Syntactic Analysis phase: 85,91 ms * Traversing time: 0,27 ms * Building time: 85,64 ms * Refract stage: 18,18 ms" to "Second round of performance profiling" Sep 29, 2021
@char0n
Member Author

char0n commented Oct 21, 2021

Real world fixtures are being integrated in #792 and #802

@char0n
Member Author

char0n commented Oct 31, 2021

The JSON direct syntactic analyzer (the layer that translates the tree-sitter CST into generic ApiDOM) has been optimized. Here are some numbers showing how ApiDOM parsing speed increased after this optimization.

Before

 * Parse stage: 112,61 ms
 *   Lexical Analysis phase: 1,42 ms
 *   Syntactic Analysis phase: 85,91 ms
 *     Traversing time: 0,27 ms
 *     Building time: 85,64 ms
 * Refract stage: 18,18 ms

After

 * Parse stage: 24,15 ms
 *   Lexical Analysis phase: 1,28 ms
 *   Syntactic Analysis phase: 14,99 ms
 *     Traversing time: 0,59 ms
 *     Building time: 14,39 ms
 * Refract stage: 15,75 ms
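
To read the two lists as speedup factors, each "before" value can be divided by its "after" value. The snippet below is a small sketch of that arithmetic using the figures from this comment; `toMs` and `speedups` are names introduced here for illustration.

```javascript
// Timings copied from the Before/After lists above (decimal commas kept as written).
const before = { parse: '112,61', lexical: '1,42', syntactic: '85,91', building: '85,64', refract: '18,18' };
const after = { parse: '24,15', lexical: '1,28', syntactic: '14,99', building: '14,39', refract: '15,75' };

// Convert a "112,61"-style string into a number of milliseconds.
const toMs = (text) => Number(text.replace(',', '.'));

// Speedup factor per phase: before / after (values above 1 mean "faster now").
function speedups(beforeTimings, afterTimings) {
  const result = {};
  for (const phase of Object.keys(beforeTimings)) {
    result[phase] = toMs(beforeTimings[phase]) / toMs(afterTimings[phase]);
  }
  return result;
}

const factors = speedups(before, after);
for (const [phase, factor] of Object.entries(factors)) {
  console.log(`${phase}: ${factor.toFixed(2)}x`);
}
```

The parse stage as a whole comes out at roughly 4.7x faster and the building time at roughly 6x faster, while the refract stage is almost unchanged.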

@char0n
Member Author

char0n commented Oct 31, 2021

JSON indirect syntactic analysis optimization results:

Before

4.45 ops/sec ±0.92% (621 runs sampled)

After

11.07 ops/sec ±0.90% (650 runs sampled)

@char0n
Member Author

char0n commented Oct 31, 2021

YAML syntactic analysis optimization results:

Before

4.77 ops/sec ±0.82% (623 runs sampled)

After

9.93 ops/sec ±1.11% (642 runs sampled)
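
Figures of the form `N ops/sec ±x% (n runs sampled)` match the summary line printed by Benchmark.js. The sketch below is not that library and not the project's benchmark suite; it is a deliberately naive stand-in that shows what an ops/sec figure measures. `measureOpsPerSec` is a hypothetical helper, and `JSON.parse` stands in for the parser under test.

```javascript
// Naive throughput measurement: call `fn` repeatedly for at least
// `minDurationMs` and report how many calls completed per second.
function measureOpsPerSec(fn, minDurationMs = 200) {
  let runs = 0;
  let elapsed = 0;
  const start = performance.now();
  do {
    fn();
    runs += 1;
    elapsed = performance.now() - start;
  } while (elapsed < minDurationMs);
  return { runs, opsPerSec: runs / (elapsed / 1000) };
}

// Stand-in workload (assumption): parsing a tiny JSON document.
const fixture = JSON.stringify({ openapi: '3.1.0', info: { title: 'demo', version: '1.0.0' } });
const { runs, opsPerSec } = measureOpsPerSec(() => JSON.parse(fixture));
console.log(`${opsPerSec.toFixed(2)} ops/sec (${runs} runs)`);
```

A real harness also warms up the code and reports the statistical margin of error, which this sketch omits. In terms of ratios, the JSON indirect result (4.45 to 11.07 ops/sec) is about 2.5x and the YAML result (4.77 to 9.93 ops/sec) is about 2.1x.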

char0n added a commit that referenced this issue Oct 31, 2021
@char0n
Member Author

char0n commented Nov 1, 2021

Over the weekend I did extensive optimizations on all our syntactic analyzers, both YAML and JSON. I managed to speed up the syntactic analysis phase by another factor of 2, so parsing is now two times faster again. It's not much, but it's something at least...

During the optimizations I managed to identify the biggest performance problem that we currently have. Unfortunately, it's not in our code but in the tree-sitter binding code. It takes an extreme amount of time to access tree-sitter CST nodes and their attributes. On a relatively small CST (300 lines of YAML), traversing the CST takes more than 20 ms. Access time of the tree-sitter CST seems to grow linearly, so for 3000 lines of YAML we get around 200 ms of traversal time alone. I've created an issue with tree-sitter to find out whether I'm doing something wrong or whether this is expected.

Just for comparison:

  • the widely used yaml-js library (written in pure JS) can parse the 300 lines of YAML in 0.5 ms.
  • the yaml library (written in pure JS) produces a CST internally and then transforms it into a JavaScript AST in under 5 ms.

tree-sitter/tree-sitter#1469

@char0n
Member Author

char0n commented Nov 1, 2021

tree-sitter cursor traversal needs to be used to get the performance we need. A full CST traversal is around 10x faster with the cursor.

POC of cursor traversal:

  const cursor = cst.walk();

  let reached_root = false;
  while (!reached_root) {
    // https://github.com/tree-sitter/py-tree-sitter/issues/33
    const type = cursor.nodeType;
    console.dir(cursor.nodeText);

    // Descend first, then move sideways.
    if (cursor.gotoFirstChild()) {
      continue;
    }

    if (cursor.gotoNextSibling()) {
      continue;
    }

    // No child and no sibling: climb until an ancestor has a next sibling.
    let retracting = true;
    while (retracting) {
      if (!cursor.gotoParent()) {
        // Back at the node where the cursor started: traversal is complete.
        retracting = false;
        reached_root = true;
      }

      if (cursor.gotoNextSibling()) {
        retracting = false;
      }
    }
  }

char0n added a commit that referenced this issue Nov 29, 2021
@char0n
Member Author

char0n commented Nov 29, 2021

In #933 I've implemented two specialized iterators that provide optimized access to the tree-sitter CST. Accessing the tree-sitter CST via the cursor mechanism is 10 times faster. As cursor traversal is not compatible with our own ApiDOM traversal, an adapter in the form of two iterators has been created.

PreOrderCursorChildrenIterator

This iterator uses preorder depth-first traversal to create a tree structure compatible with the tree-sitter Tree, using the tree-sitter cursor traversal mechanism, which provides cached and optimized access to the CST.
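
The idea of rebuilding a nested tree from cursor moves can be sketched as follows. This is not the code from #933: `buildTree`, `snapshot` and `createFakeCursor` are hypothetical names, and the fake cursor is a plain-object stand-in that mimics only the cursor members used in the POC above (`nodeType`, `nodeText`, `gotoFirstChild`, `gotoNextSibling`, `gotoParent`).

```javascript
// Stand-in for a tree-sitter cursor over plain objects (assumption for the demo).
function createFakeCursor(root) {
  const stack = [{ node: root, index: 0 }];
  const top = () => stack[stack.length - 1];
  return {
    get nodeType() { return top().node.type; },
    get nodeText() { return top().node.text; },
    gotoFirstChild() {
      const children = top().node.children || [];
      if (children.length === 0) return false;
      stack.push({ node: children[0], index: 0 });
      return true;
    },
    gotoNextSibling() {
      if (stack.length < 2) return false;
      const siblings = stack[stack.length - 2].node.children;
      const next = top().index + 1;
      if (next >= siblings.length) return false;
      stack[stack.length - 1] = { node: siblings[next], index: next };
      return true;
    },
    gotoParent() {
      if (stack.length < 2) return false;
      stack.pop();
      return true;
    },
  };
}

// Copy the data under the cursor into a plain node with a children array.
const snapshot = (cursor) => ({ type: cursor.nodeType, text: cursor.nodeText, children: [] });

// Preorder depth-first walk that rebuilds a nested tree from cursor moves.
function buildTree(cursor) {
  const root = snapshot(cursor);
  const ancestors = [root]; // last entry mirrors the node under the cursor
  let done = false;
  while (!done) {
    if (cursor.gotoFirstChild()) {
      const child = snapshot(cursor);
      ancestors[ancestors.length - 1].children.push(child);
      ancestors.push(child);
      continue;
    }
    // Leaf reached: move to the next sibling, climbing up as far as needed.
    while (true) {
      if (cursor.gotoNextSibling()) {
        ancestors.pop();
        const sibling = snapshot(cursor);
        ancestors[ancestors.length - 1].children.push(sibling);
        ancestors.push(sibling);
        break;
      }
      if (!cursor.gotoParent()) {
        done = true;
        break;
      }
      ancestors.pop();
    }
  }
  return root;
}

const fakeCst = {
  type: 'document', text: '{"a":1}',
  children: [{
    type: 'object', text: '{"a":1}',
    children: [
      { type: 'string', text: '"a"' },
      { type: 'number', text: '1' },
    ],
  }],
};
const tree = buildTree(createFakeCursor(fakeCst));
console.log(JSON.stringify(tree, null, 2));
```

Each node is read from the cursor exactly once and copied into a plain object, so later passes over the result do not have to touch the binding again.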

PreOrderCursorIterator

This iterator uses preorder depth-first traversal to create a list of tree-sitter SyntaxNode-like structures, using the tree-sitter cursor traversal mechanism, which provides cached and optimized access to the CST.
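
A flat preorder listing of this kind can be expressed as a generator wrapped around the cursor loop from the POC. Again, this is a sketch under stated assumptions rather than the implementation from #933: `preOrder` and `createFakeCursor` are hypothetical, and the fake cursor is a plain-object stand-in for a tree-sitter cursor.

```javascript
// Stand-in for a tree-sitter cursor over plain objects (assumption for the demo).
function createFakeCursor(root) {
  const stack = [{ node: root, index: 0 }];
  const top = () => stack[stack.length - 1];
  return {
    get nodeType() { return top().node.type; },
    get nodeText() { return top().node.text; },
    gotoFirstChild() {
      const children = top().node.children || [];
      if (children.length === 0) return false;
      stack.push({ node: children[0], index: 0 });
      return true;
    },
    gotoNextSibling() {
      if (stack.length < 2) return false;
      const siblings = stack[stack.length - 2].node.children;
      const next = top().index + 1;
      if (next >= siblings.length) return false;
      stack[stack.length - 1] = { node: siblings[next], index: next };
      return true;
    },
    gotoParent() {
      if (stack.length < 2) return false;
      stack.pop();
      return true;
    },
  };
}

// Yields a lightweight copy of every node in preorder (parent before children).
function* preOrder(cursor) {
  let reachedRoot = false;
  while (!reachedRoot) {
    yield { type: cursor.nodeType, text: cursor.nodeText };

    if (cursor.gotoFirstChild()) continue;
    if (cursor.gotoNextSibling()) continue;

    // Climb until some ancestor has a next sibling, or we are back at the start.
    let retracting = true;
    while (retracting) {
      if (!cursor.gotoParent()) {
        retracting = false;
        reachedRoot = true;
      } else if (cursor.gotoNextSibling()) {
        retracting = false;
      }
    }
  }
}

const fakeCst = {
  type: 'document', text: '{"a":1}',
  children: [{
    type: 'object', text: '{"a":1}',
    children: [
      { type: 'string', text: '"a"' },
      { type: 'number', text: '1' },
    ],
  }],
};
const visited = [...preOrder(createFakeCursor(fakeCst))];
console.log(visited.map((node) => node.type).join(' > '));
```

Because it is a generator, consumers can stop early or process nodes one at a time without materializing the whole list.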

These iterators are not used yet, because their use is blocked by a PR I've opened against the tree-sitter Node.js binding. If we used them now, without that PR, we would see degraded parsing speed in the Node.js environment but increased performance in the browser environment (which might be acceptable for now).

Performance improvement

According to the benchmarks and profiling I've done, using the iterators increased performance roughly 4 times (about 3.8x), from 45 ops/s to 171 ops/s.

@char0n
Member Author

char0n commented Dec 13, 2021

Current status: waiting for tree-sitter/node-tree-sitter#96 to be merged so that we can use the cursor traversal iterators and increase the parsing speed.

@char0n char0n removed their assignment Dec 13, 2021
@char0n
Member Author

char0n commented Jun 27, 2023

tree-sitter/node-tree-sitter#96 has finally been merged. We need to wait for a new release of tree-sitter, and then we can use the cursor traversal.

@char0n
Member Author

char0n commented Jun 28, 2023

[email protected] has been released and tree-sitter cursor traversal has been utilized in apidom-parser-adapter-json. The performance of JSON syntactic analysis has been increased by 100% (direct & indirect).

@char0n
Member Author

char0n commented Jul 15, 2023

[email protected] has been released and tree-sitter cursor traversal has been utilized in apidom-parser-adapter-yaml-1-2. The performance of YAML 1.2 syntactic analysis is now 7x faster (indirect).
