perf(parser-adapter-json): fold syntactic analysis phases
Code of the parser has been consolidated into idiomatic
parsing phases: lexical and syntactic analysis.

There are two types of syntactic analysis: direct and indirect.

Indirect syntactic analysis is the original code, refactored and simplified;
it turns CST into JSON AST and then into ApiDOM.
Direct syntactic analysis is new and turns CST directly into ApiDOM.
The type of syntactic analysis can be switched via a configuration option.

Performance of direct syntactic analysis has been increased by
300% compared to the original code.

Performance of indirect syntactic analysis has been increased by
10% compared to the original code, and the code has been significantly simplified.

Closes #406
char0n committed May 19, 2021
1 parent 633b3fb commit c983b47
Showing 36 changed files with 1,725 additions and 995 deletions.
1 change: 1 addition & 0 deletions apidom/packages/apidom-ast/src/Error.ts
@@ -4,6 +4,7 @@ import Node from './Node';

interface Error extends Node {
value: unknown;
isUnexpected: boolean;
}

const Error: stampit.Stamp<Error> = stampit(Node, {
2 changes: 2 additions & 0 deletions apidom/packages/apidom-ast/src/index.ts
@@ -14,6 +14,7 @@ export { default as JsonTrue } from './json/nodes/JsonTrue';
export { default as JsonFalse } from './json/nodes/JsonFalse';
export { default as JsonNull } from './json/nodes/JsonNull';
export {
isDocument as isJsonDocument,
isFalse as isJsonFalse,
isProperty as isJsonProperty,
isStringContent as isJsonStringContent,
@@ -59,6 +60,7 @@ export { default as Literal } from './Literal';
export { Point, default as Position } from './Position';
export { default as Error } from './Error';
export { default as ParseResult } from './ParseResult';
export { isParseResult, isLiteral, isPoint, isPosition } from './predicates';
// AST traversal related exports
export {
getVisitFn,
2 changes: 2 additions & 0 deletions apidom/packages/apidom-ast/src/json/nodes/predicates.ts
@@ -1,5 +1,7 @@
import { isNodeType } from '../../predicates';

export const isDocument = isNodeType('document');

export const isString = isNodeType('string');

export const isFalse = isNodeType('false');
1 change: 1 addition & 0 deletions apidom/packages/apidom-ls/test/openapi-json-async.ts
@@ -187,6 +187,7 @@ describe('apidom-ls-async', function () {

assert.deepEqual(result, expected as Diagnostic[]);
doc = TextDocument.create('foo://bar/file.json', 'json', 0, specError);
console.dir(doc);
result = await languageService.doValidation(doc, validationContext);

assert.deepEqual(result, [
53 changes: 48 additions & 5 deletions apidom/packages/apidom-parser-adapter-json/README.md
@@ -1,11 +1,54 @@
# apidom-parser-adapter-json

`apidom-parser-adapter-json` is a parser adapter for the [JSON format](https://www.json.org/json-en.html).
This parser adapter uses [tree-sitter](https://www.npmjs.com/package/tree-sitter) / [web-tree-sitter](https://www.npmjs.com/package/web-tree-sitter) as an underlying parser.
Tree-sitter uses [tree-sitter-json grammar](https://www.npmjs.com/package/tree-sitter-json) to produce [CST](https://tree-sitter.github.io/tree-sitter/using-parsers#syntax-nodes) from a source string.

[CST](https://tree-sitter.github.io/tree-sitter/using-parsers#syntax-nodes) produced by tree-sitter parser is [syntactically analyzed](https://github.com/swagger-api/apidom/blob/master/apidom/packages/apidom-parser-adapter-json/src/parser/syntactic-analysis.ts) and [JSON AST](https://github.com/swagger-api/apidom/tree/master/apidom/packages/apidom-ast#json-ast-nodes) is produced.
JSON AST is then transformed into generic ApiDOM structure using [base ApiDOM namespace](https://github.com/swagger-api/apidom/tree/master/apidom/packages/apidom#base-namespace).
[CST](https://tree-sitter.github.io/tree-sitter/using-parsers#syntax-nodes) produced by lexical analysis is [syntactically analyzed](https://github.com/swagger-api/apidom/blob/master/apidom/packages/apidom-parser-adapter-json/src/syntactic-analysis) and
an ApiDOM structure using [base ApiDOM namespace](https://github.com/swagger-api/apidom/tree/master/apidom/packages/apidom#base-namespace) is produced.
With indirect syntactic analysis, [JSON AST](https://github.com/swagger-api/apidom/tree/master/apidom/packages/apidom-ast#json-ast-nodes) is produced as an intermediate step.


## Parse phases

The parse stage takes a JSON string and produces an ApiDOM structure using [base ApiDOM namespace](https://github.com/swagger-api/apidom/tree/master/apidom/packages/apidom#base-namespace).
There are two phases of parsing: **Lexical Analysis** and **Syntactic Analysis**.

### Lexical Analysis

Lexical Analysis will take a string of code and turn it into a stream of tokens.
[tree-sitter](https://www.npmjs.com/package/tree-sitter) / [web-tree-sitter](https://www.npmjs.com/package/web-tree-sitter) is used as an underlying lexical analyzer.
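
To illustrate what "a string turned into a stream of tokens" means, here is a minimal sketch of a hand-written tokenizer for JSON. It is not how tree-sitter works internally (tree-sitter is grammar-driven and yields a full CST); the `Token` type and the `tokenize` function are hypothetical names used only for this illustration.

```typescript
// Hypothetical token shape, for illustration only.
type Token = {
  type: 'punctuation' | 'string' | 'number' | 'keyword';
  text: string;
};

// Matches a JSON number or one of the literals true / false / null.
const scalarPattern = /^(?:-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?|true|false|null)/;

function tokenize(source: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;

  while (i < source.length) {
    const ch = source.charAt(i);

    // Whitespace separates tokens but is not a token itself.
    if (' \t\n\r'.indexOf(ch) !== -1) {
      i += 1;
      continue;
    }
    // Structural characters are single-character tokens.
    if ('{}[]:,'.indexOf(ch) !== -1) {
      tokens.push({ type: 'punctuation', text: ch });
      i += 1;
      continue;
    }
    // Strings run to the next unescaped double quote.
    if (ch === '"') {
      let end = i + 1;
      while (end < source.length && source.charAt(end) !== '"') {
        end += source.charAt(end) === '\\' ? 2 : 1;
      }
      if (end >= source.length) {
        throw new SyntaxError('Unterminated string');
      }
      tokens.push({ type: 'string', text: source.slice(i, end + 1) });
      i = end + 1;
      continue;
    }
    // Everything else must be a number or a keyword.
    const match = scalarPattern.exec(source.slice(i));
    if (match === null) {
      throw new SyntaxError(`Unexpected character "${ch}" at offset ${i}`);
    }
    tokens.push({
      type: /^[-\d]/.test(match[0]) ? 'number' : 'keyword',
      text: match[0],
    });
    i += match[0].length;
  }

  return tokens;
}
```

Calling `tokenize('{"prop": "value"}')` yields five tokens: `{`, `"prop"`, `:`, `"value"` and `}`.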

### Syntactic Analysis

Syntactic Analysis will take a stream of tokens and turn it into an ApiDOM representation.
[CST](https://tree-sitter.github.io/tree-sitter/using-parsers#syntax-nodes) produced by lexical analysis is [syntactically analyzed](https://github.com/swagger-api/apidom/blob/master/apidom/packages/apidom-parser-adapter-json/src/syntactic-analysis)
and an ApiDOM structure using [base ApiDOM namespace](https://github.com/swagger-api/apidom/tree/master/apidom/packages/apidom#base-namespace) is produced.

#### [Direct Syntactic analysis](https://github.com/swagger-api/apidom/blob/master/apidom/packages/apidom-parser-adapter-json/src/syntactic-analysis/direct)

This analysis turns tree-sitter CST directly into ApiDOM. Only a single traversal is required, which makes
it the faster of the two analyses, and it is used by default.

```js
import { parse } from 'apidom-parser-adapter-json';

const parseResult = await parse('{"prop": "value"}', {
  syntacticAnalysis: 'direct',
});
```

#### [Indirect Syntactic analysis](https://github.com/swagger-api/apidom/blob/master/apidom/packages/apidom-parser-adapter-json/src/syntactic-analysis/indirect)

This analysis turns tree-sitter CST into [JSON AST](https://github.com/swagger-api/apidom/tree/master/apidom/packages/apidom-ast#json-ast-nodes) representation.
Then JSON AST is turned into ApiDOM. Two traversals are required, which makes indirect analysis less performant than the direct one.
Though less performant, having the JSON AST representation allows further, more complex analysis.

```js
import { parse } from 'apidom-parser-adapter-json';

const parseResult = await parse('{"prop": "value"}', {
  syntacticAnalysis: 'indirect',
});
```
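
The difference between one and two traversals can be sketched on a toy tree. Everything below is an assumption made for illustration: the `Cst`, `Ast` and `Dom` types are simplified stand-ins for tree-sitter CST, JSON AST and ApiDOM, and the function names are hypothetical rather than the adapter's actual implementation. A counter records how many nodes each approach visits.

```typescript
// Hypothetical toy CST; real tree-sitter nodes carry positions, errors and more.
type Cst =
  | { type: 'string'; text: string }
  | { type: 'object'; pairs: { key: string; value: Cst }[] };

// Toy intermediate tree standing in for JSON AST.
type Ast =
  | { kind: 'JsonString'; value: string }
  | { kind: 'JsonObject'; properties: { key: string; value: Ast }[] };

// Plain values stand in for ApiDOM elements.
type Dom = string | { [key: string]: Dom };

interface Counter {
  visits: number;
}

// Direct analysis: a single traversal, CST -> Dom.
function analyzeDirectly(cst: Cst, counter: Counter): Dom {
  counter.visits += 1;
  if (cst.type === 'string') return cst.text;
  const result: { [key: string]: Dom } = {};
  for (const pair of cst.pairs) {
    result[pair.key] = analyzeDirectly(pair.value, counter);
  }
  return result;
}

// Indirect analysis, first traversal: CST -> Ast.
function cstToAst(cst: Cst, counter: Counter): Ast {
  counter.visits += 1;
  if (cst.type === 'string') return { kind: 'JsonString', value: cst.text };
  return {
    kind: 'JsonObject',
    properties: cst.pairs.map((pair) => ({
      key: pair.key,
      value: cstToAst(pair.value, counter),
    })),
  };
}

// Indirect analysis, second traversal: Ast -> Dom.
function astToDom(ast: Ast, counter: Counter): Dom {
  counter.visits += 1;
  if (ast.kind === 'JsonString') return ast.value;
  const result: { [key: string]: Dom } = {};
  for (const property of ast.properties) {
    result[property.key] = astToDom(property.value, counter);
  }
  return result;
}

function analyzeIndirectly(cst: Cst, counter: Counter): Dom {
  return astToDom(cstToAst(cst, counter), counter);
}
```

Both functions return the same result for the same input, but the indirect variant visits every node twice; in exchange, the intermediate `Ast` is available for additional analysis between the two traversals.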

## Parser adapter API

@@ -34,8 +77,8 @@ This adapter exposes an instance of [base ApiDOM namespace](https://github.com/s

Option | Type | Default | Description
--- | --- | --- | ---
<a name="specObj"></a>`specObj` | `Object` | [Specification Object](https://github.com/swagger-api/apidom/blob/master/apidom/packages/apidom-parser-adapter-json/src/parser/specification.ts) | This specification object drives the JSON AST transformation to base ApiDOM namespace.
<a name="sourceMap"></a>`sourceMap` | `Boolean` | `false` | Indicate whether to generate source maps.
<a name="syntacticAnalysis"></a>`syntacticAnalysis` | `String` | `direct` | Indicate the type of syntactic analysis to use: `direct` or `indirect`.

All unrecognized arbitrary options will be ignored.
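
One way such option handling can work is destructuring with defaults: only known options are read, each falls back to its default, and anything else is never looked at. The sketch below is an illustration under that assumption; `resolveOptions` and `ResolvedOptions` are hypothetical names, not part of the adapter's API.

```typescript
interface ParseOptions {
  sourceMap?: boolean;
  syntacticAnalysis?: 'direct' | 'indirect';
}

interface ResolvedOptions {
  sourceMap: boolean;
  syntacticAnalysis: 'direct' | 'indirect';
}

// Hypothetical helper: picks the known options, applies the defaults,
// and drops every unrecognized key.
function resolveOptions(
  {
    sourceMap = false,
    syntacticAnalysis = 'direct',
  }: ParseOptions & Record<string, unknown> = {},
): ResolvedOptions {
  return { sourceMap, syntacticAnalysis };
}
```

With this helper, `resolveOptions()` resolves to the defaults, and an unrecognized key such as `somethingElse` passed alongside `syntacticAnalysis: 'indirect'` does not appear in the result.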

5 changes: 4 additions & 1 deletion apidom/packages/apidom-parser-adapter-json/package.json
@@ -19,7 +19,10 @@
"clean": "rimraf ./es ./cjs ./dist ./types",
"typescript:check-types": "tsc --noEmit",
"typescript:declaration": "tsc -p declaration.tsconfig.json",
"test": "cross-env BABEL_ENV=cjs mocha"
"test": "cross-env BABEL_ENV=cjs mocha",
"perf": "cross-env BABEL_ENV=cjs node ./test/perf/index.js",
"perf:parsing-syntactic-analysis-direct": "cross-env BABEL_ENV=cjs node ./test/perf/parsing-syntactic-analysis-direct.js",
"perf:parsing-syntactic-analysis-indirect": "cross-env BABEL_ENV=cjs node ./test/perf/parsing-syntactic-analysis-indirect.js"
},
"author": "Vladimir Gorej",
"license": "Apache-2.0",
35 changes: 33 additions & 2 deletions apidom/packages/apidom-parser-adapter-json/src/adapter-browser.ts
@@ -1,2 +1,33 @@
export { default as parse, namespace } from './parser/index-browser';
export { detect, mediaTypes } from './adapter';
import { ParseResultElement } from 'apidom';

import lexicallyAnalyze from './lexical-analysis/browser';
import syntacticallyAnalyzeDirectly from './syntactic-analysis/direct';
import syntacticallyAnalyzeIndirectly from './syntactic-analysis/indirect';

export { detect, mediaTypes, namespace } from './adapter';

interface ParseFunctionOptions {
sourceMap?: boolean;
syntacticAnalysis?: 'direct' | 'indirect';
}

type ParseFunction = (
source: string,
options?: ParseFunctionOptions,
) => Promise<ParseResultElement>;

export const parse: ParseFunction = async (
source,
{ sourceMap = false, syntacticAnalysis = 'direct' } = {},
) => {
const cst = await lexicallyAnalyze(source);
let apiDOM;

if (syntacticAnalysis === 'indirect') {
apiDOM = syntacticallyAnalyzeIndirectly(cst, { sourceMap });
} else {
apiDOM = syntacticallyAnalyzeDirectly(cst, { sourceMap });
}

return apiDOM;
};
35 changes: 33 additions & 2 deletions apidom/packages/apidom-parser-adapter-json/src/adapter-node.ts
@@ -1,2 +1,33 @@
export { default as parse, namespace } from './parser/index-node';
export { mediaTypes, detect } from './adapter';
import { ParseResultElement } from 'apidom';

import lexicallyAnalyze from './lexical-analysis/node';
import syntacticallyAnalyzeDirectly from './syntactic-analysis/direct';
import syntacticallyAnalyzeIndirectly from './syntactic-analysis/indirect';

export { detect, mediaTypes, namespace } from './adapter';

interface ParseFunctionOptions {
sourceMap?: boolean;
syntacticAnalysis?: 'direct' | 'indirect';
}

type ParseFunction = (
source: string,
options?: ParseFunctionOptions,
) => Promise<ParseResultElement>;

export const parse: ParseFunction = async (
source,
{ sourceMap = false, syntacticAnalysis = 'direct' } = {},
) => {
const cst = await lexicallyAnalyze(source);
let apiDOM;

if (syntacticAnalysis === 'indirect') {
apiDOM = syntacticallyAnalyzeIndirectly(cst, { sourceMap });
} else {
apiDOM = syntacticallyAnalyzeDirectly(cst, { sourceMap });
}

return apiDOM;
};
4 changes: 4 additions & 0 deletions apidom/packages/apidom-parser-adapter-json/src/adapter.ts
@@ -1,3 +1,5 @@
import { createNamespace } from 'apidom';

export const mediaTypes = ['application/json'];

export const detect = async (source: string): Promise<boolean> => {
@@ -8,3 +10,5 @@ export const detect = async (source: string): Promise<boolean> => {
}
return true;
};

export const namespace = createNamespace();
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
import { tail } from 'ramda';
import { isString, isFunction } from 'ramda-adjunct';
// @ts-ignore
import treeSitterWasm from 'web-tree-sitter/tree-sitter.wasm';

// patch fetch() to let emscripten load the WASM file
const realFetch = globalThis.fetch;

if (isFunction(realFetch)) {
globalThis.fetch = (...args) => {
// @ts-ignore
if (isString(args[0]) && args[0].endsWith('/tree-sitter.wasm')) {
// @ts-ignore
// spread the remaining arguments so they are forwarded individually
return realFetch.apply(globalThis, [treeSitterWasm, ...tail(args)]);
}
return realFetch.apply(globalThis, args);
};
}
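
The patching idea (intercept calls for one specific URL, substitute another, forward everything else untouched) can be sketched without touching the real `fetch()`. This is a generic illustration under assumed names: `Fetcher` and `redirectSuffix` are hypothetical and not part of this package.

```typescript
// Generic stand-in for fetch(): a URL followed by any further arguments.
type Fetcher<R> = (url: string, ...rest: unknown[]) => R;

// Wraps `original` so that URLs ending with `suffix` are replaced by
// `replacement`; all remaining arguments are forwarded unchanged.
function redirectSuffix<R>(
  original: Fetcher<R>,
  suffix: string,
  replacement: string,
): Fetcher<R> {
  return (url, ...rest) => {
    const matches = url.slice(url.length - suffix.length) === suffix;
    return original(matches ? replacement : url, ...rest);
  };
}
```

A wrapper built as `redirectSuffix(fetchLike, '/tree-sitter.wasm', bundledUrl)` would rewrite only requests for the WASM file and pass every other call through as-is.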
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
import './browser-patch';

import Parser, { Tree } from 'web-tree-sitter';
// @ts-ignore
import treeSitterJson from 'tree-sitter-json/tree-sitter-json.wasm';

/**
* We initialize the WebTreeSitter as soon as we can.
*/
const parserP = (async () => {
  await Parser.init();
  // the loaded language must be set on the parser before it can parse
  const JSONLanguage = await Parser.Language.load(treeSitterJson);
  const parser = new Parser();
  parser.setLanguage(JSONLanguage);

  return parser;
})();

/**
* Lexical Analysis of source string using WebTreeSitter.
* This is the WebAssembly version of TreeSitter's Lexical Analysis.
*/
const analyze = async (source: string): Promise<Tree> => {
const parser = await parserP;
return parser.parse(source);
};

export default analyze;
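
The pattern used here, starting an expensive asynchronous initialization once and letting every call await the same promise, can be sketched in isolation. `createAnalyzer` is a hypothetical helper introduced only for this illustration; the `init` and `run` callbacks stand in for parser setup and parsing.

```typescript
// Hypothetical helper: `init` runs exactly once, right away;
// every analyzer call awaits the same shared promise.
function createAnalyzer<P, T>(
  init: () => Promise<P>,
  run: (parser: P, source: string) => T,
): (source: string) => Promise<T> {
  const parserP = init();

  return async (source: string): Promise<T> => {
    const parser = await parserP;
    return run(parser, source);
  };
}
```

Because the promise is created when the module is evaluated, the first call does not pay the full initialization cost if enough time has already passed, and concurrent calls never trigger a second initialization.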
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
import Parser, { Tree } from 'tree-sitter';
// @ts-ignore
import JSONLanguage from 'tree-sitter-json';

const parser = new Parser();
parser.setLanguage(JSONLanguage);

/**
* Lexical Analysis of source string using TreeSitter.
* This is the Node.js version of TreeSitter's Lexical Analysis.
*/
const analyze = async (source: string): Promise<Tree> => {
return parser.parse(source);
};

export default analyze;

This file was deleted.

This file was deleted.

This file was deleted.

51 changes: 0 additions & 51 deletions apidom/packages/apidom-parser-adapter-json/src/parser/index.ts

This file was deleted.
