Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-28751][Table SQL/Runtime] Optimize the performance of the json… #20397

Merged
merged 1 commit into from
Aug 30, 2022

Conversation

Aitozi
Copy link
Contributor

@Aitozi Aitozi commented Jul 30, 2022

… functions

What is the purpose of the change

This PR is meant to improve the performance of the built in json functions. The default LRUCache used in the JsonPath CacheProvider heavily use the lock which bring the bad performance.

Brief change log

  • Create a new JsonPathCache and set it to the CacheProvider when load the class

Verifying this change

The functionality is covered by the existing tests. I have not written a performance test for it, If needed, I will add one. I manually test the case with the production job, which will have 2~4 times performance improvement.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)

@flinkbot
Copy link
Collaborator

flinkbot commented Jul 30, 2022

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@wuchong
Copy link
Member

wuchong commented Aug 1, 2022

Do you have any benchmark between the two implementations?

@Aitozi
Copy link
Contributor Author

Aitozi commented Aug 1, 2022

I have not written a benchmark test for it now, I only run a production test which seen 2~4 times improved. Do you mean add an extra test for it in the flink-benchmark project? @wuchong

@Aitozi
Copy link
Contributor Author

Aitozi commented Aug 2, 2022

@wuchong I wrote a test with

  • 4 threads
  • 400 cache item

The result shows below:

Benchmark                 Mode  Cnt     Score     Error   Units
GuavaCacheBenchmark.get  thrpt   30  4480.563 ± 203.311  ops/ms
GuavaCacheBenchmark.put  thrpt   30  1774.769 ± 119.198  ops/ms

LRUCacheBenchmark.get  thrpt   30  441.239 ±  2.812  ops/ms
LRUCacheBenchmark.put  thrpt   30  350.549 ± 12.285  ops/ms

The results show that the new cache will bring 5~10 times throughputs

@Aitozi
Copy link
Contributor Author

Aitozi commented Aug 8, 2022

ping @wuchong

@Aitozi
Copy link
Contributor Author

Aitozi commented Aug 20, 2022

@wuchong any further comments ?

Copy link
Member

@wuchong wuchong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

wuchong pushed a commit to Aitozi/flink that referenced this pull request Aug 26, 2022
@wuchong
Copy link
Member

wuchong commented Aug 28, 2022

@flinkbot run azure

@Aitozi
Copy link
Contributor Author

Aitozi commented Aug 30, 2022

Rebase and force pushed

@wuchong wuchong merged commit 2220f24 into apache:master Aug 30, 2022
huangxiaofeng10047 pushed a commit to huangxiaofeng10047/flink that referenced this pull request Nov 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants