Use the simdutf library #101

Closed
wants to merge 1 commit into from

lpinca (Member) commented Aug 3, 2021

Here are some benchmarks using the uv benchmark suite:

$ npx envinfo --system

  System:
    OS: macOS 11.5
    CPU: (16) x64 Intel(R) Xeon(R) W-2140B CPU @ 3.20GHz
    Memory: 19.39 GB / 32.00 GB
    Shell: 5.1.8 - /usr/local/bin/bash
$ node bench.js 
Loading https://en.wikipedia.org/wiki/Main_Page ...
uv x 17,911 ops/sec ±0.08% (93 runs sampled)
utf-8-validate (5.0.5, C++) x 110,868 ops/sec ±0.09% (96 runs sampled)
utf-8-validate (simdutf, C++) x 698,016 ops/sec ±0.16% (95 runs sampled)
utf-8-validate (5.0.5, JS) x 10,086 ops/sec ±0.08% (99 runs sampled)
isutf8 x 12,411 ops/sec ±0.43% (97 runs sampled)
------------------------------------------------------------

Loading https://ro.wikipedia.org/wiki/Pagina_principală ...
uv x 7,570 ops/sec ±0.51% (95 runs sampled)
utf-8-validate (5.0.5, C++) x 25,982 ops/sec ±0.46% (95 runs sampled)
utf-8-validate (simdutf, C++) x 160,639 ops/sec ±0.40% (93 runs sampled)
utf-8-validate (5.0.5, JS) x 5,091 ops/sec ±0.32% (96 runs sampled)
isutf8 x 6,293 ops/sec ±0.11% (100 runs sampled)
------------------------------------------------------------

Loading https://ru.wikipedia.org/wiki/Заглавная_страница ...
uv x 7,467 ops/sec ±0.28% (99 runs sampled)
utf-8-validate (5.0.5, C++) x 16,716 ops/sec ±0.41% (93 runs sampled)
utf-8-validate (simdutf, C++) x 193,080 ops/sec ±0.19% (92 runs sampled)
utf-8-validate (5.0.5, JS) x 5,011 ops/sec ±0.08% (98 runs sampled)
isutf8 x 6,298 ops/sec ±0.25% (99 runs sampled)
------------------------------------------------------------

Loading https://ar.wikipedia.org/wiki/الصفحة_الرئيسية ...
uv x 5,784 ops/sec ±0.07% (97 runs sampled)
utf-8-validate (5.0.5, C++) x 12,480 ops/sec ±0.09% (98 runs sampled)
utf-8-validate (simdutf, C++) x 153,133 ops/sec ±0.15% (97 runs sampled)
utf-8-validate (5.0.5, JS) x 4,113 ops/sec ±0.06% (98 runs sampled)
isutf8 x 5,148 ops/sec ±0.09% (98 runs sampled)
------------------------------------------------------------

Loading https://ja.wikipedia.org/wiki/メインページ ...
uv x 10,007 ops/sec ±0.08% (97 runs sampled)
utf-8-validate (5.0.5, C++) x 23,876 ops/sec ±0.09% (94 runs sampled)
utf-8-validate (simdutf, C++) x 225,834 ops/sec ±0.15% (98 runs sampled)
utf-8-validate (5.0.5, JS) x 6,908 ops/sec ±0.09% (98 runs sampled)
isutf8 x 6,832 ops/sec ±0.09% (99 runs sampled)
------------------------------------------------------------

Loading https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt ...
uv x 48,083 ops/sec ±0.08% (98 runs sampled)
utf-8-validate (5.0.5, C++) x 67,217 ops/sec ±0.11% (94 runs sampled)
utf-8-validate (simdutf, C++) x 883,553 ops/sec ±0.11% (100 runs sampled)
utf-8-validate (5.0.5, JS) x 39,180 ops/sec ±0.08% (98 runs sampled)
isutf8 x 39,953 ops/sec ±0.09% (97 runs sampled)
------------------------------------------------------------

Preparing 256B of random ASCII data
uv x 5,368,671 ops/sec ±0.14% (98 runs sampled)
utf-8-validate (5.0.5, C++) x 8,219,664 ops/sec ±0.08% (98 runs sampled)
utf-8-validate (simdutf, C++) x 4,609,830 ops/sec ±0.48% (93 runs sampled)
utf-8-validate (5.0.5, JS) x 3,033,199 ops/sec ±0.10% (95 runs sampled)
isutf8 x 3,000,818 ops/sec ±0.08% (99 runs sampled)
------------------------------------------------------------

Preparing 1KB of random ASCII data
uv x 1,391,086 ops/sec ±0.07% (97 runs sampled)
utf-8-validate (5.0.5, C++) x 5,043,617 ops/sec ±0.07% (94 runs sampled)
utf-8-validate (simdutf, C++) x 4,458,192 ops/sec ±0.42% (93 runs sampled)
utf-8-validate (5.0.5, JS) x 777,434 ops/sec ±0.08% (94 runs sampled)
isutf8 x 773,459 ops/sec ±0.08% (98 runs sampled)
------------------------------------------------------------

Preparing 64KB of random ASCII data
uv x 22,809 ops/sec ±0.07% (98 runs sampled)
utf-8-validate (5.0.5, C++) x 162,204 ops/sec ±0.07% (99 runs sampled)
utf-8-validate (simdutf, C++) x 1,193,138 ops/sec ±0.16% (98 runs sampled)
utf-8-validate (5.0.5, JS) x 12,569 ops/sec ±0.06% (100 runs sampled)
isutf8 x 12,549 ops/sec ±0.10% (99 runs sampled)
------------------------------------------------------------

Preparing 1MB of random ASCII data
uv x 1,428 ops/sec ±0.06% (98 runs sampled)
utf-8-validate (5.0.5, C++) x 10,423 ops/sec ±0.06% (98 runs sampled)
utf-8-validate (simdutf, C++) x 79,672 ops/sec ±0.62% (87 runs sampled)
utf-8-validate (5.0.5, JS) x 785 ops/sec ±0.08% (97 runs sampled)
isutf8 x 784 ops/sec ±0.09% (97 runs sampled)
------------------------------------------------------------

Preparing 4MB of random ASCII bytes
uv x 357 ops/sec ±0.07% (91 runs sampled)
utf-8-validate (5.0.5, C++) x 2,606 ops/sec ±0.13% (98 runs sampled)
utf-8-validate (simdutf, C++) x 8,026 ops/sec ±2.95% (82 runs sampled)
utf-8-validate (5.0.5, JS) x 196 ops/sec ±0.09% (90 runs sampled)
isutf8 x 196 ops/sec ±0.09% (90 runs sampled)
------------------------------------------------------------

Preparing all valid UTF-8 bytes ~4.17 MB
uv x 221 ops/sec ±0.07% (86 runs sampled)
utf-8-validate (5.0.5, C++) x 327 ops/sec ±0.16% (92 runs sampled)
utf-8-validate (simdutf, C++) x 3,220 ops/sec ±0.85% (92 runs sampled)
utf-8-validate (5.0.5, JS) x 147 ops/sec ±0.07% (84 runs sampled)
isutf8 x 146 ops/sec ±0.08% (83 runs sampled)
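
The snippet below turns the two C++ rows of this run into ratios, which makes the comparison easier to read. It computes simdutf ops/sec ÷ 5.0.5 C++ ops/sec for each input. The figures were copied by hand from the output above and were not re-measured. The short input labels and the `speedups` helper are illustrative names and do not come from `bench.js`.

```javascript
'use strict';

// ops/sec transcribed from the Aug 3, 2021 run above:
// [input, utf-8-validate 5.0.5 C++, utf-8-validate simdutf C++]
const aug2021 = [
  ['en.wikipedia', 110868, 698016],
  ['ro.wikipedia', 25982, 160639],
  ['ru.wikipedia', 16716, 193080],
  ['ar.wikipedia', 12480, 153133],
  ['ja.wikipedia', 23876, 225834],
  ['UTF-8-demo.txt', 67217, 883553],
  ['256B ASCII', 8219664, 4609830],
  ['1KB ASCII', 5043617, 4458192],
  ['64KB ASCII', 162204, 1193138],
  ['1MB ASCII', 10423, 79672],
  ['4MB ASCII', 2606, 8026],
  ['all valid UTF-8', 327, 3220],
];

// Returns [input, ratio] pairs; ratio > 1 means the simdutf build is faster.
function speedups(rows) {
  return rows.map(([input, cpp, simdutf]) => [input, simdutf / cpp]);
}

for (const [input, ratio] of speedups(aug2021)) {
  console.log(`${input.padEnd(16)} ${ratio.toFixed(2)}x`);
}
```

In this run the simdutf build is behind only on the two smallest inputs, 256B and 1KB of ASCII. On every larger input it is ahead.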

lpinca (Member, Author) commented Aug 3, 2021

The https://github.com/simdutf/simdutf library includes a lot of features that we don't need. I wonder if it makes sense to fork it and remove everything but UTF-8 validation.
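
The "UTF-8 validation" that would remain after such a fork is the well-formedness check from RFC 3629 and the Unicode Standard. The sketch below shows that check as a plain scalar loop. It does not reproduce simdutf's algorithm, which uses SIMD instructions to examine many bytes at once, and it does not reproduce the code in utf-8-validate. The function name `isValidUTF8Scalar` is hypothetical.

```javascript
'use strict';

// Hypothetical scalar reference: true if `bytes` (Uint8Array/Buffer) is
// well-formed UTF-8 per RFC 3629. Not simdutf's or utf-8-validate's code.
function isValidUTF8Scalar(bytes) {
  const n = bytes.length;
  let i = 0;
  while (i < n) {
    const b0 = bytes[i];
    if (b0 < 0x80) { i += 1; continue; } // ASCII byte

    let len;
    let lo = 0x80; // allowed range for the second byte
    let hi = 0xbf;
    if (b0 >= 0xc2 && b0 <= 0xdf) {
      len = 2;
    } else if (b0 >= 0xe0 && b0 <= 0xef) {
      len = 3;
      if (b0 === 0xe0) lo = 0xa0; // reject overlong 3-byte forms
      else if (b0 === 0xed) hi = 0x9f; // reject UTF-16 surrogates
    } else if (b0 >= 0xf0 && b0 <= 0xf4) {
      len = 4;
      if (b0 === 0xf0) lo = 0x90; // reject overlong 4-byte forms
      else if (b0 === 0xf4) hi = 0x8f; // reject code points above U+10FFFF
    } else {
      return false; // 0x80-0xC1 or 0xF5-0xFF can never start a sequence
    }

    if (i + len > n) return false; // truncated sequence
    const b1 = bytes[i + 1];
    if (b1 < lo || b1 > hi) return false;
    for (let k = 2; k < len; k++) {
      const b = bytes[i + k];
      if (b < 0x80 || b > 0xbf) return false; // continuation byte expected
    }
    i += len;
  }
  return true;
}
```

For example, `isValidUTF8Scalar(Buffer.from('メインページ'))` accepts the input. The overlong two-byte sequence `0xC0 0x80` is rejected.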

lpinca (Member, Author) commented Dec 21, 2022

Here are new benchmarks using the latest version (v2.0.9) of simdutf:

$ npx envinfo --system

  System:
    OS: Linux 5.15 Ubuntu 22.04.1 LTS 22.04.1 LTS (Jammy Jellyfish)
    CPU: (4) x64 Intel(R) Xeon(R) E-2124G CPU @ 3.40GHz
    Memory: 7.04 GB / 7.68 GB
    Container: Yes
    Shell: 5.1.16 - /bin/bash
$ node bench.js
Loading https://en.wikipedia.org/wiki/Main_Page ...
uv x 20,043 ops/sec ±2.18% (90 runs sampled)
utf-8-validate (5.0.10, C++) x 150,863 ops/sec ±1.96% (88 runs sampled)
utf-8-validate (simdutf, C++) x 722,091 ops/sec ±2.27% (89 runs sampled)
utf-8-validate (5.0.10, JS) x 7,781 ops/sec ±2.63% (85 runs sampled)
isutf8 x 9,447 ops/sec ±2.23% (91 runs sampled)
------------------------------------------------------------

Loading https://ro.wikipedia.org/wiki/Pagina_principală ...
uv x 7,220 ops/sec ±2.27% (90 runs sampled)
utf-8-validate (5.0.10, C++) x 31,418 ops/sec ±2.30% (89 runs sampled)
utf-8-validate (simdutf, C++) x 141,224 ops/sec ±2.85% (83 runs sampled)
utf-8-validate (5.0.10, JS) x 3,899 ops/sec ±2.31% (89 runs sampled)
isutf8 x 4,883 ops/sec ±1.66% (93 runs sampled)
------------------------------------------------------------

Loading https://ru.wikipedia.org/wiki/Заглавная_страница ...
uv x 6,581 ops/sec ±2.36% (91 runs sampled)
utf-8-validate (5.0.10, C++) x 15,246 ops/sec ±1.98% (92 runs sampled)
utf-8-validate (simdutf, C++) x 161,621 ops/sec ±2.35% (87 runs sampled)
utf-8-validate (5.0.10, JS) x 4,239 ops/sec ±1.57% (93 runs sampled)
isutf8 x 5,000 ops/sec ±2.33% (87 runs sampled)
------------------------------------------------------------

Loading https://ar.wikipedia.org/wiki/الصفحة_الرئيسية ...
uv x 5,122 ops/sec ±2.18% (87 runs sampled)
utf-8-validate (5.0.10, C++) x 10,690 ops/sec ±3.26% (83 runs sampled)
utf-8-validate (simdutf, C++) x 135,217 ops/sec ±2.14% (89 runs sampled)
utf-8-validate (5.0.10, JS) x 3,387 ops/sec ±2.25% (86 runs sampled)
isutf8 x 4,217 ops/sec ±1.57% (89 runs sampled)
------------------------------------------------------------

Loading https://ja.wikipedia.org/wiki/メインページ ...
uv x 9,242 ops/sec ±2.19% (87 runs sampled)
utf-8-validate (5.0.10, C++) x 26,823 ops/sec ±1.64% (91 runs sampled)
utf-8-validate (simdutf, C++) x 206,865 ops/sec ±2.17% (92 runs sampled)
utf-8-validate (5.0.10, JS) x 5,301 ops/sec ±2.41% (86 runs sampled)
isutf8 x 6,262 ops/sec ±1.83% (88 runs sampled)
------------------------------------------------------------

Loading https://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt ...
uv x 36,083 ops/sec ±2.02% (89 runs sampled)
utf-8-validate (5.0.10, C++) x 61,133 ops/sec ±1.75% (90 runs sampled)
utf-8-validate (simdutf, C++) x 1,019,659 ops/sec ±2.35% (88 runs sampled)
utf-8-validate (5.0.10, JS) x 33,907 ops/sec ±2.40% (87 runs sampled)
isutf8 x 38,335 ops/sec ±1.92% (91 runs sampled)
------------------------------------------------------------

Preparing 256B of random ASCII data
uv x 6,135,673 ops/sec ±2.19% (87 runs sampled)
utf-8-validate (5.0.10, C++) x 19,079,916 ops/sec ±2.71% (84 runs sampled)
utf-8-validate (simdutf, C++) x 22,446,496 ops/sec ±2.51% (87 runs sampled)
utf-8-validate (5.0.10, JS) x 2,415,179 ops/sec ±3.14% (83 runs sampled)
isutf8 x 2,922,265 ops/sec ±2.47% (86 runs sampled)
------------------------------------------------------------

Preparing 1KB of random ASCII data
uv x 1,681,068 ops/sec ±2.39% (94 runs sampled)
utf-8-validate (5.0.10, C++) x 9,660,694 ops/sec ±1.82% (90 runs sampled)
utf-8-validate (simdutf, C++) x 20,633,874 ops/sec ±1.55% (94 runs sampled)
utf-8-validate (5.0.10, JS) x 639,590 ops/sec ±2.32% (89 runs sampled)
isutf8 x 750,446 ops/sec ±2.40% (89 runs sampled)
------------------------------------------------------------

Preparing 64KB of random ASCII data
uv x 27,896 ops/sec ±1.76% (91 runs sampled)
utf-8-validate (5.0.10, C++) x 237,286 ops/sec ±2.28% (87 runs sampled)
utf-8-validate (simdutf, C++) x 1,432,517 ops/sec ±2.92% (85 runs sampled)
utf-8-validate (5.0.10, JS) x 10,386 ops/sec ±1.73% (92 runs sampled)
isutf8 x 11,860 ops/sec ±2.83% (83 runs sampled)
------------------------------------------------------------

Preparing 1MB of random ASCII data
uv x 1,705 ops/sec ±2.07% (89 runs sampled)
utf-8-validate (5.0.10, C++) x 15,098 ops/sec ±1.94% (89 runs sampled)
utf-8-validate (simdutf, C++) x 59,663 ops/sec ±2.41% (84 runs sampled)
utf-8-validate (5.0.10, JS) x 663 ops/sec ±1.28% (91 runs sampled)
isutf8 x 778 ops/sec ±2.10% (90 runs sampled)
------------------------------------------------------------

Preparing 4MB of random ASCII bytes
uv x 438 ops/sec ±1.47% (88 runs sampled)
utf-8-validate (5.0.10, C++) x 3,581 ops/sec ±2.92% (81 runs sampled)
utf-8-validate (simdutf, C++) x 15,316 ops/sec ±2.06% (88 runs sampled)
utf-8-validate (5.0.10, JS) x 165 ops/sec ±0.83% (83 runs sampled)
isutf8 x 198 ops/sec ±1.08% (81 runs sampled)
------------------------------------------------------------

Preparing all valid UTF-8 bytes ~4.17 MB
uv x 132 ops/sec ±0.97% (82 runs sampled)
utf-8-validate (5.0.10, C++) x 251 ops/sec ±1.54% (82 runs sampled)
utf-8-validate (simdutf, C++) x 2,947 ops/sec ±1.82% (91 runs sampled)
utf-8-validate (5.0.10, JS) x 157 ops/sec ±0.99% (77 runs sampled)
isutf8 x 160 ops/sec ±1.02% (80 runs sampled)
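
The snippet below applies the same simdutf ÷ C++ comparison to this run, using figures copied by hand from the output above. It reports the input on which the simdutf v2.0.9 build has its smallest lead over 5.0.10 C++. The `smallestSpeedup` helper and the input labels are illustrative names.

```javascript
'use strict';

// ops/sec transcribed from the Dec 21, 2022 run above:
// [input, utf-8-validate 5.0.10 C++, utf-8-validate simdutf C++]
const dec2022 = [
  ['en.wikipedia', 150863, 722091],
  ['ro.wikipedia', 31418, 141224],
  ['ru.wikipedia', 15246, 161621],
  ['ar.wikipedia', 10690, 135217],
  ['ja.wikipedia', 26823, 206865],
  ['UTF-8-demo.txt', 61133, 1019659],
  ['256B ASCII', 19079916, 22446496],
  ['1KB ASCII', 9660694, 20633874],
  ['64KB ASCII', 237286, 1432517],
  ['1MB ASCII', 15098, 59663],
  ['4MB ASCII', 3581, 15316],
  ['all valid UTF-8', 251, 2947],
];

// Returns the input with the smallest simdutf / C++ ratio.
function smallestSpeedup(rows) {
  let worst = null;
  for (const [input, cpp, simdutf] of rows) {
    const ratio = simdutf / cpp;
    if (worst === null || ratio < worst.ratio) worst = { input, ratio };
  }
  return worst;
}

const worst = smallestSpeedup(dec2022);
console.log(`smallest speedup: ${worst.input} (${worst.ratio.toFixed(2)}x)`);
```

The smallest ratio is still above 1. In this run, the simdutf build is ahead on every input, including the 256B and 1KB ASCII cases where the 2021 run showed it behind. The two runs used different machines (a 16-core Xeon W-2140B on macOS versus a 4-core Xeon E-2124G in a Linux container) and different utf-8-validate versions, so comparisons between the runs are only indicative.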

lpinca (Member, Author) commented Jan 3, 2023

Superseded by #109.

lpinca closed this on Jan 3, 2023.
lpinca deleted the use/simdutf branch on January 3, 2023, 20:16.