Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make parser 70x times faster? #13

Open
rkeytacked opened this issue Sep 8, 2023 · 3 comments
Open

make parser 70x times faster? #13

rkeytacked opened this issue Sep 8, 2023 · 3 comments

Comments

@rkeytacked
Copy link

Hi, I am working some time now on a really fast ISBN parser/formatter in Java. And for this Java library (https://github.com/creativecouple/isbn-validation-java/) I recently found a way to speed-up the parsing throughput from 6k to 13k ops/milliseconds (meaning just 75 nanoseconds per parse operation).

When trying out this approach for other programming languages, I found your isbn3 NPM library.
Your benchmark script was not able to measure that tiny amount of time correctly, so I put a 1000x loop around the parsing like this:

for (let i=0; i<1000; i++) {
  const data = isbns.map(isbn => parse(isbn))
}

My question is: Are you interested in rewriting your parsing engine?
Otherwise I would try to create a new npm package with that approach.

I compared your latest version of isbn3 to the old npm lib isbn and then my temporary prototype using either of these different imports:

const { parse } = require('..') // your latest version from Github
const { ISBN: { parse } } = require('isbn') // using npm i --no-save isbn (7 years old!!)
const { parse } = require('isbn-validation-js') // using npm i --no-save git://github.com/creativecouple/isbn-validation-java#tmp-javascript-version

This is the result on my machine with these three approaches:

ISBN3

$ npm run benchmark

load module: 5.498ms
parsed 1000x 5640 non-hyphenated ISBNs in: 1:06.333 (m:ss.mmm)
ISBN 978-0-00-443799-6 Group Name English language

ISBN (7 years old!!)

$ npm run benchmark

load module: 1.033ms
parsed 1000x 5640 non-hyphenated ISBNs in: 10.214s
ISBN 978-0-00-443799-6 Group Name English speaking area

my prototype from https://github.com/creativecouple/isbn-validation-java/tree/tmp-javascript-version

$ npm run benchmark

load module: 1.425ms
parsed 1000x 5640 non-hyphenated ISBNs in: 975.009ms
ISBN 978-0-00-443799-6 Group Name English language

So the old isbn package is still faster than your current version, but as you see it is possible to go sub-second for 5,640,000 parsing operations.

@maxlath
Copy link
Member

maxlath commented Sep 11, 2023

Hi!

So for the back-story, this module was forked from isbn2 as I attempted to better understand how it was working. It probably lost some perf optimizations on the way, but was more readable to me like this, and thus easier to maintain and keep open for future changes.

That said, performance boost that don't hinder the readability/maintainability are very welcome! Could you submit a PR with your new approach, ideally without breaking changes, and with some comments on how that works? That big hyphens array is quite cryptic ^^'

@rkeytacked
Copy link
Author

Sure thing, I can try to come up with a PR that won't break the API.

Just one question for my understanding of API compatibility: Does the outcome of the parse operation have to be a plain JS object with all the fields? Or can it be a class object with getter properties instead?
(e.g. let isbn = parse("9781234567890") then isbn.group could be a getter only, which in my case increased the speed of the parse method)

@maxlath
Copy link
Member

maxlath commented Sep 13, 2023

Returning a class object with getters would be a breaking change I think: for instance, passing that object to JSON.stringifiy will ignore the getters. There might be other scenario were that would introduce breaking behaviors, but I'm not so familiar with class objects as I rarely work with them. So I think it would be better to stick to plain JS objects for this PR. I would expect the biggest perf boost to rather come from the optimized hyphen lookup, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants