-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up parsejson 3.25x (with a gcc-10.2 PGO build) on number heavy input #16055
Closed
Closed
Changes from all commits
Commits
Show all changes
16 commits
Select commit
Hold shift + click to select a range
0e24c5e
Speed up parsejson 3.25x (with a gcc-10.2 PGO build) on number heavy …
c-blake 6cb3375
Make the default mode quite a bit faster with setLen+copyMem instead of
c-blake 46d5f21
NimScript uses `json` and thus `parsejson`, but does not support `cop…
c-blake ad4707c
Gack. I guess one needs defined(nimscript) not nimvm.
c-blake b802eba
Ok. This jsOrVmBlock works for `streams` so it should work here.
c-blake 0e66b44
Fix empty container problem.
c-blake 05a7350
Have `getInt` & `getFloat` use the (also exported) always populated f…
c-blake 136fe92
Fix assertion; `kind = jsonError` here.
c-blake 7fd9287
These two spots should always have used the `get(Int|Float)` abstract…
c-blake 5b98112
Fix round-trip in test being of by just the least significant bit.
c-blake 61f39cd
Fix integer overflow errors. 18 digits is still plenty (IEEE 64-bit
c-blake 822d91c
Oops..fix comment, too.
c-blake 88d9845
This may pass all tests, but it is preliminary to see if, in fact,
c-blake 7a076c2
Remove token iterators as requested by
c-blake 24192b5
Replace public fields with accessors updating comments a bit.
c-blake e8c234f
Oops - correct ``isGiant`` implementation to do what was intended.
c-blake File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is not the proc kinda too big to be
{.inline.}
?. 🤔There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. In the by far most common case it is just one line (two almost always perfectly predicted compare-jumps and one array reference).
It is only used just once, for human mental abstraction. So, there is no duplicated generated code.
Also, people are too sparing of
{.inline.}
in my view. Function call overhead is like 12..15 cycles * 3..6way superscalar =~ 36..90 dynamic instruction slots. So, if duplication is not a concern, that is the the rough amount of work to compare to.If you were really going to browbeat me over inline then you should complaining about
parseNumber
being inline, notpow10
. ;-) {EDIT: oh, wait, that's what you were doing! Sorry! } But many numbers to parse might be just single-digits which take only a few cycles. In fact, the more numbers you have the more likely many are single digits. So, the function call overhead would then be quite large compared to the work done.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing inline buys you here is putting
parseNumber
C code in the same translation unit as the parsing itself. So, the C compilers metrics/heuristics for inlining can at least be applied. Otherwise you rely on LTO which a lot of people do not enable. Given that this whole bit of work was to get really fast speed, it seems appropriate for that reason as well as the "many single digit" cases. If you do PGO builds the compiler will actually try to measure if inlining is good or not. According to the GCC guys that is the main benefit of PGO. I am always getting 1.5x to 2.0x speed ups from PGO with Nim which suggests to me that our{.inline.}
choices are not generally so great (or at least a few critical choices are not...).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and also
parseNumber
is also only called in one place ingetTok
. So, still no extra L1 I-cache style resource hit, and I also added{.inline.}
ongetTok
to get the meat of all this logic into the translation unit of whatever is using the json parser.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, Nim's
{.inline.}
just marks it in the generated C asinline
. So, it's still up to C/C++ compiler code metrics what ultimately happens. Those surely do take into account the statically observable number of expansions/substitutions and their sizes. So, the many calls togetTok
fromjson
are not a real problem. But if anyone really wants me to remove them I can. I always use (and recommend using) PGO when performance matters, myself.