A cornucopia of optimisations #98
base: legacy
Conversation
Instead of:

- function call
- conditional check for alphabet existence
- dictionary lookup
- dictionary lookup

...the new version only does one dictionary lookup. Should lower overhead.
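For illustration, a minimal sketch of the before/after (patterned on the old `getBaseValue` helper; the exact PR code may differ):

```js
// Before: every decoded character pays for a function call, an existence
// check on the alphabet, and two nested dictionary lookups.
var alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
var baseReverseDic = {};
function getBaseValue(alphabet, character) {
  if (!baseReverseDic[alphabet]) {             // conditional check, per call
    baseReverseDic[alphabet] = {};
    for (var i = 0; i < alphabet.length; i++) {
      baseReverseDic[alphabet][alphabet.charAt(i)] = i;
    }
  }
  return baseReverseDic[alphabet][character];  // two lookups
}

// After: build the reverse dictionary once, up front...
var reverseDict = {};
for (var i = 0; i < alphabet.length; i++) {
  reverseDict[alphabet.charAt(i)] = i;
}
// ...so the per-character work in the hot loop is a single lookup:
var value = reverseDict["Q"];
```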
…s tweaks

To avoid creating a string, we introduce two new functions, `compressToArray` and `decompressFromArray`. The former can easily be wrapped by `compress` to minimize code duplication. Sadly, `decompressFromArray` isn't wrapped so easily. To alleviate this somewhat, we introduce the private `_chrToOutput` function. Note that it's <500 characters before minification, meaning that V8 will likely inline it anyway.
Slightly faster on most browsers.
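A minimal sketch (not the actual PR code) of how `compress` can wrap `compressToArray`:

```js
var LZString = {
  compressToArray: function (uncompressed) {
    var output = [];
    // ...the real LZW loop pushes output characters onto `output`...
    return output;
  },
  compress: function (uncompressed) {
    // Materialise the string once at the very end instead of appending
    // to an accumulator string on every token.
    return LZString.compressToArray(uncompressed).join('');
  }
};
```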
- `compressToBase64` avoids string append
- `compressToUint8Array` avoids intermediate string
- `decompressFromUint8Array` avoids intermediate string

As a side-effect, the UTF16 functions get slightly more complicated, but the only alternative was a *lot* of code duplication.
Some benchmarks (screenshots elided: plain compress/decompress, Base64, UTF16, URI, Uint8): I'd say performance is... probably a bit better, but almost negligible, within the margin of error. Given the changes to the code there must be less allocation going on, so that must have improved a bit at least. I had them all in one big benchmark, but then I tried to profile them using DevTools. Please check if the input data is relevant - I basically took the test string (the one about tattoos) and tripled it, causing a lot of redundancy.
I was recently optimising some startup code in an app, and found that the GC could actually have a significant impact on things (up to 200ms for 16MB) - but as it's not something easily forced, it's hard to get consistent profiling to include that unless running a significantly longer test. TL;DR: Reducing the need for GC is always good ;-)
The main issue is that it's not easy to see if it's significant in the larger scheme of things - perhaps all computation and memory allocation happens in the core algorithm that I didn't touch. I think I should set up a plain website with this code, run it through both functions (let's just use the plain compress/decompress as a baseline) with a yuuuge string repeatedly, while running the browser performance tools to keep track of allocations and such. That might give a better overview.
Deoptimization is triggered by various patterns (some links: Deoptimization in V8, Optimization-killers). In your tests, some of the functions appear to get deoptimized.
Hmm, time to dive into this a little deeper and see if we can fix that. Still on sick leave anyway so this is a fun side-track :)
The thing I was referring to is that compress is passed a function, which itself is usually a closure. I suspect that level of dynamism prevents some deeper optimisations. This is pretty old, so probably somewhat outdated, but I'll look into it later: http://mrale.ph/blog/2012/09/23/grokking-v8-closures-for-fun.html
Ah, indeed, no deopt on any of the functions if I test in devtools manually via copy-paste, a primitive 100-rep loop, and console.time. The difference is negligible just the same. FWIW, when switching to ES6 Map and Set in lz-string I clearly see the advertised 2x gain.
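Presumably the switch looks something like this (a rough sketch under my own assumptions, not tophf's actual code):

```js
// Phrase -> code lookups move from plain objects to a Map, and the
// "characters seen for the first time" bookkeeping becomes a Set.
var dictionary = new Map();
var dictionaryToCreate = new Set();

function studyChar(c, dictSize) {
  if (!dictionary.has(c)) {      // replaces hasOwnProperty-style checks
    dictionary.set(c, dictSize++);
    dictionaryToCreate.add(c);   // replaces dictionaryToCreate[c] = true
  }
  return dictSize;
}
```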
Related to that, I just realised there's another optimisation possible for the Base64 and URI-safe compressors: don't use string keys for the reverse lookup, use integer character codes instead.

```js
// SETUP
var base64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
var charDict = (function (){
var dict = {};
for (var i = 0; i < base64.length; i++){
dict[base64.charAt(i)] = i;
}
return dict;
})();
var charCodeDict = (function () {
var dict = {};
for (var i = 0; i < base64.length; i++){
dict[base64.charCodeAt(i)] = i;
}
return dict;
})();
var randomString = (function (){
var i = 100000, strArray = new Array(i);
while(i--){
    strArray[i] = base64.charAt((Math.random() * base64.length) | 0); // Math.random() takes no arguments
}
return strArray.join('');
})();
var converted = 0;
//BENCHMARKS
// charDict
for (var i = 0; i < randomString.length; i++){
converted = charDict[randomString.charAt(i)];
}
// charCodeDict
for (var i = 0; i < randomString.length; i++){
converted = charCodeDict[randomString.charCodeAt(i)];
}
```

http://jsbench.github.io/#a5c234621b81cd41b26e31d7f92f62d4

On my machines integer key lookups are almost 3x faster. For Base64 and URI this is a drop-in replacement because the alphabet is fixed. I'll look up the LZW algorithm, read the current implementation of it, and see if there are more places where we can use integer-keyed lookups.
@tophf: can you show me what the faster Map/Set-based version looks like?

Anyway, small correction: for Base64 and URI decompression this is a drop-in replacement. So looking into this a bit further, there are two different ways we can apply this change: returning an array of character codes (converting to a string only at the end), or keeping the current string-based output.
https://gist.github.com/tophf/e8962e43efe35233212cf04d8d7cd317 - 2x speedup compared to the non-minified original version, as tested on compressToUTF16/decompressFromUTF16 in modern versions of Chrome. Didn't test other functions or other browsers. The measurements were performed in real code that compressed a megabyte or so of HTML.

The stack is still used for the arguments passed via .apply, so anything larger than 32k arguments is a risk depending on the browser. And even that is a risk, so I usually convert in 1k chunks, then join them.
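A sketch of that chunking (the 1k chunk size is from the comment above; the exact safe threshold varies by engine):

```js
// String.fromCharCode.apply passes the whole array on the stack, so
// convert in 1k-code-unit slices and join once at the end.
function codesToString(codes) {
  var CHUNK = 1024;
  var parts = [];
  for (var i = 0; i < codes.length; i += CHUNK) {
    parts.push(String.fromCharCode.apply(null, codes.slice(i, i + CHUNK)));
  }
  return parts.join('');
}

codesToString([72, 105]); // "Hi"
```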
Thanks, pretty straightforward. I guess the reason they're not faster in my benchmarks is because I'm limiting myself to 64 single-character keys. Still, the Map approach looks worth pursuing. (I wonder if @pieroxy is on holiday or something, and will come back thinking "I look away for one second and then this happens?")
Ooof... that sounds like performance would fluctuate wildly among browsers and hardware. I'll focus on the other possible optimisations first.
Some more thoughts on that de-opting:

```js
let factory = function(){
return function(){
return 0;
}
};
let a = factory();
let b = factory();
let c = a;
a === b; //false
a === c; //true
```

Right now we pass new functions on every call to any compressor/decompressor. If we hoist them, we can maybe make things easier for the JIT compilers. However, the current set-up requires a fresh closure because the passed function captures `compressed`:

```js
//decompress from uint8array (UCS-2 big endian format)
decompressFromUint8Array:function (compressed) {
if (compressed===null || compressed===undefined){
return LZString.decompressFromArray(compressed);
} else if (compressed.length == 0){
return null;
}
return LZString._decompress(compressed.length, 128, function (index) { return compressed[index]; });
},
```

That would require a bit of rewriting too.
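A hedged sketch of the hoisting: define the accessor once at module level instead of allocating a new closure on every call. This forces a (hypothetical) signature change so `compressed` is threaded through as an extra argument rather than captured.

```js
// Hoisted once; the same function object is reused for every call, so the
// JIT sees one stable callee instead of a fresh closure each time.
function uint8Accessor(compressed, index) {
  return compressed[index];
}

// Hypothetical call site - _decompress would need to accept the data and
// pass it back into the accessor on each read:
// LZString._decompress(compressed.length, 128, uint8Accessor, compressed);
```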
First of all, thanks for all the work here. I am eager to see the end result :-) Just as a side note, be careful when you advertise a 2x increase. It may be so in your browser (let's say Chrome) but then the same code might actually be slower in IE or Firefox. When I did a pass at perf optimisations back in the day, I probably created 25 jsperf benchmarks and had many surprises on that front. That said, all this looks promising.
Right, I should have been clearer: when I talked about 2x/3x speed-ups I was only referring to the micro-benchmarks on (effectively) string vs integer lookup, which is known to be a relatively recent optimisation in browser engines. It doesn't reflect on the whole algorithm or performance across all browsers. OTOH, the expected lower memory overhead of removing string concatenation should hold everywhere. At the moment I'm still trying to decipher what you do in the algorithm - specifically lines 137-204. It's barely commented, so I only have your high-level description from the main page to go by.

(Once it clicks for me I'm going to add a modified version of the paragraph below as documentation comments in the code for later maintainers.)
So I figured out a simple way to do in-browser line-by-line performance testing: inline everything into a single function and use the DevTools profiler (screenshots elided). First bottleneck I found: -1 and -2 indexing is slow after all. Then I realised "Hey, why not just use a +2 offset on the keys?" Again: I didn't refresh the same number of times, so you can't compare screenshots directly. So then I tried the latest version vs the old version on JSBench, with the profiler on to measure memory use, and using a tripled 'During tattooing...' test string from the tests. Can you spot the point in the graph where it goes from old to new? Anyway, moving on to the next bottleneck.
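A hedged sketch of what that "+2 offset" means (my reconstruction, not the PR code): keys -1 and -2 push a JS array out of fast contiguous element storage, so shifting every key up by 2 keeps all indices non-negative.

```js
var OFFSET = 2;
var charCode = 97; // 'a', for illustration
var node = [];
node[-1 + OFFSET] = 1;        // was node[-1]: bookkeeping slot
node[-2 + OFFSET] = 2;        // was node[-2]
node[charCode + OFFSET] = []; // ordinary child entries shift up too
```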
So just to add to pieroxy's remark that this is complicated, I did some in-between benchmarking on Chrome and Firefox. I'm using all the test strings except Hello World, so:
http://jsbench.github.io/#e25c445d987295b0114407e457dde9ad

EDIT: Split out the short-string case, because it was distorting the graph by being one to two orders of magnitude faster than the rest.

On Chrome, the picture is complicated: the new code is slower in quite a few situations, but for long strings with some repetition (1000 random floats) or lots of repetition ('aaa...') it's faster. On Firefox the picture is a lot rosier, not to mention generally a lot more performant: it's always better, and gets better as strings have more repetition and get longer. The UTF16 example is actually really revealing, since it is mostly about the overhead of filling up a dictionary. I really hope I can optimise it further so it has either zero regressions in Chrome, or only in insignificant contexts (like really short strings).

EDIT: For mobile, the improvement is a lot clearer. On Chrome Mobile for Android, everything is faster; on Firefox Mobile for Android, short strings are slower, everything else is faster. (I'm using an autostitcher to put these screenshots together, hence some of the weirdness.)
Yes, this is pre- and post-fix increment/decrement abuse. But we're being low-level enough here that I think we can handle it.
Replace the StringStream object with global variables. Optimize array generation. About 5% smaller minified file. About twice as fast when compressing large files.
Rewrite for smaller size.
Indeed, as I have suggested. The minified version is much faster (at least in Chrome). Well done.
This progress is amazing! I'm using it like this:

```js
export const pack = str => {
// ...
// Compress it only for relatively small strings, since compression is O(N)
// so it takes too long for large strings
if (str.length < 100 * 1000) {
str = lz + LZString.compressToUTF16(str);
}
return str;
};
```
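For symmetry, a hypothetical unpack counterpart (not from the thread; it assumes `lz` is the same marker-prefix constant and `LZString` the same import elided in the snippet above):

```js
export const unpack = str => {
  // The `lz` prefix added in pack() tells us whether this string was
  // compressed at all; short strings pass through untouched.
  if (typeof str === 'string' && str.startsWith(lz)) {
    return LZString.decompressFromUTF16(str.slice(lz.length));
  }
  return str;
};
```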
@franciscop: for the record, I wrote an async version - it's slower but non-blocking |
That's pretty cool. Unfortunately that would imply a full rewrite of my library and changing the API radically, since this would no longer be possible:

```js
import { local } from 'brownies';
local.id = 10;
console.log(local.id);
```

I'm using a Proxy internally for that setter, and a setter cannot be awaited... (I could set a local cache and sync in the background, but that's a totally different can of worms.)
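A minimal standalone sketch of why the setter can't await (my illustration, not the brownies source): a Proxy `set` trap must return a boolean synchronously, so there is nowhere to park a Promise from an async compressor.

```js
const local = new Proxy({}, {
  set(target, key, value) {
    target[key] = value;           // must complete synchronously
    // await compressAsync(value)  // <- not possible: the trap's boolean
    return true;                   //    return value can't wait on a Promise
  },
  get(target, key) {
    return target[key];
  }
});

local.id = 10;         // fires the trap synchronously
console.log(local.id); // 10
```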
Any idea when this is going to be merged to master and a 2.0 release made available?
Use asm.js-style integer optimizations to further increase speed by a few percentage points. Both Chrome and Firefox have considerations for integer optimizations. Observe the benchmark below.

```js
requestIdleCallback(function(){
for (var i=0; i<16777216; i=i+1|0) void 0; // warm up
const console = window.console;
function runWithoutSpeed(){
console.time("SlowerVersion");
for (var i=-4194304,j=4194304; i < j; i+=2, j-=2) void i + j;
console.timeEnd("SlowerVersion");
}
function runOptimizedTest() {
console.time("OptimizedTest");
for (var i=-4194304,j=4194304; i < j; i=i+2|0, j=j-2|0) void (i+j|0);
console.timeEnd("OptimizedTest");
}
runOptimizedTest();
runWithoutSpeed();
}, {timeout: 11});
```

To do integer optimizations, you must explicitly cast the number back to an integer after every pass through function arguments, after every addition/subtraction chain, and after every division. Also, avoid multiplication wherever possible; use bit-shifts instead.
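A small illustration of those rules (my example, following the guidance above): every arithmetic result is pinned back to int32 with `|0`, and the multiply-by-8 becomes a shift.

```js
function sumTimes8(a, b) {
  a = a | 0;           // cast incoming arguments
  b = b | 0;
  var s = (a + b) | 0; // cast after the addition chain
  return (s << 3) | 0; // * 8 as a bit-shift, no multiplication
}
```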
Output in my Chrome -

Output in my Firefox:

Indeed, integer optimizations do make JavaScript faster.
That's not really asm.js, though. I actually was trying to clean up the PR (locally, on my machine) a month ago, and finally got around to implementing an unsafe variant that made use of typed arrays to see if that was even faster. It was. However, in the process I discovered another bug in the new version, and I haven't managed to fix it in the meantime.
"Proper" asm.js code is impossible to write for these purposes because we are working with strings which are foreign to asm.js. 👍 Nevertheless, integer optimizations do exist. I am very glad to hear that the Uint32Array version is much faster, but I know that it could be a lot faster if we put integer optimizations into both the loops and variables surrounding the Uint32Arrays. Although integer optimizations have rather little consequence on loops, their effect is much more pronounced on typed arrays. For example, If you could please post your LZStringUnsafe32, I could take a look at it for bugs and optimizations. Please explain to me the bug that you are having and please at least give me a chance to take you on a tour of the dark side of Javascript numbers. |
I take it that this PR isn't getting merged anytime soon, but is the code in JobLeonard/lz-string safe to download and use directly? Or should I stick with the version on npm/the main branch here?
Yeah, sorry, the truth is that I'm just terrified of breaking something because I never published a package to NPM before and this is a package that still gets used in tons of places. But that isn't really about the code. In that regard it passes all existing tests, should be perfectly backwards compatible, and Stylus uses it in their code-base (or at least used to, I don't know if they still do) and afaik it hasn't given them any trouble.
Maybe set it as version 2, make it clear that it has the potential to be a breaking change? Or just publish it under @JobLeonard/lz-string?
Releasing it as a major version upgrade should be enough to let everyone know that there MAY be breaking changes. Semantic versioning and all that.
If everyone is happy for this to be released, then I can do that as v2.0.0
With regard to the minification process: this doesn't seem to be documented anywhere. Should we not be doing that as part of a release process? Normally, accepting compressed files without ensuring they are based upon the uncompressed ones is a bad idea.
I think I just threw it through a minifier, but yeah, that probably should be standardized somehow.
There is a PR on here that relates to more pipeline work. Ideally someone with JS skills should come and look at it. I'm not that person. I work as part of a team that uses this library.
Back in the day I threw the main JS file through a minifier by hand as well.
I feel that the release process should be a different PR to this - also that this should be merged, then followed by an almost immediate conversion to TypeScript in another PR (straight conversion, no code changes - so not updating #123) - using perhaps https://www.npmjs.com/package/microbundle (I used to use TSDX, but that's been abandoned for a couple of years) - then a further PR to add CI (Travis or GitHub Actions?) test, build, and release :-)
There's a lot of good work on this PR! Merging this and releasing a v2 should be enough to not break anything 🙂
This combines various suggestions from @Rycochet and @lishid and myself. See #88 and #97.

- Replace `keyStr.charAt(i)` with array-based `keyStr[i]`
- Add `compressToArray` and `decompressFromArray` functions, and use them throughout the code to avoid intermediate string creation:
  - `compressToBase64` avoids appending to a string
  - `compressToUint8Array` avoids an intermediate string
  - `decompressFromUint8Array` avoids an intermediate string
- Similar optimisations have been applied to `base64-string.js`, but it doesn't have tests so I can't say for sure if it's bug-free.