About the performance #38
Not sure I will get a chance to look at this in the coming days (perhaps even weeks), but here is what I suggest to find the cause: this is either a recent commit, or the newest version of node, or perhaps both. To test the code, bisect recent commits while holding the node version fixed. To test node, run the same commit against older node releases. If someone else has a chance to get to this before I do, the more eyes the better. |
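For anyone bisecting, a minimal timing harness along these lines can make each commit/node pairing quick to compare against JSON. This is a sketch, not the repo's actual bench.js; it only assumes the addon is installed as `msgpack` and exports `pack()`, as node-msgpack does:

```js
// bisect-bench.js -- rough sketch: time msgpack.pack vs JSON.stringify on
// the same object, so runs across commits and node versions are comparable.
var msgpack = require('msgpack');

var obj = { a: 1, b: 'hello', c: [1, 2, 3] };
var n = 1000000;

function time(label, fn) {
  var start = Date.now();
  for (var i = 0; i < n; i++) fn();
  console.log(label + ': ' + (Date.now() - start) + ' ms');
}

time('msgpack pack  ', function () { msgpack.pack(obj); });
time('JSON stringify', function () { JSON.stringify(obj); });
```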
also: #26 |
Just looked at this, and I am going to need a richer testing environment where I can easily run older versions of node and compile older versions of msgpack against them. Does anyone have a node+msgpack version combination where bench.js runs fast? |
lame :( I see this as well |
I merged a pull request into master that uses an O(1) heuristic (based on the native JS msgpack implementation): a 512-level depth check to detect cycles. This sped things up quite a bit, but the native JSON functions are still almost twice as fast, so we still have more work to do. I'm doing a few more things, then I will version-bump msgpack to 0.2.x |
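The idea behind that heuristic, sketched below in JS (the real check lives in the addon's C++ pack path; `MAX_DEPTH` and `packValue` here are illustrative names, not the actual API), is to stop tracking visited objects entirely and instead bail out once recursion passes a fixed depth: a legitimate object graph rarely nests 512 levels deep, while a cycle recurses forever.

```js
var MAX_DEPTH = 512; // assumed cutoff, per the merged heuristic

function packValue(value, depth) {
  // A cycle would recurse without bound; a depth cap turns the O(n)
  // "seen set" lookup into an O(1) counter check per node.
  if (depth > MAX_DEPTH) {
    throw new Error('object is too deep (or contains a cycle)');
  }
  if (Array.isArray(value)) {
    return value.map(function (v) { return packValue(v, depth + 1); });
  }
  if (value !== null && typeof value === 'object') {
    var out = {};
    for (var k in value) out[k] = packValue(value[k], depth + 1);
    return out;
  }
  return value; // primitives: nothing to recurse into
}
```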
Here are the new ratios:

msgpack pack: 3398 ms
msgpack pack: 3549 ms
msgpack pack: 3352 ms |
looks like the node JSON code is still ~4 times as fast. |
@godsflaw I'm looking to implement msgpack in the near term. Is there anything I can help with? Do you have specific optimizations in mind that you haven't had time to implement? |
@chakrit, first I wanted to say that this entire discussion is about the performance of node-msgpack relative to v8's JSON serialization functions. Most of my tests have been with 1 million objects, and while they are slower than the JSON serialization functions, they are still pretty fast. I have to do around 40 million object packs a day, which means I spend 196 seconds packing objects each day, where I could spend ~40 seconds if I used JSON. We are clearly in the domain of diminishing returns for my use case, not to mention msgpack is a MUCH more compact representation going across the wire. I have not even cracked open the unpack() routine to see if there is low-hanging fruit there; it might be worth a look. In pack(), there are really two cases: (1) packing many small objects, one call per object, and (2) packing one very large object in a single call.
In case 1, we pay a very large penalty every time we cross the boundary between node.js's javascript context and v8's C++ context. I looked at the node.js JSON implementation, and AFAICT it is all native JS. This was very surprising at first, but I've come to the conclusion that it has to be the memory allocations we perform when starting a pack(). I gained us some performance here by caching previous allocations, but short of allocating one large memory pool and performing a 0-copy (in-place) pack, I don't see any other obvious optimizations. When I looked at msgpack-c, I didn't see an obvious 0-copy interface either, but I didn't look too hard. In case 2, pack() runs twice as fast on a very large object. This is because we cross into C++ once and then stay in that context for the entire pack() run. A sketch contrasting the two cases follows.
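A rough sketch of that contrast (object shapes and counts here are hypothetical; it only assumes node-msgpack's `pack()` export): the per-call boundary cost shows up when the same total data is packed as a million small calls versus one call on a million-element array.

```js
var msgpack = require('msgpack');

var small = { id: 1, name: 'x' };  // case 1: many small packs
var big = [];                      // case 2: one large pack
for (var i = 0; i < 1000000; i++) big.push({ id: i, name: 'x' });

var t1 = Date.now();
for (var j = 0; j < 1000000; j++) msgpack.pack(small); // crosses JS->C++ 1M times
console.log('1M small packs: ' + (Date.now() - t1) + ' ms');

var t2 = Date.now();
msgpack.pack(big);                                     // crosses JS->C++ once
console.log('1 big pack:     ' + (Date.now() - t2) + ' ms');
```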
As you can see above, pack() does really well; however, unpack() on an object of this size is brutally slow, arguably making those gains in pack() useless (at least when unpacking the resulting object in node.js). The numbers were so bad that I removed that test from the benchmark script to stop users from dying of boredom waiting for it to complete. If you, or anyone else, would like to take a shot at improving performance, you can start with the benchmark test.
Good Luck! |
@godsflaw thanks for the extensive post! : ) |
Just to document some of the performance testing: I focused on parsing messagepack. As a simple example, I took an Array containing the String "Hallo Welt!" ("Hello World!") 1 million times and processed this 20 times.

```cpp
// Build a 1M-element Array of short Strings (old V8 API, node 0.8/0.10 era).
Handle<Value> CreateObject(const Arguments& args) {
  const int max = 1000000;
  HandleScope scope;
  Local<Array> a = Array::New(max);
  for (int i = 0; i < max; i++)
    a->Set(i, String::New("Hallo Welt!", 11));
  return scope.Close(a);
}
```
However, I found that this alone almost eats up all the time, just for creating Strings and placing them into an array. As a node.js beginner I always had the mindset that IO is the expensive operation, but reading the file is, thanks to SSDs and large buffers, less than 5-10% of the total. I checked the JSON.parse source (hope this is the right place): while the parsing surely takes a little more time compared to messagepack, it seems quite efficiently programmed, so overall I have the feeling that this makes very little difference. However, Google's engineers use a Factory class to generate all the Strings/Objects/Arrays etc., and it lives in namespace V8::internal; I am not sure this should be accessed by AddOns. In newer releases of V8 I also found other factory methods, e.g. NewFromUtf8 in node 0.11.x. To check a little more, I wrote my own minimal messagepack parser (just supporting Array and String :-) ) in plain node.js JavaScript using a node.js Buffer, and it is almost on par with the C++ implementation of this addon (a little slower over 20 executions of parsing 1M strings). I ran into several very odd V8 object-creation behaviors along the way. Just to show how slow the current state is, I checked the messagepack implementation for Ruby (GitHub: https://github.com/msgpack/msgpack-ruby) and performed exactly the same test. It parses the same number of entries (20 times 1 million) in less than 1000ms (over 4 times faster). That is a C extension as well, but it illustrates that there is a lot of room for improvement.

```js
var position = 0; // read cursor into the buffer, shared by the functions below

function process_array(buffer, length) {
  // Pre-size the result (capped, per the linked StackOverflow thread) and
  // fill by index instead of push(), to avoid repeated reallocation:
  // http://stackoverflow.com/questions/16961838/working-with-arrays-in-v8-performance-issue
  var result = new Array(Math.min(99999, length));
  var i = 0;
  while (length--) result[i++] = parse(buffer);
  return result;
}

function parse(buffer) {
  var format = buffer.readUInt8(position);
  var length;
  if (format >= 0xa0 && format <= 0xbf) {        // fixstr: length in the low 5 bits
    length = format - 0xa0;
    format = 0xa0;
  } else if (format >= 0x90 && format <= 0x9f) { // fixarray: length in the low 4 bits
    length = format - 0x90;
    format = 0x90;
  }
  switch (format) {
    case 0xa0: // fixstr
      position++;
      var str = buffer.toString('utf8', position, position + length);
      position += length;
      return str;
    case 0xdc: // array 16: big-endian uint16 length after the format byte
      length = buffer.readUInt16BE(position + 1);
      position += 3;
      return process_array(buffer, length);
    case 0xdd: // array 32: big-endian uint32 length after the format byte
      length = buffer.readUInt32BE(position + 1);
      position += 5;
      return process_array(buffer, length);
    case 0x90: // fixarray
      position++;
      var result = [];
      while (length--) result.push(parse(buffer));
      return result;
    default:
      throw new Error('Messagepack format 0x' + format.toString(16) + ' not yet implemented!');
  }
}
``` |
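As a quick check that the parser above round-trips real msgpack bytes, here is a hand-assembled buffer (a fixarray of two fixstrs, encoded per the msgpack spec) fed through parse(); the `new Buffer()` constructor matches the node 0.8-era API used in this thread:

```js
// [0x92] fixarray(2), then two fixstr(5) entries: "Hallo", "Welt!"
var packed = new Buffer([
  0x92,
  0xa5, 0x48, 0x61, 0x6c, 0x6c, 0x6f, // "Hallo"
  0xa5, 0x57, 0x65, 0x6c, 0x74, 0x21  // "Welt!"
]);

position = 0;               // reset the shared cursor before each parse
console.log(parse(packed)); // => [ 'Hallo', 'Welt!' ]
```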
I've had a very hard time getting things optimised in node.js with C++, and generally similar experiences to the above. I suspect some of it might be down to a lack of understanding of C++, software architecture, and the API on my part; however, some of it may also be due to a lot of overhead in the API and the lack of direct (risky but profitable) internal access. I find it hard to make things faster even with larger data objects. For example, I implemented a structured data analyser in JS (originally written in PHP), then tried to port it to C++, with small and big inputs to measure overhead:

js/c++ small took 7.28s

By comparison, the PHP version looked like this:

php/c small took 2.0803050994873s

This is not msgpack, but it involves a lot of Object/Primitive creation as well as Object iteration. It makes me think that maybe there are trickier issues to consider in the way V8's plugin system is designed. I am also sure there is definitely a lot more room for improvement in node.js native implementations. It's really hard to make things even a little bit faster; perhaps the API is too high level. |
I just ran bench.js with msgpack-js, and the results differ from README.md: JSON is much faster than msgpack. Node version is v0.8.16.