
Init benchmark tests #10

Merged: 8 commits merged into GoogleCloudPlatform:master on May 29, 2018

Conversation

@ZhouyihaiDing (Contributor) commented Apr 27, 2018

It only has LoggingClient for now.
For the Logging API there are 3 tests: send gRPC RPCs, send REST RPCs, and send gRPC/REST RPCs alternately (cross test).

run_test.sh may later be changed to run_test.py with arguments like --benchmark_time, --warm_up_time, project_id, and database_id once Spanner/Bigtable benchmarks are added.
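A minimal sketch of the per-RPC latency loop such a test typically uses (the variable names, 1-byte payload, 60 s window, and the transport option are assumptions here, not necessarily the PR's actual code):

```php
<?php
// Sketch only: measure per-RPC latency of the Logging client over gRPC.
require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\Logging\LoggingClient;

$logging = new LoggingClient([
    'projectId' => 'my-project',   // assumed placeholder
    'transport' => 'grpc',         // switch to 'rest' for the REST test
]);
$logger = $logging->psrLogger('benchmark-log');

$latencies = [];
$deadline  = microtime(true) + 60; // assumed 60 s benchmark window
while (microtime(true) < $deadline) {
    $start = microtime(true);
    $logger->info('a');            // 1-byte payload, as in the initial tests
    $latencies[] = microtime(true) - $start;
}

printf("qps: %f\n", count($latencies) / 60);
```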

@tmatsuo, please review this PR. Thank you!

@ZhouyihaiDing (Contributor, Author) commented Apr 30, 2018

Since network conditions vary a lot over time, the results from the cross tests make more sense.
Below are the results for gRPC v1.11.0 (installed via PECL). @wenbozhu
gRPC test:

gRPC transport qps: 4.425
gRPC latency for first RPC:1.0586969852448
stats result for gRPC
Ave latency :0.22687726640432
50% latency :0.21439814567566
80% latency :0.24291396141052
90% latency :0.29322695732117
99% latency :0.47512602806091

REST test:

rest transport qps: 4.425
rest latency for first RPC:0.90933418273926
stats result for REST
Ave latency :0.22676274466649
50% latency :0.22206497192383
80% latency :0.24648594856262
90% latency :0.26881194114685
99% latency :0.34884285926819

Cross test:

gRPC transport qps: 4.434249353681
gRPC latency for first RPC:0.80208897590637
stats result for gRPC
Ave latency :0.22551731313213
50% latency :0.21493792533875
80% latency :0.24008893966675
90% latency :0.27039813995361
99% latency :0.47210907936096

rest transport qps: 4.4087188720263
rest latency for first RPC:1.0721271038055
stats result for REST
Ave latency :0.22682326295403
50% latency :0.22026300430298
80% latency :0.24400186538696
90% latency :0.26301503181458
99% latency :0.41032218933105

For gRPC v1.12.0-dev (current master, built from source):

gRPC transport qps: 7.9795510602788
gRPC latency for first RPC:1.0354900360107
stats result for gRPC
Ave latency :0.12532033349318
50% latency :0.11441683769226
80% latency :0.14502501487732
90% latency :0.1809549331665
99% latency :0.32968306541443

rest transport qps: 7.4318673503476
rest latency for first RPC:0.94629192352295
stats result for REST
Ave latency :0.13455568471001
50% latency :0.11412882804871
80% latency :0.14315915107727
90% latency :0.20014405250549
99% latency :0.4290292263031
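
For reference, the latencies above are in seconds; percentile stats in this form can be derived from an array of latency samples roughly as follows (a sketch, not the benchmark's actual code):

```php
<?php
// Sketch: compute average and percentile latencies from an array of samples (seconds).
function print_stats(string $label, array $latencies): void
{
    sort($latencies);
    $n = count($latencies);
    printf("stats result for %s\n", $label);
    printf("Ave latency :%s\n", array_sum($latencies) / $n);
    foreach ([50, 80, 90, 99] as $p) {
        $idx = (int) floor($p / 100 * ($n - 1));   // nearest-rank style index
        printf("%d%% latency :%s\n", $p, $latencies[$idx]);
    }
}

// Usage: print_stats('gRPC', $grpc_latency_array);
```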

@ZhouyihaiDing force-pushed the benchmark branch 3 times, most recently from 4096ca3 to e18c8f8 on April 30, 2018 at 22:44
@tmatsuo commented Apr 30, 2018

@ZhouyihaiDing Thanks for this! I'll take a look today or likely tomorrow!

@ZhouyihaiDing force-pushed the benchmark branch 2 times, most recently from 7df526b to 37e7dd1 on May 1, 2018 at 01:06
@ZhouyihaiDing (Contributor, Author) commented May 1, 2018

I also got the latency results for the first RPC under PHP-FPM mode.

gRPC
Ave latency :0.26756258964539
50% latency :0.24290800094604
80% latency :0.28287696838379
90% latency :0.32798409461975
99% latency :1.1715950965881
REST
Ave latency :0.31379820585251
50% latency :0.29318499565125
80% latency :0.33797407150269
90% latency :0.42953395843506
99% latency :0.56754994392395

first RPC under PHP-CLI mode:

gRPC
Ave latency :0.48415014028549
50% latency :0.43862795829773
80% latency :0.6188747882843
90% latency :0.66285610198975
99% latency :0.96738314628601
REST
Ave latency :0.48467254638672
50% latency :0.45104002952576
80% latency :0.60308599472046
90% latency :0.64896297454834
99% latency :0.76410698890686

@tmatsuo left a comment

Can you also add numbers for the end-to-end latency, from initializing the client to the actual RPC?

This is specific to PHP: everything starts from scratch for every request, so that number matters more.
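
A sketch of what measuring that end-to-end path could look like, timing client construction together with the first RPC (class and option names are assumptions based on google/cloud, not the PR's code):

```php
<?php
// Sketch: end-to-end latency = client creation + first RPC, as one sample.
require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\Logging\LoggingClient;

$start = microtime(true);

$logging = new LoggingClient([
    'projectId' => 'my-project',   // assumed placeholder
    'transport' => 'grpc',         // or 'rest'
]);
$logger = $logging->psrLogger('benchmark-log');
$logger->info('a');                // first RPC

printf("end-to-end first-RPC latency: %f s\n", microtime(true) - $start);
```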

@ZhouyihaiDing (Contributor, Author) replied:

First-RPC latency including client creation:
PHP-FPM

stats result for gRPC
Ave latency :0.25985748529434
50% latency :0.24248814582825
80% latency :0.27608799934387
90% latency :0.35205411911011
99% latency :0.63452410697937

stats result for REST
Ave latency :0.32368396997452
50% latency :0.29551482200623
80% latency :0.35474896430969
90% latency :0.45385408401489
99% latency :0.77281308174133

CLI

stats result for gRPC
Ave latency :0.57939051866531
50% latency :0.56095004081726
80% latency :0.68756103515625
90% latency :0.71235203742981
99% latency :0.7827308177948

stats result for REST
Ave latency :0.58826038360596
50% latency :0.62253403663635
80% latency :0.6954550743103
90% latency :0.71564102172852
99% latency :1.8141598701477

@@ -0,0 +1,5 @@
{
"require": {
"google/cloud": "^0.56.0"

@tmatsuo commented on composer.json:

Does it make sense to have these requirements?

  "ext-json": "*",
  "ext-protobuf": "*"

or something like that?

@ZhouyihaiDing (Contributor, Author) replied:
Done. Thanks!

@tmatsuo commented May 1, 2018

@ZhouyihaiDing Thanks! Give me one more day

@tmatsuo commented May 1, 2018

Oh, does it also make sense to batch 1000 messages?

@ZhouyihaiDing (Contributor, Author) commented May 2, 2018

Is that the same as increasing the payload from 1 byte to ~1 kB?

For measuring RPC latency/QPS, ($end_time - $start_time) represents the time for one ping-pong. If the batch size is changed to 1000, then 999 times out of 1000 ($end_time - $start_time) only measures the time to prepare the log and cache it, which makes it hard to get meaningful latency and QPS stats.

In addition, we already include the factor that REST uses JSON while gRPC uses protobuf. If we batch 1000 logs, each RPC also includes the time for 1000 rounds of data preparation. For example, if protobuf is much slower than JSON, this third-party factor is magnified 1000 times, which skews the latency/QPS results.

I commented out this line and ran the tests with a 10 kB payload, so that the tests only prepare the data instead of sending RPCs. JSON appears to be about twice as fast as protobuf, so we shouldn't let this variance weigh too heavily in the transport benchmark.

Protobuf qps: 25310.447188426
stats result for Protobuf
Ave latency :3.9509376999758E-5
50% latency :1.3828277587891E-5
80% latency :1.4066696166992E-5
90% latency :1.4066696166992E-5
99% latency :2.598762512207E-5

Json qps: 55293.914922503
stats result for Json
Ave latency :1.808517268856E-5
50% latency :1.3828277587891E-5
80% latency :1.4066696166992E-5
90% latency :1.4066696166992E-5
99% latency :2.598762512207E-5
count: 344344
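
A data-preparation-only comparison along these lines might look like the following sketch (it assumes the generated LogEntry protobuf class from google/cloud and json_encode are the two paths being timed; this is not the PR's actual code):

```php
<?php
// Sketch: time protobuf vs JSON encoding of a log payload, without any RPCs.
require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\Logging\V2\LogEntry;

$payload = str_repeat('a', 10 * 1024);   // assumed 10 kB payload

$protoLatencies = [];
$jsonLatencies  = [];

for ($i = 0; $i < 100000; $i++) {
    $start = microtime(true);
    $entry = new LogEntry();
    $entry->setTextPayload($payload);
    $entry->serializeToString();
    $protoLatencies[] = microtime(true) - $start;

    $start = microtime(true);
    json_encode(['textPayload' => $payload]);
    $jsonLatencies[] = microtime(true) - $start;
}

printf("Protobuf qps: %f\n", count($protoLatencies) / array_sum($protoLatencies));
printf("Json qps: %f\n", count($jsonLatencies) / array_sum($jsonLatencies));
```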

@ZhouyihaiDing (Contributor, Author) commented May 2, 2018

Hi @tmatsuo,
The comment above uses the same string as the payload for every entry; it seems that entries created by the gRPC client are cached and reused for the REST client. When I make each payload different, protobuf and JSON perform the same, which makes sense because that line doesn't actually create the protobuf object.
I also tried commenting out these 2 lines:
Grpc
Rest

Results for a 10 kB payload:

**protobuf** qps: 4938.947125521
stats result for protobuf
Ave latency :0.00020247230322283
50% latency :6.1988830566406E-5
80% latency :9.608268737793E-5
90% latency :0.00011706352233887
99% latency :0.0096359252929688

**json** transport qps: 14072.566916896
stats result for json
Ave latency :7.1060241241375E-5
50% latency :6.1988830566406E-5
80% latency :9.6797943115234E-5
90% latency :0.00011706352233887
99% latency :0.00021481513977051

With these 2 lines commented out, I think the only difference between the two clients is that the gRPC client creates a protobuf object. The average latency difference between protobuf and JSON is 0.00013141206 s, which means that for a batch of 1000, the latency added by creating the protobuf objects would be about 130 ms.

Results for a 1-byte payload:

**protobuf** transport qps: 15791.498288918
stats result for protobuf
Ave latency :6.3325213460067E-5
50% latency :1.2874603271484E-5
80% latency :1.3113021850586E-5
90% latency :1.3828277587891E-5
99% latency :0.00088620185852051

**json** transport qps: 72578.41040855
stats result for json
Ave latency :1.3778202007607E-5
50% latency :1.215934753418E-5
80% latency :1.3113021850586E-5
90% latency :1.3113021850586E-5
99% latency :6.6041946411133E-5

Time difference from creating the protobuf objects for a batch of 1000: 49.55 ms

Since you are more familiar with google-cloud-php, can you suggest a way to compare data-preparation performance without sending RPCs?

I still think a batch size of 1 measures the latency/QPS more precisely, considering the factor above, because it only adds a ~50 µs difference per RPC.

@ZhouyihaiDing (Contributor, Author) commented May 2, 2018

Below are logs per second (lps) instead of queries per second (qps):

Batch size = 1000

gRPC lps: 5280.5927862991
Rest lps: 5931.834876035

Batch size = 100

gRPC lps: 997.47003935039
Rest lps: 1054.6248774098

Batch size = 10

gRPC lps: 121.33812669424
Rest lps: 111.70049836608

Batch size = 1

gRPC lps: 11.612439095506
Rest lps: 8.285669912106
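
A sketch of how a batched variant could be measured, counting logs per second rather than RPCs per second (Logger::writeBatch is part of google/cloud's Logging API; the loop structure, names, and durations here are assumptions):

```php
<?php
// Sketch: send entries in batches and report logs per second (lps).
require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\Logging\LoggingClient;

$batchSize = 1000;                       // assumed; vary to 100 / 10 / 1
$logging   = new LoggingClient(['projectId' => 'my-project', 'transport' => 'grpc']);
$logger    = $logging->logger('benchmark-log');

$totalLogs = 0;
$start     = microtime(true);
$deadline  = $start + 60;                // assumed 60 s run

while (microtime(true) < $deadline) {
    $entries = [];
    for ($i = 0; $i < $batchSize; $i++) {
        $entries[] = $logger->entry(str_repeat('a', 1024));  // 1 kB payload
    }
    $logger->writeBatch($entries);       // one RPC carrying $batchSize logs
    $totalLogs += $batchSize;
}

printf("gRPC lps: %f\n", $totalLogs / (microtime(true) - $start));
```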

@tmatsuo commented May 2, 2018

@ZhouyihaiDing

The Cloud Logging library has a performance optimization that batches up to 1000 logs under the cover, so this actually matters for what end users experience.

I see where you're coming from, and I'm OK without a 1000-batch benchmark. However, I think it is also good to show the numbers for a batch of 1000, because it explains why our logging library is slow with the gRPC transport and protobuf (you can say protobuf is the culprit ;).
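
For context, the built-in batching referred to here is typically enabled on the PSR logger via an option along these lines (the batchEnabled flag reflects my understanding of the google/cloud API; treat it as an assumption):

```php
<?php
// Sketch: let the library batch log entries under the cover instead of
// sending one RPC per call; buffered entries are flushed in groups.
use Google\Cloud\Logging\LoggingClient;

$logging = new LoggingClient(['projectId' => 'my-project']);
$logger  = $logging->psrLogger('benchmark-log', [
    'batchEnabled' => true,   // assumed option name; buffers entries and sends them in batches
]);

for ($i = 0; $i < 1000; $i++) {
    $logger->info(str_repeat('a', 1024));  // buffered, not one RPC each
}
```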

"require": {
"google/cloud": "^0.62.0",
"ext-json": "*",
"ext-protobuf": "*"

@tmatsuo commented:

Can you also add "ext-curl"?
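
The resulting require block would presumably look something like this (version constraint taken from the diff above; the final file is not shown in this thread):

```json
{
    "require": {
        "google/cloud": "^0.62.0",
        "ext-json": "*",
        "ext-protobuf": "*",
        "ext-curl": "*"
    }
}
```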

@ZhouyihaiDing (Contributor, Author) commented May 2, 2018

@tmatsuo,
While adding the payload size as a benchmark argument, I discovered that the REST transport is a little slower when it sends a different message each time (with the same payload size) than when it sends the same message repeatedly. Can you give it a try when you have time?

I don't know how to explain that. Is it possible that the REST client reuses some cache created by the gRPC client?

@ZhouyihaiDing (Contributor, Author) commented May 2, 2018

Or perhaps curl does some optimization and reuses a cache when the messages being sent are always the same.

$grpcLogger->info($msg_grpc);
array_push($grpc_latency_array, microtime(true) - $grpc_start_time);

$msg_rest = generate_string($payload);
@ZhouyihaiDing (Contributor, Author) commented May 2, 2018:

By generating a different msg instead of using the same msg "a" in every iteration, REST becomes slower.
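
For reference, a generate_string helper that produces a fresh payload each iteration could be as simple as the following (a guess at its shape, not the PR's actual implementation):

```php
<?php
// Sketch: build a random string of $length bytes so each log message differs.
function generate_string(int $length): string
{
    $chars = 'abcdefghijklmnopqrstuvwxyz0123456789';
    $out = '';
    for ($i = 0; $i < $length; $i++) {
        $out .= $chars[random_int(0, strlen($chars) - 1)];
    }
    return $out;
}
```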

@tmatsuo replied:

Interesting. I'll verify this too.

@ZhouyihaiDing (Contributor, Author) commented May 3, 2018:

I can reproduce it on all 5 GCE machines, with 1/2/4/8/16 cores.
By the way, I installed PHP via apt-get, and ext-curl via apt-get install php-curl.

I just set 1 kB as the default payload and am going to re-run the tests, with a 30 s warmup and a 600 s benchmark time.

@ZhouyihaiDing (Contributor, Author) commented May 8, 2018

Hi @tmatsuo,

Does this pull request look good to you?

I updated the default payload to 1 kB and extended the default benchmark time. The results in the document are updated; can you take a look when you have time and see whether the data makes sense?

Thank you!

@tmatsuo commented May 9, 2018

I don't see any tests with a larger batch size, but as I said before, LGTM.

@ZhouyihaiDing merged commit 8ce5ce4 into GoogleCloudPlatform:master on May 29, 2018