Multiqubit ops GPU performance #53

Closed
stavros11 opened this issue Dec 1, 2021 · 1 comment

@stavros11

Following our discussion, this is the GPU equivalent of #51 where we can discuss various interesting findings regarding the multiqubit kernel performance on GPU and particularly how it compares to qiskit. Here are some benchmark results:

Figure: Simulation time qibo/qiskit - RTX A6000 - double precision

Figure: Simulation time qibo/qiskit - RTX A6000 - single precision

In single precision qibo is much faster than qiskit, while in double precision the two are mostly equivalent apart from a specific region of the (nqubits, ntargets) grid. Here are the exact times for this region:

simulation times - nqubits=15

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.00068 | 0.00171 | 0.00057 | 0.00115 |
| 2 | 0.00105 | 0.00177 | 0.00066 | 0.00184 |
| 3 | 0.00270 | 0.00184 | 0.00112 | 0.00175 |
| 4 | 0.00695 | 0.00169 | 0.00125 | 0.00242 |
| 5 | 0.02246 | 0.00259 | 0.00246 | 0.00350 |
| 6 | 0.03912 | 0.01039 | 0.00538 | 0.00637 |
| 7 | 0.06972 | 0.02882 | 0.00678 | 0.02115 |
| 8 | 0.12242 | 0.11884 | 0.01668 | 0.10134 |
| 9 | 0.25151 | 0.60537 | 0.05127 | 0.57179 |
| 10 | 0.77184 | 3.72334 | 0.30835 | 3.54705 |

simulation times - nqubits=16

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.00071 | 0.00142 | 0.00061 | 0.00132 |
| 2 | 0.00108 | 0.00193 | 0.00069 | 0.00191 |
| 3 | 0.00286 | 0.00210 | 0.00120 | 0.00183 |
| 4 | 0.00756 | 0.00758 | 0.00135 | 0.00232 |
| 5 | 0.02427 | 0.00859 | 0.00269 | 0.00351 |
| 6 | 0.08424 | 0.01121 | 0.00997 | 0.00782 |
| 7 | 0.15274 | 0.03605 | 0.01274 | 0.02318 |
| 8 | 0.27393 | 0.11994 | 0.02289 | 0.11415 |
| 9 | 0.52561 | 0.72564 | 0.06245 | 0.65283 |
| 10 | 1.15000 | 4.27385 | 0.36389 | 4.22007 |

simulation times - nqubits=17

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.00076 | 0.00213 | 0.00065 | 0.00203 |
| 2 | 0.00121 | 0.00233 | 0.00074 | 0.00170 |
| 3 | 0.00307 | 0.00792 | 0.00129 | 0.00286 |
| 4 | 0.00813 | 0.00277 | 0.00146 | 0.00361 |
| 5 | 0.02625 | 0.00923 | 0.00291 | 0.00575 |
| 6 | 0.09226 | 0.01316 | 0.01078 | 0.01105 |
| 7 | 0.33222 | 0.04734 | 0.02657 | 0.03166 |
| 8 | 0.60661 | 0.17166 | 0.04665 | 0.13848 |
| 9 | 1.09221 | 0.74349 | 0.08526 | 0.75295 |
| 10 | 2.17680 | 4.90263 | 0.41825 | 4.97029 |

simulation times - nqubits=18

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.00092 | 0.00196 | 0.00071 | 0.00226 |
| 2 | 0.00122 | 0.00227 | 0.00080 | 0.00225 |
| 3 | 0.00329 | 0.00906 | 0.00134 | 0.00408 |
| 4 | 0.00871 | 0.00968 | 0.00154 | 0.00623 |
| 5 | 0.02826 | 0.01136 | 0.00312 | 0.01024 |
| 6 | 0.09959 | 0.02241 | 0.01180 | 0.01939 |
| 7 | 0.36258 | 0.06104 | 0.02912 | 0.04782 |
| 8 | 1.32488 | 0.23768 | 0.10081 | 0.17849 |
| 9 | 2.42261 | 0.87334 | 0.19749 | 0.88229 |
| 10 | 4.36327 | 5.55920 | 0.57205 | 5.60554 |

simulation times - nqubits=19

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.00127 | 0.01038 | 0.00077 | 0.00300 |
| 2 | 0.00191 | 0.00469 | 0.00089 | 0.00329 |
| 3 | 0.00357 | 0.01160 | 0.00145 | 0.00656 |
| 4 | 0.00945 | 0.02012 | 0.00165 | 0.01049 |
| 5 | 0.03066 | 0.01703 | 0.00339 | 0.01806 |
| 6 | 0.10760 | 0.03546 | 0.01305 | 0.03311 |
| 7 | 0.39375 | 0.08286 | 0.03169 | 0.07409 |
| 8 | 1.44949 | 0.23880 | 0.11003 | 0.23198 |
| 9 | 5.31427 | 0.98861 | 0.39713 | 1.04869 |
| 10 | 9.69975 | 6.31365 | 0.82073 | 6.39574 |

simulation times - nqubits=20

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.00182 | 0.01452 | 0.00123 | 0.00561 |
| 2 | 0.00312 | 0.01466 | 0.00133 | 0.00602 |
| 3 | 0.00642 | 0.01459 | 0.00187 | 0.01225 |
| 4 | 0.01038 | 0.03097 | 0.00184 | 0.02110 |
| 5 | 0.03337 | 0.03782 | 0.00392 | 0.03531 |
| 6 | 0.11569 | 0.05954 | 0.01418 | 0.06470 |
| 7 | 0.42412 | 0.12040 | 0.03445 | 0.13335 |
| 8 | 1.57073 | 0.24070 | 0.11950 | 0.34332 |
| 9 | 5.79949 | 1.12296 | 0.43513 | 1.31031 |
| 10 | 21.42194 | 6.96956 | 1.59364 | 7.14927 |

simulation times - nqubits=21

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.00297 | 0.04279 | 0.00183 | 0.01087 |
| 2 | 0.00506 | 0.04736 | 0.00190 | 0.00970 |
| 3 | 0.01134 | 0.04779 | 0.00257 | 0.02269 |
| 4 | 0.02009 | 0.04995 | 0.00272 | 0.04079 |
| 5 | 0.03781 | 0.07061 | 0.00496 | 0.06971 |
| 6 | 0.12480 | 0.11806 | 0.03517 | 0.12507 |
| 7 | 0.45471 | 0.19977 | 0.05067 | 0.24671 |
| 8 | 1.69274 | 0.37102 | 0.13072 | 0.56319 |
| 9 | 6.32958 | 1.37186 | 0.47240 | 1.76927 |
| 10 | 23.37317 | 7.95625 | 1.75441 | 8.69835 |

simulation times - nqubits=22

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.00529 | 0.04282 | 0.00304 | 0.02550 |
| 2 | 0.00885 | 0.04292 | 0.00307 | 0.02752 |
| 3 | 0.01958 | 0.04581 | 0.00373 | 0.05389 |
| 4 | 0.03942 | 0.05852 | 0.00431 | 0.08455 |
| 5 | 0.07661 | 0.08379 | 0.00905 | 0.14375 |
| 6 | 0.13771 | 0.13358 | 0.07412 | 0.25818 |
| 7 | 0.48743 | 0.24396 | 0.10509 | 0.48839 |
| 8 | 1.82252 | 0.55038 | 0.19315 | 1.01881 |
| 9 | 6.81780 | 1.78645 | 0.52388 | 2.68159 |
| 10 | 25.32039 | 9.04470 | 1.91643 | 10.59537 |

simulation times - nqubits=23

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.01015 | 0.08001 | 0.00550 | 0.04743 |
| 2 | 0.01712 | 0.08031 | 0.00543 | 0.04986 |
| 3 | 0.03556 | 0.08724 | 0.00609 | 0.10628 |
| 4 | 0.07037 | 0.11227 | 0.00700 | 0.17147 |
| 5 | 0.15528 | 0.16596 | 0.01858 | 0.29656 |
| 6 | 0.29346 | 0.26994 | 0.15351 | 0.53666 |
| 7 | 0.52695 | 0.48206 | 0.22443 | 1.00201 |
| 8 | 1.96243 | 0.99488 | 0.40818 | 1.99313 |
| 9 | 7.36424 | 2.65864 | 0.75142 | 4.56154 |
| 10 | 27.27548 | 11.00970 | 2.12359 | 14.71918 |

simulation times - nqubits=24

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.02022 | 0.18080 | 0.01053 | 0.08798 |
| 2 | 0.03352 | 0.18370 | 0.01043 | 0.09721 |
| 3 | 0.07010 | 0.19461 | 0.01097 | 0.21545 |
| 4 | 0.13568 | 0.25227 | 0.01237 | 0.35247 |
| 5 | 0.28303 | 0.36481 | 0.03299 | 0.61607 |
| 6 | 0.61949 | 0.58082 | 0.32582 | 1.12560 |
| 7 | 1.16325 | 1.00706 | 0.47051 | 2.11143 |
| 8 | 2.11352 | 1.91901 | 0.86623 | 4.03778 |
| 9 | 7.86494 | 4.39171 | 1.59513 | 8.44242 |
| 10 | 29.23039 | 14.76853 | 3.00823 | 22.39044 |

simulation times - nqubits=25

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.04113 | 0.36355 | 0.02106 | 0.18211 |
| 2 | 0.06760 | 0.37848 | 0.02062 | 0.20346 |
| 3 | 0.14073 | 0.39008 | 0.02098 | 0.44870 |
| 4 | 0.27040 | 0.51104 | 0.02358 | 0.73671 |
| 5 | 0.54744 | 0.75356 | 0.06350 | 1.30015 |
| 6 | 1.14574 | 1.21546 | 0.67175 | 2.36441 |
| 7 | 2.47845 | 2.06826 | 1.00970 | 4.39131 |
| 8 | 4.59437 | 4.11890 | 1.82457 | 8.32751 |
| 9 | 8.45306 | 9.00420 | 3.41029 | 16.53761 |
| 10 | 31.44260 | 24.93261 | 6.36849 | 38.09419 |

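It appears that qiskit has an issue here, as in some cases their single precision is significantly slower than their double (for example 38.09 sec vs 24.93 sec for nqubits=25, ntargets=10). On the other hand, qibo's single precision is much faster than its double (6.37 sec vs 31.44 sec for the same configuration).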
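
To make the precision comparison concrete, here is a small sketch that computes the double/single time ratio for each library from a few rows of the nqubits=25 RTX A6000 table. The numbers are copied from that table; the dictionary and function names are made up for this illustration. A ratio above 1 means single precision is faster than double, below 1 means it is slower.

```python
# Times in seconds copied from the nqubits=25 RTX A6000 table above.
# ntargets -> (qibo double, qiskit double, qibo single, qiskit single)
times_nq25 = {
    1: (0.04113, 0.36355, 0.02106, 0.18211),
    5: (0.54744, 0.75356, 0.06350, 1.30015),
    10: (31.44260, 24.93261, 6.36849, 38.09419),
}


def precision_ratios(row):
    """Return the (qibo, qiskit) double/single time ratios for one table row."""
    qibo_double, qiskit_double, qibo_single, qiskit_single = row
    return qibo_double / qibo_single, qiskit_double / qiskit_single


for ntargets, row in times_nq25.items():
    qibo_ratio, qiskit_ratio = precision_ratios(row)
    print(
        f"ntargets={ntargets}: "
        f"qibo double/single={qibo_ratio:.2f}, "
        f"qiskit double/single={qiskit_ratio:.2f}"
    )
```

For these rows the qibo ratio stays above 1, while the qiskit ratio drops below 1 for ntargets=5 and ntargets=10, which is the anomaly described above.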

Below are some numbers from the DGX, where the situation is completely different (qiskit is faster for every ntargets value shown) and more in line with what we observed on CPU in #51:

Figure: Simulation time qibo/qiskit - V100 - double precision

simulation times - nqubits=24 - V100

| ntargets | qibo double (sec) | qiskit double (sec) |
|---|---|---|
| 3 | 0.42339 | 0.33970 |
| 4 | 0.62654 | 0.33986 |
| 5 | 1.00678 | 0.33991 |
| 6 | 2.19785 | 0.34998 |
| 7 | 3.63109 | 0.45986 |
| 8 | 6.53215 | 0.76321 |
| 9 | 11.95559 | 2.57002 |
| 10 | 22.17884 | 15.57758 |

simulation times - nqubits=25 - V100

| ntargets | qibo double (sec) | qiskit double (sec) |
|---|---|---|
| 3 | 0.80510 | 0.67170 |
| 4 | 1.23495 | 0.67187 |
| 5 | 2.09212 | 0.67181 |
| 6 | 4.59987 | 0.67553 |
| 7 | 7.65031 | 0.88789 |
| 8 | 13.84615 | 1.25915 |
| 9 | 25.54963 | 3.41262 |
| 10 | 47.12159 | 19.35345 |

simulation times - nqubits=26 - V100

| ntargets | qibo double (sec) | qiskit double (sec) |
|---|---|---|
| 3 | 1.65588 | 1.33589 |
| 4 | 2.56164 | 1.33560 |
| 5 | 4.35132 | 1.33077 |
| 6 | 9.55329 | 1.35739 |
| 7 | 16.05355 | 1.74887 |
| 8 | 29.17296 | 2.29261 |
| 9 | 53.89957 | 5.04608 |
| 10 | 100.10543 | 26.62849 |

simulation times - nqubits=27 - V100

| ntargets | qibo double (sec) | qiskit double (sec) |
|---|---|---|
| 3 | 3.41055 | 2.65044 |
| 4 | 5.32581 | 2.63898 |
| 5 | 9.09922 | 2.63975 |
| 6 | 20.06714 | 2.64693 |
| 7 | 33.64388 | 3.46601 |
| 8 | 61.26683 | 4.29842 |
| 9 | 113.86667 | 8.26060 |
| 10 | 212.27658 | 40.63561 |

simulation times - nqubits=28 - V100

| ntargets | qibo double (sec) | qiskit double (sec) |
|---|---|---|
| 3 | 7.15363 | 5.24736 |
| 4 | 11.12589 | 5.22061 |
| 5 | 18.98035 | 5.25438 |
| 6 | 41.96390 | 5.26979 |
| 7 | 70.22372 | 6.85323 |
| 8 | 128.53272 | 8.41681 |
| 9 | 239.46607 | 14.80653 |
| 10 | 448.56989 | 70.21813 |
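
For context on what is being timed in the tables above, here is a minimal NumPy reference for applying a gate with ntargets = k target qubits to an nqubits state vector. This is only a sketch and is neither qibo's nor qiskit's actual GPU kernel. The function name and the qubit ordering convention (qubit 0 is the most significant bit) are assumptions made for this example. A dense gate on k targets is a 2^k x 2^k matrix, so a direct application needs on the order of 2^(nqubits + k) multiply-adds.

```python
import numpy as np


def apply_gate(state, gate, targets, nqubits):
    """Apply a (2^k x 2^k) matrix to the k target qubits of a state vector.

    Assumes qubit 0 is the most significant bit of the basis-state index.
    """
    k = len(targets)
    # View the state as a tensor with one axis of size 2 per qubit.
    psi = state.reshape((2,) * nqubits)
    # View the gate as a tensor with k output axes followed by k input axes.
    g = gate.reshape((2,) * (2 * k))
    # Contract the gate's input axes with the target axes of the state.
    psi = np.tensordot(g, psi, axes=(list(range(k, 2 * k)), list(targets)))
    # tensordot places the k output axes first; move them back to the targets.
    psi = np.moveaxis(psi, list(range(k)), list(targets))
    return psi.reshape(-1)


# Usage: flip qubit 0 of a 2-qubit state in single precision (complex64).
x_gate = np.array([[0, 1], [1, 0]], dtype=np.complex64)
initial = np.zeros(4, dtype=np.complex64)
initial[0] = 1  # |00>
final = apply_gate(initial, x_gate, targets=[0], nqubits=2)
```

Switching the dtype between `np.complex64` and `np.complex128` corresponds to the single and double precision columns in the tables.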

NOTE: I just realized that the benchmark script we have been using does not call state.numpy(), so the final state is not transferred from GPU to CPU for qibo. This transfer probably does happen for qiskit, which returns a numpy array. If we include the transfer, qibo's results may be worse (not sure by how much), but I believe some of the above observations will still hold. For example, the strange fact that qiskit single is slower than qiskit double remains, but that is not really our problem to solve.
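
A possible way to make the two measurements comparable is to time the transfer explicitly. The helper below is a sketch and not the benchmark script used for the numbers above. `time_simulation`, `execute`, and `to_host` are hypothetical names introduced for this example. For qibo, `to_host` would be something like `lambda state: state.numpy()`, as mentioned in the note.

```python
import time


def time_simulation(execute, to_host=None, nreps=3):
    """Return the best wall-clock time in seconds over nreps runs.

    execute: hypothetical zero-argument callable that runs the circuit and
             returns the final state (possibly still in GPU memory).
    to_host: optional callable that copies the state to host memory, so the
             device-to-host transfer is included in the measured time.
    """
    best = float("inf")
    for _ in range(nreps):
        start = time.perf_counter()
        state = execute()
        if to_host is not None:
            state = to_host(state)  # count the GPU -> CPU copy as well
        best = min(best, time.perf_counter() - start)
    return best


# Usage with stand-in callables (no GPU needed for this illustration).
without_transfer = time_simulation(lambda: list(range(1000)))
with_transfer = time_simulation(lambda: list(range(1000)), to_host=tuple)
```

Reporting both numbers side by side would show how much of the gap to qiskit comes from the transfer alone.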

@scarrazza

Closing, obsolete results.
