Following our discussion, this is the GPU counterpart of #51, where we can discuss interesting findings on the performance of the multiqubit kernel on GPU and, in particular, how it compares to qiskit. Here are some benchmark results:
[Figure: Simulation time qibo/qiskit - RTX A6000 - double precision]
[Figure: Simulation time qibo/qiskit - RTX A6000 - single precision]
In single precision we are much faster, while in double precision we are roughly on par except in one specific region. Here are the exact times for this region:

simulation times - nqubits=15

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.00068 | 0.00171 | 0.00057 | 0.00115 |
| 2 | 0.00105 | 0.00177 | 0.00066 | 0.00184 |
| 3 | 0.00270 | 0.00184 | 0.00112 | 0.00175 |
| 4 | 0.00695 | 0.00169 | 0.00125 | 0.00242 |
| 5 | 0.02246 | 0.00259 | 0.00246 | 0.00350 |
| 6 | 0.03912 | 0.01039 | 0.00538 | 0.00637 |
| 7 | 0.06972 | 0.02882 | 0.00678 | 0.02115 |
| 8 | 0.12242 | 0.11884 | 0.01668 | 0.10134 |
| 9 | 0.25151 | 0.60537 | 0.05127 | 0.57179 |
| 10 | 0.77184 | 3.72334 | 0.30835 | 3.54705 |


simulation times - nqubits=16

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.00071 | 0.00142 | 0.00061 | 0.00132 |
| 2 | 0.00108 | 0.00193 | 0.00069 | 0.00191 |
| 3 | 0.00286 | 0.00210 | 0.00120 | 0.00183 |
| 4 | 0.00756 | 0.00758 | 0.00135 | 0.00232 |
| 5 | 0.02427 | 0.00859 | 0.00269 | 0.00351 |
| 6 | 0.08424 | 0.01121 | 0.00997 | 0.00782 |
| 7 | 0.15274 | 0.03605 | 0.01274 | 0.02318 |
| 8 | 0.27393 | 0.11994 | 0.02289 | 0.11415 |
| 9 | 0.52561 | 0.72564 | 0.06245 | 0.65283 |
| 10 | 1.15000 | 4.27385 | 0.36389 | 4.22007 |


simulation times - nqubits=17

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.00076 | 0.00213 | 0.00065 | 0.00203 |
| 2 | 0.00121 | 0.00233 | 0.00074 | 0.00170 |
| 3 | 0.00307 | 0.00792 | 0.00129 | 0.00286 |
| 4 | 0.00813 | 0.00277 | 0.00146 | 0.00361 |
| 5 | 0.02625 | 0.00923 | 0.00291 | 0.00575 |
| 6 | 0.09226 | 0.01316 | 0.01078 | 0.01105 |
| 7 | 0.33222 | 0.04734 | 0.02657 | 0.03166 |
| 8 | 0.60661 | 0.17166 | 0.04665 | 0.13848 |
| 9 | 1.09221 | 0.74349 | 0.08526 | 0.75295 |
| 10 | 2.17680 | 4.90263 | 0.41825 | 4.97029 |


simulation times - nqubits=18

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.00092 | 0.00196 | 0.00071 | 0.00226 |
| 2 | 0.00122 | 0.00227 | 0.00080 | 0.00225 |
| 3 | 0.00329 | 0.00906 | 0.00134 | 0.00408 |
| 4 | 0.00871 | 0.00968 | 0.00154 | 0.00623 |
| 5 | 0.02826 | 0.01136 | 0.00312 | 0.01024 |
| 6 | 0.09959 | 0.02241 | 0.01180 | 0.01939 |
| 7 | 0.36258 | 0.06104 | 0.02912 | 0.04782 |
| 8 | 1.32488 | 0.23768 | 0.10081 | 0.17849 |
| 9 | 2.42261 | 0.87334 | 0.19749 | 0.88229 |
| 10 | 4.36327 | 5.55920 | 0.57205 | 5.60554 |


simulation times - nqubits=19

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.00127 | 0.01038 | 0.00077 | 0.00300 |
| 2 | 0.00191 | 0.00469 | 0.00089 | 0.00329 |
| 3 | 0.00357 | 0.01160 | 0.00145 | 0.00656 |
| 4 | 0.00945 | 0.02012 | 0.00165 | 0.01049 |
| 5 | 0.03066 | 0.01703 | 0.00339 | 0.01806 |
| 6 | 0.10760 | 0.03546 | 0.01305 | 0.03311 |
| 7 | 0.39375 | 0.08286 | 0.03169 | 0.07409 |
| 8 | 1.44949 | 0.23880 | 0.11003 | 0.23198 |
| 9 | 5.31427 | 0.98861 | 0.39713 | 1.04869 |
| 10 | 9.69975 | 6.31365 | 0.82073 | 6.39574 |


simulation times - nqubits=20

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.00182 | 0.01452 | 0.00123 | 0.00561 |
| 2 | 0.00312 | 0.01466 | 0.00133 | 0.00602 |
| 3 | 0.00642 | 0.01459 | 0.00187 | 0.01225 |
| 4 | 0.01038 | 0.03097 | 0.00184 | 0.02110 |
| 5 | 0.03337 | 0.03782 | 0.00392 | 0.03531 |
| 6 | 0.11569 | 0.05954 | 0.01418 | 0.06470 |
| 7 | 0.42412 | 0.12040 | 0.03445 | 0.13335 |
| 8 | 1.57073 | 0.24070 | 0.11950 | 0.34332 |
| 9 | 5.79949 | 1.12296 | 0.43513 | 1.31031 |
| 10 | 21.42194 | 6.96956 | 1.59364 | 7.14927 |


simulation times - nqubits=21

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.00297 | 0.04279 | 0.00183 | 0.01087 |
| 2 | 0.00506 | 0.04736 | 0.00190 | 0.00970 |
| 3 | 0.01134 | 0.04779 | 0.00257 | 0.02269 |
| 4 | 0.02009 | 0.04995 | 0.00272 | 0.04079 |
| 5 | 0.03781 | 0.07061 | 0.00496 | 0.06971 |
| 6 | 0.12480 | 0.11806 | 0.03517 | 0.12507 |
| 7 | 0.45471 | 0.19977 | 0.05067 | 0.24671 |
| 8 | 1.69274 | 0.37102 | 0.13072 | 0.56319 |
| 9 | 6.32958 | 1.37186 | 0.47240 | 1.76927 |
| 10 | 23.37317 | 7.95625 | 1.75441 | 8.69835 |


simulation times - nqubits=22

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.00529 | 0.04282 | 0.00304 | 0.02550 |
| 2 | 0.00885 | 0.04292 | 0.00307 | 0.02752 |
| 3 | 0.01958 | 0.04581 | 0.00373 | 0.05389 |
| 4 | 0.03942 | 0.05852 | 0.00431 | 0.08455 |
| 5 | 0.07661 | 0.08379 | 0.00905 | 0.14375 |
| 6 | 0.13771 | 0.13358 | 0.07412 | 0.25818 |
| 7 | 0.48743 | 0.24396 | 0.10509 | 0.48839 |
| 8 | 1.82252 | 0.55038 | 0.19315 | 1.01881 |
| 9 | 6.81780 | 1.78645 | 0.52388 | 2.68159 |
| 10 | 25.32039 | 9.04470 | 1.91643 | 10.59537 |


simulation times - nqubits=23

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.01015 | 0.08001 | 0.00550 | 0.04743 |
| 2 | 0.01712 | 0.08031 | 0.00543 | 0.04986 |
| 3 | 0.03556 | 0.08724 | 0.00609 | 0.10628 |
| 4 | 0.07037 | 0.11227 | 0.00700 | 0.17147 |
| 5 | 0.15528 | 0.16596 | 0.01858 | 0.29656 |
| 6 | 0.29346 | 0.26994 | 0.15351 | 0.53666 |
| 7 | 0.52695 | 0.48206 | 0.22443 | 1.00201 |
| 8 | 1.96243 | 0.99488 | 0.40818 | 1.99313 |
| 9 | 7.36424 | 2.65864 | 0.75142 | 4.56154 |
| 10 | 27.27548 | 11.00970 | 2.12359 | 14.71918 |


simulation times - nqubits=24

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.02022 | 0.18080 | 0.01053 | 0.08798 |
| 2 | 0.03352 | 0.18370 | 0.01043 | 0.09721 |
| 3 | 0.07010 | 0.19461 | 0.01097 | 0.21545 |
| 4 | 0.13568 | 0.25227 | 0.01237 | 0.35247 |
| 5 | 0.28303 | 0.36481 | 0.03299 | 0.61607 |
| 6 | 0.61949 | 0.58082 | 0.32582 | 1.12560 |
| 7 | 1.16325 | 1.00706 | 0.47051 | 2.11143 |
| 8 | 2.11352 | 1.91901 | 0.86623 | 4.03778 |
| 9 | 7.86494 | 4.39171 | 1.59513 | 8.44242 |
| 10 | 29.23039 | 14.76853 | 3.00823 | 22.39044 |


simulation times - nqubits=25

| ntargets | qibo double (sec) | qiskit double (sec) | qibo single (sec) | qiskit single (sec) |
|---|---|---|---|---|
| 1 | 0.04113 | 0.36355 | 0.02106 | 0.18211 |
| 2 | 0.06760 | 0.37848 | 0.02062 | 0.20346 |
| 3 | 0.14073 | 0.39008 | 0.02098 | 0.44870 |
| 4 | 0.27040 | 0.51104 | 0.02358 | 0.73671 |
| 5 | 0.54744 | 0.75356 | 0.06350 | 1.30015 |
| 6 | 1.14574 | 1.21546 | 0.67175 | 2.36441 |
| 7 | 2.47845 | 2.06826 | 1.00970 | 4.39131 |
| 8 | 4.59437 | 4.11890 | 1.82457 | 8.32751 |
| 9 | 8.45306 | 9.00420 | 3.41029 | 16.53761 |
| 10 | 31.44260 | 24.93261 | 6.36849 | 38.09419 |

There appears to be an issue on the qiskit side, since in some cases their single precision is significantly slower than their double precision. On the other hand, qibo's single precision is much faster than its double precision.
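To put numbers on these two observations, the ratios for the largest A6000 case (nqubits=25) can be computed directly from the table. This is just a small helper over the published numbers, not part of the benchmark script. For this size, qiskit single precision is slower than its double precision from 3 targets upward (roughly 2x at 6 to 8 targets), while qibo single precision is at least about 1.7x faster than qibo double at every target count.

```python
# Times in seconds, copied from the RTX A6000 nqubits=25 table above.
ntargets = list(range(1, 11))
qibo_double = [0.04113, 0.06760, 0.14073, 0.27040, 0.54744,
               1.14574, 2.47845, 4.59437, 8.45306, 31.44260]
qibo_single = [0.02106, 0.02062, 0.02098, 0.02358, 0.06350,
               0.67175, 1.00970, 1.82457, 3.41029, 6.36849]
qiskit_double = [0.36355, 0.37848, 0.39008, 0.51104, 0.75356,
                 1.21546, 2.06826, 4.11890, 9.00420, 24.93261]
qiskit_single = [0.18211, 0.20346, 0.44870, 0.73671, 1.30015,
                 2.36441, 4.39131, 8.32751, 16.53761, 38.09419]

# > 1 means qibo single precision is faster than qibo double precision.
qibo_speedup = [d / s for d, s in zip(qibo_double, qibo_single)]
# > 1 means qiskit single precision is *slower* than qiskit double precision.
qiskit_slowdown = [s / d for d, s in zip(qiskit_double, qiskit_single)]

for k, up, down in zip(ntargets, qibo_speedup, qiskit_slowdown):
    print(f"ntargets={k:2d}  qibo double/single={up:5.2f}  qiskit single/double={down:5.2f}")
```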
Below are some numbers from the DGX (V100), where the situation is completely different and more in line with what we observed on CPU in #51:
[Figure: Simulation time qibo/qiskit - V100 - double precision]

simulation times - nqubits=24 - V100

| ntargets | qibo double (sec) | qiskit double (sec) |
|---|---|---|
| 3 | 0.42339 | 0.33970 |
| 4 | 0.62654 | 0.33986 |
| 5 | 1.00678 | 0.33991 |
| 6 | 2.19785 | 0.34998 |
| 7 | 3.63109 | 0.45986 |
| 8 | 6.53215 | 0.76321 |
| 9 | 11.95559 | 2.57002 |
| 10 | 22.17884 | 15.57758 |


simulation times - nqubits=25 - V100

| ntargets | qibo double (sec) | qiskit double (sec) |
|---|---|---|
| 3 | 0.80510 | 0.67170 |
| 4 | 1.23495 | 0.67187 |
| 5 | 2.09212 | 0.67181 |
| 6 | 4.59987 | 0.67553 |
| 7 | 7.65031 | 0.88789 |
| 8 | 13.84615 | 1.25915 |
| 9 | 25.54963 | 3.41262 |
| 10 | 47.12159 | 19.35345 |


simulation times - nqubits=26 - V100

| ntargets | qibo double (sec) | qiskit double (sec) |
|---|---|---|
| 3 | 1.65588 | 1.33589 |
| 4 | 2.56164 | 1.33560 |
| 5 | 4.35132 | 1.33077 |
| 6 | 9.55329 | 1.35739 |
| 7 | 16.05355 | 1.74887 |
| 8 | 29.17296 | 2.29261 |
| 9 | 53.89957 | 5.04608 |
| 10 | 100.10543 | 26.62849 |


simulation times - nqubits=27 - V100

| ntargets | qibo double (sec) | qiskit double (sec) |
|---|---|---|
| 3 | 3.41055 | 2.65044 |
| 4 | 5.32581 | 2.63898 |
| 5 | 9.09922 | 2.63975 |
| 6 | 20.06714 | 2.64693 |
| 7 | 33.64388 | 3.46601 |
| 8 | 61.26683 | 4.29842 |
| 9 | 113.86667 | 8.26060 |
| 10 | 212.27658 | 40.63561 |


simulation times - nqubits=28 - V100

| ntargets | qibo double (sec) | qiskit double (sec) |
|---|---|---|
| 3 | 7.15363 | 5.24736 |
| 4 | 11.12589 | 5.22061 |
| 5 | 18.98035 | 5.25438 |
| 6 | 41.96390 | 5.26979 |
| 7 | 70.22372 | 6.85323 |
| 8 | 128.53272 | 8.41681 |
| 9 | 239.46607 | 14.80653 |
| 10 | 448.56989 | 70.21813 |

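One possible reading of the V100 numbers, offered as a hypothesis rather than a profiling result: a dense gate on k targets of an n-qubit state performs 2^(n-k) matrix-vector products of size 2^k, i.e. about 2^(n+k) complex multiply-adds, so a compute-bound kernel should roughly double in time per extra target, whereas a memory-bound kernel that mainly streams the 2^n amplitudes should stay roughly flat. The step-to-step growth factors for nqubits=28 can be computed from the table: qibo grows by about 1.6x to 2.2x per extra target, while qiskit stays essentially flat up to 6 targets and only jumps sharply from 9 to 10. The `ideal_work` cost model below is a textbook estimate, not a measurement of either simulator.

```python
# Times in seconds, copied from the V100 nqubits=28 table above (double precision).
ntargets = list(range(3, 11))
qibo = [7.15363, 11.12589, 18.98035, 41.96390,
        70.22372, 128.53272, 239.46607, 448.56989]
qiskit = [5.24736, 5.22061, 5.25438, 5.26979,
          6.85323, 8.41681, 14.80653, 70.21813]


def ideal_work(nqubits, ntargets):
    """Complex multiply-adds for a dense gate: 2^(n-k) mat-vecs of size 2^k x 2^k."""
    return 2 ** (nqubits - ntargets) * 4 ** ntargets


def growth(times):
    """Ratio between consecutive entries, i.e. the cost of adding one more target."""
    return [b / a for a, b in zip(times, times[1:])]


qibo_growth = growth(qibo)
qiskit_growth = growth(qiskit)
model_growth = growth([ideal_work(28, k) for k in ntargets])  # exactly 2.0 each step

for k, g_qibo, g_qiskit, g_model in zip(ntargets[1:], qibo_growth, qiskit_growth, model_growth):
    print(f"{k - 1}->{k} targets  qibo x{g_qibo:.2f}  qiskit x{g_qiskit:.2f}  model x{g_model:.1f}")
```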
NOTE: I just realized that the benchmark script we have been using does not call `state.numpy()`, so for qibo the final state is not transferred from GPU to CPU, whereas this transfer probably does happen for qiskit, which returns a numpy array. If we include the transfer, qibo's results may get worse (not sure by how much), but I believe some of the above observations will still hold. For example, the strange fact that qiskit single precision is slower than qiskit double precision remains, but that is not really our problem to solve.
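A minimal sketch of how a benchmark could report both numbers, assuming nothing about qibo or qiskit internals: the hypothetical `benchmark` helper times a `simulate` callable alone and then together with a `to_host` callable. On GPU, `simulate` would run the circuit and `to_host` would be something like the `state.numpy()` call mentioned in the note; here a plain NumPy dense-gate kernel (the hypothetical `apply_multiqubit_gate`, not qibo's actual kernel) stands in so the snippet runs on CPU, where the "transfer" is just a copy. Depending on the backend, GPU kernels may also be launched asynchronously, so timing `simulate` alone can stop the clock before the work has finished; a host copy forces synchronization, which is one more reason the two numbers may differ.

```python
import time

import numpy as np


def apply_multiqubit_gate(state, matrix, targets, nqubits):
    """Apply a dense 2^k x 2^k gate to the `targets` qubits of an n-qubit state.

    Reference NumPy kernel for illustration only; qubit 0 is taken to be the
    most significant bit of the state index (an assumption of this sketch).
    """
    k = len(targets)
    front = list(range(k))
    psi = state.reshape((2,) * nqubits)
    # Bring the target axes to the front so the gate acts on a (2^k, 2^(n-k)) matrix.
    psi = np.moveaxis(psi, targets, front).reshape(2 ** k, -1)
    psi = (matrix @ psi).reshape((2,) * nqubits)
    # Restore the original qubit order and flatten back to a state vector.
    return np.moveaxis(psi, front, targets).reshape(-1)


def benchmark(simulate, to_host, repeats=3):
    """Best-of-`repeats` wall time without and with the copy to host memory."""
    without, with_copy = [], []
    host_state = None
    for _ in range(repeats):
        start = time.perf_counter()
        simulate()
        without.append(time.perf_counter() - start)

        start = time.perf_counter()
        host_state = to_host(simulate())
        with_copy.append(time.perf_counter() - start)
    return min(without), min(with_copy), host_state


# Usage on CPU: 12 qubits, a random 3-qubit unitary on targets 0, 3 and 5.
nqubits, targets = 12, [0, 3, 5]
rng = np.random.default_rng(1234)
gauss = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
gate, _ = np.linalg.qr(gauss)  # Q factor of a complex Gaussian matrix is unitary
state = np.zeros(2 ** nqubits, dtype=np.complex128)
state[0] = 1.0

t_sim, t_total, final_state = benchmark(
    lambda: apply_multiqubit_gate(state, gate, targets, nqubits),
    np.array,  # stand-in for the GPU -> CPU copy, e.g. state.numpy()
)
print(f"simulate only: {t_sim:.6f} s, simulate + copy: {t_total:.6f} s")
```

Taking the best of a few repeats is a common way to reduce noise from one-off costs such as first-call setup or memory allocation; whether the existing benchmark script does this is not stated here.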