
Add support for BLAS SVD functions in MPS simulation #1897

Merged (20 commits) Jan 10, 2024

Conversation

Patataman
Contributor

@Patataman Patataman commented Aug 21, 2023

Summary

Hello, in this PR I add support for using the OpenBLAS/LAPACK SVD functions to replace Qiskit's sequential SVD implementation in https://github.com/Qiskit/qiskit-aer/blob/main/src/simulators/matrix_product_state/svd.cpp#L148 for the MPS simulation.

Details and comments

Some points about the implementation:

  1. I could not work out why you are not using the OpenBLAS library for the SVD. Therefore, as a first approach, I have used an environment variable, QISKIT_LAPACK_SVD, to activate/deactivate the replacement, to simplify testing and benchmarking. From the code and comments I understood that the current implementation is based on this paper (https://dl.acm.org/doi/10.1145/363235.363249), so LAPACK's zgesvd function should compute the same thing. If you agree with the replacement, it is as simple as removing the old code and keeping the new.
  2. I have implemented support for both the zgesvd and zgesdd SVD routines. Why? Because zgesdd is a divide-and-conquer approach whose performance on bigger matrices is much better than zgesvd's. Ideally the selection would be automatic, based on the matrix size, but as a first approach (again) it can be selected manually by setting QISKIT_LAPACK_SVD=DC.
  3. The PR also includes a couple of changes around the print_to_log functions. While profiling, I saw that these functions were quite expensive, and outside of debugging you usually don't care about the logs, so I wrapped them in #ifdef DEBUG / #endif to improve performance. If I am wrong, say so and I will undo it.

Finally, to see whether this improves performance, I have used Random Quantum Circuits (https://arxiv.org/pdf/2207.14280.pdf) on this server configuration:

  • CPU: Intel Xeon Gold 6148
  • Sockets: 2
  • Cores: 20
  • RAM: 192 GB
  • GPU: none
  • OS: Ubuntu 22.04.1 LTS
  • Python: 3.10
  • OpenBLAS/LAPACK: 0.3.21
  • gcc: 11.3


And the average time (in seconds) over 5 executions:

| Depth | Base | LAPACK SVD | LAPACK D&C |
|------:|------------:|------------:|------------:|
| 1 | 0.043029881 | 0.04652729 | 0.049358368 |
| 3 | 0.116557598 | 0.109769773 | 0.126575661 |
| 5 | 0.14433198 | 0.148080826 | 0.16601491 |
| 10 | 74.70666704 | 28.21881118 | 21.8562469 |
| 12 | 1568.871852 | 429.5337416 | 208.0082648 |
| 15 | 33927.16216 | 12694.68723 | 1790.687791 |

As you can see, for the deepest circuits the current implementation takes (several) hours, while the D&C approach takes minutes.

@CLAassistant

CLAassistant commented Aug 21, 2023

CLA assistant check
All committers have signed the CLA.

@Patataman Patataman changed the title Pr lapack svd Add support for BLAS SVD functions in MPS simulation Aug 21, 2023
@doichanj doichanj added the enhancement New feature or request label Aug 21, 2023
@doichanj doichanj self-requested a review August 22, 2023 14:35
@doichanj
Collaborator

Could you fix the errors so that all checks pass?

@Patataman
Contributor Author

Patataman commented Aug 24, 2023

With the last commit I think I have solved the Windows build problem.

However, for macOS the error is:
/Applications/Xcode_14.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/wchar.h:123:15: fatal error: 'wchar.h' file not found
and I have no idea how to fix it. I have never developed on macOS, nor did I modify anything related to CMake in this PR... Any idea how to address it?

EDIT: No, I didn't fix everything...

@Patataman
Contributor Author

Finally, all ok :)

@doichanj doichanj added the performance Performance improvements label Aug 24, 2023
Contributor

@merav-aharoni merav-aharoni left a comment


This is a very nice performance improvement for MPS. To answer your question about why we didn't implement this: it was planned, but we never actually got to it.
I reviewed your code, but I am not very familiar with LAPACK, so I cannot say much about that part.
A few general comments:

  1. Most important: I saw the performance comparison, which looks very nice. However, did you compare results for deep and large circuits? In your test the circuit is shallow, which is fine for regression, but before merging please check the results. In particular, I know the previous version used long double precision in some places, but LAPACK uses double, so this might make a difference.
  2. Also compare results when using approximation.
  3. I think it would be a good idea to turn on the validate_SVD_result function for the near future after this is merged. @doichanj - you should probably turn this off again after a few months of usage.
  4. Add to the documentation the actual matrix sizes at which one algorithm becomes better than the other.
  5. Also, it would be best to choose between the two algorithms automatically, depending on the matrix size.
  6. It is not a good idea to include the print_to_log change here; best to separate it into a different PR. In any case, I am not sure this change is needed after my comment above.

@Patataman
Contributor Author

Sorry for the delay, and thanks for the comments.

I will try to fix and change the things you mentioned. However, this PR was part of my work during an internship that has already finished, so I no longer have access to the "big" server, and it will take me a while to replicate everything. I will come back to you once things are done.

@Patataman
Contributor Author

Patataman commented Oct 9, 2023

Hello, as mentioned before, I no longer have access to the nice server I was using, and I didn't manage to get access to anything similar, so now (sadly) I am using my laptop. Nevertheless, I can simulate up to 16-18 qubits, which I think is enough for the remaining tests.

Here is the plot for deeper circuits, from depth 10 to 200 with 16 qubits, using the automatic selector between the QR-based and divide-and-conquer SVD functions in LAPACK.


Time in seconds

| Depth | Base | LAPACK SVD |
|------:|------------:|------------:|
| 10 | 6.201454782 | 3.320371771 |
| 20 | 11.47285843 | 3.320371771 |
| 40 | 22.15550818 | 4.246894503 |
| 60 | 32.74408193 | 7.875970221 |
| 80 | 43.35610499 | 10.47702546 |
| 100 | 53.8639502 | 11.49814119 |
| 150 | 80.57371044 | 15.84590697 |
| 200 | 107.1309861 | 20.91695046 |

I still have more things to do, like the approximation (I tested it a little back then and it looked good, but I didn't generate speed-up graphs), code style, and documentation.

@doichanj doichanj added this to the Aer 0.14.0 milestone Oct 11, 2023
@Patataman
Contributor Author

Patataman commented Oct 12, 2023

Results using approximation. I used 16 qubits and depth 40, as that was the configuration with the biggest speed-up in the previous results.
For the fidelity I used state_fidelity on the statevector from the execution without LAPACK and the one from the execution with LAPACK.

This first plot uses the truncation_threshold parameter:

Time in seconds:

| Threshold | Base | LAPACK SVD |
|----------:|------------:|------------:|
| 1e-16 | 17.64385796 | 5.667835045 |
| 1e-10 | 17.89117179 | 5.667835045 |
| 1e-8 | 16.61686311 | 7.67423768 |
| 1e-6 | 14.70690207 | 7.358585072 |
| 1e-4 | 13.58314958 | 6.801743698 |
| 1e-2 | 17.81055226 | 5.805790138 |

This second plot uses the max_bond_dimension parameter, over a wide range of values, since the effect depends a lot on the matrix sizes in the SVD:

Time in seconds:

| Bond Dimension | Base | LAPACK SVD |
|---------------:|-------------:|-------------:|
| 10 | 0.06917600632 | 0.07078566551 |
| 20 | 0.1661295414 | 0.1548344612 |
| 40 | 0.8819941998 | 0.443166399 |
| 60 | 1.863601351 | 0.8205073357 |
| 80 | 3.030164289 | 1.139424276 |
| 100 | 4.378917599 | 3.04997468 |
| 150 | 10.09798179 | 4.118888235 |
| 200 | 12.50887923 | 4.118888235 |

SV fidelity is usually 0.999999..., which I guess is just floating-point error and nothing to worry about.

The documentation is still left to do.

@merav-aharoni
Contributor

Hi @Patataman - nice work! I am not sure I understand your graphs. In the table above, the performance improvement seems to be ~x2000 or so, but in your current graphs the improvement seems to be x3 at most. What am I missing?
I think it would be helpful to plot the performance of the existing version and of the new version, rather than plotting the improvement.

@Patataman
Contributor Author

> Hi @Patataman - nice work! I am not sure I understand your graphs. In the table above, it seems the performance improvement is ~x2000 or so. In your current graphs the improvement seems to be x3 at most. What am I missing? I think it would be helpful to plot the performance of the existing version and of the new version, rather than plotting the improvement.

The main differences are:

  • In the original post I was using a nice server with 40 cores; now I have 8, so parallelism is very limited compared with the original results.
  • The original results were with 30 qubits, but now I can only simulate up to 16 because of execution time and RAM.

Execution times are strongly tied to the size of the matrices passed to the SVD function. During my original evaluation I saw that this size is closely related to the number of qubits and the circuit's entanglement. Therefore, since I am now using 16 qubits, the matrix sizes and the speed-up gains are smaller.

Also, in the original results the maximum speed-up was ~x20, not ~x2000; maybe if you extrapolate it looks like the potential speed-up is x2000, but as mentioned before it does not depend so much on the depth of the circuit as on the number of qubits. If I remember correctly, the maximum matrix size for a fully entangled circuit was something like 2^(n-1), with n the number of qubits.

I will edit the previous post to include the execution times with approximation

@@ -140,9 +141,28 @@ double reduce_zeros(cmatrix_t &U, rvector_t &S, cmatrix_t &V,
return discarded_value;
}

void validate_SVdD_result(const cmatrix_t &A, const cmatrix_t &U,
Contributor Author

@Patataman Patataman Oct 26, 2023


I added this function to avoid applying AER::Utils::dagger to V every time in lapack_csvd_wrapper.

@doichanj doichanj merged commit 86a27e3 into Qiskit:main Jan 10, 2024
31 checks passed