Multiqubit ops CPU performance #51
Comments
Thank you for these tests.
Yes, concerning the GPU implementation, I think that qiskit uses a standard matrix multiplication for
Regarding CPU, my guess is that they implemented functions with AVX2 instructions up to
@stavros11 can you share the code that you are using to do these benchmarks? I'd like to run some experiments.
@stavros11 can we close this issue?
I am not sure if the numbers from the first post are still valid because a long time has passed, but I do not think we did anything to solve this issue, so it probably still exists. I also have not done the AVX test that Marco is proposing.
Closing, the results are too obsolete.
As we saw during our discussion about qiboteam/qibo#505, we are observing some performance issues while incorporating the multiqubit ops in qibo, particularly in comparison to qiskit. Here are some benchmarks on CPU for circuits of the following type:
where U is a multiqubit (here five-qubit) unitary:
[Figures: multiqubit, qibo/qiskit, simulation time, double and single precision]
Since in previous benchmarks on this repository we were comparing calling the custom operators directly vs qiskit, I made an additional comparison of qibo (with qibojit) vs qibojit:
[Figures: multiqubit, qibo/qibojit, simulation time and dry run time, double and single precision]
[Figures: simulation times for nqubits = 23, 24 and 25, double precision]
Qibo's simulation time increases with ntargets as expected, while qiskit shows an abrupt change at ntargets=7. It looks like they have a very good implementation for ntargets < 7 (perhaps based on some decomposition?) and a very bad one for more targets. I think @mlazzarin observed something similar in the past, right?
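To make the expected scaling with ntargets concrete, here is a minimal pure-Python sketch of applying a dense multiqubit matrix to a state vector. It is not the qibojit or qiskit implementation: the function name `apply_multiqubit_gate` and the convention that qubit 0 is the most significant bit are assumptions made for illustration. For n qubits and k targets there are 2^(n-k) groups of amplitudes, and each group needs a 2^k x 2^k matrix-vector product, so the work of a dense application grows roughly as 2^(n+k).

```python
from itertools import product


def apply_multiqubit_gate(state, matrix, targets, nqubits):
    """Apply a dense 2^k x 2^k `matrix` to the `targets` qubits of `state`.

    Assumption of this sketch: qubit 0 is the most significant bit of the
    basis index, and targets[0] is the most significant bit of the local
    (gate) index.
    """
    k = len(targets)
    dim = 1 << k
    # Bit mask of each target qubit inside the full basis index.
    masks = [1 << (nqubits - 1 - q) for q in targets]
    # Offset added to a group's base index for every local basis state.
    offsets = [
        sum(m for m, b in zip(masks, bits) if b)
        for bits in product((0, 1), repeat=k)
    ]
    target_mask = sum(masks)

    new_state = list(state)
    for base in range(1 << nqubits):
        if base & target_mask:
            continue  # visit each group once, through its all-zero-target index
        # Gather the 2^k amplitudes of this group, then multiply by the matrix.
        amplitudes = [state[base + off] for off in offsets]
        for row in range(dim):
            new_state[base + offsets[row]] = sum(
                matrix[row][col] * amplitudes[col] for col in range(dim)
            )
    return new_state
```

For example, `apply_multiqubit_gate([1, 0, 0, 0], [[0, 1], [1, 0]], [0], 2)` flips qubit 0 of a two-qubit register. A production kernel would parallelize the loop over groups, and a decomposition into smaller gates would change this cost profile.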
For all these benchmarks the threads were set using
from multiprocessing import cpu_count
with all libraries using half of the total threads, and it was tested that the final wavefunctions agree.
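The original thread-setting snippet is only partially preserved above, so the following is a sketch of one common way to do it, not the exact code used for these benchmarks. It assumes the backends honor the `OMP_NUM_THREADS` and `NUMBA_NUM_THREADS` environment variables and that these are set before the simulation libraries are imported. The helper `states_agree` is a hypothetical name for an element-wise comparison of the final wavefunctions.

```python
import os
from multiprocessing import cpu_count

# Use half of the logical cores reported by the OS (at least one).
nthreads = max(1, cpu_count() // 2)

# Assumption: OpenMP-based backends and Numba read these variables at import
# time, so they must be set before importing the simulation libraries.
os.environ["OMP_NUM_THREADS"] = str(nthreads)
os.environ["NUMBA_NUM_THREADS"] = str(nthreads)


def states_agree(state_a, state_b, atol=1e-6):
    """Return True if two state vectors match element-wise within `atol`."""
    if len(state_a) != len(state_b):
        return False
    return all(abs(a - b) <= atol for a, b in zip(state_a, state_b))
```

Note that this comparison treats states that differ by a global phase as different, and the tolerance would typically need to be looser for single precision than for double.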