Optimizations for the `qibo.quantum_info.basis.pauli_basis` and `vectorization` function #1459
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@           Coverage Diff            @@
##           master    #1459    +/-   ##
========================================
  Coverage   99.93%   99.93%
========================================
  Files          81       81
  Lines       11777    11785     +8
========================================
+ Hits        11769    11777     +8
  Misses          8        8
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
@BrunoLiegiBastonLiegi you don't have to use ascii characters for `einsum`: https://numpy.org/doc/stable/reference/generated/numpy.einsum.html#numpy-einsum. In this way, you can use integers instead of characters, which is much better if you are generating the subscripts programmatically (this is not a suggestion to ban characters for `einsum`).
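For illustration, a minimal sketch of the integer-sublist calling convention of `np.einsum` (not part of the PR; the arrays are arbitrary):

```python
import numpy as np

a = np.random.rand(2, 3)
b = np.random.rand(3, 4)

# String subscripts are limited to ascii letters.
c1 = np.einsum("ij,jk->ik", a, b)

# Interleaved form: each operand is followed by a list of integer axis
# labels, so subscripts can be generated programmatically for any number
# of operands.
c2 = np.einsum(a, [0, 1], b, [1, 2], [0, 2])

assert np.allclose(c1, c2)
```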
One way we could move forward is to return a generator.
mmh yeah, that's a possibility, but only in certain cases. For the sake of parallelization, surely having everything represented by a single tensor contraction is better, but you are limited by memory.

```python
from functools import cache

@cache
def _pauli_basis_element(self, i, nqubits):
    # do things according to backend
    ...
```

and then you can build the complete basis as a generator, as you suggested:

```python
def pauli_basis(...):
    return (backend._pauli_basis_element(i, nqubits) for i in range(4**nqubits))
```

but GPU-wise this will be less efficient. It will work, anyway, as long as you don't need the complete basis at the same time.
The Hilbert space is too big; unfortunately, there's no way around that. I wouldn't spend much time on this, since there are bigger priorities in terms of optimization.
In the end I was not able to make the `numba` implementation work.
Ok, I ended up also adding a generalization to the `vectorization` function. While doing so, I found that `tnp.nonzero` handles complex inputs incorrectly:

```python
import numpy as np
import tensorflow as tf
import tensorflow.experimental.numpy as tnp

a = tf.Variable([0 + 1j, 1 + 0j, 1 + 1j])

tnp.nonzero(a)
# this finds the last two only
# [<tf.Tensor: shape=(2,), dtype=int64, numpy=array([1, 2])>]

np.nonzero(a)
# numpy gives the correct result
# (array([0, 1, 2]),)
```
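A possible workaround (my sketch, not something from this PR) is to avoid `tnp.nonzero` and compare the modulus against zero directly:

```python
import tensorflow as tf

a = tf.Variable([0 + 1j, 1 + 0j, 1 + 1j])

# tf.math.abs of a complex tensor returns its (real-valued) modulus, so
# purely imaginary entries are counted as nonzero as well.
indices = tf.where(tf.math.abs(a) > 0)[:, 0]
# tf.Tensor([0 1 2], shape=(3,), dtype=int64)
```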
@stavros11 since you looked into #1462, could you look into this too? Thanks.
This improves the construction of the Pauli basis by moving everything to tensor notation and removing loops. Namely, `basis_full` is now constructed via contraction through `einsum`. This should also scale well with GPU backends; weirdly, for standard numpy on CPU there is no speedup. These are the results:

The GPU always takes ~1s to set up. With 8 qubits, `old` gets killed together with the shell session, whereas `new`, both CPU and GPU, raises an out-of-memory error. To run 8 qubits you apparently need ~64GB of memory.

~~To perform the einsum I use all the 48 ascii characters available, which means that we are limited to 48/3 = 16 qubits, unless other characters can be used in einsum. In any case, the memory requirements for 16 qubits are probably going to be very taxing.~~ I am using integers as the indices for the `einsum` now, thus this is not limited anymore by the number of ascii characters. It may be possible to obtain a speedup with numba as well, but I still have to investigate.
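For illustration, here is a minimal sketch of the idea (not the actual qibo implementation; `paulis` and `pauli_basis_einsum` are hypothetical names) of building the full basis with a single `einsum` contraction over integer sublists:

```python
import itertools

import numpy as np

# single-qubit Paulis stacked into one (4, 2, 2) tensor
paulis = np.array([
    [[1, 0], [0, 1]],     # I
    [[0, 1], [1, 0]],     # X
    [[0, -1j], [1j, 0]],  # Y
    [[1, 0], [0, -1]],    # Z
])

def pauli_basis_einsum(nqubits):
    """Build the full n-qubit Pauli basis with one einsum contraction.

    Each qubit contributes three axes: a 'which Pauli' axis s_k plus the
    row/column axes r_k and c_k. Integer sublists sidestep the ascii-letter
    limit of the string subscript syntax.
    """
    operands = []
    for k in range(nqubits):
        operands.append(paulis)
        operands.append([3 * k, 3 * k + 1, 3 * k + 2])  # (s_k, r_k, c_k)
    # output axes: all s axes first, then all rows, then all columns
    out_axes = (
        [3 * k for k in range(nqubits)]
        + [3 * k + 1 for k in range(nqubits)]
        + [3 * k + 2 for k in range(nqubits)]
    )
    basis = np.einsum(*operands, out_axes)
    return basis.reshape(4**nqubits, 2**nqubits, 2**nqubits)

# sanity check against explicit Kronecker products for 2 qubits
ref = np.array([np.kron(a, b) for a, b in itertools.product(paulis, paulis)])
assert np.allclose(pauli_basis_einsum(2), ref)
```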
EDIT 1: unfortunately `einsum` is not supported by `numba`; however, if you `jit` the old implementation, ~~you are able to get a nice speedup (~5s for 7 qubits), although only from the second call onwards, since the first time you are still dealing with the compilation~~. Unfortunately this implementation was yielding wrong results, thus I had to roll back to the `einsum` approach. Further investigation is needed to understand whether it's possible to parallelize and improve this with `numba`.
EDIT 2: At some point I realized that a possible bottleneck, or rather an inefficiency, was due to `qibo.quantum_info.superoperator_transformation.vectorization`, which could only be run on a single input, either a state vector or a density matrix, thus forcing loops over each element of the basis (which can grow large very quickly). The impact on the runtime is still marginal on CPU, as for 7 qubits its contribution was around ~1-2s out of 10s, but for GPUs this starts becoming relevant. Furthermore, `vectorization` appears to be widely used in the `quantum_info` module, which convinced me to generalize it to accept batches of state vectors or density matrices, therefore lifting the need for explicit loops and instead leveraging tensor primitives directly. This was applied to `pauli_basis` for now, but it has to be propagated throughout the whole module.
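To make the batching idea concrete, here is a minimal sketch assuming only "row" and "column" vectorization orders (the name `vectorization_batched` and its signature are illustrative, not qibo's actual API):

```python
import numpy as np

def vectorization_batched(states, order="row"):
    """Vectorize one density matrix (d, d) or a batch (N, d, d) into
    shape (N, d**2) without Python-level loops."""
    states = np.asarray(states)
    if states.ndim == 2:
        states = states[None, ...]  # promote a single matrix to a batch of one
    if order == "column":
        # column-major vectorization = row-major vectorization of the transpose
        states = np.transpose(states, (0, 2, 1))
    return states.reshape(states.shape[0], -1)

# e.g. vectorize a whole basis at once instead of looping over its elements
batch = np.stack([np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])])
vec = vectorization_batched(batch)  # shape (2, 4)
```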