Backend review #138
Keep thinking about it: is there a reason to support something other than NumPy for CPU and CuPy for GPU? TensorFlow is much more complex than CuPy, and more dissimilar to NumPy. If we ever go through this exercise, I'd really consider trimming down the number of backends, so as to support all platforms while shaving off as much overhead as possible.
@alecandido I agree with your point about …
In the spirit of the first message, PyTorch would definitely be better than TensorFlow. However, considering the potential simplification, even PyTorch is still one more backend. But if there is a need deeply connected to circuit simulation, of course it's much better to plan to include a PyTorch backend from the beginning (if we ever start a refactor; this issue was mostly investigation until now - I just wanted to check if there is room for improvements and simplifications).
@alecandido I personally haven't used …
I like the suggestions of the first post - I need to read it in more detail later, but I agree about the point on Qibo's backend methods. Other than that, regarding the existing backends:
The advantage is only there when the custom kernels are used, which are only for applying gates to states and some state initialization. All other operations are delegated to numpy. I would say (without real proof) that the advantage comes from the following points, ordered by decreasing importance:
```python
import numpy as np
import qibo
from qibo import gates, Circuit

qibo.set_backend("qibojit")  # or "numpy"

c = Circuit(2)
c.add(gates.H(0))
c.add(gates.H(1))

state = np.random.random(4).astype(complex)
state2 = c(state)
print(state)
print(state2)
```

With numpy, … `np.einsum("ec,abcd->abed", gate, state)`, which applies a single-qubit gate to the 3rd qubit of a 4-qubit state, uses similar tricks, but I have never checked the actual code.
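To illustrate the reshaped-state einsum trick, here is a standalone sketch (not qibo's actual code; the Hadamard gate and the choice of target qubit are arbitrary):

```python
import numpy as np

# Hadamard gate and a random 4-qubit state, reshaped to one axis per qubit.
gate = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
state = np.random.random(16).astype(complex).reshape(2, 2, 2, 2)

# Contract the gate index "c" with the state's third axis, placing the new
# index "e" in the same position: this applies the gate to qubit 2 only.
out = np.einsum("ec,abcd->abed", gate, state)

# Cross-check against the full 16x16 operator I (x) I (x) H (x) I.
full_op = np.kron(np.kron(np.eye(2), np.eye(2)), np.kron(gate, np.eye(2)))
expected = (full_op @ state.reshape(16)).reshape(2, 2, 2, 2)

assert np.allclose(out, expected)
```

The einsum version never materializes the 16x16 matrix, which is the point of the trick: the cost stays linear in the state size instead of quadratic.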
That's true: cuQuantum is there only to support an additional backend, which is backed by NVIDIA, and to allow easy benchmarking (CuPy vs cuQuantum). It does not offer any additional features.
As @renatomello said, the main motivation for using TensorFlow is automatic differentiation. Compared to numpy, it also supports multi-threading and GPUs, but it is still slower than qibojit, primarily due to creating copies (point 1 above), which are needed for automatic differentiation. Indeed, there are alternative backends we could add for this point (PyTorch, JAX, etc.); I think we only have TensorFlow for historic reasons, as we started with it and with qibotf, the predecessor of qibojit.
Thanks @stavros11 for the summary - I believe everything should now be clear enough. My current understanding is that we'll need:
So, I'm not sure that point 3 is strictly required for simulation, because strict simulation cannot derive a circuit (otherwise the same code would not run on hardware out of the box).

On one side, I have always been tempted to add a further requirement: go beyond Python. However, this, together with the three above, would be incredibly time-consuming, and I'm pretty sure it's not worth it for the current state of the project.

Speaking of XLA, it seems like all the major ML frameworks are using it (in particular TensorFlow, JAX, and even PyTorch), and it should satisfy all the conditions above on its own. Thinking twice, I actually wonder if it would be worth investigating more deeply CuPy vs XLA-based libraries. Because if JAX or PyTorch are good enough (maybe not TF, since it's the least interoperable one, and it already "failed" somehow), and they support all the use cases, why should we dedicate effort ourselves to develop/maintain multiple simulation backends? Eventually, if we really needed something more fine-grained than what these libraries could provide, even a trip into XLA itself might be worth it (but I really hope not, at least for a long while... also because we would lose all/most of the autodiff...).

P.S.: about the copies, I was worried the problem could have persisted with the others, but there is room in JAX and PyTorch (all the …
Yes, AD is much better for gradient simulation than any other method that is hardware-compatible, so it is very necessary to keep. Getting the same computational complexity as AD on hardware is actually a hot topic right now in QML circles, and there are some theoretical results showing that it may even be impossible for a general circuit without violating complexity bounds. Of course, it can still be possible for specific circuits. But the point is that AD is indispensable.
This actually spans both Qibojit and Qibo itself, but being specific to backends, I decided to avoid polluting Qibo's tracker.
It is only a proposal and definitely not urgent. The goal is to simplify the code (for maintenance), and potentially also the implementation of new backends.
The main observation is that most of the work done at the level of the backends relies on the usage of a NumPy-compatible API.
This has already been observed since the beginning, and indeed there is a `self.np` attribute to access the API specific to each backend. However, NumPy has far more refined approaches to interoperability, and since they are widely adopted by the other similar libraries, in principle some of the tasks performed by Qibo could be delegated to the libraries themselves.

In particular, the main mechanisms are `__array_ufunc__` and `__array_function__`, which allow a NumPy call on a foreign object to be handled by the external library defining that object. They are essentially hooks, called by the NumPy function and passed all the details about the original call. They work not only on functions processing existing arrays, but also on the creation routines, through the `like` argument (see e.g. `np.zeros`).

Libraries like CuPy already implement this mechanism themselves. In principle, all the backend methods that just use the NumPy API should not be implemented more than once; at most, the underlying NumPy operations should be hooked, by providing an `__array_function__` implementation ourselves (possibly a wrapper over an existing one, if not sufficiently complete). Essentially, we could act at the level of NumPy functions, filling the gaps, instead of at the level of quantum operations.
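As a minimal sketch of the dispatch mechanism (the `DeviceArray` wrapper here is a made-up illustration, not an existing class):

```python
import numpy as np

class DeviceArray:
    """Toy array wrapper that hooks NumPy's dispatch protocol."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # NumPy calls this hook instead of `func`, passing the original call;
        # here we unwrap our wrapper, delegate to NumPy, and re-wrap arrays.
        unwrapped = [a.data if isinstance(a, DeviceArray) else a for a in args]
        result = func(*unwrapped, **kwargs)
        return DeviceArray(result) if isinstance(result, np.ndarray) else result

x = DeviceArray([1.0, 2.0, 3.0])
total = np.sum(x)              # dispatched through __array_function__
zeros = np.zeros(3, like=x)    # creation routine dispatched via `like=`
print(total)                   # 6.0
print(type(zeros).__name__)    # DeviceArray
```

A real implementation would need to unwrap nested containers and handle more return types, but this is the whole protocol: NumPy never touches the data itself, it just forwards the call.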
E.g. the `zero_state` method is implemented over and over:

- `qibojit/src/qibojit/backends/cpu.py`, lines 87 to 90 in 0cac397
- `qibojit/src/qibojit/backends/gpu.py`, lines 144 to 150 in 0cac397

but it should always perform the same operations.
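For instance, a single implementation could be sketched as follows (hypothetical code, not the current qibojit one; `ref` stands for any array of the backend's native type):

```python
import numpy as np

def zero_state(nqubits, ref=None):
    # `like=ref` dispatches the creation routine to the library defining
    # `ref` (NumPy, CuPy, ...), so the same code serves every backend.
    if ref is None:
        state = np.zeros(2**nqubits, dtype=complex)
    else:
        state = np.zeros(2**nqubits, dtype=complex, like=ref)
    state[0] = 1
    return state

state = zero_state(3)
print(state.shape)  # (8,)
print(state[0])     # (1+0j)
```

With a CuPy array passed as `ref`, the same function would allocate on the GPU, since `np.zeros` with `like=` defers to CuPy's implementation.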
In practice, there are many limitations that should be discussed separately:

- `qibojit/src/qibojit/backends/gpu.py`, line 145 in 0cac397 - however, this is happening purely in Python, so, if more efficient, it could simply be adopted by the unique implementation (in other places the same backend is using exponentiation)
- `qibojit/src/qibojit/backends/gpu.py`, line 169 in 0cac397, vs `qibojit/src/qibojit/backends/gpu.py`, lines 146 to 148 in 0cac397, where NumPy is using "fancy indexing", i.e. `arr[idx] = el`. However, if indexing is a problem for CuPy (or other backends), and in case it would be problematic to hook on its own, NumPy itself has an equivalent function, i.e. `np.put`. In the hooking perspective, the kernel implementation can be the `np.put` replacement (btw, CuPy has the same function, `cp.put`, and I'm pretty sure it is already hooked - but I also suspect indexing to work, and I could quickly check, so I might be missing something about the kernel...)
- the `np.put` replacement, or adding it at the end; however, this choice would become global (while currently it could differ method-by-method); we should investigate whether this is a true limitation (most likely whoever implemented the backend has a better understanding of it)

As I said, the main observation is that the current Qibo backends contain a lot of duplicated operations, at a higher level than required (an even better example would be matrices, which should definitely not be repeated more than once).
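The fancy-indexing / `np.put` equivalence mentioned above is easy to check with plain NumPy (illustrative only; whether CuPy's kernels behave identically would need a separate check):

```python
import numpy as np

# Fancy-indexing assignment, as used in the backend snippets above:
a = np.zeros(8, dtype=complex)
a[[0]] = 1

# The functional equivalent, which could serve as a hook target:
b = np.zeros(8, dtype=complex)
np.put(b, [0], 1)

assert np.array_equal(a, b)
```

Since `np.put` is a regular dispatchable function (unlike the assignment syntax), it is the form that an `__array_function__` hook could intercept.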
However, the update would require some effort and some (possibly deep) refactoring of the backends. The good part is that this would be fully internal; there is no need to break any interface for the Qibo user.
Given all these points, take this as a report about an investigation for possible improvements. There is no hurry to do anything.