-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multi-controlled gates in apply_gate
#87
Conversation
@stavros11 the implementation looks good, please rebase this PR with the #86 (if you agree with the changes), so we can check that everything is working properly. |
Codecov Report
@@ Coverage Diff @@
## master #87 +/- ##
==========================================
+ Coverage 97.33% 97.37% +0.03%
==========================================
Files 30 30
Lines 2965 3005 +40
==========================================
+ Hits 2886 2926 +40
Misses 79 79
Continue to review full report at Codecov.
|
I integrated this to qibo Circuit in a different branch and run some of the gate benchmarks from #47 to compare with Cirq. It performs much better than our current implementation but it still does not exploit gate sparsity and Cirq is faster in some gates. Table has ratio Qibo custom op time / Cirq0.8 time.
As in our previous implementation, execution time is similar for all gates. The difference is just because Cirq is much faster for some gates (such as CNOT and Z) than it is for H. |
Good, do you have a complete QFT example? |
const int64 tk = std::pow(2, nqubits - target - 1); | ||
|
||
int64 cktot = 0; | ||
std::set<int64> cks; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really need a set? If not, then we could use a vector with know size vector<int64> cks(ncontrols)
and then simply assign element by element, avoiding resizing with insert.
for (auto g = t; g < w; g += 2 * tk) { | ||
for (auto i = g; i < g + tk; i++) { | ||
bool apply = true; | ||
for (std::set<int64>::iterator q = cks.begin(); q != cks.end(); q++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about a c++11 range-based for loop, e.g. for (const auto &i: cks)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. We can address the slower gates in another PR.
Exact numbers for many qubits (everything single precision):
It should be noted that Cirq is single-threaded, while our single thread performance will be ~2-3 times worse. However QFT also involves a lot of CZPow for which Cirq is particularly fast. I do not have results for more qubits due to time (no memory). 32 qubits take ~48GB, so it might be possible to run 33 on dom.
Thanks for the comments. I switched |
Great, we are almost there. Concerning the other operators we could have a functor per gate and thus pass the gate name together with the state. |
For the gates we currently have in Qibo I think the situation is the following:
The easiest and probably most efficient solution is to write three/four different kernels that implement these. The only disadvantage of this is that it would result to many C files. Alternatively we could try to fit all in the same kernel, however I am not sure how easy this would be. Also, there is an indexing issue with 32 qubits:
|
Thanks for the summary, having 3-4 kernels is not a problem at all. |
Great. I will try to implement something around this idea and check how it compares to Cirq. If it works as expected I will open a PR or push it directly here if we leave this open. |
Perfect, concerning the crash, I have just tried a simple |
Thanks for testing. I also tried a simpler example and it is possible to run up to 32 qubits. I guess the problem is associated with the way I integrated the custom op in qibo circuits but this was a quick implementation just for the benchmarks. I will revisit this anyway when we have the final kernels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, if you prefer to merge this now and then open other PRs, it is fine by me too.
This extends
op.apply_gate
to support multi-control gates. I am not sure if this is the best implementation possible but performance and memory requirements are the same with our currentapply_gate
(controlled gates are actually slightly faster because they require less updates).With this kernel we can implement all Qibo gates except SWAP and measurements. For SWAP we can either decompose it to three CNOTs or write another special kernel.
I will implement
Circuit
in a different branch using this custom operator instead of matmul/einsum and redo the gate benchmarks to see how it compares to Cirq.