Rebalance CircuitInstruction and PackedInstruction #12730

Merged on Jul 23, 2024 (11 commits)

@jakelishman (Member) commented Jul 7, 2024

Summary

This is a large overhaul of how circuit instructions are both stored in Rust (PackedInstruction) and how they are presented to Python (CircuitInstruction). In summary:

  • The old OperationType enum is now collapsed into a manually managed PackedOperation. This is logically equivalent, but stores a PyGate/PyInstruction/PyOperation indirectly through a boxed pointer, and stores a StandardGate inline. As we expect the vast majority of gates to be standard, this hugely reduces the memory usage. The enumeration is manually compressed to the size of a single pointer, hiding the discriminant in the low bits of the pointer, which alignment requirements guarantee are otherwise zero.

  • PackedOperation::view() unpacks the operation into a proper reference-like enumeration OperationRef<'a>, which implements Operation (though there is also a try_standard_gate method to get the gate without unpacking the whole enumeration).

  • Both PackedInstruction and CircuitInstruction use this PackedOperation as the operation storage.

  • PackedInstruction is now completely the Rust-space format for data, and CircuitInstruction is purely for communication with Python.
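
To make the low-bit trick concrete, here is a toy model using plain Python integers in place of real pointers. It is only a sketch of the general technique: the 8-byte alignment, the tag values and all function names below are assumptions for illustration, not the actual encoding used by PackedOperation.

```python
from typing import Optional, Tuple

# Assumption: boxed objects are 8-byte aligned, so the low 3 bits of any
# valid address are zero and can carry a discriminant instead.
ALIGN = 8
TAG_BITS = 3
TAG_MASK = (1 << TAG_BITS) - 1

# Hypothetical discriminant values (the real ones may differ).
TAG_STANDARD_GATE = 0
TAG_BOXED_GATE = 1
TAG_BOXED_INSTRUCTION = 2
TAG_BOXED_OPERATION = 3


def pack_standard(gate_id: int) -> int:
    """Store a small gate identifier inline, above the tag bits."""
    return (gate_id << TAG_BITS) | TAG_STANDARD_GATE


def pack_pointer(address: int, tag: int) -> int:
    """Hide a non-zero tag in the low bits of an aligned address."""
    if address % ALIGN != 0:
        raise ValueError("address must be 8-byte aligned")
    if not 0 < tag <= TAG_MASK:
        raise ValueError("tag must fit in the low bits and be non-zero")
    return address | tag


def view(word: int) -> Tuple[int, int]:
    """Unpack a word into (tag, payload), like a reference-style enum."""
    tag = word & TAG_MASK
    if tag == TAG_STANDARD_GATE:
        return tag, word >> TAG_BITS
    return tag, word & ~TAG_MASK


def try_standard_gate(word: int) -> Optional[int]:
    """Cheap check that skips building the full (tag, payload) view."""
    if word & TAG_MASK == TAG_STANDARD_GATE:
        return word >> TAG_BITS
    return None
```

In this model a standard gate never touches the heap, and every case fits in one machine word, which is the property the summary above relies on.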

On my machine, this commit brings the utility-scale benchmarks to within 10% of the runtime of 1.1.0 (and some to parity), despite all the additional overhead.

Changes to accepting and building Python objects

  • A PackedInstruction is created by copy constructor from a CircuitInstruction by CircuitData::pack. There is no pack_owned (really, there never was - the previous method didn't take ownership) because there are never owned CircuitInstructions coming in; they're Python-space interop, so we never own them (unless we clone them) other than when we're unpacking them.

  • PackedInstruction is currently just created manually when not coming from a CircuitInstruction. It's not hard, and makes it easier to re-use known intern indices than to waste time re-interning them. There is no need to go via CircuitInstruction.

  • CircuitInstruction now has two separate Python-space constructors: the old one, which is the default and takes (operation, qubits, clbits) (and extracts the information), and a new fast-path from_standard which asks only for the standard gate, qubits and params, avoiding operator construction.

  • To accept a Python-space operation, extract a Python object to OperationFromPython. This extracts the components that are separate in Rust space, but joined in Python space (the operation, params and extra attributes). This replaces OperationInput and OperationTypeConstruct, being more efficient at the extraction, including providing the data in the formats needed for PackedInstruction or CircuitInstruction.

  • To retrieve the Python-space operation, use CircuitInstruction::get_operation or PackedInstruction::unpack_py_op as appropriate. Both will cache and reuse the op, if cache_pygates is active. (Though note that if the op is created by CircuitInstruction, it will not propagate back to a PackedInstruction.)

Avoiding operation creation

The _raw_op field of CircuitInstruction is gone, because PyGate, PyInstruction and PyOperation are no longer pyclasses and no longer exposed to Python. Instead, we avoid operation creation by:

  • having an internal DAGNode::_to_circuit_instruction, which returns a copy of the internal CircuitInstruction, which can then be used with CircuitInstruction.replace, etc.

  • having CircuitInstruction::is_standard_gate to query from Python space if we should bother to create the operator.

  • changing CircuitData::map_ops to map_nonstandard_ops, and having it only call the Python callback function if the operation is not an unconditional standard gate.
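
The filtering rule in the last bullet can be sketched in pure Python. This is a toy model under stated assumptions: ToyInstruction and its fields are hypothetical stand-ins, and this version returns a new list rather than modelling how CircuitData actually stores or updates its instructions.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ToyInstruction:
    """Hypothetical stand-in for an instruction; not a Qiskit class."""
    name: str
    is_standard: bool
    condition: Optional[str] = None


def map_nonstandard_ops(
    instructions: List[ToyInstruction],
    func: Callable[[ToyInstruction], ToyInstruction],
) -> List[ToyInstruction]:
    """Apply func only where the op is not an unconditional standard gate."""
    out = []
    for inst in instructions:
        if inst.is_standard and inst.condition is None:
            # Fast path: the callback (and any Python object creation it
            # would trigger) is skipped entirely.
            out.append(inst)
        else:
            out.append(func(inst))
    return out


# Usage: record which instructions actually reach the callback.
seen = []


def rename(inst: ToyInstruction) -> ToyInstruction:
    seen.append(inst.name)
    return replace(inst, name=inst.name.upper())


circuit = [
    ToyInstruction("rz", True),
    ToyInstruction("x", True, condition="c0 == 1"),
    ToyInstruction("my_gate", False),
]
mapped = map_nonstandard_ops(circuit, rename)
```

Only the conditional gate and the custom gate reach the callback here; the plain rz passes through untouched.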

Memory usage

Given the very simple example construction script:

```python
from qiskit.circuit import QuantumCircuit

qc = QuantumCircuit(1_000)
for _ in range(3_000):
    for q in qc.qubits:
        qc.rz(0.0, q)
    for q in qc.qubits:
        qc.rx(0.0, q)
    for q in qc.qubits:
        qc.rz(0.0, q)
    for a, b in zip(qc.qubits[:-1], qc.qubits[1:]):
        qc.cx(a, b)
```

This uses 1.5GB in max resident set size on my MacBook (note that it's about 12 million gates) on both 1.1.0 and with this commit, so we've undone our memory losses. The parent of this commit uses 2GB.
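
For reference, the "about 12 million gates" figure follows directly from the loop structure of the script. A quick check, using only the numbers in the script (the variable names are just for illustration):

```python
num_qubits = 1_000
repetitions = 3_000

# Each repetition applies rz, rx, rz to every qubit...
single_qubit_gates = 3 * num_qubits
# ...and one cx per adjacent pair of qubits.
two_qubit_gates = num_qubits - 1

total_gates = repetitions * (single_qubit_gates + two_qubit_gates)
print(f"{total_gates:,}")  # 11,997,000
```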

However, we're in a strong position to beat 1.1.0 in the future now; there are two obvious large remaining costs:

  • There are 16 bytes per PackedInstruction for the Python-operation caching (worth about 180MB in this benchmark, since no Python operations are actually created).

  • There is also significant memory wastage in the current SmallVec<[Param; 3]> storage of the parameters; for all standard gates, we know statically how many parameters are / should be stored, and we never need to increase the capacity. Further, the Param enum is 16 bytes wide per parameter, of which nearly 8 bytes is padding, but for all our current use cases, we only care if all the parameters are floats (for everything else, we're going to have to defer to Python). We could move the discriminant out to the level of the parameters structure, and save a large amount of padding.
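
Both costs can be checked with rough arithmetic. The sketch below assumes a C-like layout rule (each field aligned to its own alignment, total size rounded up to the largest alignment) as a stand-in for what the compiler does, a 1-byte discriminant, and the gate count from the script above. The real Rust layout may differ in detail.

```python
from typing import List, Tuple


def struct_size(fields: List[Tuple[int, int]]) -> int:
    """Size of a C-like struct from (size, alignment) pairs, in order."""
    offset = 0
    max_align = 1
    for size, align in fields:
        offset = (offset + align - 1) // align * align  # pad up to alignment
        offset += size
        max_align = max(max_align, align)
    # Round the total up so arrays of the struct stay aligned.
    return (offset + max_align - 1) // max_align * max_align


TAG = (1, 1)  # assumed 1-byte discriminant
F64 = (8, 8)  # 8-byte float, 8-byte aligned

tagged_param = struct_size([TAG, F64])         # 16: 1 tag + 7 padding + 8 float
three_tagged = 3 * tagged_param                # 48: one tag per parameter
shared_tag = struct_size([TAG, F64, F64, F64])  # 32: one tag for all three

# Python-operation cache: 16 bytes per instruction in the benchmark circuit.
total_gates = 3_000 * (3 * 1_000 + 999)
cache_bytes = 16 * total_gates  # 191,952,000 bytes, about 183 MiB
```

In this model, hoisting the discriminant out of the individual parameters saves 16 of every 48 bytes, and the cache cost lands close to the 180MB quoted above.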

Further work

There's still performance left on the table here:

  • We still copy-in and copy-out of CircuitInstruction too much right now; we might want to make all the CircuitInstruction fields nullable and have CircuitData::append take them by move rather than by copy.

  • The qubits/clbits interner requires owned arrays going in, but most interning should return an existing entry. We probably want to switch to have the interner take references/iterators by default, and clone when necessary. We could have a small circuit optimisation where the intern contexts reserve the first n entries to use for an all-to-all connectivity interning for up to (say) 8 qubits, since the transpiler will want to create a lot of ephemeral small circuits.

  • The Param vectors are too heavy at the moment; SmallVec<[Param; 3]> is 56 bytes wide, despite the vast majority of gates we care about having at most one single float (8 bytes). Dead padding is a large chunk of the memory use currently.
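
To put the last point in numbers for the benchmark circuit from the memory section: assuming every instruction carries one 56-byte parameter vector inline (an assumption about the storage at this commit, not something measured here), the vectors take several hundred megabytes while the float payload itself is a small fraction of that.

```python
repetitions = 3_000
num_qubits = 1_000

one_param_gates = repetitions * 3 * num_qubits     # rz, rx, rz: one float each
zero_param_gates = repetitions * (num_qubits - 1)  # cx: no parameters
total_gates = one_param_gates + zero_param_gates

SMALLVEC_BYTES = 56  # width quoted above for SmallVec<[Param; 3]>
FLOAT_BYTES = 8

stored = SMALLVEC_BYTES * total_gates   # 671,832,000 bytes
needed = FLOAT_BYTES * one_param_gates  # 72,000,000 bytes
useful_fraction = needed / stored
print(f"stored={stored:,} needed={needed:,}")
```

Under that assumption, only about a tenth of the parameter storage in this benchmark holds actual float data.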

Details and comments

Benchmarks in follow-up comments because GitHub complained I went over the limit. Top-level summary:

  • this is faster than its parent commit by over 2x pretty much across the board of QuantumCircuit creation and manipulation, and has non-trivial improvements for DAGCircuit manipulation
  • this brings the utility-scale transpilation benchmarks back to within 10% of the runtime of 1.1.0, and equal on a couple. "Avoid Python op creation in BasisTranslator" (#12705) and "[DAGCircuit Oxidation] Port DAGCircuit to Rust" (#12550) (or at least the subsequent porting of transpiler passes) should get us beyond 1.1.0 performance.
  • we're getting closer to 1.1.0 performance across the board. assign_parameters is still currently taking a big hit, but moving the orchestration of that down to Rust would likely get us back to parity.

Closes:

@jakelishman added the `priority: high`, `performance`, `Changelog: None`, `Rust` and `mod: circuit` labels on Jul 7, 2024
@jakelishman added this to the 1.2.0 milestone on Jul 7, 2024
@qiskit-bot (Collaborator) commented:

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core
  • @kevinhartman
  • @mtreinish

@jakelishman (Member, Author) commented:

Benchmarking against its parent commit (bb60891), showing only changed benchmarks. This is almost universally better across the board of circuit construction and manipulation, by half to a full order of magnitude. Transpile and DAG-related operations are faster too, most likely because the conversions are slightly cheaper now and there's less data to iterate through on copies.

| Change   | Before [bb60891a] <repack-instruction~1>   | After [d9e31ed5] <repack-instruction>   |   Ratio | Benchmark (Parameter)                                                                                           |
|----------|--------------------------------------------|-----------------------------------------|---------|-----------------------------------------------------------------------------------------------------------------|
| -        | 12.0±0.4ms                                 | 10.5±0.2ms                              |    0.87 | circuit_construction.CircuitConstructionBench.time_circuit_construction(2, 2048)                                |
| -        | 766±20ms                                   | 693±10ms                                |    0.9  | circuit_construction.CircuitConstructionBench.time_circuit_construction(5, 131072)                              |
| -        | 12.7±0.2ms                                 | 10.8±0.2ms                              |    0.85 | circuit_construction.CircuitConstructionBench.time_circuit_construction(5, 2048)                                |
| -        | 153±10μs                                   | 124±3μs                                 |    0.81 | circuit_construction.CircuitConstructionBench.time_circuit_construction(8, 8)                                   |
| -        | 13.2±0.4μs                                 | 10.7±0.3μs                              |    0.81 | circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 128)                                         |
| -        | 14.9±0.5ms                                 | 2.92±0.6ms                              |    0.2  | circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 131072)                                      |
| -        | 76.2±10μs                                  | 37.8±5μs                                |    0.5  | circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 2048)                                        |
| -        | 1.58±0.4ms                                 | 543±200μs                               |    0.34 | circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 32768)                                       |
| -        | 286±30μs                                   | 106±20μs                                |    0.37 | circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 8192)                                        |
| -        | 14.7±2ms                                   | 3.68±0.1ms                              |    0.25 | circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 131072)                                     |
| -        | 84.0±20μs                                  | 40.8±5μs                                |    0.49 | circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 2048)                                       |
| -        | 1.77±0.2ms                                 | 532±100μs                               |    0.3  | circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 32768)                                      |
| -        | 417±50μs                                   | 122±30μs                                |    0.29 | circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 8192)                                       |
| -        | 15.1±2ms                                   | 3.73±0.2ms                              |    0.25 | circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 131072)                                      |
| -        | 90.6±30μs                                  | 32.2±5μs                                |    0.36 | circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 2048)                                        |
| -        | 2.69±0.7ms                                 | 541±100μs                               |    0.2  | circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 32768)                                       |
| -        | 434±70μs                                   | 142±20μs                                |    0.33 | circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 8192)                                        |
| -        | 13.6±1ms                                   | 3.67±0.1ms                              |    0.27 | circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 131072)                                     |
| -        | 87.4±10μs                                  | 50.9±6μs                                |    0.58 | circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 2048)                                       |
| -        | 2.38±0.4ms                                 | 577±100μs                               |    0.24 | circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 32768)                                      |
| -        | 403±100μs                                  | 117±8μs                                 |    0.29 | circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 8192)                                       |
| -        | 18.8±3ms                                   | 3.62±0.3ms                              |    0.19 | circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 131072)                                      |
| -        | 79.3±10μs                                  | 34.0±3μs                                |    0.43 | circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 2048)                                        |
| -        | 2.95±0.5ms                                 | 576±100μs                               |    0.19 | circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 32768)                                       |
| -        | 317±100μs                                  | 108±30μs                                |    0.34 | circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 8192)                                        |
| -        | 18.8±2ms                                   | 3.61±0.3ms                              |    0.19 | circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 131072)                                      |
| -        | 87.3±20μs                                  | 44.5±8μs                                |    0.51 | circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 2048)                                        |
| -        | 1.83±0.2ms                                 | 484±200μs                               |    0.26 | circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 32768)                                       |
| -        | 346±40μs                                   | 118±20μs                                |    0.34 | circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 8192)                                        |
| -        | 141±7μs                                    | 30.9±0.3μs                              |    0.22 | circuit_construction.CircuitConstructionBench.time_circuit_extend(1, 128)                                       |
| -        | 129±2ms                                    | 23.0±0.9ms                              |    0.18 | circuit_construction.CircuitConstructionBench.time_circuit_extend(1, 131072)                                    |
| -        | 1.83±0.07ms                                | 343±8μs                                 |    0.19 | circuit_construction.CircuitConstructionBench.time_circuit_extend(1, 2048)                                      |
| -        | 29.5±0.7ms                                 | 5.68±0.2ms                              |    0.19 | circuit_construction.CircuitConstructionBench.time_circuit_extend(1, 32768)                                     |
| -        | 16.9±0.5μs                                 | 11.3±0.5μs                              |    0.67 | circuit_construction.CircuitConstructionBench.time_circuit_extend(1, 8)                                         |
| -        | 8.00±0.3ms                                 | 1.39±0.08ms                             |    0.17 | circuit_construction.CircuitConstructionBench.time_circuit_extend(1, 8192)                                      |
| -        | 139±4μs                                    | 54.1±7μs                                |    0.39 | circuit_construction.CircuitConstructionBench.time_circuit_extend(14, 128)                                      |
| -        | 146±1ms                                    | 34.7±0.2ms                              |    0.24 | circuit_construction.CircuitConstructionBench.time_circuit_extend(14, 131072)                                   |
| -        | 1.94±0.03ms                                | 439±10μs                                |    0.23 | circuit_construction.CircuitConstructionBench.time_circuit_extend(14, 2048)                                     |
| -        | 34.4±2ms                                   | 7.24±0.4ms                              |    0.21 | circuit_construction.CircuitConstructionBench.time_circuit_extend(14, 32768)                                    |
| -        | 43.0±0.8μs                                 | 27.0±0.7μs                              |    0.63 | circuit_construction.CircuitConstructionBench.time_circuit_extend(14, 8)                                        |
| -        | 8.01±0.3ms                                 | 1.83±0.09ms                             |    0.23 | circuit_construction.CircuitConstructionBench.time_circuit_extend(14, 8192)                                     |
| -        | 129±7μs                                    | 36.0±2μs                                |    0.28 | circuit_construction.CircuitConstructionBench.time_circuit_extend(2, 128)                                       |
| -        | 143±3ms                                    | 33.6±2ms                                |    0.23 | circuit_construction.CircuitConstructionBench.time_circuit_extend(2, 131072)                                    |
| -        | 1.87±0.03ms                                | 380±20μs                                |    0.2  | circuit_construction.CircuitConstructionBench.time_circuit_extend(2, 2048)                                      |
| -        | 32.6±0.8ms                                 | 7.02±0.3ms                              |    0.22 | circuit_construction.CircuitConstructionBench.time_circuit_extend(2, 32768)                                     |
| -        | 20.0±3μs                                   | 11.5±0.2μs                              |    0.58 | circuit_construction.CircuitConstructionBench.time_circuit_extend(2, 8)                                         |
| -        | 7.57±0.1ms                                 | 1.62±0.05ms                             |    0.21 | circuit_construction.CircuitConstructionBench.time_circuit_extend(2, 8192)                                      |
| -        | 158±4μs                                    | 57.8±10μs                               |    0.37 | circuit_construction.CircuitConstructionBench.time_circuit_extend(20, 128)                                      |
| -        | 144±6ms                                    | 35.6±2ms                                |    0.25 | circuit_construction.CircuitConstructionBench.time_circuit_extend(20, 131072)                                   |
| -        | 1.92±0.06ms                                | 447±7μs                                 |    0.23 | circuit_construction.CircuitConstructionBench.time_circuit_extend(20, 2048)                                     |
| -        | 33.7±1ms                                   | 7.42±0.5ms                              |    0.22 | circuit_construction.CircuitConstructionBench.time_circuit_extend(20, 32768)                                    |
| -        | 52.8±2μs                                   | 26.5±0.8μs                              |    0.5  | circuit_construction.CircuitConstructionBench.time_circuit_extend(20, 8)                                        |
| -        | 7.61±0.2ms                                 | 1.90±0.1ms                              |    0.25 | circuit_construction.CircuitConstructionBench.time_circuit_extend(20, 8192)                                     |
| -        | 136±3μs                                    | 38.4±0.8μs                              |    0.28 | circuit_construction.CircuitConstructionBench.time_circuit_extend(5, 128)                                       |
| -        | 141±5ms                                    | 34.9±1ms                                |    0.25 | circuit_construction.CircuitConstructionBench.time_circuit_extend(5, 131072)                                    |
| -        | 1.95±0.06ms                                | 457±20μs                                |    0.24 | circuit_construction.CircuitConstructionBench.time_circuit_extend(5, 2048)                                      |
| -        | 34.8±0.9ms                                 | 7.04±0.2ms                              |    0.2  | circuit_construction.CircuitConstructionBench.time_circuit_extend(5, 32768)                                     |
| -        | 21.0±0.9μs                                 | 13.0±0.3μs                              |    0.62 | circuit_construction.CircuitConstructionBench.time_circuit_extend(5, 8)                                         |
| -        | 7.92±0.3ms                                 | 1.77±0.06ms                             |    0.22 | circuit_construction.CircuitConstructionBench.time_circuit_extend(5, 8192)                                      |
| -        | 147±1μs                                    | 41.8±2μs                                |    0.28 | circuit_construction.CircuitConstructionBench.time_circuit_extend(8, 128)                                       |
| -        | 143±5ms                                    | 35.4±1ms                                |    0.25 | circuit_construction.CircuitConstructionBench.time_circuit_extend(8, 131072)                                    |
| -        | 1.88±0.1ms                                 | 446±30μs                                |    0.24 | circuit_construction.CircuitConstructionBench.time_circuit_extend(8, 2048)                                      |
| -        | 36.2±1ms                                   | 7.14±0.3ms                              |    0.2  | circuit_construction.CircuitConstructionBench.time_circuit_extend(8, 32768)                                     |
| -        | 28.8±0.4μs                                 | 16.0±0.5μs                              |    0.56 | circuit_construction.CircuitConstructionBench.time_circuit_extend(8, 8)                                         |
| -        | 7.82±0.3ms                                 | 1.74±0.07ms                             |    0.22 | circuit_construction.CircuitConstructionBench.time_circuit_extend(8, 8192)                                      |
| -        | 3.45±0.07ms                                | 2.61±0.02ms                             |    0.75 | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 128, 128)                               |
| -        | 3.30±0.2ms                                 | 2.69±0.02ms                             |    0.82 | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 128, 8)                                 |
| -        | 2.72±0.01s                                 | 2.13±0.03s                              |    0.78 | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 131072, 128)                            |
| -        | 3.17±0.05s                                 | 2.47±0.01s                              |    0.78 | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 131072, 131072)                         |
| -        | 2.72±0.03s                                 | 2.21±0.07s                              |    0.81 | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 131072, 2048)                           |
| -        | 2.87±0.02s                                 | 2.25±0.03s                              |    0.78 | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 131072, 32768)                          |
| -        | 2.78±0.05s                                 | 2.17±0.03s                              |    0.78 | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 131072, 8)                              |
| -        | 2.79±0.02s                                 | 2.14±0.01s                              |    0.77 | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 131072, 8192)                           |
| -        | 41.1±0.3ms                                 | 31.7±0.3ms                              |    0.77 | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 2048, 128)                              |
| -        | 44.7±1ms                                   | 35.3±0.2ms                              |    0.79 | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 2048, 2048)                             |
| -        | 41.4±2ms                                   | 33.7±2ms                                |    0.81 | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 2048, 8)                                |
| -        | 662±7ms                                    | 535±10ms                                |    0.81 | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 32768, 128)                             |
| -        | 691±10ms                                   | 522±10ms                                |    0.76 | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 32768, 2048)                            |
| -        | 757±30ms                                   | 594±9ms                                 |    0.79 | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 32768, 32768)                           |
| -        | 669±10ms                                   | 529±10ms                                |    0.79 | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 32768, 8)                               |
| -        | 662±10ms                                   | 516±8ms                                 |    0.78 | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 32768, 8192)                            |
| -        | 859±10μs                                   | 666±6μs                                 |    0.78 | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 8, 8)                                   |
| -        | 177±2ms                                    | 134±1ms                                 |    0.76 | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 8192, 2048)                             |
| -        | 167±2ms                                    | 128±1ms                                 |    0.77 | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 8192, 8)                                |
| -        | 185±4ms                                    | 148±3ms                                 |    0.8  | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 8192, 8192)                             |
| -        | 2.04±0.05ms                                | 1.77±0.02ms                             |    0.86 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 128, 128)       |
| -        | 1.53±0.02ms                                | 648±7μs                                 |    0.42 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 128)                                                      |
| -        | 23.7±1ms                                   | 8.50±0.2ms                              |    0.36 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 2048)                                                     |
| -        | 148±5μs                                    | 59.2±0.7μs                              |    0.4  | converters.ConverterBenchmarks.time_circuit_to_dag(1, 8)                                                        |
| -        | 97.7±5ms                                   | 35.3±2ms                                |    0.36 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 8192)                                                     |
| -        | 13.7±0.4ms                                 | 6.46±0.2ms                              |    0.47 | converters.ConverterBenchmarks.time_circuit_to_dag(14, 128)                                                     |
| -        | 204±5ms                                    | 95.0±3ms                                |    0.47 | converters.ConverterBenchmarks.time_circuit_to_dag(14, 2048)                                                    |
| -        | 1.15±0.01ms                                | 667±9μs                                 |    0.58 | converters.ConverterBenchmarks.time_circuit_to_dag(14, 8)                                                       |
| -        | 2.39±0.03ms                                | 1.11±0.02ms                             |    0.46 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 128)                                                      |
| -        | 37.9±2ms                                   | 15.2±0.2ms                              |    0.4  | converters.ConverterBenchmarks.time_circuit_to_dag(2, 2048)                                                     |
| -        | 238±9μs                                    | 119±2μs                                 |    0.5  | converters.ConverterBenchmarks.time_circuit_to_dag(2, 8)                                                        |
| -        | 155±4ms                                    | 59.9±1ms                                |    0.39 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 8192)                                                     |
| -        | 19.7±0.3ms                                 | 10.9±1ms                                |    0.55 | converters.ConverterBenchmarks.time_circuit_to_dag(20, 128)                                                     |
| -        | 1.82±0.06ms                                | 1.03±0.01ms                             |    0.57 | converters.ConverterBenchmarks.time_circuit_to_dag(20, 8)                                                       |
| -        | 33.2±0.6ms                                 | 16.9±2ms                                |    0.51 | converters.ConverterBenchmarks.time_circuit_to_dag(32, 128)                                                     |
| -        | 2.62±0.1ms                                 | 1.53±0.06ms                             |    0.58 | converters.ConverterBenchmarks.time_circuit_to_dag(32, 8)                                                       |
| -        | 5.27±0.7ms                                 | 2.27±0.03ms                             |    0.43 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 128)                                                      |
| -        | 74.9±0.7ms                                 | 32.0±0.9ms                              |    0.43 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 2048)                                                     |
| -        | 455±4μs                                    | 218±10μs                                |    0.48 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 8)                                                        |
| -        | 296±2ms                                    | 125±2ms                                 |    0.42 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 8192)                                                     |
| -        | 58.3±0.8ms                                 | 33.2±1ms                                |    0.57 | converters.ConverterBenchmarks.time_circuit_to_dag(53, 128)                                                     |
| -        | 4.66±0.05ms                                | 3.08±0.3ms                              |    0.66 | converters.ConverterBenchmarks.time_circuit_to_dag(53, 8)                                                       |
| -        | 7.87±0.1ms                                 | 3.64±0.04ms                             |    0.46 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 128)                                                      |
| -        | 117±5ms                                    | 49.6±0.4ms                              |    0.42 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 2048)                                                     |
| -        | 725±70μs                                   | 364±5μs                                 |    0.5  | converters.ConverterBenchmarks.time_circuit_to_dag(8, 8)                                                        |
| -        | 468±4ms                                    | 193±1ms                                 |    0.41 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 8192)                                                     |
| -        | 928±20μs                                   | 240±3μs                                 |    0.26 | converters.ConverterBenchmarks.time_circuit_to_instruction(1, 128)                                              |
| -        | 13.7±0.7ms                                 | 2.26±0.06ms                             |    0.17 | converters.ConverterBenchmarks.time_circuit_to_instruction(1, 2048)                                             |
| -        | 141±3μs                                    | 73.0±3μs                                |    0.52 | converters.ConverterBenchmarks.time_circuit_to_instruction(1, 8)                                                |
| -        | 53.0±2ms                                   | 8.84±0.1ms                              |    0.17 | converters.ConverterBenchmarks.time_circuit_to_instruction(1, 8192)                                             |
| -        | 6.60±0.4ms                                 | 1.58±0.03ms                             |    0.24 | converters.ConverterBenchmarks.time_circuit_to_instruction(14, 128)                                             |
| -        | 98.5±1ms                                   | 24.0±0.3ms                              |    0.24 | converters.ConverterBenchmarks.time_circuit_to_instruction(14, 2048)                                            |
| -        | 623±10μs                                   | 315±3μs                                 |    0.51 | converters.ConverterBenchmarks.time_circuit_to_instruction(14, 8)                                               |
| -        | 1.39±0.01ms                                | 555±20μs                                |    0.4  | converters.ConverterBenchmarks.time_circuit_to_instruction(2, 128)                                              |
| -        | 20.7±0.9ms                                 | 4.89±0.07ms                             |    0.24 | converters.ConverterBenchmarks.time_circuit_to_instruction(2, 2048)                                             |
| -        | 186±5μs                                    | 114±9μs                                 |    0.61 | converters.ConverterBenchmarks.time_circuit_to_instruction(2, 8)                                                |
| -        | 81.9±2ms                                   | 19.0±0.5ms                              |    0.23 | converters.ConverterBenchmarks.time_circuit_to_instruction(2, 8192)                                             |
| -        | 8.88±0.2ms                                 | 2.35±0.1ms                              |    0.26 | converters.ConverterBenchmarks.time_circuit_to_instruction(20, 128)                                             |
| -        | 817±10μs                                   | 392±7μs                                 |    0.48 | converters.ConverterBenchmarks.time_circuit_to_instruction(20, 8)                                               |
| -        | 13.6±0.1ms                                 | 3.57±0.3ms                              |    0.26 | converters.ConverterBenchmarks.time_circuit_to_instruction(32, 128)                                             |
| -        | 1.28±0.02ms                                | 550±10μs                                |    0.43 | converters.ConverterBenchmarks.time_circuit_to_instruction(32, 8)                                               |
| -        | 2.62±0.3ms                                 | 793±50μs                                |    0.3  | converters.ConverterBenchmarks.time_circuit_to_instruction(5, 128)                                              |
| -        | 38.5±1ms                                   | 8.82±0.06ms                             |    0.23 | converters.ConverterBenchmarks.time_circuit_to_instruction(5, 2048)                                             |
| -        | 294±3μs                                    | 125±3μs                                 |    0.43 | converters.ConverterBenchmarks.time_circuit_to_instruction(5, 8)                                                |
| -        | 155±3ms                                    | 36.6±0.7ms                              |    0.24 | converters.ConverterBenchmarks.time_circuit_to_instruction(5, 8192)                                             |
| -        | 22.9±0.7ms                                 | 5.82±0.2ms                              |    0.25 | converters.ConverterBenchmarks.time_circuit_to_instruction(53, 128)                                             |
| -        | 2.06±0.06ms                                | 922±100μs                               |    0.45 | converters.ConverterBenchmarks.time_circuit_to_instruction(53, 8)                                               |
| -        | 3.90±0.09ms                                | 1.09±0.09ms                             |    0.28 | converters.ConverterBenchmarks.time_circuit_to_instruction(8, 128)                                              |
| -        | 57.9±1ms                                   | 13.6±0.2ms                              |    0.23 | converters.ConverterBenchmarks.time_circuit_to_instruction(8, 2048)                                             |
| -        | 422±20μs                                   | 191±5μs                                 |    0.45 | converters.ConverterBenchmarks.time_circuit_to_instruction(8, 8)                                                |
| -        | 232±4ms                                    | 55.3±1ms                                |    0.24 | converters.ConverterBenchmarks.time_circuit_to_instruction(8, 8192)                                             |
| -        | 446±4μs                                    | 243±0.9μs                               |    0.55 | converters.ConverterBenchmarks.time_dag_to_circuit(1, 128)                                                      |
| -        | 6.18±0.09ms                                | 3.25±0.05ms                             |    0.53 | converters.ConverterBenchmarks.time_dag_to_circuit(1, 2048)                                                     |
| -        | 60.5±3μs                                   | 37.6±0.4μs                              |    0.62 | converters.ConverterBenchmarks.time_dag_to_circuit(1, 8)                                                        |
| -        | 25.0±0.6ms                                 | 13.1±0.2ms                              |    0.53 | converters.ConverterBenchmarks.time_dag_to_circuit(1, 8192)                                                     |
| -        | 3.83±0.05ms                                | 2.10±0.05ms                             |    0.55 | converters.ConverterBenchmarks.time_dag_to_circuit(14, 128)                                                     |
| -        | 62.1±1ms                                   | 33.5±0.8ms                              |    0.54 | converters.ConverterBenchmarks.time_dag_to_circuit(14, 2048)                                                    |
| -        | 439±20μs                                   | 247±7μs                                 |    0.56 | converters.ConverterBenchmarks.time_dag_to_circuit(14, 8)                                                       |
| -        | 802±20μs                                   | 450±20μs                                |    0.56 | converters.ConverterBenchmarks.time_dag_to_circuit(2, 128)                                                      |
| -        | 11.3±0.4ms                                 | 5.88±0.08ms                             |    0.52 | converters.ConverterBenchmarks.time_dag_to_circuit(2, 2048)                                                     |
| -        | 86.2±3μs                                   | 57.7±1μs                                |    0.67 | converters.ConverterBenchmarks.time_dag_to_circuit(2, 8)                                                        |
| -        | 47.3±1ms                                   | 23.5±0.7ms                              |    0.5  | converters.ConverterBenchmarks.time_dag_to_circuit(2, 8192)                                                     |
| -        | 5.52±0.1ms                                 | 2.99±0.06ms                             |    0.54 | converters.ConverterBenchmarks.time_dag_to_circuit(20, 128)                                                     |
| -        | 557±6μs                                    | 323±3μs                                 |    0.58 | converters.ConverterBenchmarks.time_dag_to_circuit(20, 8)                                                       |
| -        | 8.64±0.2ms                                 | 4.81±0.2ms                              |    0.56 | converters.ConverterBenchmarks.time_dag_to_circuit(32, 128)                                                     |
| -        | 959±40μs                                   | 539±20μs                                |    0.56 | converters.ConverterBenchmarks.time_dag_to_circuit(32, 8)                                                       |
| -        | 1.51±0.06ms                                | 799±40μs                                |    0.53 | converters.ConverterBenchmarks.time_dag_to_circuit(5, 128)                                                      |
| -        | 22.6±0.5ms                                 | 12.2±0.2ms                              |    0.54 | converters.ConverterBenchmarks.time_dag_to_circuit(5, 2048)                                                     |
| -        | 167±0.8μs                                  | 107±6μs                                 |    0.64 | converters.ConverterBenchmarks.time_dag_to_circuit(5, 8)                                                        |
| -        | 92.8±2ms                                   | 50.6±1ms                                |    0.55 | converters.ConverterBenchmarks.time_dag_to_circuit(5, 8192)                                                     |
| -        | 15.0±0.3ms                                 | 8.46±0.2ms                              |    0.56 | converters.ConverterBenchmarks.time_dag_to_circuit(53, 128)                                                     |
| -        | 1.55±0.01ms                                | 873±10μs                                |    0.56 | converters.ConverterBenchmarks.time_dag_to_circuit(53, 8)                                                       |
| -        | 2.31±0.07ms                                | 1.35±0.05ms                             |    0.59 | converters.ConverterBenchmarks.time_dag_to_circuit(8, 128)                                                      |
| -        | 35.1±2ms                                   | 18.8±0.2ms                              |    0.54 | converters.ConverterBenchmarks.time_dag_to_circuit(8, 2048)                                                     |
| -        | 267±3μs                                    | 153±1μs                                 |    0.57 | converters.ConverterBenchmarks.time_dag_to_circuit(8, 8)                                                        |
| -        | 142±8ms                                    | 79.4±4ms                                |    0.56 | converters.ConverterBenchmarks.time_dag_to_circuit(8, 8192)                                                     |
| -        | 138±2ms                                    | 71.8±1ms                                |    0.52 | mapping_passes.PassBenchmarks.time_apply_layout(14, 1024)                                                       |
| -        | 204±3ms                                    | 109±0.7ms                               |    0.53 | mapping_passes.PassBenchmarks.time_apply_layout(20, 1024)                                                       |
| -        | 49.7±2ms                                   | 25.2±0.2ms                              |    0.51 | mapping_passes.PassBenchmarks.time_apply_layout(5, 1024)                                                        |
| -        | 3.80±0.04s                                 | 3.42±0.01s                              |    0.9  | mapping_passes.PassBenchmarks.time_basic_swap(14, 1024)                                                         |
| -        | 22.4±0.5ms                                 | 2.04±0.03ms                             |    0.09 | mapping_passes.PassBenchmarks.time_check_map(14, 1024)                                                          |
| -        | 30.8±0.2ms                                 | 3.01±0.1ms                              |    0.1  | mapping_passes.PassBenchmarks.time_check_map(20, 1024)                                                          |
| -        | 8.23±0.3ms                                 | 774±20μs                                |    0.09 | mapping_passes.PassBenchmarks.time_check_map(5, 1024)                                                           |
| -        | 42.6±1ms                                   | 23.1±0.4ms                              |    0.54 | mapping_passes.PassBenchmarks.time_csp_layout(14, 1024)                                                         |
| -        | 66.7±0.7ms                                 | 40.0±0.5ms                              |    0.6  | mapping_passes.PassBenchmarks.time_csp_layout(20, 1024)                                                         |
| -        | 13.3±0.2ms                                 | 5.86±0.4ms                              |    0.44 | mapping_passes.PassBenchmarks.time_csp_layout(5, 1024)                                                          |
| -        | 24.4±0.4ms                                 | 5.15±0.1ms                              |    0.21 | mapping_passes.PassBenchmarks.time_layout_2q_distance(14, 1024)                                                 |
| -        | 35.5±1ms                                   | 7.63±0.2ms                              |    0.22 | mapping_passes.PassBenchmarks.time_layout_2q_distance(20, 1024)                                                 |
| -        | 9.28±0.2ms                                 | 1.81±0.05ms                             |    0.19 | mapping_passes.PassBenchmarks.time_layout_2q_distance(5, 1024)                                                  |
| -        | 582±10ms                                   | 343±10ms                                |    0.59 | mapping_passes.PassBenchmarks.time_sabre_layout(14, 1024)                                                       |
| -        | 943±10ms                                   | 573±6ms                                 |    0.61 | mapping_passes.PassBenchmarks.time_sabre_layout(20, 1024)                                                       |
| -        | 169±2ms                                    | 83.2±3ms                                |    0.49 | mapping_passes.PassBenchmarks.time_sabre_layout(5, 1024)                                                        |
| -        | 374±2ms                                    | 160±4ms                                 |    0.43 | mapping_passes.PassBenchmarks.time_sabre_swap(14, 1024)                                                         |
| -        | 584±10ms                                   | 266±5ms                                 |    0.45 | mapping_passes.PassBenchmarks.time_sabre_swap(20, 1024)                                                         |
| -        | 127±3ms                                    | 47.5±1ms                                |    0.37 | mapping_passes.PassBenchmarks.time_sabre_swap(5, 1024)                                                          |
| -        | 2.65±0s                                    | 2.29±0.03s                              |    0.87 | mapping_passes.PassBenchmarks.time_stochastic_swap(14, 1024)                                                    |
| -        | 3.85±0.03s                                 | 3.31±0.01s                              |    0.86 | mapping_passes.PassBenchmarks.time_stochastic_swap(20, 1024)                                                    |
| -        | 782±10ms                                   | 670±3ms                                 |    0.86 | mapping_passes.PassBenchmarks.time_stochastic_swap(5, 1024)                                                     |
| -        | 57.4±2ms                                   | 27.4±0.5ms                              |    0.48 | mapping_passes.RoutedPassBenchmarks.time_check_gate_direction(14, 1024)                                         |
| -        | 95.9±0.4ms                                 | 56.2±0.6ms                              |    0.59 | mapping_passes.RoutedPassBenchmarks.time_check_gate_direction(20, 1024)                                         |
| -        | 14.5±0.1ms                                 | 5.27±0.09ms                             |    0.36 | mapping_passes.RoutedPassBenchmarks.time_check_gate_direction(5, 1024)                                          |
| -        | 77.1±2ms                                   | 19.6±0.2ms                              |    0.25 | mapping_passes.RoutedPassBenchmarks.time_check_map(14, 1024)                                                    |
| -        | 132±5ms                                    | 35.3±1ms                                |    0.27 | mapping_passes.RoutedPassBenchmarks.time_check_map(20, 1024)                                                    |
| -        | 22.9±0.2ms                                 | 4.48±0.06ms                             |    0.2  | mapping_passes.RoutedPassBenchmarks.time_check_map(5, 1024)                                                     |
| -        | 82.1±3ms                                   | 19.9±0.2ms                              |    0.24 | mapping_passes.RoutedPassBenchmarks.time_gate_direction(14, 1024)                                               |
| -        | 125±0.4ms                                  | 34.2±0.3ms                              |    0.27 | mapping_passes.RoutedPassBenchmarks.time_gate_direction(20, 1024)                                               |
| -        | 23.2±0.2ms                                 | 4.83±0.05ms                             |    0.21 | mapping_passes.RoutedPassBenchmarks.time_gate_direction(5, 1024)                                                |
| -        | 107±3ms                                    | 54.2±2ms                                |    0.51 | passes.Collect2QPassBenchmarks.time_consolidate_blocks(14, 1024)                                                |
| -        | 154±5ms                                    | 76.7±2ms                                |    0.5  | passes.Collect2QPassBenchmarks.time_consolidate_blocks(20, 1024)                                                |
| -        | 42.7±0.2ms                                 | 20.3±0.1ms                              |    0.47 | passes.Collect2QPassBenchmarks.time_consolidate_blocks(5, 1024)                                                 |
| -        | 133±3ms                                    | 61.5±2ms                                |    0.46 | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(14, 1024, 1)                                         |
| -        | 137±1ms                                    | 66.9±2ms                                |    0.49 | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(14, 1024, 2)                                         |
| -        | 144±5ms                                    | 66.6±0.4ms                              |    0.46 | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(14, 1024, 3)                                         |
| -        | 138±2ms                                    | 66.8±1ms                                |    0.48 | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(14, 1024, 4)                                         |
| -        | 139±1ms                                    | 64.7±0.5ms                              |    0.47 | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(14, 1024, 5)                                         |
| -        | 194±2ms                                    | 88.5±1ms                                |    0.46 | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(20, 1024, 1)                                         |
| -        | 195±1ms                                    | 92.5±0.9ms                              |    0.47 | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(20, 1024, 2)                                         |
| -        | 204±4ms                                    | 101±6ms                                 |    0.5  | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(20, 1024, 3)                                         |
| -        | 199±6ms                                    | 94.4±2ms                                |    0.47 | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(20, 1024, 4)                                         |
| -        | 212±8ms                                    | 93.6±2ms                                |    0.44 | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(20, 1024, 5)                                         |
| -        | 57.5±4ms                                   | 22.4±0.2ms                              |    0.39 | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(5, 1024, 1)                                          |
| -        | 52.5±1ms                                   | 23.3±0.2ms                              |    0.44 | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(5, 1024, 2)                                          |
| -        | 54.1±0.6ms                                 | 23.6±0.2ms                              |    0.44 | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(5, 1024, 3)                                          |
| -        | 51.8±0.4ms                                 | 22.9±0.2ms                              |    0.44 | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(5, 1024, 4)                                          |
| -        | 49.8±0.6ms                                 | 21.3±0.4ms                              |    0.43 | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(5, 1024, 5)                                          |
| -        | 3.11±0.04s                                 | 1.25±0.01s                              |    0.4  | passes.MultipleBasisPassBenchmarks.time_basis_translator(14, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])        |
| -        | 1.81±0.04s                                 | 856±9ms                                 |    0.47 | passes.MultipleBasisPassBenchmarks.time_basis_translator(14, 1024, ['rz', 'x', 'sx', 'cx', 'id'])               |
| -        | 1.51±0.02s                                 | 725±10ms                                |    0.48 | passes.MultipleBasisPassBenchmarks.time_basis_translator(14, 1024, ['u', 'cx', 'id'])                           |
| -        | 4.39±0.02s                                 | 1.77±0.03s                              |    0.4  | passes.MultipleBasisPassBenchmarks.time_basis_translator(20, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])        |
| -        | 2.53±0.03s                                 | 1.18±0.02s                              |    0.47 | passes.MultipleBasisPassBenchmarks.time_basis_translator(20, 1024, ['rz', 'x', 'sx', 'cx', 'id'])               |
| -        | 2.22±0.06s                                 | 1.03±0.01s                              |    0.46 | passes.MultipleBasisPassBenchmarks.time_basis_translator(20, 1024, ['u', 'cx', 'id'])                           |
| -        | 1.02±0.03s                                 | 430±4ms                                 |    0.42 | passes.MultipleBasisPassBenchmarks.time_basis_translator(5, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])         |
| -        | 650±5ms                                    | 328±10ms                                |    0.5  | passes.MultipleBasisPassBenchmarks.time_basis_translator(5, 1024, ['rz', 'x', 'sx', 'cx', 'id'])                |
| -        | 548±10ms                                   | 267±2ms                                 |    0.49 | passes.MultipleBasisPassBenchmarks.time_basis_translator(5, 1024, ['u', 'cx', 'id'])                            |
| -        | 885±10ms                                   | 798±6ms                                 |    0.9  | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(14, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id']) |
| -        | 864±9ms                                    | 779±5ms                                 |    0.9  | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(14, 1024, ['u', 'cx', 'id'])                    |
| -        | 1.39±0.02s                                 | 1.25±0.02s                              |    0.89 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(20, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id']) |
| -        | 1.53±0.03s                                 | 1.35±0.02s                              |    0.88 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(20, 1024, ['rz', 'x', 'sx', 'cx', 'id'])        |
| -        | 1.38±0.02s                                 | 1.25±0.03s                              |    0.9  | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(20, 1024, ['u', 'cx', 'id'])                    |
| -        | 315±4ms                                    | 273±3ms                                 |    0.87 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(5, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])  |
| -        | 372±10ms                                   | 301±10ms                                |    0.81 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(5, 1024, ['rz', 'x', 'sx', 'cx', 'id'])         |
| -        | 39.8±0.9ms                                 | 27.3±0.4ms                              |    0.68 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(14, 1024, ['u', 'cx', 'id'])                      |
| -        | 17.4±0.3ms                                 | 12.2±0.1ms                              |    0.7  | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(5, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])    |
| -        | 20.7±0.8ms                                 | 10.6±0.06ms                             |    0.51 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(5, 1024, ['rz', 'x', 'sx', 'cx', 'id'])           |
| -        | 10.1±0.3ms                                 | 7.84±0.2ms                              |    0.78 | passes.PassBenchmarks.time_barrier_before_final_measurements(5, 1024)                                           |
| -        | 48.4±1ms                                   | 27.0±0.2ms                              |    0.56 | passes.PassBenchmarks.time_collect_2q_blocks(14, 1024)                                                          |
| -        | 69.3±0.3ms                                 | 40.8±0.7ms                              |    0.59 | passes.PassBenchmarks.time_collect_2q_blocks(20, 1024)                                                          |
| -        | 20.9±0.3ms                                 | 6.57±0.4ms                              |    0.31 | passes.PassBenchmarks.time_collect_2q_blocks(5, 1024)                                                           |
| -        | 3.83±0.1ms                                 | 1.89±0.1ms                              |    0.49 | passes.PassBenchmarks.time_count_ops_longest_path(14, 1024)                                                     |
| -        | 5.11±0.3ms                                 | 2.78±0.09ms                             |    0.54 | passes.PassBenchmarks.time_count_ops_longest_path(20, 1024)                                                     |
| -        | 2.78±0.06ms                                | 933±40μs                                |    0.34 | passes.PassBenchmarks.time_count_ops_longest_path(5, 1024)                                                      |
| -        | 18.8±0.7ms                                 | 6.48±0.2ms                              |    0.34 | passes.PassBenchmarks.time_cx_cancellation(14, 1024)                                                            |
| -        | 26.9±0.5ms                                 | 11.5±0.6ms                              |    0.43 | passes.PassBenchmarks.time_cx_cancellation(20, 1024)                                                            |
| -        | 7.34±0.1ms                                 | 1.79±0.06ms                             |    0.24 | passes.PassBenchmarks.time_cx_cancellation(5, 1024)                                                             |
| -        | 1.56±0.02s                                 | 1.04±0.03s                              |    0.67 | passes.PassBenchmarks.time_decompose_pass(14, 1024)                                                             |
| -        | 2.39±0.04s                                 | 1.62±0.02s                              |    0.68 | passes.PassBenchmarks.time_decompose_pass(20, 1024)                                                             |
| -        | 533±10ms                                   | 335±10ms                                |    0.63 | passes.PassBenchmarks.time_decompose_pass(5, 1024)                                                              |
| -        | 2.46±0.05s                                 | 611±7ms                                 |    0.25 | passes.PassBenchmarks.time_optimize_swap_before_measure(14, 1024)                                               |
| -        | 4.48±0.09s                                 | 1.10±0.02s                              |    0.24 | passes.PassBenchmarks.time_optimize_swap_before_measure(20, 1024)                                               |
| -        | 359±10ms                                   | 90.2±2ms                                |    0.25 | passes.PassBenchmarks.time_optimize_swap_before_measure(5, 1024)                                                |
| -        | 30.6±0.7ms                                 | 8.07±0.07ms                             |    0.26 | passes.PassBenchmarks.time_remove_barriers(14, 1024)                                                            |
| -        | 44.6±0.3ms                                 | 15.2±0.9ms                              |    0.34 | passes.PassBenchmarks.time_remove_barriers(20, 1024)                                                            |
| -        | 12.8±0.5ms                                 | 2.28±0.02ms                             |    0.18 | passes.PassBenchmarks.time_remove_barriers(5, 1024)                                                             |
| -        | 31.6±0.3ms                                 | 8.74±0.2ms                              |    0.28 | passes.PassBenchmarks.time_remove_diagonal_gates_before_measurement(14, 1024)                                   |
| -        | 47.7±3ms                                   | 15.8±0.5ms                              |    0.33 | passes.PassBenchmarks.time_remove_diagonal_gates_before_measurement(20, 1024)                                   |
| -        | 13.8±0.4ms                                 | 2.74±0.06ms                             |    0.2  | passes.PassBenchmarks.time_remove_diagonal_gates_before_measurement(5, 1024)                                    |
| -        | 56.7±2μs                                   | 28.1±0.3μs                              |    0.5  | passes.PassBenchmarks.time_remove_final_measurements(14, 1024)                                                  |
| -        | 107±5μs                                    | 34.1±1μs                                |    0.32 | passes.PassBenchmarks.time_remove_final_measurements(20, 1024)                                                  |
| -        | 26.8±0.9μs                                 | 17.9±0.6μs                              |    0.67 | passes.PassBenchmarks.time_remove_final_measurements(5, 1024)                                                   |
| -        | 16.4±0.3ms                                 | 5.06±0.2ms                              |    0.31 | passes.PassBenchmarks.time_remove_reset_in_zero_state(14, 1024)                                                 |
| -        | 23.7±1ms                                   | 9.55±0.1ms                              |    0.4  | passes.PassBenchmarks.time_remove_reset_in_zero_state(20, 1024)                                                 |
| -        | 6.85±0.2ms                                 | 1.43±0.01ms                             |    0.21 | passes.PassBenchmarks.time_remove_reset_in_zero_state(5, 1024)                                                  |
| -        | 1.38±0.02s                                 | 928±10ms                                |    0.67 | passes.PassBenchmarks.time_unroll_3q_or_more(14, 1024)                                                          |
| -        | 2.14±0.01s                                 | 1.43±0.01s                              |    0.67 | passes.PassBenchmarks.time_unroll_3q_or_more(20, 1024)                                                          |
| -        | 375±5ms                                    | 249±4ms                                 |    0.66 | passes.PassBenchmarks.time_unroll_3q_or_more(5, 1024)                                                           |
| -        | 1.26±0m                                    | 42.4±0.2s                               |    0.56 | qft.LargeQFTMappingTimeBench.time_sabre_swap(1081, 'decay')                                                     |
| -        | 729±10ms                                   | 209±2ms                                 |    0.29 | qft.LargeQFTMappingTimeBench.time_sabre_swap(115, 'decay')                                                      |
| -        | 731±8ms                                    | 214±4ms                                 |    0.29 | qft.LargeQFTMappingTimeBench.time_sabre_swap(115, 'lookahead')                                                  |
| -        | 117±2ms                                    | 50.6±1ms                                |    0.43 | qft.QftTranspileBench.time_ibmq_backend_transpile(13)                                                           |
| -        | 148±3ms                                    | 63.5±0.9ms                              |    0.43 | qft.QftTranspileBench.time_ibmq_backend_transpile(14)                                                           |
| -        | 15.0±0.08ms                                | 10.5±0.1ms                              |    0.7  | qft.QftTranspileBench.time_ibmq_backend_transpile(3)                                                            |
| -        | 25.1±1ms                                   | 14.1±0.2ms                              |    0.56 | qft.QftTranspileBench.time_ibmq_backend_transpile(5)                                                            |
| -        | 48.5±0.9ms                                 | 23.7±0.4ms                              |    0.49 | qft.QftTranspileBench.time_ibmq_backend_transpile(8)                                                            |
| -        | 949±20ms                                   | 780±10ms                                |    0.82 | quantum_info.CliffordDecomposeBench.time_decompose('2,500')                                                     |
| -        | 2.75±0.05s                                 | 2.23±0.01s                              |    0.81 | quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(1081, 10, 'lookahead')                        |
| -        | 47.6±0.5ms                                 | 35.1±1ms                                |    0.74 | quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 10, 'decay')                             |
| -        | 49.8±1ms                                   | 35.4±2ms                                |    0.71 | quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 10, 'lookahead')                         |
| -        | 457±10ms                                   | 312±4ms                                 |    0.68 | quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 100, 'decay')                            |
| -        | 444±10ms                                   | 301±4ms                                 |    0.68 | quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 100, 'lookahead')                        |
| -        | 581±3ms                                    | 431±6ms                                 |    0.74 | quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(409, 10, 'decay')                             |
| -        | 1.05±0.01s                                 | 828±10ms                                |    0.79 | quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(409, 10, 'lookahead')                         |
| -        | 320±10ms                                   | 129±5ms                                 |    0.4  | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(14, 'synthesis')                              |
| -        | 282±7ms                                    | 90.9±0.7ms                              |    0.32 | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(14, 'translator')                             |
| -        | 10.2±0.2ms                                 | 6.86±0.1ms                              |    0.67 | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(2, 'translator')                              |
| -        | 777±10ms                                   | 295±10ms                                |    0.38 | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(20, 'synthesis')                              |
| -        | 574±10ms                                   | 187±6ms                                 |    0.33 | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(20, 'translator')                             |
| -        | 1.57±0.01s                                 | 584±20ms                                |    0.37 | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(27, 'synthesis')                              |
| -        | 1.05±0.02s                                 | 330±4ms                                 |    0.31 | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(27, 'translator')                             |
| -        | 14.0±0.1ms                                 | 9.06±0.8ms                              |    0.65 | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(3, 'translator')                              |
| -        | 30.0±2ms                                   | 15.9±1ms                                |    0.53 | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(5, 'synthesis')                               |
| -        | 32.4±0.8ms                                 | 15.8±1ms                                |    0.49 | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(5, 'translator')                              |
| -        | 90.0±7ms                                   | 38.3±0.6ms                              |    0.43 | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(8, 'synthesis')                               |
| -        | 89.7±2ms                                   | 32.5±0.7ms                              |    0.36 | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(8, 'translator')                              |
| -        | 37.6±1ms                                   | 24.0±0.6ms                              |    0.64 | queko.QUEKOTranspilerBench.time_transpile_bigd(0, 'sabre')                                                      |
| -        | 63.7±0.6ms                                 | 44.7±0.7ms                              |    0.7  | queko.QUEKOTranspilerBench.time_transpile_bigd(0, None)                                                         |
| -        | 19.3±1ms                                   | 11.7±0.3ms                              |    0.61 | queko.QUEKOTranspilerBench.time_transpile_bigd(1, 'sabre')                                                      |
| -        | 42.4±0.9ms                                 | 32.6±0.3ms                              |    0.77 | queko.QUEKOTranspilerBench.time_transpile_bigd(2, 'sabre')                                                      |
| -        | 33.2±1ms                                   | 26.0±0.3ms                              |    0.78 | queko.QUEKOTranspilerBench.time_transpile_bigd(2, None)                                                         |
| -        | 64.4±1ms                                   | 41.3±0.7ms                              |    0.64 | queko.QUEKOTranspilerBench.time_transpile_bigd(3, 'sabre')                                                      |
| -        | 363±7ms                                    | 192±4ms                                 |    0.53 | queko.QUEKOTranspilerBench.time_transpile_bntf(0, 'sabre')                                                      |
| -        | 822±20ms                                   | 455±5ms                                 |    0.55 | queko.QUEKOTranspilerBench.time_transpile_bntf(0, None)                                                         |
| -        | 238±5ms                                    | 102±2ms                                 |    0.43 | queko.QUEKOTranspilerBench.time_transpile_bntf(1, 'sabre')                                                      |
| -        | 58.8±0.8ms                                 | 30.0±0.3ms                              |    0.51 | queko.QUEKOTranspilerBench.time_transpile_bntf(1, None)                                                         |
| -        | 564±7ms                                    | 388±4ms                                 |    0.69 | queko.QUEKOTranspilerBench.time_transpile_bntf(2, 'sabre')                                                      |
| -        | 134±2ms                                    | 85.9±2ms                                |    0.64 | queko.QUEKOTranspilerBench.time_transpile_bntf(2, None)                                                         |
| -        | 810±10ms                                   | 486±5ms                                 |    0.6  | queko.QUEKOTranspilerBench.time_transpile_bntf(3, 'sabre')                                                      |
| -        | 181±4ms                                    | 113±3ms                                 |    0.62 | queko.QUEKOTranspilerBench.time_transpile_bntf(3, None)                                                         |
| -        | 331±3ms                                    | 177±2ms                                 |    0.53 | queko.QUEKOTranspilerBench.time_transpile_bss(0, 'sabre')                                                       |
| -        | 2.30±0.02s                                 | 1.93±0.01s                              |    0.84 | queko.QUEKOTranspilerBench.time_transpile_bss(0, None)                                                          |
| -        | 281±5ms                                    | 138±1ms                                 |    0.49 | queko.QUEKOTranspilerBench.time_transpile_bss(1, 'sabre')                                                       |
| -        | 169±3ms                                    | 82.0±4ms                                |    0.49 | queko.QUEKOTranspilerBench.time_transpile_bss(1, None)                                                          |
| -        | 640±6ms                                    | 482±6ms                                 |    0.75 | queko.QUEKOTranspilerBench.time_transpile_bss(2, 'sabre')                                                       |
| -        | 364±10ms                                   | 245±3ms                                 |    0.67 | queko.QUEKOTranspilerBench.time_transpile_bss(2, None)                                                          |
| -        | 1.05±0.01s                                 | 725±6ms                                 |    0.69 | queko.QUEKOTranspilerBench.time_transpile_bss(3, 'sabre')                                                       |
| -        | 479±6ms                                    | 318±10ms                                |    0.66 | queko.QUEKOTranspilerBench.time_transpile_bss(3, None)                                                          |
| -        | 94.5±4ms                                   | 41.1±0.7ms                              |    0.44 | random_circuit_hex.BenchRandomCircuitHex.time_ibmq_backend_transpile(10)                                        |
| -        | 121±2ms                                    | 57.4±3ms                                |    0.47 | random_circuit_hex.BenchRandomCircuitHex.time_ibmq_backend_transpile(12)                                        |
| -        | 212±9ms                                    | 85.2±2ms                                |    0.4  | random_circuit_hex.BenchRandomCircuitHex.time_ibmq_backend_transpile(14)                                        |
| -        | 20.6±0.8ms                                 | 12.1±0.2ms                              |    0.59 | random_circuit_hex.BenchRandomCircuitHex.time_ibmq_backend_transpile(4)                                         |
| -        | 36.5±1ms                                   | 20.1±1ms                                |    0.55 | random_circuit_hex.BenchRandomCircuitHex.time_ibmq_backend_transpile(6)                                         |
| -        | 58.8±1ms                                   | 31.8±1ms                                |    0.54 | random_circuit_hex.BenchRandomCircuitHex.time_ibmq_backend_transpile(8)                                         |
| -        | 23.4±0.05s                                 | 18.3±0s                                 |    0.78 | randomized_benchmarking.RandomizedBenchmarkingBenchmark.time_ibmq_backend_transpile_single_thread([0, 1])       |
| -        | 1.06±0.05ms                                | 884±8μs                                 |    0.84 | ripple_adder.RippleAdderConstruction.time_build_ripple_adder(10)                                                |
| -        | 9.04±0.1ms                                 | 8.11±0.1ms                              |    0.9  | ripple_adder.RippleAdderConstruction.time_build_ripple_adder(100)                                               |
| -        | 4.69±0.06ms                                | 4.08±0.08ms                             |    0.87 | ripple_adder.RippleAdderConstruction.time_build_ripple_adder(50)                                                |
| -        | 239±3ms                                    | 183±1ms                                 |    0.77 | ripple_adder.RippleAdderTranspile.time_transpile_square_grid_ripple_adder(10, 0)                                |
| -        | 107±2ms                                    | 41.1±0.6ms                              |    0.39 | ripple_adder.RippleAdderTranspile.time_transpile_square_grid_ripple_adder(10, 1)                                |
| -        | 198±2ms                                    | 124±3ms                                 |    0.63 | ripple_adder.RippleAdderTranspile.time_transpile_square_grid_ripple_adder(10, 2)                                |
| -        | 243±2ms                                    | 149±3ms                                 |    0.61 | ripple_adder.RippleAdderTranspile.time_transpile_square_grid_ripple_adder(10, 3)                                |
| -        | 675±8ms                                    | 548±7ms                                 |    0.81 | ripple_adder.RippleAdderTranspile.time_transpile_square_grid_ripple_adder(20, 0)                                |
| -        | 216±2ms                                    | 78.1±1ms                                |    0.36 | ripple_adder.RippleAdderTranspile.time_transpile_square_grid_ripple_adder(20, 1)                                |
| -        | 418±8ms                                    | 258±2ms                                 |    0.62 | ripple_adder.RippleAdderTranspile.time_transpile_square_grid_ripple_adder(20, 2)                                |
| -        | 515±20ms                                   | 309±3ms                                 |    0.6  | ripple_adder.RippleAdderTranspile.time_transpile_square_grid_ripple_adder(20, 3)                                |
| -        | 1.93±0.03s                                 | 1.27±0.03s                              |    0.66 | scheduling_passes.SchedulingPassBenchmarks.time_alap_schedule_pass(10, 1000)                                    |
| -        | 944±20ms                                   | 632±20ms                                |    0.67 | scheduling_passes.SchedulingPassBenchmarks.time_alap_schedule_pass(10, 500)                                     |
| -        | 774±10ms                                   | 497±10ms                                |    0.64 | scheduling_passes.SchedulingPassBenchmarks.time_alap_schedule_pass(5, 1000)                                     |
| -        | 367±4ms                                    | 244±9ms                                 |    0.66 | scheduling_passes.SchedulingPassBenchmarks.time_alap_schedule_pass(5, 500)                                      |
| -        | 1.90±0.01s                                 | 1.28±0.02s                              |    0.67 | scheduling_passes.SchedulingPassBenchmarks.time_asap_schedule_pass(10, 1000)                                    |
| -        | 986±20ms                                   | 639±10ms                                |    0.65 | scheduling_passes.SchedulingPassBenchmarks.time_asap_schedule_pass(10, 500)                                     |
| -        | 793±30ms                                   | 499±7ms                                 |    0.63 | scheduling_passes.SchedulingPassBenchmarks.time_asap_schedule_pass(5, 1000)                                     |
| -        | 384±20ms                                   | 239±2ms                                 |    0.62 | scheduling_passes.SchedulingPassBenchmarks.time_asap_schedule_pass(5, 500)                                      |
| -        | 185±10ms                                   | 149±5ms                                 |    0.8  | scheduling_passes.SchedulingPassBenchmarks.time_time_unit_conversion_pass(10, 500)                              |
| -        | 155±3ms                                    | 121±0.5ms                               |    0.78 | scheduling_passes.SchedulingPassBenchmarks.time_time_unit_conversion_pass(5, 1000)                              |
| -        | 76.4±3ms                                   | 59.4±1ms                                |    0.78 | scheduling_passes.SchedulingPassBenchmarks.time_time_unit_conversion_pass(5, 500)                               |
| -        | 986±20ms                                   | 503±7ms                                 |    0.51 | transpiler_benchmarks.TranspilerBenchSuite.time_compile_from_large_qasm                                         |
| -        | 1.67±0.01s                                 | 1.21±0.01s                              |    0.73 | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(0)                            |
| -        | 1.07±0.01s                                 | 386±6ms                                 |    0.36 | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(1)                            |
| -        | 2.97±0.05s                                 | 2.10±0.03s                              |    0.71 | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(2)                            |
| -        | 3.77±0.1s                                  | 2.47±0.03s                              |    0.66 | transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(3)                            |
| -        | 244±4ms                                    | 149±1ms                                 |    0.61 | transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(0)                                         |
| -        | 242±6ms                                    | 107±3ms                                 |    0.44 | transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(1)                                         |
| -        | 242±5ms                                    | 83.9±3ms                                |    0.35 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(0)                                   |
| -        | 366±3ms                                    | 111±0.8ms                               |    0.3  | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(1)                                   |
| -        | 477±10ms                                   | 172±4ms                                 |    0.36 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(2)                                   |
| -        | 484±6ms                                    | 174±2ms                                 |    0.36 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(3)                                   |
| -        | 245±4ms                                    | 89.4±1ms                                |    0.36 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(0)                 |
| -        | 416±20ms                                   | 145±0.8ms                               |    0.35 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(1)                 |
| -        | 461±3ms                                    | 184±3ms                                 |    0.4  | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(2)                 |
| -        | 495±8ms                                    | 190±2ms                                 |    0.38 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(3)                 |
| -        | 166±2ms                                    | 109±0.4ms                               |    0.66 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(0)                                        |
| -        | 210±3ms                                    | 94.3±3ms                                |    0.45 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(1)                                        |
| -        | 473±7ms                                    | 326±8ms                                 |    0.69 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(2)                                        |
| -        | 547±10ms                                   | 358±3ms                                 |    0.65 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(3)                                        |
| -        | 67.9±2ms                                   | 49.7±2ms                                |    0.73 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(0, 'sabre', 'dense')           |
| -        | 63.9±2ms                                   | 47.6±1ms                                |    0.75 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(0, 'sabre', 'sabre')           |
| -        | 126±2ms                                    | 106±1ms                                 |    0.84 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(0, 'stochastic', 'dense')      |
| -        | 122±1ms                                    | 106±0.8ms                               |    0.87 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(0, 'stochastic', 'sabre')      |
| -        | 79.6±2ms                                   | 56.9±2ms                                |    0.71 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(1, 'sabre', 'dense')           |
| -        | 73.7±3ms                                   | 55.5±0.7ms                              |    0.75 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(1, 'sabre', 'sabre')           |
| -        | 143±3ms                                    | 114±3ms                                 |    0.8  | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(1, 'stochastic', 'dense')      |
| -        | 126±1ms                                    | 103±1ms                                 |    0.82 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(1, 'stochastic', 'sabre')      |
| -        | 124±3ms                                    | 93.6±3ms                                |    0.75 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(2, 'sabre', 'dense')           |
| -        | 144±2ms                                    | 112±1ms                                 |    0.78 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(2, 'sabre', 'sabre')           |
| -        | 240±3ms                                    | 189±3ms                                 |    0.79 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(2, 'stochastic', 'dense')      |
| -        | 214±8ms                                    | 169±1ms                                 |    0.79 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(2, 'stochastic', 'sabre')      |
| -        | 160±2ms                                    | 113±2ms                                 |    0.71 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(3, 'sabre', 'dense')           |
| -        | 199±3ms                                    | 144±3ms                                 |    0.72 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(3, 'sabre', 'sabre')           |
| -        | 313±5ms                                    | 232±1ms                                 |    0.74 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(3, 'stochastic', 'dense')      |
| -        | 248±2ms                                    | 198±1ms                                 |    0.8  | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(3, 'stochastic', 'sabre')      |
| -        | 104±1ms                                    | 70.4±1ms                                |    0.68 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_180(0, 'sabre', 'dense')           |
| -        | 99.8±0.7ms                                 | 69.1±0.7ms                              |    0.69 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_180(0, 'sabre', 'sabre')           |
| -        | 274±2ms                                    | 231±4ms                                 |    0.84 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_180(0, 'stochastic', 'dense')      |
| -        | 285±2ms                                    | 242±5ms                                 |    0.85 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_180(0, 'stochastic', 'sabre')      |
| -        | 141±2ms                                    | 84.7±2ms                                |    0.6  | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_180(1, 'sabre', 'dense')           |
| -        | 138±2ms                                    | 84.6±1ms                                |    0.61 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_180(1, 'sabre', 'sabre')           |
| -        | 329±3ms                                    | 246±6ms                                 |    0.75 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_180(1, 'stochastic', 'dense')      |
| -        | 326±4ms                                    | 258±2ms                                 |    0.79 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_180(1, 'stochastic', 'sabre')      |
| -        | 326±3ms                                    | 231±6ms                                 |    0.71 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_180(2, 'sabre', 'dense')           |
| -        | 354±10ms                                   | 262±9ms                                 |    0.74 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_180(2, 'sabre', 'sabre')           |
| -        | 555±20ms                                   | 415±5ms                                 |    0.75 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_180(2, 'stochastic', 'dense')      |
| -        | 585±4ms                                    | 467±10ms                                |    0.8  | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_180(2, 'stochastic', 'sabre')      |
| -        | 479±10ms                                   | 305±4ms                                 |    0.64 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_180(3, 'sabre', 'dense')           |
| -        | 495±4ms                                    | 336±6ms                                 |    0.68 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_180(3, 'sabre', 'sabre')           |
| -        | 676±10ms                                   | 511±10ms                                |    0.76 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_180(3, 'stochastic', 'dense')      |
| -        | 716±20ms                                   | 554±10ms                                |    0.77 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_180(3, 'stochastic', 'sabre')      |
| -        | 110±4ms                                    | 67.6±0.8ms                              |    0.61 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_qft_16(0, 'sabre', 'dense')               |
| -        | 102±3ms                                    | 64.9±3ms                                |    0.63 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_qft_16(0, 'sabre', 'sabre')               |
| -        | 238±3ms                                    | 185±4ms                                 |    0.78 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_qft_16(0, 'stochastic', 'dense')          |
| -        | 255±4ms                                    | 200±7ms                                 |    0.78 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_qft_16(0, 'stochastic', 'sabre')          |
| -        | 151±2ms                                    | 81.9±2ms                                |    0.54 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_qft_16(1, 'sabre', 'dense')               |
| -        | 142±2ms                                    | 79.8±4ms                                |    0.56 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_qft_16(1, 'sabre', 'sabre')               |
| -        | 312±10ms                                   | 204±2ms                                 |    0.65 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_qft_16(1, 'stochastic', 'dense')          |
| -        | 305±2ms                                    | 205±1ms                                 |    0.67 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_qft_16(1, 'stochastic', 'sabre')          |
| -        | 19.5±1ms                                   | 12.6±0.2ms                              |    0.65 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')                                                 |
| -        | 19.2±0.6ms                                 | 12.9±0.5ms                              |    0.67 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')                                                 |
| -        | 19.7±0.9ms                                 | 12.5±0.2ms                              |    0.63 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')                                                |
| -        | 221±4ms                                    | 144±5ms                                 |    0.65 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')                                                  |
| -        | 216±6ms                                    | 143±2ms                                 |    0.66 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')                                                  |
| -        | 221±3ms                                    | 142±6ms                                 |    0.64 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')                                                 |
| -        | 68.9±1ms                                   | 45.8±0.3ms                              |    0.67 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')                                    |
| -        | 67.5±0.6ms                                 | 46.2±2ms                                |    0.68 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')                                    |
| -        | 71.7±2ms                                   | 45.4±2ms                                |    0.63 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr')                                   |
| -        | 2.66±0.05s                                 | 1.83±0.02s                              |    0.69 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')                                                            |
| -        | 3.47±0.02s                                 | 1.83±0.02s                              |    0.53 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')                                               |
| -        | 4.80±0s                                    | 2.66±0.03s                              |    0.55 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')                                               |
| -        | 4.69±0.01s                                 | 2.49±0.04s                              |    0.53 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')                                              |

| Change   | Before [bb60891a] <repack-instruction~1>   | After [d9e31ed5] <repack-instruction>   |   Ratio | Benchmark (Parameter)                                           |
|----------|--------------------------------------------|-----------------------------------------|---------|-----------------------------------------------------------------|
| +        | 87.1±3ms                                   | 143±4ms                                 |    1.65 | assembler.AssemblerBenchmarks.time_assemble_circuit(8, 4096, 1) |

@jakelishman
Member Author

Benchmarking against 1.1.0, showing only the benchmarks that changed. Note that most utility-scale transpilations are not in this list - they're now within 10% of 1.1.0. assign_parameters is still much slower; we need both to rework Param handling in Rust a bit to avoid the nearly 50% padding factor, and probably to move the orchestration down to Rust to avoid allocations.
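
As a rough illustration of where a padding factor of that size can come from, here is a minimal sketch. It assumes the overhead is the usual enum discriminant plus alignment padding; `ParamSketch` and `padding_fraction` are hypothetical names invented for this example, not the actual Param definition in the Rust crate.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a parameter enum; NOT the real `Param` type.
// Both variants carry an 8-byte payload with no spare bit patterns, so the
// compiler has to store a separate discriminant next to the payload.
#[allow(dead_code)]
enum ParamSketch {
    Float(f64),
    Object(u64), // stand-in for a pointer-sized handle to a Python object
}

/// Fraction of `T`'s size that is not covered by `payload_bytes`.
fn padding_fraction<T>(payload_bytes: usize) -> f64 {
    let total = size_of::<T>();
    (total - payload_bytes) as f64 / total as f64
}

fn main() {
    let payload = size_of::<f64>();
    let total = size_of::<ParamSketch>();
    let overhead = padding_fraction::<ParamSketch>(payload);
    println!(
        "payload: {} bytes, enum: {} bytes, overhead: {:.0}%",
        payload,
        total,
        overhead * 100.0
    );
}
```

On a typical 64-bit target the enum should round up to two machine words for one word of payload, which is the kind of overhead the comment describes. The exact layout is up to the compiler, so treat the figure as indicative only.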

| Change   | Before [7d29dc1b] <1.1.0^0>   | After [d9e31ed5] <repack-instruction>   |   Ratio | Benchmark (Parameter)                                                                                           |
|----------|-------------------------------|-----------------------------------------|---------|-----------------------------------------------------------------------------------------------------------------|
| -        | 1.07±0.04ms                   | 714±90μs                                |    0.67 | circuit_construction.CircuitConstructionBench.time_circuit_construction(1, 128)                                 |
| -        | 1.04±0.01s                    | 683±30ms                                |    0.65 | circuit_construction.CircuitConstructionBench.time_circuit_construction(1, 131072)                              |
| -        | 17.1±0.8ms                    | 11.2±0.4ms                              |    0.65 | circuit_construction.CircuitConstructionBench.time_circuit_construction(1, 2048)                                |
| -        | 258±5ms                       | 172±6ms                                 |    0.67 | circuit_construction.CircuitConstructionBench.time_circuit_construction(1, 32768)                               |
| -        | 65.6±2ms                      | 42.3±2ms                                |    0.64 | circuit_construction.CircuitConstructionBench.time_circuit_construction(1, 8192)                                |
| -        | 1.22±0.01ms                   | 827±30μs                                |    0.68 | circuit_construction.CircuitConstructionBench.time_circuit_construction(14, 128)                                |
| -        | 1.15±0.01s                    | 713±20ms                                |    0.62 | circuit_construction.CircuitConstructionBench.time_circuit_construction(14, 131072)                             |
| -        | 17.6±0.3ms                    | 10.6±0.09ms                             |    0.6  | circuit_construction.CircuitConstructionBench.time_circuit_construction(14, 2048)                               |
| -        | 284±6ms                       | 183±4ms                                 |    0.64 | circuit_construction.CircuitConstructionBench.time_circuit_construction(14, 32768)                              |
| -        | 297±4μs                       | 210±10μs                                |    0.71 | circuit_construction.CircuitConstructionBench.time_circuit_construction(14, 8)                                  |
| -        | 70.1±6ms                      | 43.1±0.8ms                              |    0.61 | circuit_construction.CircuitConstructionBench.time_circuit_construction(14, 8192)                               |
| -        | 1.18±0.06ms                   | 720±10μs                                |    0.61 | circuit_construction.CircuitConstructionBench.time_circuit_construction(2, 128)                                 |
| -        | 1.11±0.01s                    | 701±5ms                                 |    0.63 | circuit_construction.CircuitConstructionBench.time_circuit_construction(2, 131072)                              |
| -        | 17.1±0.3ms                    | 10.5±0.2ms                              |    0.61 | circuit_construction.CircuitConstructionBench.time_circuit_construction(2, 2048)                                |
| -        | 277±4ms                       | 171±2ms                                 |    0.62 | circuit_construction.CircuitConstructionBench.time_circuit_construction(2, 32768)                               |
| -        | 114±0.8μs                     | 78.1±2μs                                |    0.69 | circuit_construction.CircuitConstructionBench.time_circuit_construction(2, 8)                                   |
| -        | 76.3±4ms                      | 44.4±1ms                                |    0.58 | circuit_construction.CircuitConstructionBench.time_circuit_construction(2, 8192)                                |
| -        | 1.41±0.07ms                   | 929±70μs                                |    0.66 | circuit_construction.CircuitConstructionBench.time_circuit_construction(20, 128)                                |
| -        | 1.11±0.01s                    | 702±10ms                                |    0.63 | circuit_construction.CircuitConstructionBench.time_circuit_construction(20, 131072)                             |
| -        | 17.6±0.2ms                    | 10.9±0.2ms                              |    0.62 | circuit_construction.CircuitConstructionBench.time_circuit_construction(20, 2048)                               |
| -        | 282±10ms                      | 172±3ms                                 |    0.61 | circuit_construction.CircuitConstructionBench.time_circuit_construction(20, 32768)                              |
| -        | 415±30μs                      | 296±8μs                                 |    0.71 | circuit_construction.CircuitConstructionBench.time_circuit_construction(20, 8)                                  |
| -        | 69.5±1ms                      | 42.7±0.5ms                              |    0.61 | circuit_construction.CircuitConstructionBench.time_circuit_construction(20, 8192)                               |
| -        | 1.26±0.06ms                   | 792±100μs                               |    0.63 | circuit_construction.CircuitConstructionBench.time_circuit_construction(5, 128)                                 |
| -        | 1.11±0.02s                    | 693±10ms                                |    0.63 | circuit_construction.CircuitConstructionBench.time_circuit_construction(5, 131072)                              |
| -        | 17.9±0.8ms                    | 10.8±0.2ms                              |    0.6  | circuit_construction.CircuitConstructionBench.time_circuit_construction(5, 2048)                                |
| -        | 279±4ms                       | 170±3ms                                 |    0.61 | circuit_construction.CircuitConstructionBench.time_circuit_construction(5, 32768)                               |
| -        | 123±2μs                       | 84.0±4μs                                |    0.69 | circuit_construction.CircuitConstructionBench.time_circuit_construction(5, 8)                                   |
| -        | 70.5±3ms                      | 43.9±2ms                                |    0.62 | circuit_construction.CircuitConstructionBench.time_circuit_construction(5, 8192)                                |
| -        | 1.30±0.03ms                   | 756±6μs                                 |    0.58 | circuit_construction.CircuitConstructionBench.time_circuit_construction(8, 128)                                 |
| -        | 1.14±0.04s                    | 704±5ms                                 |    0.61 | circuit_construction.CircuitConstructionBench.time_circuit_construction(8, 131072)                              |
| -        | 17.4±0.2ms                    | 11.8±0.5ms                              |    0.68 | circuit_construction.CircuitConstructionBench.time_circuit_construction(8, 2048)                                |
| -        | 282±10ms                      | 181±7ms                                 |    0.64 | circuit_construction.CircuitConstructionBench.time_circuit_construction(8, 32768)                               |
| -        | 192±8μs                       | 124±3μs                                 |    0.64 | circuit_construction.CircuitConstructionBench.time_circuit_construction(8, 8)                                   |
| -        | 71.7±3ms                      | 43.4±0.4ms                              |    0.61 | circuit_construction.CircuitConstructionBench.time_circuit_construction(8, 8192)                                |
| -        | 27.9±0.2μs                    | 10.7±0.3μs                              |    0.38 | circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 128)                                         |
| -        | 18.0±0.4ms                    | 2.92±0.6ms                              |    0.16 | circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 131072)                                      |
| -        | 299±9μs                       | 37.8±5μs                                |    0.13 | circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 2048)                                        |
| -        | 4.59±0.3ms                    | 543±200μs                               |    0.12 | circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 32768)                                       |
| -        | 12.1±0.1μs                    | 9.42±0.07μs                             |    0.78 | circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 8)                                           |
| -        | 1.14±0.01ms                   | 106±20μs                                |    0.09 | circuit_construction.CircuitConstructionBench.time_circuit_copy(1, 8192)                                        |
| -        | 34.9±0.7μs                    | 16.7±0.8μs                              |    0.48 | circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 128)                                        |
| -        | 17.8±0.1ms                    | 3.68±0.1ms                              |    0.21 | circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 131072)                                     |
| -        | 295±2μs                       | 40.8±5μs                                |    0.14 | circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 2048)                                       |
| -        | 4.56±0.2ms                    | 532±100μs                               |    0.12 | circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 32768)                                      |
| -        | 20.1±0.6μs                    | 14.8±0.9μs                              |    0.73 | circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 8)                                          |
| -        | 1.14±0.04ms                   | 122±30μs                                |    0.11 | circuit_construction.CircuitConstructionBench.time_circuit_copy(14, 8192)                                       |
| -        | 29.6±0.3μs                    | 12.4±0.1μs                              |    0.42 | circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 128)                                         |
| -        | 17.9±0.2ms                    | 3.73±0.2ms                              |    0.21 | circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 131072)                                      |
| -        | 286±3μs                       | 32.2±5μs                                |    0.11 | circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 2048)                                        |
| -        | 4.61±0.2ms                    | 541±100μs                               |    0.12 | circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 32768)                                       |
| -        | 12.6±0.1μs                    | 9.87±0.3μs                              |    0.78 | circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 8)                                           |
| -        | 1.18±0.05ms                   | 142±20μs                                |    0.12 | circuit_construction.CircuitConstructionBench.time_circuit_copy(2, 8192)                                        |
| -        | 41.0±0.6μs                    | 19.5±1μs                                |    0.48 | circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 128)                                        |
| -        | 18.2±0.3ms                    | 3.67±0.1ms                              |    0.2  | circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 131072)                                     |
| -        | 312±20μs                      | 50.9±6μs                                |    0.16 | circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 2048)                                       |
| -        | 4.62±0.1ms                    | 577±100μs                               |    0.12 | circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 32768)                                      |
| -        | 1.15±0.02ms                   | 117±8μs                                 |    0.1  | circuit_construction.CircuitConstructionBench.time_circuit_copy(20, 8192)                                       |
| -        | 30.0±0.3μs                    | 12.9±0.2μs                              |    0.43 | circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 128)                                         |
| -        | 18.1±0.1ms                    | 3.62±0.3ms                              |    0.2  | circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 131072)                                      |
| -        | 309±6μs                       | 34.0±3μs                                |    0.11 | circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 2048)                                        |
| -        | 4.63±0.2ms                    | 576±100μs                               |    0.12 | circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 32768)                                       |
| -        | 14.9±0.2μs                    | 10.5±0.1μs                              |    0.71 | circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 8)                                           |
| -        | 1.19±0.03ms                   | 108±30μs                                |    0.09 | circuit_construction.CircuitConstructionBench.time_circuit_copy(5, 8192)                                        |
| -        | 32.0±0.8μs                    | 14.9±2μs                                |    0.47 | circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 128)                                         |
| -        | 18.1±0.6ms                    | 3.61±0.3ms                              |    0.2  | circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 131072)                                      |
| -        | 288±5μs                       | 44.5±8μs                                |    0.15 | circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 2048)                                        |
| -        | 4.42±0.1ms                    | 484±200μs                               |    0.11 | circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 32768)                                       |
| -        | 1.10±0.02ms                   | 118±20μs                                |    0.11 | circuit_construction.CircuitConstructionBench.time_circuit_copy(8, 8192)                                        |
| -        | 187±5μs                       | 30.9±0.3μs                              |    0.17 | circuit_construction.CircuitConstructionBench.time_circuit_extend(1, 128)                                       |
| -        | 168±2ms                       | 23.0±0.9ms                              |    0.14 | circuit_construction.CircuitConstructionBench.time_circuit_extend(1, 131072)                                    |
| -        | 2.75±0.1ms                    | 343±8μs                                 |    0.12 | circuit_construction.CircuitConstructionBench.time_circuit_extend(1, 2048)                                      |
| -        | 41.7±0.3ms                    | 5.68±0.2ms                              |    0.14 | circuit_construction.CircuitConstructionBench.time_circuit_extend(1, 32768)                                     |
| -        | 20.3±0.5μs                    | 11.3±0.5μs                              |    0.56 | circuit_construction.CircuitConstructionBench.time_circuit_extend(1, 8)                                         |
| -        | 11.1±0.3ms                    | 1.39±0.08ms                             |    0.13 | circuit_construction.CircuitConstructionBench.time_circuit_extend(1, 8192)                                      |
| -        | 204±2μs                       | 54.1±7μs                                |    0.26 | circuit_construction.CircuitConstructionBench.time_circuit_extend(14, 128)                                      |
| -        | 184±2ms                       | 34.7±0.2ms                              |    0.19 | circuit_construction.CircuitConstructionBench.time_circuit_extend(14, 131072)                                   |
| -        | 2.87±0.1ms                    | 439±10μs                                |    0.15 | circuit_construction.CircuitConstructionBench.time_circuit_extend(14, 2048)                                     |
| -        | 44.9±0.5ms                    | 7.24±0.4ms                              |    0.16 | circuit_construction.CircuitConstructionBench.time_circuit_extend(14, 32768)                                    |
| -        | 53.2±0.6μs                    | 27.0±0.7μs                              |    0.51 | circuit_construction.CircuitConstructionBench.time_circuit_extend(14, 8)                                        |
| -        | 11.4±0.3ms                    | 1.83±0.09ms                             |    0.16 | circuit_construction.CircuitConstructionBench.time_circuit_extend(14, 8192)                                     |
| -        | 188±2μs                       | 36.0±2μs                                |    0.19 | circuit_construction.CircuitConstructionBench.time_circuit_extend(2, 128)                                       |
| -        | 187±6ms                       | 33.6±2ms                                |    0.18 | circuit_construction.CircuitConstructionBench.time_circuit_extend(2, 131072)                                    |
| -        | 2.98±0.09ms                   | 380±20μs                                |    0.13 | circuit_construction.CircuitConstructionBench.time_circuit_extend(2, 2048)                                      |
| -        | 45.1±1ms                      | 7.02±0.3ms                              |    0.16 | circuit_construction.CircuitConstructionBench.time_circuit_extend(2, 32768)                                     |
| -        | 23.3±1μs                      | 11.5±0.2μs                              |    0.49 | circuit_construction.CircuitConstructionBench.time_circuit_extend(2, 8)                                         |
| -        | 11.9±0.3ms                    | 1.62±0.05ms                             |    0.14 | circuit_construction.CircuitConstructionBench.time_circuit_extend(2, 8192)                                      |
| -        | 246±8μs                       | 57.8±10μs                               |    0.24 | circuit_construction.CircuitConstructionBench.time_circuit_extend(20, 128)                                      |
| -        | 182±1ms                       | 35.6±2ms                                |    0.2  | circuit_construction.CircuitConstructionBench.time_circuit_extend(20, 131072)                                   |
| -        | 2.91±0.05ms                   | 447±7μs                                 |    0.15 | circuit_construction.CircuitConstructionBench.time_circuit_extend(20, 2048)                                     |
| -        | 45.6±1ms                      | 7.42±0.5ms                              |    0.16 | circuit_construction.CircuitConstructionBench.time_circuit_extend(20, 32768)                                    |
| -        | 76.3±4μs                      | 26.5±0.8μs                              |    0.35 | circuit_construction.CircuitConstructionBench.time_circuit_extend(20, 8)                                        |
| -        | 11.6±0.5ms                    | 1.90±0.1ms                              |    0.16 | circuit_construction.CircuitConstructionBench.time_circuit_extend(20, 8192)                                     |
| -        | 213±9μs                       | 38.4±0.8μs                              |    0.18 | circuit_construction.CircuitConstructionBench.time_circuit_extend(5, 128)                                       |
| -        | 185±6ms                       | 34.9±1ms                                |    0.19 | circuit_construction.CircuitConstructionBench.time_circuit_extend(5, 131072)                                    |
| -        | 2.87±0.04ms                   | 457±20μs                                |    0.16 | circuit_construction.CircuitConstructionBench.time_circuit_extend(5, 2048)                                      |
| -        | 48.0±2ms                      | 7.04±0.2ms                              |    0.15 | circuit_construction.CircuitConstructionBench.time_circuit_extend(5, 32768)                                     |
| -        | 24.5±1μs                      | 13.0±0.3μs                              |    0.53 | circuit_construction.CircuitConstructionBench.time_circuit_extend(5, 8)                                         |
| -        | 11.4±0.3ms                    | 1.77±0.06ms                             |    0.16 | circuit_construction.CircuitConstructionBench.time_circuit_extend(5, 8192)                                      |
| -        | 210±6μs                       | 41.8±2μs                                |    0.2  | circuit_construction.CircuitConstructionBench.time_circuit_extend(8, 128)                                       |
| -        | 185±3ms                       | 35.4±1ms                                |    0.19 | circuit_construction.CircuitConstructionBench.time_circuit_extend(8, 131072)                                    |
| -        | 2.89±0.02ms                   | 446±30μs                                |    0.15 | circuit_construction.CircuitConstructionBench.time_circuit_extend(8, 2048)                                      |
| -        | 45.3±1ms                      | 7.14±0.3ms                              |    0.16 | circuit_construction.CircuitConstructionBench.time_circuit_extend(8, 32768)                                     |
| -        | 37.0±0.8μs                    | 16.0±0.5μs                              |    0.43 | circuit_construction.CircuitConstructionBench.time_circuit_extend(8, 8)                                         |
| -        | 11.6±0.3ms                    | 1.74±0.07ms                             |    0.15 | circuit_construction.CircuitConstructionBench.time_circuit_extend(8, 8192)                                      |
| -        | 3.84±0.02ms                   | 1.77±0.02ms                             |    0.46 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 128, 128)       |
| -        | 3.53±0.4ms                    | 1.18±0.1ms                              |    0.34 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 128, 8)         |
| -        | 2.57±0.01s                    | 900±10ms                                |    0.35 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 131072, 128)    |
| -        | 3.52±0.06s                    | 1.62±0.03s                              |    0.46 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 131072, 131072) |
| -        | 2.64±0.06s                    | 912±10ms                                |    0.35 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 131072, 2048)   |
| -        | 2.84±0.02s                    | 1.07±0s                                 |    0.38 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 131072, 32768)  |
| -        | 2.62±0.05s                    | 955±40ms                                |    0.36 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 131072, 8)      |
| -        | 2.63±0.03s                    | 953±10ms                                |    0.36 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 131072, 8192)   |
| -        | 45.1±1ms                      | 14.8±0.4ms                              |    0.33 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 2048, 128)      |
| -        | 50.9±0.7ms                    | 25.1±1ms                                |    0.49 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 2048, 2048)     |
| -        | 38.7±0.3ms                    | 13.8±0.2ms                              |    0.36 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 2048, 8)        |
| -        | 640±20ms                      | 223±1ms                                 |    0.35 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 32768, 128)     |
| -        | 650±7ms                       | 232±10ms                                |    0.36 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 32768, 2048)    |
| -        | 867±20ms                      | 396±4ms                                 |    0.46 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 32768, 32768)   |
| -        | 672±30ms                      | 224±2ms                                 |    0.33 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 32768, 8)       |
| -        | 703±10ms                      | 267±4ms                                 |    0.38 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 32768, 8192)    |
| -        | 873±10μs                      | 393±4μs                                 |    0.45 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 8, 8)           |
| -        | 159±5ms                       | 61.8±2ms                                |    0.39 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 8192, 128)      |
| -        | 174±2ms                       | 67.8±2ms                                |    0.39 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 8192, 2048)     |
| -        | 156±4ms                       | 55.2±0.4ms                              |    0.35 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 8192, 8)        |
| -        | 210±2ms                       | 99.2±2ms                                |    0.47 | circuit_construction.ParameterizedCircuitConstructionBench.time_build_parameterized_circuit(20, 8192, 8192)     |
| -        | 838±10μs                      | 648±7μs                                 |    0.77 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 128)                                                      |
| -        | 12.5±0.3ms                    | 8.50±0.2ms                              |    0.68 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 2048)                                                     |
| -        | 93.5±2μs                      | 59.2±0.7μs                              |    0.63 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 8)                                                        |
| -        | 52.6±1ms                      | 35.3±2ms                                |    0.67 | converters.ConverterBenchmarks.time_circuit_to_dag(1, 8192)                                                     |
| -        | 8.14±0.1ms                    | 6.46±0.2ms                              |    0.79 | converters.ConverterBenchmarks.time_circuit_to_dag(14, 128)                                                     |
| -        | 117±0.8ms                     | 95.0±3ms                                |    0.81 | converters.ConverterBenchmarks.time_circuit_to_dag(14, 2048)                                                    |
| -        | 773±10μs                      | 667±9μs                                 |    0.86 | converters.ConverterBenchmarks.time_circuit_to_dag(14, 8)                                                       |
| -        | 1.45±0.08ms                   | 1.11±0.02ms                             |    0.76 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 128)                                                      |
| -        | 23.3±2ms                      | 15.2±0.2ms                              |    0.65 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 2048)                                                     |
| -        | 157±0.6μs                     | 119±2μs                                 |    0.75 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 8)                                                        |
| -        | 82.2±0.7ms                    | 59.9±1ms                                |    0.73 | converters.ConverterBenchmarks.time_circuit_to_dag(2, 8192)                                                     |
| -        | 1.75±0.02ms                   | 1.53±0.06ms                             |    0.88 | converters.ConverterBenchmarks.time_circuit_to_dag(32, 8)                                                       |
| -        | 3.00±0.1ms                    | 2.27±0.03ms                             |    0.75 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 128)                                                      |
| -        | 40.5±0.8ms                    | 32.0±0.9ms                              |    0.79 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 2048)                                                     |
| -        | 329±20μs                      | 218±10μs                                |    0.66 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 8)                                                        |
| -        | 171±8ms                       | 125±2ms                                 |    0.73 | converters.ConverterBenchmarks.time_circuit_to_dag(5, 8192)                                                     |
| -        | 4.66±0.06ms                   | 3.64±0.04ms                             |    0.78 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 128)                                                      |
| -        | 65.7±2ms                      | 49.6±0.4ms                              |    0.75 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 2048)                                                     |
| -        | 463±20μs                      | 364±5μs                                 |    0.79 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 8)                                                        |
| -        | 258±4ms                       | 193±1ms                                 |    0.75 | converters.ConverterBenchmarks.time_circuit_to_dag(8, 8192)                                                     |
| -        | 299±10μs                      | 240±3μs                                 |    0.8  | converters.ConverterBenchmarks.time_circuit_to_instruction(1, 128)                                              |
| -        | 3.57±0.3ms                    | 2.26±0.06ms                             |    0.63 | converters.ConverterBenchmarks.time_circuit_to_instruction(1, 2048)                                             |
| -        | 12.8±0.1ms                    | 8.84±0.1ms                              |    0.69 | converters.ConverterBenchmarks.time_circuit_to_instruction(1, 8192)                                             |
| -        | 158±10μs                      | 125±3μs                                 |    0.79 | converters.ConverterBenchmarks.time_circuit_to_instruction(5, 8)                                                |
| -        | 548±20μs                      | 243±0.9μs                               |    0.44 | converters.ConverterBenchmarks.time_dag_to_circuit(1, 128)                                                      |
| -        | 8.19±0.8ms                    | 3.25±0.05ms                             |    0.4  | converters.ConverterBenchmarks.time_dag_to_circuit(1, 2048)                                                     |
| -        | 71.3±4μs                      | 37.6±0.4μs                              |    0.53 | converters.ConverterBenchmarks.time_dag_to_circuit(1, 8)                                                        |
| -        | 30.0±0.2ms                    | 13.1±0.2ms                              |    0.44 | converters.ConverterBenchmarks.time_dag_to_circuit(1, 8192)                                                     |
| -        | 3.66±0.03ms                   | 2.10±0.05ms                             |    0.57 | converters.ConverterBenchmarks.time_dag_to_circuit(14, 128)                                                     |
| -        | 59.7±0.5ms                    | 33.5±0.8ms                              |    0.56 | converters.ConverterBenchmarks.time_dag_to_circuit(14, 2048)                                                    |
| -        | 377±50μs                      | 247±7μs                                 |    0.65 | converters.ConverterBenchmarks.time_dag_to_circuit(14, 8)                                                       |
| -        | 787±7μs                       | 450±20μs                                |    0.57 | converters.ConverterBenchmarks.time_dag_to_circuit(2, 128)                                                      |
| -        | 12.3±0.3ms                    | 5.88±0.08ms                             |    0.48 | converters.ConverterBenchmarks.time_dag_to_circuit(2, 2048)                                                     |
| -        | 92.3±2μs                      | 57.7±1μs                                |    0.63 | converters.ConverterBenchmarks.time_dag_to_circuit(2, 8)                                                        |
| -        | 48.6±0.8ms                    | 23.5±0.7ms                              |    0.48 | converters.ConverterBenchmarks.time_dag_to_circuit(2, 8192)                                                     |
| -        | 5.31±0.2ms                    | 2.99±0.06ms                             |    0.56 | converters.ConverterBenchmarks.time_dag_to_circuit(20, 128)                                                     |
| -        | 474±10μs                      | 323±3μs                                 |    0.68 | converters.ConverterBenchmarks.time_dag_to_circuit(20, 8)                                                       |
| -        | 8.54±0.2ms                    | 4.81±0.2ms                              |    0.56 | converters.ConverterBenchmarks.time_dag_to_circuit(32, 128)                                                     |
| -        | 804±20μs                      | 539±20μs                                |    0.67 | converters.ConverterBenchmarks.time_dag_to_circuit(32, 8)                                                       |
| -        | 1.54±0.04ms                   | 799±40μs                                |    0.52 | converters.ConverterBenchmarks.time_dag_to_circuit(5, 128)                                                      |
| -        | 23.3±0.6ms                    | 12.2±0.2ms                              |    0.53 | converters.ConverterBenchmarks.time_dag_to_circuit(5, 2048)                                                     |
| -        | 160±2μs                       | 107±6μs                                 |    0.67 | converters.ConverterBenchmarks.time_dag_to_circuit(5, 8)                                                        |
| -        | 92.6±4ms                      | 50.6±1ms                                |    0.55 | converters.ConverterBenchmarks.time_dag_to_circuit(5, 8192)                                                     |
| -        | 14.3±0.4ms                    | 8.46±0.2ms                              |    0.59 | converters.ConverterBenchmarks.time_dag_to_circuit(53, 128)                                                     |
| -        | 1.19±0.01ms                   | 873±10μs                                |    0.73 | converters.ConverterBenchmarks.time_dag_to_circuit(53, 8)                                                       |
| -        | 2.31±0.1ms                    | 1.35±0.05ms                             |    0.59 | converters.ConverterBenchmarks.time_dag_to_circuit(8, 128)                                                      |
| -        | 35.0±1ms                      | 18.8±0.2ms                              |    0.54 | converters.ConverterBenchmarks.time_dag_to_circuit(8, 2048)                                                     |
| -        | 223±30μs                      | 153±1μs                                 |    0.69 | converters.ConverterBenchmarks.time_dag_to_circuit(8, 8)                                                        |
| -        | 140±6ms                       | 79.4±4ms                                |    0.57 | converters.ConverterBenchmarks.time_dag_to_circuit(8, 8192)                                                     |
| -        | 57.0±2ms                      | 40.5±1ms                                |    0.71 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(14, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])   |
| -        | 60.9±4ms                      | 43.3±0.4ms                              |    0.71 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(14, 1024, ['rz', 'x', 'sx', 'cx', 'id'])          |
| -        | 49.5±0.8ms                    | 27.3±0.4ms                              |    0.55 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(14, 1024, ['u', 'cx', 'id'])                      |
| -        | 75.3±0.5ms                    | 58.4±3ms                                |    0.78 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(20, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])   |
| -        | 80.8±1ms                      | 61.7±2ms                                |    0.76 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(20, 1024, ['rz', 'x', 'sx', 'cx', 'id'])          |
| -        | 68.2±2ms                      | 55.8±2ms                                |    0.82 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(20, 1024, ['u', 'cx', 'id'])                      |
| -        | 26.2±1ms                      | 12.2±0.1ms                              |    0.47 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(5, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])    |
| -        | 27.8±2ms                      | 10.6±0.06ms                             |    0.38 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(5, 1024, ['rz', 'x', 'sx', 'cx', 'id'])           |
| -        | 17.0±2ms                      | 9.31±0.5ms                              |    0.55 | passes.MultipleBasisPassBenchmarks.time_optimize_1q_decompose(5, 1024, ['u', 'cx', 'id'])                       |
| -        | 6.65±0.5ms                    | 5.26±0.09ms                             |    0.79 | passes.PassBenchmarks.time_merge_adjacent_barriers(20, 1024)                                                    |
| -        | 46.5±1ms                      | 35.3±0.6ms                              |    0.76 | quantum_info.CliffordDecomposeBench.time_decompose('1,1000')                                                    |
| -        | 350±4ms                       | 3.34±0.1ms                              |    0.01 | quantum_info.CliffordDecomposeBench.time_decompose('4,50')                                                      |
| -        | 106±2ms                       | 856±50μs                                |    0.01 | quantum_info.CliffordDecomposeBench.time_decompose('5,10')                                                      |
| -        | 273±7ms                       | 224±7ms                                 |    0.82 | quantum_info.RandomCnotDihedralBench.time_random_cnotdihedral('1,2000')                                         |
| -        | 246±5ms                       | 174±3ms                                 |    0.71 | quantum_info.RandomCnotDihedralBench.time_random_cnotdihedral('2,1500')                                         |
| -        | 222±2ms                       | 140±1ms                                 |    0.63 | quantum_info.RandomCnotDihedralBench.time_random_cnotdihedral('3,1200')                                         |
| -        | 211±5ms                       | 117±1ms                                 |    0.55 | quantum_info.RandomCnotDihedralBench.time_random_cnotdihedral('4,1000')                                         |
| -        | 187±3ms                       | 96.7±5ms                                |    0.52 | quantum_info.RandomCnotDihedralBench.time_random_cnotdihedral('5,800')                                          |
| -        | 187±4ms                       | 86.9±3ms                                |    0.47 | quantum_info.RandomCnotDihedralBench.time_random_cnotdihedral('6,700')                                          |
| -        | 547                           | 490                                     |    0.9  | quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(115, 10, 'lookahead')                 |
| -        | 104±2ms                       | 90.9±0.7ms                              |    0.88 | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(14, 'translator')                             |
| -        | 369±8ms                       | 295±10ms                                |    0.8  | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(20, 'synthesis')                              |
| -        | 702±20ms                      | 584±20ms                                |    0.83 | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(27, 'synthesis')                              |
| -        | 376±4ms                       | 330±4ms                                 |    0.88 | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(27, 'translator')                             |
| -        | 50.5±2ms                      | 38.3±0.6ms                              |    0.76 | quantum_volume.QuantumVolumeBenchmark.time_ibmq_backend_transpile(8, 'synthesis')                               |
| -        | 394                           | 328                                     |    0.83 | queko.QUEKOTranspilerBench.track_depth_bntf_optimal_depth_25(1, 'sabre')                                        |
| -        | 285                           | 233                                     |    0.82 | queko.QUEKOTranspilerBench.track_depth_bntf_optimal_depth_25(2, 'sabre')                                        |
| -        | 370                           | 281                                     |    0.76 | queko.QUEKOTranspilerBench.track_depth_bss_optimal_depth_100(1, 'sabre')                                        |
| -        | 360                           | 297                                     |    0.82 | queko.QUEKOTranspilerBench.track_depth_bss_optimal_depth_100(2, 'sabre')                                        |
| -        | 1.34±0.02ms                   | 884±8μs                                 |    0.66 | ripple_adder.RippleAdderConstruction.time_build_ripple_adder(10)                                                |
| -        | 14.2±2ms                      | 8.11±0.1ms                              |    0.57 | ripple_adder.RippleAdderConstruction.time_build_ripple_adder(100)                                               |
| -        | 24.4±0.4ms                    | 17.4±1ms                                |    0.71 | ripple_adder.RippleAdderConstruction.time_build_ripple_adder(200)                                               |
| -        | 6.25±0.08ms                   | 4.08±0.08ms                             |    0.65 | ripple_adder.RippleAdderConstruction.time_build_ripple_adder(50)                                                |
| -        | 62.3±0.6ms                    | 40.7±2ms                                |    0.65 | ripple_adder.RippleAdderConstruction.time_build_ripple_adder(500)                                               |
| -        | 28                            | 17                                      |    0.61 | statepreparation.StatePreparationTranspileBench.track_cnot_counts_after_mapping_to_ibmq_16_melbourne(4)         |
| -        | 79                            | 47                                      |    0.59 | statepreparation.StatePreparationTranspileBench.track_cnot_counts_after_mapping_to_ibmq_16_melbourne(5)         |
| -        | 186                           | 111                                     |    0.6  | statepreparation.StatePreparationTranspileBench.track_cnot_counts_after_mapping_to_ibmq_16_melbourne(6)         |
| -        | 411                           | 231                                     |    0.56 | statepreparation.StatePreparationTranspileBench.track_cnot_counts_after_mapping_to_ibmq_16_melbourne(7)         |
| -        | 872                           | 460                                     |    0.53 | statepreparation.StatePreparationTranspileBench.track_cnot_counts_after_mapping_to_ibmq_16_melbourne(8)         |
| -        | 132±3ms                       | 111±0.8ms                               |    0.84 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(1)                                   |
| -        | 174±10ms                      | 145±0.8ms                               |    0.83 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(1)                 |
| -        | 336                           | 304                                     |    0.9  | transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(1)                                 |
| -        | 110±2ms                       | 93.6±3ms                                |    0.85 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(2, 'sabre', 'dense')           |
| -        | 204±7ms                       | 169±1ms                                 |    0.83 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(2, 'stochastic', 'sabre')      |
| -        | 280±2ms                       | 232±1ms                                 |    0.83 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(3, 'stochastic', 'dense')      |
| -        | 235±3ms                       | 198±1ms                                 |    0.84 | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_179(3, 'stochastic', 'sabre')      |
| -        | 2582                          | 1954                                    |    0.76 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')                                                      |
| -        | 2582                          | 1954                                    |    0.76 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')                                                      |
| -        | 2582                          | 1954                                    |    0.76 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr')                                                     |
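
The Ratio column in these tables appears to be the After value divided by the Before value, so anything below 1 is an improvement. A minimal sketch for recomputing it from the cell text, assuming the `value±error<unit>` layout seen in the rows above (the helper names `parse_measurement` and `ratio` are hypothetical and not part of asv):

```python
import re

# Unit suffixes seen in the timing columns, mapped to seconds.
# Both the Greek mu (U+03BC) and the micro sign (U+00B5) are accepted.
_UNITS = {
    "ns": 1e-9,
    "\u03bcs": 1e-6,
    "\u00b5s": 1e-6,
    "us": 1e-6,
    "ms": 1e-3,
    "s": 1.0,
}

_CELL = re.compile(r"\s*([\d.]+)(?:±[\d.]+)?\s*(ns|[\u03bc\u00b5u]s|ms|s)?\s*")


def parse_measurement(cell: str) -> float:
    """Return the central value of a cell such as '45.1±1ms', in seconds.

    Unitless cells (the track_* benchmarks, e.g. '547') are returned unchanged.
    """
    match = _CELL.fullmatch(cell)
    if match is None:
        raise ValueError(f"unrecognised cell: {cell!r}")
    value, unit = match.groups()
    # A missing unit maps to a scale factor of 1.
    return float(value) * _UNITS.get(unit, 1.0)


def ratio(before: str, after: str) -> float:
    """After / Before: below 1 means faster (or shallower), above 1 means slower."""
    return parse_measurement(after) / parse_measurement(before)


print(round(ratio("45.1±1ms", "7.02±0.3ms"), 2))  # 0.16
print(round(ratio("2582", "1954"), 2))            # 0.76
```

The error term after `±` is ignored here; only the central values are compared.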

Benchmarks that got worse relative to 1.1.0 (ratio > 1):

| Change   | Before [7d29dc1b] <1.1.0^0>   | After [d9e31ed5] <repack-instruction>   | Ratio   | Benchmark (Parameter)                                                                                           |
|----------|-------------------------------|-----------------------------------------|---------|-----------------------------------------------------------------------------------------------------------------|
| +        | 85.3±3ms                      | 143±4ms                                 | 1.68    | assembler.AssemblerBenchmarks.time_assemble_circuit(8, 4096, 1)                                                 |
| +        | 1.50±0.05ms                   | 2.61±0.02ms                             | 1.73    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 128, 128)                               |
| +        | 1.19±0.02ms                   | 2.69±0.02ms                             | 2.27    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 128, 8)                                 |
| +        | 1.16±0.04s                    | 2.13±0.03s                              | 1.84    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 131072, 128)                            |
| +        | 1.52±0.02s                    | 2.47±0.01s                              | 1.62    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 131072, 131072)                         |
| +        | 1.07±0.02s                    | 2.21±0.07s                              | 2.06    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 131072, 2048)                           |
| +        | 1.13±0.01s                    | 2.25±0.03s                              | 1.99    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 131072, 32768)                          |
| +        | 1.12±0.01s                    | 2.17±0.03s                              | 1.93    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 131072, 8)                              |
| +        | 1.06±0.01s                    | 2.14±0.01s                              | 2.02    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 131072, 8192)                           |
| +        | 15.6±0.2ms                    | 31.7±0.3ms                              | 2.03    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 2048, 128)                              |
| +        | 22.4±1ms                      | 35.3±0.2ms                              | 1.57    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 2048, 2048)                             |
| +        | 15.2±0.6ms                    | 33.7±2ms                                | 2.21    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 2048, 8)                                |
| +        | 270±6ms                       | 535±10ms                                | 1.98    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 32768, 128)                             |
| +        | 264±10ms                      | 522±10ms                                | 1.98    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 32768, 2048)                            |
| +        | 370±10ms                      | 594±9ms                                 | 1.61    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 32768, 32768)                           |
| +        | 271±10ms                      | 529±10ms                                | 1.95    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 32768, 8)                               |
| +        | 274±3ms                       | 516±8ms                                 | 1.88    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 32768, 8192)                            |
| +        | 347±5μs                       | 666±6μs                                 | 1.92    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 8, 8)                                   |
| +        | 63.5±2ms                      | 133±3ms                                 | 2.09    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 8192, 128)                              |
| +        | 67.1±2ms                      | 134±1ms                                 | 2.00    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 8192, 2048)                             |
| +        | 61.6±2ms                      | 128±1ms                                 | 2.07    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 8192, 8)                                |
| +        | 87.3±2ms                      | 148±3ms                                 | 1.70    | circuit_construction.ParameterizedCircuitBindBench.time_bind_params(20, 8192, 8192)                             |
| +        | 247±10μs                      | 315±3μs                                 | 1.27    | converters.ConverterBenchmarks.time_circuit_to_instruction(14, 8)                                               |
| +        | 431±7μs                       | 555±20μs                                | 1.29    | converters.ConverterBenchmarks.time_circuit_to_instruction(2, 128)                                              |
| +        | 2.10±0.02ms                   | 2.35±0.1ms                              | 1.12    | converters.ConverterBenchmarks.time_circuit_to_instruction(20, 128)                                             |
| +        | 336±4μs                       | 392±7μs                                 | 1.17    | converters.ConverterBenchmarks.time_circuit_to_instruction(20, 8)                                               |
| +        | 56.2±2ms                      | 71.8±1ms                                | 1.28    | mapping_passes.PassBenchmarks.time_apply_layout(14, 1024)                                                       |
| +        | 89.0±2ms                      | 109±0.7ms                               | 1.22    | mapping_passes.PassBenchmarks.time_apply_layout(20, 1024)                                                       |
| +        | 19.5±0.6ms                    | 25.2±0.2ms                              | 1.29    | mapping_passes.PassBenchmarks.time_apply_layout(5, 1024)                                                        |
| +        | 1.62±0.03ms                   | 2.04±0.03ms                             | 1.26    | mapping_passes.PassBenchmarks.time_check_map(14, 1024)                                                          |
| +        | 2.42±0.07ms                   | 3.01±0.1ms                              | 1.25    | mapping_passes.PassBenchmarks.time_check_map(20, 1024)                                                          |
| +        | 6.34±0.08ms                   | 7.63±0.2ms                              | 1.20    | mapping_passes.PassBenchmarks.time_layout_2q_distance(20, 1024)                                                 |
| +        | 1.54±0.07ms                   | 1.81±0.05ms                             | 1.18    | mapping_passes.PassBenchmarks.time_layout_2q_distance(5, 1024)                                                  |
| +        | 281±2ms                       | 343±10ms                                | 1.22    | mapping_passes.PassBenchmarks.time_sabre_layout(14, 1024)                                                       |
| +        | 480±2ms                       | 573±6ms                                 | 1.19    | mapping_passes.PassBenchmarks.time_sabre_layout(20, 1024)                                                       |
| +        | 68.8±0.5ms                    | 83.2±3ms                                | 1.21    | mapping_passes.PassBenchmarks.time_sabre_layout(5, 1024)                                                        |
| +        | 229±4ms                       | 266±5ms                                 | 1.16    | mapping_passes.PassBenchmarks.time_sabre_swap(20, 1024)                                                         |
| +        | 28.8±0.1ms                    | 35.3±1ms                                | 1.22    | mapping_passes.RoutedPassBenchmarks.time_check_map(20, 1024)                                                    |
| +        | 3.67±0.05ms                   | 4.48±0.06ms                             | 1.22    | mapping_passes.RoutedPassBenchmarks.time_check_map(5, 1024)                                                     |
| +        | 17.1±0.2ms                    | 19.9±0.2ms                              | 1.17    | mapping_passes.RoutedPassBenchmarks.time_gate_direction(14, 1024)                                               |
| +        | 29.5±0.3ms                    | 34.2±0.3ms                              | 1.16    | mapping_passes.RoutedPassBenchmarks.time_gate_direction(20, 1024)                                               |
| +        | 45.2±1ms                      | 61.5±2ms                                | 1.36    | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(14, 1024, 1)                                         |
| +        | 46.1±2ms                      | 66.9±2ms                                | 1.45    | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(14, 1024, 2)                                         |
| +        | 52.7±3ms                      | 66.6±0.4ms                              | 1.26    | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(14, 1024, 3)                                         |
| +        | 48.3±0.5ms                    | 66.8±1ms                                | 1.38    | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(14, 1024, 4)                                         |
| +        | 46.9±1ms                      | 64.7±0.5ms                              | 1.38    | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(14, 1024, 5)                                         |
| +        | 66.4±3ms                      | 88.5±1ms                                | 1.33    | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(20, 1024, 1)                                         |
| +        | 67.3±4ms                      | 92.5±0.9ms                              | 1.38    | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(20, 1024, 2)                                         |
| +        | 72.6±2ms                      | 101±6ms                                 | 1.39    | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(20, 1024, 3)                                         |
| +        | 69.5±1ms                      | 94.4±2ms                                | 1.36    | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(20, 1024, 4)                                         |
| +        | 66.7±2ms                      | 93.6±2ms                                | 1.40    | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(20, 1024, 5)                                         |
| +        | 16.4±0.9ms                    | 22.4±0.2ms                              | 1.37    | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(5, 1024, 1)                                          |
| +        | 16.4±0.4ms                    | 23.3±0.2ms                              | 1.42    | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(5, 1024, 2)                                          |
| +        | 18.2±1ms                      | 23.6±0.2ms                              | 1.30    | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(5, 1024, 3)                                          |
| +        | 15.9±0.3ms                    | 22.9±0.2ms                              | 1.44    | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(5, 1024, 4)                                          |
| +        | 15.0±0.3ms                    | 21.3±0.4ms                              | 1.42    | passes.MultiQBlockPassBenchmarks.time_collect_multiq_block(5, 1024, 5)                                          |
| +        | 976±8ms                       | 1.25±0.01s                              | 1.28    | passes.MultipleBasisPassBenchmarks.time_basis_translator(14, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])        |
| +        | 709±20ms                      | 856±9ms                                 | 1.21    | passes.MultipleBasisPassBenchmarks.time_basis_translator(14, 1024, ['rz', 'x', 'sx', 'cx', 'id'])               |
| +        | 578±9ms                       | 725±10ms                                | 1.25    | passes.MultipleBasisPassBenchmarks.time_basis_translator(14, 1024, ['u', 'cx', 'id'])                           |
| +        | 1.39±0.03s                    | 1.77±0.03s                              | 1.27    | passes.MultipleBasisPassBenchmarks.time_basis_translator(20, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])        |
| +        | 990±10ms                      | 1.18±0.02s                              | 1.20    | passes.MultipleBasisPassBenchmarks.time_basis_translator(20, 1024, ['rz', 'x', 'sx', 'cx', 'id'])               |
| +        | 830±20ms                      | 1.03±0.01s                              | 1.24    | passes.MultipleBasisPassBenchmarks.time_basis_translator(20, 1024, ['u', 'cx', 'id'])                           |
| +        | 343±5ms                       | 430±4ms                                 | 1.25    | passes.MultipleBasisPassBenchmarks.time_basis_translator(5, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id'])         |
| +        | 219±1ms                       | 267±2ms                                 | 1.22    | passes.MultipleBasisPassBenchmarks.time_basis_translator(5, 1024, ['u', 'cx', 'id'])                            |
| +        | 672±7ms                       | 798±6ms                                 | 1.19    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(14, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id']) |
| +        | 726±10ms                      | 866±20ms                                | 1.19    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(14, 1024, ['rz', 'x', 'sx', 'cx', 'id'])        |
| +        | 652±8ms                       | 779±5ms                                 | 1.20    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(14, 1024, ['u', 'cx', 'id'])                    |
| +        | 1.07±0.03s                    | 1.25±0.02s                              | 1.17    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(20, 1024, ['rx', 'ry', 'rz', 'r', 'rxx', 'id']) |
| +        | 1.14±0.01s                    | 1.35±0.02s                              | 1.18    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(20, 1024, ['rz', 'x', 'sx', 'cx', 'id'])        |
| +        | 1.04±0.01s                    | 1.25±0.03s                              | 1.20    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(20, 1024, ['u', 'cx', 'id'])                    |
| +        | 253±2ms                       | 301±10ms                                | 1.19    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(5, 1024, ['rz', 'x', 'sx', 'cx', 'id'])         |
| +        | 225±1ms                       | 273±7ms                                 | 1.22    | passes.MultipleBasisPassBenchmarks.time_optimize_1q_commutation(5, 1024, ['u', 'cx', 'id'])                     |
| +        | 10.5±0.1ms                    | 28.9±3ms                                | 2.75    | passes.PassBenchmarks.time_barrier_before_final_measurements(14, 1024)                                          |
| +        | 16.9±0.3ms                    | 44.8±0.3ms                              | 2.66    | passes.PassBenchmarks.time_barrier_before_final_measurements(20, 1024)                                          |
| +        | 4.54±0.07ms                   | 7.84±0.2ms                              | 1.73    | passes.PassBenchmarks.time_barrier_before_final_measurements(5, 1024)                                           |
| +        | 11.8±0.4ms                    | 27.0±0.2ms                              | 2.29    | passes.PassBenchmarks.time_collect_2q_blocks(14, 1024)                                                          |
| +        | 17.8±0.5ms                    | 40.8±0.7ms                              | 2.29    | passes.PassBenchmarks.time_collect_2q_blocks(20, 1024)                                                          |
| +        | 4.08±0.2ms                    | 6.57±0.4ms                              | 1.61    | passes.PassBenchmarks.time_collect_2q_blocks(5, 1024)                                                           |
| +        | 715±10μs                      | 933±40μs                                | 1.31    | passes.PassBenchmarks.time_count_ops_longest_path(5, 1024)                                                      |
| +        | 2.68±0.07ms                   | 6.48±0.2ms                              | 2.42    | passes.PassBenchmarks.time_cx_cancellation(14, 1024)                                                            |
| +        | 4.25±0.09ms                   | 11.5±0.6ms                              | 2.70    | passes.PassBenchmarks.time_cx_cancellation(20, 1024)                                                            |
| +        | 976±20μs                      | 1.79±0.06ms                             | 1.83    | passes.PassBenchmarks.time_cx_cancellation(5, 1024)                                                             |
| +        | 924±20ms                      | 1.04±0.03s                              | 1.13    | passes.PassBenchmarks.time_decompose_pass(14, 1024)                                                             |
| +        | 1.42±0.01s                    | 1.62±0.02s                              | 1.14    | passes.PassBenchmarks.time_decompose_pass(20, 1024)                                                             |
| +        | 298±7ms                       | 335±10ms                                | 1.13    | passes.PassBenchmarks.time_decompose_pass(5, 1024)                                                              |
| +        | 532±10ms                      | 611±7ms                                 | 1.15    | passes.PassBenchmarks.time_optimize_swap_before_measure(14, 1024)                                               |
| +        | 3.38±0.05ms                   | 8.07±0.07ms                             | 2.39    | passes.PassBenchmarks.time_remove_barriers(14, 1024)                                                            |
| +        | 4.76±0.1ms                    | 15.2±0.9ms                              | 3.19    | passes.PassBenchmarks.time_remove_barriers(20, 1024)                                                            |
| +        | 1.30±0.03ms                   | 2.28±0.02ms                             | 1.76    | passes.PassBenchmarks.time_remove_barriers(5, 1024)                                                             |
| +        | 4.08±0.1ms                    | 8.74±0.2ms                              | 2.14    | passes.PassBenchmarks.time_remove_diagonal_gates_before_measurement(14, 1024)                                   |
| +        | 6.11±0.3ms                    | 15.8±0.5ms                              | 2.58    | passes.PassBenchmarks.time_remove_diagonal_gates_before_measurement(20, 1024)                                   |
| +        | 1.64±0.03ms                   | 2.74±0.06ms                             | 1.67    | passes.PassBenchmarks.time_remove_diagonal_gates_before_measurement(5, 1024)                                    |
| +        | 2.25±0.07ms                   | 5.06±0.2ms                              | 2.24    | passes.PassBenchmarks.time_remove_reset_in_zero_state(14, 1024)                                                 |
| +        | 3.05±0.1ms                    | 9.55±0.1ms                              | 3.13    | passes.PassBenchmarks.time_remove_reset_in_zero_state(20, 1024)                                                 |
| +        | 924±10μs                      | 1.43±0.01ms                             | 1.55    | passes.PassBenchmarks.time_remove_reset_in_zero_state(5, 1024)                                                  |
| +        | 797±10ms                      | 928±10ms                                | 1.16    | passes.PassBenchmarks.time_unroll_3q_or_more(14, 1024)                                                          |
| +        | 1.25±0.05s                    | 1.43±0.01s                              | 1.15    | passes.PassBenchmarks.time_unroll_3q_or_more(20, 1024)                                                          |
| +        | 211±2ms                       | 249±4ms                                 | 1.18    | passes.PassBenchmarks.time_unroll_3q_or_more(5, 1024)                                                           |
| +        | 131±3ms                       | 209±2ms                                 | 1.60    | qft.LargeQFTMappingTimeBench.time_sabre_swap(115, 'decay')                                                      |
| +        | 135±4ms                       | 214±4ms                                 | 1.58    | qft.LargeQFTMappingTimeBench.time_sabre_swap(115, 'lookahead')                                                  |
| +        | 6.56±0.1ms                    | 7.68±0.1ms                              | 1.17    | qft.QftTranspileBench.time_ibmq_backend_transpile(2)                                                            |
| +        | 3.72±0.01s                    | 5.54±0.01s                              | 1.49    | quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(1081, 10, 'decay')                            |
| +        | 30.1±2ms                      | 35.4±2ms                                | 1.18    | quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 10, 'lookahead')                         |
| +        | 264±2ms                       | 301±4ms                                 | 1.14    | quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(115, 100, 'lookahead')                        |
| +        | 361±20ms                      | 431±6ms                                 | 1.19    | quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(409, 10, 'decay')                             |
| +        | 374±5ms                       | 828±10ms                                | 2.21    | quantum_volume.LargeQuantumVolumeMappingTimeBench.time_sabre_swap(409, 10, 'lookahead')                         |
| +        | 3945                          | 4526                                    | 1.15    | quantum_volume.LargeQuantumVolumeMappingTrackBench.track_depth_sabre_swap(409, 10, 'lookahead')                 |
| +        | 217448                        | 258126                                  | 1.19    | quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(1081, 10, 'decay')                     |
| +        | 47894                         | 70053                                   | 1.46    | quantum_volume.LargeQuantumVolumeMappingTrackBench.track_size_sabre_swap(409, 10, 'lookahead')                  |
| +        | 18.1±0.1ms                    | 24.0±0.6ms                              | 1.33    | queko.QUEKOTranspilerBench.time_transpile_bigd(0, 'sabre')                                                      |
| +        | 40.5±0.3ms                    | 44.7±0.7ms                              | 1.11    | queko.QUEKOTranspilerBench.time_transpile_bigd(0, None)                                                         |
| +        | 160±3ms                       | 192±4ms                                 | 1.20    | queko.QUEKOTranspilerBench.time_transpile_bntf(0, 'sabre')                                                      |
| +        | 370±5ms                       | 455±5ms                                 | 1.23    | queko.QUEKOTranspilerBench.time_transpile_bntf(0, None)                                                         |
| +        | 303±3ms                       | 388±4ms                                 | 1.28    | queko.QUEKOTranspilerBench.time_transpile_bntf(2, 'sabre')                                                      |
| +        | 403±10ms                      | 486±5ms                                 | 1.21    | queko.QUEKOTranspilerBench.time_transpile_bntf(3, 'sabre')                                                      |
| +        | 149±2ms                       | 177±2ms                                 | 1.19    | queko.QUEKOTranspilerBench.time_transpile_bss(0, 'sabre')                                                       |
| +        | 1.69±0.01s                    | 1.93±0.01s                              | 1.14    | queko.QUEKOTranspilerBench.time_transpile_bss(0, None)                                                          |
| +        | 602±10ms                      | 725±6ms                                 | 1.21    | queko.QUEKOTranspilerBench.time_transpile_bss(3, 'sabre')                                                       |
| +        | 672                           | 973                                     | 1.45    | queko.QUEKOTranspilerBench.track_depth_bntf_optimal_depth_25(0, 'sabre')                                        |
| +        | 200                           | 310                                     | 1.55    | queko.QUEKOTranspilerBench.track_depth_bntf_optimal_depth_25(3, 'sabre')                                        |
| +        | 261                           | 350                                     | 1.34    | queko.QUEKOTranspilerBench.track_depth_bss_optimal_depth_100(3, 'sabre')                                        |
| +        | 1.01±0.01s                    | 1.27±0.03s                              | 1.26    | scheduling_passes.SchedulingPassBenchmarks.time_alap_schedule_pass(10, 1000)                                    |
| +        | 504±10ms                      | 632±20ms                                | 1.25    | scheduling_passes.SchedulingPassBenchmarks.time_alap_schedule_pass(10, 500)                                     |
| +        | 1.37±0.02s                    | 1.73±0.02s                              | 1.26    | scheduling_passes.SchedulingPassBenchmarks.time_alap_schedule_pass(20, 500)                                     |
| +        | 403±5ms                       | 497±10ms                                | 1.23    | scheduling_passes.SchedulingPassBenchmarks.time_alap_schedule_pass(5, 1000)                                     |
| +        | 1.01±0.02s                    | 1.28±0.02s                              | 1.26    | scheduling_passes.SchedulingPassBenchmarks.time_asap_schedule_pass(10, 1000)                                    |
| +        | 502±10ms                      | 639±10ms                                | 1.27    | scheduling_passes.SchedulingPassBenchmarks.time_asap_schedule_pass(10, 500)                                     |
| +        | 1.37±0.01s                    | 1.74±0.03s                              | 1.27    | scheduling_passes.SchedulingPassBenchmarks.time_asap_schedule_pass(20, 500)                                     |
| +        | 405±8ms                       | 499±7ms                                 | 1.23    | scheduling_passes.SchedulingPassBenchmarks.time_asap_schedule_pass(5, 1000)                                     |
| +        | 206±10ms                      | 239±2ms                                 | 1.16    | scheduling_passes.SchedulingPassBenchmarks.time_asap_schedule_pass(5, 500)                                      |
| +        | 189±6ms                       | 317±10ms                                | 1.68    | scheduling_passes.SchedulingPassBenchmarks.time_time_unit_conversion_pass(10, 1000)                             |
| +        | 90.1±5ms                      | 149±5ms                                 | 1.65    | scheduling_passes.SchedulingPassBenchmarks.time_time_unit_conversion_pass(10, 500)                              |
| +        | 473±4ms                       | 755±5ms                                 | 1.60    | scheduling_passes.SchedulingPassBenchmarks.time_time_unit_conversion_pass(20, 1000)                             |
| +        | 245±7ms                       | 397±4ms                                 | 1.62    | scheduling_passes.SchedulingPassBenchmarks.time_time_unit_conversion_pass(20, 500)                              |
| +        | 73.2±1ms                      | 121±0.5ms                               | 1.66    | scheduling_passes.SchedulingPassBenchmarks.time_time_unit_conversion_pass(5, 1000)                              |
| +        | 36.0±0.8ms                    | 59.4±1ms                                | 1.65    | scheduling_passes.SchedulingPassBenchmarks.time_time_unit_conversion_pass(5, 500)                               |
| +        | 5.62±0.1ms                    | 6.49±1ms                                | 1.15    | transpiler_benchmarks.TranspilerBenchSuite.time_single_gate_compile                                             |
| +        | 129±1ms                       | 149±1ms                                 | 1.16    | transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(0)                                         |
| +        | 74.7±0.4ms                    | 83.9±3ms                                | 1.12    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(0)                                   |
| +        | 224±2ms                       | 262±9ms                                 | 1.17    | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_180(2, 'sabre', 'sabre')           |
| +        | 271±4ms                       | 336±6ms                                 | 1.24    | transpiler_qualitative.TranspilerQualitativeBench.time_transpile_time_cnt3_5_180(3, 'sabre', 'sabre')           |
| +        | 11.4±0.2ms                    | 12.9±0.5ms                              | 1.13    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')                                                 |
| +        | 40.0±0.3ms                    | 46.2±2ms                                | 1.16    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')                                    |
| !        | 29.0±0.08s                    | failed                                  | n/a     | utility_scale.UtilityScaleBenchmarks.time_qft('cz')                                                             |
| !        | 29.2±0.03s                    | failed                                  | n/a     | utility_scale.UtilityScaleBenchmarks.time_qft('ecr')                                                            |

@coveralls

coveralls commented Jul 7, 2024

Pull Request Test Coverage Report for Build 10045737675

Details

  • 753 of 800 (94.13%) changed or added relevant lines in 17 files are covered.
  • 66 unchanged lines in 9 files lost coverage.
  • Overall coverage increased (+0.001%) to 89.926%

| Changes Missing Coverage                         | Covered Lines | Changed/Added Lines | %      |
|--------------------------------------------------|---------------|---------------------|--------|
| crates/circuit/src/dag_node.rs                   | 52            | 53                  | 98.11% |
| crates/accelerate/src/convert_2q_block_matrix.rs | 23            | 26                  | 88.46% |
| crates/circuit/src/circuit_instruction.rs        | 241           | 246                 | 97.97% |
| crates/circuit/src/packed_instruction.rs         | 198           | 214                 | 92.52% |
| crates/circuit/src/operations.rs                 | 67            | 89                  | 75.28% |

| Files with Coverage Reduction                           | New Missed Lines | %      |
|---------------------------------------------------------|------------------|--------|
| qiskit/circuit/library/standard_gates/swap.py           | 1                | 98.18% |
| qiskit/circuit/duration.py                              | 1                | 70.27% |
| qiskit/circuit/quantumcircuit.py                        | 1                | 94.4%  |
| qiskit/transpiler/passes/synthesis/unitary_synthesis.py | 2                | 88.35% |
| crates/qasm2/src/lex.rs                                 | 3                | 93.13% |
| crates/circuit/src/imports.rs                           | 3                | 77.78% |
| crates/circuit/src/operations.rs                        | 9                | 85.92% |
| crates/circuit/src/dag_node.rs                          | 18               | 80.0%  |
| crates/circuit/src/circuit_instruction.rs               | 28               | 89.35% |

| Totals                               | Coverage Status |
|--------------------------------------|-----------------|
| Change from base Build 10044784237:  | 0.001%          |
| Covered Lines:                       | 65770           |
| Relevant Lines:                      | 73138           |

💛 - Coveralls

@mtreinish
Member

I was a bit concerned by these numbers:

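| Change   | Before [7d29dc1] <1.1.0^0>    | After [d9e31ed]   | Ratio   | Benchmark (Parameter)                                |
|----------|-------------------------------|-------------------|---------|------------------------------------------------------|
| !        | 29.0±0.08s                    | failed            | n/a     | utility_scale.UtilityScaleBenchmarks.time_qft('cz')  |
| !        | 29.2±0.03s                    | failed            | n/a     | utility_scale.UtilityScaleBenchmarks.time_qft('ecr') |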

since "failed" indicates either that the benchmark no longer runs or, more likely in this case, that it hit the 60-second timeout. To check, I ran the utility-scale benchmarks locally with this PR compared against 1.1.0:

Benchmarks that have improved:

| Change   |   Before [7d29dc1b] <1.1.0^0> |   After [a6cd6bb2] <repack-instruction> |   Ratio | Benchmark (Parameter)                                       |
|----------|-------------------------------|-----------------------------------------|---------|-------------------------------------------------------------|
| -        |                          2582 |                                    1954 |    0.76 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')  |
| -        |                          2582 |                                    1954 |    0.76 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')  |
| -        |                          2582 |                                    1954 |    0.76 | utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr') |

Benchmarks that have stayed the same:

| Change   | Before [7d29dc1b] <1.1.0^0>   | After [a6cd6bb2] <repack-instruction>   |   Ratio | Benchmark (Parameter)                                                     |
|----------|-------------------------------|-----------------------------------------|---------|---------------------------------------------------------------------------|
|          | 1.30±0s                       | 1.42±0s                                 |    1.1  | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')         |
|          | 1.80±0.01s                    | 1.93±0.01s                              |    1.08 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')        |
|          | 1.95±0.01s                    | 2.08±0.01s                              |    1.07 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')         |
|          | 1.13±0s                       | 1.15±0.01s                              |    1.03 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')                      |
|          | 2.49±0.01s                    | 2.47±0.02s                              |    0.99 | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')                     |
|          | 444                           | 435                                     |    0.98 | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx')  |
|          | 444                           | 435                                     |    0.98 | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz')  |
|          | 444                           | 435                                     |    0.98 | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr') |
|          | 3.00±0.02s                    | 2.91±0.03s                              |    0.97 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')                      |
|          | 20.9±0.02s                    | 19.9±0.1s                               |    0.95 | utility_scale.UtilityScaleBenchmarks.time_qft('cx')                       |
|          | 23.8±0.02s                    | 22.6±0.02s                              |    0.95 | utility_scale.UtilityScaleBenchmarks.time_qft('ecr')                      |
|          | 23.8±0.1s                     | 22.4±0.09s                              |    0.94 | utility_scale.UtilityScaleBenchmarks.time_qft('cz')                       |
|          | 1607                          | 1483                                    |    0.92 | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx')               |
|          | 1622                          | 1488                                    |    0.92 | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz')               |
|          | 1622                          | 1488                                    |    0.92 | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr')              |

Benchmarks that have got worse:

| Change   | Before [7d29dc1b] <1.1.0^0>   | After [a6cd6bb2] <repack-instruction>   |   Ratio | Benchmark (Parameter)                                                         |
|----------|-------------------------------|-----------------------------------------|---------|-------------------------------------------------------------------------------|
| +        | 32.1±0.2ms                    | 38.1±0.3ms                              |    1.19 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')  |
| +        | 9.18±0.1ms                    | 10.7±0.1ms                              |    1.17 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')               |
| +        | 9.30±0.06ms                   | 10.8±0.1ms                              |    1.16 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')               |
| +        | 99.8±2ms                      | 116±0.6ms                               |    1.16 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')                |
| +        | 101±0.8ms                     | 117±0.2ms                               |    1.16 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')                |
| +        | 32.5±0.3ms                    | 37.7±0.2ms                              |    1.16 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr') |
| +        | 101±0.9ms                     | 116±1ms                                 |    1.15 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')               |
| +        | 33.0±0.4ms                    | 37.8±0.2ms                              |    1.15 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')  |
| +        | 9.34±0.09ms                   | 10.7±0.04ms                             |    1.14 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')              |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

So that failed result might be something tied to @jakelishman's local environment. It could be our old friend, the medium allocator on macOS with x86_64.

kevinhartman previously approved these changes Jul 8, 2024
Contributor

@kevinhartman kevinhartman left a comment

This looks excellent! It's nice to see your idea of using the low 3 bits of 8-byte aligned pointers to store the operation type enum come to fruition 😄. I'm also glad to see some cleanup of our Rust operation types and conversion to and from Python. The performance and memory improvements are also nice, ofc.
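
As a rough illustration of the low-bits trick mentioned above, here is a minimal Python sketch that models a pointer as a plain integer. The `Kind` names, their numeric values, and the `pack`/`unpack` helpers are assumptions made for this example; they are not the actual layout or API of `PackedOperation`.

```python
# Illustrative sketch only: a tagged pointer modelled as a plain integer.
# The tag names and values are hypothetical, not Qiskit's actual layout.
from enum import IntEnum

TAG_BITS = 3                    # 8-byte alignment leaves the low 3 bits zero
TAG_MASK = (1 << TAG_BITS) - 1  # 0b111


class Kind(IntEnum):
    STANDARD_GATE = 0
    GATE = 1
    INSTRUCTION = 2
    OPERATION = 3


def pack(address: int, kind: Kind) -> int:
    """Hide `kind` in the low bits of an 8-byte-aligned address."""
    if address & TAG_MASK:
        raise ValueError("address is not 8-byte aligned")
    return address | int(kind)


def unpack(word: int):
    """Recover the original address and the discriminant from one word."""
    return word & ~TAG_MASK, Kind(word & TAG_MASK)


word = pack(0x7F00_1234_5678, Kind.INSTRUCTION)
address, kind = unpack(word)
```

The point of the design is that the discriminant costs no extra storage: the word stays pointer-sized, and masking recovers both halves.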

I think the impact this has on our work to port DAGCircuit to Rust should be fairly minimal.

I'm approving this, but holding off on queuing it for merge in case anyone else wants to take a look.

@jlapeyre
Contributor

jlapeyre commented Jul 9, 2024

I'd like to look at it briefly, @kevinhartman.

This was a highly nontrivial merge, since bringing in more control-flow
operations and allowing them to be cast down to standard gates found
some problems in the model (both pre-existing and new for this PR).
@jakelishman
Member Author

jakelishman commented Jul 9, 2024

I've fixed the merge conflicts with #12659 in c4f299f, the process of which turned up a couple more ugly corners around conditional / otherwise non-standard "standard" gates during the conversion. Notably: the use of CircuitInstruction::new was masking that we weren't propagating through the pycache in CircuitData::__getitem__, which meant that gates that were semi-standard (e.g. standard gates with labels, conditions, etc) were not being retained as `is`-compatible even with caching on, when they should have been.

Previously, convert_py_to_operation_type was masking this by overzealously rejecting such gates, even though Rust space does have the necessary representation.

edit: ugh, in the meantime, #12704 merged, which invalidates the merge.

There was no actual merge conflict, but still a logical one.
@jakelishman
Member Author

Ok, b5a3d9f should have fixed the merge conflict with #12704.

@jlapeyre
Contributor

jlapeyre commented Jul 9, 2024

fwiw, I also got no failures running a few parts of the asv set, including the two failures above. This was on Arch Linux with an AMD processor.

@jakelishman
Member Author

It turns out that while asv's timeout is 60s, that includes all runs of a benchmark. Since we have a minimum of 2 runs for those, the cut-off is actually 30s per run. I increased the timeout locally, and it was running them in roughly 30.9s on my laptop.

The `compose` test had a now-broken assumption, because the Python-space
`is` check is no longer expected to return an identical object when a
standard gate is moved from one circuit to another and has its
components remapped as part of the `compose` operation.  This doesn't
constitute the unpleasant deep-copy that that test is preventing. A
custom gate still satisfies that, however, so we can just change the
test.
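
To make the identity-versus-equality point concrete, here is a toy Python model. `StandardGateView`, `CustomGate` and `ToyCircuit` are hypothetical names invented for this sketch, not Qiskit classes; the only assumption carried over from the text is that standard gates are stored as plain data and rebuilt on access, while custom gates are stored by reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandardGateView:
    """Value-like stand-in for a standard gate (compares by content)."""
    name: str
    params: tuple


class CustomGate:
    """Stand-in for a user-defined gate, kept as a Python object."""
    def __init__(self, name):
        self.name = name


class ToyCircuit:
    def __init__(self):
        self._data = []

    def append(self, op):
        if isinstance(op, StandardGateView):
            # Standard gates are stored as plain data, not as the object.
            self._data.append(("standard", op.name, op.params))
        else:
            # Anything else keeps a reference to the original object.
            self._data.append(("python", op))

    def operation(self, index):
        entry = self._data[index]
        if entry[0] == "standard":
            return StandardGateView(entry[1], entry[2])  # rebuilt each time
        return entry[1]


rz = StandardGateView("rz", (0.0,))
custom = CustomGate("my_gate")
circuit = ToyCircuit()
circuit.append(rz)
circuit.append(custom)

same_value = circuit.operation(0) == rz         # equal by content
same_object = circuit.operation(0) is rz        # but a different object
custom_same = circuit.operation(1) is custom    # reference is preserved
```

In this model an `is` assertion only holds for the custom gate, which is why a test of that kind would need to use one.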

`DAGNode::set_name` could cause problems if it was called for the first
time on a `CircuitInstruction` that was for a standard gate; these would
be created as immutable instances.  Given the changes in operator
extraction to Rust space, it can now be the case that a standard gate
that comes in as mutable is unpacked into Rust space, the cache is some
time later invalidated, and then the operation is recreated immutably.
Merged via the queue into Qiskit:main with commit cd6757a Jul 23, 2024
15 checks passed
@jakelishman jakelishman deleted the repack-instruction branch July 23, 2024 20:23
raynelfss added a commit to raynelfss/qiskit that referenced this pull request Jul 24, 2024
raynelfss added a commit to raynelfss/qiskit that referenced this pull request Jul 24, 2024
raynelfss added a commit to raynelfss/qiskit that referenced this pull request Jul 24, 2024
raynelfss added a commit to raynelfss/qiskit that referenced this pull request Jul 24, 2024
ElePT pushed a commit to mtreinish/qiskit-core that referenced this pull request Jul 24, 2024
* Rebalance `CircuitInstruction` and `PackedInstruction`

This is a large overhaul of how circuit instructions are both stored in
Rust (`PackedInstruction`) and how they are presented to Python
(`CircuitInstruction`).  In summary:

* The old `OperationType` enum is now collapsed into a manually managed
  `PackedOperation`.  This is logically equivalent, but stores a
  `PyGate`/`PyInstruction`/`PyOperation` indirectly through a boxed
  pointer, and stores a `StandardGate` inline.  As we expect the vast
  majority of gates to be standard, this hugely reduces the memory
  usage.  The enumeration is manually compressed to a single pointer,
  hiding the discriminant in the low, alignment-required bytes of the
  pointer.

* `PackedOperation::view()` unpacks the operation into a proper
  reference-like enumeration `OperationRef<'a>`, which implements
  `Operation` (though there is also a `try_standard_gate` method to get
  the gate without unpacking the whole enumeration).

* Both `PackedInstruction` and `CircuitInstruction` use this
  `PackedOperation` as the operation storage.

* `PackedInstruction` is now completely the Rust-space format for data,
  and `CircuitInstruction` is purely for communication with Python.

On my machine, this commit brings the utility-scale benchmarks to within
10% of the runtime of 1.1.0 (and some to parity), despite all the
additional overhead.

Changes to accepting and building Python objects
------------------------------------------------

* A `PackedInstruction` is created by copy constructor from a
  `CircuitInstruction` by `CircuitData::pack`.  There is no `pack_owned`
  (really, there never was - the previous method didn't take ownership)
  because there's never owned `CircuitInstruction`s coming in; they're
  Python-space interop, so we never own them (unless we clone them)
  other than when we're unpacking them.

* `PackedInstruction` is currently just created manually when not coming
  from a `CircuitInstruction`.  It's not hard, and makes it easier to
  re-use known intern indices than to waste time re-interning them.
  There is no need to go via `CircuitInstruction`.

* `CircuitInstruction` now has two separated Python-space constructors:
  the old one, which is the default and takes `(operation, qubits,
  clbits)` (and extracts the information), and a new fast-path
  `from_standard` which asks only for the standard gate, qubits and
  params, avoiding operator construction.

* To accept a Python-space operation, extract a Python object to
  `OperationFromPython`.  This extracts the components that are separate
  in Rust space, but joined in Python space (the operation, params and
  extra attributes).  This replaces `OperationInput` and
  `OperationTypeConstruct`, being more efficient at the extraction,
  including providing the data in the formats needed for
  `PackedInstruction` or `CircuitInstruction`.

* To retrieve the Python-space operation, use
  `CircuitInstruction::get_operation` or
  `PackedInstruction::unpack_py_op` as appropriate.  Both will
  cache and reuse the op, if `cache_pygates` is active.  (Though note
  that if the op is created by `CircuitInstruction`, it will not
  propagate back to a `PackedInstruction`.)
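
The caching behaviour described in the last bullet can be sketched in plain Python. `ToyPackedInstruction` and its fields are hypothetical stand-ins written for this example; only the idea (build the Python object lazily, and reuse it when caching is enabled) comes from the text above.

```python
class ToyPackedInstruction:
    """Hypothetical stand-in: builds a Python-space op on demand."""

    def __init__(self, gate_name, params, cache_pygates=True):
        self.gate_name = gate_name
        self.params = params
        self._cache_pygates = cache_pygates
        self._py_op = None   # cached Python object, if any
        self.builds = 0      # how many times an object was constructed

    def unpack_py_op(self):
        # Reuse the cached object when caching is on and it already exists.
        if self._cache_pygates and self._py_op is not None:
            return self._py_op
        self.builds += 1
        op = {"name": self.gate_name, "params": list(self.params)}
        if self._cache_pygates:
            self._py_op = op
        return op


cached = ToyPackedInstruction("rz", (0.5,), cache_pygates=True)
first, second = cached.unpack_py_op(), cached.unpack_py_op()

uncached = ToyPackedInstruction("rz", (0.5,), cache_pygates=False)
third, fourth = uncached.unpack_py_op(), uncached.unpack_py_op()
```

With caching on, `first is second` and only one object is built; with caching off, each call builds a fresh, equal object.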

Avoiding operation creation
---------------------------

The `_raw_op` field of `CircuitInstruction` is gone, because `PyGate`,
`PyInstruction` and `PyOperation` are no longer pyclasses and no longer
exposed to Python.  Instead, we avoid operation creation by:

* having an internal `DAGNode::_to_circuit_instruction`, which returns a
  copy of the internal `CircuitInstruction`, which can then be used with
  `CircuitInstruction.replace`, etc.

* having `CircuitInstruction::is_standard_gate` to query from Python
  space if we should bother to create the operator.

* changing `CircuitData::map_ops` to `map_nonstandard_ops`, and having
  it only call the Python callback function if the operation is not an
  unconditional standard gate.
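
A minimal sketch of the last idea, assuming operations are represented as plain dictionaries with hypothetical `standard` and `condition` keys (the real method works on Rust-side data, not on dictionaries):

```python
def map_nonstandard_ops(ops, callback):
    """Call `callback` only for entries that are not unconditional standard gates."""
    return [
        op if (op["standard"] and op["condition"] is None) else callback(op)
        for op in ops
    ]


ops = [
    {"name": "rz", "standard": True, "condition": None},
    {"name": "rz", "standard": True, "condition": ("c0", 1)},
    {"name": "my_gate", "standard": False, "condition": None},
]

seen = []


def relabel(op):
    # Record which operations actually reached the Python callback.
    seen.append(op["name"])
    return {**op, "name": op["name"] + "_mapped"}


mapped = map_nonstandard_ops(ops, relabel)
```

Here the callback runs for the conditional `rz` and for `my_gate`, while the unconditional `rz` passes through untouched.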

Memory usage
------------

Given the very simple example construction script:

```python
from qiskit.circuit import QuantumCircuit

qc = QuantumCircuit(1_000)
for _ in range(3_000):
    for q in qc.qubits:
        qc.rz(0.0, q)
    for q in qc.qubits:
        qc.rx(0.0, q)
    for q in qc.qubits:
        qc.rz(0.0, q)
    for a, b in zip(qc.qubits[:-1], qc.qubits[1:]):
        qc.cx(a, b)
```

This uses 1.5GB in max resident set size on my MacBook (note that it's
about 12 million gates) on both 1.1.0 and with this commit, so we've
undone our memory losses.  The parent of this commit uses 2GB.
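
The "about 12 million gates" figure can be checked directly from the loop structure of the script; the variable names below are chosen for this sketch only:

```python
num_qubits = 1_000
layers = 3_000

single_qubit_gates = 3 * num_qubits  # rz, rx, rz on every qubit
two_qubit_gates = num_qubits - 1     # cx on each neighbouring pair

total_gates = layers * (single_qubit_gates + two_qubit_gates)
print(total_gates)  # 11997000
```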

However, we're in a strong position to beat 1.1.0 in the future now;
there are two obvious large remaining costs:

* There are 16 bytes per `PackedInstruction` for the Python-operation
  caching (worth about 180MB in this benchmark, since no Python
  operations are actually created).

* There is also significant memory wastage in the current
  `SmallVec<[Param; 3]>` storage of the parameters; for all standard
  gates, we know statically how many parameters are / should be stored,
  and we never need to increase the capacity.  Further, the `Param` enum
  is 16 bytes wide per parameter, of which nearly 8 bytes is padding,
  but for all our current use cases, we only care if _all_ the
  parameters are floats (for everything else, we're going to have to
  defer to Python).  We could move the discriminant out to the level of
  the parameters structure, and save a large amount of padding.
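
As a back-of-the-envelope check of the first figure, assuming the 11,997,000 gates that the script builds (3,000 layers of 3,999 gates each) and exactly 16 bytes of cache slot per instruction:

```python
total_gates = 3_000 * 3_999          # gates built by the example script
cache_bytes_per_instruction = 16     # figure quoted in the text

cache_bytes = total_gates * cache_bytes_per_instruction
cache_mib = cache_bytes / 2**20

print(cache_bytes)        # 191952000
print(round(cache_mib))   # 183
```

That is roughly 183 MiB, which is consistent with the "about 180MB" quoted above.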

Further work
------------

There's still performance left on the table here:

* We still copy-in and copy-out of `CircuitInstruction` too much right
  now; we might want to make all the `CircuitInstruction` fields
  nullable and have `CircuitData::append` take them by _move_ rather
  than by copy.

* The qubits/clbits interner requires owned arrays going in, but most
  interning should return an existing entry.  We probably want to switch
  to have the interner take references/iterators by default, and clone
  when necessary.  We could have a small circuit optimisation where the
  intern contexts reserve the first n entries to use for an all-to-all
  connectivity interning for up to (say) 8 qubits, since the transpiler
  will want to create a lot of ephemeral small circuits.

* The `Param` vectors are too heavy at the moment; `SmallVec<[Param;
  3]>` is 56 bytes wide, despite the vast majority of gates we care
  about having at most one single float (8 bytes).  Dead padding is a
  large chunk of the memory use currently.
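
The interner point can be sketched as follows. `ToyInterner` is a hypothetical Python model, not the Rust interner; Python cannot express the borrow-versus-own distinction, so the `stored` counter stands in for "how often a new entry actually had to be kept".

```python
class ToyInterner:
    """Maps repeated qubit tuples to small integer indices."""

    def __init__(self):
        self._index_of = {}  # tuple of qubits -> intern index
        self._entries = []   # intern index -> tuple of qubits
        self.stored = 0      # number of times a new entry was kept

    def intern(self, qubits):
        # Look up first; only keep the key on a miss.
        index = self._index_of.get(qubits)
        if index is None:
            index = len(self._entries)
            self._entries.append(qubits)
            self._index_of[qubits] = index
            self.stored += 1
        return index

    def lookup(self, index):
        return self._entries[index]


interner = ToyInterner()
a = interner.intern((0, 1))
b = interner.intern((1, 2))
c = interner.intern((0, 1))  # hit: returns the existing index
```

Because most lookups are expected to be hits, an interface that only takes ownership on a miss would avoid most of the allocations.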

* Fix clippy in no-gate-cache mode

* Fix pylint unused-import complaints

* Fix broken assumptions around the gate model

The `compose` test had a now-broken assumption, because the Python-space
`is` check is no longer expected to return an identical object when a
standard gate is moved from one circuit to another and has its
components remapped as part of the `compose` operation.  This doesn't
constitute the unpleasant deep-copy that that test is preventing. A
custom gate still satisfies that, however, so we can just change the
test.

`DAGNode::set_name` could cause problems if it was called for the first
time on a `CircuitInstruction` that was for a standard gate; these would
be created as immutable instances.  Given the changes in operator
extraction to Rust space, it can now be the case that a standard gate
that comes in as mutable is unpacked into Rust space, the cache is some
time later invalidated, and then the operation is recreated immutably.

* Fix lint

* Fix minor documentation
raynelfss added a commit to raynelfss/qiskit that referenced this pull request Jul 24, 2024
raynelfss added a commit to raynelfss/qiskit that referenced this pull request Jul 25, 2024
raynelfss added a commit to raynelfss/qiskit that referenced this pull request Jul 25, 2024
github-merge-queue bot pushed a commit that referenced this pull request Jul 25, 2024
…12292)

* Initial: Add `Target` class to `_accelerate`
- Add `Target` class to test mobility between Rust and Python.
- Add `add_instruction` method to test compatibility with instructions.

* Fix: Remove empty property check
- Property check caused most cases to panic.
- Will be commented out and restored at a later time.

* Add: Instructions property
- Instructions property returns all added to the target.
- Similar behavior to source.

* Chore: comments and deprecated methods
- Add comments to instruction property.
- Use new_bound for new PyDicts.

* Chore: Remove redundant code
- Remove redundant transformation of PyObject to PyTuple.
- Remove debugging print statement.

* Add: `InstructionProperties` class and type checkers
- Add `InstructionProperties` class to process properties in rust.
- Add `is_instance` and `is_class` to identify certain Python objects.
- Modify logic of `add_instruction` to use class check.
- Other tweaks and fixes.

* Add: Setter and Getter for calibration in `InstructionProperty`

* Add: `update_instruction_properties` to Target.

* Add: Update_from_instruction_schedule_map
- Partial addition from Target.py
- Introduction of hashable qarg data structure.
- Other tweaks and fixes.

* Add: Complete `update_from_instruction_schedule_map`
- Complete missing procedures in function.
- Rename `Qargs` to `HashableVec`.
- Make `HashableVec` generic.
- Separate `import_from_module_call` into call0 and call1.
- Other tweaks and fixes.

* Add: instruction_schedule_map property.
- Remove stray print statements.
- Other tweaks and fixes.

* Fix: Key issue in `update_from_instruction_schedule_map`
- Remove all unsafe unwraps

* Fix: Use PyResult Value for void function
- Update `update_from_instruction_schedule_map` to use PyResult and '?' operator.
- Use Bound Python objects whenever possible.
- Other tweaks and fixes.

* Add: Python wrapping for Target
- Add temporary _target module for testing.
- Remove update_from_instruction_schedule_map function back to python.
- Add python properties for all public attributes in rust
- Other tweaks and fixes.

* Add: `qargs` property
- Add identical method `qargs` to obtain the qargs of a target.
- Other tweaks and fixes.

* Add: `qargs_for_operation_name` function.
- Add function with identical behavior to the original in Target.
- Other tweaks and fixes.

* Add: durations method for Target
- Add target module to qiskit init file.
- Remove is_instance method.
- Modify set_calibration method in InstructionProperty to leave typechecking to Python.
- Change rust Target alias to Target2.
- Other tweaks and fixes.

* Add: InstructionProperties wrapper in python

* Fix: InstructionProperties could not receive calibrations
- Fix wrong setters/getters for calibration in InstructionProperty object in rust.

* Add: more methods to Target in `target.rs`
- Add FromPyObject trait to Hashable vec to receive Tuples and transform them directly into this type.
- Add operations_for_qargs for Target class in Rust side and Python.
- Fix return dict keys for `qargs_for_operation_name`.
- Add `timing_constraints` and `operation_from_name` to Python side.
- Other tweaks and fixes.

* Fix: missing return value in `operations_for_args`
- Fix wrong name for function operation_for_qargs.
- Fix missing return value in the python side.
- Other tweaks and fixes.

* Fix: Bad compatibility with InstructionProperties
- Make `InstructionProperties` "_calibration" attribute visible.
- Removed attribute "calibration", treat as class property.
- Other tweaks and fixes

* Add: `operation_names_for_qargs` to Target
- Port class method to rust and connect to Python wrapper.
- Other tweaks and fixes.

* Add: instruction_supported method to rust and python:
- Other tweaks and fixes.

* Add: changes to add_instruction function to increase functionality.
- These changes break current functionality of other functions, but emulate intended behavior better.
- Fixes coming soon.

* Fix: Backwards compatibility with `add_instruction`
- Fixed wrong additions to HashMaps in the rust side causing instructions to be missing.
- Other tweaks and fixes.

* Fix: Gate Map behavior didn't match #11422
- Make GateMap use optional values to match behavior of #11422.
- Define GateMapType for complex type in self.gate_map.
- Throw Python KeyError exceptions from the rust side in `update_instruction_properties` and other functions.
- Modify logic in subsequent functions that use gate_map optional values.
- Other tweaks and fixes.

* Add: `has_calibration` method to Target

* Add: `get_calibration` method to Target

* Add: `instruction_properties` method to Target

* Add: `build_coupling_map` and helper methods
- `build_coupling_map` will remain in Python for now, along with its helper functions.
- Make `gate_name_map` visible to python.
- Add `coupling_graph` attribute to Target in Rust.
- Other tweaks and fixes.

* Add: `get_non_global_operation_names` to Target.
- Add attributes `non_global_strict_basis` and `non_global_basis` as Optional.
- Other tweaks and fixes.

* Add: Missing properties
- Add properties: operations, operation_names, and physical_qubits.
- Reorganize properties placement.
- Other tweaks and fixes.

* Add: `from_configuration` classmethod to Target.
- Add method that mimics the behavior of the python method.
- Change concurrent_measurements to 2d Vec instead of a Vec of sets.
- Other tweaks and fixes.

* Add: Magic methods to Rust and Python
- Add docstring to __init__.
- Add __iter__, __getitem__, __len__, __contains__, keys, values, and items methods to rust.
- Add equivalent methods to python + the __str__ method.
- Make description an optional attribute in rust.
- Other tweaks and fixes.

* Fix: Bugs when fetching qargs or operations
- Fix qarg_for_operation_name logic to account for None and throw correct exceptions.
- Stringify description before sending in case of numerical descriptors.
- Fix qarg to account for None entry.
- Other tweaks and fixes.

* Chore: Prepare for Draft PR
- Remove _target.py testing file.
- Fix incorrect initialization of calibration in InstructionProperties.
- Other tweaks and fixes.

* Fix: target not being recognized as a module
- Add target to the pyext crate.
- Change placement of target import for alphabetical ordering.
- Other tweaks and fixes.

* Fix: Change HashMap to IndexMap
- Change from f32 to f64 precision.
- Other tweaks and fixes.

* Fix: Move InstructionProperties fully to Rust
- Move InstructionProperties to rust.
- Modify gate_map to accept an InstructionProperties object instead of PyObject.
- Change update_instruction_properties to use Option InstructionProperties.
- Remove InstructionProperties from target.py
- Other tweaks and fixes.

* Fix: Make Target inherit from Rust
- Make Target inherit from the rust-side Target by using subclass attribute, then extending its functionality using python.
- Switch from __init__ to __new__ to adapt to the Target rust class.
- Modify every trait that worked with `target._Target` to use `super()` or `self` instead.
- Fix repr in InstructionProperties to not show `Some()` when a value exists.
- Fix `__str__` method in `Target` to not display "None" if no description is given.
- Assume `num_qubits` is the first argument when an integer is provided as a first argument and nothing else is provided for second (Target initializer).
- Return a set in operation_names instead of a Vec.
- Other tweaks and fixes.

* Fix: Recognize None in `operation_for_qargs`.
- Fix module labels for each class in target.rs.
- Use py.is_instance instead of passing isinstance to `instruction_supported`.
- Modify `operations_for_qargs` to accept optional values less aggressively. Allow it to find instructions with no qargs. (NoneType).
- Other tweaks and fixes.

* Fix: Make InstructionProperties subclassable.
- Fix get_non_global_operation_names to accept optional values and fix search set to use sorted values.
- Fix __repr__ method in InstructionProperties to add punctuation.
- Fix typo in python durations method.
- Modify test to overload __new__ method instead of just  __init__ (Possible breaking change).
- Other tweaks and fixes.

* Fix: errors in `instruction_properties` and others:
- Allow `instruction_properties` method to view optional properties.
- Allow `operation_names_for_qargs` to select class instructions when None is passed as a qarg.
- Modify __str__ method to display error and duration times as int if the value is 0.
- Other tweaks and fixes.

* Fix: call `isclass` from rust, instead of passing it from Python.

* Fix: Move `update_from_instruction_schedule_map` to rust.

* Fix: Move `durations` to rust.

* Fix: Move `timing_constraints` to rust

* Fix: Move operations_from_name fully to rust

* Fix: `instruction_supported` method:
- Rewrite the logic of instruction_supported due to previous errors in the method.
- Move `check_obj_params` to Rust.
- Other tweaks and fixes.

* Fix: errors in `from_configuration` class method.
- Fix some of the logic when retrieving gates from `name_mapping`.
- Remove function arguments in favor of implementing counterpart functions in rust.
- Add qubit_props_list_from_props function and return rust datatypes.
- Fix wrong error handling procedures when retrieving attributes from backend_property.
- Other tweaks and fixes.

* Fix: Import `InstructionScheduleMap` directly instead of passing.
- `instruction_schedule_map()` now imports the classtype directly from rust instead of needing it to be passed from python.
- Remove unused imports in `target.py`.
- Ignore unused arguments in `test_extra_props_str`.
- Other tweaks and fixes.

* Docs: Add docstrings to rust functions
- Remove redundant redefinitions in python.
- Fix text_signatures for some rust functions.
- Added lint exceptions to some necessary imports and function arguments.
- Other tweaks and fixes.

* Add: Make `Target` and `InstructionProperties` pickleable.
- Add `__getstate__` and `__setstate__` methods to make both rust subclasses pickleable.

* Fix: Wrong calibration assignment in __setstate__
- Use set_calibration to set the correct calibration argument.
- Fix wrong signature in get_non_global_operation_names.
- Other tweaks and fixes.

* Refactor: HashableVec is now Qarg
- Use `PhysicalQubit` instead of u32 for qargs.
- Use a `SmallVec` of size 4 instead of a dynamic Vec.
- Default to using the `Hash()` method embedded in `SmallVec`.
- Add a Default method to easily unwrap Qarg objects.
- Other tweaks and fixes.

* Add: `get` function to target.
- Remove some redundant cloning in code.
- Other small fixes.

* Fix: Remove unnecessary Optional values in gate_map.
- Update every gate_map call to use the new format.
- Other small tweaks and fixes.

* Refactor: `calibration` is for `InstructionProperties`
- Use python `None` instead of option to store `calibration` in `InstructionProperties`.
- Adapt code to these changes.
- Remove redundant implementation of Hash in Qargs.
- Other tweaks and fixes.

* Fix: Temporary speedup for `gate_map` access
- Added temporary speedups to access the gate_map by returning the values as PyObjects.
- Convert qargs to rust tuples instead of initializing a `PyTuple`.
- Store `InstructionProperties` as a python ref in gate_map. (Will be changed in future updates).
- Other tweaks and fixes.

* Fix: Incorrect extractions for `InstructionProperties`
- Fix incorrect conversion of `InstructionProperties` to `Py<InstructionProperties>`
- Fix incorrect extraction of qargs in `update_from_instruction_schedule_map`

* Fix: Hide all private attributes in `Target`
- Hide all private attributes of the `Target` to prevent unnecessary cloning.
- Other small tweaks and fixes.

* Add: New representation of gate_map using new pyclasses:
- Make Qarg a sequence pyclass.
- Make QargPropsMap the new representation of a GateMap value.
- Adapt the code to new structure.
- TODO: Add missing magic methods for sequence and mapping objects.
- Other small tweaks and fixes.

* Add: Use custom datatypes to return values to Python.
- Add QargSet datatype to return a set of qargs.
   - Works as return type for `Target.qargs`
   - Object has iterator type QargSetIter.
- Rename QargPropMap to PropsMap
   - Use iterator type IterPropsMap
- Other small tweaks and fixes.

* Fix: Extend `InstructionProperties` to be subclassable using `__init__`:
- Made a subclass of `InstructionProperties` that can be extended using an `__init__` method.
- Revert previous changes to `test_target.py`.
- Other tweaks and fixes.

* Refactor: Split target into its own module
- Restructure the files to improve readability of code.
   - `instruction_properties.rs` contains the `InstructionProperties` class.
   - `mod.rs` contains the `Target` class.
   - `qargs.rs` contains the Qargs struct to store quantum arguments.
   - `property_map` contains the Qarg: Property Mapping that will be stored in the gate_map.
- Add missing methods to PropsMap:
   - Add `PropsMapKeys` object to store the qargs as a set.
   - Add methods to compare and access `PropsMapKey`.
- Add QargsOrTuple enum in Qargs to parse Qargs instantly.

* Fix: Rest of failing tests in Target
- Modify the `InstructionProperties` python wrapper.
   - InstructionProperties was not assigning properties to rust side.
- Make duration in `InstructionProperties` setable.
- Add `__eq__` method for `PropMap` to compare with other dicts.
- `PropMapKeys` can only be compared with a Set.
- Remove `qargs_for_operation_name` from `target.py`
- Other small tweaks and fixes.

* Add: New GateMap Structure
- GateMap is now its own mapping object.
- Add `__setstate__` and `__getstate__` methods for `PropMap` and `GateMap`.
- Other small tweaks and fixes.

* Fix: Make new subclasses pickleable
- Add module location to `PropsMap`, `GateMap`, and `Qargs`.
- Added default method to PropMap.
- Made default method part of class initializers.
- Other small tweaks and fixes.

* Fix: Remove redundant lookup in Target (#12373)

* Format: `mod.rs` qubit_comparison to one line.

* Add: `GateMapKeys` object in GateMap:
- Use IndexSet as a base to preserve the insertion order.
- Other tweaks and fixes.

* Add: __sub__ method to GateMapKeys

* Fix: Modify `GateMap` to store values in Python heap.
- Fix `GateMap.__iter__` to use an IndexKeys iterator.
- Other small tweaks and fixes.

* Fix: Remove duplicate import of `IndexSet::into_iter` in `GateMap`.
- Make `__iter__` use the keys() method in `GateMap`.

* Fix: Adapt to target changes (#12288)
- Fix lint stray imports.

* Fix: Incorrect creation of parameters in `update_from_instruction_schedule_map`
- Add `tupelize` function to create tuples from non-downcastable items.
- Fix creation of Parameters by iterating through members of tuple object and mapping them to parameters in `update_from_instruction_schedule_map`.
- Add missing logic for creating a Target with/without `qubit_properties`.
- Add tuple conversion of `Qargs` to store items in a dict in `BasisTranslator` and `UnitarySynthesis` passes.
- Cast `PropsMap` object to dict when comparing in `test_fake_backends.py`.
- Modify logic of helper functions that receive a bound object reference, a second `py` not required as an argument.
- Add set operation methods to `GateMapKeys`.
- Other tweaks and fixes.

* Fix: More failing tests
- Fix repeated erroneous calls to `add_instruction` in `update_from_instruction_schedule_map`
- Add missing condition in `instruction_supported`
- Use `IndexSet` instead of `HashSet` for `QargsSet`.
- Other small tweaks and fixes.

* Add: Macro rules for qargs and other sequences.
- Create `QargSet` and `PropsMap` using the new macros.
- Return a `TargetOpNames` ordered set to python in `operation_names`.
- Remove the Python side `operation_names.`
- Fix faulty docstring in `target.py`.
- Other tweaks and fixes.

* Docs: Add necessary docstrings to all new rust functions.
- Remove duplicate Iterator in GateMap.
- Other small tweaks and fixes.

* Fix: Use `GILOnceCell` and remove `Qargs`
- Use `GILOnceCell` to import python modules only once at initialization.
- Remove the custom data structure `Qargs` to avoid conversion overhead.
- `Qargs` does not use `PhysicalQubits`, `u32` is used instead.
- Fix `__setstate__` and `__getstate__` methods for `PropsMap`, `GateMap`, and `key_like_set_iterator` macro_rule.
- Update code to use the new structures.
- TODO: Fix broken tests.

* Fix: Cast `Qargs` to `Tuple` in specific situations
- Use tupleize to cast `Qargs` to `Tuple` in `instructions`.
- Use downcast to extract string in `add_instruction`.
- Other tweaks and fixes.

* Add: Make `Target` Representable in Rust
- Rename `InstructionProperties` as `BaseInstructionProperties`.
   - Remove `Calibration` from the rust space.
- Restore `gate_map`, `coupling_map`, `instruction_schedule_map`, and `instruction_durations` to rust.
- Remove all unnecessary data structures from rust space.
- Other tweaks and fixes.

* Refactor: Remove previous changes to unrelated files.

* Add: rust native functions to target
- Added rust native functionality to target such that a `py` would not be needed to use one.
- Add Index trait to make `Target` subscriptable.
- Other small tweaks and fixes.

* Fix: Remove all unnecessary python method calls.
- Remove usage of `inspect.isclass`.
- Rename `Target` to `BaseTarget` in the rust side.
- Rename `err.rs` to `errors.rs`.
- Remove rust-native `add_inst` and `update_inst` as Target should not be modified from Rust.
- Made `add_instruction` and `update_instruction_properties` private in `BaseTarget`.
- Add missing `get` method in `Target`.
- Other tweaks and fixes

* Format: Fix lint

* Fix: Wrong return value for `BaseTarget.qargs`

* Add: Temporary Instruction representation in rust.
- Add temporary instruction representation to avoid repeated extraction from python.

* Add: Native representation of coupling graph

* Fix: Wrong attribute extraction for `GateRep`

* Remove: `CouplingGraph` rust native representation.
- Move to different PR.

* Format: Remove stray whitespace

* Add: `get_non_global_op_names` as a rust native function

* Fix: Use Ahash for Hashing
- Use ahash for hashing when possible.
- Rename `BaseTarget` to `Target` in rust only.
- Rename `BaseInstructionProperties` to `InstructionProperties` in rust only.
- Remove optional logic from `generate_non_global_op_names`.
- Use dict for `__setstate__` and `__getstate__` in `Target`.
- Reduced the docstring for `Target` and `InstructionProperties`.
- Other small tweaks and fixes.
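
A minimal sketch of what dict-based `__getstate__`/`__setstate__` hooks can look like, written in pure Python. `TargetLike` and its attributes are hypothetical stand-ins, not the actual `Target` implementation; the round trip is done by hand here, and `pickle` calls the same two hooks.

```python
class TargetLike:
    """Hypothetical stand-in for a class that serialises its state as a dict."""

    def __init__(self, description=None, num_qubits=0):
        self.description = description
        self.num_qubits = num_qubits
        self._cache = {}  # derived data, rebuilt on demand rather than serialised

    def __getstate__(self):
        # Only the fields needed to rebuild the object go into the dict.
        return {"description": self.description, "num_qubits": self.num_qubits}

    def __setstate__(self, state):
        self.description = state["description"]
        self.num_qubits = state["num_qubits"]
        self._cache = {}


original = TargetLike("demo", 5)
state = original.__getstate__()

# Rebuild without calling __init__, which is how unpickling restores an object.
restored = TargetLike.__new__(TargetLike)
restored.__setstate__(state)
```

Keying the state by name rather than by position means a field can be added later without silently shifting the meaning of the existing entries.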

* Format: new changes to `lib.rs`

* Format: Adapt to new lint rules

* Fix: Use new gates infrastructure (#12459)
- Create custom enum to collect either a `NormalOperation` or a `VariableOperation` depending on what is needed.
- Add a rust native `is_instruction_supported` method to check whether a Target supports a certain instruction.
- Make conversion methods from `circuit_instruction.rs` public.
- Add comparison methods for `Param` in `operations.rs`
- Remove need for isclass method in rustwise `add_instruction`
- Other tweaks and fixes.

* Format: Fix rust formatting

* Add: rust-native method to obtain Operation objects.

* Add: Comparison methods for `Param`

* Fix: Add display methods for `Params`

* Format: Fix lint test

* Format: Wrong merge conflict solve

* Fix: Improve rust methods to use iterators.
- Adapt the Python methods to leverage the rust native improvements.
- Use python native structures for the Python methods.

* Format: Remove extra blankspace

* Fix: Remove `text_signature`, use `signature` instead.

* Fix: Rectify the behavior of `qargs`
- Keep insertion order by inserting all qargs into a `PySet`.
- Perform conversion to `PyTuple` at insertion time leveraging the iterator architecture.
- Remove python side counterpart to avoid double iteration.
- Make rust-native `qargs` return an iterator.

* Fix: Corrections from Matthew's review
- Use `format!` for repr method in `InstructionProperties`
- Rename `Variable` variant of `TargetInstruction` to `Variadic`.
- Remove `internal_name` attribute from `TargetOperation`.
- Remove `VariableOperation` class.
- Use `u32` for `granularity`, `pulse_alignment`, and `acquire_alignment`.
- Use `Option` to store nullable `concurrent_measurements`.
- Use `&str` instead of `String` for most function arguments.
- Use `swap_remove` to deallocate items from the provided `properties` map in `add_instruction`.
- Avoid cloning instructions, use `to_object()` instead.
- Avoid using `.to_owned()`, use `.clone()` instead.
- Remove mention of `RandomState`, use `ahash::HashSet` instead.
- Move parameter check to python in `instruction_supported`.
- Avoid exposing private attributes, use the available ones instead.
- Filter out `Variadic` Instructions as they're not supported in rust.
- Use peekable iterator to peek at the next qargs in `generate_non_global_op_names`.
- Rename `qarg_set` to `deduplicated_qargs` in `generate_non_global_op_names`.
- Return iterator instances instead of allocated `Vec`.
- Add `python_compare` and `python_is_instance` to perform object comparison with objects that satisfy the `ToPyObject` trait.
- Other small tweaks and fixes.

* Implement a nullable dict-like structure for IndexMap (#2)

* Initial: Implement a nullable dict-like structure for IndexMap

* Fix: Erroneous item extraction from Python
- Fix error that caused `None` values to be ignored from `None` keys.
- Removed mutability from rust function argument in `add_instruction`.
   - Object is mutably referenced after option unwrapping.
- Add missing header in `nullable_index_map.rs`.
- Add Clone as a `K` and/or `V` constraint in some of the iterators.
- Remove `IntoPy` constraint from `NullableIndexMap<K, V>`.
- Add `ToPyObject` trait to `NullableIndexMap<K, V>`.

* Fix: inplace modification of Python dict.
- Perform `None` extraction from rust.
- Revert changes to `Target.py`

* Fix: Avoid double iteration by using filter_map.

* Docs: Add inline comments.

* Fix: More specific error message in `NullableIndexMap`

* Fix: Use `Mapping` as the metaclass for `Target`
- Minor corrections from Matthew's review.

* Fix: Make `Target` crate-private.
- Due to the private nature of `NullableIndexMap`, the `Target` has to be made crate private.
- Add temporary `allow(dead_code)` flag for the unused `Target` and `NullableIndexMap` methods.
- Fix docstring of `Target` struct.
- Fix docstring of `add_instruction`.
- Make several python-only operations public so they can be used with other `PyClass` instances as long as they own the gil.
- Modify `py_instruction_supported` to accept bound objects.
- Use rust-native functions for some of the instance properties.
- Rewrite `instruction` to return parameters as slice.
- `operation_names` returns an `ExactSizeIterator`.
- All rust-native methods that return an `OperationType` object, will return a `NormalOperation` instance which includes the `OperationType` and the parameters.

* Fix: Comments from Matthew's review
- Mention duplication in docstring for rust Target.
- Use f"{*:g}" to avoid printing the floating point for 0 in `Target`'s repr method.
- Add note mentioning future unit-tests in rust.
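
For reference, a short sketch of how Python's `g` presentation type behaves on the kinds of values a repr might print. The helper name and the sample values are illustrative assumptions, not taken from `Target`.

```python
def format_value(value: float) -> str:
    """Format a float with the general ('g') presentation type."""
    return f"{value:g}"


# Hypothetical error/duration values, including an exact zero.
examples = {v: format_value(v) for v in (0.0, 1.5e-07, 0.00012, 35.5)}

for value, text in examples.items():
    print(f"{value!r:>10} -> {text}")
```

The `g` type drops insignificant trailing zeros and the dangling decimal point, so an exact zero prints as `0` rather than `0.0`.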

* Fix: Adapt to #12730
raynelfss added a commit to raynelfss/qiskit that referenced this pull request Jul 25, 2024
Procatv pushed a commit to Procatv/qiskit-terra-catherines that referenced this pull request Aug 1, 2024
* Rebalance `CircuitInstruction` and `PackedInstruction`

This is a large overhaul of how circuit instructions are both stored in
Rust (`PackedInstruction`) and how they are presented to Python
(`CircuitInstruction`).  In summary:

* The old `OperationType` enum is now collapsed into a manually managed
  `PackedOperation`.  This is logically equivalent, but stores a
  `PyGate`/`PyInstruction`/`PyOperation` indirectly through a boxed
  pointer, and stores a `StandardGate` inline.  As we expect the vast
  majority of gates to be standard, this hugely reduces the memory
  usage.  The enumeration is manually compressed to a single pointer,
  hiding the discriminant in the low, alignment-required bits of the
  pointer.

* `PackedOperation::view()` unpacks the operation into a proper
  reference-like enumeration `OperationRef<'a>`, which implements
  `Operation` (though there is also a `try_standard_gate` method to get
  the gate without unpacking the whole enumeration).

* Both `PackedInstruction` and `CircuitInstruction` use this
  `PackedOperation` as the operation storage.

* `PackedInstruction` is now completely the Rust-space format for data,
  and `CircuitInstruction` is purely for communication with Python.

On my machine, this commit brings the utility-scale benchmarks to within
10% of the runtime of 1.1.0 (and some to parity), despite all the
additional overhead.
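
As a rough illustration of the discriminant-packing idea from the first bullet, the sketch below uses plain Python integers to stand in for a pointer-sized word. The tag values, the 2-bit mask and the helper names are assumptions made for illustration; they are not the actual `PackedOperation` layout.

```python
# Hypothetical tags; the real discriminant values in `PackedOperation` may differ.
TAG_STANDARD_GATE = 0
TAG_PY_GATE = 1
TAG_PY_INSTRUCTION = 2
TAG_PY_OPERATION = 3

TAG_BITS = 2  # assumes every boxed pointer is at least 4-byte aligned
TAG_MASK = (1 << TAG_BITS) - 1


def pack_pointer(address: int, tag: int) -> int:
    """Hide `tag` in the low bits of an aligned address."""
    if address & TAG_MASK:
        raise ValueError("address is not sufficiently aligned")
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag does not fit in the alignment bits")
    return address | tag


def pack_standard_gate(gate_id: int) -> int:
    """Store a small gate identifier inline, above the tag bits."""
    return (gate_id << TAG_BITS) | TAG_STANDARD_GATE


def unpack(word: int) -> tuple[int, int]:
    """Return `(tag, payload)`; the payload is an address or an inline gate id."""
    tag = word & TAG_MASK
    if tag == TAG_STANDARD_GATE:
        return tag, word >> TAG_BITS
    return tag, word & ~TAG_MASK


boxed = pack_pointer(0x7F00_1234_5678, TAG_PY_GATE)
inline = pack_standard_gate(17)
```

Because an aligned address always has zeros in its lowest bits, those bits can carry the tag without widening the word, and the standard-gate case needs no allocation at all.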

Changes to accepting and building Python objects
------------------------------------------------

* A `PackedInstruction` is created by copy constructor from a
  `CircuitInstruction` by `CircuitData::pack`.  There is no `pack_owned`
  (really, there never was - the previous method didn't take ownership)
  because there's never owned `CircuitInstruction`s coming in; they're
  Python-space interop, so we never own them (unless we clone them)
  other than when we're unpacking them.

* `PackedInstruction` is currently just created manually when not coming
  from a `CircuitInstruction`.  It's not hard, and makes it easier to
  re-use known intern indices than to waste time re-interning them.
  There is no need to go via `CircuitInstruction`.

* `CircuitInstruction` now has two separated Python-space constructors:
  the old one, which is the default and takes `(operation, qubits,
  clbits)` (and extracts the information), and a new fast-path
  `from_standard` which asks only for the standard gate, qubits and
  params, avoiding operator construction.

* To accept a Python-space operation, extract a Python object to
  `OperationFromPython`.  This extracts the components that are separate
  in Rust space, but joined in Python space (the operation, params and
  extra attributes).  This replaces `OperationInput` and
  `OperationTypeConstruct`, being more efficient at the extraction,
  including providing the data in the formats needed for
  `PackedInstruction` or `CircuitInstruction`.

* To retrieve the Python-space operation, use
  `CircuitInstruction::get_operation` or
  `PackedInstruction::unpack_py_op` as appropriate.  Both will
  cache and reuse the op, if `cache_pygates` is active.  (Though note
  that if the op is created by `CircuitInstruction`, it will not
  propagate back to a `PackedInstruction`.)

Avoiding operation creation
---------------------------

The `_raw_op` field of `CircuitInstruction` is gone, because `PyGate`,
`PyInstruction` and `PyOperation` are no longer pyclasses and no longer
exposed to Python.  Instead, we avoid operation creation by:

* having an internal `DAGNode::_to_circuit_instruction`, which returns a
  copy of the internal `CircuitInstruction`, which can then be used with
  `CircuitInstruction.replace`, etc.

* having `CircuitInstruction::is_standard_gate` to query from Python
  space if we should bother to create the operator.

* changing `CircuitData::map_ops` to `map_nonstandard_ops`, and having
  it only call the Python callback function if the operation is not an
  unconditional standard gate.

Memory usage
------------

Given the very simple example construction script:

```python
from qiskit.circuit import QuantumCircuit

qc = QuantumCircuit(1_000)
for _ in range(3_000):
    for q in qc.qubits:
        qc.rz(0.0, q)
    for q in qc.qubits:
        qc.rx(0.0, q)
    for q in qc.qubits:
        qc.rz(0.0, q)
    for a, b in zip(qc.qubits[:-1], qc.qubits[1:]):
        qc.cx(a, b)
```

This uses 1.5GB in max resident set size on my Macbook (note that it's
about 12 million gates) on both 1.1.0 and with this commit, so we've
undone our memory losses.  The parent of this commit uses 2GB.

However, we're in a strong position to beat 1.1.0 in the future now;
there are two obvious large remaining costs:

* There are 16 bytes per `PackedInstruction` for the Python-operation
  caching (worth about 180MB in this benchmark, since no Python
  operations are actually created).

* There is also significant memory wastage in the current
  `SmallVec<[Param; 3]>` storage of the parameters; for all standard
  gates, we know statically how many parameters are / should be stored,
  and we never need to increase the capacity.  Further, the `Param` enum
  is 16 bytes wide per parameter, of which nearly 8 bytes is padding,
  but for all our current use cases, we only care if _all_ the
  parameters are floats (for everything else, we're going to have to
  defer to Python).  We could move the discriminant out to the level of
  the parameters structure, and save a large amount of padding.
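
The gate count and the cache overhead quoted above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the 16-byte figure is the one stated in the text, and the variable names are illustrative.

```python
NUM_QUBITS = 1_000
NUM_LAYERS = 3_000

# Each layer applies rz, rx, rz to every qubit, then cx to each adjacent pair.
single_qubit_gates = 3 * NUM_QUBITS
two_qubit_gates = NUM_QUBITS - 1
total_gates = NUM_LAYERS * (single_qubit_gates + two_qubit_gates)

CACHE_BYTES_PER_INSTRUCTION = 16  # figure quoted in the text above
cache_bytes = total_gates * CACHE_BYTES_PER_INSTRUCTION

print(f"{total_gates:,} gates")
print(f"{cache_bytes / 2**20:.0f} MiB of cache slots")
```

This gives just under 12 million gates and roughly 183 MiB of cache slots, in line with the "about 12 million gates" and "about 180MB" figures.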

Further work
------------

There's still performance left on the table here:

* We still copy-in and copy-out of `CircuitInstruction` too much right
  now; we might want to make all the `CircuitInstruction` fields
  nullable and have `CircuitData::append` take them by _move_ rather
  than by copy.

* The qubits/clbits interner requires owned arrays going in, but most
  interning should return an existing entry.  We probably want to switch
  to have the interner take references/iterators by default, and clone
  when necessary.  We could have a small circuit optimisation where the
  intern contexts reserve the first n entries to use for an all-to-all
  connectivity interning for up to (say) 8 qubits, since the transpiler
  will want to create a lot of ephemeral small circuits.

* The `Param` vectors are too heavy at the moment; `SmallVec<[Param;
  3]>` is 56 bytes wide, despite the vast majority of gates we care
  about having at most one single float (8 bytes).  Dead padding is a
  large chunk of the memory use currently.

* Fix clippy in no-gate-cache mode

* Fix pylint unused-import complaints

* Fix broken assumptions around the gate model

The `compose` test had a now-broken assumption, because the Python-space
`is` check is no longer expected to return an identical object when a
standard gate is moved from one circuit to another and has its
components remapped as part of the `compose` operation.  This doesn't
constitute the unpleasant deep-copy that that test is preventing. A
custom gate still satisfies that, however, so we can just change the
test.

`DAGNode::set_name` could cause problems if it was called for the first
time on a `CircuitInstruction` that was for a standard gate; these would
be created as immutable instances.  Given the changes in operator
extraction to Rust space, it can now be the case that a standard gate
that comes in as mutable is unpacked into Rust space, the cache is some
time later invalidated, and then the operation is recreated immutably.

* Fix lint

* Fix minor documentation
Procatv pushed a commit to Procatv/qiskit-terra-catherines that referenced this pull request Aug 1, 2024
…iskit#12292)

* Initial: Add `Target` class to `_accelerate`
- Add `Target` class to test mobility between Rust and Python.
- Add `add_instruction` method to test compatibility with instructions.

* Fix: Remove empty property check
- Property check caused most cases to panic.
- Will be commented out and restored at a later time.

* Add: Instructions property
- Instructions property returns all added to the target.
- Similar behavior to source.

* Chore: comments and deprecated methods
- Add comments to instruction property.
- Use new_bound for new PyDicts.

* Chore: Remove redundant code
- Remove redundant transformation of PyObject to PyTuple.
- Remove debugging print statement.

* Add: `InstructionProperties` class and type checkers
- Add `InstructionProperties` class to process properties in rust.
- Add `is_instance` and `is_class` to identify certain Python objects.
- Modify logic of `add_instruction` to use class check.
- Other tweaks and fixes.

* Add: Setter and Getter for calibration in `InstructionProperty`

* Add: `update_instruction_properties` to Target.

* Add: Update_from_instruction_schedule_map
- Partial addition from Target.py
- Introduction of hashable qarg data structure.
- Other tweaks and fixes.

* Add: Complete `update_from_instruction_schedule_map`
- Complete missing procedures in function.
- Rename `Qargs` to `HashableVec`.
- Make `HashableVec` generic.
- Separate `import_from_module_call` into call0 and call1.
- Other tweaks and fixes.

* Add: instruction_schedule_map property.
- Remove stray print statements.
- Other tweaks and fixes.

* Fix: Key issue in `update_from_instruction_schedule_map`
- Remove all unsafe unwraps

* Fix: Use PyResult Value for void function
- Update `update_from_instruction_schedule_map` to use PyResult and '?' operator.
- Use Bound Python objects whenever possible.
- Other tweaks and fixes.

* Add: Python wrapping for Target
- Add temporary _target module for testing.
- Remove update_from_instruction_schedule_map function back to python.
- Add python properties for all public attributes in rust
- Other tweaks and fixes.

* Add: `qargs` property
- Add identical method `qargs` to obtain the qargs of a target.
- Other tweaks and fixes.

* Add: `qargs_for_operation_name` function.
- Add function with identical behavior to the original in Target.
- Other tweaks and fixes.

* Add: durations method for Target
- Add target module to qiskit init file.
- Remove is_instance method.
- Modify set_calibration method in InstructionProperty to leave typechecking to Python.
- Change rust Target alias to Target2.
- Other tweaks and fixes.

* Add: InstructionProperties wrapper in python

* Fix: InstructionProperties could not receive calibrations
- Fix wrong setters/getters for calibration in InstructionProperty object in rust.

* Add: more methods to Target in `target.rs`
- Add FromPyObject trait to Hashable vec to receive Tuples and transform them directly into this type.
- Add operations_for_qargs for Target class in Rust side and Python.
- Fix return dict keys for `qargs_for_operation_name`.
- Add `timing_constraints` and `operation_from_name` to Python side.
- Other tweaks and fixes.

* Fix: missing return value in `operations_for_args`
- Fix wrong name for function operation_for_qargs.
- Fix missing return value in the python side.
- Other tweaks and fixes.

* Fix: Bad compatibility with InstructionProperties
- Make `InstructionProperties` "_calibration" attribute visible.
- Removed attribute "calibration", treat as class property.
- Other tweaks and fixes

* Add: `operation_names_for_qargs` to Target
- Port class method to rust and connect to Python wrapper.
- Other tweaks and fixes.

* Add: instruction_supported method to rust and python:
- Other tweaks and fixes.

* Add: changes to add_instruction function to increase functionality.
- These changes break current functionality of other functions, but emulate intended behavior better.
- Fixes coming soon.

* Fix: Backwards compatibility with `add_instruction`
- Fixed wrong additions to HashMaps in the rust side causing instructions to be missing.
- Other tweaks and fixes.

* Fix: Gate Map behavior didn't match Qiskit#11422
- Make GateMap use optional values to match behavior of Qiskit#11422.
- Define GateMapType for complex type in self.gate_map.
- Throw Python KeyError exceptions from the rust side in `update_instruction_properties` and other functions.
- Modify logic in subsequent functions that use gate_map optional values.
- Other tweaks and fixes.

* Add: `has_calibration` method to Target

* Add: `get_calibration` method to Target

* Add: `instruction_properties` method to Target

* Add: `build_coupling_map` and helper methods
- `build_coupling_map` will remain in Python for now, along with its helper functions.
- Make `gate_name_map` visible to python.
- Add `coupling_graph` attribute to Target in Rust.
- Other tweaks and fixes.

* Add: `get_non_global_operation_names` to Target.
- Add attributes `non_global_strict_basis` and `non_global_basis` as Optional.
- Other tweaks and fixes.

* Add: Missing properties
- Add properties: operations, operation_names, and physical_qubits.
- Reorganize properties placement.
- Other tweaks and fixes.

* Add: `from_configuration` classmethod to Target.
- Add method that mimics the behavior of the python method.
- Change concurrent_measurements to 2d Vec instead of a Vec of sets.
- Other tweaks and fixes.

* Add: Magic methods to Rust and Python
- Add docstring to __init__.
- Add __iter__, __getitem__, __len__, __contains__, keys, values, and items methods to rust.
- Add equivalent methods to python + the __str__ method.
- Make description an optional attribute in rust.
- Other tweaks and fixes.

* Fix: Bugs when fetching qargs or operations
- Fix qarg_for_operation_name logic to account for None and throw correct exceptions.
- Stringify description before sending in case of numerical descriptors.
- Fix qarg to account for None entry.
- Other tweaks and fixes.

* Chore: Prepare for Draft PR
- Remove _target.py testing file.
- Fix incorrect initialization of calibration in InstructionProperties.
- Other tweaks and fixes.

* Fix: target not being recognized as a module
- Add target to the pyext crate.
- Change placement of target import for alphabetical ordering.
- Other tweaks and fixes.

* Fix: Change HashMap to IndexMap
- Change from f32 to f64 precision.
- Other tweaks and fixes.

* Fix: Move InstructionProperties fully to Rust
- Move InstructionProperties to rust.
- Modify gate_map to accept an InstructionProperties object instead of PyObject.
- Change update_instruction_properties to use Option InstructionProperties.
- Remove InstructionProperties from target.py
- Other tweaks and fixes.

* Fix: Make Target inherit from Rust
- Make Target inherit from the rust-side Target by using subclass attribute, then extending its functionality using python.
- Switch from __init__ to __new__ to adapt to the Target rust class.
- Modify every trait that worked with `target._Target` to use `super()` or `self` instead.
- Fix repr in InstructionProperties to not show `Some()` when a value exists.
- Fix `__str__` method in `Target` to not display "None" if no description is given.
- Assume `num_qubits` is the first argument when an integer is provided as a first argument and nothing else is provided for second (Target initializer).
- Return a set in operation_names instead of a Vec.
- Other tweaks and fixes.

* Fix: Recognize None in `operation_for_qargs`.
- Fix module labels for each class in target.rs.
- Use py.is_instance instead of passing isinstance to `instruction_supported`.
- Modify `operations_for_qargs` to accept optional values less aggressively. Allow it to find instructions with no qargs. (NoneType).
- Other tweaks and fixes.

* Fix: Make InstructionProperties subclassable.
- Fix get_non_global_operation_names to accept optional values and fix search set to use sorted values.
- Fix __repr__ method in InstructionProperties to add punctuation.
- Fix typo in python durations method.
- Modify test to overload __new__ method instead of just __init__ (Possible breaking change).
- Other tweaks and fixes.

* Fix: errors in `instruction_properties` and others:
- Allow `instruction_properties` method to view optional properties.
- Allow `operation_names_for_qargs` to select class instructions when None is passed as a qarg.
- Modify __str__ method to display error and duration times as int if the value is 0.
- Other tweaks and fixes.

* Fix: call `isclass` from rust, instead of passing it from Python.

* Fix: Move `update_from_instruction_schedule_map` to rust.

* Fix: Move `durations` to rust.

* Fix: Move `timing_constraints` to rust

* Fix: Move operations_from_name fully to rust

* Fix: `instruction_supported` method:
- Rewrite the logic of instruction_supported due to previous errors in the method.
- Move `check_obj_params` to Rust.
- Other tweaks and fixes.

* Fix: errors in `from_configuration` class method.
- Fix some of the logic when retrieving gates from `name_mapping`.
- Remove function arguments in favor of implementing counterpart functions in rust.
- Add qubit_props_list_from_props function and return rust datatypes.
- Fix wrong error handling procedures when retrieving attributes from backend_property.
- Other tweaks and fixes.

* Fix: Import `InstructionScheduleMap` directly instead of passing.
- `instruction_schedule_map()` now imports the classtype directly from rust instead of needing it to be passed from python.
- Remove unused imports in `target.py`.
- Ignore unused arguments in `test_extra_props_str`.
- Other tweaks and fixes.

* Docs: Add docstrings to rust functions
- Remove redundant redefinitions in python.
- Fix text_signatures for some rust functions.
- Added lint exceptions to some necessary imports and function arguments.
- Other tweaks and fixes.

* Add: Make `Target` and `InstructionProperties` pickleable.
- Add `__getstate__` and `__setstate__` methods to make both rust subclasses pickleable.

* Fix: Wrong calibration assignment in __setstate__
- Use set_calibration to set the correct calibration argument.
- Fix wrong signature in get_non_global_operation_names.
- Other tweaks and fixes.

* Refactor: HashableVec is now Qarg
- Use `PhysicalQubit` instead of u32 for qargs.
- Use a `SmallVec` of size 4 instead of a dynamic Vec.
- Default to using the `Hash()` method embedded in `SmallVec`.
- Add a Default method to easily unwrap Qarg objects.
- Other tweaks and fixes.

* Add: `get` function to target.
- Remove some redundant cloning in code.
- Other small fixes.

* Fix: Remove unnecessary Optional values in gate_map.
- Update every gate_map call to use the new format.
- Other small tweaks and fixes.

* Refactor: `calibration` is for `InstructionProperties`
- Use python `None` instead of option to store `calibration` in `InstructionProperties`.
- Adapt code to these changes.
- Remove redundant implementation of Hash in Qargs.
- Other tweaks and fixes.

* Fix: Temporary speedup for `gate_map` access
- Added temporary speedups to access the gate_map by returning the values as PyObjects.
- Convert qargs to rust tuples instead of initializing a `PyTuple`.
- Store `InstructionProperties` as a python ref in gate_map. (Will be changed in future updates).
- Other tweaks and fixes.

* Fix: Incorrect extractions for `InstructionProperties`
- Fix incorrect conversion of `InstructionProperties` to `Py<InstructionProperties>`
- Fix incorrect extraction of qargs in `update_from_instruction_schedule_map`

* Fix: Hide all private attributes in `Target`
- Hide all private attributes of the `Target` to prevent unnecessary cloning.
- Other small tweaks and fixes.

* Add: New representation of gate_map using new pyclasses:
- Make Qarg a sequence pyclass.
- Make QargPropsMap the new representation of a GateMap value.
- Adapt the code to new structure.
- TODO: Add missing magic methods for sequence and mapping objects.
- Other small tweaks and fixes.

* Add: Use custom datatypes to return values to Python.
- Add QargSet datatype to return a set of qargs.
   - Works as return type for `Target.qargs`
   - Object has iterator type QargSetIter.
- Rename QargPropMap to PropsMap
   - Use iterator type IterPropsMap
- Other small tweaks and fixes.

* Fix: Extend `InstructionProperties` to be subclassable using `__init__`:
- Made a subclass of `InstructionProperties` that can be extended using an `__init__` method.
- Revert previous changes to `test_target.py`.
- Other tweaks and fixes.

* Refactor: Split target into its own module
- Restructure the files to improve readability of code.
   - `instruction_properties.rs` contains the `InstructionProperties` class.
   - `mod.rs` contains the `Target` class.
   - `qargs.rs` contains the Qargs struct to store quantum arguments.
   - `property_map` contains the Qarg: Property Mapping that will be stored in the gate_map.
- Add missing methods to PropsMap:
   - Add `PropsMapKeys` object to store the qargs as a set.
   - Add methods to compare and access `PropsMapKey`.
- Add QargsOrTuple enum in Qargs to parse Qargs instantly.

* Fix: Rest of failing tests in Target
- Modify the `InstructionProperties` python wrapper.
   - InstructionProperties was not assigning properties to rust side.
- Make duration in `InstructionProperties` settable.
- Add `__eq__` method for `PropMap` to compare with other dicts.
- `PropMapKeys` can only be compared with a Set.
- Remove `qargs_for_operation_name` from `target.py`
- Other small tweaks and fixes.

* Add: New GateMap Structure
- GateMap is now its own mapping object.
- Add `__setstate__` and `__getstate__` methods for `PropMap` and `GateMap`.
- Other small tweaks and fixes.

* Fix: Make new subclasses pickleable
- Add module location to `PropsMap`, `GateMap`, and `Qargs`.
- Added default method to PropMap.
- Made default method part of class initializers.
- Other small tweaks and fixes.

* Fix: Remove redundant lookup in Target (Qiskit#12373)

* Format: `mod.rs` qubit_comparison to one line.

* Add: `GateMapKeys` object in GateMap:
- Use IndexSet as a base to preserve the insertion order.
- Other tweaks and fixes.

* Add: __sub__ method to GateMapKeys

* Fix: Modify `GateMap` to store values in Python heap.
- Fix `GateMap.__iter__` to use an IndexKeys iterator.
- Other small tweaks and fixes.

* Fix: Remove duplicate import of `IndexSet::into_iter` in `GateMap`.
- Make `__iter__` use the keys() method in `GateMap`.

* Fix: Adapt to target changes (Qiskit#12288)
- Fix lint stray imports.

* Fix: Incorrect creation of parameters in `update_from_instruction_schedule_map`
- Add `tupelize` function to create tuples from non-downcastable items.
- Fix creation of Parameters by iterating through members of tuple object and mapping them to parameters in `update_from_instruction_schedule_map`.
- Add missing logic for creating a Target with/without `qubit_properties`.
- Add tuple conversion of `Qargs` to store items in a dict in `BasisTranslator` and `UnitarySynthesis` passes.
- Cast `PropsMap` object to dict when comparing in `test_fake_backends.py`.
- Modify logic of helper functions that receive a bound object reference, a second `py` not required as an argument.
- Add set operation methods to `GateMapKeys`.
- Other tweaks and fixes.

* Fix: More failing tests
- Fix repeated erroneous calls to `add_instruction` in `update_from_instruction_schedule_map`
- Add missing condition in `instruction_supported`
- Use `IndexSet` instead of `HashSet` for `QargsSet`.
- Other small tweaks and fixes.

* Add: Macro rules for qargs and other sequences.
- Create `QargSet` and `PropsMap` using the new macros.
- Return a `TargetOpNames` ordered set to python in `operation_names`.
- Remove the Python side `operation_names`.
- Fix faulty docstring in `target.py`.
- Other tweaks and fixes.

* Docs: Add necessary docstrings to all new rust functions.
- Remove duplicate Iterator in GateMap.
- Other small tweaks and fixes.

* Fix: Use `GILOneCell` and remove `Qargs`
- Use `GILOneCell` to import python modules only once at initialization.
- Remove the custom data structure `Qargs` to avoid conversion overhead.
- `Qargs` does not use `PhysicalQubits`, `u32` is used instead.
- Fix `__setstate__` and `__getstate__` methods for `PropsMap`, `GateMap`, and `key_like_set_iterator` macro_rule.
- Update code to use the new structures.
- TODO: Fix broken tests.

* Fix: Cast `Qargs` to `Tuple` in specific situations
- Use tupleize to cast `Qargs` to `Tuple` in `instructions`.
- Use downcast to extract string in `add_instruction`.
- Other tweaks and fixes.

* Add: Make `Target` Representable in Rust
- Rename `InstructionProperties` as `BaseInstructionProperties`.
   - Remove `Calibration` from the rust space.
- Restore `gate_map`, `coupling_map`, `instruction_schedule_map`, and `instruction_durations` to rust.
- Remove all unnecessary data structures from rust space.
- Other tweaks and fixes.

* Refactor: Remove previous changes to unrelated files.

* Add: rust native functions to target
- Added rust native functionality to target such that a `py` would not be needed to use one.
- Add Index trait to make `Target` subscriptable.
- Other small tweaks and fixes.

* Fix: Remove all unnecessary python method calls.
- Remove usage of `inspect.isclass`.
- Rename `Target` to `BaseTarget` in the rust side.
- Rename `err.rs` to `errors.rs`.
- Remove rust-native `add_inst` and `update_inst` as Target should not be modified from Rust.
- Made `add_instruction` and `update_instruction_properties` private in `BaseTarget`.
- Add missing `get` method in `Target`.
- Other tweaks and fixes

* Format: Fix lint

* Fix: Wrong return value for `BaseTarget.qargs`

* Add: Temporary Instruction representation in rust.
- Add temporary instruction representation to avoid repeated extraction from python.

* Add: Native representation of coupling graph

* Fix: Wrong attribute extraction for `GateRep`

* Remove: `CouplingGraph` rust native representation.
- Move to different PR.

* Format: Remove stray whitespace

* Add: `get_non_global_op_names` as a rust native function

* Fix: Use Ahash for Hashing
- Use ahash for hashing when possible.
- Rename `BaseTarget` to `Target` in rust only.
- Rename `BaseInstructionProperties` to `InstructionProperties` in rust only.
- Remove optional logic from `generate_non_global_op_names`.
- Use dict for `__setstate__` and `__getstate__` in `Target`.
- Reduced the docstring for `Target` and `InstructionProperties`.
- Other small tweaks and fixes.

* Format: new changes to `lib.rs`

* Format: Adapt to new lint rules

* Fix: Use new gates infrastructure (Qiskit#12459)
- Create custom enum to collect either a `NormalOperation` or a `VariableOperation` depending on what is needed.
- Add a rust native `is_instruction_supported` method to check whether a Target supports a certain instruction.
- Make conversion methods from `circuit_instruction.rs` public.
- Add comparison methods for `Param` in `operations.rs`
- Remove need for the `isclass` method in the Rust-side `add_instruction`.
- Other tweaks and fixes.

* Format: Fix rust formatting

* Add: rust-native method to obtain Operation objects.

* Add: Comparison methods for `Param`

* Fix: Add display methods for `Params`

* Format: Fix lint test

* Format: Wrong merge conflict solve

* Fix: Improve rust methods to use iterators.
- Adapt the Python methods to leverage the rust native improvements.
- Use python native structures for the Python methods.

* Format: Remove extra blankspace

* Fix: Remove `text_signature`, use `signature` instead.

* Fix: Rectify the behavior of `qargs`
- Keep insertion order by inserting all qargs into a `PySet`.
- Perform conversion to `PyTuple` at insertion time leveraging the iterator architecture.
- Remove python side counterpart to avoid double iteration.
- Make rust-native `qargs` return an iterator.

* Fix: Corrections from Matthew's review
- Use `format!` for repr method in `InstructionProperties`
- Rename `Variable` variant of `TargetInstruction` to `Variadic`.
- Remove `internal_name` attribute from `TargetOperation`.
- Remove `VariableOperation` class.
- Use `u32` for `granularity`, `pulse_alignment`, and `acquire_alignment`.
- Use `Option` to store nullable `concurrent_measurements`.
- Use `&str` instead of `String` for most function arguments.
- Use `swap_remove` to deallocate items from the provided `properties` map in `add_instruction`.
- Avoid cloning instructions, use `to_object()` instead.
- Avoid using `.to_owned()`, use `.clone()` instead.
- Remove mention of `RandomState`, use `ahash::HashSet` instead.
- Move parameter check to python in `instruction_supported`.
- Avoid exposing private attributes, use the available ones instead.
- Filter out `Variadic` Instructions as they're not supported in rust.
- Use peekable iterator to peek at the next qargs in `generate_non_global_op_names`.
- Rename `qarg_set` to `deduplicated_qargs` in `generate_non_global_op_names`.
- Return iterator instances instead of allocated `Vec`.
- Add `python_compare` and `python_is_instance` to perform object comparison with objects that satisfy the `ToPyObject` trait.
- Other small tweaks and fixes.

* Implement a nullable dict-like structure for IndexMap (Qiskit#2)

* Initial: Implement a nullable dict-like structure for IndexMap

* Fix: Erroneous item extraction from Python
- Fix error that caused `None` values to be ignored from `None` keys.
- Removed mutability from rust function argument in `add_instruction`.
   - Object is mutably referenced after option unwrapping.
- Add missing header in `nullable_index_map.rs`.
- Add Clone as a `K` and/or `V` constraint in some of the iterators.
- Remove `IntoPy` constraint from `NullableIndexMap<K, V>`.
- Add `ToPyObject` trait to `NullableIndexMap<K, V>`.
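
A minimal sketch of the idea behind a nullable dict-like structure, assuming a plain `std::collections::HashMap` in place of `IndexMap` (so insertion order is not preserved here). The names `NullableMap`, `map`, and `null_val` are hypothetical and are not taken from `nullable_index_map.rs`. Regular keys go into the inner map, and the value for the `None` key is kept in a separate slot, so a `None` value stored under the `None` key is still recorded as present.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical sketch: a map whose key may be `None`.
/// Regular keys live in `map`; the value for the `None` key lives in `null_val`.
#[derive(Debug)]
pub struct NullableMap<K, V> {
    map: HashMap<K, V>,
    null_val: Option<V>,
}

impl<K: Hash + Eq, V> NullableMap<K, V> {
    pub fn new() -> Self {
        Self {
            map: HashMap::new(),
            null_val: None,
        }
    }

    /// Insert under `Some(key)` or under the `None` key.
    /// Returns the value previously stored under that key, if any.
    pub fn insert(&mut self, key: Option<K>, value: V) -> Option<V> {
        match key {
            Some(k) => self.map.insert(k, value),
            None => self.null_val.replace(value),
        }
    }

    /// Look up either a regular key or the `None` key.
    pub fn get(&self, key: Option<&K>) -> Option<&V> {
        match key {
            Some(k) => self.map.get(k),
            None => self.null_val.as_ref(),
        }
    }

    pub fn contains_key(&self, key: Option<&K>) -> bool {
        self.get(key).is_some()
    }

    /// Number of entries, counting the `None` key when it is occupied.
    pub fn len(&self) -> usize {
        self.map.len() + usize::from(self.null_val.is_some())
    }

    pub fn is_empty(&self) -> bool {
        self.len() == 0
    }
}
```

With `V = Option<f64>`, calling `insert(None, None)` leaves `contains_key(None)` returning `true`, which is the kind of case the fix above describes.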

* Fix: inplace modification of Python dict.
- Perform `None` extraction from rust.
- Revert changes to `Target.py`

* Fix: Avoid double iteration by using filter_map.
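
As a generic illustration of this pattern (not the code changed in the commit), the sketch below uses a hypothetical helper `names_with_props` to select and transform entries in a single pass. `filter_map` replaces a `.filter(..)` followed by a `.map(..)`.

```rust
/// Hypothetical helper: collect the names of entries that carry a value,
/// visiting each entry exactly once.
pub fn names_with_props<'a>(entries: &[(&'a str, Option<f64>)]) -> Vec<&'a str> {
    entries
        .iter()
        // `filter_map` drops entries whose property is `None` and maps the
        // remaining ones to their name in the same step.
        .filter_map(|(name, prop)| prop.map(|_| *name))
        .collect()
}
```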

* Docs: Add inline comments.

* Fix: More specific error message in `NullableIndexMap`

* Fix: Use `Mapping` as the metaclass for `Target`
- Minor corrections from Matthew's review.

* Fix: Make `Target` crate-private.
- Due to the private nature of `NullableIndexMap`, the `Target` has to be made crate private.
- Add temporary `allow(dead_code)` flag for the unused `Target` and `NullableIndexMap` methods.
- Fix docstring of `Target` struct.
- Fix docstring of `add_instruction`.
- Make several python-only operations public so they can be used with other `PyClass` instances as long as they own the gil.
- Modify `py_instruction_supported` to accept bound objects.
- Use rust-native functions for some of the instance properties.
- Rewrite `instruction` to return parameters as slice.
- `operation_names` returns an `ExactSizeIterator`.
- All rust-native methods that return an `OperationType` object, will return a `NormalOperation` instance which includes the `OperationType` and the parameters.

* Fix: Comments from Matthew's review
- Mention duplication in docstring for rust Target.
- Use f"{*:g}" to avoid printing the floating point for 0 in `Target`'s repr method.
- Add note mentioning future unit-tests in rust.

* Fix: Adapt to Qiskit#12730
raynelfss added a commit to raynelfss/qiskit that referenced this pull request Aug 1, 2024
raynelfss added a commit to raynelfss/qiskit that referenced this pull request Aug 2, 2024
raynelfss added a commit to raynelfss/qiskit that referenced this pull request Aug 2, 2024
raynelfss added a commit to raynelfss/qiskit that referenced this pull request Aug 6, 2024
raynelfss added a commit to raynelfss/qiskit that referenced this pull request Aug 29, 2024
raynelfss added a commit to raynelfss/qiskit that referenced this pull request Aug 30, 2024
raynelfss added a commit to raynelfss/qiskit that referenced this pull request Sep 5, 2024
github-merge-queue bot pushed a commit that referenced this pull request Sep 25, 2024
* Initial: Add equivalence to `qiskit._accelerate.circuit`

* Add: `build_basis_graph` method

* Add: `EquivalencyLibrary` to `qiskit._accelerate.circuit`
- Add `get_entry` method to obtain an entry from binding to a `QuantumCircuit`.
- Add `rebind_equiv` to bind parameters to `QuantumCircuit`

* Add: PyDiGraph converter for `equivalence.rs`

* Add: Extend original equivalence with rust representation

* Fix: Correct circuit parameter extraction

* Add: Stable infrastructure for EquivalenceLibrary
- TODO: Make elements pickleable.

* Add: Default methods to equivalence data structures.

* Fix: Adapt to new Gate Structure

* Fix: Erroneous display of `Parameters`

* Format: Fix lint test

* Fix: Use EdgeReferences instead of edge_indices.
- Remove stray comment.
- Use match instead of matches!.

* Fix: Use StableDiGraph for more stable indexing.
- Remove required py argument for get_entry.
- Reformat `to_pygraph` to use `add_nodes_from` and `add_edges_from`.
- Other small fixes.

* Fix: Use `clone` instead of `to_owned`
- Use `clone_ref` for the PyObject Graph instance.

* Fix: Use `OperationTypeConstruct` instead of `CircuitInstruction`
- Use `convert_py_to_operation_type` to correctly extract Gate instances into rust operational datatypes.
- Add function `get_sources_from_circuit_rep` to not extract circuit data directly but only the necessary data.
- Modify failing test due to different mapping. (!!)
- Other tweaks and fixes.

* Fix: Elide implicit lifetime of PyRef

* Fix: Make `CircuitRep` attributes OnceCell-like.
- Attributes from CircuitRep are only written once, reducing the overhead.
- Modify `__setstate__` to avoid extra conversion.
- Remove `get_sources_from_circuit_rep`.

* Fix: Incorrect pickle attribute extraction

* Remove: Default initialization methods from custom datatypes.
- Use `__getnewargs__` instead.

* Remove: `__getstate__`, `__setstate__`, use `__getnewargs__` instead.

* Fix: Further improvements to pickling
- Use python structures to avoid extra conversions.
- Add rust native `EquivalenceLibrary.keys()` and have the python method use it.

* Fix: Use `PyList` and iterators when possible to skip extra conversion.
- Use a `py` token instead of `Python::with_gil()` for `rebind_params`.
- Other tweaks and fixes.

* Fix: incorrect list operation in `__getstate__`

* Fix: improvements on rust native methods
- Accept `Operations` and `[Param]` instead of the custom `GateOper` when calling from rust.
- Build custom `GateOper` inside of class.

* Remove: `add_equiv`, `set_entry` from rust-native methods.
- Add `node_index` Rust native method.
- Use python set comparison for `Param` check.

* Remove: Undo changes to Param
- Fix comparison methods for `Key`, `Equivalence`, `EdgeData` and `NodeData` to account for the removal of `PartialEq` for `Param`.

* Fix: Leverage usage of `CircuitData` for accessing the `QuantumCircuit` instructions in rust.
- Change implementation of `CircuitRef` to leverage the use of `CircuitData`.

* Add: `data()` method to avoid extracting `CircuitData`
- Add `py_clone` to perform shallow clones of a `CircuitRef` object by cloning the references to the `QuantumCircuit` object.
- Extract `num_qubits` and `num_clbits` for CircuitRep.
- Add wrapper over `add_equivalence` to be able to accept references and avoid unnecessary cloning of `GateRep` objects in `set_entry`.
- Remove stray mutability of `entry` in `set_entry`.

* Fix: Make `graph` attribute public.

* Fix: Make `NodeData` attributes public.

* Fix: Revert reference to `CircuitData`, extract instead.

* Add: Make `EquivalenceLibrary` graph weights optional.

* Fix: Adapt to #12730

* Fix: Use `IndexSet` and `IndexMap`

* Fix: Revert changes from previously failing test

* Fix: Adapt to #12974

* Fix: Use `EquivalenceLibrary.keys()` instead of `._key_to_node_index`

* Chore: update dependencies

* Refactor: Move `EquivalenceLibrary` to `_accelerate`.

* Fix: Erroneous `pymodule` function for `equivalence`.

* Fix: Update `EquivalenceLibrary` to store `CircuitData`.
- The equivalence library will now only store `CircuitData` instances as it does not need to call python directly to re-assign parameters.
- An `IntoPy<PyObject>` trait was adapted so that it can be automatically converted to a `QuantumCircuit` instance using `_from_circuit_data`.
- Updated all tests to use register-less qubits in circuit comparison.
- Remove `num_qubits` and `num_clbits` from `CircuitRep`.

* Fix: Make inner `CircuitData` instance public.

* Fix: Review comments and ownership issues.
- Add `from_operation` constructor for `Key`.
- Made `py_has_entry()` private, but kept its main method public.
- Made `set_entry` more rust friendly.
- Modify `add_equivalence` to accept a slice of `Param` and use `Into` to convert it into a `SmallVec` instance.
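
A rough sketch of the slice-plus-`Into` signature described in the last bullet, using `Vec<f64>` as a stand-in for `SmallVec` and `f64` as a stand-in for `Param` so that it needs only the standard library. The `Entry` type and this `add_equivalence` signature are hypothetical. The caller lends a slice, and the callee makes the owned copy at the point where it stores it.

```rust
/// Hypothetical stand-in for a stored equivalence entry.
/// `Vec<f64>` is used here in place of a `SmallVec` of `Param`.
#[derive(Debug, PartialEq)]
pub struct Entry {
    pub name: String,
    pub params: Vec<f64>,
}

/// Accept borrowed inputs and convert to owned storage only when storing.
pub fn add_equivalence(entries: &mut Vec<Entry>, name: &str, params: &[f64]) {
    entries.push(Entry {
        name: name.to_string(),
        // `Vec<T>: From<&[T]>` for `T: Clone`, so `.into()` copies the slice.
        params: params.into(),
    });
}
```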

* Fix: Use maximum possible integer value for Key in basis_translator.
- Add method to immutably borrow the `EquivalenceLibrary`'s graph.

* Fix: Use generated string, instead of large int
- Using large int as the key's number of qubits breaks compatibility with qpy, use a random string instead.

---------

Co-authored-by: John Lapeyre <[email protected]>
raynelfss added a commit to raynelfss/qiskit that referenced this pull request Sep 26, 2024
github-merge-queue bot pushed a commit that referenced this pull request Oct 7, 2024
…r` to rust. (#12811)

* Add: Basis search function
- Add rust counterpart for `basis_search`.
- Consolidated the `BasisSearchVisitor` into the function due to differences in rust behavior.

* Fix: Wrong return value for `basis_search`

* Fix: Remove `IndexMap` and duplicate declarations.

* Fix: Adapt to #12730

* Remove: unused imports

* Docs: Edit docstring for rust native `basis_search`

* Fix: Use owned Strings.
- Due to the nature of `hashbrown` we must use owned Strings instead of `&str`.
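
A small sketch of working with a set of owned strings, assuming the standard library `HashSet` rather than `hashbrown` directly. The helper `missing_from_basis` is hypothetical. Lookups can still take `&str`, because `String` implements `Borrow<str>`, so owning the keys does not force callers to allocate.

```rust
use std::collections::HashSet;

/// Hypothetical helper: return the requested gate names that are not in `basis`.
pub fn missing_from_basis<'a>(basis: &HashSet<String>, wanted: &[&'a str]) -> Vec<&'a str> {
    wanted
        .iter()
        .copied()
        // `HashSet<String>::contains` accepts a `&str` via `String: Borrow<str>`.
        .filter(|name| !basis.contains(*name))
        .collect()
}
```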

* Add: mutable graph view that the `BasisTranslator` can access in Rust.
- Remove import of `random` in `basis_translator`.

* Fix: Review comments
- Rename `EquivalenceLibrary`'s `mut_graph` method to `graph_mut` to keep consistent with rust naming conventions.
- Use `&HashSet<String>` instead of `HashSet<&str>` to avoid extra conversion.
- Use `u32::MAX` as num_qubits for dummy node.
- Use a `for` loop instead of `for_each` to add edges to dummy node.
- Add comment explaining usage of flatten in `initialize_num_gates_remain_for_rule`.
- Remove stale comments.

* Update crates/accelerate/src/basis/basis_translator/basis_search.rs

---------

Co-authored-by: Matthew Treinish <[email protected]>
ElePT pushed a commit to ElePT/qiskit that referenced this pull request Oct 9, 2024
Labels
Changelog: None (Do not include in changelog)
mod: circuit (Related to the core of the `QuantumCircuit` class or the circuit library)
performance
priority: high
Rust (This PR or issue is related to Rust code in the repository)
Development

Successfully merging this pull request may close these issues.

Rework heirarchy of CircuitInstruction and PackedInstruction
Optimize memory footprint of OperationType
6 participants