David McKay, Seth Merkel, Doug McClure, Neereja Sundaresan and Isaac Lauer
A non-trivial issue when building a quantum computer is trying to answer a simple question: “how well does it work?” As with regular computers, measuring a quantum computer’s performance boils down to running a set of problems where we know the expected outputs.
But the task doesn’t end there. Which problems should we run? How many? What does a wrong output mean about the likelihood of a wrong output in the future? These are complicated questions even for regular computers. However, in the quantum realm, the situation is even more difficult due to the complexities of superposition, entanglement and measurement. For example, due to the no-cloning theorem, we can’t determine the output of a quantum circuit from a single experimental instance; the experiment needs to be repeated exponentially more times as the number of qubits increases. Therefore, a number of quantum benchmarking strategies use the concept of random circuits: random programs of a similar type that, after enough trials, give an average “sense” of how well our devices work based on statistical measures.
These benchmarks operate at two scales: the qubit level and the overall device level. At the device level, there are several benchmarks, for example, the quantum volume [1, 2, 3, 5] (proposed by IBM) and the cross entropy [4]. These measures give a single number that is useful for getting a sense of overall device performance and improvements. However, these measures are not very predictive, i.e., users can’t use those numbers to predict the results of their own algorithms. That’s where the other scale of benchmarking comes in. Benchmarks at the qubit level tell us about one- and two-qubit gate performance; a gate is the fundamental operation that occurs in a quantum circuit to evolve the quantum state. Generally, one-qubit gates create superposition states of individual qubits and two-qubit gates generate entanglement. As quantum computers increase in complexity, new benchmarks will get added to this list to investigate operations such as reset, mid-circuit measurement and feed-forward, which are all elements required for fault-tolerance.
If you’ve used an IBM system in Qiskit, you can view the gate errors by looking at the “properties” of a physical backend. By assigning an error number to each gate, we can then use these errors in simulators [6] to estimate the outputs of our circuits with noise.
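As a minimal sketch of how this lookup might be done in Qiskit (using the IBMQ provider workflow that matches the Qiskit version referenced below; the backend name is only an example and a saved IBM Quantum account is assumed):

```python
from qiskit import IBMQ

# Load a saved IBM Quantum account and pick a backend (the name is only an example).
provider = IBMQ.load_account()
backend = provider.get_backend("ibmq_lima")
props = backend.properties()

# Reported error for a single-qubit gate on qubit 0 and a CNOT on the pair (0, 1).
print("sx error on qubit 0:", props.gate_error("sx", 0))
print("cx error on (0, 1):", props.gate_error("cx", [0, 1]))
```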
What are these errors and how are they measured? It is important to understand that these errors are averaged over all possible input states for a specific combination of gates. For example, the error of a gate on qubit 0 should be independent of the gates we run on qubit 2, but in practice there are small crosstalk effects. It would be exponentially expensive in time to measure the errors for all these scenarios, so instead only a subset are measured and reported. In general, we try to measure errors on IBM Quantum devices when all the neighboring qubits are idle. To tell which gate errors are measured together, one can look at the “date measured” value of the error. Errors with identical date/times were measured simultaneously. In short, the gate errors are estimates for the errors that will occur in any particular algorithm, but they aren’t perfect.
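As a rough sketch of how one might check which errors were measured together, the backend properties attach a timestamp to each reported parameter (same hypothetical backend and account setup as in the previous snippet):

```python
from qiskit import IBMQ

# Same setup as above; the backend name is only an example.
provider = IBMQ.load_account()
props = provider.get_backend("ibmq_lima").properties()

# Print each gate's reported error with the timestamp at which it was measured;
# entries sharing an identical date/time were measured simultaneously.
for gate in props.gates:
    for param in gate.parameters:
        if param.name == "gate_error":
            print(gate.gate, gate.qubits, param.value, param.date)
```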
Figure 1: Schematic of Randomized Benchmarking. Here we have decided to run circuits with {l0=1, l1=3, l2=6} Cliffords. We also show what a typical interleaved RB circuit would look like. Each “C” gate is a Clifford gate that needs to be transpiled to the device.
To measure these errors, we use a specific random circuit program known as randomized benchmarking [7, 8]. Randomized benchmarking (RB for short) is a protocol that selects random gates from a certain class of gates, the Clifford group, and appends a final gate that inverts the operation of all the previous gates. A special property of the Clifford group means that this inversion gate is efficient to calculate. Therefore, every RB sequence of gates should return the qubit(s) to the initial state. The basic premise of RB is shown in Figure 1 for a subset of 2 qubits. First, we decide to run circuits with different numbers of Clifford gates {li} on a subset of n qubits. Then, we make a circuit with l0 random Clifford gates plus the inversion. Next, we make a second circuit by adding l1-l0 more gates and recalculating the inversion gate for the new sequence, and so on. We run all the circuits in this set and measure the population in the |0> state of each qubit (the ground state); due to the properties of RB we can plot the |0> state population of any of the qubits and get the same answer. Next, we repeat this procedure with new random sequences and average the results. With enough averaging, the qubit |0> state population decays as Aα^l + B, where the average error per Clifford gate is given by ϵ_C = ((2^n - 1)/2^n)(1 - α), with n the number of qubits in the Clifford group used for RB (n=1 when measuring one-qubit error, n=2 when measuring two-qubit error). For IBM Quantum systems the typical Clifford length is a few thousand one-qubit Cliffords, or a few hundred two-qubit Cliffords. A big benefit of this method is that errors in the preparation of the state and the readout of the state are mostly contained in the coefficients A and B, which are not used for measuring the error.
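To make the fitting step concrete, here is a small illustrative sketch (not the actual calibration code used on IBM Quantum systems) that fits averaged RB data to A·α^l + B with SciPy and converts the decay parameter into an error per Clifford; the survival probabilities below are made up for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

def rb_decay(length, a, alpha, b):
    """RB survival-probability model: A * alpha**l + B."""
    return a * alpha**length + b

# Made-up averaged |0> populations for a two-qubit RB experiment.
lengths = np.array([1, 10, 25, 50, 100, 150, 200])
survival = np.array([0.99, 0.95, 0.88, 0.79, 0.66, 0.58, 0.52])

(a, alpha, b), _ = curve_fit(rb_decay, lengths, survival, p0=[0.75, 0.99, 0.25])

n = 2  # number of qubits in the Clifford group
error_per_clifford = (2**n - 1) / 2**n * (1 - alpha)
print(f"alpha = {alpha:.4f}, error per Clifford = {error_per_clifford:.4f}")
```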
Now there are a few important points. For one, we want to know the error per gate, not per Clifford. The Cliffords are particular gate operations, but they must be translated to the native gates of the device with a transpiler. When a Clifford is transpiled it may require several types of gates, and in the case of two-qubit Cliffords there will be a mix of one- and two-qubit gates. To measure single-qubit gate errors we take the average number of single-qubit gates per Clifford, n_1C, and divide the Clifford error to get the error per gate, ϵ_1G = ϵ_1C/n_1C. To measure two-qubit gate errors we take the average number of two-qubit gates per Clifford, n_2C, and divide the error to get the error per gate, ϵ_2G = ϵ_2C/n_2C. In this case the error is an upper bound because we are neglecting the contribution of the single-qubit gates to the Clifford error. The red curve in Fig 2 is an example of standard two-qubit RB.
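As a worked example of this conversion (the numbers below are made up purely for illustration):

```python
# Made-up example values: error per two-qubit Clifford from the RB fit and the
# average number of two-qubit gates per transpiled Clifford.
epsilon_2C = 0.025
n_2C = 1.5

# Upper bound on the per-gate error (single-qubit gate contributions neglected).
epsilon_2G = epsilon_2C / n_2C
print(f"two-qubit gate error <= {epsilon_2G:.4f}")
```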
If we want a gate error that is not an upper bound, there is a protocol that uses RB to measure the error of a specific gate directly: interleaved RB [9]. A schematic is given in Fig 1 and the blue curve in Fig 2 is an example. In interleaved RB (IRB) we run an extra set of circuits with the specific gate interleaved between the random Clifford gates, as shown in the schematic of Fig 1. The gate error is then given by ϵ_G = ((2^n - 1)/2^n)(1 - α_IRB/α_RB), i.e., the gate error estimate comes from the ratio of the decay parameters of the two curves. We don’t use this method for reporting IBM Quantum backend errors, as it requires twice as much data and the systematic errors can be large [10] since we are taking ratios. Subtle double exponential decays can lead to unphysical error rates. In the example plot shown in Fig 2 the error from interleaved RB is 2.3e-3 and from the procedure used on IBM Quantum systems the error is 3e-3, which are reasonably close. However, there are times when the reference curve error is much higher and, in those cases, the systematic errors mean that IRB results must be taken with caution.
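As a sketch of the arithmetic behind this formula, with made-up decay parameters for the reference and interleaved curves:

```python
# Made-up decay parameters from the reference (RB) and interleaved (IRB) fits.
alpha_rb = 0.975
alpha_irb = 0.970
n = 2  # two-qubit gate

# Interleaved RB estimate of the specific gate's error.
epsilon_G = (2**n - 1) / 2**n * (1 - alpha_irb / alpha_rb)
print(f"interleaved RB gate error = {epsilon_G:.2e}")
```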
Figure 2: Example of RB (red) and Interleaved RB (blue). From https://arxiv.org/abs/2011.07050, see details therein.
In conclusion, RB is a quick and effective way to measure gate errors on large devices. It allows us to report a complete set of gate errors, which can be used to monitor the health of devices and improvements, and as an input into simulations to give rough predictions for algorithmic performance. However, it’s important to understand the limitations of any benchmarking scheme; we’ve highlighted a few for RB (in particular the caution required when using IRB), and furthermore there is a deep body of literature on the more subtle issues surrounding RB (see, for example, refs. [11, 12, 13, 14]). We hope this blog post gives some insight into how gate errors are measured on IBM Quantum systems with randomized benchmarking and what these errors represent.
References
1. Cross, Andrew W., et al. “Validating Quantum Computers Using Randomized Model Circuits.” ArXiv.org, 11 Oct. 2019, arxiv.org/abs/1811.12926.
2. Mandelbaum, Ryan F. “What Is Quantum Volume, Anyway?” Qiskit Medium, 20 Aug. 2020, medium.com/qiskit/what-is-quantum-volume-anyway-a4dff801c36f.
3. Jurcevic, Petar, et al. “Demonstration of Quantum Volume 64 on a Superconducting Quantum Computing System.” ArXiv.org, 4 Sept. 2020, arxiv.org/abs/2008.08571.
4. Arute, Frank, et al. “Quantum Supremacy Using a Programmable Superconducting Processor.” Nature, vol. 574, no. 7779, 2019, pp. 505–510., doi:10.1038/s41586-019-1666-5
5. “Quantum Volume.” Qiskit 0.23.1 Documentation, qiskit.org/documentation/tutorials/noise/5_quantum_volume.html.
6. “Building Noise Models.” Qiskit 0.23.1 Documentation, qiskit.org/documentation/tutorials/simulators/3_building_noise_models.html.
7. “Randomized Benchmarking.” Qiskit Textbook, 8 Dec. 2020, qiskit.org/textbook/ch-quantum-hardware/randomized-benchmarking.html.
8. Magesan, E., Gambetta, J. M. & Emerson, J. Characterizing quantum gates via randomized benchmarking. Phys. Rev. A 85, 042311 (2012).
9. Magesan, E. et al. Efficient Measurement of Quantum Gate Error by Interleaved Randomized Benchmarking. Phys. Rev. Lett. 109, 080505 (2012).
10. Epstein, Jeffrey M., et al. “Investigating the Limits of Randomized Benchmarking Protocols.” ArXiv.org, 13 Aug. 2013, arxiv.org/abs/1308.2928.
11. Proctor, Timothy, et al. “What Randomized Benchmarking Actually Measures.” Physical Review Letters, American Physical Society, 28 Sept. 2017, link.aps.org/doi/10.1103/PhysRevLett.119.130502.
12. Wallman, Joel J. “Randomized Benchmarking with Gate-Dependent Noise.” Quantum, Verein Zur Förderung Des Open Access Publizierens in Den Quantenwissenschaften, 29 Jan. 2018, quantum-journal.org/papers/q-2018-01-29-47/.
13. Merkel, Seth T., et al. “Randomized Benchmarking as Convolution: Fourier Analysis of Gate Dependent Errors.” ArXiv.org, 14 Aug. 2019, arxiv.org/abs/1804.05951.
14. Helsen, Jonas, et al. “A General Framework for Randomized Benchmarking.” ArXiv.org, 15 Oct. 2020, arxiv.org/abs/2010.07