Dr Oliver Brown has been a Chancellor’s Fellow at EPCC since April 2023, where he works on the use of quantum computing in High-Performance Computing (HPC). His recent publication “Energy Efficiency of Quantum Statevector Simulation at Scale” assesses the performance and energy consumption of large Quantum Fourier Transform (QFT) simulations on ARCHER2, the UK’s National Supercomputing Service.
In simple terms, quantum computing takes advantage of particle and wave behaviour at very small scales to perform computations. Dr Brown simulates such computations classically, and works to optimise the energy those simulations consume on ARCHER2, the UK’s national supercomputer, hosted by EPCC here at the University of Edinburgh.
To achieve this, the author used the statevector simulator QuEST, parallelised with MPI+OpenMP. A statevector is a one-dimensional array holding the complex probability amplitude for each possible state of the system; the squared magnitude of each amplitude gives the probability of measuring that state. MPI stands for ‘Message Passing Interface’, a standard for exchanging messages between parallel computing processes, while OpenMP is an API that supports multi-platform shared-memory parallel programming in C/C++ and Fortran.
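As an illustrative sketch (not QuEST code; the names here are made up for illustration), a statevector for n qubits can be stored as an array of 2^n complex amplitudes, with the probability of measuring each basis state given by the squared magnitude of its amplitude:

```c
#include <complex.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const int nQubits = 3;                  /* toy example: 3 qubits        */
    const long dim = 1L << nQubits;         /* 2^n basis states             */

    /* Allocate the statevector: one complex amplitude per basis state.    */
    double complex *amp = calloc(dim, sizeof(double complex));

    /* Prepare an equal superposition of all 2^n basis states.             */
    for (long i = 0; i < dim; i++)
        amp[i] = 1.0 / sqrt((double)dim);

    /* The probability of measuring basis state i is |amp[i]|^2.           */
    for (long i = 0; i < dim; i++)
        printf("P(|%ld>) = %.3f\n", i, creal(amp[i] * conj(amp[i])));

    free(amp);
    return 0;
}
```

At 44 qubits (the scale demonstrated in the paper) this array holds 2^44 amplitudes, roughly 256 TiB at 16 bytes each, which is why the statevector must be distributed across many nodes with MPI in the first place.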
Different strategies were adopted to optimise the statevector simulations: changing the clock frequency of the CPUs; using nodes with larger memory; and transpiling the circuit to reduce communication via cache-blocking. The author used QFT circuits, which conveniently allow cache-blocking without adding new gates (quantum operators). It is important that the statevector elements on which an update depends are located on the same node, so that gates can be applied quickly and the simulation runs more efficiently. Using the cluster’s job manager, the author recorded the energy used by the nodes involved, and so could measure whether the supercomputer’s energy efficiency improved. The figure above shows how the author’s optimisations, represented in the QFT (cache-blocking) column, significantly reduced the time spent in communication (MPI), and consequently the energy consumed by the system.
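To see why locality matters, here is a hedged sketch (not QuEST’s actual implementation) of the index arithmetic behind a single-qubit gate on a distributed statevector: a gate on qubit q pairs each amplitude at index i with the amplitude at index i XOR 2^q, and once that partner index falls outside a rank’s locally stored chunk the update needs MPI communication. Roughly speaking, cache-blocking via SWAP gates works by keeping the targeted qubits within the locally stored part of the index.

```c
#include <stdio.h>

/* Sketch only: which qubits a rank can update without communication.
 * Assume the 2^n amplitudes are split evenly across nRanks MPI ranks,
 * so each rank holds a contiguous chunk of 2^n / nRanks amplitudes.     */
int main(void) {
    const int nQubits = 5;                 /* toy system: 32 amplitudes   */
    const long dim = 1L << nQubits;
    const int nRanks = 4;                  /* pretend there are 4 ranks   */
    const long chunk = dim / nRanks;       /* amplitudes held per rank    */

    for (int q = 0; q < nQubits; q++) {
        /* A gate on qubit q pairs amplitude i with amplitude i ^ (1<<q). */
        long stride = 1L << q;

        /* If the stride is smaller than the chunk, every partner index
         * stays on the same rank and the gate is purely local. Otherwise
         * each pair straddles two ranks, and amplitudes must be exchanged
         * over the network before the gate can be applied.              */
        printf("qubit %d: %s\n", q,
               stride < chunk ? "local update" : "needs MPI communication");
    }
    return 0;
}
```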
Among other results, Dr Brown achieved a considerable energy saving of 35% for these simulations on ARCHER2. We talked to the author about his work, his methodology, and the improvements made to the UK supercomputer.
Discover more about ARCHER2
A few reasons! First, we wanted to see how far we could push the performance of QuEST. It’s well known that classical simulation of quantum systems is a problem that scales exponentially, but I feel like this means people sort of give up sooner than they really have to. We could have just sat down and said well, ARCHER2 has this much memory, so 44 qubits is viable, but why not go ahead and actually demonstrate it? Of course, at that scale the resource cost is considerable, hence the motivation to investigate what the user could do to minimise energy usage. Naturally it also helped that as ARCHER2 service providers, EPCC has (limited) access to ARCHER2 to use for our own purposes. Now, if a user comes to us and asks how to perform a large quantum computer simulation on ARCHER2, we can tell them!
Yes, absolutely! Clock speed reduction is maybe the least transferable, as it does require the machine to be set up to enable this. On ARCHER2 we can control the clock speed with a slurm command in the job submission script, and users can select one of three predefined frequency caps: 1.5GHz, 2GHz, or uncapped 2.25GHz + turbo. This can be an effective way to reduce energy consumption whenever your code is not compute bound – in fact the default on ARCHER2 was changed to 2GHz in December 2022! QuEST is mostly communication bound so benefitted from the clock reduction, though we did find that dropping to 1.5GHz also made the communication slower and was therefore detrimental.
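(As an illustration only, and not taken from the paper: slurm’s generic --cpu-freq option takes a frequency in kHz, so a job script requesting the 2GHz cap might look something like the sketch below. The ARCHER2 documentation gives the recommended way to set this on that system, and the node count, time, and executable name here are placeholders.)

```bash
#!/bin/bash
#SBATCH --job-name=qft-sim
#SBATCH --nodes=64
#SBATCH --time=01:00:00

# Request a 2 GHz frequency cap for the job's compute processes
# (value given in kHz).
srun --cpu-freq=2000000 ./my_simulation
```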
The change we made to QuEST’s MPI calls should make a difference on any system with a decent interconnect and a high-quality MPI implementation. Messages were being sent using blocking MPI calls (MPI_Sendrecv), but we changed these to nonblocking calls (MPI_Isend/Irecv). The upshot is that (ideally) all those messages should now be sent at once, rather than one after another. In practice it depends on when MPI actually progresses communication, and the bandwidth of your interconnect. We found it did make a difference on ARCHER2, and it should only improve or make no change to performance on any system, so we contributed that change back to QuEST.
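(A rough sketch of that pattern, simplified and not the actual QuEST source; the function name, buffer layout, and tags below are illustrative.)

```c
#include <mpi.h>

/* Exchange nMsgs message chunks with a partner rank. */
void exchange_chunks(double *sendBuf, double *recvBuf, int chunkLen,
                     int nMsgs, int partnerRank) {
    /* Blocking version: each exchange must finish before the next starts.
     *
     * for (int m = 0; m < nMsgs; m++)
     *     MPI_Sendrecv(sendBuf + m*chunkLen, chunkLen, MPI_DOUBLE, partnerRank, m,
     *                  recvBuf + m*chunkLen, chunkLen, MPI_DOUBLE, partnerRank, m,
     *                  MPI_COMM_WORLD, MPI_STATUS_IGNORE);
     */

    /* Nonblocking version: post every send and receive up front, then wait
     * once, letting the MPI library overlap the messages where it can.    */
    MPI_Request reqs[2 * nMsgs];
    for (int m = 0; m < nMsgs; m++) {
        MPI_Isend(sendBuf + m*chunkLen, chunkLen, MPI_DOUBLE, partnerRank, m,
                  MPI_COMM_WORLD, &reqs[2*m]);
        MPI_Irecv(recvBuf + m*chunkLen, chunkLen, MPI_DOUBLE, partnerRank, m,
                  MPI_COMM_WORLD, &reqs[2*m + 1]);
    }
    MPI_Waitall(2 * nMsgs, reqs, MPI_STATUSES_IGNORE);
}
```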
Finally, transpiling the circuit to avoid communication is always good on any system. The caveat is that it’s only beneficial when your quantum circuit already has SWAP gates (as is the case for the QFT circuit). If you have to add them in to achieve cache blocking then you’re really just swapping one communication for another, and there’s no guarantee it’ll help. In general, though, for any distributed parallel code, minimising the communication will bring performance gains and reduce energy consumption. What we wanted to highlight in this paper is that in statevector simulation there are very specific operations that require communication.
For users I think the key takeaway should be: know your code. Here we demonstrated that the difference can be very significant, even with small changes to the program being run. There are caveats of course – the energy reduction was amplified simply by the scale of the problem, and we did play with the hardware configuration and modify the library we used. In the end, though, the biggest difference was made just by modifying the simulation to avoid communication as much as possible. So: know your code, know its communication pattern. Running things in less time almost always means energy savings.
From the facility provider’s point of view, I think providing users with good profiling tools and the ability to check their energy usage is therefore key. ARCHER2 is a good example of this, I think, as the energy usage for compute (although not for shared resources like cooling or the network) is stored in the slurm database for every job that runs.
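(For example, on slurm systems with energy accounting enabled, a job’s recorded energy can be pulled from the accounting database along these lines; ConsumedEnergy is a standard sacct field, and the ARCHER2 documentation gives the definitive recipe for that service.)

```bash
# Query slurm's accounting database for the energy recorded against a job.
# Replace JOBID with the job's ID.
sacct -j JOBID --format=JobID,JobName,Elapsed,ConsumedEnergy
```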
There are a few things actually. First, we have an ARCHER2 eCSE project funded to explore scaling up a code which uses one-sided communication to do the same kind of simulation as QuEST. The advantage is that we may be able to pack the same-size job down onto significantly fewer nodes, and therefore save energy overall. We are also involved in an EPSRC-funded project to improve QuEST and port it to GPUs (hedging our bets you see…). Separately, we’re also considering how intermediate representations of quantum circuits like QASM or QIR could be leveraged to transpile circuits and optimise for simulation, as opposed to optimising for running on real quantum hardware.
Obviously working at EPCC I’m in a privileged position, but I get the sense that provision across the university is pretty good! Not least from handling access requests for EPCC facilities. I know that we also offer a lot of training on using HPC which is available to all researchers across the UK, not just UoE, but it doesn’t hurt that we’re nearby.
Know your code!
Find more publications from Dr Oliver Brown on his Edinburgh Research Explorer profile.