TensorCircuit-NG vs MindQuantum for VQE: when JIT compilation pays off #117
refraction-ray started this conversation in Show and tell
Tensor-network simulators and state-vector simulators often have very different performance profiles. In this note, we compare TensorCircuit-NG with MindQuantum on a simple transverse-field Ising model (TFIM) VQE benchmark. The benchmark computes both the energy expectation value and its gradient for a fixed hardware-efficient ansatz with circuit block depth 10 in `complex64` precision.
The goal is not to claim a universal winner for all quantum simulation tasks. Instead, this benchmark highlights a common performance trade-off. TensorCircuit-NG can run in a fast-compile mode, which uses `scan` to keep the JAX program small, or in a peak-runtime mode, which compiles a larger unrolled program but gives the fastest repeated evaluations afterwards. MindQuantum has a much smaller first-call overhead, but its repeated evaluations are slower.
Benchmark setup
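For each number of qubits, we measure two quantities:
- the time of the first value-and-gradient call, which for TensorCircuit-NG includes JIT compilation;
- the time of subsequent calls with the same circuit structure, after compilation.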
The GPU benchmark was run on an NVIDIA GeForce RTX 5090. On GPU, TensorCircuit-NG used the JAX backend, while MindQuantum used the `mqvector_gpu` backend. CPU data were measured on a MacBook Pro.
Panel (a) shows the post-compilation speedup of TensorCircuit-NG over MindQuantum; values larger than one mean that TensorCircuit-NG is faster. Panel (b) shows the absolute running times, with TensorCircuit-NG as solid lines and MindQuantum as dashed lines. See the benchmark script.
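A first-call time and a later-call time of this kind can be collected with a small harness like the one below. This is a minimal sketch, not the benchmark script: the helper name `time_first_and_later`, the default number of repeats, and the use of the mean over later calls are assumptions. Because JAX dispatches work asynchronously, the `sync` hook should block until results are ready (for JAX outputs, a function that calls `.block_until_ready()` on each returned array); otherwise later calls look faster than they are.

```python
import statistics
import time
from typing import Any, Callable, Tuple


def time_first_and_later(
    fn: Callable[..., Any],
    *args: Any,
    repeats: int = 10,
    sync: Callable[[Any], Any] = lambda out: out,
) -> Tuple[float, float]:
    """Return (first-call seconds, mean seconds per later call) for fn(*args).

    Hypothetical helper. `sync` must block until the result is actually
    computed, e.g. by calling `.block_until_ready()` on JAX arrays.
    """
    # First call: for a jitted function this includes tracing and compilation.
    start = time.perf_counter()
    sync(fn(*args))
    first = time.perf_counter() - start

    # Later calls: reuse the same structure, so no recompilation is expected.
    later = []
    for _ in range(repeats):
        start = time.perf_counter()
        sync(fn(*args))
        later.append(time.perf_counter() - start)
    return first, statistics.mean(later)
```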
Jitted performance
Once compilation is excluded, TensorCircuit-NG is faster in all measured cases. On CPU, TensorCircuit-NG is about 1.5x faster at 12 qubits, 3.8x faster at 16 qubits, and 6.2x faster at 20 qubits.
On GPU, the fast-compile mode already beats MindQuantum in runtime while keeping the first-call cost modest: for example, in the 24-qubit case it compiles and runs in about 9.56 s on the first call and is 3.2x faster than MindQuantum afterwards. The peak-runtime mode takes longer to compile but gives a much faster runtime afterwards; at 24 qubits it is 12.8x faster than MindQuantum in repeated value-and-gradient calls.
This is the regime where TensorCircuit-NG's design is most useful: differentiable, JIT-compiled, backend-native tensor programs that are evaluated many times with the same structure.
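The difference between the fast-compile and peak-runtime modes can be illustrated without TensorCircuit-NG itself. The sketch below is a plain-JAX state-vector analogue, not TensorCircuit-NG's implementation and not the exact benchmark ansatz. It assumes one hardware-efficient block is an RY layer followed by a CNOT chain, an open-boundary TFIM H = -J Σ Z_i Z_{i+1} - h Σ X_i with J = h = 1, and parameters of shape (depth, n). `energy_unrolled` traces every block in a Python loop, giving XLA one large program to compile, while `energy_scan` traces a single block and loops over it with `jax.lax.scan`, which keeps the program and its compile time small. Both are wrapped in `jax.jit(jax.value_and_grad(...))`, as in a VQE step.

```python
import time

import jax
import jax.numpy as jnp

X = jnp.array([[0, 1], [1, 0]], dtype=jnp.complex64)
Z = jnp.array([[1, 0], [0, -1]], dtype=jnp.complex64)
# CNOT as a rank-4 tensor with indices (out_ctrl, out_tgt, in_ctrl, in_tgt).
CNOT = jnp.array(
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=jnp.complex64
).reshape(2, 2, 2, 2)


def apply_1q(state, gate, q):
    # Contract the gate's input index with qubit axis q, then put that axis back.
    return jnp.moveaxis(jnp.tensordot(gate, state, axes=[[1], [q]]), 0, q)


def apply_cnot(state, c, t):
    out = jnp.tensordot(CNOT, state, axes=[[2, 3], [c, t]])
    return jnp.moveaxis(out, [0, 1], [c, t])


def ry(theta):
    c, s = jnp.cos(theta / 2), jnp.sin(theta / 2)
    return jnp.stack([jnp.stack([c, -s]), jnp.stack([s, c])]).astype(jnp.complex64)


def block(state, thetas):
    # Assumed hardware-efficient block: RY on every qubit, then a CNOT chain.
    n = state.ndim
    for q in range(n):
        state = apply_1q(state, ry(thetas[q]), q)
    for q in range(n - 1):
        state = apply_cnot(state, q, q + 1)
    return state


def tfim_energy(state, j=1.0, h=1.0):
    # <H> for the open-boundary TFIM H = -j * sum_i Z_i Z_{i+1} - h * sum_i X_i.
    n = state.ndim
    e = 0.0
    for i in range(n - 1):
        zz = apply_1q(apply_1q(state, Z, i), Z, i + 1)
        e = e - j * jnp.real(jnp.vdot(state, zz))
    for i in range(n):
        e = e - h * jnp.real(jnp.vdot(state, apply_1q(state, X, i)))
    return e


def zero_state(n):
    return jnp.zeros((2,) * n, dtype=jnp.complex64).at[(0,) * n].set(1.0)


def energy_unrolled(params):
    # Python loop: all `depth` blocks are traced, producing one large program.
    state = zero_state(params.shape[1])
    for d in range(params.shape[0]):
        state = block(state, params[d])
    return tfim_energy(state)


def energy_scan(params):
    # lax.scan: one block is traced and reused, producing a small program.
    state, _ = jax.lax.scan(
        lambda s, th: (block(s, th), None), zero_state(params.shape[1]), params
    )
    return tfim_energy(state)


if __name__ == "__main__":
    n, depth, reps = 8, 10, 20
    params = 0.1 * jax.random.normal(jax.random.PRNGKey(0), (depth, n))
    for name, f in [("scan", energy_scan), ("unrolled", energy_unrolled)]:
        vg = jax.jit(jax.value_and_grad(f))
        t0 = time.perf_counter()
        v, g = vg(params)
        g.block_until_ready()
        first = time.perf_counter() - t0
        t0 = time.perf_counter()
        for _ in range(reps):
            v, g = vg(params)
        g.block_until_ready()
        later = (time.perf_counter() - t0) / reps
        print(f"{name:8s} first {first:.2f} s, later {1e3 * later:.2f} ms, E = {float(v):.5f}")
```

Absolute numbers from this toy will differ from the benchmark. The unrolled version usually shows the longer first call; whether its later calls are faster depends on circuit size and hardware.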
The cost of the first call
The main drawback of TensorCircuit-NG in this benchmark is equally clear: the first call can be slower because JAX needs to compile the computation, and this cost should not be ignored. The fast-compile mode reduces the overhead by using `scan`, while the peak-runtime mode deliberately spends more compilation time to obtain the fastest repeated evaluations.
However, VQE is not a one-shot workload. A typical optimization repeatedly evaluates the same circuit structure for many parameter values. To avoid getting trapped in local minima, we also use a pipeline in which the same ansatz is optimized from many random initializations, which makes the workload even more favorable to a compiled workflow.
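One way to exploit this structure in JAX is to compile the whole optimization once and run all random initializations as a single batched program with `jax.vmap`. The sketch below is illustrative only: `cost` is a cheap toy stand-in with many local minima rather than the TFIM energy, and plain gradient descent, the step count, the learning rate, and the number of starts are arbitrary choices, not the pipeline used in this benchmark.

```python
import jax
import jax.numpy as jnp


def cost(theta):
    # Toy stand-in for a VQE energy: separable, with several local minima per coordinate.
    return jnp.sum(jnp.sin(3.0 * theta) + 0.1 * theta**2)


def optimize(theta0, steps=200, lr=0.05):
    # Plain gradient descent; fori_loop keeps the whole loop inside one compiled program.
    grad_fn = jax.grad(cost)

    def step(_, theta):
        return theta - lr * grad_fn(theta)

    return jax.lax.fori_loop(0, steps, step, theta0)


# Traced and compiled once; every random start then reuses the same program.
batched_optimize = jax.jit(jax.vmap(optimize))


def multi_start(key, n_starts=64, n_params=8):
    # Draw random initial parameters, optimize all of them in one batched call,
    # and keep the lowest final cost.
    theta0 = jax.random.uniform(key, (n_starts, n_params), minval=-3.0, maxval=3.0)
    thetas = batched_optimize(theta0)
    costs = jax.vmap(cost)(thetas)
    best = jnp.argmin(costs)
    return thetas[best], costs[best], costs


if __name__ == "__main__":
    best_theta, best_cost, costs = multi_start(jax.random.PRNGKey(0))
    print("best cost:", float(best_cost), "spread:", float(costs.max() - costs.min()))
```

In a real VQE, `cost` would be replaced by the circuit energy. The point of the design is that `batched_optimize` is compiled once, so its compilation cost is shared by every start.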
We can estimate the break-even point by comparing the total end-to-end time of the two simulators as a function of the number of calls.
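One simple way to write this is a linear cost model. It is an assumption made here, not necessarily the exact expression behind the numbers in this post. Each framework's total time for N value-and-gradient calls is its first-call time T^(1) (which includes JIT compilation for TensorCircuit-NG) plus N-1 later calls of mean duration t. TensorCircuit-NG catches up once the per-call saving t_MQ - t_TC has paid back its extra first-call time:

```latex
\begin{aligned}
T_{\mathrm{TC}}(N) &= T^{(1)}_{\mathrm{TC}} + (N-1)\,t_{\mathrm{TC}}, \qquad
T_{\mathrm{MQ}}(N) = T^{(1)}_{\mathrm{MQ}} + (N-1)\,t_{\mathrm{MQ}}, \\
T_{\mathrm{TC}}(N) \le T_{\mathrm{MQ}}(N)
&\iff N \ge 1 + \frac{T^{(1)}_{\mathrm{TC}} - T^{(1)}_{\mathrm{MQ}}}{t_{\mathrm{MQ}} - t_{\mathrm{TC}}}
\qquad (t_{\mathrm{MQ}} > t_{\mathrm{TC}}), \\
N^{*} &= \max\!\left(1,\; 1 + \left\lceil \frac{T^{(1)}_{\mathrm{TC}} - T^{(1)}_{\mathrm{MQ}}}{t_{\mathrm{MQ}} - t_{\mathrm{TC}}} \right\rceil\right).
\end{aligned}
```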
In this benchmark, TensorCircuit-NG starts to win in total end-to-end running time after:
These numbers mean that the compilation cost is amortized after a finite number of repeated evaluations. For larger circuits, the faster runtime can compensate for compilation after only tens to hundreds of calls. Such call counts are easy to reach in practical VQE optimization, especially when running multiple optimizer steps, scanning hyperparameters, or repeating the optimization from different initial parameters.
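A small helper can turn measured first-call and later-call times into a break-even call count. This is a sketch under an assumed linear cost model (total time = first call + (N-1) later calls). The function names are hypothetical and not part of the benchmark script, and the values in the test and usage note are toy numbers, not measurements from this benchmark.

```python
import math
from typing import Optional


def break_even_calls(
    first_tc: float, later_tc: float, first_mq: float, later_mq: float
) -> Optional[int]:
    """Smallest N with first_tc + (N - 1) * later_tc <= first_mq + (N - 1) * later_mq.

    Returns None when later_tc >= later_mq, i.e. there is no per-call saving
    that could amortize TensorCircuit-NG's extra first-call (compilation) time.
    """
    saving_per_call = later_mq - later_tc
    if saving_per_call <= 0:
        return None
    extra_first_call = first_tc - first_mq
    # If the extra first-call time is <= 0, TensorCircuit-NG is ahead from N = 1.
    return max(1, 1 + math.ceil(extra_first_call / saving_per_call))


def total_time(first: float, later: float, n_calls: int) -> float:
    # End-to-end time of n_calls evaluations under the same linear cost model.
    return first + (n_calls - 1) * later
```

For instance, with toy timings of 10 s versus 1 s for the first call and 0.5 s versus 2 s per later call, the extra 9 s of first-call time is recovered at 1.5 s per call, so `break_even_calls(10.0, 0.5, 1.0, 2.0)` returns 7.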
Takeaway
MindQuantum has a very low first-call overhead and is convenient for quick, single-shot simulations. TensorCircuit-NG has a heavier first call, but its compiled repeated evaluations can be substantially faster. For workloads such as VQE, where the same circuit is evaluated many times during optimization, TensorCircuit-NG's JIT-based workflow can provide better overall performance after the compilation cost is amortized.