TensorCircuit-NG vs MindQuantum for VQE: when JIT compilation pays off #117

refraction-ray · 2026-06-06T16:26:11Z

refraction-ray
Jun 6, 2026
Maintainer

Tensor-network simulators and state-vector simulators often have very different performance profiles. In this note, we compare TensorCircuit-NG with MindQuantum on a simple transverse-field Ising model (TFIM) VQE benchmark. The benchmark computes both the expectation value and its gradient for a fixed hardware-efficient ansatz with circuit block depth 10 and complex64 precision.

The goal is not to claim a universal winner for all quantum simulation tasks. Instead, this benchmark highlights a common performance trade-off: TensorCircuit-NG can be run in a fast-compile mode that uses scan to keep the JAX program small, or in a peak-runtime mode that compiles a larger unrolled program but gives the fastest repeated evaluations afterwards. MindQuantum has a much smaller first-call overhead, but its later evaluations are slower.
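The difference between the two TensorCircuit-NG modes can be illustrated with a toy JAX sketch. This is not the benchmark code and does not use the TensorCircuit-NG API: `layer`, `loss_unrolled`, and `loss_scan` are hypothetical names, and the layer is a stand-in mixing step rather than a real circuit block. The only point is that a Python loop is traced once per layer, while `jax.lax.scan` traces the layer body once and reuses it.

```python
import jax
import jax.numpy as jnp


def layer(state, theta):
    # Toy stand-in for one ansatz block (assumption, not a real circuit layer).
    return jnp.cos(theta) * state + jnp.sin(theta) * jnp.roll(state, 1)


def loss_unrolled(thetas, state):
    # Python loop: every layer is traced separately, so the program
    # handed to the compiler grows with the depth.
    for i in range(thetas.shape[0]):
        state = layer(state, thetas[i])
    return jnp.sum(state ** 2)


def loss_scan(thetas, state):
    # lax.scan: the body is traced once and applied to each row of thetas.
    def body(carry, theta):
        return layer(carry, theta), None

    final, _ = jax.lax.scan(body, state, thetas)
    return jnp.sum(final ** 2)


depth, width = 10, 8  # depth 10 mirrors the block depth used in the post
thetas = jax.random.normal(jax.random.PRNGKey(0), (depth, width))
state = jnp.ones(width) / jnp.sqrt(width)

vg_unrolled = jax.jit(jax.value_and_grad(loss_unrolled))
vg_scan = jax.jit(jax.value_and_grad(loss_scan))

value_unrolled, grad_unrolled = vg_unrolled(thetas, state)
value_scan, grad_scan = vg_scan(thetas, state)
```

Both functions compute the same value and gradient. Which one compiles or runs faster depends on the depth, the backend, and the compiler version, so the timings reported in this post should not be read off this toy.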

Benchmark setup

For each number of qubits, we measure two quantities:

Warmup / compile time: the first call, including compilation.
Running time: the average time of later value-and-gradient evaluations.
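The two quantities above could be measured with a harness along these lines. It is a minimal sketch, not the benchmark script: `time_first_and_repeated`, its `block` argument, and the `fake_compiled` stand-in are hypothetical names introduced here. With JAX, dispatch is asynchronous, so one would likely pass something like `jax.block_until_ready` as `block` so that each timing covers the finished computation.

```python
import time


def time_first_and_repeated(fn, *args, repeats=10, block=lambda out: out):
    """Return (warmup, running): first-call time and mean time of later calls."""
    start = time.perf_counter()
    block(fn(*args))  # first call, includes any one-time compilation
    warmup = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(repeats):
        block(fn(*args))  # later calls reuse whatever was prepared
    running = (time.perf_counter() - start) / repeats
    return warmup, running


# Stand-in for a compiled function: slow only on its first call.
_state = {"ready": False}


def fake_compiled(x):
    if not _state["ready"]:
        time.sleep(0.05)  # pretend compilation cost
        _state["ready"] = True
    return x * x


warmup, running = time_first_and_repeated(fake_compiled, 3.0)
```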

The GPU benchmark was run on an NVIDIA GeForce RTX 5090. TensorCircuit-NG used the JAX backend on GPU, while MindQuantum used mqvector_gpu. CPU data were measured on a MacBook Pro.

Panel (a) shows the post-compilation speedup of TensorCircuit-NG over MindQuantum; values larger than one mean that TensorCircuit-NG is faster. Panel (b) shows the absolute running times. Solid lines are TensorCircuit-NG, and dashed lines are MindQuantum. See the benchmark script.

Jitted performance

Once compilation is excluded, TensorCircuit-NG is faster in all measured cases. On CPU, it is about 1.5x faster at 12 qubits, 3.8x faster at 16 qubits, and 6.2x faster at 20 qubits.

On GPU, the fast-compile mode already beats MindQuantum in runtime while keeping the first-call cost modest: for example, the 24-qubit GPU case compiles and runs in about 9.56s on the first call and is 3.2x faster than MindQuantum afterwards. The peak-runtime mode takes longer to compile, but gives a much faster later runtime; at 24 qubits it is 12.8x faster than MindQuantum in repeated value-and-gradient calls.

This is the regime where TensorCircuit-NG's design is most useful: differentiable, JIT-compiled, backend-native tensor programs that can be evaluated many times with the same structure.

The cost of the first call

The main concern with TensorCircuit-NG in this benchmark is also clear: the first call can be slow because JAX needs to compile the computation. This cost should not be ignored. The fast-compile mode reduces this overhead by using scan, while the peak-runtime mode intentionally spends more compilation time to obtain the fastest repeated evaluations.

However, VQE is not a one-shot workload. A typical optimization repeatedly evaluates the same circuit structure for many parameter values. To avoid local minima, a common pipeline optimizes the same ansatz from many random initializations, which makes a compiled workflow even more favorable.
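As a rough illustration of why many random initializations favor a compiled workflow, the sketch below reuses one jitted value-and-gradient function across several starting points. Everything in it is an assumption made for illustration: `toy_energy` is a stand-in cost (not the TFIM energy), `multi_start_descent` is a hypothetical helper, and batching the restarts with `jax.vmap` is one possible choice rather than what the benchmark does.

```python
import jax
import jax.numpy as jnp


def toy_energy(params):
    # Stand-in cost with minimum -n_params at params = pi (assumption).
    return jnp.sum(jnp.cos(params))


# One compiled function evaluates value and gradient for all restarts at once.
batched_vg = jax.jit(jax.vmap(jax.value_and_grad(toy_energy)))


def multi_start_descent(key, n_starts=8, n_params=6, steps=200, lr=0.1):
    # Independent random initializations, one per row.
    params = jax.random.uniform(
        key, (n_starts, n_params), minval=0.5, maxval=5.5
    )
    for _ in range(steps):
        values, grads = batched_vg(params)  # compiled on the first call only
        params = params - lr * grads        # plain gradient descent
    return batched_vg(params)[0]


final_energies = multi_start_descent(jax.random.PRNGKey(0))
```

Here the compiled function is called 201 times with the same shapes, so the one-time compilation is shared by every step of every restart.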

We can estimate the break-even point using

total time = warmup time + (number of calls - 1) * running time.

In this benchmark, TensorCircuit-NG starts to win in total end-to-end running time after:

Device / mode	Qubits	Break-even total steps
CPU	16	22
CPU	20	2
GPU fast	20	140
GPU fast	24	33
GPU peak	20	685
GPU peak	24	145

These numbers mean that the compilation cost is amortized after a finite number of repeated evaluations. For larger circuits, the faster runtime can compensate for compilation after only tens to hundreds of calls. Such call counts are easy to reach in practical VQE optimization, especially when running multiple optimizer steps, scanning hyperparameters, or repeating the optimization from different initial parameters.
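The break-even estimate can be sketched in a few lines of Python. The helper names `total_time` and `break_even_calls` are hypothetical, and the timings in the example are made-up round numbers, not measurements from this benchmark. The rounding convention (first call count at which one total is strictly smaller) is also an assumption and may differ from the one used for the table.

```python
import math


def total_time(warmup, running, calls):
    # total time = warmup time + (number of calls - 1) * running time
    return warmup + (calls - 1) * running


def break_even_calls(warmup_a, running_a, warmup_b, running_b):
    """Smallest call count at which A's total time is strictly below B's.

    Returns None if A never catches up.
    """
    if warmup_a < warmup_b:
        return 1  # A is already ahead on the first call
    if running_a >= running_b:
        return None  # A is never faster per call, so it cannot catch up
    # Need (calls - 1) > (warmup_a - warmup_b) / (running_b - running_a).
    extra = (warmup_a - warmup_b) / (running_b - running_a)
    return math.floor(extra) + 2


# Hypothetical timings in seconds: A compiles slowly but runs fast.
calls = break_even_calls(warmup_a=10.0, running_a=0.5,
                         warmup_b=1.0, running_b=1.0)
```

With these illustrative inputs, both totals are 19.0 s at 19 calls, and A is ahead from call 20 onward (19.5 s against 20.0 s).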

Takeaway

MindQuantum has a very low first-call overhead and is convenient for quick, single-shot simulations. TensorCircuit-NG has a heavier first call, but its compiled repeated evaluations can be substantially faster. For workloads such as VQE, where the same circuit is evaluated many times during optimization, TensorCircuit-NG's JIT-based workflow can provide better overall performance after the compilation cost is amortized.
