
Research entry

Building ising-rs: From Python Script to GPU-Accelerated Research Platform

9 March 2026 · 7 min read

The journey from a college Python script to a publication-grade Monte Carlo engine achieving 0.01% accuracy on the 3D critical temperature.

Tags: Rust, CUDA, Physics, Performance, Case Study

Why the Ising model?

Every magnet you have ever stuck to a fridge is performing a small miracle. Billions of atoms have somehow agreed to point the same direction, producing a field you can feel with your fingers. Heat that magnet past a threshold called the Curie temperature, and the coordination collapses. The magnet stops being a magnet.

The Ising model is the simplest mathematical framework that captures this behavior. Place a spin on every site of a lattice, let each spin point up or down, and define a single rule: neighbors want to align. That is the entire model. One interaction rule, and yet it produces the full richness of phase transitions, critical phenomena, and universality when you scale it up to millions of sites.
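That one rule can be written as an energy, E = -J * sum over neighbor pairs of s_i * s_j, which is lowest when neighbors agree. A minimal sketch on a 1D ring makes it concrete (illustrative code, not the engine itself):

```rust
/// Energy of a 1D ring of Ising spins with coupling J:
/// E = -J * sum over neighbor pairs of s_i * s_j.
fn energy(spins: &[i8], j: f64) -> f64 {
    let n = spins.len();
    -j * (0..n)
        .map(|i| (spins[i] as i32 * spins[(i + 1) % n] as i32) as f64)
        .sum::<f64>()
}

fn main() {
    let aligned = [1i8; 8];
    let alternating: Vec<i8> = (0..8).map(|i| if i % 2 == 0 { 1 } else { -1 }).collect();
    println!("{}", energy(&aligned, 1.0));     // aligned neighbors: lowest energy
    println!("{}", energy(&alternating, 1.0)); // frustrated neighbors: highest energy
}
```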

Understanding phase transitions matters far beyond refrigerator magnets. The same mathematics governs the liquid-gas transition, superconductor behavior, and even defect formation in the early universe. The Ising model is where all of that theory gets tested.

Starting with Python

In 2018 I wrote a Monte Carlo Ising simulation for a college statistical mechanics course. It was maybe 200 lines of Python with NumPy: a 2D square lattice, Metropolis single-spin flips, and a matplotlib animation that looked cool in a presentation. It worked. It was also painfully slow.

A 64x64 lattice took minutes to equilibrate. Anything three-dimensional was out of the question. Measuring critical exponents required running at multiple lattice sizes with thousands of equilibration sweeps per temperature point, and Python simply could not get through the inner loop fast enough. The simulation spent most of its time in interpreter overhead and cache-unfriendly array access patterns. I set the project aside.

Moving to Rust

Years later I came back to it with Rust. The appeal was straightforward: C-like performance with memory safety guarantees. Monte Carlo simulations flip spins billions of times and run for hours. A segfault at 3 AM means lost data. Rust eliminates that class of bug at compile time.

The first Rust implementation stored spins as a flat Vec<i8> (+1 or -1) with precomputed neighbor indices. This is the same adjacency-list representation whether the topology is cubic, triangular, or an arbitrary graph loaded from a file. The Metropolis algorithm drops in cleanly: pick a random spin, compute the energy change from its neighbors, accept or reject the flip.
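A minimal sketch of that step over the flat-array-plus-adjacency representation, with J = 1. The xorshift generator and the names here are illustrative stand-ins, not the actual ising-rs API:

```rust
// Tiny xorshift64 RNG, a stand-in for whatever generator the real engine uses.
struct XorShift(u64);
impl XorShift {
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
    fn next_usize(&mut self, n: usize) -> usize {
        (self.next_f64() * n as f64) as usize
    }
}

/// One Metropolis update: pick a random site, compute the energy change
/// from its precomputed neighbors, accept or reject the flip.
fn metropolis_step(spins: &mut [i8], neighbors: &[Vec<usize>], beta: f64, rng: &mut XorShift) {
    let i = rng.next_usize(spins.len());
    let field: i32 = neighbors[i].iter().map(|&j| spins[j] as i32).sum();
    let de = 2.0 * spins[i] as f64 * field as f64; // energy cost of flipping spin i
    if de <= 0.0 || rng.next_f64() < (-beta * de).exp() {
        spins[i] = -spins[i];
    }
}

/// Total energy; each bond appears twice in the site sum, hence the 0.5.
fn energy(spins: &[i8], neighbors: &[Vec<usize>]) -> f64 {
    -0.5 * spins.iter().enumerate()
        .map(|(i, &s)| s as f64 * neighbors[i].iter().map(|&j| spins[j] as f64).sum::<f64>())
        .sum::<f64>()
}

fn main() {
    // A 1D ring is the simplest adjacency list; any topology plugs in the same way.
    let n = 32;
    let neighbors: Vec<Vec<usize>> = (0..n).map(|i| vec![(i + n - 1) % n, (i + 1) % n]).collect();
    let mut rng = XorShift(0x9E3779B97F4A7C15);
    let mut spins: Vec<i8> = (0..n).map(|_| if rng.next_f64() < 0.5 { 1 } else { -1 }).collect();
    let e0 = energy(&spins, &neighbors);
    for _ in 0..10_000 {
        metropolis_step(&mut spins, &neighbors, 1e9, &mut rng); // effectively T = 0
    }
    println!("energy: {} -> {}", e0, energy(&spins, &neighbors));
}
```

At effectively zero temperature the energy can only decrease, which makes the sketch easy to sanity-check.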

On a single CPU core, the Rust version was already orders of magnitude faster than Python. A 3D cubic lattice with 20 sites per edge (8,000 spins) equilibrated in seconds instead of minutes. But near the critical temperature, a deeper problem emerged.

The Wolff breakthrough

Near the phase transition, the Metropolis algorithm suffers from critical slowing down. The autocorrelation time diverges as a power law with a dynamic exponent z of roughly 2.17 for the 3D Ising model. In practical terms, the number of sweeps needed to generate statistically independent configurations grows as a power of the system size as you approach the critical temperature, which is exactly where you need the most data.

The Wolff cluster algorithm attacks this directly. Instead of flipping one spin at a time, it grows a cluster of correlated spins using a breadth-first search and flips the entire cluster at once. The bond addition probability, 1 - exp(-2 * beta * J), is tuned so that the algorithm respects detailed balance while building physically meaningful clusters.
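A sketch of one Wolff update on the same adjacency-list representation, with J = 1 (an illustrative stand-in, not the engine's code; the xorshift RNG is again a placeholder):

```rust
use std::collections::VecDeque;

struct XorShift(u64);
impl XorShift {
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
    fn next_usize(&mut self, n: usize) -> usize {
        (self.next_f64() * n as f64) as usize
    }
}

/// Grow one cluster from a random seed by breadth-first search, adding each
/// aligned neighbor with probability 1 - exp(-2 * beta * J), then flip the
/// whole cluster. Returns the cluster size.
fn wolff_step(spins: &mut [i8], neighbors: &[Vec<usize>], beta: f64, rng: &mut XorShift) -> usize {
    let p_add = 1.0 - (-2.0 * beta).exp();
    let seed = rng.next_usize(spins.len());
    let seed_spin = spins[seed];
    let mut in_cluster = vec![false; spins.len()];
    let mut queue = VecDeque::new();
    in_cluster[seed] = true;
    queue.push_back(seed);
    while let Some(i) = queue.pop_front() {
        for &j in &neighbors[i] {
            if !in_cluster[j] && spins[j] == seed_spin && rng.next_f64() < p_add {
                in_cluster[j] = true;
                queue.push_back(j);
            }
        }
    }
    let mut size = 0;
    for (i, inc) in in_cluster.into_iter().enumerate() {
        if inc {
            spins[i] = -spins[i];
            size += 1;
        }
    }
    size
}

fn main() {
    // At very low temperature on an aligned ring, p_add -> 1 and the cluster
    // is the entire lattice, so one step flips every spin.
    let mut spins = vec![1i8; 8];
    let neighbors: Vec<Vec<usize>> = (0..8).map(|i| vec![(i + 7) % 8, (i + 1) % 8]).collect();
    let mut rng = XorShift(42);
    let size = wolff_step(&mut spins, &neighbors, 1e9, &mut rng);
    println!("flipped cluster of {} spins", size);
}
```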

The payoff is dramatic: the dynamic exponent drops from roughly 2.17 to about 0.33. Because the autocorrelation time scales as L^z, the gap between exponents compounds with lattice size; at the sizes we run, that translates to approximately 7x faster decorrelation near the critical temperature. Runs that previously needed 50,000 sweeps to produce independent samples now needed a few thousand.

Going to the GPU

Even with Wolff, publication-quality finite-size scaling demands large lattices (up to 48 or 64 sites per edge, meaning hundreds of thousands of spins) at dozens of temperature points. The GPU path opened this up.

The key insight is checkerboard decomposition. Color the 3D lattice like a chessboard: every “red” site has only “black” neighbors and vice versa. This means all red sites can be updated simultaneously in a single CUDA kernel launch, because none of them depend on each other. Then you flip to black sites and repeat. Two kernel launches equal one full Metropolis sweep, with thousands of threads working in parallel.
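The invariant that makes those kernel launches race-free is easy to verify on the CPU. A few lines of Rust (illustrative, separate from the CUDA code) check that on a periodic cubic lattice every nearest neighbor of a site has the opposite parity color:

```rust
/// Check the checkerboard property on an L^3 periodic cubic lattice:
/// color a site by the parity of x + y + z, and verify that all six
/// nearest neighbors have the opposite color.
fn is_checkerboard(l: usize) -> bool {
    let color = |x: usize, y: usize, z: usize| (x + y + z) % 2;
    for x in 0..l {
        for y in 0..l {
            for z in 0..l {
                let c = color(x, y, z);
                // the six periodic nearest neighbors
                for (nx, ny, nz) in [
                    ((x + 1) % l, y, z), ((x + l - 1) % l, y, z),
                    (x, (y + 1) % l, z), (x, (y + l - 1) % l, z),
                    (x, y, (z + 1) % l), (x, y, (z + l - 1) % l),
                ] {
                    if color(nx, ny, nz) == c {
                        return false;
                    }
                }
            }
        }
    }
    true
}

fn main() {
    println!("{}", is_checkerboard(4)); // even edge lengths preserve the coloring
    println!("{}", is_checkerboard(3)); // odd ones break it at the periodic boundary
}
```

The odd-size failure is worth knowing: with periodic boundaries, the checkerboard decomposition only works for even edge lengths, which the power-of-two-ish lattice sizes here satisfy.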

On an RTX 2060, this yields an estimated 50 to 200x speedup over single-core CPU Metropolis. Lattice sizes that would take hours on the CPU finish in seconds. Combined with parallel tempering (running multiple replicas at different temperatures and occasionally swapping configurations), the GPU backend makes it feasible to generate the high-statistics datasets needed for precision measurements.
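The replica-swap test in parallel tempering is tiny. This sketch uses the standard acceptance rule, min(1, exp((beta_i - beta_j) * (E_i - E_j))), which I'm assuming here rather than quoting from the codebase:

```rust
/// Probability of accepting a configuration swap between two replicas
/// at inverse temperatures beta_i, beta_j with energies e_i, e_j.
fn swap_probability(beta_i: f64, e_i: f64, beta_j: f64, e_j: f64) -> f64 {
    ((beta_i - beta_j) * (e_i - e_j)).exp().min(1.0)
}

fn main() {
    // A cold replica stuck at high energy always swaps with a hot replica
    // at low energy (the exponent is positive), while the reverse move is
    // exponentially suppressed.
    println!("{}", swap_probability(0.5, -40.0, 0.3, -60.0));
    println!("{:.3}", swap_probability(0.5, -60.0, 0.3, -40.0));
}
```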

Validation obsession

Speed means nothing if the physics is wrong. The validation pipeline has multiple layers.

First, the 2D square lattice. Onsager solved this exactly in 1944: the critical temperature is 2 / ln(1 + sqrt(2)), which comes out to 2.2692 in natural units. Our simulation reproduces this to better than 0.02%. If the code has a bug, it almost certainly shows up here first.
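The exact value is a one-liner to reproduce:

```rust
/// Onsager's exact 2D critical temperature, T_c = 2 / ln(1 + sqrt(2)),
/// in natural units (J = k_B = 1).
fn onsager_tc() -> f64 {
    2.0 / (1.0 + 2.0f64.sqrt()).ln()
}

fn main() {
    println!("{:.4}", onsager_tc());
}
```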

Second, exact enumeration on small lattices. For a 4x4 lattice you can enumerate all 2^16 = 65,536 configurations and compute observables exactly. The simulation output must match these exact values within statistical error.
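A self-contained sketch of that enumeration (J = 1, periodic boundaries; the function name is illustrative). Each bit of the configuration index is one spin, and each bond is counted once via the right and down neighbors:

```rust
/// Enumerate all 2^16 configurations of the periodic 4x4 lattice and
/// return (ground-state energy, exact mean energy at the given beta).
fn enumerate_4x4(beta: f64) -> (f64, f64) {
    let l = 4usize;
    let n = l * l;
    let (mut e_min, mut z, mut e_sum) = (f64::INFINITY, 0.0, 0.0);
    for config in 0u32..(1u32 << n) {
        let spin = |i: usize| -> i32 { if config >> i & 1 == 1 { 1 } else { -1 } };
        let mut e = 0i32;
        for y in 0..l {
            for x in 0..l {
                let i = y * l + x;
                e -= spin(i) * spin(y * l + (x + 1) % l);   // right bond
                e -= spin(i) * spin(((y + 1) % l) * l + x); // down bond
            }
        }
        let e = e as f64;
        if e < e_min { e_min = e; }
        let w = (-beta * e).exp(); // Boltzmann weight
        z += w;
        e_sum += e * w;
    }
    (e_min, e_sum / z)
}

fn main() {
    let (e_min, e_mean) = enumerate_4x4(0.4);
    println!("ground-state energy: {}", e_min); // 32 satisfied bonds -> -32
    println!("<E> at beta = 0.4: {:.4}", e_mean);
}
```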

Third, the fluctuation-dissipation theorem. The heat capacity computed from energy fluctuations, C = beta^2 * (<E^2> - <E>^2), must agree with the heat capacity computed from the numerical derivative of the energy curve. If these disagree, something is wrong with the observable measurement code. We caught a subtle bug this way: the susceptibility was originally computed using the absolute magnetization variance instead of the signed magnetization variance, which introduced a systematic positive bias.
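The fluctuation side of that check is only a few lines. A sketch with toy samples chosen so the answer is easy to verify by hand:

```rust
/// Heat capacity from energy fluctuations: C = beta^2 * (<E^2> - <E>^2).
fn heat_capacity(beta: f64, energies: &[f64]) -> f64 {
    let n = energies.len() as f64;
    let mean = energies.iter().sum::<f64>() / n;
    let var = energies.iter().map(|e| (e - mean) * (e - mean)).sum::<f64>() / n;
    beta * beta * var
}

fn main() {
    // Toy samples with mean -9 and variance 1, so C = 0.5^2 * 1 = 0.25.
    println!("{}", heat_capacity(0.5, &[-10.0, -8.0]));
}
```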

Achieving 0.01%

The headline result is the 3D critical temperature: T_c = 4.512(4) J/k_B, matching the best known reference value of 4.5115232 to within 0.01%. Getting there required three techniques working together.

Finite-size scaling extracts infinite-volume behavior from finite lattices. We run at eight lattice sizes from 8 to 48 and track how observables scale with system size. The susceptibility peak grows as L^(gamma/nu). The magnetization at T_c vanishes as L^(-beta/nu).

Binder cumulant crossings locate the critical temperature without assuming any exponent values. The Binder cumulant, defined as U = 1 - <M^4> / (3 * <M^2>^2), is scale-invariant exactly at T_c. Curves for different lattice sizes cross at the same point. This crossing is extremely sensitive to statistical noise, though. We learned the hard way that coarse temperature grids (steps of 0.05) and low sample counts produced wildly inaccurate crossings. Switching to a high-resolution grid with steps of 0.01 and 10,000 samples per point brought the crossing to 4.5121, right on target.
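Computing the cumulant from magnetization samples is a few lines (a sketch, not the engine's observable code). In the fully ordered limit every sample is ±1 and U -> 2/3, which makes a convenient spot check:

```rust
/// Binder cumulant U = 1 - <M^4> / (3 * <M^2>^2) from magnetization samples.
fn binder_cumulant(m: &[f64]) -> f64 {
    let n = m.len() as f64;
    let m2 = m.iter().map(|x| x * x).sum::<f64>() / n;
    let m4 = m.iter().map(|x| x.powi(4)).sum::<f64>() / n;
    1.0 - m4 / (3.0 * m2 * m2)
}

fn main() {
    // Fully ordered samples: <M^2> = <M^4> = 1, so U = 1 - 1/3 = 2/3.
    println!("{:.4}", binder_cumulant(&[1.0, -1.0, 1.0, 1.0]));
}
```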

Histogram reweighting (the Ferrenberg-Swendsen method) interpolates between simulated temperatures using the raw energy and magnetization samples. From 41 simulation temperatures, we reconstruct a smooth 200-point grid, sharpening the peak locations and crossing points without additional simulation time.
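A minimal single-histogram version of the reweighting formula, with the usual e_min shift to keep the exponentials from overflowing (toy samples and illustrative names; the real pipeline reweights full observable histograms):

```rust
/// Reweight an observable from samples taken at beta0 to a nearby beta:
/// <O>(beta) = sum O_i * w_i / sum w_i, with w_i = exp(-(beta - beta0) * E_i).
/// `samples` pairs each energy with the observable measured alongside it.
fn reweight(samples: &[(f64, f64)], beta0: f64, beta: f64) -> f64 {
    let e_min = samples.iter().map(|&(e, _)| e).fold(f64::INFINITY, f64::min);
    let (mut num, mut den) = (0.0, 0.0);
    for &(e, o) in samples {
        let w = (-(beta - beta0) * (e - e_min)).exp(); // shifted for stability
        num += o * w;
        den += w;
    }
    num / den
}

fn main() {
    let samples = [(-30.0, 0.8), (-28.0, 0.6), (-32.0, 0.9)];
    // At beta = beta0 the weights are all 1 and this is the plain sample mean.
    println!("{:.4}", reweight(&samples, 0.22, 0.22));
    // Shifting to lower temperature weights the low-energy samples more heavily.
    println!("{:.4}", reweight(&samples, 0.22, 0.25));
}
```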

Three universality classes

The Ising model is just one member of a family. By generalizing the spin from a scalar (up/down) to a vector, you access different universality classes that describe different physical systems.

The Ising model has a scalar spin and discrete Z2 symmetry. It describes uniaxial magnets and the liquid-gas transition. The XY model (O(2) symmetry) has a two-component spin vector and describes superfluid helium and thin-film magnets. The Heisenberg model (O(3) symmetry) has a three-component spin and describes isotropic ferromagnets like iron and nickel.

Each universality class has its own set of critical exponents. By extending the simulation to support vector spins, we can validate all three and study crossover behavior between them.

Materials science: iron and nickel

The Ising model is not just a toy. By fitting the exchange coupling constant J to match experimental Curie temperatures, we connect the simulation to real materials.

For BCC iron (body-centered cubic crystal structure, coordination number 8), we find a best-fit coupling of J = 14.3 meV, reproducing the experimental Curie temperature of 1043 K. For FCC nickel (face-centered cubic, coordination number 12), J = 5.6 meV gives a Curie temperature around 631 K, close to the measured 627 K.
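The fit itself is unit bookkeeping: with a dimensionless critical temperature t* = k_B * T_c / J measured on the lattice in question, the physical Curie temperature is T_c = t* * J / k_B. The t* value in this sketch is back-derived from the BCC iron numbers quoted above (J = 14.3 meV, T_c = 1043 K), not an independent literature value:

```rust
const KB_MEV_PER_K: f64 = 8.617e-2; // Boltzmann constant in meV/K

/// Physical Curie temperature from the fitted coupling (meV) and the
/// dimensionless critical temperature measured on the actual lattice.
fn curie_temperature(j_mev: f64, t_star: f64) -> f64 {
    t_star * j_mev / KB_MEV_PER_K
}

fn main() {
    let t_star_bcc = 6.29; // implied by the BCC iron fit above (an assumption)
    println!("{:.0} K", curie_temperature(14.3, t_star_bcc));
}
```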

The same “match your neighbors” rule, tuned with a single number per material, reproduces real-world magnetism. The simulation runs on the actual crystal geometry, not an idealized cubic lattice, using the generic adjacency-list representation that was built in from the start.

What comes next

Several research directions are open. Anisotropy crossover studies how the critical behavior changes when the exchange coupling is not the same in all directions, interpolating between the Ising and Heisenberg universality classes. A quantum bridge would connect the classical Monte Carlo results to quantum spin models via the Suzuki-Trotter decomposition. And the long-term goal is a publication in Physical Review E, packaging the full finite-size scaling analysis, Kibble-Zurek measurements, and materials fitting into a single coherent paper.

The code is open source. The full technical details, validation notebooks, and WASM browser demo are all available in the repository.
