Algorithms page

A list of my favo[u]rite or recommended algorithms for reference. This is by no means complete. Most of these are described in the Numerical Recipes books. I will make this clearer as time goes on. A lot of this is based on Numerical Recipes and their recommendations.

Linear/quadratic/harmonic problems

Linear algebra

Multidimensional extremization of linear functions on finite domain (linear programming): see NR.

(I have never needed to do this.)

Nonlinear/anharmonic problems

Multidimensional extremization

To find local extrema (maxima or minima) of f(X), where f is scalar and X is vector: (*) Here, instead of inverting, LU factorization is also acceptable. In Mathematica: 'Fit' only does fits to linear combinations of basis functions. 'FindFit' ..? `FindMinimum' and Statistics'NonlinearFit' works on general nonlinear functions with several variables and parameters (but the functions must be differentiable, because Methods are LevenbergMarquardt, Newton or QuasiNewton). 'Minimize' is an algebraic tool. 'NMinimize' is the most general tool (which can use the Nelder-Mead simplex method).

Multidimensional root-finding (based on Numerical Recipes)

look here for definition of root. To find the root(s) of F(X), where F and X are vectors of the same length:
  • If you can only evaluate F(X): use quasi-Newton(Broyden's secant method) with backtracking(line search) as fallback.
  • If you can also evaluate and invert the Jacobian J=grad F(X): use Newton-Raphson with backtracking(line search) as fallback.
It is always a good idea to use |F|^2 as a convergence criterion (so that we know when we have to backtrack).

Solving simultaneous equations P(X)=Q(X) is equivalent to finding the roots of the function F(X)=P(X)-Q(X).
This includes equations of the type X=Q(X); these are equivalent to finding the roots of F(X)=Q(X)-X;
They can sometimes be solved by fixed-point iteration X:=Q(X), but this is only linearly convergent, if at all. It corresponds to taking X:=X - F(X). It is better to use Newton's method, which takes X:=X - (grad F(X))^(-1) F(X), and is quadratically convergent. For example, instead of x:=cos(x), use x:=x-(cos x-x)/(-sin x-1), which converges much faster.

Extremization problems for continuous functions are related to root-finding problems. NR indicates that the latter are easier.
A root of grad f(X) sometimes corresponds to an extremum of f(X) (but sometimes it indicates a saddle-point instead).
An extremum of f(X) sometimes corresponds to a root of grad f(X) (but boundary extrema and singularities may not).
A root of F(X) always corresponds to a root, and a global minimum, of |F(X)|^2.
A minimum of |F(X)|^2 sometimes corresponds to a root of F(X) (but sometimes it is only a local minimum).

Global extremization for discrete systems (e.g., travelling salesman, ground state of spin glass) is usually very difficult.

Self-consistent mean field theories for magnetism (e.g., Weiss) or superconductivity (e.g., BCS/Eliashberg) are sometimes formulated as root-finding problems (solve the gap equation), and sometimes as extremization problems (minimize the Hubbard-Stratonovich free energy with respect to the order parameter). The latter approach is more appealing to me, when it is feasible.

Finding the ground state of a few-body quantum mechanical problem using variational methods is (basically) a quadratic extremization problem; finding excited states is an eigenproblem (diagonalization); finding Green functions G=(E-H)^(-1) is an inversion problem. These are all "linear" problems because the Schrodinger equation is linear.

Sampling

  • Discrete distributions: Walker's alias method
  • Continuous 1D distributions: inverse CDF method, rejection, continuous version of WAM
  • Multi-dimensional Gaussian distributions: can be done by sampling in the eigenbasis or some other suitable basis ...(???)and transforming --- for a general matrix, this requires that one is able to perform diagonalization (.... or is Cholesky factorization good enough?) I can't remember.
  • General high-dimensional distributions: Markov chain Monte Carlo
    • Simple Metropolis method can usually be improved upon ...
    • O(N) spin models: Cluster methods (Wolff etc) usually most efficient, but there are some systems to which they are not applicable
    • O(N) spin models with long-range interactions: Luijten-Blöte
    • Continuous variables: Multilevel Metropolis (if good level actions are known); otherwise, Hamiltonian Monte Carlo with preconditioning (Fourier acceleration)

Monte Carlo estimation of critical properties

  • Certain systems have duality relations which can be exploited ...
  • For order-disorder transitions, Binder parameter is probably the best general way to get Tc.
  • For 2D XY, it is said that Binder doesn't work. However, the helicity modulus (/helical modulus/spinwave stiffness/superfluid density) works as a kind of order parameter; one can apply finite-size scaling ... to try and find Tc.
  • Critical exponents: I haven't really scanned the literature yet. There is MCRG etc.

Estimating errors of correlated data (from Markov chain Monte Carlo simulations)

Data points from MCMC simulations are usually not independent. If one estimates the variance of the average assuming that the samples are independent, one obtains underestimates ...
  • Autocorrelation method: compute autocorrelation and use it to correct the variance of the average (upwards).
  • Binary reblocking scheme of Flyvbjerg and Petersen: take averages of several samples ("blocks"), compute variances of the blocks; large enough blocks - correlation is negligible, variance is true variance. Said to be quicker than autocorrelation method.
  • I use an `in-situ' implementation of binary reblocking, i.e., the block averages and variances are accumulated as the data is generated. This avoids the need to store large amounts of MCMC data with little information content. As a side benefit, in binary reblocking one is always combining quantities of comparable magnitude (instead of adding the 10001th sample to the sum of 1-10000), so no need to worry about loss of precision. (This is an advantage that the FFT also has over the simple-minded Discrete Fourier Transform.)

Integrating functions with singularities

Say we want to integrate a function f(x) that has a singularity at x=x_0 (where x can be a vector).
  • If the type, location, and amplitude of the singularity is known, one can construct an analytically integrable function g(x) that posses the same singularity; then, integrate g(x) analytically and (f(x)-g(x)) numerically, and combine. (Example: in spinwaves.nb)
  • If the type and location are known but the amplitude is unknown, then use a coordinate transformation to get rid of the singularity. (Example: integrating a function that behaves roughly as A/sqrt(1-x^2), where A is unknown: let x=cos theta, so that the Jacobian (sin theta) cancels out the integrable singularity)
  • If the type or location is unknown: --- Mathematica's integration routines can supposedly detect singularities automatically and handle them.

Sorting

  • Remember that while quicksort and mergesort are fun and O(N lg N), radix sort/bin sort is O(N) and generally beats them.
  • For similar reasons, binary trees, AVL trees and red-black trees are a fun way of maintaining sorted lists, but hash tables are probably more efficient in time (though they take more space). (Do I mean hash tables? Maybe I mean something else.)

Transforming probability distributions

If you have calculated a function f(x_i) at several points, e.g., the displacement of a string under a load, and wish to know its values f(x_j)at other points, you can use interpolation. There are various types of interpolation algorithms, of course; most of the polynomial interpolation schemes should be equivalent to using the Lagrange polynomial (see Numerical Recipes).

Now suppose you have the values of a probability DISTRIBUTION P(x_i) and wish to know P(x_j). Strictly speaking, it is bad practice to sample a distribution at discrete points; one should really not work with "P(x_i)" but with "P(x_i < x < x_j)"; the scheme below is even better for this case ... It is quite likely that the original values were normalized with respect to a set of quadrature weights, such that sum_i w_i P(x_i)=1. Now, if you simply interpolate to obtain P(x) and then P(x_j), this sum rule is no longer guaranteed to be exactly satisfied! It is safer to integrate the discrete distribution to obtain a cumulative distribution function C(x_i), interpolate this to get a continuous function C(x), sample it to get C(x_j), and take finite differences to get P(x_j).

If you want to make a change of independent variable, i.e., find P(y) such that P(x)dx=P(y)dy, you can use the above steps, noting that C(x)=C(y) for y=y(x).

Of course, this method doesn't work in more than 1D. I suppose the correct way to treat 2D distributions is to divide the domain into finite elements X_1,..., X_N, and store the integrated weight in each element W_i. Then, to find the weight in a new finite element Y_i, you would calculate the overlaps between Y_i and every X_j, and mix the weights accordingly ... yucks! There is a sparse matrix that describes this linear transformation. ... Presumably the engineers who do finite element analysis know all about this. ... That is:

   For continuous distributions:  Q(y) = P(x) dx/dy      (dx/dy=J(y) is Jacobian)
   For piecewise-uniform distributions represented by binned data:
      P_i = integral of P(x) over interval i
      Q_j = integral of Q(y) over interval j      
      Q_j = J_{ji} P_i
   where it is assumed that P(x) is constant over interval i.
   I think this is a smoothing transformation, which is non-invertible;
   don't re-bin your data too many times!