780.20: 1094 Session 14

Handouts: "Using the GDB Debugger", gdb reference sheet, printouts of multimin_sa_compare.cpp, deltashell_boundstates.cpp and check_primes.cpp, excerpt on simulated annealing

In this session, we'll try out the GDB debugger and consider a toy problem that compares standard downhill multidimensional minimization routines, which find local minima, to an application of Monte Carlo techniques that can find the global minimum: simulated annealing.


Using the GDB Debugger

We'll step through a contrived example that illustrates the basic commands and capabilities of a debugger (in this case, the GNU debugger gdb). We will use the command-line ("no windows") version of gdb. There are graphical interfaces (such as ddd) that are much nicer to use for more extensive debugging, but it will be worthwhile to start with the simple, primitive version.

  1. When debugging, you may find it convenient to have two terminal windows open: one to re-compile and link a sample code and another to run gdb in. (Actually, you can interact with gdb directly through emacs, but we won't go into that here.)
  2. Go through the example from the handout "Using the GDB Debugger". The code to debug is check_primes.cpp (a copy is also provided, called check_primes_orig.cpp, so that you can go back to the original if necessary). We've also created a makefile that doesn't use any of our usual warning flags (at first!).
  3. (BONUS) Try out the DDD interface to gdb by following through the sample_ddd_session.ps.gz handout included in session14.zip (you will need to spend more time than this to learn how to use DDD efficiently!).
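For reference, a bare-bones command-line gdb session looks roughly like this (the breakpoint location and the variable name are just placeholders; the handout walks through the real example):

    gdb check_primes      start gdb on the check_primes executable (built with the -g flag)
    break main            set a breakpoint at the entry to main
    run                   run the program until the breakpoint is reached
    next                  execute the next source line, stepping over function calls
    step                  execute the next source line, stepping into function calls
    print i               print the current value of a variable (here called i)
    continue              resume execution until the next breakpoint (or the end of the program)
    quit                  exit gdb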

Simulated Annealing

Standard optimization methods are very good at finding local minima near where the minimization was started, but not good at finding the global minimum. Finding the global minimum of a function (such as the energy) is often (but not always) the goal. One strategy using conventional minimizers is to start the minimization at many different places in the parameter space (perhaps chosen at random) and keep the best minimum found.

Here we'll adapt our Monte Carlo algorithm for generating a canonical Boltzmann distribution of configurations at a temperature T to mimic how physical systems find their ground states (i.e., the energy minimum at T=0). At high temperature (compared to characteristic energy spacings), the equilibrium distribution will include many states. If the system is cooled slowly, then it will have the opportunity to settle into the lowest energy state as T goes to zero. This is called annealing. If the system is cooled quickly, it can get stuck in a state that is not the minimum ("quenching"); this is analogous to the routines we looked at in Session 10, which rapidly go "downhill" to find local minima.

The strategy here is to simulate the annealing process by treating the function to be minimized as an energy (it might actually be an energy!), introducing an artificial temperature T, and generating a sequence of states in a canonical distribution via the Metropolis algorithm. Then we lower the temperature according to a "schedule" and let the system settle into (hopefully!) a configuration that minimizes the energy.
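
To make the algorithm concrete, here is a minimal sketch of one possible annealing loop for a one-dimensional energy function (this is not the implementation in multimin_sa_compare.cpp; the names, cooling schedule, and random-number generator are illustrative choices):

    #include <cmath>
    #include <random>

    double E (double x);    // energy function to be minimized (defined elsewhere)

    // Anneal from T_start down to T_final, multiplying T by cooling_factor (< 1)
    //  at each stage and making steps_per_T Metropolis trial steps at each T.
    double anneal (double x, double T_start, double T_final,
                   double cooling_factor, int steps_per_T, double max_step)
    {
      std::mt19937 rng (12345);                                // fixed seed for reproducibility
      std::uniform_real_distribution<double> uniform (0., 1.);

      for (double T = T_start; T > T_final; T *= cooling_factor)
      {
        for (int i = 0; i < steps_per_T; i++)
        {
          double x_trial = x + max_step * (2. * uniform (rng) - 1.);   // random trial step
          double delta_E = E (x_trial) - E (x);

          // Metropolis test: always accept a downhill step; accept an
          //  uphill step with probability exp(-delta_E/T)
          if (delta_E < 0. || uniform (rng) < exp (-delta_E / T))
          {
            x = x_trial;
          }
        }
      }
      return x;   // best guess for the location of the minimum
    }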

Global vs. Local Minimum

We first consider the simple example of a damped sine wave in one dimension, which has many local minima.
f(x) = e^(-(x-1)^2) sin(8x)

(This example is from the GSL documentation.) Suppose we start with an initial guess of 15.5; let's see how "downhill" minimization methods compare to simulated annealing.
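
In C++ this test function is a single line (shown here with the name E to match the sketch above; multimin_sa_compare.cpp may define it differently):

    double E (double x)    // damped sine wave with many local minima
    {
      return exp (-(x - 1.) * (x - 1.)) * sin (8. * x);
    }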

  1. Plot the function in Mathematica (see test_function.nb) so we know what to expect ahead of time (remember that in general we won't be able to do this!). Use FindMinimum starting near 15.5 to find the local minimum. Then find the global minimum (using the graph to identify a good starting value). Answers:


  2. The code multimin_sa_compare.cpp (compile and link with make_multimin_sa_compare) applies the GSL routines we used in a previous session to find the minimum and then applies a simple simulated annealing algorithm. Run the program and look at the code to figure out where the simulated annealing control parameters (starting and finishing temperatures, maximum step size, cooling rate, # of iterations at each temperature) are set. How well does each method work?



  3. Now adjust the simulated annealing control parameters to give it a chance! The step size should be chosen so that successful (accepted) steps are neither virtually assured nor too infrequent (an acceptance rate around 50% would be ideal!). The initial temperature should be considerably larger than the largest typical delta_E normally encountered. You shouldn't cool too rapidly. Make these adjustments and rerun the code. What did you do to get it to work best?



  4. Just to check that the Metropolis algorithm is doing something, once the code is working, change the line with "random < exp()" to "random > exp()" and verify that it no longer works.
  5. Bonus: Switch the code to use classes defined in previous sessions.

Optimization 101: Squaring a Number

One of the most common floating-point operations is to square a number. Two ways to square x are: pow(x,2) and x*x. Let's test how efficient they are.
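
The timing pattern in square_test.cpp looks roughly like the following sketch (simplified; the actual code and the value of repeat may differ):

    #include <cmath>       // for pow
    #include <ctime>       // for clock and CLOCKS_PER_SEC
    #include <iostream>

    int main ()
    {
      const int repeat = 100000000;     // repetitions so the elapsed time is measurable (adjust as needed)
      const double x = 3.14159;
      double y = 0.;

      clock_t start_time = clock ();    // start the clock
      for (int i = 0; i < repeat; i++)
      {
        y = pow (x, 2);                 // square using the pow function
      }
      clock_t end_time = clock ();      // stop the clock
      std::cout << "pow(x,2) time: "
                << double (end_time - start_time) / CLOCKS_PER_SEC
                << " seconds, final y = " << y << std::endl;

      start_time = clock ();
      for (int i = 0; i < repeat; i++)
      {
        y = x * x;                      // square by direct multiplication
      }
      end_time = clock ();
      std::cout << "x*x time:      "
                << double (end_time - start_time) / CLOCKS_PER_SEC
                << " seconds, final y = " << y << std::endl;

      return 0;
    }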

  1. Look at the printout for the square_test.cpp code. It implements these two ways of squaring a number. The "clock" function from time.h is used to find the elapsed time. Each operation is executed a large number of times (determined by "repeat") so that we get a reasonably accurate timing.
  2. We've set the optimization to its lowest value, -O0 ("minus oh zero"), to start in make_square_test.
  3. Compile square_test.cpp (using make_square_test) and run it. Adjust "repeat" if the minimum time is too small. Record the times. Which way to square x is more efficient?



  4. If you have an expression (rather than just x) to square, coding (expression)*(expression) is awkward and hard to read. Wouldn't it be better to call a function (e.g., squareit(expression))? Add to square_test.cpp a function:
    double squareit (double x)
    that returns x*x. Add a section to the code that times how long this takes (just copy one of the other timing sections and edit it appropriately, making sure to keep the "final y" cout statement). How does it compare to the others? What is the "overhead" in calling a function (that is, how much extra time does it take)? When is the overhead worthwhile?



  5. Another alternative, common from C programming: use #define to define a macro that squares a number. Add
    #define sqr(z) ((z)*(z))
    somewhere before the start of main. (The extra ()'s are safeguards against unexpected behavior; always include them!) Add a section to the code to time how long this macro takes; what do you find?



  6. One final alternative: add an "inline" function called square:
    inline double square (double x) { return (x*x); }
    This single line serves as both the prototype and the definition. Put it up top with the squareit prototype (see the sketch after this list for one possible layout). Add a section to the code to time how long this function takes. What is your conclusion about which of these methods to use?



  7. Finally, we'll try the simplest way to optimize a code: let the compiler do it for you! Change the compile flag -O0 (no optimization) to -O2 (that's the uppercase letter O, not a zero). Recompile and run the code. How do the times for each operation compare to the times before you optimized? What do you conclude?



  8. In your project programs, once they are debugged and running, you'll want to use the -O2 (or maybe -O3) optimization flag.
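
Putting steps 4-6 together, the top of square_test.cpp might be laid out roughly like this (a sketch only; the actual file may differ):

    #include <cmath>                                         // for pow

    #define sqr(z) ((z)*(z))                                 // C-style macro (note the ()'s)

    inline double square (double x) { return (x*x); }        // inline function: prototype and definition in one

    double squareit (double x);                              // prototype for the ordinary function

    // ... main () with the timing sections goes here ...

    double squareit (double x)                               // ordinary function definition
    {
      return (x*x);
    }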



Profiling

A "profiling" tool, such as gprof, allows you to analyze how the execution time of your program is divided among various function calls. This information identifies the candidate sections of code for optimization. (You don't want to waste time optimizing a part of the code that is only active for 1% of the run time!) We'll use the eigen_basis_class.cpp code from an earlier session as a guinea pig.

  1. To use gprof, compile and link the relevant codes with the -pg option. You can do this most easily by editing make_eigen_basis_class and adding -pg to BOTH the CFLAGS and LDFLAGS lines. (Note that make_eigen_basis_class has $(MAKEFILE) added to several lines to ensure that the codes are recompiled if the makefile changes.)
  2. Execute the program as usual (choose a fairly large basis size so that it takes a while to execute, building up statistics), which generates a file called gmon.out that is used by gprof. The program has to exit normally (e.g., you can't stop it with control-c) and any existing gmon.out file will be overwritten.
  3. Run gprof and save the output to a file (e.g., gprof.out):
    gprof eigen_basis_class > gprof.out
    Edit gprof.out and try to figure out from the "Flat profile" and the explanations where (i.e., in what functions) the program spends the most time. What are the most time-consuming functions? Would you try to speed up the part that finds the eigenvalues?



  4. Try profiling the square_test code (with -O0 optimization again). You might like to know in this case how much time each line uses, rather than each function. Try (after recompiling with -pg) the -l option:
    gprof -l square_test >! gprof.out
    Are the results consistent with the timings from the program?




780.20: 1094 Session 14. Last modified: 09:13 am, March 09, 2009.
furnstahl.1@osu.edu