780.20: 1094 Session 14
Handouts: "Using the GDB Debugger", gdb reference sheet,
printouts of multimin_sa_compare.cpp, deltashell_boundstates.cpp and
check_primes.cpp, excerpt on simulated annealing
In this session, we'll try out the GDB debugger and
consider a toy problem comparing multidimensional minimization with
standard downhill routines for local minima
to an application of Monte Carlo techniques to find
the global minimum:
simulated annealing.
Using the GDB Debugger
We'll step through a contrived example that illustrates the basic
commands and capabilities of a debugger (in this case, the GNU debugger
gdb). We will
use the command-line ("no windows") version of gdb. There are graphical
interfaces (such as ddd) that are much nicer to use for more extensive
debugging, but it will be worthwhile to start with the simple, primitive
version.
- When debugging,
you may find it convenient to have two terminal windows open:
one to re-compile and link a sample code and another to run gdb in.
(Actually, you can interact with gdb directly through emacs, but
we won't go into that here.)
- Go through the example from the handout "Using the GDB Debugger".
The code to debug is check_primes.cpp (a copy is also provided, called
check_primes_orig.cpp, so that you can go back to the original if
necessary).
We've also created a makefile that doesn't use any of our usual
warning flags (at first!).
- (BONUS) Try out the DDD interface to gdb by following through the
sample_ddd_session.ps.gz handout included in session14.zip
(you will need to spend more time than this to learn how to use DDD
efficiently!).
Simulated Annealing
Standard optimization methods are very good in
finding local minima near where the minimization was started, but not
good at finding the global minimum. Finding the global minimum
of a function (such as the energy) is often (but not always) the goal.
One strategy using conventional minimizers is to start the minimization
at many different places in the parameter space (perhaps chosen at
random) and keep the best minimum found.
Here we'll adapt our Monte Carlo algorithm for generating a canonical
Boltzmann distribution of configurations at a temperature T to mimic how
physical systems find their ground states (i.e., the energy minimum at
T=0). At high temperature (compared to characteristic energy
spacings), the equilibrium distribution will include many states. If
the system is cooled slowly, then it will have the opportunity
to settle into the lowest energy state as T goes to zero. This is
called annealing. If the system is cooled quickly, it can get
stuck in a state that is not the minimum ("quenching"); this is
analogous to the routines we looked at in Session 10, which rapidly go
"downhill" to find local minima.
The strategy here is to simulate the annealing process by treating the
function to be minimized as an energy (it might actually be an energy!),
introducing an artificial temperature T, generating a sequence of states
in a canonical distribution via the Metropolis algorithm.
Then we lower the temperature according to a "schedule" and let the
system settle into (hopefully!) a configuration that minimizes the
energy.
Global vs. Local Minimum
We first consider the simple example of a damped sine wave in one
dimension, which has
many local minima.
f(x) = e-(x-1)2sin(8x)
(This example is from the GSL documentation.) Suppose we start with an
initial guess of 15.5; let's see how
"downhill" minimization methods compare to
simulated annealing.
- Plot the function in Mathematica (see test_function.nb)
so we know what to expect ahead of
time (remember that in general we won't be able to
do this!). Use FindMinimum starting near 15.5 to find the local
minimum. Then find the global minimum (using the graph to identify
a good starting value).
Answers:
- The code multimin_sa_compare.cpp (compile and link with
make_multimin_sa_compare) applies the GSL routines we used in a
previous session to find the minimum and then applies a simple
simulated annealing algorithm. Run the program and look at the code to
figure out where the simulated annealing control parameters (starting
and finishing temperatures, maximum step size, cooling rate, # of
iterations at each temperature) are set.
How well does each method work?
- Now adjust the simulated annealing control parameters to give it a
chance! The step size should be large enough (or small enough)
so that successes are neither assured nor happen too infrequently
(50% would be ideal!).
The initial temperature should be considerably larger
than the largest typical delta_E normally encountered.
You shouldn't cool to rapidly. Make these adjustments and rerun the
code. What did you do to get it to work best?
- Just to check that the Metropolis algorithm is doing something,
once the code is working, change the line with "random < exp()"
to "random > exp()" and verify that it no longer works.
- Bonus: Switch the code to use classes defined in previous
sessions.
Optimization 101: Squaring a Number
One of the most common floating-point operations is to square a number.
Two ways to square x are: pow(x,2) and x*x. Let's test how efficient
they are.
- Look at the printout for the square_test.cpp code. It implements
these two ways of squaring a number. The "clock" function from
time.h is used to find the elapsed time. Each operation is executed
a large number of times (determined by "repeat") so that we get
a reasonably accurate timing.
- We've set the optimization to its lowest value, -O0
("minus oh zero"), to
start in make_square_test.
- Compile square_test.cpp (using make_square_test) and run it.
Adjust "repeat" if the
minimum time is too small.
Record the times. Which way to square x is more efficient?
- If you have an expression (rather than just x) to square,
coding (expression)*(expression) is awkward and hard to read.
Wouldn't it be better to call a function (e.g., squareit(expression)?
Add to square_test.cpp a function:
double squareit (double x)
that returns x*x. Add a section to the code that times how long
this takes (just copy one of the other timing sections and edit
it appropriately, making sure to keep the "final y" cout statement).
How does it compare to the others? What is the
"overhead" in calling a function (that is, how much extra time does
it take)? When is the overhead worthwhile?
- Another alternative, common from C programming:
use #define to define a macro
that squares a number. Add
#define sqr(z) ((z)*(z))
somewhere before the start of main.
(The extra ()'s are safeguards against unexpected behavior;
always include them!)
Add a section to
the code to time how long this macro takes; what do you find?
- One final alternative: add an "inline" function called square:
inline double square (double x) { return (x*x); };
that is a function prototype and the function itself.
Put it up top with the squareit prototype.
Add a section to
the code to time how long this function takes.
What is your conclusion about
which of these methods to use?
- Finally, we'll try the simplest way to optimize a code: let the
compiler do it for you! Change the compile flag -O0 (no
optimization) to -O2 (that's the uppercase
letter O, not a zero). Recompile and
run the code.
How do the times for each
operation compare to the times before you optimized?
What do you conclude?
- In your project programs, once they are debugged and running,
you'll want to use the -O2 (or maybe -O3) optimization flag.
Profiling
A "profiling" tool, such as gprof, allows you to analyze how the
execution time of your program is divided among various function calls.
This information identifies the candidate sections of code for
optimization. (You don't want to waste time optimizing a part of the
code that is only active for 1% of the run time!)
We'll use the eigen_basis_class.cpp code from an earlier
session as a guinea pig.
- To use gprof, compile and link the relevant codes with the -pg
option. You can do this most easily by editing
make_eigen_basis_class
and adding -pg to BOTH the CFLAGS and
LDFLAGS lines. (Note that make_eigen_basis_class has $(MAKEFILE) added to
several lines to ensure that the codes are recompiled if the makefile
changes.)
- Execute the program as usual (choose a fairly large basis size
so that it takes a while to execute, building up statistics), which
generates a file called gmon.out that is used by gprof.
The program has to exit normally (e.g., you can't stop it with
control-c) and any existing gmon.out file will be overwritten.
- Run gprof and save the output to a file (e.g., gprof.out):
gprof eigen_basis_class > gprof.out
Edit gprof.out and try to figure out from the "Flat profile"
and the explanations where (i.e., in what functions) the program spends
the most time. What are the most time-consuming functions?
Would you try to speed up the part that finds the
eigenvalues?
- Try profiling the square_test code (with -O0
optimization again). You might like to know in
this case how much time each line uses, rather than each function.
Try (after recompiling with -pg) the -l option:
gprof -l square_test >! gprof.out
Are the results consistent with the timings from the program?
780.20: 1094 Session 14.
Last modified: 09:13 am, March 09, 2009.
furnstahl.1@osu.edu