780.20: 2082 Session 15
Handouts: "Three-Dimensional Plots with Gnuplot",
"Using the GDB Debugger", printouts of eqheat.cpp,
check_primes.c, and square_test.cpp
Today we'll look at a variety of small topics relevant for
computational physics, particularly on Linux systems.
Your goals for this session:
- Learn about 3-d plotting with gnuplot.
- Step through an example of how to use gdb and (maybe)
try out DDD.
- Try out the Intel C++ compiler.
- Explore how to make codes run faster and how to profile them.
3-D Plots with Gnuplot
In the first quarter of 780.20, we frequently used Gnuplot for
visualization, but we only considered two-dimensional plots. Now we
want to make three-dimensional surface plots of functions and data.
- Follow through the handout on "Three-Dimensional Plots with Gnuplot".
- Figure out how to make a parametric plot of a sphere using
trigonometric functions. [Hint: To plot a 2-d circle, you would
use the following commands.]
gnuplot> set parametric
gnuplot> plot [0:2*pi] sin(t),cos(t)
- Take a look at the eqheat.cpp code and guess at what it is doing.
Compile and link it using the makefile make_eqheat (e.g., with
make -f make_eqheat) and then run it to generate
eqheat.dat. Look at eqheat.dat and then plot it with gnuplot, using
the comments in the code and the handout as guides. Interpret the
plot for your partner.
Using the GDB Debugger
We'll step through a contrived example that illustrates the basic
commands and capabilities of a debugger (in this case, gdb). We will
use the command-line ("no windows") version of gdb. There are graphical
interfaces (such as ddd) that are much nicer to use for more extensive
debugging, but it will be worthwhile to start with the simple, primitive
version.
- When debugging, you may find it convenient to have two terminal windows open:
one to re-compile and link a sample code and another to run gdb in.
(Actually, you can interact with gdb directly through emacs, but
we won't go into that here.)
- Go through the example from the handout "Using the GDB Debugger".
The code to debug is check_primes.c (a copy is also provided, called
check_primes_orig.c, so that you can go back to the original if
necessary).
We use a C code rather than a C++ code for the experience of seeing
the extra bugs you can get away with in C (the C++ compiler would
complain about several of the problems with the check_primes code).
- (BONUS) When you've got a working version of check_primes.c, copy it to
check_primes.cpp and convert it to C++ (including cin and cout).
- (BONUS) Try out the DDD interface to gdb by following through the
sample_ddd_session.ps.gz handout included in session15.tarz
(you will need to spend more time to learn how to use DDD).
Squaring a Number
One of the most common floating-point operations is to square a number.
Two ways to square x are: pow(x,2) and x*x. Which is more efficient?
Is there an efficient alternative?
- Look at the printout for the square_test.cpp code. It implements
these two ways of squaring a number. The "clock" function from
time.h is used to find the elapsed time. Each operation is executed
a large number of times (determined by "repeat") so that we get
a reasonably accurate timing.
- Compile, link, and run the code. Adjust "repeat" until the
minimum time is at least 0.1 seconds.
Which way to square x is more efficient?
- If you have an expression (rather than just x) to square,
coding (expression)*(expression) is awkward and hard to read.
Wouldn't it be better to call a function (e.g., squareit(expression))?
Add to square_test.cpp a function:
double squareit (double x)
that returns x*x. Add a section to the code that times how long
this takes (just copy one of the other timing sections and edit
it appropriately). How does it compare to the others? What is the
"overhead" in calling a function? When is the overhead worthwhile?
- Another alternative: use #define to define a macro
that squares a number. Add
#define sqr(z) ((z)*(z))
somewhere before the start of main.
(The extra ()'s are safeguards against unexpected behavior;
always include them!)
Add a section to
the code to time how long this macro takes.
- One final alternative: add an "inline" function called square:
inline double square (double x)
that is the same as squareit but uses the "inline" keyword.
Add a section to
the code to time how long this function takes. What is your conclusion about
which of these methods to use? (Record the times for each method
for comparison below to the Intel compiler.)
- Finally, we'll try the simplest way to optimize a code: let the
compiler do it for you! In the CFLAGS line of make_square_test,
change the compile flag -O0 (no optimization) to -O3 (that's the
uppercase letter O, not a zero). Recompile, link, and
run the code. (Note that $(MAKEFILE) was added to the line
with square_test.o to make sure that the program is
recompiled if the makefile is changed.) How do the times for each
operation compare to the times before you optimized?
- In your project programs, once they are debugged and running,
you'll want to use the -O3 optimization flag. Note that there are
other options you can learn about using man g++.
Using the Intel C++ Compiler
It's very useful to have more than one compiler available. The Intel
C++ compiler, which is called "icc", is particularly good (assuming you
are running on an Intel processor such as a Pentium 4).
- In order to access the Intel compiler and libraries, we need to
set some environment variables. These will be installation dependent,
but the same settings work for all of the physics machines. One way
to do this is to set them in your .bashrc file. Instead,
we'll take a shortcut and use the "module" program. Type the commands
indicated in the following steps.
- Check the available modules, then look at the help for one of them:
module avail
module help intel
- Check all of the environment variables and then just the
ones with "intel" in their names (in either case):
printenv
printenv | grep -i intel
- Now load the intel module and check again:
module load intel
printenv | grep -i intel
Now we're ready to use the compiler.
- Try the compiler on the square_test program. Copy
make_square_test to make_square_test_icc
and modify it as follows:
- Change the program name to square_test_icc;
- Change the compiler from g++ to icc;
- Use the compiler flags -g -O0;
- Eliminate the warning flags (for now).
Note that you can do all this by simply defining alternative
variables, so that it is easy to switch back and forth. (Or else
redefine the variables rather than deleting the initial definitions,
so you can switch back simply by changing the order.)
- Run the program and compare to the unoptimized g++ results.
- Now let's try optimization.
For icc, the options -O2 -tpp7 -xW provide very good
optimization. Try it!
For more information on other compiler flags to consider,
look at man icc or icc -help.
There are also optimized libraries, such as mkl_lapack.
- For g++, the -march=pentium4 compiler option (arch is
for "architecture") enables optimizations specific to the Pentium 4.
Profiling with gprof
A "profiling" tool, such as gprof, allows you to analyze how the
execution time of your program is divided among various function calls.
This information identifies the candidate sections of code for
optimization. (You don't want to waste time optimizing a part of the
code that is only active for 1% of the run time!)
We'll use the eigen_basis.cpp code from an earlier
session as a guinea pig.
- To use gprof, compile and link the relevant codes with the -pg
option. You can do this most easily by editing make_eigen_basis
and adding -pg to BOTH the CFLAGS and
LDFLAGS lines. (Note that make_eigen_basis has $(MAKEFILE) added to
two lines to ensure that the codes are recompiled if the makefile
is changed.)
- Execute the program as usual (choose a fairly large basis size
so that it takes a while to execute, building up statistics), which
generates a file called gmon.out that is used by gprof.
The program has to exit normally (e.g., you can't stop it with
control-c) and any existing gmon.out file will be overwritten.
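- Run gprof and save the output to a file (e.g., gprof.out):
gprof eigen_basis > gprof.out
(The form >! is csh/tcsh syntax for overwriting an existing file;
in bash, use > or, if noclobber is set, >| instead.)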
Edit gprof.out and try to figure out from the "Flat profile"
and the explanations where (i.e., in what functions) the program spends
the most time. Would you try to optimize the section that finds the
- Try profiling the square_test code. You might like to know in
this case how much time each line uses, rather than each function.
Try (after recompiling with -pg) the -l option:
gprof -l square_test > gprof.out
Are the results consistent with the timings from the program?
Last modified: 07:03 pm, March 06, 2005.