# 780.20: 2082 Session 15

Handouts: "Three-Dimensional Plots with Gnuplot", "Using the GDB Debugger", printouts of eqheat.cpp, check_primes.c, and square_test.cpp

Today we'll look at a variety of small topics relevant for computational physics, particularly on Linux systems.

• Learn about 3-d plotting with gnuplot.
• Step through an example of how to use gdb and (maybe) try out DDD.
• Try out the Intel C++ compiler.
• Explore how to make codes run faster and how to profile them.

## 3-D Plots with Gnuplot

In the first quarter of 780.20, we frequently used gnuplot for visualization but considered only two-dimensional plots. Now we will also want to make three-dimensional surface plots of functions and data.

1. Follow through the handout on "Three-Dimensional Plots with Gnuplot".
2. Figure out how to make a parametric plot of a sphere using trigonometric functions. [Hint: To plot a 2-d circle, you would type the following.]
```
gnuplot> set parametric
gnuplot> plot [0:2*pi] sin(t),cos(t)
```
3. Take a look at the eqheat.cpp code and guess at what it is doing. Compile and link it using make_eqheat and then run it to generate eqheat.dat. Look at eqheat.dat and then plot it with gnuplot, using the comments in the code and the handout as guides. Interpret the plot for your partner.

## Using the GDB Debugger

We'll step through a contrived example that illustrates the basic commands and capabilities of a debugger (in this case, gdb). We will use the command-line ("no windows") version of gdb. There are graphical interfaces (such as ddd) that are much nicer to use for more extensive debugging, but it will be worthwhile to start with the simple, primitive version.

1. When debugging, you may find it convenient to have two terminal windows open: one to re-compile and link a sample code and another to run gdb in. (Actually, you can interact with gdb directly through emacs, but we won't go into that here.)
2. Go through the example from the handout "Using the GDB Debugger". The code to debug is check_primes.c (a copy is also provided, called check_primes_orig.c, so that you can go back to the original if necessary). We use a C code rather than a C++ code for the experience of seeing the extra bugs you can get away with in C (the C++ compiler would complain about several of the problems with the check_primes code).
3. (BONUS) When you've got a working version of check_primes.c, copy it to check_primes.cpp and convert it to C++ (including cin and cout).
4. (BONUS) Try out the DDD interface to gdb by following through the sample_ddd_session.ps.gz handout included in session15.tarz (you will need to spend more time to learn how to use DDD efficiently!).

## Squaring a Number

One of the most common floating-point operations is to square a number. Two ways to square x are: pow(x,2) and x*x. Which is more efficient? Is there an efficient alternative?

1. Look at the printout for the square_test.cpp code. It implements these two ways of squaring a number. The "clock" function from time.h is used to find the elapsed time. Each operation is executed a large number of times (determined by "repeat") so that we get a reasonably accurate timing.
2. Compile, link, and run the code. Adjust "repeat" until the minimum time is at least 0.1 seconds. Which way to square x is more efficient?

3. If you have an expression (rather than just x) to square, coding (expression)*(expression) is awkward and hard to read. Wouldn't it be better to call a function (e.g., squareit(expression))? Add to square_test.cpp a function:
double squareit (double x)
that returns x*x. Add a section to the code that times how long this takes (just copy one of the other timing sections and edit it appropriately). How does it compare to the others? What is the "overhead" in calling a function? When is the overhead worthwhile?

4. Another alternative: use #define to define a macro that squares a number. Add
#define sqr(z) ((z)*(z))
somewhere before the start of main. (The extra ()'s are safeguards against unexpected behavior; always include them!) Add a section to the code to time how long this macro takes.
5. One final alternative: add an "inline" function called square:
inline double square (double x)
that is the same as squareit but uses the "inline" keyword. Add a section to the code to time how long this function takes. What is your conclusion about which of these methods to use? (Record the times for each method so that you can compare them with the Intel compiler results below.)

6. Finally, we'll try the simplest way to optimize a code: let the compiler do it for you! Change the compile flag -O0 (no optimization) to -O3 in the CFLAGS line in make_square_test (that's the uppercase letter O, not a zero). Recompile, link, and run the code (note that $(MAKEFILE) was added to the line with square_test.o to make sure that the program is recompiled if the makefile is changed). How do the times for each operation compare to the times before you optimized?

7. In your project programs, once they are debugged and running, you'll want to use the -O3 optimization flag. Note that there are other options you can learn about using man g++.

## Using the Intel C++ Compiler

It's very useful to have more than one compiler available. The Intel C++ compiler, which is called "icc", is particularly good (assuming you are running on an Intel processor such as a Pentium 4).

1. In order to access the Intel compiler and libraries, we need to set some environment variables. These are installation dependent, but the same settings work for all of the physics machines. One way to do this is to set them in your .bashrc file. Instead, we'll take a shortcut and use the "module" program. Type the commands shown below.
• Check available modules then look at the help for one of interest here:
module avail
module help intel
• Check all of the environment variables and then just the ones with "intel" in their names (with either case):
printenv
printenv | grep -i intel
• Now load the intel module and check again:
module load intel
printenv | grep -i intel
Now we're ready to use the compiler.
2. Try the compiler on the square_test program. Copy make_square_test to make_square_test_icc and modify it as follows:
• Change the program name to square_test_icc;
• Change the compiler from g++ to icc;
• Use the compiler flags -g -O0;
• Eliminate the warning flags (for now).
Note that you can do all this by simply defining alternative variables, so that it is easy to switch back and forth. (Or else redefine the variables rather than deleting the initial definitions, so you can switch back simply by changing the order.)
3. Run the program and compare to the unoptimized g++ results.

4. Now let's try optimization. For icc, the options -O2 -tpp7 -xW provide very good optimization. Try it! For more information on other compiler flags to consider, look at man icc or icc -help. There are also optimized libraries, such as mkl_lapack.

5. For g++, the -march=pentium4 compiler option (march is for "machine architecture") generates code optimized specifically for the Pentium 4.

## Profiling

A "profiling" tool, such as gprof, allows you to analyze how the execution time of your program is divided among various function calls. This information identifies the candidate sections of code for optimization. (You don't want to waste time optimizing a part of the code that is only active for 1% of the run time!) We'll use the eigen_basis.cpp code from an earlier session as a guinea pig.

1. To use gprof, compile and link the relevant codes with the -pg option. You can do this most easily by editing make_eigen_basis and adding -pg to BOTH the CFLAGS and LDFLAGS lines. (Note that make_eigen_basis has $(MAKEFILE) added to two lines to ensure that the codes are recompiled if the makefile changes.)
2. Execute the program as usual (choose a fairly large basis size so that it takes a while to execute, building up statistics), which generates a file called gmon.out that is used by gprof. The program has to exit normally (e.g., you can't stop it with control-c) and any existing gmon.out file will be overwritten.
3. Run gprof and save the output to a file (e.g., gprof.out). In bash:
gprof eigen_basis > gprof.out
(In tcsh with noclobber set, use >! instead of > to overwrite an existing file.)
Edit gprof.out and try to figure out from the "Flat profile" and the explanations where (i.e., in what functions) the program spends the most time. Would you try to optimize the section that finds the eigenvalues?

4. Try profiling the square_test code. You might like to know in this case how much time each line uses, rather than each function. Try (after recompiling with -pg) the -l option:
gprof -l square_test > gprof.out
Are the results consistent with the timings from the program?