*Handouts:* diffeq_test.cpp, diffeq_routines.cpp,
eigen_tridiagonal_class.cpp.

The Session 6 notes have an introduction to algorithms for
integrating differential equations. In this session,
we'll go over the basic ideas by
example, using the routines in session06.zip, in preparation for
looking at "Anharmonic Oscillations" and "Differential Chaos in Phase
Space".

*Your goals for this session:*

- Take a look at using a C++ class to "encapsulate" the GSL functions in the "eigen_tridiagonal" code.
- Have a first look at parallel processing with OpenMP.
- Run a code to integrate a simple first-order differential equation using Euler's and 4th-order Runge-Kutta algorithms, then modify it so you can analyze the errors.
- Extras: Add code for 2nd-order Runge-Kutta. Adapt the code to treat the 2nd order F=ma problem.

Please work in pairs (more or less). The instructors will bounce around 1094 and answer questions.

The code eigen_tridiagonal_class.cpp, together with GslHamiltonian.cpp and GslHamiltonian.h, is a rewrite of eigen_tridiagonal.cpp from Session 5. A C++ class called "Hamiltonian" hides all of the GSL function calls from the main program. Only minimal changes were made to the code, for clarity (so it is not optimal!).

- Look at the eigen_tridiagonal_class.cpp printout and compare
to the eigen_tridiagonal.cpp printout from Session 5. If this is
your first exposure to a C++ class (or if you forget what you used
to know), there will be confusing aspects. A
detailed guide to the implementation will be given
in the Session 7 notes. For now, identify what
has happened to each of the GSL function calls.
*In what ways is the new version better? (For experts, what would you do differently?)*

- Verify that the code still works. Try adding a loop
over the matrix dimension N.
*Does it work?*

- (Bonus)
*What additional functions might you define in the Hamiltonian class? What other classes might you define?*

OpenMP (not to be confused with Open MPI!) implements parallel processing
when there is *shared memory*, as is common these days with
multi-core hardware. If you have a dual-core processor that means that
in principle you can run your program twice as fast by running
two "threads" of your code
simultaneously on the two cores. If you have more cores available,
in principle the speed scales with the number of cores. One way to achieve this
in practice is to use OpenMP. We'll use a simple example written
by Chris Orban to illustrate how it works.
*Do this on Linux (or a Mac).*

- The 1094 machines each have multiple cores: give the command
`lscpu` (or `sysctl hw` on a Mac). *How many cores are there?*

- Look at the program
`simpson_cosint_openmp.cpp` in an editor or in the printed copy. The key OpenMP features are the omp include statement and the `#pragma omp` statement, which is an instruction to the compiler. *What part of the program is executed in parallel?*

- Let's compare using one and two threads. There is a built-in
timer in the program, but run the program with
`time ./simpson_cosint_openmp`, which will automatically print the CPU usage *and* the wall time after you run the program. Compile the program with g++ following the instructions in the comments, and then run it. *Record the "num_time", the CPU time (first number), the wall time (third number), and the percent of the CPU used (fourth number).*

Then change from two threads to one thread by modifying the `omp_set_num_threads` command in the code, recompile it, and re-run. *Record the numbers again. Why do you think the CPU time is about the same but the wall time differs? Is the parallelization working?*

- Now try to use all the cores you can.
*How well does the program scale? (E.g., compare (num_time for 1 core)/(num_time for n cores) to n.) Why is it not perfect scaling?*

- (BONUS) For completeness, you can try the Intel C++ compiler
(on Linux), following
the instructions in the top comment section of the program.
*Does it behave the same way as g++?*

The code diffeq_test.cpp calls the differential-equation-solving routines in diffeq_routines.cpp ("euler" and "runge4") to integrate a series of coupled differential equations (but we'll start with a single equation). Functions for the right-hand-side of a first order differential equation (dy/dt = rhs[y,t]) and the exact y(t) [called "exact_answer"] for a specified initial condition are also defined in this file. There is also a header file diffeq_routines.h.

- Look through the Session 6 Background Notes for a quick overview of differential equation solving.
- Download and unzip session06.zip.
- Use make_diffeq_test to compile and link diffeq_test. Run the
program to generate diffeq_test.dat and look at it in an editor.
The gnuplot plotfile diffeq_test.plt generates comparison plots
of the integrated
function from the output in diffeq_test.out.
Load this plotfile in gnuplot:

`gnuplot> load "diffeq_test.plt"`

and examine the result. *What can you conclude at this point?*

- Look at the printout for diffeq_test.cpp and diffeq_routines.cpp
and compare to the Session 6 notes to figure out what is going on.
The codes follow the
notation in the notes. At present there is only
one equation (first-order), so only y[0] is used.
*What is the differential equation being integrated?*

- Modify the diffeq_test.plt file to plot the
**relative** error at each value of t. (Modify the plot file and NOT the program; see the gnuplot handout on plot files for an example of how to do this.) As usual in studying errors, a log-log scale will be useful. The first point at t=0 may get in the way. Use "set xrange [?:?]" in gnuplot (where you fill in the ?'s) to avoid this problem. *What can you say qualitatively about the errors?*

- Now generate and plot results for a second value of h (your plot
should have both values of h, so think about how best to do this).
You'll want to use something like 1/10 the value, so it's easy to
check the effect (e.g., if the difference goes like $h^n$, then you'll see a factor of $10^n$, which is easy to see on a log-log plot). When the local error (for each step h) adds **coherently**, then the "accumulated" or "global" error for a given algorithm at $t_f$ scales like $N_f \times (\text{local error}) = (t_f/h) \times (\text{local error})$. You can verify for Euler's algorithm, for which the local error is $h^2$ (see notes and excerpt), that the global error is, in fact, proportional to $h$. *What is the local error for 4th-order Runge-Kutta according to the graph?*

- Next, check how the accumulated error at one value of t scales
with h for the two algorithms. Take t=1, for example. You'll need
to modify the code to output the results for y(t=1) for a range
of h's (think logarithmic!).
Some things to be careful of:
- Print out enough digits. For small h's, 9 digits is not enough (try 16).
- The most common problem is printing out y_rk4[0] and exact_answer(t,params_ptr) at two different points. (Note: it is not important that the t used is exactly the same for every h, but it must be the same for the exact and rk4 result for each h.) Look at your output file!

*Interpret them.*

*Given your results, how would you choose a step size for 4th-order Runge-Kutta?* (Hint: How do you explain the behavior of the error for small h?)

This exercise is intended to verify that you understand the meaning of the
Runge-Kutta algorithms and how they are implemented in C++.
*(Most likely for a future assignment.)*

- Add a third diffeq routine to the code, which implements the 2nd-order Runge-Kutta algorithm described in the Session 6 notes.
- Check the scaling of the error with h, i.e., is the error proportional
to a power of h?
*What is the power?*

- Try a different differential equation, such as $dy/dt = 2\cos(2\pi t)$.
*How do the errors compare to your first equation?*

Second-order differential equations are treated by writing them as two
coupled 1st-order equations, as described in the Session 6 notes.
We'll try this out for a simple example, which we'll generalize
later to look at chaotic behavior.
*(Most likely pushed to Session 7.)*

- Consider the differential equation for a simple, undriven
harmonic oscillator [e.g., a mass m on a spring with constant k:
$d^2x/dt^2 + (k/m)\,x = 0$]. *What is the general solution in terms of the initial position and velocity?*

*Rewrite the differential equation as two coupled 1st-order equations [for x and v=dx/dt].* Use units in which the oscillator mass and spring constant are equal to one. Take the initial position to be 1.0 and the initial velocity to be zero.

- Modify a copy of diffeq_test.cpp to use
the runge4 subroutine to calculate this oscillator for times t=0
to t=10.
You'll need to change N, modify rhs to consider both i==0 and i==1,
put the exact answer in
`exact_answer`, and change the limits of t appropriately.
- Plot the result and the exact result for comparison.
*What do you conclude?*

furnstahl.1@osu.edu