6810: 1094 Session 15
In this session, we'll try out some basic examples of parallel
processing using MPI. Two versions of the test codes are available:
one set using C++ classes in the main Session 15 directory and another
set using functions in the function_versions subdirectory.
Introduction to Parallel Processing with MPI
This is a very brief introduction by example to using "Open MPI", which is
an implementation of the "Message Passing Interface" or MPI (a library
and software standard). We've set things up so we can use
the computers in Smith 1094 running Linux as parallel processors.
Communication between machines is carried out via ssh but we'll
set it up so you don't need to enter passwords repeatedly.
If your computer is not in Linux already, restart it now.
In a message passing implementation of parallel processing, multiple
"tasks" are created, each with a unique identifying name, which run on
different computers (multiple processes can run on one machine as
well). The tasks talk to each other by sending and receiving messages.
A "message" is an array with elements that include data, identifiers
for the source and destination processes, and a tag used to distinguish
different types of messages.
Usually a fixed set of processes is created when the program is
initialized but the processes can be executing different programs
(this is referred to as "Multiple Instruction Multiple Data" or MIMD).
There are many functions defined in the MPI library. We'll look
at a few of the most important and heavily used functions here. See
the documentation list below for more information.
- Setting up ssh.
This procedure will let you login to all of the machines in 1094
without giving a password every time. (Skip any steps you've
already done before.)
- Generate the files ~/.ssh/id_rsa and
~/.ssh/id_rsa.pub with the command:
ssh-keygen -t rsa
(hit return at the "Enter file" request). When prompted,
enter a passphrase
(this is like a password, but generally longer; a good example
is "physics6810rocks" :), which you
will need to remember. Were id_rsa and id_rsa.pub created?
- Copy the contents of ~/.ssh/id_rsa.pub into the file
~/.ssh/authorized_keys
(you will have to create this file if it does not already exist).
Set the permissions with: chmod 600 authorized_keys
- Normally you would need to type ssh-agent $SHELL, but ssh-agent
should already be running (check with ps aux | grep agent).
If there is not a process called ssh-agent, then type
ssh-agent $SHELL.
- Give the command ssh-add. You will be prompted (just
this once!) for your passphrase. If successful, you will get
an "Identity added:" message.
- Check that you can ssh without a password by connecting
to at least four other computers (logging out each time; this only works
from your original computer, where you typed ssh-add).
The computer names in Smith 1094 start
with "sm1094" and end with a, b, c, ..., r. The first time you
ssh, you will be asked if you want to connect. Answer "yes"
and then you won't be asked again.
Did it work?
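The key-setup steps above can be sketched as shell commands. This sketch uses a temporary directory so it can be run safely anywhere; on the lab machines you would use ~/.ssh and let ssh-keygen prompt you for the passphrase instead of passing it with -N:

```shell
# Use a scratch directory for this sketch; on the lab machines use ~/.ssh.
KEYDIR=$(mktemp -d)

# Generate an RSA key pair.  Interactively you would just run
# "ssh-keygen -t rsa"; -N supplies the passphrase non-interactively here.
ssh-keygen -t rsa -N 'physics6810rocks' -f "$KEYDIR/id_rsa" -q

# Append the public key to authorized_keys (created if it doesn't exist).
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"

# Restrict permissions, or sshd will ignore the file.
chmod 600 "$KEYDIR/authorized_keys"

ls -l "$KEYDIR"
```

After copying authorized_keys into place under ~/.ssh, the ssh-add step stores the decrypted key with ssh-agent so that subsequent logins need no passphrase.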
- Setting up the environment.
You need to add "/usr/lib64/openmpi/bin" to your PATH (which is searched
to find programs) and
"/usr/lib64/openmpi/lib" to your LD_LIBRARY_PATH (which is searched to find libraries to link to).
You can set these by separate commands in your shell scripts,
but here we'll do it with the module program:
module load openmpi-x86_64
(give the command module avail to see all of the
available modules).
Check your shell with the command: printenv SHELL.
If you are using tcsh as your shell (most of you),
add the module command to your .cshrc.path file
(create this file in your main directory if it doesn't exist).
If you are using bash as
your shell, put the module command in your .bashrc.
To activate the module command the first time, type
source .cshrc.path (or source .bashrc). It will be automatically
sourced when you login (or ssh to another computer).
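Equivalently, if you prefer not to use the module command, the two environment variables can be set by hand in your startup file. In bash syntax (the paths are the ones quoted above):

```shell
# Add the Open MPI compiler wrappers and runtime to the search paths.
export PATH="/usr/lib64/openmpi/bin:$PATH"
export LD_LIBRARY_PATH="/usr/lib64/openmpi/lib:$LD_LIBRARY_PATH"

# Quick check: the Open MPI bin directory should now appear in PATH.
echo "$PATH"
```

In tcsh the equivalents are setenv PATH "/usr/lib64/openmpi/bin:$PATH" and similarly for LD_LIBRARY_PATH.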
- MPI Hello, World! program.
- Go to the "mpi_examples" directory and look at the
"make_mpi_hello_world" makefile. Note that it compiles the
"mpi_hello_world.cpp" code using mpiCC, which for us invokes
g++ as the compiler. We've also limited the warning flags.
- The "mpi_hello_world.cpp" code illustrates some of
the basics of using MPI
with a workstation cluster. Look through the printout and then
compile and link using the makefile. To run it, we will
create a list of machines to use in starting up processes.
Create a file called
"machines.1094" (or something like that, the actual name doesn't
matter) and put in a list of four computers to run on. Here is an
example (lines beginning with "#" are comments and don't need
to be included):
# Change this file to contain the machines that you want
# to run MPI jobs on. The format is one host name per line,
# with either just the hostname or "hostname slots=n",
# where n is the number of processors in an SMP. The hostname
# should be the same as the result from the command "hostname".
(We used sm1094a.mps.ohio-state.edu instead of sm1094a because that is
what is returned from "hostname").
To check whether a particular computer is available (i.e., turned on
and booted into Linux), for
example sm1094b, give the command "ssh sm1094b ps" (you'll either
get output on running processes, which means it is available, or a
"not available" response).
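Putting the pieces together, a complete machinefile for four machines might look like the following (the four hostnames are examples; substitute the machines you checked are available, and add "slots=n" after a hostname only if that host should run more than one process):

```
# MPI machine file for Smith 1094
sm1094a.mps.ohio-state.edu
sm1094b.mps.ohio-state.edu
sm1094c.mps.ohio-state.edu
sm1094d.mps.ohio-state.edu
```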
- Run the code with the command:
mpiexec -machinefile machines.1094 -np 4 mpi_hello_world
The "-np 4" option tells MPI to use four processes.
Note that the same program runs independently ("asynchronously") four
times (these are the four processes), on four different computers listed
in the machine file.
- Change the global array in the program to have 10 unique entries
and try running the code with 10 processes (but with the same
machine list that has only 4 computers listed). What happens?
Run it a few times; do you always get responses in the same order?
Do you gain anything by running more processes than computers?
Do you lose anything?
- Send and Receive.
In the program from the last section, there was no communication
between the processes running in parallel.
Here we use a program that demonstrates two of the
functions to send and receive messages (there are more functions!).
The program does nothing but send numbers back and forth, but it can
easily be generalized to more useful communication.
- Take a look at the "mpi_send_receive.cpp" printout.
The part of the code that does the work is quite small; most
of the code is taken up by error
checking and processing, which is implemented using C++'s
"try" and "catch" formalism.
You'll find the standard calls to Init, Get_rank, and Get_size at
the beginning, and Finalize at the end. The processes are divided
into the "master" process (rank 0), which coordinates the
calculation, and the "slave" processes (all the others).
Send and Receive functions are used to transmit data (in this
case, just an integer) between master and slaves.
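Stripped of the error checking, the core master/slave exchange looks roughly like the sketch below. This is an illustration, not the actual mpi_send_receive.cpp code: the tag value and the trivial "work" done by the slaves are assumptions, and since it uses the MPI C++ bindings it must be compiled with mpiCC and launched with mpiexec rather than run standalone:

```cpp
#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[]) {
  MPI::Init(argc, argv);
  int rank = MPI::COMM_WORLD.Get_rank();
  int size = MPI::COMM_WORLD.Get_size();
  int tag = 100;                 // tag labeling this message type (assumption)

  if (rank == 0) {               // master process coordinates
    for (int slave = 1; slave < size; slave++) {
      int test_int = 10;         // data to distribute
      MPI::COMM_WORLD.Send(&test_int, 1, MPI::INT, slave, tag);
    }
    for (int slave = 1; slave < size; slave++) {
      int reply;
      MPI::COMM_WORLD.Recv(&reply, 1, MPI::INT, slave, tag);
      std::cout << "master received " << reply
                << " from process " << slave << std::endl;
    }
  } else {                       // slave processes do the work
    int value;
    MPI::COMM_WORLD.Recv(&value, 1, MPI::INT, 0, tag);
    value *= 2;                  // stand-in for a real calculation
    MPI::COMM_WORLD.Send(&value, 1, MPI::INT, 0, tag);
  }
  MPI::Finalize();
  return 0;
}
```

Note how source, destination, and tag appear explicitly in each Send and Recv call; matching them up is what routes each message to the right place.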
- Compile and link the code with "make_mpi_send_receive",
set up a machinefile (e.g., machines.1094), and run with
four processors using
mpiexec -machinefile machines.1094 -np 4 mpi_send_receive.
Verify that the program works and try running with 20 processes.
How are the "tags" being used?
- Modify the code so that a different test integer
is sent to each slave process (e.g., send the test integer read from
the terminal added to
the rank of the slave process), and then have
the slave processes send back a double
(MPI::DOUBLE), rather than an integer, equal to the square root
of the integer sent.
Did it work?
- Try modifying the code so that the Master waits for each message
in turn from the slaves, rather than receiving from any source.
What did you do?
- Pick one code from the semester and describe how
you would use MPI Send and Receive to run it in parallel.
- Calculating pi in parallel.
Here we explore a program to calculate pi in parallel
using "collective communication" functions. This means that
information is distributed to and collected from the processes
all at once (rather than communicating with each one-to-one).
[Note: The infinite series used here
is not an efficient method to calculate pi!]
- Take a look at the "mpi_calculate_pi.cpp" printout.
Figure out from the code what series is being summed and how it
is being split among the processors.
Verify the sum using Mathematica.
- Compile and link the code with "make_mpi_calculate", set up
a machinefile (e.g., machines.pi), and run with four processors using
mpiexec -machinefile machines.pi -np 4 mpi_calculate_pi.
Try 1, 2, 3, and 10000 intervals.
When you quit you'll get a summary of elapsed time for each process.
- Now run with 10 processes and step through different numbers of intervals.
Do the processors report in order?
- Modify the code so that you get the time each processor takes
for one interval (instead of all at once).
How much faster
is it to calculate 100,000 intervals on 4 machines than 1?
- Modify the code so that it inputs an integer and calculates
the product of integers from one up to the input integer.
- More information on MPI.
Last modified: 07:15 am, April 08, 2013.