
Lecture 8-9 - MPI programming

Basic concepts

Launching MPI applications

An MPI application can be launched via the mpirun command. The application is launched on several specified nodes (hosts), with a specified number of instances on each node.

Each instance starts execution from the main() function and receives the same command-line arguments (those specified in the mpirun command). Each instance can retrieve the number of instances launched by the same mpirun command (via MPI_Comm_size()) and its own ID, called the rank (via MPI_Comm_rank()).
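For illustration, a possible invocation could look as follows (the flag names follow Open MPI and may differ in other MPI implementations; the node names and the program name are placeholders):

```shell
# hosts.txt lists the nodes and how many instances each may run:
#   node1 slots=2
#   node2 slots=2
# Launch 4 instances, spread over node1 and node2;
# every instance receives the same arguments arg1 arg2.
mpirun -np 4 --hostfile hosts.txt ./myapp arg1 arg2
```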

The MPI setup

The mpirun command normally uses ssh to connect to the remote nodes and launch the instances of the MPI application there. For that purpose, it is best to set up public-key authentication for SSH.
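A typical way to prepare such a setup (the user and node names are placeholders):

```shell
# Generate a key pair without a passphrase, so that mpirun can log in
# non-interactively, then install the public key on every node.
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
ssh-copy-id user@node1        # repeat for node2, node3, ...
```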

In addition, mpirun (actually, the remote daemon) must be able to find an executable with the same name and in the same location on each of the nodes. This can be achieved by setting up NFS or another networked file system; alternatively, it is enough simply to copy the executable to all of the nodes before invoking mpirun.

The nodes can have distinct architectures. In that case, the executables must, of course, be distinct: each node must have an executable built for its own architecture.

Basic API

An MPI program must call MPI_Init() in the beginning and MPI_Finalize() in the end.

Finding out the number of launched instances is done via MPI_Comm_size(), with MPI_COMM_WORLD as the communicator. Finding out one's own rank is done via MPI_Comm_rank().
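The calls above fit together as in this minimal program:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int size, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* number of launched instances */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* own rank, in 0..size-1 */
    printf("Instance %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
```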

Basic communication operations consist of sending and receiving an array of elements of the same type (an array of integers, an array of doubles, etc.).

There are two kinds of communications: point-to-point (between a pair of processes) and collective (involving all processes in a communicator).

The receive operation is MPI_Recv(). The send operations are: MPI_Ssend() (synchronous), MPI_Bsend() (buffered) and MPI_Send() (unspecified, implementation-defined).

See the example mpi1.cpp, where a first process sends a number to a second one, the second adds 1 and sends the result to the third one, and so on, until the last process adds 1 and prints the result.
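mpi1.cpp itself is not reproduced here; the following is a minimal sketch of the same idea (the initial value and the message tag are arbitrary choices):

```c
/* Pipeline: rank 0 sends a number to rank 1; each subsequent rank
   adds 1 and forwards it; the last rank adds 1 and prints the result. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int size, rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 0;                              /* initial value */
    } else {
        MPI_Recv(&value, 1, MPI_INT, rank - 1, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        value += 1;
    }
    if (rank < size - 1)
        MPI_Send(&value, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
    else
        printf("result: %d\n", value);

    MPI_Finalize();
    return 0;
}
```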

Other operations include collective communications, such as MPI_Bcast() (broadcast), MPI_Reduce(), MPI_Scatter() and MPI_Gather().

Another simple example

Add all the numbers in a vector. See the solution in sum-mpi.cpp.

Another solution is given in sum-scatter-mpi.cpp. This one uses MPI_Scatter() and MPI_Gather() to distribute input data to workers and to gather the results.
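sum-scatter-mpi.cpp is not reproduced here; a minimal sketch of the scatter/gather approach (the data values and the chunk size are made up) could look like this:

```c
/* Root scatters equal chunks of the input, each process sums its
   chunk, root gathers the partial sums and adds them up. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int size, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int chunk = 4;                 /* elements per process */
    double *data = NULL, local[4], partial, *partials = NULL;
    if (rank == 0) {
        data = malloc(size * chunk * sizeof(double));
        for (int i = 0; i < size * chunk; i++) data[i] = i + 1;
        partials = malloc(size * sizeof(double));
    }
    MPI_Scatter(data, chunk, MPI_DOUBLE, local, chunk, MPI_DOUBLE,
                0, MPI_COMM_WORLD);
    partial = 0;
    for (int i = 0; i < chunk; i++) partial += local[i];
    MPI_Gather(&partial, 1, MPI_DOUBLE, partials, 1, MPI_DOUBLE,
               0, MPI_COMM_WORLD);
    if (rank == 0) {
        double sum = 0;
        for (int i = 0; i < size; i++) sum += partials[i];
        printf("sum = %g\n", sum);
        free(data); free(partials);
    }
    MPI_Finalize();
    return 0;
}
```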

Algorithm design issues

Divide and conquer algorithms

Example: distributed merge-sort. The input is scattered among the processes, each process sorts its part locally, and the sorted runs are then merged pairwise up a binary tree.
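A sketch of such a distributed merge-sort (assuming, for simplicity, that the number of processes is a power of two and divides the input size; the input values are made up):

```c
/* Scatter, sort locally, then merge up a binary tree: at step s,
   ranks divisible by 2^(s+1) receive the sorted run of the neighbor
   2^s away and merge it with their own. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

static void merge(const int *a, int na, const int *b, int nb, int *out) {
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb) out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
}

int main(int argc, char *argv[]) {
    int size, rank, n = 16;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int chunk = n / size, *data = NULL;
    if (rank == 0) {
        data = malloc(n * sizeof(int));
        for (int i = 0; i < n; i++) data[i] = rand() % 100;
    }
    int *local = malloc(chunk * sizeof(int));
    MPI_Scatter(data, chunk, MPI_INT, local, chunk, MPI_INT,
                0, MPI_COMM_WORLD);
    qsort(local, chunk, sizeof(int), cmp_int);

    int have = chunk;
    for (int step = 1; step < size; step *= 2) {
        if (rank % (2 * step) == 0) {
            /* receive the neighbor's run (same length as ours) and merge */
            int *buf = malloc(have * sizeof(int));
            MPI_Recv(buf, have, MPI_INT, rank + step, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            int *merged = malloc(2 * have * sizeof(int));
            merge(local, have, buf, have, merged);
            free(buf); free(local);
            local = merged;
            have *= 2;
        } else {
            MPI_Send(local, have, MPI_INT, rank - step, 0, MPI_COMM_WORLD);
            break;                    /* this rank's work is done */
        }
    }
    if (rank == 0) {
        for (int i = 0; i < n; i++) printf("%d ", local[i]);
        printf("\n");
        free(data);
    }
    free(local);
    MPI_Finalize();
    return 0;
}
```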

Interesting algorithms — Cannon's matrix multiplication

Idea (consider, for simplicity, the product of two square matrices NxN, to be computed on KxK processors, where K divides N): each processor holds one (N/K)x(N/K) block of each matrix. The computation proceeds in K steps; at each step, every processor multiplies its current pair of blocks and adds the result to its accumulator, then the blocks of the first matrix are shifted circularly one position to the left along the rows, and the blocks of the second matrix one position up along the columns.

Note that the initial distribution of blocks must be done in such a way that each process has two blocks that it has to multiply together. This involves some circular permutations of rows or columns of blocks.
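A sketch of the K steps, using a Cartesian communicator; the initial skew implements the circular permutations mentioned above (the block contents, block size B and helper names are illustrative, and matrix-mpi.cpp may differ):

```c
#include <mpi.h>
#include <math.h>

#define B 2   /* block size N/K, assumed */

/* c += a * b for B x B blocks stored row-major */
static void matmul_acc(const double *a, const double *b, double *c) {
    for (int i = 0; i < B; i++)
        for (int k = 0; k < B; k++)
            for (int j = 0; j < B; j++)
                c[i*B + j] += a[i*B + k] * b[k*B + j];
}

int main(int argc, char *argv[]) {
    int size, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int K = (int)(sqrt((double)size) + 0.5);   /* assume size == K*K */

    /* Arrange the processes in a K x K torus. */
    int dims[2] = {K, K}, periods[2] = {1, 1};
    MPI_Comm grid;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &grid);
    MPI_Comm_rank(grid, &rank);
    int coords[2], left, right, up, down, src, dst;
    MPI_Cart_coords(grid, rank, 2, coords);
    MPI_Cart_shift(grid, 1, 1, &left, &right);  /* row neighbors */
    MPI_Cart_shift(grid, 0, 1, &up, &down);     /* column neighbors */

    double a[B*B], b[B*B], c[B*B] = {0};
    /* ... fill a and b with this process's blocks of A and B ... */
    for (int i = 0; i < B*B; i++) { a[i] = 1.0; b[i] = 1.0; }

    /* Initial skew: shift row i of A-blocks left by i and column j of
       B-blocks up by j, so each process holds a matching pair. */
    MPI_Cart_shift(grid, 1, -coords[0], &src, &dst);
    MPI_Sendrecv_replace(a, B*B, MPI_DOUBLE, dst, 0, src, 0,
                         grid, MPI_STATUS_IGNORE);
    MPI_Cart_shift(grid, 0, -coords[1], &src, &dst);
    MPI_Sendrecv_replace(b, B*B, MPI_DOUBLE, dst, 0, src, 0,
                         grid, MPI_STATUS_IGNORE);

    /* K steps: multiply local blocks, then shift A left and B up by 1. */
    for (int step = 0; step < K; step++) {
        matmul_acc(a, b, c);
        MPI_Sendrecv_replace(a, B*B, MPI_DOUBLE, left, 0, right, 0,
                             grid, MPI_STATUS_IGNORE);
        MPI_Sendrecv_replace(b, B*B, MPI_DOUBLE, up, 0, down, 0,
                             grid, MPI_STATUS_IGNORE);
    }
    /* c now holds this process's block of the product. */
    MPI_Finalize();
    return 0;
}
```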

See the implementation in matrix-mpi.cpp.
Radu-Lucian LUPŞA
2016-12-11