HPC - Universidad de Sevilla

Programming with the Message Passing Interface (MPI)

What is MPI
Implementations
A first program with MPI
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int myid, np, nlen;
    char name[MPI_MAX_PROCESSOR_NAME];

    // initialize MPI and get information about the world and ourselves
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);     // total number of processes
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);   // rank of this process (0 .. np-1)
    MPI_Get_processor_name(name, &nlen);    // name of the node this process runs on

    printf("Process %d of %d is running on %s\n", myid, np, name);
    fflush(stdout);

    MPI_Finalize();
    return 0;
}
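
The program can be compiled with the MPI compiler wrapper and started with the MPI launcher (assuming Open MPI or MPICH, and that the source file is named hello.c; both names are only examples):

  mpicc -o hello hello.c
  mpirun -np 4 ./hello

Each process prints its own line, so the order of the output can differ between runs.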

Exercise

Running on different nodes

MPI starts processes on remote hosts through a remote shell, typically ssh, so you must be able to log into the remote hosts without entering a password. Usually the easiest way to set up a cluster is to share the file system between the nodes (the same files are visible to every user on each node). ssh can then be set up to authenticate with key pairs inside the cluster.
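
With OpenSSH and a shared home directory, one way to do this is roughly the following (a sketch only; paths and options may differ on your system):

  ssh-keygen -t rsa                                 # create a key pair (empty passphrase)
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   # authorize the key on all nodes (shared $HOME)
  chmod 600 ~/.ssh/authorized_keys
  ssh scadm02                                       # should now log in without a password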

To specify on which nodes of the cluster the program will run, we prepare a hosts file listing those nodes:

  scadm01
  scadm02
  ...
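
The program is then started with this file passed to the MPI launcher. With Open MPI, for example (the file name hostsfile and the process count are only examples; MPICH's mpiexec uses -f instead of --hostfile):

  mpirun -np 4 --hostfile hostsfile ./hello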

Exercise:

Communications

The performance of an HPC system depends heavily on the speed with which data can be exchanged between the nodes. Technologies in common use at the time of writing (2015) include Gigabit/10 Gigabit Ethernet and InfiniBand.

For a nice comparison and other insights, see the Interconnect Analysis.

When MPI processes run on the same node, the MPI implementation uses the node's shared memory as an efficient communication layer instead of the network.
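
This can be observed with the pingpong benchmark written below by choosing where its two processes run. A sketch with Open MPI, using the node names from the hosts file above (the binary name pingpong is only an example):

  mpirun -np 2 --host scadm01,scadm01 ./pingpong   # both ranks on one node: shared memory
  mpirun -np 2 --host scadm01,scadm02 ./pingpong   # one rank per node: interconnect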

PingPong

We will now write a small program to benchmark point-to-point communication in our cluster.


The heart of the program is

other = 1 - myid;                  // rank of the partner process (0 <-> 1)

MPI_Barrier(MPI_COMM_WORLD);       // make sure both processes start together
if (myid == 0)
  secs = MPI_Wtime();              // rank 0 keeps the time
for (n = 0; n < niter; n++) {
  if (myid == 0) {
    // rank 0 sends the buffer and waits for it to come back
    MPI_Send(buffer, nmax, MPI_INT, other, 0, MPI_COMM_WORLD);
    MPI_Recv(buffer, nmax, MPI_INT, other, 0, MPI_COMM_WORLD, &status);
  }
  else {
    // rank 1 receives the buffer and sends it back
    MPI_Recv(buffer, nmax, MPI_INT, other, 0, MPI_COMM_WORLD, &status);
    MPI_Send(buffer, nmax, MPI_INT, other, 0, MPI_COMM_WORLD);
  }
}
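
After the loop, rank 0 can stop the timer and turn the elapsed time into a latency and bandwidth figure. A possible sketch, reusing the variables of the fragment above (secs is assumed to be a double; the printed format is only an example):

if (myid == 0) {
  secs = MPI_Wtime() - secs;                 // total time for niter round trips
  double tmsg   = secs / (2.0 * niter);      // average one-way time per message
  double mbytes = (double)nmax * sizeof(int) / 1.0e6;
  printf("%d ints: %g s per message, %g MB/s\n", nmax, tmsg, mbytes / tmsg);
}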

Exercise:

I have run this benchmark on the cluster; see the results.

Further reading: