Sending and Receiving with Halo Exchanges
From our examination of the illustrations for column-major and row-major languages, we conclude that we should split our two-dimensional grid by columns for Fortran and similar languages, and by rows for C++, Python, and others with that array orientation.
Let us assume for both cases that we have nr rows and nc columns in our grid. In a one-dimensional decomposition, we split only one of those values among the processes; call the local number of rows and columns nrl and ncl. When the boundary rows and columns (“ghost” or real) are added, each subdomain will contain nrl+2 rows and ncl+2 columns. Our computed domains will extend from $1$ to $nrl$ and $1$ to $ncl$, with boundaries at $0$ and at $nrl+1$ and $ncl+1$. The two exchanges, one in each direction, will require two Sendrecv invocations.
C++ and Python
For these languages $nrl = nr \div nprocs$ and $ncl=nc$. We will send the first computed row up to row $nrl+1$ of the process at $rank-1$ and the last computed row down to row $0$ of the process at $rank+1$.
C++ syntax:
MPI_Sendrecv(&w[1][1], nc, MPI_DOUBLE, up, tag,
             &w[nrl+1][1], nc, MPI_DOUBLE, down, tag, MPI_COMM_WORLD, &status);
MPI_Sendrecv(&w[nrl][1], nc, MPI_DOUBLE, down, tag,
             &w[0][1], nc, MPI_DOUBLE, up, tag, MPI_COMM_WORLD, &status);
Note that although the variable w by itself is a pointer, a specific element such as w[1][1] is not (the memory location is dereferenced), so its address must be passed to MPI with the & operator in C++.
Python syntax:
comm.Sendrecv([w[1,1:ncl+1],MPI.DOUBLE], up, tag, [w[nrl+1,1:ncl+1],MPI.DOUBLE], down, tag)
comm.Sendrecv([w[nrl,1:ncl+1],MPI.DOUBLE], down, tag, [w[0,1:ncl+1],MPI.DOUBLE], up, tag)
Fortran
For Fortran $nrl=nr$ and $ncl= nc \div nprocs$. We send the first computed column left to column $ncl+1$ of the process at $rank-1$ and the last computed column right to column $0$ of the process at $rank+1$.
call MPI_SENDRECV(w(1:nr,1),     nr, MPI_DOUBLE_PRECISION, left,  tag, &
                  w(1:nr,ncl+1), nr, MPI_DOUBLE_PRECISION, right, tag, &
                  MPI_COMM_WORLD, mpi_stat, ierr)
call MPI_SENDRECV(w(1:nr,ncl),   nr, MPI_DOUBLE_PRECISION, right, tag, &
                  w(1:nr,0),     nr, MPI_DOUBLE_PRECISION, left,  tag, &
                  MPI_COMM_WORLD, mpi_stat, ierr)
MPI_PROC_NULL
Rank $0$ has no neighbor to the top or left, and rank $nprocs-1$ has no neighbor to the bottom or right. We might think that we would have to write a complicated set of conditionals to handle these cases, but this is a common scenario and MPI provides a built-in solution: MPI_PROC_NULL.
The special value MPI_PROC_NULL can be used in place of a source or destination. It results in a “no-op”; the send or receive is not attempted and the function returns immediately.
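For example, here is a minimal sketch (separate from the exercise code below) in which the endpoint ranks are simply given MPI_PROC_NULL as a neighbor. The Sendrecv calls then look the same on every rank; where the partner is MPI_PROC_NULL, that half of the exchange is skipped.
#include <iostream>
#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    // The first and last ranks have no neighbor in one direction.
    int up   = (rank == 0)          ? MPI_PROC_NULL : rank - 1;
    int down = (rank == nprocs - 1) ? MPI_PROC_NULL : rank + 1;

    // Exchange one value with each neighbor. Where the partner is MPI_PROC_NULL
    // the send or receive is skipped, so the -1. placeholders remain.
    double myval = rank, from_up = -1., from_down = -1.;
    int tag = 0;
    MPI_Status status;
    MPI_Sendrecv(&myval, 1, MPI_DOUBLE, up,   tag,
                 &from_down, 1, MPI_DOUBLE, down, tag, MPI_COMM_WORLD, &status);
    MPI_Sendrecv(&myval, 1, MPI_DOUBLE, down, tag,
                 &from_up,   1, MPI_DOUBLE, up,   tag, MPI_COMM_WORLD, &status);

    std::cout << "Rank " << rank << " got " << from_up << " from above and "
              << from_down << " from below\n";

    MPI_Finalize();
    return 0;
}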
Exercise
Use the language of your choice. Write a program that fills an array w with the uniform value of 50 in the inner grid (1 to nr, 1 to nc globally, i.e. 1 to nrl and 1 to ncl on each rank), where nr=nc=500. Set the values of row 0 to 0, and set the values of row nrl+1 and of columns 0 and ncl+1 to 100. Do not worry about whether values at corners match.
Set up the transfer and run one exchange of the halo values of w. Print some values to check your result.
Review strong and weak scaling, and try both with different runs of your code. For strong scaling, make sure that the number of processes evenly divides the number of rows or columns.
Hint: Set up the neighbors for each rank (“up” and “down” or “left” and “right”) before doing any transfers.
C++
#include <iostream>
#include <mpi.h>
using namespace std;
int main (int argc, char *argv[]) {

    int i, j;
    int N;
    double topBC=100;
    double bottomBC=0.;
    double edgeBC=100.;

    // Added for MPI
    int nr, nrl, nc, ncl;
    int rank, nprocs;
    MPI_Status status;
    int root=0, tag=0;
    int up, down;

    //Initialize MPI
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD,&rank);

    //Usually we would read in nr and nc; they represent the global grid RxC.
    //For this case they are both N.
    N=500;
    nr=N;
    nc=N;

    //Use ncl for clarity
    ncl=nc;

    //Weak scaling
    //nrl=nr;

    //Strong scaling
    if (nr%nprocs!=0) {
        MPI_Finalize();
        cout<<"Not an even division of the arrays.\n";
        return 0;
    }
    else {
        nrl=nr/nprocs;
    }

    //Find my neighbors
    if (rank==0) {
        up=MPI_PROC_NULL;
    }
    else {
        up=rank-1;
    }
    if (rank==nprocs-1) {
        down=MPI_PROC_NULL;
    }
    else {
        down=rank+1;
    }

    //Two extra rows and columns for boundary conditions
    int nrows=nrl+2;
    int ncols=ncl+2;
    double **w=new double*[nrows];
    double *wptr=new double[nrows*ncols];
    for (i=0;i<nrows;++i,wptr+=ncols)
        w[i] = wptr;
    if (rank==0) {
        for (int i=0;i<=ncl+1;++i){
            w[0][i]=bottomBC;
        }
    }
    if (rank==nprocs-1) {
        for (int i=0;i<=ncl+1;++i){
            w[nrl+1][i]=topBC;
        }
    }
    for (int i=0;i<=nrl+1;++i){
        w[i][0]=edgeBC;
        w[i][ncl+1]=edgeBC;
    }

    //Initialize interior only
    for ( i = 1; i <= nrl; i++ ) {
        for (j = 1; j <= ncl; j++ ) {
            w[i][j] = 50.;
        }
    }

    MPI_Sendrecv(&w[1][1],    ncl, MPI_DOUBLE, up,   tag,
                 &w[nrl+1][1], ncl, MPI_DOUBLE, down, tag, MPI_COMM_WORLD, &status);
    MPI_Sendrecv(&w[nrl][1],  ncl, MPI_DOUBLE, down, tag,
                 &w[0][1],    ncl, MPI_DOUBLE, up,   tag, MPI_COMM_WORLD, &status);

    //Spot-check results
    for (i=0;i<nprocs;++i) {
        if (i==rank) {
            cout<<i<<" "<<w[0][ncl/2]<<" "<<w[nrl+1][ncl/2]<<endl;
        }
    }

    MPI_Finalize();
}
Fortran
program halo
    use mpi
    double precision, allocatable, dimension(:,:) :: w
    integer :: N=500
    double precision :: topBC, bottomBC, edgeBC
    integer :: i
    integer :: nr, nc, nrl, ncl
    integer :: rank, nprocs, tag=0
    integer :: errcode, ierr
    integer, dimension(MPI_STATUS_SIZE) :: mpi_stat
    integer, parameter :: root=0
    integer :: left, right

    !Initialize MPI, get the local number of columns
    call MPI_INIT(ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD,nprocs,ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
    !Usually we'd read in nr and nc; they represent the global grid size RxC.
    !For this case they are both N.
    nr=N
    nc=N

    !Generally we'd just use nr and not duplicate; this is for clarity later
    nrl=nr

    ! weak scaling
    !ncl = nc

    ! strong scaling
    if (mod(nc,nprocs).ne.0) then
        call MPI_FINALIZE(ierr)
        stop 'Not an even division of the array'
    else
        ncl = nc/nprocs
    endif

    ! Find my neighbors
    if ( rank .eq. root ) then
        left = MPI_PROC_NULL
    else
        left = rank - 1
    endif
    if ( rank .eq. nprocs-1 ) then
        right = MPI_PROC_NULL
    else
        right = rank + 1
    endif

    allocate(w(0:nrl+1,0:ncl+1))

    ! Boundary conditions
    topBC=0.d0
    bottomBC=100.d0
    edgeBC=100.d0

    !Set boundary conditions
    if (rank==0) then
        w(:,0)=edgeBC
    endif
    if (rank==nprocs-1) then
        w(:,ncl+1)=edgeBC
    endif
    w(nrl+1,:)=bottomBC
    w(0,:)=topBC

    ! Initialize the array interior
    w(1:nrl,1:ncl)=50.

    ! Exchange halo values
    call MPI_SENDRECV(w(1:nrl,1),     nrl, MPI_DOUBLE_PRECISION, left,  tag, &
                      w(1:nrl,ncl+1), nrl, MPI_DOUBLE_PRECISION, right, tag, &
                      MPI_COMM_WORLD, mpi_stat, ierr)
    call MPI_SENDRECV(w(1:nrl,ncl),   nrl, MPI_DOUBLE_PRECISION, right, tag, &
                      w(1:nrl,0),     nrl, MPI_DOUBLE_PRECISION, left,  tag, &
                      MPI_COMM_WORLD, mpi_stat, ierr)

    !Spot-check result
    do i=0,nprocs-1
        if (i==rank) then
            write (*,*) rank, w(nrl/2,0), w(nrl/2,ncl+1)
        endif
    enddo

    call MPI_FINALIZE(ierr)

end program
Python
import sys
import numpy as np
from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nprocs = comm.Get_size()
N = 500
#Strong scaling
if N%nprocs==0:
    nrl = N//nprocs
else:
    print("Number of ranks should divide the number of rows evenly.")
    sys.exit()
#Weak scaling
#nrl = N
#Both
ncl = N
w = np.zeros((nrl+2, ncl+2), dtype=np.double)
#Set up boundaries
if rank == 0 :
    w[0,:] = 0.       # up
if rank == nprocs-1 :
    w[nrl+1,:] = 100. # bottom
w[:,0] = 100. # left
w[:,ncl+1] = 100. # right
#Arbitrary value for the interior.
#Be sure not to overwrite boundaries.
w[1:nrl+1,1:ncl+1] = 50.
# setting up the up and down rank for each process
if rank == 0 :
    up = MPI.PROC_NULL
else :
    up = rank - 1
if rank == nprocs - 1 :
    down = MPI.PROC_NULL
else :
    down = rank + 1
tag=0
# sending up and receiving down
comm.Sendrecv([w[1,1:ncl+1],MPI.DOUBLE], up, tag, [w[nrl+1,1:ncl+1],MPI.DOUBLE], down, tag)
# sending down and receiving up
comm.Sendrecv([w[nrl,1:ncl+1],MPI.DOUBLE], down, tag, [w[0,1:ncl+1],MPI.DOUBLE], up, tag)
# Spot-check result
for n in range(nprocs):
    if n==rank:
        print(n,w[0,ncl//2],w[nrl+1,ncl//2])
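These solutions can be checked with a small number of processes. With most MPI installations a command along the lines of mpiexec -n 4 ./halo (or mpiexec -n 4 python halo.py for the Python version, substituting whatever executable or file name you used) will run them; the launcher name and options vary by MPI distribution. For the strong-scaling variants, remember that the process count must divide 500 evenly, so 4 or 10 works but 3 does not.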