MPI Buffers for Arrays

So far in our examples we have only discussed sending scalar buffers. In computing, a scalar is a variable that holds only one quantity. The exact meaning of array varies from one programming language to another, but in all cases it refers to a variable that represents several quantities, each of which can be individually accessed by some form of subscripting the array.

To understand array communications we must consider how MPI buffers are specified. The first argument to a send or receive is a pointer to a location in memory. In C++ this is explicit: unless the variable is already a pointer, its address must be passed with the & operator. Fortran always passes arguments by reference, so nothing is required beyond the variable name. The mpi4py bindings arrange for the buffer argument to be handled as a pointer, so, as in Fortran, only the variable name is required.
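
For instance, a minimal C++ sketch of sending a scalar; dest and tag are assumed to be defined elsewhere:

// For a scalar, pass its address explicitly with the & operator.
double x = 3.14;
MPI_Send(&x, 1, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);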

For a scalar variable, the pointer is to the location of that variable in memory. For an array, the pointer is to the first element of the section of the array to be sent, which may be all or only part of the array. The item count is the number of items to be sent, and the MPI type specifies the number of bytes per item. From this information the MPI library computes the total number of bytes to be put into or taken out of the buffer. The library reads that many bytes starting at the initial memory location; it pays no attention to any indexing of that block of bytes. This is why it is extremely important that the send and receive buffers match up appropriately; in particular, if the message sent is longer, in bytes, than the receive buffer, the receive is erroneous, and most MPI libraries will report a truncation error and, with the default error handler, abort the program.
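
As an illustration, here is a hedged C++ sketch of sending only part of an array; the pointer is to the first element of the section and the count covers four elements (dest, source, tag, and status are assumed to be declared elsewhere):

// Send four doubles starting at u[2]; MPI sees only the starting address and the count.
MPI_Send(&u[2], 4, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD);
// On the receiving rank, the buffer must have room for at least four doubles.
MPI_Recv(w, 4, MPI_DOUBLE, source, tag, MPI_COMM_WORLD, &status);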

Consider an example where each rank computes an array u and exchanges it with a neighboring rank, receiving the neighbor's array into a buffer w. We will show the syntax only for MPI_Sendrecv, since it should be clear how to break it into individual Send and Recv calls if desired; a sketch of that split follows the C++ example.

In the C++ example we create the arrays with the new operator.

double* u=new double[nelem]{0};
double* w=new double[nelem]{0};
//Fill u with something
MPI_Sendrecv(u, nelem, MPI_DOUBLE,neighbor,sendtag,
             w, nelem, MPI_DOUBLE,neighbor,recvtag,MPI_COMM_WORLD,&status);
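
For comparison, here is a minimal sketch of the same exchange split into blocking MPI_Send and MPI_Recv calls, using the variables from the snippet above plus the rank; the ordering shown (even ranks send first, odd ranks receive first) is one way to avoid deadlock:

if (rank % 2 == 0) {
    MPI_Send(u, nelem, MPI_DOUBLE, neighbor, sendtag, MPI_COMM_WORLD);
    MPI_Recv(w, nelem, MPI_DOUBLE, neighbor, recvtag, MPI_COMM_WORLD, &status);
} else {
    MPI_Recv(w, nelem, MPI_DOUBLE, neighbor, recvtag, MPI_COMM_WORLD, &status);
    MPI_Send(u, nelem, MPI_DOUBLE, neighbor, sendtag, MPI_COMM_WORLD);
}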

Normally in C++ an array variable is already a pointer, or decays to one, so it is passed to the MPI routines without the address-of operator.
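
As a further illustration, a fixed-size array behaves the same way, since the array name decays to a pointer to its first element (equivalent to &v[0]); v here is a hypothetical local array:

double v[10] = {0};
// The array name v is already a pointer to the first element, so no & is needed.
MPI_Send(v, 10, MPI_DOUBLE, neighbor, sendtag, MPI_COMM_WORLD);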

The Fortran and Python versions are straightforward.

! In the nonexecutable part we declare u and w. They can be allocatable or static.
! Fill in u with some values
call MPI_Sendrecv(u,nelems,MPI_DOUBLE_PRECISION,neighbor,sendtag,              &
                  w,nelems,MPI_DOUBLE_PRECISION,neighbor,recvtag,              &
                                                   MPI_COMM_WORLD,status,ierr)

Python

u=np.zeros(nelems)
w=np.zeros(nelems)
#fill in u with some values
comm.Sendrecv([u,MPI.DOUBLE],neighbor,0,[w,MPI.DOUBLE],neighbor,0,MPI.Status())

Exercise

Use the above syntax for your language to write a complete program that implements the exchange described above. Fill u with

u[i] = 20. + i*rank          (C++ and Python)
u(i) = 20. + (i-1)*rank      (Fortran; the shift to 1-based indexing makes the answers the same)

C++

#include <iostream>
#include "mpi.h"

using namespace std;

int main (int argc, char *argv[]) {

  int nelem=10;

  int rank, nprocs;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);

  if (nprocs < 2) {
     cout<<"This program works only for at least two processes\n";
     MPI_Finalize();
     return 0;
  }
  else if (nprocs%2 != 0) {
     cout<<"This program works only for an even number of processes\n";
     MPI_Finalize();
     return 0;
  }


  double* u=new double[nelem]{0};
  double* w=new double[nelem]{0};

  for (int i=0; i<nelem; ++i) {
      u[i]=20.+i*rank;
  }

  int neighbor;
  int sendtag, recvtag;

  if (rank%2==0) {
     neighbor = rank+1;
     sendtag=1;
     recvtag=2;
  }
  else {
     neighbor = rank-1;
     sendtag=2;
     recvtag=1;
  }

  // With MPI_Sendrecv the call is the same for even and odd ranks;
  // the library pairs each send with the matching receive.
  MPI_Sendrecv(u, nelem, MPI_DOUBLE, neighbor, sendtag,
               w, nelem, MPI_DOUBLE, neighbor, recvtag, MPI_COMM_WORLD, &status);

  for (int i=0; i<nelem; ++i) {
     cout<<rank<<" "<<i<<" "<<u[i]<<" "<<w[i]<<endl;
  }

  MPI_Finalize();

  return 0;

}

Fortran

program exchange
   use mpi
   implicit none

   double precision, allocatable, dimension(:) :: u, w
   integer :: nelems
   integer :: i
    
   integer :: rank, nprocs, neighbor, ierr
   integer :: sendtag, recvtag
   integer :: status(MPI_STATUS_SIZE)

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

   if (nprocs < 2) then
        write(6,*) "This program works only for at least two processes"
        call MPI_Finalize(ierr)
        stop
   else if ( mod(nprocs,2) /= 0 ) then
        write(6,*) "This program works only for an even number of processes"
        call MPI_Finalize(ierr)
        stop
   end if

   nelems=10
   allocate(u(nelems),w(nelems))
   u=0.
   w=0.

   do i=1,nelems
      !Adjust for 1 base only so Fortran gets same answer as C++ and Python
      u(i)=20.+(i-1)*rank
   enddo

   if ( mod(rank,2)==0 ) then
       neighbor = rank+1
       sendtag=1
       recvtag=2
   else
       neighbor = rank-1
       sendtag=2
       recvtag=1
   end if

   ! With MPI_Sendrecv the call is the same for even and odd ranks
   call MPI_Sendrecv(u,nelems,MPI_DOUBLE_PRECISION,neighbor,sendtag,            &
                     w,nelems,MPI_DOUBLE_PRECISION,neighbor,recvtag,            &
                                                      MPI_COMM_WORLD,status,ierr)

   do i=1,nelems
       write(*,'(i5,i8,2f12.4)') rank, i, u(i), w(i)
   enddo

   call MPI_Finalize(ierr)

end program


Python

import sys
import numpy as np
from mpi4py import MPI

comm=MPI.COMM_WORLD
nprocs=comm.Get_size()
rank=comm.Get_rank()

if nprocs<2:
    print("This program works only for at least two processes.")
    sys.exit()
elif nprocs%2!=0:
    print("This program works only for an even number of processes.")
    sys.exit()

nelems=10

u=np.zeros(nelems)
w=np.zeros(nelems)

for i in range(nelems):
    u[i]=20.+i*rank

if rank%2==0:
    neighbor=rank+1
else:
    neighbor=rank-1

comm.Sendrecv([u,MPI.DOUBLE],neighbor,0,[w,MPI.DOUBLE],neighbor,0,MPI.Status())

for i in range(nelems):
    print(f"{rank}  {i}  {u[i]}  {w[i]}")
