MPI Project Set 3: Jacobi Implementation
Project 7
Implement a serial Jacobi code to solve the two-dimensional Poisson equation with boundary conditions as described in our example, namely
u[0,:]=0.
u[nr+1,:]=100.
u[:,0]=100.
u[:,nc+1]=100.
Do not assume a square grid, but you may use one for testing, such as 500x500. Write your output to a file in your preferred format.
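As a reminder, the Jacobi update replaces each interior point by the average of its four nearest neighbors, and the iteration stops once the largest change anywhere in the interior falls below a chosen tolerance epsilon:
w[i,j] = ( u[i-1,j] + u[i+1,j] + u[i,j-1] + u[i,j+1] ) / 4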
Suggestions.
It is usually not necessary to check for convergence at every iteration. Make the checking interval a variable and experiment with different values; examine whether it affects the time required for a run.
Python programmers: once your code is working, try to use NumPy array operations with appropriate ranges to replace the double for loop. It should speed up the code substantially. (A short sketch illustrating both of these suggestions appears just before the plotting script below.)
Fortran programmers: when writing double do loops over array indices, whenever possible arrange the loops in order of the indices from right to left, for cache efficiency. Correct loop ordering can be approximately a factor of 50 faster than incorrect ordering for a double loop.
Use whatever plotting package you know to make a contour plot of the result. If you do not have a preference, you may use contour.py below. It reads files starting with a specified base name, followed by any number of digits (including none), assembles them into one image, and shows a contour plot; if given the command-line option -f it will transpose each subimage for Fortran.
When plotting, the top of an array (row 0) is the bottom of the plot.
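As an illustration of the first two suggestions, here is a minimal Python sketch (not a full solution; the grid size, tolerance, and the variable name diff_interval are placeholder choices) that checks for convergence only every diff_interval iterations and uses a NumPy array operation in place of the double for loop:
import numpy as np

nr, nc = 500, 500            # interior grid size (placeholder)
epsilon = 1.0e-3             # convergence tolerance (placeholder)
diff_interval = 10           # test convergence only every 10 iterations
max_iter = 10000000

u = np.zeros((nr+2, nc+2))   # old values, including boundary rows/columns
w = np.zeros((nr+2, nc+2))   # new values
u[nr+1,:] = 100.; u[:,0] = 100.; u[:,nc+1] = 100.   # boundary conditions (row 0 stays 0)

iterations = 0
diff = 10.*epsilon           # anything >= epsilon, just to enter the loop
while diff >= epsilon and iterations <= max_iter:
    # vectorized Jacobi update replaces the double loop over i and j
    w[1:-1,1:-1] = 0.25*(u[:-2,1:-1] + u[2:,1:-1] + u[1:-1,:-2] + u[1:-1,2:])
    if iterations % diff_interval == 0:
        diff = np.max(np.abs(w[1:-1,1:-1] - u[1:-1,1:-1]))
    u[1:-1,1:-1] = w[1:-1,1:-1]
    iterations += 1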
Python script to contour output
import sys
import os
import argparse
import re
import numpy as np
import matplotlib.pyplot as plt
parser = argparse.ArgumentParser()
parser.add_argument("-f", "--fortran", help="Fortran ordering", action="store_true")
parser.add_argument("filename", help="Base name for data files")
args = parser.parse_args()
base = args.filename
files = [f for f in os.listdir('.') if re.match(base+r"\d*", f)]
files.sort()
subdomains = []
for file in files:
    data = np.loadtxt(file, unpack=False)
    if args.fortran:
        # Transpose each subdomain for Fortran (column-wise) output
        data = data.T
    subdomains.append(data)
if args.fortran:
    image = np.hstack(subdomains)
else:
    image = np.vstack(subdomains)
fig = plt.figure()
plt.contourf(image)
plt.colorbar()
plt.show()
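For example, if the script is saved as contour.py and your program wrote a file named solution (or files solution0, solution1, ... from the parallel codes in Project 8), the invocation might look like the following; the file names here are only placeholders:
python contour.py solution
python contour.py -f solution
with -f used for Fortran output.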
Example solutions
C++
#include <iostream>
#include <cmath>
#include <cstdio>
#include <ctime>
#include <string>
#include <sstream>
#include <fstream>
#include <iomanip>
#define MAX_ITER 10000000
void set_bcs(double **u, int nr, int nc);
using namespace std;
int main (int argc, char *argv[]) {
int i, j;
double diff; // Change in value
double mean; //Average boundary value
int N, M;
int iterations = 0;
int diffInterval;
double epsilon;
/* check number of parameters and read in epsilon, filename, optionally size */
if (argc < 3) {
printf ("USAGE: %s epsilon output-file <N> <M>\n", argv[0]);
return 1;
}
else {
stringstream ssepsilon;
ssepsilon<<argv[1];
if (ssepsilon.fail()) {
printf ("Error reading in epsilon value\n");
return 2;
}
ssepsilon>>epsilon;
if (argc==3) {
N=500;
M=500;
}
if (argc==4) {
stringstream ssnr;
ssnr<<argv[3];
if (ssnr.fail()) {
printf ("Error converting row dimension \n");
return 2;
}
ssnr>>N;
M=N;
}
if (argc==5) {
stringstream ssnr;
ssnr<<argv[3];
if (ssnr.fail()) {
printf ("Error converting row dimension \n");
return 2;
}
ssnr>>N;
stringstream ssnc;
ssnc<<argv[4];
if (ssnc.fail()) {
printf ("Error converting column dimension \n");
return 2;
}
ssnc>>M;
}
}
cout<<"Running until the difference is <="<<epsilon<<" with size "<<N<<"x"<<M<<"\n";
int nr=N;
int nc=M;
int nrows=nr+2;
int ncols=nc+2;
double **u=new double*[nrows]; // Old values
double **w=new double*[nrows]; // New values
double **diffs=new double*[nrows]; // Differences between old and new values
double *uptr=new double[(nrows)*(ncols)];
double *wptr=new double[(nrows)*(ncols)];
double *dptr=new double[(nrows)*(ncols)];
for (i=0;i<nrows;++i,uptr+=ncols)
u[i] = uptr;
for (i=0;i<nrows;++i,wptr+=ncols)
w[i] = wptr;
for (i=0;i<nrows;++i,dptr+=ncols)
diffs[i] = dptr;
mean = 0.0;
set_bcs(u, nr, nc);
//Initialize to something like the mean boundary. Not important, can use zeros
//but this may speed convergence a bit.
for ( i = 1; i <= nr; i++ ) {
    mean += u[i][0] + u[i][nc+1];
}
for ( j = 1; j <= nc; j++ ) {
    mean += u[0][j] + u[nr+1][j];
}
mean /= (2.0 * (nr + nc));
// Initialize interior values
for ( i = 1; i <= nr; i++ ) {
for (j = 1; j <= nc; j++ ) {
u[i][j] = mean;
}
}
//Arbitrary, just to make sure we enter loop
diff = 10.*epsilon;
diffInterval=1;
clock_t startTime=clock();
// Compute steady-state solution
// We are making a tradeoff between memory and speed here by creating a diffs
// array. Especially for parallel, this should be reasonable.
while ( diff >= epsilon) {
for (i=1; i<=nr;i++) {
for (j=1;j<=nc;j++) {
w[i][j] = (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1])/4.0;
diffs[i][j] = abs(w[i][j] - u[i][j]);
}
}
if (iterations%diffInterval==0) {
    diff=.8*epsilon; // reset diff only before computing the max abs difference
for (i=1; i<=nr;i++) {
for (j=1;j<=nc;j++) {
if (diff<diffs[i][j]) {
diff=diffs[i][j];
}
}
}
if (diff <= epsilon)
break;
}
for (i=1; i<=nr;i++) {
for (j=1;j<=nc;j++) {
u[i][j] = w[i][j];
}
}
if (iterations >= MAX_ITER)
break;
iterations++;
}
double totalTime=(double)(clock()-startTime)/CLOCKS_PER_SEC;
cout << "Completed "<<iterations<<" iterations; time "<<totalTime<<endl;
// Write solution to output file
ofstream fp;
fp.open(argv[2],ios::out);
for (i = 1; i <= nr; i++) {
for (j = 1; j <= nc; j++) {
fp<<u[i][j]<<" ";
}
fp<<"\n";
}
fp.close();
// All done!
cout<<"wrote output file "<<argv[2]<<"\n";
return 0;
}
void set_bcs(double **u, int nr, int nc) {
/* Set boundary values.
* This has an ice bath on the top edge.
* Note: when plotting, 0,0 is the bottom.
*/
double topBC=0;
double bottomBC=100.;
double edgeBC=100.;
for (int i=0;i<=nc+1;++i){
u[0][i]=topBC;
}
for (int i=0;i<=nc+1;++i){
u[nr+1][i]=bottomBC;
}
for (int i=0;i<=nr+1;++i){
u[i][0]=edgeBC;
u[i][nc+1]=edgeBC;
}
}
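One possible way to build and run this solution, assuming it is saved as heatedplate.cpp (the file name, compiler, and arguments are placeholders):
g++ -O2 -o heatedplate heatedplate.cpp
./heatedplate 0.001 solution 500
This runs a 500x500 grid to a tolerance of 0.001 and writes the result to the file solution.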
Fortran
program heatedplate
implicit none
integer, parameter:: maxiter=10000000
integer :: N, M, nr, nc, lb
integer :: numargs, i, j, diff_interval, iterations = 0
double precision :: eps, mean, diff = 1.0
character(len=80) :: arg, filename
double precision, allocatable, dimension(:,:) :: u, w
real :: time0, time1
interface
subroutine set_bcs(lb,nr,nc,u)
implicit none
integer, intent(in) :: lb, nr, nc
double precision, dimension(lb:,lb:), intent(inout) :: u
end subroutine
end interface
! check number of parameters and read in epsilon, filename, optionally size
numargs=command_argument_count()
if (numargs .lt. 2) then
stop 'USAGE: epsilon output-file <N> <M>'
else
call get_command_argument(1,arg)
read(arg,*) eps
call get_command_argument(2,filename)
if (numargs==2) then
N=500
M=500
endif
if (numargs==3) then
call get_command_argument(3,arg)
read(arg,*) N
M=N
endif
if (numargs==4) then
call get_command_argument(3,arg)
read(arg,*) N
call get_command_argument(4,arg)
read(arg,*) M
endif
endif
nr=N; nc=M
! Set boundary values and compute mean boundary value.
! This has the ice bath on the top edge.
! Allocate arrays
lb=0
allocate(u(lb:nr+1,lb:nc+1),w(lb:nr+1,lb:nc+1))
u=0.d0
w=0.d0
call set_bcs(lb,nr,nc,u)
! Initialize interior values to the boundary mean, more or less.
! Doesn't matter much, starting from all zeroes works too. May speed
! up convergence a bit.
mean=sum(u(:,0))+sum(u(:,nc+1))+sum(u(nr+1,:))+sum(u(0,:))
mean = mean / (4.0 * (N+2))
u(1:nr,1:nc)=mean
!Arbitrary, just to make sure we enter loop
diff=10.*eps
diff_interval=1
call cpu_time(time0)
write (*,'(a,es15.8,a,i6,a,i6)') 'Running until the difference is <= ', eps,&
' for size ',nr,'x',nc
! Compute steady-state solution
do while ( diff >= eps )
do j=1,nc
do i=1,nr
w(i,j) = (u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1))/4.0
enddo
enddo
if (mod(iterations,diff_interval)==0) then
diff=maxval(abs(w(1:nr,1:nc)-u(1:nr,1:nc)))
if (diff <= eps) then
exit
endif
endif
! Don't overwrite BCs
u(1:nr,1:nc)=w(1:nr,1:nc)
if (iterations>maxiter) then
write(*,*) "Warning: maximum iterations exceeded"
exit
endif
iterations = iterations + 1
enddo
call cpu_time(time1)
write (*,*) 'completed; iterations = ', iterations,"in ",time1-time0," sec"
! Write solution to output file by row
!Some compilers (Intel) like to add EOLs to line up columns. The Fortran
! standard permits this with * IO, but it complicates plotting. Use some
! F08 features to fix this.
open (unit=10,file=filename)
do i=1,nr
write (10,'(*(g0,1x))') u(i,1:nc)
enddo
! All done!
write (*,'(a,a)') "wrote output file ", filename
end program heatedplate
subroutine set_bcs(lb,nr,nc,u)
implicit none
integer, intent(in) :: lb, nr, nc
double precision, dimension(lb:,lb:), intent(inout) :: u
double precision :: topBC, bottomBC, edgeBC
! Set boundary values
! This has the ice bath on the bottom edge.
! Note: when plotting, 0,0 is the bottom.
topBC=100.d0
bottomBC=0.d0
edgeBC=100.d0
!Set boundary conditions
u(1:nr,0)=edgeBC
u(1:nr,nc+1)=edgeBC
u(nr+1,1:nc)=topBC
u(0,1:nc)=bottomBC
end subroutine set_bcs
Python
import sys
import time
import numpy as np
# check number of parameters and read in parameters
#Better to use argparse but this is closer to compiled language code and simpler
#if one hasn't used argparse.
argv=sys.argv
if len(argv) > 2:
    try:
        epsilon=float(argv[1])
        filename=argv[2]
    except:
        print("Illegal input")
        sys.exit(1)
    if len(argv) == 3:
        N=500
        M=500
    if len(argv) == 4:
        try:
            N=int(argv[3])
            M=N
        except:
            print("Cannot convert grid dimension to integer")
            sys.exit(2)
    if len(argv) == 5:
        try:
            N=int(argv[3])
            M=int(argv[4])
        except:
            print("Cannot convert grid dimension to integer")
            sys.exit(2)
else:
    print('USAGE: epsilon output-file <N> <M>')
    sys.exit(1)
print(f'Running until the difference is < {epsilon:} with size {N:}x{M:}')
#Initialize comparison
diff = epsilon
#Set max iterations. 10 million should do it.
maxiter=10000000
#Set size of arrays
nr=N
nc=M
u=np.zeros((nr+2,nc+2))
w=np.zeros((nr+2,nc+2))
# Set boundary values and compute mean boundary value.
#This has the ice bath on the top edge, not the bottom edge.
u[:,0]=100.
u[:,nc+1]=100.
u[nr+1,:]=100.
#Initialize interior values to the boundary mean. This value really
#doesn't matter much, it just affects the convergence rate somewhat.
#Using all zeroes would be fine.
mean = sum(u[:,0]) + sum(u[:,nc+1]) + sum(u[0,:]) + sum(u[nr+1,:])
mean = mean / (4.0 * (N+2))
#Be sure not to overwrite the boundary values
u[1:nr+1,1:nc+1]=mean
# Compute steady-state solution
iterations=0
diff_interval=1
#Arbitrary, just to make sure we enter loop
diff=10*epsilon
start_time=time.time()
while ( diff >= epsilon ):
    # for i in range(1,nr+1):
    #     for j in range(1,nc+1):
    #         w[i,j] = 0.25*(u[i-1,j] + u[i+1,j] + u[i,j-1] + u[i,j+1])
    # It is approximately a factor of 100 faster to use numpy array operations.
    w[1:-1,1:-1]=0.25*(u[:-2,1:-1]+u[2:,1:-1]+u[1:-1,:-2]+u[1:-1,2:])
    if iterations%diff_interval==0:
        diff=np.max(np.abs(w[1:-1,1:-1]-u[1:-1,1:-1]))
        if diff<=epsilon:
            break
    if iterations>maxiter:
        print("Warning: maximum iterations exceeded")
        break
    u[1:nr+1,1:nc+1]=w[1:nr+1,1:nc+1].copy()
    iterations+=1
print(f'completed in {iterations:} iterations with time {time.time()-start_time:.2f}')
# Write solution to output file
fout = open (filename,'w')
for i in range(1,N+1):
    line=" ".join(map(str,list(u[i,1:M+1])))
    row=line+"\n"
    fout.write(row)
fout.close()
# All done!
print(f"wrote output file {filename:}")
Project 8
Use the halo exchange code we have already seen, plus at least one collective communication, to parallelize your heated-plate code. Do not declare a single large global array; break up the domain as we have learned (row-wise for C++ and Python, column-wise for Fortran). For output, have each rank write its set of rows or columns, labeling the file with its rank ID.
Example syntax:
C++
fname=argv[2]+to_string(rank);
Fortran
write(fname,'(a,i4.4)') filename(1:len_trim(filename)),rank
Python
filename = filename + str( rank )
To plot the results you will have to stitch the files together appropriately. If you have no other preference you may use the contour.py code below. Add the -f command-line option for Fortran. The script must be given an argument that is the "base" of the filenames, and all the output files must be in the same folder and numbered appropriately (zero-pad the rank in the file name if you run more than ten processes, so that an alphabetical sort puts the files in order).
Python script to merge output files and contour
import sys
import os
import argparse
import re
import numpy as np
import matplotlib.pyplot as plt
parser = argparse.ArgumentParser()
parser.add_argument("-f", "--fortran", help="Fortran ordering", action="store_true")
parser.add_argument("filename", help="Base name for data files")
args = parser.parse_args()
base = args.filename
files = [f for f in os.listdir('.') if re.match(base+r"\d*", f)]
files.sort()
subdomains = []
for file in files:
    data = np.loadtxt(file, unpack=False)
    if args.fortran:
        # Transpose each subdomain for Fortran (column-wise) output
        data = data.T
    subdomains.append(data)
if args.fortran:
    image = np.hstack(subdomains)
else:
    image = np.vstack(subdomains)
fig = plt.figure()
plt.contourf(image)
plt.colorbar()
plt.show()
Feel free to use the example solutions provided as your basis, but we recommend that you attempt to write your own serial version first.
Hints
- You will need to modify the stopping criterion for the update loop. There are several options. One is to use the maximum number of iterations as the loop criterion, then break out of the loop when the maximum difference across all the ranks is less than the specified tolerance.
- Physical boundary conditions generally must be reapplied at each iteration to the "solution" matrix u. A subprogram to do this might be a good idea.
- You need to do a full swap of the "current" solution u and the "updated" solution w at each iteration, which is why the BCs must be reapplied.
- Check that your program works for a single process.
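Putting the hints together, the iteration loop has the following overall shape: loop on an iteration count, exchange halos, update, periodically Allreduce the local maximum difference, then swap and reapply the physical boundary conditions. The Python/mpi4py code below is only a sketch; the grid size, tolerance, and the apply_bcs helper are placeholder assumptions, not part of the example solutions that follow.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

N = M = 504                          # global grid size (placeholder); assumes nprocs divides N
nrl, ncl = N//nprocs, M              # row-wise decomposition: local rows, all columns
epsilon, max_iter, diff_interval = 1.0e-3, 10000000, 10

def apply_bcs(u):                    # hypothetical helper: physical BCs on the edge ranks only
    if rank == 0:
        u[0,:] = 0.
    if rank == nprocs-1:
        u[-1,:] = 100.
    u[:,0] = 100.; u[:,-1] = 100.

u = np.zeros((nrl+2, ncl+2)); w = np.zeros_like(u)
apply_bcs(u)
up   = rank-1 if rank > 0 else MPI.PROC_NULL
down = rank+1 if rank < nprocs-1 else MPI.PROC_NULL

local_diff  = np.zeros(1)
global_diff = np.zeros(1)
iterations = 0
while iterations <= max_iter:                       # iteration count is the loop criterion
    comm.Sendrecv(u[1,1:-1], up, 0, u[nrl+1,1:-1], down, 0)    # halo exchange up/down
    comm.Sendrecv(u[nrl,1:-1], down, 0, u[0,1:-1], up, 0)
    w[1:-1,1:-1] = 0.25*(u[:-2,1:-1] + u[2:,1:-1] + u[1:-1,:-2] + u[1:-1,2:])
    if iterations % diff_interval == 0:             # periodic global convergence test
        local_diff[0] = np.max(np.abs(w[1:-1,1:-1] - u[1:-1,1:-1]))
        comm.Allreduce(local_diff, global_diff, op=MPI.MAX)
        if global_diff[0] <= epsilon:
            break
    u[:,:] = w                                      # full swap of current and updated values
    apply_bcs(u)                                    # ...so the physical BCs must be reapplied
    iterations += 1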
Example solutions
C++
#include <iostream>
#include <cmath>
#include <cstdio>
#include <string>
#include <sstream>
#include <fstream>
#include <iomanip>
#include <mpi.h>
#define MAX_ITER 10000000
void set_bcs(double **u, int nr, int nc, int rank, int nprocs, double &bc1, double &bc2, double &bc3, double &bc4);
using namespace std;
int main (int argc, char *argv[]) {
int i, j;
double diff; // Change in value
int N, M;
int iterations = 0;
int diffInterval;
double epsilon;
double bc1, bc2, bc3, bc4;
// Added for MPI
int nrl, ncl;
int rank, nprocs;
MPI_Status status;
int root=0, tag=0;
int up, down;
double gdiff;
//Initialize MPI
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD,&nprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
/* check number of parameters and read in epsilon, filename, optionally size */
if (argc < 3) {
printf ("USAGE: %s epsilon output-file <N> <M>\n", argv[0]);
MPI_Finalize();
return 1;
}
else {
stringstream ssepsilon;
ssepsilon<<argv[1];
if (ssepsilon.fail()) {
printf ("Error reading in epsilon value\n");
MPI_Finalize();
return 2;
}
ssepsilon>>epsilon;
if (argc==3) {
N=500;
M=500;
}
if (argc==4) {
stringstream ssnr;
ssnr<<argv[3];
if (ssnr.fail()) {
printf ("Error converting row dimension \n");
MPI_Finalize();
return 2;
}
ssnr>>N;
M=N;
}
if (argc==5) {
stringstream ssnr;
ssnr<<argv[3];
if (ssnr.fail()) {
printf ("Error converting row dimension \n");
MPI_Finalize();
return 2;
}
ssnr>>N;
stringstream ssnc;
ssnc<<argv[4];
if (ssnc.fail()) {
printf ("Error converting column dimension \n");
MPI_Finalize();
return 2;
}
ssnc>>M;
}
}
//Weak scaling
//nrl=N;
//Strong scaling
if (N%nprocs!=0) {
MPI_Finalize();
cout<<"Not an even division of the arrays.\n";
return 3;
}
else {
nrl=N/nprocs;
}
//Both
ncl=M;
//Find my neighbors
if (rank==0) {
up=MPI_PROC_NULL;
}
else {
up=rank-1;
}
if (rank==nprocs-1) {
down=MPI_PROC_NULL;
}
else {
down=rank+1;
}
int nrows=nrl+2;
int ncols=ncl+2;
double **u=new double*[nrows]; // Old values
double **w=new double*[nrows]; // New values
double **diffs=new double*[nrows]; // Diffs
double *uptr=new double[(nrows)*(ncols)](); // the trailing () value-initializes to zero,
double *wptr=new double[(nrows)*(ncols)](); // so the interior and ghost rows start defined
double *dptr=new double[(nrows)*(ncols)]();
for (i=0;i<nrows;++i,uptr+=ncols)
u[i] = uptr;
for (i=0;i<nrows;++i,wptr+=ncols)
w[i] = wptr;
for (i=0;i<nrows;++i,dptr+=ncols)
diffs[i] = dptr;
// Set physical boundary conditions (overwrites estimate on physical edges)
set_bcs(u, nrl, ncl, rank, nprocs, bc1, bc2, bc3, bc4);
diffInterval=1;
double time0=MPI_Wtime();
// Compute steady-state solution
// The diffs array is mostly to make it simpler to extend the testing interval
if (rank==0) {
cout<<"Running until the difference is <="<<epsilon<<" with size "<<N<<"x"<<M<<"\n";
}
while ( iterations<=MAX_ITER ) {
// reset diff each time through to get max abs diff
// max_element doesn't work great for twod arrays and is often slow
diff=.8*epsilon;
//Exchange halo values (one ghost row each side)
MPI_Sendrecv(&u[1][1],ncl, MPI_DOUBLE,up,tag,&u[nrl+1][1],
ncl, MPI_DOUBLE,down,tag,MPI_COMM_WORLD,&status);
MPI_Sendrecv(&u[nrl][1],ncl,MPI_DOUBLE,down,tag,&u[0][1],
ncl,MPI_DOUBLE,up,tag,MPI_COMM_WORLD,&status);
for (i=1; i<=nrl;i++) {
for (j=1;j<=ncl;j++) {
w[i][j] = (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1])/4.0;
diffs[i][j] = abs(w[i][j] - u[i][j]);
}
}
if (iterations%diffInterval==0) {
for (i=1; i<=nrl;i++) {
for (j=1;j<=ncl;j++) {
if (diff<diffs[i][j]) {
diff=diffs[i][j];
}
}
}
//Find max of diff in all the processors.
MPI_Allreduce(&diff,&gdiff,1,MPI_DOUBLE,MPI_MAX,MPI_COMM_WORLD);
if (gdiff <= epsilon)
break;
}
for (i=0; i<=nrl+1;i++) {
for (j=0;j<=ncl+1;j++) {
u[i][j] = w[i][j];
}
}
set_bcs(u, nrl, ncl, rank, nprocs, bc1, bc2, bc3, bc4);
iterations++;
} //end of computation
if (iterations>MAX_ITER) {
if (rank==0) {
cout<<"Warning: maximum iterations exceeded\n";
}
}
double totalTime=(MPI_Wtime()-time0);
if (rank==0) {
cout << "Completed "<<iterations<<" iterations; time "<<totalTime<<endl;
}
// Write solution to output file
ofstream fp;
string fname=argv[2]+to_string(rank);
fp.open(fname,ios::out);
for (i = 1; i <= nrl; i++) {
for (j = 1; j <= ncl; j++) {
fp<<u[i][j]<<" ";
}
fp<<"\n";
}
fp.close();
// All done!
MPI_Finalize();
return 0;
}
void set_bcs(double **u, int nr, int nc, int rank, int nprocs, double &bc1, double &bc2, double &bc3, double &bc4) {
/* Set boundary values.
* This has an ice bath on the top edge.
* Note: when plotting, 0,0 is the bottom.
*/
double topBC=0;
double bottomBC=100.;
double edgeBC=100.;
bc1=topBC;
bc2=bottomBC;
bc3=edgeBC;
bc4=edgeBC;
if (rank==0) {
for (int i=0;i<=nc+1;++i){
u[0][i]=topBC;
}
}
if (rank==nprocs-1) {
for (int i=0;i<=nc+1;++i){
u[nr+1][i]=bottomBC;
}
}
for (int i=0;i<=nr+1;++i){
u[i][0]=edgeBC;
u[i][nc+1]=edgeBC;
}
}
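A possible build-and-run sequence for the MPI version, assuming it is saved as heatedplate_mpi.cpp (the wrapper and launcher names, mpicxx and mpiexec, vary between MPI distributions, and the file names and arguments are placeholders):
mpicxx -O2 -std=c++11 -o heatedplate_mpi heatedplate_mpi.cpp
mpiexec -n 4 ./heatedplate_mpi 0.001 solution 500
Each rank then writes its own file, solution0 through solution3, which contour.py can stitch back together.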
Fortran
program heatedplate
use mpi
implicit none
integer, parameter:: maxiter=10000000
integer :: N, M, lb
integer :: numargs, i, j, diff_interval, iterations = 0
double precision :: eps, mean, diff = 1.0
character(len=80) :: arg, filename
double precision, allocatable, dimension(:,:) :: u, w
double precision :: bc1, bc2, bc3, bc4
real :: time0, time1
logical :: strong_scaling
!Add for MPI
integer :: nrl, ncl
integer :: rank, nprocs, tag=0
integer :: errcode, ierr
integer, dimension(MPI_STATUS_SIZE) :: mpi_stat
integer, parameter :: root=0
integer :: left, right
double precision :: gdiff
character(len=84) :: fname
interface
subroutine set_bcs(lb,nr,nc,rank,nprocs,bc1,bc2,bc3,bc4,u)
implicit none
integer, intent(in) :: lb, nr, nc, rank, nprocs
double precision, intent(out) :: bc1,bc2,bc3,bc4
double precision, dimension(lb:,lb:), intent(inout) :: u
end subroutine
end interface
!Initialize MPI, get the local number of columns
call MPI_INIT(ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD,nprocs,ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
! check number of parameters and read in epsilon, filename, optionally size
! all ranks do this
numargs=command_argument_count()
if (numargs .lt. 2) then
stop 'USAGE: epsilon output-file <N> <M>'
else
call get_command_argument(1,arg)
read(arg,*) eps
call get_command_argument(2,filename)
if (numargs==2) then
N=500
M=500
endif
if (numargs==3) then
call get_command_argument(3,arg)
read(arg,*) N
M=N
endif
if (numargs==4) then
call get_command_argument(3,arg)
read(arg,*) N
call get_command_argument(4,arg)
read(arg,*) M
endif
endif
strong_scaling=.true.
if (strong_scaling) then
! strong scaling
if (mod(M,nprocs).ne.0) then
call MPI_FINALIZE(ierr)
stop 'Not an even division of the array'
else
ncl = M/nprocs
endif
else
! weak scaling
ncl = M
M = M*nprocs !for printing size
endif
! Both
nrl=N
! Find my neighbors
if ( rank .eq. root ) then
left = MPI_PROC_NULL
else
left = rank - 1
endif
if ( rank .eq. nprocs-1 ) then
right = MPI_PROC_NULL
else
right = rank + 1
endif
! Set boundary values and compute mean boundary value.
! This has the ice bath on the top edge.
! Allocate arrays
lb=0
allocate(u(lb:nrl+1,lb:ncl+1),w(lb:nrl+1,lb:ncl+1))
u=0.d0
w=0.d0
! Set physical boundaries
call set_bcs(lb,nrl,ncl,rank,nprocs,bc1,bc2,bc3,bc4,u)
diff_interval=1
! Walltime from process 0 is good enough for me
if (rank .eq. 0) then
time0=MPI_WTIME()
endif
if (rank==0) then
write (*,'(a,es15.8,a,i6,a,i6)') 'Running until the difference is <= ', &
eps,' for global size ',N,'x',M
endif
! Compute steady-state solution
do while ( iterations<=maxiter )
! Exchange halo values
call MPI_SENDRECV(u(1:nrl,1) ,nrl,MPI_DOUBLE_PRECISION,left,tag, &
u(1:nrl,ncl+1),nrl,MPI_DOUBLE_PRECISION,right,tag, &
MPI_COMM_WORLD,mpi_stat,ierr)
call MPI_SENDRECV(u(1:nrl,ncl) ,nrl,MPI_DOUBLE_PRECISION,right,tag, &
u(1:nrl,0) ,nrl,MPI_DOUBLE_PRECISION,left,tag, &
MPI_COMM_WORLD,mpi_stat,ierr)
do j=1,ncl
do i=1,nrl
w(i,j) = 0.25*(u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1))
enddo
enddo
! Setting diff_interval to -1 disables the convergence test entirely
if (diff_interval/=-1 .and. mod(iterations,diff_interval)==0) then
diff=maxval(abs(w(1:nrl,1:ncl)-u(1:nrl,1:ncl)))
call MPI_ALLREDUCE(diff,gdiff,1,MPI_DOUBLE_PRECISION,MPI_MAX, &
MPI_COMM_WORLD, ierr)
if (gdiff <= eps) then
exit
endif
endif
u = w
! Reset physical boundaries (they get overwritten in the halo exchange)
call set_bcs(lb,nrl,ncl,rank,nprocs,bc1,bc2,bc3,bc4,u)
iterations = iterations + 1
!If this happens we will exit at next conditional test.
!Avoids explicit break here
if (iterations>maxiter) then
if (rank==0) then
write(*,*) "Warning: maximum iterations exceeded"
endif
endif
enddo
if (rank==root) then
time1 = MPI_WTIME()
write (*,*) 'completed; iterations = ', iterations,"in ",time1-time0," sec"
endif
! Write solution to output file by row. Must use correct stitching to get
! overall result, such as a numpy concatenate on axis=1.
! It would also work to output by columns, in which case stitching is more
! straightforward.
!Some compilers (Intel) like to add EOLs to line up columns in output. The
! Fortran standard permits this with * IO, but it complicates plotting.
! Use some F08 features to fix this.
write(fname,'(a,i4.4)') filename(1:len_trim(filename)),rank
open (unit=10,file=fname)
do i=1,nrl
write (10,'(*(g0,1x))') u(i,1:ncl)
enddo
! All done!
call MPI_Finalize(ierr)
end program heatedplate
subroutine set_bcs(lb,nr,nc,rank,nprocs,bc1,bc2,bc3,bc4,u)
implicit none
integer, intent(in) :: lb, nr, nc, rank, nprocs
double precision, intent(out) :: bc1,bc2,bc3,bc4
double precision, dimension(lb:,lb:), intent(inout) :: u
double precision :: topBC, bottomBC, edgeBC
! Set boundary values
! This has the ice bath on the bottom edge.
! Note: when plotting, 0,0 is the bottom.
topBC=100.d0
bottomBC=0.d0
edgeBC=100.d0
bc1=topBC
bc2=bottomBC
bc3=edgeBC
bc4=edgeBC
!Set boundary conditions
if (rank==0) then
u(:,0)=edgeBC
endif
if (rank==nprocs-1) then
u(:,nc+1)=edgeBC
endif
u(nr+1,:)=topBC
u(0,:)=bottomBC
end subroutine set_bcs
Python
import sys
import time
import numpy as np
from mpi4py import MPI
def set_bcs(u,nrl,ncl,rank):
    # Set physical boundary values. Because NumPy arrays are mutable, the changes
    # to u are visible to the caller (so this acts like a subroutine).
    # Uses the global nprocs.
    topBC=0.
    bottomBC=100.
    edgeBC=100.
    bc1=topBC
    bc2=bottomBC
    bc3=edgeBC
    bc4=edgeBC
    if rank==0:
        u[0,:]=topBC
    if rank==nprocs-1:
        u[nrl+1,:]=bottomBC
    u[:,0]=edgeBC
    u[:,ncl+1]=edgeBC
    return bc1,bc2,bc3,bc4
# check number of parameters and read in parameters
#Better to use argparse but this is closer to compiled language code and simpler
#if one hasn't used argparse.
argv=sys.argv
if len(argv) > 2:
    try:
        epsilon=float(argv[1])
        filename=argv[2]
    except:
        print("Illegal input")
        sys.exit(1)
    if len(argv) == 3:
        N=500
        M=500
    if len(argv) == 4:
        try:
            N=int(argv[3])
            M=N
        except:
            print("Cannot convert grid dimension to integer")
            sys.exit(2)
    if len(argv) == 5:
        try:
            N=int(argv[3])
            M=int(argv[4])
        except:
            print("Cannot convert grid dimension to integer")
            sys.exit(2)
else:
    print('USAGE: epsilon output-file <N> <M>')
    sys.exit(1)
# Initializing MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nprocs = comm.Get_size()
root = 0
tag = 0
#Set max iterations. 10 million should do it.
max_iterations=10000000
#Set nprocs of local arrays
#Strong scaling
if N%nprocs!=0:
    print("Number of rows must be evenly divisible by number of processes")
    sys.exit(1)
else:
    nrl = N//nprocs
#Weak scaling
#nrl=N
#Both
ncl=M
u=np.zeros((nrl+2,ncl+2))
w=np.zeros((nrl+2,ncl+2))
bc1,bc2,bc3,bc4=set_bcs(u,nrl,ncl,rank)
#Set up the up and down rank for each process
if rank == 0:
    up = MPI.PROC_NULL
else:
    up = rank - 1
if rank == nprocs - 1:
    down = MPI.PROC_NULL
else:
    down = rank + 1
# Compute steady-state solution
iterations=0
diff_interval=1
start_time=MPI.Wtime()
my_diff = np.zeros(1)
global_diff=np.zeros(1)
if rank==root:
    print(f'Running until the difference is < {epsilon:}, global size {N:}x{M:}')
while iterations<=max_iterations:
    # Send up and receive down
    comm.Sendrecv([u[1,1:ncl+1], MPI.DOUBLE], up, tag, [u[nrl+1,1:ncl+1], MPI.DOUBLE], down, tag)
    # Send down and receive up
    comm.Sendrecv([u[nrl,1:ncl+1], MPI.DOUBLE], down, tag, [u[0,1:ncl+1], MPI.DOUBLE], up, tag)
    w[1:-1,1:-1]=0.25*(u[:-2,1:-1]+u[2:,1:-1]+u[1:-1,:-2]+u[1:-1,2:])
    if iterations%diff_interval==0:
        my_diff[0]=np.max(np.abs(w[1:-1,1:-1]-u[1:-1,1:-1]))
        comm.Allreduce(my_diff,global_diff,op=MPI.MAX)
        if global_diff[0]<=epsilon:
            break
    u[:,:]=w[:,:]
    #Reapply physical boundary conditions
    set_bcs(u,nrl,ncl,rank)
    iterations+=1
#The "else" clause of a Python while loop runs only if the loop ends without a break.
#Our stopping criterion is now exceeding max_iterations, so we do not need to break
#here, but we do need to warn that it happened.
else:
    if rank==0:
        print("Warning: maximum iterations exceeded")
total_time=MPI.Wtime()-start_time
if rank==0:
    print(f'completed in {iterations:} iterations with time {total_time:.2f}')
# Write solution to output file
filename = filename + str(rank)
fout = open (filename,'w')
for i in range(1,nrl+1):
    line=" ".join(map(str,list(u[i,1:ncl+1])))
    row=line+"\n"
    fout.write(row)
fout.close()
# All done!
#print(f"wrote output file {filename:}")
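To run the mpi4py version, assuming it is saved as heatedplate_mpi.py (the file names and process count are placeholders):
mpiexec -n 4 python heatedplate_mpi.py 0.001 solution 500
As with the C++ and Fortran versions, each rank writes its own numbered output file.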
Project 9
Scaling Studies
We have discussed weak and strong scaling. If you have not already done so, add timing to your heated-plate solution in the language of your choice. Use MPI_Wtime(), since it will generate uniform timings regardless of your programming language. It is sufficient to instrument the while loop, since nearly all the time will be spent there.
- The example solutions show how to do strong or weak scaling. You can do only one at a time. Weak scaling will increase the size of the global domain in a way that makes it rectangular rather than square, but that does not matter for our study.
- Start with strong scaling, dividing the same amount of work over an increasing number of processes. Choose a value for the number of rows or columns that is evenly divisible by 8. Run your code for 1, 2, 4, and 8 processes. Make a plot of time versus number of processes. Compute the parallel efficiency for each run (a short sketch for computing efficiency follows this list).
- Repeat for weak scaling.
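Parallel efficiency can be computed directly from the timings: for strong scaling E(p) = T(1)/(p*T(p)), and for weak scaling, where the work per process is held constant, E(p) = T(1)/T(p). A small Python sketch (the timing values below are placeholders, not measured results):
def strong_efficiency(t1, tp, p):
    # same total work spread over p processes
    return t1/(p*tp)

def weak_efficiency(t1, tp):
    # work per process held constant
    return t1/tp

times = {1: 100.0, 2: 52.0, 4: 27.0, 8: 15.0}   # placeholder timings in seconds
for p, tp in times.items():
    print(p, round(strong_efficiency(times[1], tp, p), 2))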
Make sure you are using idle cores for this test, or your results will not reflect the time spent on your processes. If you have access to a high-performance computing cluster with a resource manager (queueing system), submit each test case as a separate job. Are the results what you expected?