Exercise 3
Overview:
- Login to the LC workshop cluster, if you are not already logged in.
- Following the Exercise 3 instructions will take you through all sorts of MPI programs - pick any/all that are of interest.
- The intention is to review the codes and see what’s happening - not just compile and run.
- Several codes provide serial examples for a comparison with the parallel MPI versions.
- Check out the “bug” programs.
1. Still logged into the workshop cluster?
If so, then continue to the next step. If not, then login as you did previously for Exercises 1 and 2.
2. Review the array decomposition example code.
First, review the serial version of this example code, either ser_array.c or ser_array.f.
After you understand what’s going on, review the parallel MPI version, either mpi_array.c or mpi_array.f. The comments explain how MPI is used to implement a parallel data decomposition on an array.
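For reference, the general pattern - divide the array into chunks, have every task work only on its own chunk, then collect the results - can be sketched in a few lines. This is a minimal illustration, not the actual mpi_array code, and it assumes the array size divides evenly by the number of tasks:

/* Sketch of a parallel array decomposition (not the actual mpi_array code).
   Assumes ARRAYSIZE divides evenly by the number of tasks. */
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define ARRAYSIZE 16000000

int main(int argc, char *argv[]) {
    int numtasks, rank, i;
    double mysum = 0.0, totalsum = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int chunksize = ARRAYSIZE / numtasks;                /* each task owns one chunk */
    float *chunk = malloc(chunksize * sizeof(float));
    float *data = NULL;

    if (rank == 0) {                                     /* task 0 builds the full array */
        data = malloc(ARRAYSIZE * sizeof(float));
        for (i = 0; i < ARRAYSIZE; i++)
            data[i] = i * 1.0;
    }

    /* hand each task its chunk, work on it locally, then collect the results */
    MPI_Scatter(data, chunksize, MPI_FLOAT, chunk, chunksize, MPI_FLOAT, 0, MPI_COMM_WORLD);
    for (i = 0; i < chunksize; i++) {
        chunk[i] = chunk[i] * 2.0;
        mysum += chunk[i];
    }
    MPI_Gather(chunk, chunksize, MPI_FLOAT, data, chunksize, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Reduce(&mysum, &totalsum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Sum of updated array = %e\n", totalsum);

    free(chunk);
    if (rank == 0) free(data);
    MPI_Finalize();
    return 0;
}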
3. Compile the parallel MPI and serial versions of the array decomposition example code
As with Exercises 1 & 2, use the compiler command of your choice to compile the mpi_array example code.
Use the appropriate serial compiler command for the serial version. For example:
C:
icc ser_array.c -o ser_array
mpicc mpi_array.c -o mpi_array
Fortran:
ifort ser_array.f -o ser_array
mpif77 mpi_array.f -o mpi_array
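Note: mpicc and mpif77 are wrappers that invoke an underlying serial compiler and add the MPI include and library paths. If the wrappers on the cluster are MPICH/MVAPICH-based (an assumption - check the local documentation), you can display exactly what a wrapper runs with:
mpicc -show
mpif77 -show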
4. Run the executables interactively
For the MPI executable, use the special workshop pool and 8 tasks. For example:
Serial:
ser_array
MPI:
srun -n8 -ppReserved mpi_array
Note: The srun command is covered in detail in the “Starting Jobs” section of the Linux Clusters Overview tutorial. There is also a man page (man srun).
5. Compare other serial codes to their parallel version
If we had more time, you might even be able to start with a serial code or two and create your own parallel version. Feel free to try if you’d like.
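If you do try it, a classic starting point is a small numerical computation such as a midpoint-rule estimate of pi: the serial loop stays almost unchanged, each task simply strides through its share of the iterations, and one MPI_Reduce combines the partial sums. A sketch (the interval count and the striding scheme are just illustrative choices):

/* Sketch: parallelizing a simple serial computation (midpoint-rule
   estimate of pi).  Each task handles every numtasks-th interval and
   the partial sums are combined with MPI_Reduce. */
#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[]) {
    const long n = 100000000;                /* number of intervals */
    int numtasks, rank;
    long i;
    double h, x, mysum = 0.0, pi = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    h = 1.0 / (double)n;
    for (i = rank; i < n; i += numtasks) {   /* serial version: i = 0; i < n; i++ */
        x = h * ((double)i + 0.5);
        mysum += 4.0 / (1.0 + x * x);
    }
    mysum *= h;

    MPI_Reduce(&mysum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi is approximately %.16f\n", pi);

    MPI_Finalize();
    return 0;
}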
6. Try any/all of the other MPI example codes
- First, review the code(s) so that you understand how MPI is being used.
- Then, using the MPI compiler command(s) of your choice, compile the codes of interest.
- For convenience, the included Makefiles can be used to compile any or all of the exercise codes. For example:
C:
make -f Makefile.MPI.c
make -f Makefile.MPI.c mpi_mm
make -f Makefile.Ser.c
Fortran:
make -f Makefile.MPI.f
make -f Makefile.MPI.f mpi_mm
make -f Makefile.Ser.f
Note: you can change the compiler being used by editing the Makefile.
- Run the executables interactively in the special workshop pool. Use the srun command for this as shown previously. Most of the executables need only 4 MPI tasks or less.
- Some things to try (a few example commands follow this list):
- Different compilers
- Experiment with compiler flags (see respective man pages).
- Vary the number of tasks and nodes used.
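For example (these commands assume you are working with the mpi_mm code built by the Makefiles above; the optimization flag is only an illustration - see the respective man pages for the full set of options):
mpicc -O3 mpi_mm.c -o mpi_mm
srun -n4 -ppReserved mpi_mm
srun -N2 -n8 -ppReserved mpi_mm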
7. Compare per task and aggregate communications bandwidths
- Compile the mpi_bandwidth code if you haven’t already.
- Run the code interactively with 2 tasks on two different nodes:
srun -N2 -n2 -ppReserved mpi_bandwidth
- Note the overall average bandwidth for the largest message size of 1,000,000 bytes.
- Now run the code interactively with 4, 8, 16, 32 and 64 tasks on two different nodes:
srun -N2 -n4 -ppReserved mpi_bandwidth
srun -N2 -n8 -ppReserved mpi_bandwidth
srun -N2 -n16 -ppReserved mpi_bandwidth
srun -N2 -n32 -ppReserved mpi_bandwidth
srun -N2 -n64 -ppReserved mpi_bandwidth
- Note the average bandwidths as before.
- What are your observations? Why?
Explanation:
As the number of tasks increases, the per-task bandwidth decreases because the tasks must compete for use of the network adapter. Aggregate bandwidth increases until the adapter saturates, then plateaus: for example, once the link is saturated at some aggregate rate B, each of n tasks sees roughly B/n.
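For reference, the heart of a bandwidth test like mpi_bandwidth is a timed, repeated message exchange between a pair of tasks. A simplified sketch (not the actual exercise code; the message size and repetition count are arbitrary choices here):

/* Simplified sketch of a two-task bandwidth measurement using blocking
   send/receive (not the actual mpi_bandwidth code). */
#include <stdio.h>
#include "mpi.h"

#define MSGSIZE 1000000                /* bytes per message           */
#define REPS    100                    /* round trips to average over */

int main(int argc, char *argv[]) {
    static char buf[MSGSIZE];
    int rank, i;
    double t1, t2;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t1 = MPI_Wtime();
    for (i = 0; i < REPS; i++) {
        if (rank == 0) {               /* task 0: send, then wait for the echo */
            MPI_Send(buf, MSGSIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSGSIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {        /* task 1: receive, then echo it back   */
            MPI_Recv(buf, MSGSIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, MSGSIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t2 = MPI_Wtime();

    if (rank == 0)                     /* bytes moved both ways / elapsed time */
        printf("avg bandwidth: %.2f MB/sec\n",
               2.0 * MSGSIZE * REPS / (t2 - t1) / 1.0e6);

    MPI_Finalize();
    return 0;
}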
8. Compare blocking send/receive with non-blocking send/receive
- Copy your mpi_bandwidth source file to another file called mpi_bandwidthNB. Modify your new file so that it performs non-blocking sends/receives instead of blocking (a sketch of the change appears after this list). An example mpi_bandwidth_nonblock file has been provided in case you need it.
- After you’re satisfied with your new non-blocking version of the bandwidth code, compile both.
- Run each code using two tasks on two different nodes in the special workshop pool:
srun -N2 -ppReserved mpi_bandwidth
srun -N2 -ppReserved mpi_bandwidthNB
- Compare the results. Which one performs best?
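The change amounts to replacing each blocking MPI_Send/MPI_Recv pair with non-blocking calls and then waiting for completion. A sketch of the replacement, continuing the simplified loop shown earlier (sbuf, rbuf, and partner are hypothetical names, and separate send/receive buffers are assumed; the provided mpi_bandwidth_nonblock file shows the real thing):

static char sbuf[MSGSIZE], rbuf[MSGSIZE];  /* separate buffers for the overlapped exchange */
int partner = (rank == 0) ? 1 : 0;         /* the other task in the pair                   */
MPI_Request reqs[2];

/* start the receive and the send, then block only until both complete */
MPI_Irecv(rbuf, MSGSIZE, MPI_CHAR, partner, 0, MPI_COMM_WORLD, &reqs[0]);
MPI_Isend(sbuf, MSGSIZE, MPI_CHAR, partner, 0, MPI_COMM_WORLD, &reqs[1]);
MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);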
Explanation:
Non-blocking send/receive operations are often significantly faster than blocking send/receive operations.
9. When things go wrong…
There are many things that can go wrong when developing MPI programs. The mpi_bug series of programs demonstrates just a few. See if you can figure out what the problem is with each case and then fix it.
Compile with the compile command(s) of your choice and run interactively using 4 tasks in the special workshop pool.
The buggy behavior will differ for each example.
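As one illustration of the kind of problem to look for (a generic example, not necessarily one of the mpi_bug programs): if two tasks each post a blocking receive before either one sends, both block forever. The buffer names and sizes below are hypothetical.

/* Deadlock: both tasks sit in MPI_Recv and neither ever reaches its MPI_Send. */
if (rank == 0) {
    MPI_Recv(rbuf, N, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Send(sbuf, N, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD);
} else if (rank == 1) {
    MPI_Recv(rbuf, N, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Send(sbuf, N, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD);
}

/* One fix: reorder one task so every receive has a matching send in flight
   (MPI_Sendrecv or non-blocking calls are other common fixes). */
if (rank == 0) {
    MPI_Send(sbuf, N, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD);
    MPI_Recv(rbuf, N, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
} else if (rank == 1) {
    MPI_Recv(rbuf, N, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Send(sbuf, N, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD);
}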
If you’re just finishing the tutorial and haven’t filled out our evaluation form yet, please do!




