If you are still logged in from the previous exercises, continue to the next step. If not, log in as you did for Exercises 1 and 2.
First, review the serial version of this example code, either ser_array.c or ser_array.f.
After you understand what’s going on, review the parallel MPI version, either mpi_array.c or mpi_array.f. The comments explain how MPI is used to implement a parallel data decomposition on an array.
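To make the idea concrete, here is a minimal, hypothetical sketch of the same kind of decomposition (it is not the actual mpi_array code, and the array size, variable names, and use of MPI_Scatter/MPI_Reduce are illustrative assumptions only): the root task hands out equal chunks of an array, each task works on its own chunk, and a reduction collects the final result.

```c
/* Hypothetical sketch of a parallel array decomposition (not mpi_array.c):
 * the root scatters equal chunks of an array, each task sums its chunk,
 * and a reduction combines the partial sums on the root. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define ARRAYSIZE 160000   /* assumed size; must be divisible by ntasks here */

int main(int argc, char *argv[])
{
    int ntasks, rank, i;
    double *data = NULL, *chunk, mysum = 0.0, total = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int chunksize = ARRAYSIZE / ntasks;
    chunk = malloc(chunksize * sizeof(double));

    if (rank == 0) {                       /* root initializes the full array */
        data = malloc(ARRAYSIZE * sizeof(double));
        for (i = 0; i < ARRAYSIZE; i++)
            data[i] = i * 1.0;
    }

    /* distribute equal pieces of the array to every task */
    MPI_Scatter(data, chunksize, MPI_DOUBLE,
                chunk, chunksize, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (i = 0; i < chunksize; i++)        /* each task works on its own piece */
        mysum += chunk[i];

    /* combine the partial sums on the root task */
    MPI_Reduce(&mysum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Final sum = %e\n", total);

    free(chunk);
    if (rank == 0) free(data);
    MPI_Finalize();
    return 0;
}
```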
As with Exercises 1 & 2, use the compiler command of your choice to compile the mpi_array example code.
Use the appropriate serial compiler command for the serial version. For example:
C:
icc ser_array.c -o ser_array
mpicc mpi_array.c -o mpi_array
Fortran:
ifort ser_array.f -o ser_array
mpif77 mpi_array.f -o mpi_array
For the MPI executable, use the special workshop pool and 8 tasks. For example:
Serial:
ser_array
MPI:
srun -n8 -ppReserved mpi_array
Note: The srun command is covered in detail in the “Starting Jobs” section of the Linux Clusters Overview tutorial. There is also a man page (man srun).
If we had more time, you might even be able to start with a serial code or two and create your own parallel version. Feel free to try if you’d like.
The provided makefiles can be used to build the other MPI example codes, either all at once or one at a time (mpi_mm is used as the single-code example below). For example:

C:
make -f Makefile.MPI.c
make -f Makefile.MPI.c mpi_mm
make -f Makefile.Ser.c
Fortran:
make -f Makefile.MPI.f
make -f Makefile.MPI.f mpi_mm
make -f Makefile.Ser.f
Note: you can change the compiler being used by editing the Makefile.
Run the executables interactively in the special workshop pool, using the srun command as shown previously.
Most of the executables need only 4 MPI tasks or fewer. Some exceptions and notes:

Code | Requirement(s) |
---|---|
mpi_bandwidth | Requires an even number of tasks. |
mpi_cartesian | Requires 16 MPI tasks. |
mpi_group | Requires 8 MPI tasks. |
mpi_heat2D, mpi_wave | These examples attempt to generate an X Windows display for results. Make sure your X Windows environment and software are set up correctly if you want to see the graphics. Ask the instructor if you have any questions. |
mpi_latency | Requires only 2 MPI tasks, which should be on DIFFERENT nodes. |
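For example, the entries above that need a specific task count could be launched along these lines (assumed commands, following the srun usage shown earlier):
srun -n16 -ppReserved mpi_cartesian
srun -n8 -ppReserved mpi_group
srun -N2 -n2 -ppReserved mpi_latency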
Compile the mpi_bandwidth code if you haven’t already, then run it across two nodes with an increasing number of tasks:
srun -N2 -n2 -ppReserved mpi_bandwidth
srun -N2 -n4 -ppReserved mpi_bandwidth
srun -N2 -n8 -ppReserved mpi_bandwidth
srun -N2 -n16 -ppReserved mpi_bandwidth
srun -N2 -n32 -ppReserved mpi_bandwidth
srun -N2 -n64 -ppReserved mpi_bandwidth
Copy the mpi_bandwidth source file to another file called mpi_bandwidthNB. Modify your new file so that it performs non-blocking sends/receives instead of blocking. An example mpi_bandwidth_nonblock file has been provided in case you need it. Compile your new version, then run both and compare the results:
srun -N2 -ppReserved mpi_bandwidth
srun -N2 -ppReserved mpi_bandwidthNB
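If you want a starting point for the modification, here is a minimal, self-contained sketch of a non-blocking exchange between two tasks. The names and message size are assumptions for illustration; it is not the provided mpi_bandwidth_nonblock source.

```c
/* Sketch of a non-blocking exchange (assumed names, not mpi_bandwidth_nonblock):
 * replace the blocking MPI_Send/MPI_Recv pair with MPI_Isend/MPI_Irecv,
 * then wait on both requests before reusing the buffers. */
#include <mpi.h>
#include <stdio.h>

#define MSGSIZE 100000     /* assumed message size for illustration */

int main(int argc, char *argv[])
{
    char sendbuf[MSGSIZE], recvbuf[MSGSIZE];
    int rank, partner, i, tag = 1;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    partner = (rank == 0) ? 1 : 0;          /* assumes exactly 2 tasks */

    for (i = 0; i < MSGSIZE; i++)           /* fill the outgoing buffer */
        sendbuf[i] = 'x';

    /* post the receive and the send without blocking ... */
    MPI_Irecv(recvbuf, MSGSIZE, MPI_CHAR, partner, tag, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, MSGSIZE, MPI_CHAR, partner, tag, MPI_COMM_WORLD, &reqs[1]);

    /* ... then wait for both to complete before touching the buffers again */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    if (rank == 0)
        printf("Non-blocking exchange of %d bytes complete\n", MSGSIZE);

    MPI_Finalize();
    return 0;
}
```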
There are many things that can go wrong when developing MPI programs. The mpi_bug series of programs demonstrates just a few. See if you can figure out what the problem is in each case and then fix it.
Compile with the compile command(s) of your choice and run interactively using 4 tasks in the special workshop pool.
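For example, one possible compile-and-run sequence for the first bug program, using the C compiler and workshop pool shown earlier, is:
mpicc mpi_bug1.c -o mpi_bug1
srun -n4 -ppReserved mpi_bug1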
The buggy behavior will differ for each example. Some hints are provided below.
Code | Behavior | Hints/Notes |
---|---|---|
mpi_bug1 | Hangs | mpi_bug1 demonstrates how miscoding even a simple parameter like a message tag can lead to a hung program. Verify that the message sent from task 0 is not exactly what task 1 is expecting and vice versa. Matching the send tags with the receive tags solves the problem. |
mpi_bug2 | Wrong results or abnormal termination | mpi_bug2 shows another type of miscoding. The data type of the message sent by task 0 is not what task 1 expects. Nevertheless, the message is received, resulting in wrong results or abnormal termination - depending upon the MPI library and platform. Matching the send data type with the receive data type solves the problem. |
mpi_bug3 | Error message and/or abnormal termination | mpi_bug3 shows what happens when the MPI environment is not initialized or terminated properly. Inserting the MPI init and finalize calls in the right locations will solve the problem. |
mpi_bug4 | Gives the wrong result for "Final sum". Compare to mpi_array | Number of MPI tasks must be divisible by 4; mpi_bug4 shows what happens when a task does not participate in a collective communication call. In this case, task 0 needs to call MPI_Reduce as the other tasks do. |
mpi_bug5 | Dies or hangs - depends upon platform and MPI library | mpi_bug5 demonstrates an unsafe program: sometimes it executes fine, and other times it fails. The program fails or hangs due to buffer exhaustion on the receiving task's side, a consequence of the way the MPI library implements an eager protocol for messages of a certain size. One possible solution is to include an MPI_Barrier call in both the send and receive loops. |
mpi_bug6 | Terminates or is ignored (depends on platform/language) | Requires 4 MPI tasks; mpi_bug6 has a bug that will terminate the program in some cases but be ignored in other cases. The problem is that task 2 performs a blocking operation, but then hits the MPI_Wait call near the end of the program. Only the tasks that make non-blocking calls should hit the MPI_Wait. The coding error in this case is easy to fix - simply make sure task 2 does not encounter the MPI_Wait call. |
mpi_bug7 | Hangs | mpi_bug7 performs a collective communication broadcast but codes the count argument incorrectly, resulting in a hang. |
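As an illustration of the first hint (matching send and receive tags), here is a small, hypothetical exchange between two tasks. It is not the actual mpi_bug1 source, but it shows the deadlock-free pattern that the fix produces: both sides agree on one tag, and the send/receive ordering is complementary.

```c
/* Illustrative sketch (not the actual mpi_bug1 source): each task's send tag
 * must match the tag its partner's receive expects, otherwise both tasks can
 * block forever waiting for a message that never matches. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, msg = 0, tag = 1;          /* one agreed-upon tag for both sides */
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        msg = 42;
        MPI_Send(&msg, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
        MPI_Recv(&msg, 1, MPI_INT, 1, tag, MPI_COMM_WORLD, &status);
    } else if (rank == 1) {
        MPI_Recv(&msg, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
        MPI_Send(&msg, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
    }

    printf("Task %d done, msg = %d\n", rank, msg);
    MPI_Finalize();
    return 0;
}
```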
If you’re just finishing the tutorial and haven’t filled out our evaluation form yet, please do!