Last modified: October 10, 2024

This article is written in: 🇺🇸

Message Passing Interface (MPI)

The Message Passing Interface (MPI) is a standardized and portable message-passing system designed to function on a wide variety of parallel computing architectures. It provides a set of library routines that can be called from programming languages like C, C++, and Fortran to write parallel applications. MPI allows multiple processes to communicate with one another by sending and receiving messages, enabling the development of scalable and efficient parallel programs.

MPI is central to high-performance computing (HPC) and is widely used in scientific computing, engineering simulations, and data-intensive tasks. It provides a rich set of functionalities that support both point-to-point and collective communication, making it suitable for a broad range of parallel algorithms.

Main idea: processes with separate address spaces cooperate by explicitly exchanging messages, and MPI standardizes how those messages are described, sent, and received.

MPI Programming Model

MPI follows the message-passing programming model, where processes communicate by explicitly sending and receiving messages. Here are the key aspects of the MPI programming model:

Process Model

An MPI program starts a fixed set of processes, each with its own private address space. Every process is identified by an integer rank, from 0 to N-1, within a communicator, and all data sharing happens through explicit messages rather than shared memory.

Communication Types

Processes exchange data either through point-to-point operations between a pair of processes (for example, MPI_Send and MPI_Recv) or through collective operations that involve every process in a communicator (for example, MPI_Bcast and MPI_Reduce).

Synchronization

Blocking calls return only when their buffers are safe to reuse, non-blocking calls (MPI_Isend, MPI_Irecv) return immediately and are completed later with MPI_Wait or MPI_Test, and MPI_Barrier provides an explicit synchronization point for all processes in a communicator.

Process Topologies

MPI can arrange the processes of a communicator into logical topologies such as Cartesian grids or general graphs, which simplifies neighbor addressing in stencil and mesh computations; see the sketch below.
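
A minimal sketch, assuming the number of processes factors into a reasonable 2-D grid, of creating a periodic Cartesian topology and querying neighbor ranks:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int size, cart_rank;
    int dims[2] = {0, 0}, periods[2] = {1, 1}, coords[2];
    int left, right;
    MPI_Comm cart;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Dims_create(size, 2, dims);               // pick a balanced 2-D grid shape
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);
    MPI_Comm_rank(cart, &cart_rank);              // rank may be reordered in the new communicator
    MPI_Cart_coords(cart, cart_rank, 2, coords);
    MPI_Cart_shift(cart, 0, 1, &left, &right);    // neighbor ranks along the first dimension

    printf("Rank %d at (%d,%d) has neighbors %d and %d along dim 0\n",
           cart_rank, coords[0], coords[1], left, right);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}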

Implementing Parallel Algorithms with MPI

MPI provides the tools necessary to implement a wide range of parallel algorithms. Here are some considerations:

Data Decomposition

The problem data is divided among processes, typically in contiguous blocks or in a cyclic (round-robin) pattern. The goal is to balance the work while keeping the data each process needs mostly local; a small block-decomposition sketch follows.
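
A minimal sketch, with illustrative names (n, lo, hi), of computing the block of indices each rank owns when the element count does not divide evenly:

// Compute the half-open range [lo, hi) owned by rank out of size ranks,
// giving the first n % size ranks one extra element each.
void block_range(int n, int rank, int size, int *lo, int *hi) {
    int base = n / size;
    int rem  = n % size;
    *lo = rank * base + (rank < rem ? rank : rem);
    *hi = *lo + base + (rank < rem ? 1 : 0);
}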

Communication Patterns

Typical patterns include nearest-neighbor (halo) exchanges for stencil computations, master-worker task distribution, and global operations such as broadcasts and reductions. Choosing the pattern that matches the algorithm keeps communication volume and synchronization overhead low.

MPI Basics

While MPI provides a comprehensive set of functions, many parallel applications can be developed using a core subset of functions. These functions cover initialization, communication, and finalization.

Core MPI Functions

| Function | Description | Parameters |
| --- | --- | --- |
| MPI_Init | Initializes the MPI execution environment. Must be called before any other MPI function. | int *argc, char ***argv (arguments passed to the program) |
| MPI_Finalize | Terminates the MPI execution environment. No MPI functions can be called after this. | None |
| MPI_Comm_size | Determines the size of the group associated with a communicator. | MPI_Comm comm, int *size (communicator and pointer to store the size) |
| MPI_Comm_rank | Determines the rank of the calling process in the communicator. | MPI_Comm comm, int *rank (communicator and pointer to store the rank) |
| MPI_Send | Performs a standard-mode, blocking send. | void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm |
| MPI_Recv | Performs a standard-mode, blocking receive. | void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status |

Introduction to MPI Communicators

A communicator defines a group of processes together with a communication context; every point-to-point and collective operation takes place within one. The predefined communicator MPI_COMM_WORLD contains all processes started with the program and is sufficient for the examples that follow.

Example Program: "Hello World" in MPI

Below is a simple MPI program that illustrates the basic structure of an MPI application.

C Version

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;

    // Initialize the MPI environment
    MPI_Init(&argc, &argv);

    // Get the number of processes
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Get the rank of the process
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Print off a hello world message
    printf("Hello world from rank %d out of %d processors\n", rank, size);

    // Finalize the MPI environment
    MPI_Finalize();
    return 0;
}

Fortran Version

program hello_world
    use mpi
    implicit none

    integer :: rank, size, ierr

    ! Initialize the MPI environment
    call MPI_Init(ierr)

    ! Get the number of processes
    call MPI_Comm_size(MPI_COMM_WORLD, size, ierr)

    ! Get the rank of the process
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

    ! Print off a hello world message
    print *, 'Hello world from rank', rank, 'out of', size, 'processors'

    ! Finalize the MPI environment
    call MPI_Finalize(ierr)
end program hello_world

Compilation and Execution

To compile the C version:

mpicc hello_world.c -o hello_world

To compile the Fortran version:

mpif90 hello_world.f90 -o hello_world

To execute the program with 4 processes:

mpirun -np 4 ./hello_world

Sample Output (line order may vary):

Hello world from rank 0 out of 4 processors
Hello world from rank 1 out of 4 processors
Hello world from rank 2 out of 4 processors
Hello world from rank 3 out of 4 processors

Sending and Receiving Messages

MPI provides various communication functions to send and receive messages between processes.

MPI_Send and MPI_Recv

MPI_Send and MPI_Recv are the basic blocking point-to-point operations. A receive matches a send when the source, tag, and communicator agree, and MPI_Recv returns only after the incoming message has been stored in the receive buffer.

Example: Sending Messages Between Processes

Consider an example where process 0 sends a message to process 1.

C Version

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size, number;
    MPI_Status status;

    MPI_Init(&argc, &argv);               // Initialize MPI environment
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); // Get rank
    MPI_Comm_size(MPI_COMM_WORLD, &size); // Get size

    if (rank == 0) {
        // Process 0
        number = 42; // Some arbitrary data
        MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("Process 0 sent number %d to process 1\n", number);
    } else if (rank == 1) {
        // Process 1
        MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("Process 1 received number %d from process 0\n", number);
    }

    MPI_Finalize();
    return 0;
}

Fortran Version

program send_recv_example
    use mpi
    implicit none

    integer :: rank, size, number, ierr, status(MPI_STATUS_SIZE)

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, size, ierr)

    if (rank == 0) then
        number = 42
        call MPI_Send(number, 1, MPI_INTEGER, 1, 0, MPI_COMM_WORLD, ierr)
        print *, 'Process 0 sent number', number, 'to process 1'
    else if (rank == 1) then
        call MPI_Recv(number, 1, MPI_INTEGER, 0, 0, MPI_COMM_WORLD, status, ierr)
        print *, 'Process 1 received number', number, 'from process 0'
    end if

    call MPI_Finalize(ierr)
end program send_recv_example

Output (the two lines may appear in either order):

Process 0 sent number 42 to process 1
Process 1 received number 42 from process 0

Non-Blocking Communication

Non-blocking communication allows processes to initiate communication operations and then proceed without waiting for them to complete. This can be useful for overlapping computation with communication.

| Function | Description | Parameters |
| --- | --- | --- |
| MPI_Isend | Initiates a non-blocking send. | void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request *request |
| MPI_Irecv | Initiates a non-blocking receive. | void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Request *request |
| MPI_Wait | Waits for a non-blocking operation to complete. | MPI_Request *request, MPI_Status *status |

Example: Non-Blocking Communication

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size, number;
    MPI_Request request;
    MPI_Status status;

    MPI_Init(&argc, &argv);               // Initialize MPI environment
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); // Get rank
    MPI_Comm_size(MPI_COMM_WORLD, &size); // Get size

    if (rank == 0) {
        number = 42;
        MPI_Isend(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);

        // Do some computation here

        MPI_Wait(&request, &status); // Ensure send is complete
        printf("Process 0 sent number %d to process 1\n", number);
    } else if (rank == 1) {
        MPI_Irecv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);

        // Do some computation here

        MPI_Wait(&request, &status); // Ensure receive is complete
        printf("Process 1 received number %d from process 0\n", number);
    }

    MPI_Finalize();
    return 0;
}

Collective Communication

Collective communication involves all processes in a communicator. MPI provides various collective operations such as:

| Function | Description | Parameters |
| --- | --- | --- |
| MPI_Bcast | Broadcasts a message from one process to all other processes. | void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm |
| MPI_Reduce | Performs a reduction operation (e.g., sum, max) across all processes and returns the result to a single process. | const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm |
| MPI_Allreduce | Similar to MPI_Reduce, but the result is returned to all processes. | const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm |
| MPI_Scatter | Distributes distinct chunks of data from one process to all processes. | const void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm |
| MPI_Gather | Gathers data from all processes to one process. | const void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm |

Example: MPI_Reduce

Suppose each process has a local sum, and we want to compute the global sum.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size, local_sum, global_sum;

    MPI_Init(&argc, &argv);               // Initialize MPI environment
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); // Get rank
    MPI_Comm_size(MPI_COMM_WORLD, &size); // Get size

    local_sum = rank + 1; // For example, local_sum = rank + 1

    MPI_Reduce(&local_sum, &global_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("Global sum is %d\n", global_sum);
    }

    MPI_Finalize();
    return 0;
}

Output:

If there are 4 processes, the local sums are 1, 2, 3, and 4, so the global sum is 10.

Global sum is 10

MPI Language Bindings

MPI provides language bindings for C, C++, and Fortran, allowing MPI functions to be used naturally in these languages.

C Language Binding

In the C binding, MPI routines are ordinary functions that return an int error code (MPI_SUCCESS on success), and opaque objects such as communicators, data types, and requests are represented by handle types like MPI_Comm, MPI_Datatype, and MPI_Request.
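
A minimal sketch of the C calling convention, showing the int error code that every routine returns:

#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank;

    MPI_Init(&argc, &argv);

    // Every routine in the C binding returns an int; MPI_SUCCESS indicates success
    if (MPI_Comm_rank(MPI_COMM_WORLD, &rank) != MPI_SUCCESS) {
        MPI_Abort(MPI_COMM_WORLD, 1); // terminate all processes if the call failed
    }

    MPI_Finalize();
    return 0;
}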

Fortran Language Binding

In the Fortran binding, MPI routines are subroutines that take an additional ierror argument for the error code; constants and handles such as MPI_COMM_WORLD and MPI_INTEGER are provided by the mpi module (or the newer mpi_f08 module, which uses derived types for handles).

Determinism in MPI Programs

In parallel computing, determinism refers to the property where a program produces the same output every time it is run with the same input, regardless of the timing of events during execution. In message-passing programming models like MPI, achieving determinism can be challenging due to the inherent nondeterminism in the arrival order of messages.

Consider a scenario where two processes, Process A and Process B, send messages to a third process, Process C. The arrival order of these messages at Process C is not guaranteed because it depends on various factors such as network latency, scheduling, and system load. Although MPI guarantees that messages sent from one process to another are received in the order they were sent, this guarantee does not extend across multiple sender processes.

Ensuring that an MPI program behaves deterministically is crucial for debugging, testing, and verifying parallel applications. It is the programmer's responsibility to design the communication patterns and use MPI features appropriately to achieve determinism.

To make MPI programs deterministic, programmers can employ the following techniques:

1. Specifying Message Sources

By default, when a process calls MPI_Recv, it can specify the source of the message or accept messages from any source by using MPI_ANY_SOURCE. To ensure determinism, it is advisable to specify the exact source process from which to receive messages. This eliminates ambiguity about which message is received and in what order.

// Non-deterministic receive
MPI_Recv(buffer, count, datatype, MPI_ANY_SOURCE, tag, comm, &status);

// Deterministic receive from a specific source
MPI_Recv(buffer, count, datatype, source_rank, tag, comm, &status);

2. Using Message Tags

MPI allows messages to be labeled with a tag, an integer value specified during send and receive operations. By carefully assigning and matching tags, processes can distinguish between different types of messages and ensure that they receive the correct message at the correct time.

// Sender
MPI_Send(data, count, datatype, dest_rank, TAG_DATA, comm);

// Receiver
MPI_Recv(data, count, datatype, source_rank, TAG_DATA, comm, &status);

3. Ordering Communication Operations

Designing the communication sequence so that all processes follow a predetermined order can help in achieving determinism. This often involves structuring the program such that all sends and receives occur in a fixed sequence, possibly by using barriers or other synchronization mechanisms.
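
A minimal sketch of imposing a fixed order: rank 0 receives one value from every other rank strictly in rank order, so results are processed the same way on every run regardless of arrival order (the per-rank value is illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    value = rank * 10; // example per-rank data

    if (rank == 0) {
        // Receive in rank order 1, 2, ..., size-1, independent of arrival order
        for (int src = 1; src < size; src++) {
            MPI_Recv(&value, 1, MPI_INT, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Received %d from rank %d\n", value, src);
        }
    } else {
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}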

4. Avoiding Wildcards

Minimize the use of wildcards like MPI_ANY_SOURCE and MPI_ANY_TAG in receive operations. While they provide flexibility, they can lead to nondeterministic behavior because the order in which messages are received can vary between executions.

Example: Nondeterministic Program

Let's examine a program that demonstrates nondeterministic behavior due to the use of MPI_ANY_SOURCE and MPI_ANY_TAG.

The program implements a symmetric pairwise interaction algorithm in which the processes are arranged in a ring. In each of np/2 steps, a process sends its buffer to its right neighbor and receives a buffer from any source; after the loop, it returns the accumulated data to the process it originated from.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int myid, np, rnbr, rdest;
    float buff[600];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    rnbr = (myid + 1) % np;
    rdest = (myid + np / 2 + 1) % np;

    // Circulate data around the ring
    for (int i = 0; i < np / 2; i++) {
        MPI_Send(buff, 600, MPI_FLOAT, rnbr, 1, MPI_COMM_WORLD);
        MPI_Recv(buff, 600, MPI_FLOAT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
    }

    // Return accumulated data to source
    MPI_Send(buff, 300, MPI_FLOAT, rdest, 2, MPI_COMM_WORLD);
    MPI_Recv(buff, 300, MPI_FLOAT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

    MPI_Finalize();
    return 0;
}

In this program, every MPI_Recv accepts a message from any source with any tag. Which message satisfies a given receive therefore depends on arrival order, which varies with network timing, scheduling, and system load, so the data held in buff after each iteration (and the final returned message) can differ from run to run.

Ensuring Determinism in the Example

To make the program deterministic, modify the MPI_Recv calls to specify the exact source and tag expected.

MPI_Recv(buff, 600, MPI_FLOAT, rnbr, 1, MPI_COMM_WORLD, &status);

By specifying rnbr as the source and 1 as the tag, the process ensures that it receives the expected message from its right neighbor with the correct tag. The final exchange should be fixed in the same way, with the receive naming the specific sending rank and tag 2.

MPI Collective Communication

Parallel algorithms often require coordinated communication among multiple processes. MPI provides a set of collective communication functions that are optimized for such operations. These functions simplify the code and can offer performance benefits due to underlying optimizations.

Key MPI Collective Communication Functions

Below is a summary of important collective communication functions provided by MPI:

| Function | Purpose |
| --- | --- |
| MPI_Barrier | Synchronizes all processes in a communicator. |
| MPI_Bcast | Broadcasts data from one process to all others. |
| MPI_Gather | Gathers data from all processes to one process. |
| MPI_Scatter | Distributes distinct data from one process to all. |
| MPI_Reduce | Reduces values on all processes to a single result. |
| MPI_Allreduce | Similar to MPI_Reduce, but the result is shared. |
| MPI_Allgather | Gathers data from all processes to all processes. |
| MPI_Alltoall | Sends data from all processes to all processes. |

Complete Example: Parallel Summation

Let's consider an example where we need to compute the sum of an array distributed across multiple processes.

Problem Description:

The root process owns an array of values. The array size is broadcast to all processes, the array is scattered in equal chunks with MPI_Scatter, each process sums its local chunk, and the partial sums are combined into a global sum at the root with MPI_Reduce.

Code Example (C)

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int np, me, array_size, local_size;
    double *global_array = NULL;
    double *local_array;
    double local_sum = 0.0, global_sum = 0.0;
    int root = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);

    if (me == root) {
        // Define the total size of the array
        array_size = 1000;
        global_array = malloc(array_size * sizeof(double));

        // Initialize the global array
        for (int i = 0; i < array_size; i++) {
            global_array[i] = 1.0; // Example value
        }
    }

    // Broadcast the array size to all processes
    MPI_Bcast(&array_size, 1, MPI_INT, root, MPI_COMM_WORLD);

    // Determine the size of each local array (assumes array_size is divisible by np)
    local_size = array_size / np;

    // Allocate memory for the local array
    local_array = malloc(local_size * sizeof(double));

    // Scatter the global array to all local arrays
    MPI_Scatter(global_array, local_size, MPI_DOUBLE, local_array, local_size, MPI_DOUBLE, root, MPI_COMM_WORLD);

    // Each process computes the sum of its local array
    for (int i = 0; i < local_size; i++) {
        local_sum += local_array[i];
    }

    // Reduce all local sums to a global sum at the root process
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, root, MPI_COMM_WORLD);

    if (me == root) {
        printf("Global sum is: %f\n", global_sum);
        free(global_array);
    }

    // Clean up
    free(local_array);

    MPI_Finalize();
    return 0;
}

Explanation:

The root process allocates and initializes the global array. MPI_Bcast distributes the array size so every process can size its local buffer, MPI_Scatter delivers one equal chunk to each process, each process sums its chunk locally, and MPI_Reduce with MPI_SUM combines the partial sums at the root, which prints the result.

Finite Difference Problem Using MPI

We aim to solve a finite difference problem where a one-dimensional computational domain is divided among multiple processes. The algorithm requires:

  1. Broadcasting the problem size to all processes.
  2. Scattering the initial data so that each process owns a contiguous block padded with ghost cells.
  3. Exchanging boundary (ghost cell) values with the left and right neighbors in every iteration.
  4. Computing a local update and a local error, then combining the local errors into a global maximum with MPI_Allreduce to test for convergence.
  5. Gathering the final results back to the root process.

Code Example (C)

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

void compute(float *local, int lsize) {
    // Example computation: update local array
    for (int i = 1; i <= lsize; i++) {
        local[i] = (local[i - 1] + local[i] + local[i + 1]) / 3.0;
    }
}

float maxerror(float *local, int lsize) {
    float local_err = 0.0;
    for (int i = 1; i <= lsize; i++) {
        float err = fabs(local[i] - local[i - 1]);
        if (err > local_err) {
            local_err = err;
        }
    }
    return local_err;
}

int main(int argc, char *argv[]) {
    MPI_Comm com = MPI_COMM_WORLD;
    int np, me, size, lsize;
    float *work = NULL, *local;
    float globalerr = 99999.0, localerr;
    int lnbr, rnbr;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(com, &np);
    MPI_Comm_rank(com, &me);

    if (me == 0) {
        // Initialize problem size and data
        size = 1000;
        work = malloc(size * sizeof(float));
        for (int i = 0; i < size; i++) {
            work[i] = sin(i * M_PI / size);
        }
    }

    // Broadcast the problem size
    MPI_Bcast(&size, 1, MPI_INT, 0, com);

    // Determine local size and allocate local array with ghost cells
    // (assumes size is divisible by np)
    lsize = size / np;
    local = malloc((lsize + 2) * sizeof(float)); // +2 for ghost cells

    // Scatter the data to all processes
    MPI_Scatter(work, lsize, MPI_FLOAT, local + 1, lsize, MPI_FLOAT, 0, com);

    // Initialize ghost cells
    local[0] = 0.0;
    local[lsize + 1] = 0.0;

    // Determine neighbor ranks (assuming periodic boundary conditions)
    lnbr = (me == 0) ? np - 1 : me - 1;
    rnbr = (me == np - 1) ? 0 : me + 1;

    while (globalerr > 0.0001) {
        // Exchange boundary data with neighbors
        MPI_Sendrecv(&local[1], 1, MPI_FLOAT, lnbr, 0,
                     &local[lsize + 1], 1, MPI_FLOAT, rnbr, 0, com, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&local[lsize], 1, MPI_FLOAT, rnbr, 0,
                     &local[0], 1, MPI_FLOAT, lnbr, 0, com, MPI_STATUS_IGNORE);

        // Compute new values
        compute(local, lsize);

        // Compute local error
        localerr = maxerror(local, lsize);

        // Compute global maximum error
        MPI_Allreduce(&localerr, &globalerr, 1, MPI_FLOAT, MPI_MAX, com);
    }

    // Gather the results at the root process
    MPI_Gather(local + 1, lsize, MPI_FLOAT, work, lsize, MPI_FLOAT, 0, com);

    if (me == 0) {
        // Process and output the results
        printf("Computation complete. Sample output:\n");
        for (int i = 0; i < 10; i++) {
            printf("work[%d] = %f\n", i, work[i]);
        }
        free(work);
    }

    free(local);

    MPI_Finalize();
    return 0;
}

Explanation:

The root process initializes the full array, and the problem size is broadcast to all processes. MPI_Scatter gives each process one block of the domain, stored with a ghost cell on each side. In every iteration the processes exchange boundary values with their left and right neighbors using MPI_Sendrecv, update their local block, and combine their local errors with MPI_Allreduce to obtain the global maximum error; the loop ends when this error drops below the tolerance. Finally, MPI_Gather collects the blocks at the root, which prints a few sample values.

MPI Modularity and Communicators

In complex parallel applications, it is essential to structure the code into modules for maintainability and reusability. MPI supports modular programming through the use of communicators, which define communication contexts and process groups.

Communicators in MPI

A communicator in MPI is an object that represents a group of processes that can communicate with each other. The default communicator, MPI_COMM_WORLD, includes all the processes in the MPI program.

Creating New Communicators

| Function | Description | Parameters |
| --- | --- | --- |
| MPI_Comm_dup | Duplicates an existing communicator to create a new one with the same group but a different communication context. This is useful for isolating communication in different modules. | MPI_Comm comm, MPI_Comm *newcomm |
| MPI_Comm_split | Splits a communicator into multiple, disjoint sub-communicators based on color and key values. This is useful for creating process subgroups. | MPI_Comm comm, int color, int key, MPI_Comm *newcomm |
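
Duplicating a Communicator:

A minimal sketch of using MPI_Comm_dup to give a module its own communication context (the name lib_comm is illustrative):

#include <mpi.h>

int main(int argc, char *argv[]) {
    MPI_Comm lib_comm;

    MPI_Init(&argc, &argv);

    // Same process group as MPI_COMM_WORLD, but a separate communication context,
    // so messages sent on lib_comm can never match receives posted on MPI_COMM_WORLD
    MPI_Comm_dup(MPI_COMM_WORLD, &lib_comm);

    // ... module or library routines would communicate on lib_comm here ...

    MPI_Comm_free(&lib_comm);
    MPI_Finalize();
    return 0;
}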

Splitting Processes into Subgroups:

MPI_Comm new_comm;
int color = myid / 4; // Divide processes into groups of 4

MPI_Comm_split(MPI_COMM_WORLD, color, myid, &new_comm);

// Now new_comm contains a subgroup of processes

Communicating Between Groups

To enable communication between different groups, MPI provides intercommunicators.

| Function | Description | Parameters |
| --- | --- | --- |
| MPI_Intercomm_create | Creates an intercommunicator that allows communication between two groups of processes. | MPI_Comm local_comm, int local_leader, MPI_Comm peer_comm, int remote_leader, int tag, MPI_Comm *newintercomm |

Creating an Intercommunicator:

MPI_Comm intercomm;
int local_leader = 0; // Rank of leader in local group
int remote_leader = 0; // Rank of leader in remote group

MPI_Intercomm_create(local_comm, local_leader, MPI_COMM_WORLD, remote_leader, tag, &intercomm);

// Now processes in local_comm can communicate with processes in the remote group via intercomm

MPI Derived Data Types

In many applications, data to be sent or received may not be stored contiguously in memory. MPI allows the creation of derived data types to describe such complex memory layouts, enabling efficient communication without extra copying.

| Function | Description | Parameters |
| --- | --- | --- |
| MPI_Type_contiguous | Creates a new data type representing a contiguous block of elements. | int count, MPI_Datatype oldtype, MPI_Datatype *newtype |
| MPI_Type_vector | Creates a data type representing blocks of elements with a regular stride. | int count, int blocklength, int stride, MPI_Datatype oldtype, MPI_Datatype *newtype |
| MPI_Type_indexed | Creates a data type with blocks at arbitrary displacements. | int count, const int array_of_blocklengths[], const int array_of_displacements[], MPI_Datatype oldtype, MPI_Datatype *newtype |

Example: Sending a Column of a Matrix

Suppose we have a 2D array stored in row-major order, and we want to send a column.

int rows = 10, cols = 10;
double matrix[rows][cols];
int col_index = 2; // Column to send (example value)
int dest = 1, tag = 0;

MPI_Datatype column_type;

// Create a data type for a column: 'rows' blocks of 1 double, separated by a stride of 'cols'
MPI_Type_vector(rows, 1, cols, MPI_DOUBLE, &column_type);
MPI_Type_commit(&column_type);

// Send the column starting at matrix[0][col_index]
MPI_Send(&matrix[0][col_index], 1, column_type, dest, tag, MPI_COMM_WORLD);

// Clean up
MPI_Type_free(&column_type);

Benefits:

The noncontiguous column is described once as a derived data type and transferred with a single MPI_Send, so no manual packing into a temporary buffer is required, and the MPI implementation can move the strided data efficiently.

Asynchronous Communication

Asynchronous communication allows a process to initiate a communication operation and then proceed without waiting for it to complete. This can help overlap computation and communication, improving performance.

Non-blocking Operations

Non-blocking operations such as MPI_Isend and MPI_Irecv start a transfer and return immediately. The program completes them later with MPI_Wait (or MPI_Waitall for several requests) or polls for completion with MPI_Test while doing useful work in between, as the sketch below shows.
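
A minimal sketch, assuming a two-rank exchange, of overlapping computation with a pending non-blocking receive by polling MPI_Test (do_local_work is a hypothetical placeholder for useful computation):

#include <mpi.h>
#include <stdio.h>

static void do_local_work(void) {
    // Placeholder for computation that overlaps with the communication
}

int main(int argc, char *argv[]) {
    int rank, value = 0, flag = 0;
    MPI_Request request;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 99;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);
        while (!flag) {
            do_local_work();                              // keep computing...
            MPI_Test(&request, &flag, MPI_STATUS_IGNORE); // ...while checking for completion
        }
        printf("Rank 1 received %d while overlapping work\n", value);
    }

    MPI_Finalize();
    return 0;
}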

Probing for Messages

Sometimes, a process may need to check if a message has arrived without actually receiving it.

| Function | Description | Parameters |
| --- | --- | --- |
| MPI_Iprobe | Non-blocking check for the arrival of a message. | int source, int tag, MPI_Comm comm, int *flag, MPI_Status *status |
| MPI_Probe | Blocking check for a message; returns when a message is available. | int source, int tag, MPI_Comm comm, MPI_Status *status |

Example: Dynamic Message Handling

MPI_Status status;
int flag;

// Periodically check for messages
while (!done) {
    MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &status);
    if (flag) {
        // Message is available, determine its size
        int count;
        MPI_Get_count(&status, MPI_DOUBLE, &count);
        double *buffer = malloc(count * sizeof(double));

        // Receive the message
        MPI_Recv(buffer, count, MPI_DOUBLE, status.MPI_SOURCE, status.MPI_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        // Process the message
        process_message(buffer, count);
        free(buffer);
    }

    // Perform other computations
    perform_computation();
}

Use Cases:

This pattern suits master-worker schemes and other irregular applications in which messages of unknown size arrive at unpredictable times: the receiver can size its buffer from the probed status and keep computing whenever no message is pending.

Best Practices

  1. Minimize communication overhead by reducing the number and size of messages.
  2. Use non-blocking communication to allow the program to continue executing while sending or receiving messages.
  3. Utilize collective communication operations for efficient communication among multiple processes.
  4. Ensure your program scales well with an increasing number of processes, distributing work evenly to prevent idle or overloaded processes (implement load balancing).
  5. Always check the return codes of MPI functions to handle errors gracefully.
  6. Ensure every MPI_Send has a matching MPI_Recv to avoid deadlocks.
  7. Use non-blocking communication or MPI_Sendrecv to prevent processes from waiting indefinitely and avoid deadlocks; a short sketch follows this list.
  8. Properly manage resources by allocating and freeing communicators and data types.
  9. Design algorithms that scale efficiently and minimize communication frequency and volume.
  10. Leverage MPI's optimized collective communication functions to improve performance.
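
As a sketch of the deadlock advice in items 6 and 7, the following ring exchange pairs each send with its receive in a single MPI_Sendrecv call, so no rank blocks waiting for a send that can never complete:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size, send_val, recv_val;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;          // neighbor to send to
    int left  = (rank - 1 + size) % size;   // neighbor to receive from
    send_val = rank;

    // Combined send/receive avoids the deadlock that can occur when every
    // rank first calls a blocking MPI_Send
    MPI_Sendrecv(&send_val, 1, MPI_INT, right, 0,
                 &recv_val, 1, MPI_INT, left, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("Rank %d received %d from rank %d\n", rank, recv_val, left);

    MPI_Finalize();
    return 0;
}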

Examples

C/C++ MPI

  1. Install an MPI implementation, such as OpenMPI or MPICH.
  2. Compile your C/C++ MPI program using the provided wrapper scripts:
     - For C: mpicc mpi_program.c -o mpi_program
     - For C++: mpiCC mpi_program.cpp -o mpi_program
  3. Run your MPI program using the provided mpiexec or mpirun command:
     - mpiexec -n <number_of_processes> ./mpi_program
     - mpirun -n <number_of_processes> ./mpi_program

The following C/C++ examples demonstrate different aspects of MPI:

Python MPI

  1. Install the mpi4py library and an MPI implementation, such as OpenMPI or MPICH.
  2. Run your Python MPI program using the provided mpiexec or mpirun command:
     - mpiexec -n <number_of_processes> python mpi_program.py
     - mpirun -n <number_of_processes> python mpi_program.py

The following Python examples demonstrate different aspects of MPI:

Table of Contents

    Message Passing Interface (MPI)
    1. MPI Programming Model
      1. Process Model
      2. Communication Types
      3. Synchronization
      4. Process Topologies
    2. Implementing Parallel Algorithms with MPI
      1. Data Decomposition
      2. Communication Patterns
    3. MPI Basics
      1. Core MPI Functions
      2. Introduction to MPI Communicators
      3. Example Program: "Hello World" in MPI
    4. Sending and Receiving Messages
      1. MPI_Send and MPI_Recv
      2. Example: Sending Messages Between Processes
    5. Non-Blocking Communication
      1. Example: Non-Blocking Communication
    6. Collective Communication
      1. Example: MPI_Reduce
    7. MPI Language Bindings
      1. C Language Binding
      2. Fortran Language Binding
    8. Determinism in MPI Programs
      1. 1. Specifying Message Sources
      2. 2. Using Message Tags
      3. 3. Ordering Communication Operations
      4. 4. Avoiding Wildcards
      5. Example: Nondeterministic Program
      6. Ensuring Determinism in the Example
    9. MPI Collective Communication
      1. Key MPI Collective Communication Functions
      2. Complete Example: Parallel Summation
      3. Finite Difference Problem Using MPI
    10. MPI Modularity and Communicators
      1. Communicators in MPI
      2. Creating New Communicators
      3. Communicating Between Groups
    11. MPI Derived Data Types
      1. Example: Sending a Column of a Matrix
    12. Asynchronous Communication
      1. Non-blocking Operations
      2. Probing for Messages
      3. Example: Dynamic Message Handling
    13. Best Practices
    14. Examples
      1. C/C++ MPI
      2. Python MPI