The Full Wiki

Message Passing Interface: Wikis


Note: Many of our articles have direct quotes from sources you can cite, within the Wikipedia article! This article doesn't yet, but we're working on it! See more info or our list of citable articles.


From Wikipedia, the free encyclopedia

Message Passing Interface (MPI) is a specification for an API that allows many computers to communicate with one another. It is used in computer clusters and supercomputers. MPI was created by William Gropp, Ewing Lusk and others.



MPI is a language-independent communications protocol used to program parallel computers. Both point-to-point and collective communication are supported. MPI "is a message-passing application programmer interface, together with protocol and semantic specifications for how its features must behave in any implementation."[1] MPI's goals are high performance, scalability, and portability. MPI remains the dominant model used in high-performance computing today.[2]

MPI is not sanctioned by any major standards body; nevertheless, it has become a de facto standard for communication among processes that model a parallel program running on a distributed memory system. Actual distributed memory supercomputers such as computer clusters often run these programs. The principal MPI-1 model has no shared memory concept, and MPI-2 has only a limited distributed shared memory concept. Nonetheless, MPI programs are regularly run on shared memory computers. Designing programs around the MPI model (as opposed to explicit shared memory models) has advantages on NUMA architectures since MPI encourages memory locality.

Although MPI belongs in layers 5 and higher of the OSI Reference Model, implementations may cover most layers of the reference model, with socket and TCP being used in the transport layer.

Most MPI implementations consist of a specific set of routines (i.e., an API) callable from Fortran, C, C++ or Java and from any language capable of interfacing with such routine libraries. The advantages of MPI over older message passing libraries are portability (because MPI has been implemented for almost every distributed memory architecture) and speed (because each implementation is in principle optimized for the hardware on which it runs).

MPI has Language Independent Specifications (LIS) for the function calls and language bindings. The first MPI standard specified ANSI C and Fortran-77 language bindings together with the LIS. The draft of this standard was presented at Supercomputing 1994 (November 1994) (Obs: TO FIX: a reference must be included, because at sc94 site it is not possible to find references to support this statement:, , and finalized soon thereafter. About 128 functions constitute the MPI-1.2 standard as it is now defined.

There are two versions of the standard that are currently popular: version 1.2 (shortly called MPI-1), which emphasizes message passing and has a static runtime environment, and MPI-2.1 (MPI-2), which includes new features such as parallel I/O, dynamic process management and remote memory operations.[3] MPI-2's LIS specifies over 500 functions and provides language bindings for ANSI C, ANSI Fortran (Fortran90), and ANSI C++. Interoperability of objects defined in MPI was also added to allow for easier mixed-language message passing programming. A side effect of MPI-2 standardization (completed in 1996) was clarification of the MPI-1 standard, creating the MPI-1.2 level.

It is important to note that MPI-2 is mostly a superset of MPI-1, although some functions have been deprecated. Thus MPI-1.2 programs still work under MPI implementations compliant with the MPI-2 standard.

MPI is often compared with PVM, which is a popular distributed environment and message passing system developed in 1989, and which was one of the systems that motivated the need for standard parallel message passing systems. Threaded shared memory programming models (such as Pthreads and OpenMP) and message passing programming (MPI/PVM) can be considered as complementary programming approaches, and can occasionally be seen used together in applications where this suits architecture, e.g. in servers with multiple large shared-memory nodes.


The MPI interface is meant to provide essential virtual topology, synchronization, and communication functionality between a set of processes (that have been mapped to nodes/servers/computer instances) in a language-independent way, with language-specific syntax (bindings), plus a few features that are language-specific. MPI programs always work with processes, but programmers commonly refer to the processes as processors. Typically, for maximum performance, each CPU (or core in a multicore machine) will be assigned just a single process. This assignment happens at runtime through the agent that starts the MPI program, normally called mpirun or mpiexec.

The MPI library functions include, but are not limited to, point-to-point rendezvous-type send/receive operations, choosing between a Cartesian or graph-like logical process topology, exchanging data between process pairs (send/receive operations), combining partial results of computations (gathering and reduction operations), synchronizing nodes (barrier operation) as well as obtaining network-related information such as the number of processes in the computing session, current processor identity that a process is mapped to, neighboring processes accessible in a logical topology, and so on. Point-to-point operations come in synchronous, asynchronous, buffered, and ready forms, to allow both relatively stronger and weaker semantics for the synchronization aspects of a rendezvous-send. Many outstanding operations are possible in asynchronous mode, in most implementations.

MPI-1 and MPI-2 both enable implementations that do good work in overlapping communication and computation, but practice and theory differ. MPI also specifies thread safe interfaces, which have cohesion and coupling strategies that help avoid the manipulation of unsafe hidden state within the interface. It is relatively easy to write multithreaded point-to-point MPI code, and some implementations support such code. Multithreaded collective communication is best accomplished by using multiple copies of Communicators, as described below.


MPI provides a rich range of capabilities. The following concepts help in understanding and providing context for all of those capabilities and help the programmer to decide what functionality to use in their application programs.

There are eight basic concepts of MPI, four of which are unique to MPI-2.


Communicators are objects connecting groups of processes in the MPI session. Within each communicator each contained process has an independent identifier and the contained processes are arranged in an ordered topology. MPI also has explicit groups, but these are mainly good for organizing and reorganizing subsets of processes, before another communicator is made. MPI understands single group intracommunicator operations, and bi-partite (two-group) intercommunicator communication. In MPI-1, single group operations are most prevalent, with bi-partite operations finding their biggest role in MPI-2 where their usability is expanded to include collective communication and in dynamic process management.

Communicators can be partitioned using several commands in MPI, these commands include a graph-coloring-type algorithm called MPI_COMM_SPLIT, which is commonly used to derive topological and other logical subgroupings in an efficient way.

Point-to-point basics

A number of important functions in the MPI API involve communication between two specific processes. A much used example is the MPI_Send interface, which allows one specified process to send a message to a second specified process. Point-to-point operations, as these are called, are particularly useful in patterned or irregular communication, for example, a data-parallel architecture in which each processor routinely swaps regions of data with specific other processors between calculation steps, or a master-slave architecture in which the master sends new task data to a slave whenever the previous task is completed.

MPI-1 specifies mechanisms for both blocking and non-blocking point-to-point communication mechanisms, as well as the so-called 'ready-send' mechanism whereby a send request can be made only when the matching receive request has already been made.

Collective basics

Collective functions in the MPI API involve communication among all processes in a process group (which can mean the entire process pool or a program-defined subset). A typical function is the MPI_Bcast call (short for "broadcast"). This function takes data from one specially identified node and sends that message to all processes in the process group. A reverse operation is the MPI_Reduce call, which is a function designed to take data from all processes in a group, performs a user-chosen operation (like summing), and store the results on one individual node. These types of calls are often useful at the beginning or end of a large distributed calculation, where each processor operates on a part of the data and then combines it into a result.

There are also more complex operations such as MPI_Alltoall, which rearranges n items of data from each processor such that the nth node gets the nth item of data from each.

Derived Datatypes

Many MPI functions require that you specify the type of the data which is sent between processors. This is because these arguments to MPI functions are variables, not defined types. If the data type is a standard one, such as, int, char, double, etc., you can use predefined MPI datatypes such as MPI_INT, MPI_CHAR, MPI_DOUBLE. Suppose your data is an array of ints and all the processors want to send their array to the root with MPI_Gather.

Here is a C example of how to do it:

int array[100]; 
int root, total_p, *receive_array;
MPI_Comm_size(comm, &total_p);
receive_array=(int *) malloc(total_p*100*sizeof(int));
MPI_Gather(array, 100, MPI_INT, receive_array, 100, MPI_INT, root, comm);

However, you may instead wish to send your data as one block as opposed to 100 ints. You can do this by defining a contiguous block derived data type.

MPI_Datatype newtype;
MPI_Type_contiguous(100, MPI_INT, &newtype);
MPI_Gather(array, 1, newtype, receive_array, 1, newtype, root, comm);

Sometimes, your data might be a class or a data structure. In this case, there is not a predefined data type and you have to create one. You can make an MPI derived data type from MPI_predefined data types, by using MPI_Type_create_struct, which has the following format:

int MPI_Type_create_struct(int count, int blocklen[], MPI_Aint disp[], MPI_Datatype type[], MPI_Datatype *newtype)

where count is a number of blocks, also number of entries in types[], disp[] and blocklen[], blocklen[] — number of elements in each block (array of integer), disp[] — byte displacement of each block (array of integer), type[] — type of elements in each block (array of handles to datatype objects).

The disp[] array is needed because processors require the variables to be aligned a specific way on the memory. For example, Char is one byte and can go anywhere on the memory. Short is 2 bytes, so it goes to even memory addresses. Long is 4 bytes, it goes on locations divisible by 4 and so on. The compiler tries to accommodate this architecture in a class or data structure by putting padding between the variables. The safest way to find the distance between different variables in a data structure is by using their addresses by another MPI function, MPI_Get_address. You can use this function to calculate the displacement of all the elements of the data structure from the beginning of the data structure.

Suppose you have the following data structures:

  typedef struct{
     int f;
     short p;
   } A;
  typedef struct{
    A a;
    int pp,vp;
   } B;

Here's the C code for building MPI-derived data type:

void define_MPI_datatype(){
  int blocklen[6]={1,1,1,1,1,1}; //The first and last elements mark the beg and end of data structure
  MPI_Aint disp[6];
  MPI_Datatype newtype;
  B findsize[2]; //You need an array to establish the upper bound of the data structure
  MPI_Aint findsize_addr, a_addr, f_addr, p_addr, pp_addr, vp_addr, UB_addr;
  int error;
  MPI_Get_address(&findsize[0], &findsize_addr);
  MPI_Get_address(&(findsize[0]).a, &a_addr);
  MPI_Get_address(&((findsize[0]).a).f, &f_addr);
  MPI_Get_address(&((findsize[0]).a).p, &p_addr);
  MPI_Get_address(&(findsize[0]).pp, &pp_addr);
  MPI_Get_address(&(findsize[0]).vp, &vp_addr);
  error=MPI_Type_create_struct(6, blocklen, disp, type, &newtype);

One-sided communication (MPI-2)

MPI-2 defines three one-sided communications operations, Put, Get, and Accumulate, being a write to remote memory, a read from remote memory, and a reduction operation on the same memory across a number of tasks. Also defined are three different methods for synchronising this communication - global, pairwise, and remote locks - as the specification does not guarantee that these operations have taken place until a synchronisation point.

These types of call can often be useful for algorithms in which synchronisation would be inconvenient (e.g. distributed matrix multiplication), or where it is desirable for tasks to be able to balance their load while other processors are operating on data.

Collective extensions (MPI-2)

This section needs to be developed.

Dynamic process management (MPI-2)

The key aspect of this MPI-2 feature is "the ability of an MPI process to participate in the creation of new MPI processes or to establish communication with MPI processes that have been started separately." The MPI-2 specification describes three main interfaces by which MPI processes can dynamically establish communications, MPI_Comm_spawn, MPI_Comm_accept/MPI_Comm_connect and MPI_Comm_join. The MPI_Comm_spawn interface allows an MPI process to spawn a number of instances of the named MPI process. The newly spawned set of MPI processes form a new MPI_COMM_WORLD intracommunicator but can communicate with the parent and the intercommunicator the function returns. MPI_Comm_spawn_multiple is an alternate interface that allows the different instances spawned to be different binaries with different arguments.[4]


The Parallel I/O feature introduced with MPI-2, is sometimes shortly called MPI-IO,[5] and refers to a collection of functions designed to allow the difficulties of managing I/O on distributed systems to be abstracted away to the MPI library, as well as allowing files to be easily accessed in a patterned fashion using the existing derived datatype functionality.

The little research that has been done on this feature indicates the difficulty for good performance. For example, some implementations of sparse matrix-vector multiplications using the MPI I/O library disastrously fail in efficient parallelization.


'Classical' cluster and supercomputer implementations

The implementation language for MPI is different in general from the language or languages it seeks to support at runtime. Most MPI implementations are done in a combination of C, C++ and assembly language, and target C, C++, and Fortran programmers. However, the implementation language and the end-user language are in principle always decoupled.

The initial implementation of the MPI 1.x standard was MPICH, from Argonne National Laboratory and Mississippi State University. IBM also was an early implementor of the MPI standard, and most supercomputer companies of the early 1990s either commercialized MPICH, or built their own implementation of the MPI 1.x standard. LAM/MPI from Ohio Supercomputing Center was another early open implementation. Argonne National Laboratory has continued developing MPICH for over a decade, and now offers MPICH 2, which is an implementation of the MPI-2.1 standard. LAM/MPI and a number of other MPI efforts recently merged to form a combined project, Open MPI. There are many other efforts that are derivatives of MPICH, LAM, and other works, including, but not limited to, commercial implementations from HP, Intel, and Microsoft.

Besides the mainstream of MPI programming for high performance, MPI has been used widely with Python, Perl, and Java. These communities are growing. MATLAB-based MPI use appears in many forms, but no consensus on a single way of using MPI with MATLAB yet exists. The next sections detail some of these efforts.


There are at least five implementations of MPI for Python: pyMPI, mpi4py, pypar, MYMPI (Now part of Pydusa), and the MPI submodule in ScientificPython. PyMPI is notable because it is a variant python interpreter, while pypar, MYMPI, and ScientificPython's module are modules that are used by importing. They make it the coder's job to decide when and where the call to MPI_Init belongs. Recently the well known Boost C++ Libraries acquired Boost:MPI which included the Boost:MPI Python Bindings. This is of particular interest for those who want to mix C++ and Python.


The OCamlMPI Module implements a large subset of MPI functions and is in active use in scientific computing. It was reported on caml-list that an eleven thousand line OCaml program was "MPI-ified", using the module, with an additional 500 lines of code and slight restructuring and has run with excellent results on up to 170 nodes in a supercomputer.[6]


Although Java does not have an official MPI binding, there have been several attempts to bridge Java and MPI, with different degrees of success and compatibility. One of the first attempts was Bryan Carpenter's mpiJava, essentially a collection of JNI wrappers to a local C MPI library, resulting in a hybrid implementation with limited portability, which also has to be recompiled against the specific MPI library being used.

However, this original project also defined the mpiJava API (a de-facto MPI API for Java following the equivalent C++ bindings closely) which other subsequent Java MPI projects followed. An alternative although less used API is the MPJ API, designed to be more object-oriented and closer to Sun Microsystems' coding conventions. Other than the API used, Java MPI libraries can be either dependent on a local MPI library, or implement the message passing functions in Java, while some like P2P-MPI also provide Peer to peer functionality and allow mixed platform operation.

Some of the most challenging parts of any MPI implementation for Java arise from the language's own limitations and peculiarities, such as the lack of explicit pointers and linear memory address space for its objects , which make transferring multi-dimensional arrays and complex objects inefficient. The workarounds usually used involve transferring one line at a time and/or performing explicit de-serialization and casting both at the sending and receiving end, simulating C or FORTRAN-like arrays by the use of a one-dimensional array, and pointers to primitive types by the use of single-element arrays, thus resulting in programming styles quite extraneous from Java's conventions.

One major improvement is MPJ Express by Dr. Aamir Shafi [1]. This project was supervised by Bryan Carpenter and Mark Baker. On commodity platform like Fast Ethernet, advances in JVM technology now enable networking programs written in Java to rival their C counterparts. On the other hand, improvements in specialized networking hardware have continued, cutting down the communication costs to a couple of microseconds. Keeping both in mind, the key issue at present is not to debate the JNI approach versus the pure Java approach, but to provide a flexible mechanism for programs to swap communication protocols. The aim of this project is to provide a reference Java messaging system based on the MPI standard. The implementation follows a layered architecture based on an idea of device drivers. The idea is analogous to UNIX device drivers. For more info visit [2]

Common Language Infrastucture

There are two managed CLI (.NET) implementations of MPI. The first is Pure Mpi.NET. The second is MPI.NET, a research effort at Indiana University licensed under a BSD-style license. It is compatible with Mono, and can make full use of underlying low-latency MPI network fabrics.

Hardware implementations

There has been research over time into implementing MPI directly into the hardware of the system, for example by means of Processor-in-memory, where the MPI operations are actually built into the microcircuitry of the RAM chips in each node. By implication, this type of implementation would be independent of the language, OS or CPU on the system, but cannot be readily updated or unloaded.

Another approach has been to add hardware acceleration to one or more parts of the operation. This may include hardware processing of the MPI queues or the use of RDMA to directly transfer data between memory and the network interface without needing CPU or kernel intervention.

Example program

Here is a "Hello World" program in MPI written in C. In this example, we send a "hello" message to each processor, manipulate it trivially, send the results back to the main process, and print the messages out.

  "Hello World" Type MPI Test Program
 #include <mpi.h>
 #include <stdio.h>
 #include <string.h>
 #define BUFSIZE 128
 #define TAG 0
 int main(int argc, char *argv[])
   char idstr[32];
   char buff[BUFSIZE];
   int numprocs;
   int myid;
   int i;
   MPI_Status stat; 
   MPI_Init(&argc,&argv); /* all MPI programs start with MPI_Init; all 'N' processes exist thereafter */
   MPI_Comm_size(MPI_COMM_WORLD,&numprocs); /* find out how big the SPMD world is */
   MPI_Comm_rank(MPI_COMM_WORLD,&myid); /* and this processes' rank is */
   /* At this point, all the programs are running equivalently, the rank is used to
      distinguish the roles of the programs in the SPMD model, with rank 0 often used
      specially... */
   if(myid == 0)
     printf("%d: We have %d processors\n", myid, numprocs);
       sprintf(buff, "Hello %d! ", i);
       MPI_Recv(buff, BUFSIZE, MPI_CHAR, i, TAG, MPI_COMM_WORLD, &stat);
       printf("%d: %s\n", myid, buff);
     /* receive from rank 0: */
     MPI_Recv(buff, BUFSIZE, MPI_CHAR, 0, TAG, MPI_COMM_WORLD, &stat);
     sprintf(idstr, "Processor %d ", myid);
     strcat(buff, idstr);
     strcat(buff, "reporting for duty\n");
     /* send to rank 0: */
   MPI_Finalize(); /* MPI Programs end with MPI Finalize; this is a weak synchronization point */
   return 0;

It is important to note that the runtime environment for the MPI implementation used (often called mpirun or mpiexec) spawns multiple copies of the program, with the total number of copies determining the number of process ranks in MPI_COMM_WORLD, which is an opaque descriptor for communication between the set of processes. A Single-Program-Multiple-Data (SPMD) programming model is thereby facilitated, but not required; many MPI implementations allow multiple, different, executables to be started in the same MPI job. Each process has its own rank, the total number of processes in the world, and the ability to communicate between them either with point-to-point (send/receive) communication, or by collective communication among the group. It is enough for MPI to provide an SPMD-style program with MPI_COMM_WORLD, its own rank, and the size of the world to allow for algorithms to decide what they do based on their rank. In more robust examples, I/O should be more carefully managed than in this example. MPI does not guarantee how POSIX I/O would actually work on a given system, but it commonly does work, at least from rank 0.

The notion of process and not processor is used in MPI. The copies of this program are mapped to processors by the runtime environment of MPI. In that sense, the parallel machine can map to 1 physical processor, or N where N is the total number of processors available, or something in between. For maximal potential for parallel speedup more physical processors are used. It should also be noted that this example adjusts its behavior to the size of the world N, so it also seeks to be scalable to the size given at runtime. There is no separate compilation for each size of the concurrency, although different decisions might be taken internally depending on that absolute amount of concurrency provided to the program.

Adoption of MPI-2

While the adoption of MPI-1.2 has been universal, particularly in cluster computing, the acceptance of MPI-2.1 has been more limited. Here are some of the reasons.

  1. While MPI-1.2 emphasizes message passing and a minimal, static runtime environment, full MPI-2 implementations include I/O and dynamic process management, and the size of the middleware implementation is substantially larger. Furthermore, most sites that use batch scheduling systems cannot support dynamic process management. Parallel I/O is well accepted as a key value of MPI-2.
  2. Many legacy MPI-1.2 programs were already developed by the time MPI-2 came out, and work fine. The threat of potentially lost portability by using MPI-2 functions kept people from using the enhanced standard for many years, though this is lessening in the mid 2000's, with wider support for MPI-2.
  3. Many MPI-1.2 applications use only a subset of that standard (16-25 functions). This minimalism of use contrasts with the huge availability of functionality now afforded in MPI-2.

Other inhibiting factors can be cited too, although these may amount more to perceptions and belief than fact. MPI-2 has been well supported in free and commercial implementations since at least the early 2000s, with some implementations coming earlier than that.

The future of MPI

Some aspects of MPI's future appear solid; others less so. The MPI Forum reconvened in 2007, to clarify some MPI-2 issues and explore developments for a possible MPI-3.

Irrespective of what the MPI Forum decides for MPI-3, MPI as a legacy interface will exist at the MPI-1.2 and MPI-2.1 levels for many years to come. Like Fortran, it is ubiquitous in technical computing, and it is taught and used widely. The body of free and commercial products that require MPI, combined with new ports of the existing free and commercial implementations to new target platforms, help ensure that MPI will go on indefinitely.

Architectures are changing, with greater internal concurrency (multi-core), better fine-grain concurrency control (threading, affinity), and more levels of memory hierarchy. Multithreaded programs can take advantage of these developments more easily than single threaded applications. This has already yielded separate, complementary standards for symmetric multiprocessing, namely OpenMP. The MPI-2 standard does define how standard-conforming implementations should deal with multithreaded issues, the standard does not require that implementations be multithreaded, or even thread safe. While multithreaded capable MPI implementations do exist, the number of multithreaded, message passing applications are few. The drive to achieve multi-level concurrency all within MPI is both a challenge and an opportunity for the standard in future.

Much of the discussion within the MPI Forum centres around fault tolerance. Improved fault tolerance within MPI would have clear benefits in the context of Grid computing, a growing trend in large-scale computing.

See also


  1. ^ Gropp et al. 96, p.3
  2. ^ High-performance and scalable MPI over InfiniBand with reduced memory usage
  3. ^ Gropp et al. 1999-advanced, pp.4-5
  4. ^ Gropp et al. 1999-advanced, p.7
  5. ^ Gropp et al. 1999-advanced, pp.5-6
  6. ^


  • Using MPI series:
    • Gropp, William; Lusk, Ewing; Skjellum, Anthony. (1994) Using MPI: portable parallel programming with the message-passing interface. MIT Press In Scientific And Engineering Computation Series, Cambridge, MA, USA. 307 pp. ISBN 0-262-57104-8
    • Gropp, William; Lusk, Ewing; Skjellum, Anthony. (1999) Using MPI, 2nd Edition: portable Parallel Programming with the Message Passing Interface. MIT Press In Scientific And Engineering Computation Series, Cambridge, MA, USA. 395 pp. ISBN 978-0-262-57132-6
    • Gropp, William; R Thakur, E Lusk (1999) Using MPI-2: Advanced Features of the Message Passing Interface - MIT Press Cambridge, MA, USA ISBN 0-262-57133-1
  • MPI—The Complete Reference series:
    • Snir, Marc; Otto, Steve; Huss-Lederman, Steven; Walker, David; Dongarra, Jack (1995) MPI: The Complete Reference. MIT Press Cambridge, MA, USA. ISBN 0-262-69215-5
    • M Snir, SW Otto, S Huss-Lederman, DW Walker, J (1998) MPI—The Complete Reference: Volume 1, The MPI Core. MIT Press, Cambridge, MA. ISBN 0-262-69215-5
    • Gropp, William; Steven Huss-Lederman, Andrew Lumsdaine, Ewing Lusk, Bill Nitzberg, William Saphir, and Marc Snir (1998) MPI—The Complete Reference: Volume 2, The MPI-2 Extensions. MIT Press, Cambridge, MA ISBN 9780262571234
  • Parallel Processing via MPI & OpenMP, M. Firuziaan, O. Nommensen. Linux Enterprise, 10/2002
  • Vanneschi, Marco (1999) Parallel paradigms for scientific computing In Proc. of the European School on Computational Chemistry (1999, Perugia, Italy), number 75 in Lecture Notes in Chemistry, pages 170–183. Springer, 2000.

External links

Got something to say? Make a comment.
Your name
Your email address