General run-time tuning for Open MPI 1.4 and later (Part 1)


Taken from "17. How do I tell Open MPI to use processor and/or memory affinity in Open MPI v1.4.x? (How do I use the --by* and --bind-to-* options?)"

When launching a job with mpirun in Open MPI 1.4 and above, you can pass the following options to improve performance:

  1. --bind-to-none: Do not bind processes (Default)
  2. --bind-to-core: Bind each MPI process to a core
  3. --bind-to-socket: Bind each MPI process to a processor socket
  4. --report-bindings: Report how the launched processes are bound by Open MPI
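As a minimal sketch of how these options combine on the command line, the snippet below builds an mpirun invocation that binds each process to a core and reports the resulting bindings. The binary name ./my_mpi_process and the process count of 4 are placeholders, not values from the FAQ. The command is only printed so that it can be reviewed before it is run.

```shell
# Hypothetical binary name and process count; substitute your own.
APP=./my_mpi_process
NP=4

# Bind each MPI process to a core and ask Open MPI to report the bindings.
CMD="mpirun -np $NP --bind-to-core --report-bindings $APP"

# Print the command for review; paste it into the shell to launch the job.
echo "$CMD"
```

Keeping --report-bindings on while tuning makes it easy to confirm that the processes landed where you expected.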

If the hardware has multiple hardware threads per core (for example, with Hyper-Threading), only the first thread of each core is used with the --bind-to-* options. According to the source article, this is supposed to be fixed in v1.5.

The following options are meant to be used together with --bind-to-*:

  1. --byslot: Alias for --bycore
  2. --bycore: When laying out processes, put sequential MPI processes on adjacent processor cores. (Default)
  3. --bysocket: When laying out processes, put sequential MPI processes on adjacent processor sockets.
  4. --bynode: When laying out processes, put sequential MPI processes on adjacent nodes.
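To compare the layout options side by side, one approach is to generate the same launch command once per layout and run each in turn with binding reports enabled. The sketch below only prints the three commands. The helper layout_cmd, the binary name ./my_mpi_process and the process count of 8 are illustrative assumptions, not part of Open MPI.

```shell
# Hypothetical binary name and process count; substitute your own.
APP=./my_mpi_process
NP=8

# layout_cmd is an illustrative helper: it prints an mpirun command that
# uses the given layout option together with core binding.
layout_cmd() {
    echo "mpirun -np $NP --$1 --bind-to-core --report-bindings $APP"
}

# Print one command per layout so the reported bindings can be compared.
for LAYOUT in bycore bysocket bynode; do
    layout_cmd "$LAYOUT"
done
```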

Finally, you can use the --cpus-per-proc option, which binds ncpus OS processor IDs to each MPI process. For example, consider a machine with 4 processor sockets, each with 4 cores, hence 16 cores in total.

$ mpirun -np 8 --cpus-per-proc 2 my_mpi_process

The command will bind each MPI process to ncpus=2 cores. All cores on the machine will be used.
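The process count in this example follows from simple arithmetic: 8 processes times 2 cores each covers all 16 cores. The sketch below works that out in the shell, assuming a machine with 4 sockets of 4 cores each; the variable names are illustrative only.

```shell
# Assumed machine layout: 4 sockets with 4 cores each.
SOCKETS=4
CORES_PER_SOCKET=4
CPUS_PER_PROC=2

# Total cores on the machine, and how many processes fill them exactly.
TOTAL_CORES=$(( SOCKETS * CORES_PER_SOCKET ))
NP=$(( TOTAL_CORES / CPUS_PER_PROC ))

# Print the resulting command for review.
echo "mpirun -np $NP --cpus-per-proc $CPUS_PER_PROC my_mpi_process"
```

If the total core count is not a multiple of the cores per process, the integer division rounds down and some cores are left unused.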
