mpirun
Run mpi programs
Description
"mpirun" is a shell script that attempts to hide the differences in
starting jobs for various devices from the user. mpirun attempts to
determine what kind of machine it is running on and start the required
number of jobs on that machine. On workstation clusters, if you are
not using Chameleon, you must either supply a default file that lists the
machines that mpirun can use to run remote jobs, or specify such a file
every time you run mpirun with the -machinefile option. The default
file is in util/machines/machines.<arch>.
mpirun typically works like this:
    mpirun -np <number of processes> <program name and arguments>
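As a rough illustration of the machines file and the basic invocation described above, the sketch below writes a small machines file and prints the command that would use it. It assumes one host name per line in the file; the host names, the file location, and the program name ./myprog are placeholders that do not come from this page. The command is only echoed, so the sketch can be tried without an MPI installation.

```shell
#!/bin/sh
# Hypothetical host names; replace them with machines you can actually reach.
machinefile="${TMPDIR:-/tmp}/machines.example"
printf '%s\n' node01 node02 node03 > "$machinefile"

# Request one process per listed host (an arbitrary choice for this sketch).
np=$(wc -l < "$machinefile" | tr -d ' ')

# Print the command instead of running it; ./myprog is a placeholder.
cmd="mpirun -machinefile $machinefile -np $np ./myprog"
echo "$cmd"
```

To run the job for real, execute the printed command on a system where mpirun and the listed hosts are available.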
If mpirun cannot determine what kind of machine you are on, and that machine
is supported by the mpi implementation, you can use the -machine
and -arch options to tell it what kind of machine you are running
on. The current valid values for machine are:
    chameleon   (including chameleon/pvm, chameleon/p4, etc...)
    meiko       (the meiko device on the meiko)
    paragon     (the ch_nx device on a paragon not running NQS)
    p4          (the ch_p4 device on a workstation cluster)
    ibmspx      (ch_eui for IBM SP2)
    anlspx      (ch_eui for ANL's SPx)
    ksr         (ch_p4 for KSR 1 and 2)
    sgi_mp      (ch_shmem for SGI multiprocessors)
    cray_t3d    (t3d for Cray T3D)
    smp         (ch_shmem for SMPs)
    execer      (a custom script for starting ch_p4 programs
                without using a procgroup file. This script
                currently does not work well with interactive
                jobs)
You should only have to specify mr_arch if mpirun does not recognize
your machine, the default value is wrong, or you are using the p4 or
execer devices. The full list of options is given below.
Parameters
The options for mpirun must come before the program you want to run:
    mpirun [mpirun_options...] <progname> [options...]
- -arch <architecture>
  specify the architecture (must have matching machines.<arch>
  file in ${MPIR_HOME}/util/machines) if using the execer
- -h
  This help
- -machine <machine name>
  use startup procedure for <machine name>
- -machinefile <machine-file name>
  Take the list of possible machines to run on from the
  file <machine-file name>
- -np <np>
  specify the number of processors to run on
- -nolocal
  do not run on the local machine (only works for
  p4 and ch_p4 jobs)
- -stdin filename
  Use filename as the standard input for the program. This
  is needed for programs that must be run as batch jobs, such
  as some IBM SP systems and Intel Paragons using NQS (see
  -paragontype below).
- -t
  Testing: do not actually run, just print what would be
  executed
- -v
  Verbose: throw in some comments
- -dbx
  Start the first process under dbx where possible
- -gdb
  Start the first process under gdb where possible
  (on the Meiko, selecting either -dbx or -gdb starts prun
  under totalview instead)
- -xxgdb
  Start the first process under xxgdb where possible (-xdbx
  does not work)
- -tv
  Start under totalview
Special Options for NEC - CENJU-3:
- -batch
  Execute program as a batch job (using cjbr)
- -stdout filename
  Use filename as the standard output for the program.
- -stderr filename
  Use filename as the standard error for the program.
Special Options for Nexus device:
- -nexuspg filename
  Use the given Nexus startup file instead of creating one.
  Overrides -np and -nolocal, selects -leave_pg.
- -nexusdb filename
  Use the given Nexus resource database.
Special Options for Workstation Clusters
- -e
  Use execer to start the program on workstation clusters
- -pg
  Use a procgroup file to start the p4 programs, not execer
  (default)
- -leave_pg
  Do not delete the P4 procgroup file after running
- -p4pg filename
  Use the given p4 procgroup file instead of creating one.
  Overrides -np and -nolocal, selects -leave_pg.
- -tcppg filename
  Use the given tcp procgroup file instead of creating one.
  Overrides -np and -nolocal, selects -leave_pg.
- -p4ssport num
  Use the p4 secure server with port number num to start the
  programs. If num is 0, use the value of the
  environment variable MPI_P4SSPORT. Using the server can
  speed up process startup. If both MPI_USEP4SSPORT and
  MPI_P4SSPORT are set, the effect is the same as giving
  mpirun the option -p4ssport 0.
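The rule for the two environment variables can be sketched in shell as follows. The helper p4ssport_args is hypothetical and written only to illustrate the stated behavior; it is not taken from the mpirun script, and it treats "set" as "defined, even if empty", which is an assumption.

```shell
#!/bin/sh
# p4ssport_args: hypothetical helper, not part of mpirun.
# Prints the option that the environment implies, if any.
p4ssport_args() {
    # ${VAR+set} expands to "set" only when VAR is defined.
    if [ -n "${MPI_USEP4SSPORT+set}" ] && [ -n "${MPI_P4SSPORT+set}" ]; then
        # Port 0 means "take the port number from MPI_P4SSPORT".
        echo "-p4ssport 0"
    fi
}

# Example: show what would be added to the mpirun command line.
echo "implied options: $(p4ssport_args)"
```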
Special Options for Batch Environments
- -mvhome
  Move the executable to the home directory. This
  is needed when all file systems are not cross-mounted.
  Currently only used by anlspx.
- -mvback files
  Move the indicated files back to the current directory.
  Needed only when using -mvhome; has no effect otherwise.
- -maxtime min
  Maximum job run time in minutes. Currently used only
  by anlspx. Default value is 15 minutes.
- -nopoll
  Do not use polling-mode communication.
  Available only on IBM SPx.
- -mem value
  This is the per-node memory request (in Mbytes). Needed for some
  CM-5s.
- -cpu time
  This is the hard cpu limit, in minutes, used for some CM-5s.
Special Options for IBM SP2
- -cac name
  CAC for ANL scheduler. Currently used only by anlspx.
  If not provided, mpirun will choose some valid CAC.
Special Options for Intel Paragon
- -paragontype name
  Selects one of default, mkpart, NQS, depending on how you want
  to submit jobs to a Paragon.
- -paragonname name
  Remote shells to name to run the job (using the -sz method) on
  a Paragon.
- -paragonpn name
  Name of partition to run on in a Paragon (using the -pn name
  command-line argument)
On exit, mpirun returns a status of zero unless mpirun detected a problem, in
which case it returns a non-zero status (currently, all are one, but this
may change in the future).
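Because the exact non-zero value may change, a calling script should test for "non-zero" rather than "equal to one". A minimal sketch follows; run_and_report is a hypothetical helper name, and ./myprog in the usage comment is a placeholder.

```shell
#!/bin/sh
# run_and_report: hypothetical wrapper, not part of mpirun.
# Runs the given command and reports success or the non-zero status.
run_and_report() {
    if "$@"; then
        echo "ok"
        return 0
    else
        status=$?
        # Any non-zero value counts as failure; do not rely on it being 1.
        echo "failed with status $status"
        return "$status"
    fi
}

# Usage (requires an MPI installation):
#   run_and_report mpirun -np 4 ./myprog
```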
Specifying Heterogeneous Systems
Multiple architectures may be handled by giving multiple -arch and -np
arguments. For example, to run a program on 2 sun4s and 3 rs6000s, with
the local machine being a sun4, use
    mpirun -arch sun4 -np 2 -arch rs6000 -np 3 program
This assumes that program will run on both architectures. If different
executables are needed (as in this case), the string %a will be replaced
with the arch name. For example, if the programs are program.sun4 and
program.rs6000, then the command is
    mpirun -arch sun4 -np 2 -arch rs6000 -np 3 program.%a
If instead the executables are in different directories, for example
/tmp/me/sun4 and /tmp/me/rs6000, then the command is
    mpirun -arch sun4 -np 2 -arch rs6000 -np 3 /tmp/me/%a/program
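To see which executable each architecture ends up with, the %a replacement can be imitated with sed. This is only an illustration of the substitution as described here; expand_arch is a hypothetical helper and does not reproduce how the mpirun script itself performs the replacement.

```shell
#!/bin/sh
# expand_arch: hypothetical helper that mimics the %a substitution.
#   $1 = program name or path containing %a
#   $2 = architecture name
expand_arch() {
    printf '%s\n' "$1" | sed "s|%a|$2|g"
}

# Show the per-architecture paths for the directory layout in the example.
for arch in sun4 rs6000; do
    expand_arch "/tmp/me/%a/program" "$arch"
done
```

Checking that each expanded path exists on the corresponding machines before launching can catch a misnamed executable early.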
It is important to specify the architecture with -arch before specifying
the number of processors. Also, the first -arch option must refer to the
processor on which the job will be started. Specifically, if -nolocal is
not specified, then the first -arch must refer to the processor from which
mpirun is running.
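The two ordering rules (each -arch before its -np, and the local architecture first) can be enforced when a command line is assembled by a script. The sketch below is one possible way to do that; hetero_args and its arch:np argument format are hypothetical conveniences, not mpirun features.

```shell
#!/bin/sh
# hetero_args: hypothetical helper, not part of mpirun.
# Usage: hetero_args <local-arch> <arch:np>...
# Prints "-arch A -np N" groups with the local architecture first.
hetero_args() {
    local_arch=$1
    shift
    first=""
    rest=""
    for pair in "$@"; do
        arch=${pair%%:*}   # text before the first colon
        np=${pair#*:}      # text after the first colon
        if [ "$arch" = "$local_arch" ]; then
            first="-arch $arch -np $np"
        else
            rest="$rest -arch $arch -np $np"
        fi
    done
    echo "$first$rest"
}

# Print (not run) the command for the sun4/rs6000 example.
echo "mpirun $(hetero_args sun4 rs6000:3 sun4:2) program.%a"
```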
(You must have machines.<arch> files for each arch that you use in the
util/machines directory.)
Another approach that may be used with the ch_p4 device is to create a
procgroup file directly. See the MPICH Users Guide for more information.