-Some explanations on the MPI implementation of NPB 3.3 (NPB3.3-MPI)
+Some explanations on the MPI implementation of a subset of NPB 3.3 (NPB3.3-MPI)
----------------------------------------------------------------------
NPB-MPI is a sample MPI implementation based on NPB2.4 and NPB3.0-SER.
-This implementation contains all eight original benchmarks:
-Seven in Fortran: BT, SP, LU, FT, CG, MG, and EP; one in C: IS,
-as well as the DT benchmark, written in C, introduced in NPB3.2-MPI.
-
-For changes from different versions, see the Changes.log file
-included in the upper directory of this distribution.
-
-This version has been tested, among others, on an SGI Origin3000 and
-an SGI Altix. For problem reports and suggestions on the implementation,
-please contact
-
- NAS Parallel Benchmark Team
- npb@nas.nasa.gov
-
-
-CAUTION *********************************
-When running the I/O benchmark, one or more data files will be written
-in the directory from which the executable is invoked. They are not
-deleted at the end of the program. A new run will overwrite the old
-file(s). If not enough space is available in the user partition, the
-program will fail. For classes C and D the disk space required is
-3 GB and 135 GB, respectively.
-*****************************************
-
+This subset contains three of the original benchmarks: one in Fortran: EP;
+one in C: IS, as well as the DT benchmark, written in C, introduced in NPB3.2-MPI.
1. Compilation
- NPB3-MPI uses the same directory tree as NPB3-SER (and NPB2.x) does.
Before compilation, one needs to check the configuration file
- 'make.def' in the config directory and modify the file if necessary.
- If it does not (yet) exist, copy 'make.def.template' or one of the
- sample files in the NAS.samples subdirectory to 'make.def' and
- edit the content for site- and machine-specific data. Then
+ 'make.def' in the config directory and modify the file if necessary.
make <benchmark-name> NPROCS=<number> CLASS=<class> \
[SUBTYPE=<type>] [VERSION=VEC]
- where <benchmark-name> is "bt", "cg", "dt", "ep", "ft", "is",
- "lu", "mg", or "sp"
+ where <benchmark-name> is "dt", "ep", or "is",
<number> is the number of processes
<class> is "S", "W", "A", "B", "C", "D", or "E"
- Classes C, D and E are not available for DT.
- Class E is not available for IS.
-
- The "VERSION=VEC" option is used for selecting the vectorized
- versions of BT and LU.
-
- Only when making the I/O benchmark:
- <benchmark-name> is "bt"
- <number>, <class> as above
- <type> is "full", "simple", "fortran", or "epio"
-
- Three parameters not used in the original BT benchmark are present in
- the I/O benchmark. Two are set by default in the file BT/bt.f.
- Changing them is optional.
- One is set in make.def. It must be specified.
-
- bt.f: collbuf_nodes: number of processes used to buffer data before
- writing to file in the collective buffering mode
- (<type> is "full").
- collbuf_size: size of buffer (in bytes) per process used in
- collective buffering
-
- make.def: -DFORTRAN_REC_SIZE: Fortran I/O record length in bytes. This
- is a system-specific value. It is part of the
- definition string of variable CONVERTFLAG. Syntax:
- "CONVERTFLAG = -DFORTRAN_REC_SIZE=n", where n is
- the record length.
-
- When <type> is "full" or "simple", the code must be linked with an
- MPI library that contains the subset of IO routines defined in MPI 2.
-
-
- Class D for IS (Integer Sort) requires a compiler/system that
- supports the "long" type in C to be 64-bit. As examples, the SGI
- MIPS compiler for the SGI Origin using the "-64" compilation flag and
- the Intel compiler for IA64 are known to work.
-
-
- The above procedure allows you to build one benchmark
- at a time. To build a whole suite, you can type "make suite"
- Make will look in file "config/suite.def" for a list of
- executables to build. The file contains one line per specification,
- with comments preceded by "#". Each line contains the name
- of a benchmark, the class, and the number of processors, separated
- by spaces or tabs. config/suite.def.template contains an example
- of such a file.
-
-
- The benchmarks have been designed so that they can be run
- on a single processor without an MPI library. A few "dummy"
- MPI routines are still required for linking. For convenience
- such a library is supplied in the "MPI_dummy" subdirectory of
- the distribution. It contains an mpif.h and mpi.f include files
- which must be used as well. The dummy library is built and
- linked automatically and paths to the include files are defined
- by inserting the line "include ../config/make.dummy" into the
- make.def file (see example in make.def.template). Make sure to
- read the warnings in the README file in "MPI_dummy".The use of
- the library is fragile and can produce unexpected errors.
+ Class E is not available for IS and DT.
+ Class D for IS (Integer Sort) requires a compiler/system that
+ supports the "long" type in C to be 64-bit.
- ================================
-
- The "RAND" variable in make.def
- --------------------------------
-
- Most of the NPBs use a random number generator. In two of the NPBs (FT
- and EP) the computation of random numbers is included in the timed
- part of the calculation, and it is important that the random number
- generator be efficient. The default random number generator package
- provided is called "randi8" and should be used where possible. It has
- the following requirements:
-
- randi8:
- 1. Uses integer*8 arithmetic. Compiler must support integer*8
- 2. Uses the Fortran 90 IAND intrinsic. Compiler must support IAND.
- 3. Assumes overflow bits are discarded by the hardware. In particular,
- that the lowest 46 bits of a*b are always correct, even if the
- result a*b is larger than 2^64.
-
- Since randi8 may not work on all machines, we supply the following
- alternatives:
-
- randi8_safe
- 1. Uses integer*8 arithmetic
- 2. Uses the Fortran 90 IBITS intrinsic.
- 3. Does not make any assumptions about overflow. Should always
- work correctly if compiler supports integer*8 and IBITS.
-
- randdp
- 1. Uses double precision arithmetic (to simulate integer*8 operations).
- Should work with any system with support for 64-bit floating
- point arithmetic.
-
- randdpvec
- 1. Similar to randdp but written to be easier to vectorize.
-
-
2. Execution
- The executable is named <benchmark-name>.<class>.<nprocs>[.<suffix>],
- where <suffix> is "fortran_io", "mpi_io_simple", "ep_io", or
- "mpi_io_full"
- The executable is placed in the bin subdirectory (or in the directory
- BINDIR specified in make.def, if you've defined it). The method for
+ The executable is named <benchmark-name>.<class>.<nprocs>
+ The executable is placed in the bin subdirectory (or in the directory
+ BINDIR specified in make.def, if you've defined it). The method for
running the MPI program depends on your local system.
- When any of the I/O benchmarks is run (non-empty subtype), one or
- more output files are created, and placed in the directory from which
- the program was started. These are not removed automatically, and
- will be overwritten the next time an IO benchmark is run.