examples/smpi/NAS/README.install

Some explanations on the MPI implementation of NPB 3.3 (NPB3.3-MPI)
----------------------------------------------------------------------

NPB-MPI is a sample MPI implementation based on NPB2.4 and NPB3.0-SER.
This implementation contains all eight original benchmarks, seven in
Fortran (BT, SP, LU, FT, CG, MG, and EP) and one in C (IS), as well as
the DT benchmark, written in C and introduced in NPB3.2-MPI.

For changes between versions, see the Changes.log file in the parent
directory of this distribution.

This version has been tested on, among other systems, an SGI Origin3000
and an SGI Altix.  For problem reports and suggestions on the
implementation, please contact

   NAS Parallel Benchmark Team
   npb@nas.nasa.gov


CAUTION *********************************
When running the I/O benchmark, one or more data files are written to
the directory from which the executable is invoked. They are not
deleted at the end of the program, and a new run overwrites the old
file(s). If there is not enough space in the user partition, the
program will fail. For classes C and D, the required disk space is
3 GB and 135 GB, respectively.
*****************************************
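
Because a run fails once the partition fills up, it may be worth checking the free space before starting a class C or D I/O run. The sketch below assumes a POSIX system that provides statvfs(3); the helper has_space_for is hypothetical and not part of NPB, and reading "GB" as 10^9 bytes is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/statvfs.h>

/* Hypothetical helper, not part of NPB. Returns 1 if the file system
 * holding `dir` has at least `needed` bytes available to unprivileged
 * users, 0 if it has less, and -1 if statvfs() fails. */
int has_space_for(const char *dir, uint64_t needed)
{
    struct statvfs fs;

    assert(dir != NULL);
    if (statvfs(dir, &fs) != 0)
        return -1;
    /* f_bavail is counted in units of f_frsize bytes. */
    uint64_t avail = (uint64_t)fs.f_bavail * (uint64_t)fs.f_frsize;
    return avail >= needed ? 1 : 0;
}
```

For example, calling has_space_for(".", UINT64_C(135) * 1000000000) in the run directory before a class D run returns 1 only if roughly 135 GB are free for a non-root user.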


1. Compilation

   NPB3-MPI uses the same directory tree as NPB3-SER (and NPB2.x).
   Before compiling, check the configuration file 'make.def' in the
   config directory and modify it if necessary. If it does not (yet)
   exist, copy 'make.def.template' or one of the sample files in the
   NAS.samples subdirectory to 'make.def' and edit it for site- and
   machine-specific settings. Then type

       make <benchmark-name> NPROCS=<number> CLASS=<class> \
         [SUBTYPE=<type>] [VERSION=VEC]

   where <benchmark-name>  is "bt", "cg", "dt", "ep", "ft", "is",
                              "lu", "mg", or "sp"
         <number>          is the number of processes
         <class>           is "S", "W", "A", "B", "C", "D", or "E"

   Classes C, D, and E are not available for DT.
   Class E is not available for IS.
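
   The class restrictions above can be summarized as a small predicate. This is a hypothetical helper written for illustration only; it is not part of the NPB build system.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of the NPB build system. Returns 1 if
 * class `cls` may be built for benchmark `bench`, 0 otherwise. */
int npb_class_supported(const char *bench, char cls)
{
    static const char *const names[] = {
        "bt", "cg", "dt", "ep", "ft", "is", "lu", "mg", "sp"
    };
    int known = 0;

    assert(bench != NULL);
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strcmp(bench, names[i]) == 0)
            known = 1;
    if (!known)
        return 0;

    /* strchr() also matches the terminating '\0', so reject it first. */
    if (cls == '\0' || strchr("SWABCDE", cls) == NULL)
        return 0;
    if (strcmp(bench, "dt") == 0 && strchr("CDE", cls) != NULL)
        return 0;   /* classes C, D and E are not available for DT */
    if (strcmp(bench, "is") == 0 && cls == 'E')
        return 0;   /* class E is not available for IS */
    return 1;
}
```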

   The "VERSION=VEC" option selects the vectorized versions of BT and
   LU.

   Only when making the I/O benchmark:
         <benchmark-name>  is "bt"
         <number>, <class> as above
         <type>            is "full", "simple", "fortran", or "epio"

   Three parameters not used in the original BT benchmark are present in
   the I/O benchmark. Two of them are set to default values in the file
   BT/bt.f; changing them is optional. The third is set in make.def and
   must be specified.

   bt.f: collbuf_nodes: number of processes used to buffer data before
                        writing to file in the collective buffering mode
                        (<type> is "full").
         collbuf_size:  size of buffer (in bytes) per process used in
                        collective buffering.

   make.def: -DFORTRAN_REC_SIZE: Fortran I/O record length in bytes. This
                        is a system-specific value. It is part of the
                        definition string of variable CONVERTFLAG. Syntax:
                        "CONVERTFLAG = -DFORTRAN_REC_SIZE=n", where n is
                        the record length.

   When <type> is "full" or "simple", the code must be linked with an
   MPI library that provides the subset of I/O routines defined in
   MPI-2 (MPI-IO).


   Class D for IS (Integer Sort) requires a compiler/system on which the
   C "long" type is 64 bits wide.  Known working examples are the SGI
   MIPS compiler for the SGI Origin with the "-64" compilation flag and
   the Intel compiler for IA64.
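
   One way to confirm this requirement before building IS class D is a compile-time check. This minimal sketch assumes a C11 compiler (for static_assert from <assert.h>); it is not part of the NPB sources, and the function name is hypothetical.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */

/* Compilation stops here unless "long" is 64 bits wide, the condition
 * IS class D depends on. */
static_assert(sizeof(long) * CHAR_BIT == 64,
              "IS class D requires a 64-bit C long");

/* Run-time report of the same property, e.g. for a configure-style probe. */
int long_is_64_bits(void)
{
    return sizeof(long) * CHAR_BIT == 64;
}
```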


   The above procedure builds one benchmark at a time. To build a
   whole suite, type "make suite". Make then reads the file
   "config/suite.def" for the list of executables to build. The file
   contains one specification per line, with comments preceded by "#".
   Each line contains the name of a benchmark, the class, and the number
   of processes, separated by spaces or tabs. The file
   config/suite.def.template contains an example.
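
   To illustrate the suite.def format described above, the following sketch parses a single line into its three fields. The struct and function names are hypothetical, and the length limits (a 15-character benchmark name, 255-character lines) are assumptions of this example, not of the NPB build.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One build specification from a suite.def line (hypothetical struct). */
struct suite_entry {
    char bench[16];   /* benchmark name, e.g. "cg" */
    char cls;         /* class letter, e.g. 'A' */
    int  nprocs;      /* number of processes */
};

/* Parses one line. Text after '#' is a comment. Returns 1 and fills *e
 * for a specification, 0 for a blank or comment-only line, and -1 for
 * a malformed line. */
int parse_suite_line(const char *line, struct suite_entry *e)
{
    char buf[256];
    char cls[4];
    char extra[2];

    assert(line != NULL && e != NULL);
    strncpy(buf, line, sizeof buf - 1);   /* overly long lines are truncated */
    buf[sizeof buf - 1] = '\0';
    char *hash = strchr(buf, '#');
    if (hash != NULL)
        *hash = '\0';                     /* drop the comment */

    /* %s and %d skip spaces and tabs alike; %1s catches extra fields. */
    int n = sscanf(buf, "%15s %3s %d %1s", e->bench, cls, &e->nprocs, extra);
    if (n == EOF)
        return 0;                         /* nothing but whitespace */
    if (n != 3 || strlen(cls) != 1 || e->nprocs <= 0)
        return -1;                        /* wrong field count or bad values */
    e->cls = cls[0];
    return 1;
}
```

   For example, parse_suite_line("cg\tA\t4", &e) returns 1 with e.bench equal to "cg", e.cls equal to 'A', and e.nprocs equal to 4.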


   The benchmarks have been designed so that they can be run on a
   single processor without an MPI library. A few "dummy" MPI routines
   are still required for linking. For convenience, such a library is
   supplied in the "MPI_dummy" subdirectory of the distribution. It
   contains the include files mpif.h and mpi.f, which must be used as
   well. Inserting the line "include ../config/make.dummy" into the
   make.def file (see the example in make.def.template) makes the build
   compile and link the dummy library automatically and defines the
   paths to its include files. Make sure to read the warnings in the
   README file in "MPI_dummy": use of the library is fragile and can
   produce unexpected errors.
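
   As an illustration of what "dummy" means here, the sketch below shows single-process stand-ins for four MPI routines: the world always contains one process, whose rank is 0. It is hypothetical. The MPI_Comm typedef and the two constants are assumptions made to keep the example self-contained, and the actual MPI_dummy sources may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only. MPI_Comm is assumed to
 * be a plain int handle here; the real MPI_dummy headers define their
 * own types and constants. */
typedef int MPI_Comm;
#define MPI_COMM_WORLD 0
#define MPI_SUCCESS    0

/* Without an MPI library there is exactly one process, so these
 * routines only report a world of size 1 in which the caller is rank 0. */
int MPI_Init(int *argc, char ***argv)
{
    (void)argc;   /* command-line arguments are left untouched */
    (void)argv;
    return MPI_SUCCESS;
}

int MPI_Comm_size(MPI_Comm comm, int *size)
{
    (void)comm;
    assert(size != NULL);
    *size = 1;
    return MPI_SUCCESS;
}

int MPI_Comm_rank(MPI_Comm comm, int *rank)
{
    (void)comm;
    assert(rank != NULL);
    *rank = 0;
    return MPI_SUCCESS;
}

int MPI_Finalize(void)
{
    return MPI_SUCCESS;
}
```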


   ================================

   The "RAND" variable in make.def
   --------------------------------

   Most of the NPBs use a random number generator. In two of them (FT
   and EP), the computation of random numbers is included in the timed
   part of the calculation, so it is important that the generator be
   efficient. The default random number generator package provided is
   called "randi8" and should be used where possible; the package is
   selected with the RAND variable in make.def (e.g. "RAND = randi8").
   randi8 has the following requirements:

   randi8:
     1. Uses integer*8 arithmetic. The compiler must support integer*8.
     2. Uses the Fortran 90 IAND intrinsic. The compiler must support IAND.
     3. Assumes that overflow bits are discarded by the hardware; in
        particular, that the lowest 46 bits of a*b are always correct,
        even if the product a*b is larger than 2^64.
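
   The recurrence behind these generators is x(k+1) = a * x(k) mod 2^46, with multiplier a = 5^13. The following C sketch shows the randi8 idea, letting the 64-bit product overflow and keeping only the low 46 bits; it is an illustration with hypothetical names, not a translation of the Fortran randi8 source. In C, unsigned arithmetic is defined to wrap modulo 2^64, so the assumption in item 3 holds by definition rather than by hardware behavior.

```c
#include <assert.h>
#include <stdint.h>

#define NPB_A  UINT64_C(1220703125)           /* multiplier a = 5^13 */
#define MASK46 ((UINT64_C(1) << 46) - 1)      /* selects the low 46 bits */

/* One step of x(k+1) = a * x(k) mod 2^46, randi8 style: the 64-bit
 * product a * x may overflow. C defines unsigned overflow to wrap
 * modulo 2^64, and since 2^46 divides 2^64 the low 46 bits of the
 * wrapped product are exact. */
uint64_t lcg46_wrap(uint64_t x, uint64_t a)
{
    assert(x <= MASK46 && a <= MASK46);
    return (a * x) & MASK46;   /* the AND corresponds to Fortran's IAND */
}

/* Advances the state *x and returns it scaled by 2^-46, i.e. in [0, 1). */
double lcg46_uniform(uint64_t *x)
{
    *x = lcg46_wrap(*x, NPB_A);
    return (double)*x / (double)(UINT64_C(1) << 46);
}
```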

   Since randi8 may not work on all machines, we supply the following
   alternatives:

   randi8_safe
     1. Uses integer*8 arithmetic.
     2. Uses the Fortran 90 IBITS intrinsic.
     3. Makes no assumptions about overflow. Should always work
        correctly if the compiler supports integer*8 and IBITS.

   randdp
     1. Uses double precision arithmetic (to simulate integer*8 operations).
        Should work on any system that supports 64-bit floating
        point arithmetic.

   randdpvec
     1. Similar to randdp, but written to be easier to vectorize.
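
   For comparison, here are sketches of two portable ways to compute the same recurrence. The first splits both operands into 23-bit halves so that no intermediate value exceeds 2^47; this is one way to avoid overflow in the spirit of randi8_safe, whose actual bit-field split may differ. The second performs the same split in double precision, where every intermediate is an integer below 2^53 and therefore exact, modeled on the splitting used by NPB's double-precision randlc. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NPB_A  UINT64_C(1220703125)           /* multiplier a = 5^13 */
#define MASK23 ((UINT64_C(1) << 23) - 1)      /* low 23 bits */
#define MASK46 ((UINT64_C(1) << 46) - 1)      /* low 46 bits */

/* Overflow-free integer version: write a = a1*2^23 + a2 and
 * x = x1*2^23 + x2. Modulo 2^46 the a1*x1*2^46 term vanishes, leaving
 * ((a1*x2 + a2*x1) mod 2^23) * 2^23 + a2*x2. No intermediate value
 * reaches 2^47, so nothing overflows 64 bits. */
uint64_t lcg46_safe(uint64_t x, uint64_t a)
{
    assert(x <= MASK46 && a <= MASK46);
    uint64_t a1 = a >> 23, a2 = a & MASK23;
    uint64_t x1 = x >> 23, x2 = x & MASK23;
    uint64_t cross = (a1 * x2 + a2 * x1) & MASK23;
    return ((cross << 23) + a2 * x2) & MASK46;
}

/* Same recurrence in double precision. Every intermediate is an
 * integer below 2^53, so each operation is exact; truncating
 * non-negative values toward zero acts as floor(). Updates *x and
 * returns *x * 2^-46. */
double lcg46_double(double *x, double a)
{
    const double t23 = 8388608.0;          /* 2^23 */
    const double r23 = 1.0 / t23;          /* 2^-23 */
    const double t46 = t23 * t23;          /* 2^46 */
    const double r46 = r23 * r23;          /* 2^-46 */

    double a1 = (double)(int64_t)(r23 * a), a2 = a - t23 * a1;
    double x1 = (double)(int64_t)(r23 * *x), x2 = *x - t23 * x1;

    double t1 = a1 * x2 + a2 * x1;                         /* < 2^47 */
    double z  = t1 - t23 * (double)(int64_t)(r23 * t1);    /* t1 mod 2^23 */
    double t3 = t23 * z + a2 * x2;                         /* < 2^47 */
    *x = t3 - t46 * (double)(int64_t)(r46 * t3);           /* t3 mod 2^46 */
    return r46 * *x;
}
```

   Both functions produce the same sequence of states, which is a convenient way to check a port of either strategy.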


2. Execution

   The executable is named <benchmark-name>.<class>.<nprocs>[.<suffix>],
   where <suffix> is "fortran_io", "mpi_io_simple", "ep_io", or
   "mpi_io_full". It is placed in the bin subdirectory (or in the
   directory BINDIR specified in make.def, if you have defined it). The
   method for running the MPI program depends on your local system.
   When any of the I/O benchmarks is run (non-empty subtype), one or
   more output files are created in the directory from which the
   program was started. These are not removed automatically and will be
   overwritten the next time an I/O benchmark is run.
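
   The naming rule above can be expressed as a short function. In this sketch the helper names are hypothetical, and pairing each SUBTYPE value with the suffix of the same name (full with mpi_io_full, simple with mpi_io_simple, fortran with fortran_io, epio with ep_io) is inferred from the lists in this file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Maps a SUBTYPE value to its executable suffix (assumed pairing).
 * A NULL or empty subtype means "no suffix"; returns NULL for an
 * unknown subtype. */
static const char *io_suffix(const char *subtype)
{
    static const char *const map[][2] = {
        {"full", "mpi_io_full"}, {"simple", "mpi_io_simple"},
        {"fortran", "fortran_io"}, {"epio", "ep_io"},
    };
    if (subtype == NULL || subtype[0] == '\0')
        return "";
    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
        if (strcmp(subtype, map[i][0]) == 0)
            return map[i][1];
    return NULL;
}

/* Writes <benchmark-name>.<class>.<nprocs>[.<suffix>] into buf.
 * Returns 0 on success, -1 for an unknown subtype or a too-small buf. */
int npb_exe_name(char *buf, size_t len, const char *bench, char cls,
                 int nprocs, const char *subtype)
{
    assert(buf != NULL && bench != NULL);
    const char *suf = io_suffix(subtype);
    if (suf == NULL)
        return -1;
    int n = snprintf(buf, len, "%s.%c.%d%s%s", bench, cls, nprocs,
                     suf[0] != '\0' ? "." : "", suf);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

   For example, npb_exe_name(buf, sizeof buf, "bt", 'C', 16, "full") yields "bt.C.16.mpi_io_full".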