Some explanations on the MPI implementation of NPB 3.3 (NPB3.3-MPI)
----------------------------------------------------------------------

NPB-MPI is a sample MPI implementation based on NPB2.4 and NPB3.0-SER.
This implementation contains all eight original benchmarks, seven in
Fortran (BT, SP, LU, FT, CG, MG, and EP) and one in C (IS), as well as
the DT benchmark, written in C and introduced in NPB3.2-MPI.

For changes between versions, see the Changes.log file included in
the top-level directory of this distribution.

This version has been tested, among others, on an SGI Origin3000 and
an SGI Altix. For problem reports and suggestions on the implementation,
please contact the NAS Parallel Benchmark Team.

CAUTION *********************************
When running the I/O benchmark, one or more data files will be written
in the directory from which the executable is invoked. They are not
deleted at the end of the program. A new run will overwrite the old
file(s). If not enough space is available in the user partition, the
program will fail. For classes C and D the disk space required is
3 GB and 135 GB, respectively.
*****************************************

NPB3-MPI uses the same directory tree as NPB3-SER (and NPB2.x) does.
Before compilation, one needs to check the configuration file
'make.def' in the config directory and modify the file if necessary.
If it does not (yet) exist, copy 'make.def.template' or one of the
sample files in the NAS.samples subdirectory to 'make.def' and
edit the content for site- and machine-specific data. Then type

       make <benchmark-name> NPROCS=<number> CLASS=<class> \
            [SUBTYPE=<type>] [VERSION=VEC]

where <benchmark-name>  is "bt", "cg", "dt", "ep", "ft", "is",
                           "lu", "mg", or "sp"
      <number>          is the number of processes
      <class>           is "S", "W", "A", "B", "C", "D", or "E"

Classes C, D and E are not available for DT.
Class E is not available for IS.

The "VERSION=VEC" option is used for selecting the vectorized
versions of BT and LU.

Only when making the I/O benchmark:

      <benchmark-name>  is "bt"
      <number>, <class> as above
      <type>            is "full", "simple", "fortran", or "epio"

Three parameters not used in the original BT benchmark are present in
the I/O benchmark. Two of them are set by default in the file BT/bt.f;
changing them is optional. The third is set in make.def and must be
specified.

   bt.f:      collbuf_nodes: number of processes used to buffer data before
                             writing to file in the collective buffering mode
              collbuf_size:  size of buffer (in bytes) per process used in
                             the collective buffering mode

   make.def:  -DFORTRAN_REC_SIZE: Fortran I/O record length in bytes. This
                             is a system-specific value. It is part of the
                             definition string of variable CONVERTFLAG. Syntax:
                             "CONVERTFLAG = -DFORTRAN_REC_SIZE=n", where n is
                             the record length in bytes.

When <type> is "full" or "simple", the code must be linked with an
MPI library that contains the subset of I/O routines defined in MPI-2.

Class D for IS (Integer Sort) requires a compiler/system on which the
C "long" type is 64 bits wide. For example, the SGI MIPS compiler for
the SGI Origin with the "-64" compilation flag and the Intel compiler
for IA64 are known to work.
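
Whether a given compiler meets the 64-bit "long" requirement can be
checked before attempting a class D build of IS. The following is a
minimal C sketch, assuming a C99 compiler; the helper names
long_is_64bit and report_long_width are hypothetical and not part of NPB.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper (not part of NPB): returns 1 if the C "long"
 * type is 64 bits wide with the current compiler and flags, else 0. */
int long_is_64bit(void)
{
    return sizeof(long) * CHAR_BIT == 64;
}

/* Print the width of "long" and whether IS class D should build.
 * Returns the same value as long_is_64bit(). */
int report_long_width(void)
{
    int ok = long_is_64bit();
    printf("long is %zu bits: IS class D %s\n",
           sizeof(long) * CHAR_BIT,
           ok ? "should build" : "is not supported");
    return ok;
}
```

Compile the check with the same compiler and flags (e.g. "-64") that
make.def uses, since the width of "long" can depend on them.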

The above procedure builds one benchmark at a time. To build a whole
suite, type "make suite". Make will then look in the file
"config/suite.def" for a list of executables to build. The file
contains one line per specification, with comments preceded by "#".
Each line contains the name of a benchmark, the class, and the number
of processes, separated by spaces or tabs. config/suite.def.template
contains an example of such a file.
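
The suite.def format (one specification per line, "#" comments,
fields separated by spaces or tabs) can be read with a few lines of C,
for instance in a helper that checks a suite file. This is a sketch
only; the struct suite_entry and the function parse_suite_line are
hypothetical, and this is not the mechanism the NPB makefiles use.

```c
#include <stdio.h>
#include <string.h>

/* One build request from config/suite.def. The struct and its field
 * sizes are illustrative assumptions, not taken from NPB. */
struct suite_entry {
    char name[16];   /* benchmark name, e.g. "bt" */
    char cls;        /* class letter, e.g. 'A' */
    int  nprocs;     /* number of processes, e.g. 4 */
};

/* Parse one line of suite.def; text after '#' is a comment.
 * Returns 1 if an entry was parsed, 0 for a blank or comment-only
 * line, and -1 for a malformed line. */
int parse_suite_line(const char *line, struct suite_entry *out)
{
    char buf[256];
    char cls[8];
    char *hash;
    int n;

    /* Copy the line so the comment can be cut off in place. */
    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    hash = strchr(buf, '#');
    if (hash != NULL)
        *hash = '\0';

    /* %s and %d skip any leading run of spaces or tabs. */
    n = sscanf(buf, "%15s %7s %d", out->name, cls, &out->nprocs);
    if (n == EOF || n == 0)
        return 0;                      /* nothing but whitespace */
    if (n != 3 || strlen(cls) != 1)
        return -1;                     /* missing field or bad class */
    out->cls = cls[0];
    return 1;
}
```

For example, the line "lu B 16  # large run" yields name "lu",
class 'B' and 16 processes.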

The benchmarks have been designed so that they can be run on a single
processor without an MPI library. A few "dummy" MPI routines are still
required for linking. For convenience, such a library is supplied in
the "MPI_dummy" subdirectory of the distribution. It contains the
include files mpif.h and mpi.h, which must be used as well. The dummy
library is built and linked automatically, and the paths to the
include files are defined, by inserting the line
"include ../config/make.dummy" into the make.def file (see the example
in make.def.template). Make sure to read the warnings in the README
file in "MPI_dummy". The use of the library is fragile and can produce
unexpected errors.
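
To illustrate what a "dummy" MPI routine amounts to, the following C
sketch stubs out a few calls for a single-process run. It is NOT the
contents of MPI_dummy: the MPI_Comm typedef and the MPI_COMM_WORLD and
MPI_SUCCESS values below are assumptions standing in for what a real
mpi.h would provide, and such stubs must never be linked together with
a real MPI library.

```c
#include <stddef.h>

/* Assumed stand-ins for definitions a real mpi.h would supply. */
typedef int MPI_Comm;
#define MPI_COMM_WORLD 0
#define MPI_SUCCESS    0

/* Nothing to initialize when there is only one process. */
int MPI_Init(int *argc, char ***argv)
{
    (void)argc;
    (void)argv;
    return MPI_SUCCESS;
}

/* A single-process run always has exactly one process... */
int MPI_Comm_size(MPI_Comm comm, int *size)
{
    (void)comm;
    *size = 1;
    return MPI_SUCCESS;
}

/* ...and that process is rank 0. */
int MPI_Comm_rank(MPI_Comm comm, int *rank)
{
    (void)comm;
    *rank = 0;
    return MPI_SUCCESS;
}

int MPI_Finalize(void)
{
    return MPI_SUCCESS;
}
```

Stubs like these only satisfy the linker; routines that move data
(sends, receives, reductions) need more care, which is one reason the
dummy library is described as fragile.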

================================

The "RAND" variable in make.def
--------------------------------

Most of the NPBs use a random number generator. In two of the NPBs (FT
and EP), the computation of random numbers is included in the timed
part of the calculation, so it is important that the random number
generator be efficient. The default random number generator package
provided is called "randi8" and should be used where possible. It has
the following requirements:

   1. Uses integer*8 arithmetic. The compiler must support integer*8.
   2. Uses the Fortran 90 IAND intrinsic. The compiler must support IAND.
   3. Assumes overflow bits are discarded by the hardware; in particular,
      that the lowest 46 bits of a*b are always correct, even if the
      result a*b is larger than 2^64.
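
Requirement 3 is what makes randi8 fast: the generator step
x <- a*x mod 2^46 is a single 64-bit multiply followed by a mask, and
because 2^46 divides 2^64, the low 46 bits survive any overflow. A
minimal C sketch of that idea, assuming the multiplier a = 5^13 used by
NPB's generator; the names randi8_step, RAND_A and MOD46_MASK are
hypothetical. C unsigned arithmetic wraps by definition, which gives
the behaviour randi8 assumes of the hardware for integer*8.

```c
#include <assert.h>
#include <stdint.h>

/* Low 46 bits: the generator works modulo 2^46. */
#define MOD46_MASK ((UINT64_C(1) << 46) - 1)

/* Multiplier a = 5^13, assumed here to match NPB's generator. */
#define RAND_A UINT64_C(1220703125)

/* One step x <- a*x mod 2^46 in the style of randi8: the 64-bit
 * product may overflow (unsigned C arithmetic keeps the low 64 bits),
 * and the mask plays the role of Fortran's IAND.
 * Returns the new seed scaled into [0,1). */
double randi8_step(uint64_t *x, uint64_t a)
{
    assert(*x <= MOD46_MASK && a <= MOD46_MASK);
    *x = (a * *x) & MOD46_MASK;
    return (double)*x / (double)(UINT64_C(1) << 46);
}
```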

Since randi8 may not work on all machines, we supply the following
alternatives:

   1. Uses integer*8 arithmetic.
   2. Uses the Fortran 90 IBITS intrinsic.
   3. Does not make any assumptions about overflow. Should always
      work correctly if the compiler supports integer*8 and IBITS.
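
One way to avoid depending on overflow, as the alternative above does,
is to split each 46-bit operand into two 23-bit halves with bit-field
extraction, so that no intermediate product needs more than 47 bits.
The C sketch below illustrates that technique; the helper ibits mirrors
Fortran's IBITS(v, pos, len), and both it and mulmod46_safe are
hypothetical names. The 23-bit split is an assumption about a typical
way to do this, not a transcription of the NPB source.

```c
#include <assert.h>
#include <stdint.h>

/* Extract len bits of v starting at bit pos, like Fortran's IBITS. */
static uint64_t ibits(uint64_t v, int pos, int len)
{
    return (v >> pos) & ((UINT64_C(1) << len) - 1);
}

/* Compute a*x mod 2^46 without forming any product wider than 47 bits,
 * so the result does not depend on how overflow is handled.
 * Both a and x must be below 2^46. */
uint64_t mulmod46_safe(uint64_t a, uint64_t x)
{
    uint64_t a1, a2, x1, x2, mid, t;

    assert(a >> 46 == 0 && x >> 46 == 0);
    a1 = ibits(a, 23, 23);  a2 = ibits(a, 0, 23);   /* a = a1*2^23 + a2 */
    x1 = ibits(x, 23, 23);  x2 = ibits(x, 0, 23);   /* x = x1*2^23 + x2 */

    /* a*x = a1*x1*2^46 + (a1*x2 + a2*x1)*2^23 + a2*x2. The first term
     * vanishes mod 2^46; only the low 23 bits of the middle sum matter. */
    mid = ibits(a1 * x2 + a2 * x1, 0, 23);
    t = (mid << 23) + a2 * x2;                       /* below 2^47 */
    return ibits(t, 0, 46);
}
```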

   1. Uses double precision arithmetic (to simulate integer*8 operations).
      Should work on any system with support for 64-bit floating-point
      arithmetic.

   1. Similar to randdp but written to be easier to vectorize.
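
The double-precision approach can be sketched as follows: each operand
below 2^46 is split into two 23-bit halves held in doubles, so every
intermediate value stays below 2^53 and is represented exactly. The C
sketch below is modeled on the splitting technique of NPB's randlc
routine as we understand it, but it was written for this note; it
assumes IEEE 754 doubles, and the name randdp_step is hypothetical.

```c
#include <assert.h>

/* One generator step x <- a*x mod 2^46 using only double arithmetic.
 * x and a must hold integers in [0, 2^46). Returns x * 2^-46. */
double randdp_step(double *x, double a)
{
    const double t23 = 8388608.0;      /* 2^23 */
    const double r23 = 1.0 / t23;      /* 2^-23, exact */
    const double t46 = t23 * t23;      /* 2^46 */
    const double r46 = r23 * r23;      /* 2^-46 */
    double a1, a2, x1, x2, t1, t2, t3, t4, z;

    assert(a >= 0.0 && a < t46 && *x >= 0.0 && *x < t46);

    /* Split a = a1*2^23 + a2 and x = x1*2^23 + x2 (truncation = floor
     * here, since all values are non-negative). */
    a1 = (double)(long long)(r23 * a);
    a2 = a - t23 * a1;
    x1 = (double)(long long)(r23 * *x);
    x2 = *x - t23 * x1;

    /* z = (a1*x2 + a2*x1) mod 2^23 */
    t1 = a1 * x2 + a2 * x1;
    t2 = (double)(long long)(r23 * t1);
    z = t1 - t23 * t2;

    /* x = (z*2^23 + a2*x2) mod 2^46 */
    t3 = t23 * z + a2 * x2;
    t4 = (double)(long long)(r46 * t3);
    *x = t3 - t46 * t4;
    return r46 * *x;
}
```

Each loop iteration of such a generator depends on the previous seed,
which is why a separate variant organized for vectorization is useful.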

The executable is named <benchmark-name>.<class>.<nprocs>[.<suffix>],
where the optional <suffix> is present only for the I/O benchmark and
identifies its subtype, e.g. "fortran_io", "mpi_io_simple", or "ep_io".
The executable is placed in the bin subdirectory (or in the directory
BINDIR specified in make.def, if you have defined it). The method for
running the MPI program depends on your local system.
When any of the I/O benchmarks is run (non-empty subtype), one or
more output files are created and placed in the directory from which
the program was started. These are not removed automatically and
will be overwritten the next time an I/O benchmark is run.
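
The executable naming rule can be expressed as a small C function, for
instance when scripting runs over several builds. This is a sketch;
npb_exe_name is a hypothetical helper, and the suffix is passed in
verbatim rather than derived from the subtype.

```c
#include <stdio.h>

/* Build "<benchmark-name>.<class>.<nprocs>[.<suffix>]" into buf.
 * suffix may be NULL or "" for the non-I/O benchmarks.
 * Returns the snprintf result: the length of the full name, or a value
 * >= bufsize if buf was too small (the name is then truncated). */
int npb_exe_name(char *buf, size_t bufsize, const char *bench,
                 char cls, int nprocs, const char *suffix)
{
    if (suffix != NULL && suffix[0] != '\0')
        return snprintf(buf, bufsize, "%s.%c.%d.%s",
                        bench, cls, nprocs, suffix);
    return snprintf(buf, bufsize, "%s.%c.%d", bench, cls, nprocs);
}
```

For example, CG built with CLASS=B and NPROCS=8 gives "cg.B.8", and a
BT I/O build with suffix "ep_io", class C and 16 processes gives
"bt.C.16.ep_io".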