Some info about the MG benchmark
================================
'mg_demo' demonstrates the capabilities of a very simple multigrid
solver in computing a three-dimensional potential field. This is
a simplified multigrid solver in two important respects:
(1) it solves only a constant coefficient equation,
    and that only on a uniform cubical grid,

(2) it solves only a single equation, representing
    a scalar field rather than a vector field.
We chose it for its portability and simplicity, and expect that a
supercomputer which can run it effectively will also be able to
run more complex multigrid programs at least as well.
Eric Barszcz                      Paul Frederickson
NASA Ames Research Center         NASA Ames Research Center
========================================================================
Running the program: (Note: also see parameter lm information in the
two sections immediately below this section)
The program may be run with or without an input deck (called "mg.input").
The following describes a few things about the input deck if you want to
use one.
The four lines below are the "mg.input" file required to run a
problem of total size 256x256x256, for 4 iterations (Class "A"),
and presume the use of 8 processors:

8 = top level
256 256 256 = nx ny nz
4 = nit
0 0 0 0 0 0 0 0 = debug_vec
The first line of input indicates how many levels of multi-grid
cycle will be applied to a particular subpartition. Presuming that
8 processors are solving this problem (recall that the number of
processors is specified to MPI as a run parameter, and MPI subsequently
determines this for the code via an MPI subroutine call), a 2x2x2
processor grid is formed, and thus each partition on a processor is
of size 128x128x128. Therefore, a maximum of 8 multi-grid levels may
be used. These are of size 128,64,32,16,8,4,2,1, with the coarsest
level being a single point on a given processor.
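The level count follows from repeated halving: a cubic partition of edge d
(a power of 2) supports grids of size d, d/2, ..., 2, 1, i.e. log2(d) + 1
levels. A minimal sketch of this arithmetic (Python used purely for
illustration; the function name is ours, not from the benchmark source):

```python
import math

def max_mg_levels(partition_edge):
    """Multigrid levels available on a cubic partition whose edge is a
    power of 2: grids of size d, d/2, ..., 2, 1, i.e. log2(d) + 1."""
    assert partition_edge & (partition_edge - 1) == 0, "expect a power of 2"
    return int(math.log2(partition_edge)) + 1

# 256x256x256 on 8 processors -> 128x128x128 partitions -> 8 levels
print(max_mg_levels(128))
```

The same arithmetic gives 9 levels for a full 256x256x256 grid on one
processor, matching the next example.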
Next, consider the same size problem but running on 1 processor. The
following "mg.input" file is appropriate:

9 = top level
256 256 256 = nx ny nz
4 = nit
0 0 0 0 0 0 0 0 = debug_vec
Since this processor must solve the full 256x256x256 problem, this
permits 9 multi-grid levels (256,128,64,32,16,8,4,2,1), resulting in
a coarsest multi-grid level of a single point on the processor.
Next, consider the same size problem but running on 2 processors. The
following "mg.input" file is required:

8 = top level
256 256 256 = nx ny nz
4 = nit
0 0 0 0 0 0 0 0 = debug_vec
The algorithm for partitioning the full grid onto some power of 2 number
of processors is to start by splitting the last dimension of the grid
(z dimension) in 2: the problem is now partitioned onto 2 processors.
Next, the middle dimension (y dimension) is split in 2: the problem is now
partitioned onto 4 processors. Next, the first dimension (x dimension) is
split in 2: the problem is now partitioned onto 8 processors. Next, the
last dimension (z dimension) is split again in 2: the problem is now
partitioned onto 16 processors. This partitioning is repeated until all
of the power of 2 processors have been allocated.
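The round-robin splitting described above can be sketched as follows
(Python for illustration only; the function name and return convention
are ours, not the benchmark's):

```python
def processor_grid(nprocs):
    """Halve the z, y, x dimensions in round-robin order (z first)
    until the power-of-2 processor count is reached."""
    assert nprocs > 0 and nprocs & (nprocs - 1) == 0, "expect a power of 2"
    grid = {"x": 1, "y": 1, "z": 1}
    order = ["z", "y", "x"]          # last dimension is split first
    splits = 0
    while 2 ** splits < nprocs:
        grid[order[splits % 3]] *= 2
        splits += 1
    return grid["x"], grid["y"], grid["z"]

# 8 processors form a 2x2x2 grid; 16 processors form a 2x2x4 grid
print(processor_grid(8), processor_grid(16))
```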
Thus to run the above problem on 2 processors, the grid partitioning
algorithm will allocate the two processors across the last dimension,
creating two partitions each of size 256x256x128. The coarsest level of
multi-grid must be a single point surrounded by a cubic number of grid
points. Therefore, each of the two processor partitions will contain 4
coarsest multi-grid level points, each surrounded by a cube of grid points
of size 128x128x128, indicated by a top level of 8.
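The count of coarsest-level points per partition follows from the
smallest partition dimension: coarsening stops when that dimension
reaches a single point, leaving one coarsest point per cube of that edge
length. A sketch of this counting (Python; the function name is ours):

```python
def coarsest_points(nx, ny, nz):
    """A partition coarsens until its smallest dimension reaches one
    point; the coarsest grid then holds one point per cube whose edge
    is that smallest dimension."""
    cube = min(nx, ny, nz)
    return (nx // cube) * (ny // cube) * (nz // cube)

# 256x256x128 partitions (2 processors) hold 4 coarsest-level points
print(coarsest_points(256, 256, 128))
```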
Next, consider the same size problem but running on 4 processors. The
following "mg.input" file is required:

8 = top level
256 256 256 = nx ny nz
4 = nit
0 0 0 0 0 0 0 0 = debug_vec
The partitioning algorithm will create 4 partitions, each of size
256x128x128. Each partition will contain 2 coarsest multi-grid level
points, each surrounded by a cube of grid points of size 128x128x128,
indicated by a top level of 8.
Next, consider the same size problem but running on 16 processors. The
following "mg.input" file is required:

7 = top level
256 256 256 = nx ny nz
4 = nit
0 0 0 0 0 0 0 0 = debug_vec
On each node a partition of size 128x128x64 will be created. A maximum
of 7 multi-grid levels (64,32,16,8,4,2,1) may be used, resulting in each
partition containing 4 coarsest multi-grid level points, each surrounded
by a cube of grid points of size 64x64x64, indicated by a top level of 7.
Note that non-cubic problem sizes may also be considered:
The four lines below are the "mg.input" file appropriate for running a
problem of total size 256x512x512, for 20 iterations, and presume the
use of 32 processors (note: this is NOT a class C problem):

8 = top level
256 512 512 = nx ny nz
20 = nit
0 0 0 0 0 0 0 0 = debug_vec
The first line of input indicates how many levels of multi-grid
cycle will be applied to a particular subpartition. Presuming that
32 processors are solving this problem, a 2x4x4 processor grid is
formed, and thus each partition on a processor is of size 128x128x128.
Therefore, a maximum of 8 multi-grid levels may be used. These are of
size 128,64,32,16,8,4,2,1, with the coarsest level being a single
point on a given processor.
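Putting the pieces together, the top level for any size/processor
combination can be estimated by first forming the processor grid
(splitting z, y, x in round-robin order, as described above) and then
counting halvings of the partition's smallest dimension. A sketch under
the same assumptions (Python; the names are ours, not from the benchmark
source):

```python
import math

def top_level(nx, ny, nz, nprocs):
    """Form the processor grid by round-robin halving of z, y, x, then
    count grids from the partition's smallest dimension down to 1."""
    grid = {"x": 1, "y": 1, "z": 1}
    order = ["z", "y", "x"]          # last dimension is split first
    splits = 0
    while 2 ** splits < nprocs:
        grid[order[splits % 3]] *= 2
        splits += 1
    part = (nx // grid["x"], ny // grid["y"], nz // grid["z"])
    return int(math.log2(min(part))) + 1, part

# 256x512x512 on 32 processors: 128x128x128 partitions, top level 8
print(top_level(256, 512, 512, 32))
```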