 **************************************************************************
 LLVM Test-suite Note:
 **************************************************************************
 The original source is located at https://github.com/Mantevo/miniAMR.
 Beyond this paragraph is the original README included with the source
 code.  The Makefile referred to within is not used by the
 test-suite.  The test-suite builds a serial version (OpenMP and
 MPI disabled) with its own CMake and make build system.
 **************************************************************************

 miniAMR mini-application

 --------------------------------------
 Contents of this README file:
 1.      miniAMR overview
 2.      miniAMR versions
 3.      building miniAMR
 4.      running miniAMR
 5.      notes about the code
 --------------------------------------

 --------------------------------------
 1. miniAMR overview

      miniAMR applies a stencil calculation on a unit cube computational domain,
 which is divided into blocks. The blocks all have the same number of cells
 in each direction and communicate ghost values with neighboring blocks. With
 adaptive mesh refinement, the blocks can represent different levels of
 refinement in the larger mesh. Neighboring blocks can be at the same level
 or one level different, which means that the length of cells in neighboring
 blocks can differ by only a factor of two in each direction. The calculation
 on the variables in each cell is an average of the values in the chosen
 stencil. The refinement and coarsening of the blocks are driven by objects
 that are pushed through the mesh. If a block intersects with the surface
 or the volume of an object, then that block can be refined. There is also
 an option to uniformly refine the mesh. Each cell contains a number of
 variables, each of which is evaluated independently.
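
 As an illustration of the averaging described above, the following Python
 sketch applies a 7-point stencil to a single block. It is not taken from
 stencil.c: the function name stencil_7pt and the nested-list layout with one
 layer of ghost cells on every side are assumptions made for this example.

```python
def stencil_7pt(block):
    """Average each interior cell with its six face neighbors.

    `block` is a nested list indexed as block[i][j][k] and is assumed to
    already contain one layer of ghost cells on every side. Ghost cells are
    copied through unchanged; only interior cells are updated.
    """
    nx, ny, nz = len(block), len(block[0]), len(block[0][0])
    # Work on a copy so every average reads the old values.
    out = [[row[:] for row in plane] for plane in block]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                total = (
                    block[i][j][k]
                    + block[i - 1][j][k] + block[i + 1][j][k]
                    + block[i][j - 1][k] + block[i][j + 1][k]
                    + block[i][j][k - 1] + block[i][j][k + 1]
                )
                out[i][j][k] = total / 7.0
    return out
```

 In the real code the ghost layer would be filled by communication with the
 neighboring blocks before each stencil sweep.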

 --------------------------------------
 2. miniAMR versions:

 - miniAMR_ref:

      reference version: self-contained MPI-parallel.

 - miniAMR_serial

      serial version of reference version

 -------------------
 3. Building miniAMR:

      To make the code, type 'make' in the directory containing the source.
 The enclosed Makefile.mpi is configured for a general MPI installation.
 Other compilers or machines will need changes to the CFLAGS variable to
 match the flags available for the compiler being used.

 -------------------
 4. Running miniAMR:

  miniAMR can be run like this:

    % <mpi-run-command> ./miniAMR.x

  where <mpi-run-command> varies from system to system but usually looks something like 'mpirun -np 4'.

  Execution is then driven entirely by the default settings, as configured in default-settings.h. Options may be listed using

     % ./miniAMR.x --help

      The program accepts several arguments on the command line.
 The list of arguments and their defaults is as follows:

    --nx - block size in x
    --ny - block size in y
    --nz - block size in z
       These control the size of the blocks in the mesh.  All of these need to
       be even and greater than zero.  The default is 10 in each direction.

    --init_x - initial blocks in x
    --init_y - initial blocks in y
    --init_z - initial blocks in z
       These control the number of blocks on each processor in the
       initial mesh.  These need to be greater than zero.  The default
       is 1 block in each direction per processor.  The initial mesh
       is a unit cube regardless of the number of blocks.

    --reorder - ordering of blocks
       This controls whether the blocks are ordered by the RCB algorithm
       or by a natural ordering of the processors.  The default is 1, which
       selects the RCB ordering; 0 selects the natural ordering.

    --npx - number of processors in the x direction
    --npy - number of processors in the y direction
    --npz - number of processors in the z direction
       These control the number of processors in each direction.  The product
       of these numbers has to equal the number of processors being used.  The
       default is 1 processor in each direction.
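
 The block-size and processor-grid rules stated above can be checked before
 launching a run. The Python sketch below is only an illustration of those two
 rules; the function check_layout and its messages are hypothetical and are
 not part of miniAMR.

```python
def check_layout(nx, ny, nz, npx, npy, npz, num_procs):
    """Return a list of rule violations (an empty list means the layout is fine)."""
    problems = []
    # Block sizes must be even and greater than zero.
    for name, value in (("nx", nx), ("ny", ny), ("nz", nz)):
        if value <= 0 or value % 2 != 0:
            problems.append(name + " must be even and greater than zero")
    # The processor grid must account for every processor in the run.
    if npx * npy * npz != num_procs:
        problems.append("npx * npy * npz must equal the number of processors")
    return problems
```

 For example, check_layout(8, 8, 8, 3, 3, 3, 27) reports no problems, while an
 odd block size or a 2 x 2 x 2 grid on 4 processors would each be reported.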

    --max_blocks - maximum number of blocks per processor
       The maximum number of blocks used per processor.  This is the number of
       blocks that will be allocated at the start of the run and the code will
       fail if this number is exceeded.  The default is 500 blocks.

    --num_refine - number of levels of refinement
       This is the number of levels of refinement to which refined blocks
       will be refined.  If it is zero then the mesh will not be refined.
       The default is 5 levels of refinement.

    --block_change - number of levels a block can change during refinement
       This parameter controls the number of levels that a block can change
       (either refining or coarsening) during a refinement step.  The default
       is the number of levels of refinement.

    --uniform_refine - if 1, then grid is uniformly refined
       This controls whether the mesh is uniformly refined.  If it is 1 then the
       mesh will be uniformly refined, while if it is zero, the refinement will
       be controlled by objects in the mesh.  The default is 1.

    --refine_freq - frequency (in timesteps) of checking for refinement
       This determines the frequency (in timesteps) between checking if
       refinement is needed.  The default is every 5 timesteps.

    --target_active - target number of blocks per processor
    --target_max - max number of blocks per processor
    --target_min - min number of blocks per processor
       These allow the user to control the number of blocks per processor.
       If these are zero, then no adjustment is made.  If target_active is
       greater than zero then the code will adjust the number of blocks to
       that target after the refinement step.  If target_max is greater than
       zero then the number of blocks will be reduced if it exceeds this
       number.  Likewise, if target_min is greater than zero, then the number
       of blocks will be raised if there are fewer than that number after the
       refinement step.  The default for all of these is zero.

    --inbalance - percentage imbalance to trigger load balancing
       This parameter allows the user to set a percentage threshold above
       which the load will be balanced among the processors.  The value
       that this is checked against is the maximum number of blocks on a
       processor minus the minimum number of blocks on a processor, divided
       by the average.  The default is zero, which means to always load
       balance at each refinement step.
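
 The imbalance measure described above, (max - min) / average, can be written
 out as a short Python sketch. The function needs_load_balance is hypothetical,
 and two details are assumptions rather than statements about the miniAMR
 source: that the threshold is converted from a percentage to a fraction, and
 that the comparison is a strict "greater than".

```python
def needs_load_balance(blocks_per_proc, inbalance_percent):
    """Decide whether to load balance, given the block count on each processor."""
    # A threshold of zero (the default) means: always load balance.
    if inbalance_percent <= 0:
        return True
    average = sum(blocks_per_proc) / len(blocks_per_proc)
    if average == 0:
        return False  # no blocks anywhere, nothing to move
    measure = (max(blocks_per_proc) - min(blocks_per_proc)) / average
    # Assumption: the percentage is compared as a fraction of the average.
    return measure > inbalance_percent / 100.0
```

 With block counts of 12, 8, 10 and 10 the measure is 0.4, so a threshold of
 25 percent would trigger load balancing under these assumptions.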

    --lb_opt - (0, 1, 2) determine load balance strategy
       If set to 0, then load balancing is not performed.  The default is
       1, which load balances at each refinement step.  Setting the
       parameter to 2 results in load balancing at each stage of the
       refinement step.  If a processor has a large number of blocks which
       are refined several steps, this allows the work (and space needed)
       to be shared among more processors.

    --num_vars - number of variables (> 0)
       The number of variables that will be calculated on and communicated.
       The default is 40 variables.

    --comm_vars - number of vars to communicate together
       The number of variables that will be communicated together.  This will
       allow shorter but more numerous messages if it is set to something less
       than the total number of variables.  The default is zero which will
       communicate all of the variables at once.
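
 To make the effect of comm_vars concrete, the Python sketch below splits the
 variables into the groups that would be communicated together. The function
 comm_groups is hypothetical, and treating a value larger than num_vars the
 same as zero is an assumption.

```python
def comm_groups(num_vars, comm_vars):
    """Return (start, end) index ranges of variables sent together (end exclusive)."""
    # Zero means: send all variables at once.
    # Assumption: a value above num_vars is treated the same way.
    if comm_vars <= 0 or comm_vars > num_vars:
        comm_vars = num_vars
    return [
        (start, min(start + comm_vars, num_vars))
        for start in range(0, num_vars, comm_vars)
    ]
```

 With the default of 40 variables, comm_vars 0 gives one group of 40, while
 comm_vars 2 gives 20 groups of 2, that is, shorter messages but more of them.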

    --num_tsteps - number of timesteps (> 0)
       The number of timesteps for which the simulation will be run.  The
       default is 20.

    --stages_per_ts - number of comm/calc stages per timestep
       The number of calculate/communicate stages per timestep.  The default
       is 20.

    --permute - (no argument) permute communication directions
       If this is set, then the order of the communication directions will
       be permuted through the six options available.  The default is
       to send messages in the x direction first, then y, and then z.

    --blocking_send - (no argument) Use blocking sends in the communication
       routine instead of the default nonblocking sends.

    --code - change the way communication is done
       The default is 0 which communicates only the ghost values that are
       needed.  Setting this to 1 sends all of the ghost values, and setting
       this to 2 also causes all of the message processing (refinement or
       unrefinement) to be done on the sending side.  This allows us to
       more closely mimic the communication behaviour of other codes.

    --checksum_freq - number of stages between checksums
       The number of stages between calculating checksums on the variables.
       The default is 5.  If it is zero, no checks are performed.

    --stencil - 7 or 27 point 3D stencil
       The 3D stencil used for the calculations.  It can be either 7 or 27
       and the default is 7 since the 27 point calculation will not conserve
       the sum of the variables except for the case of uniform refinement.

    --error_tol - (10^{-error_tol} ; >= 0)
       This determines the error tolerance for the checksums for the variables.
       The tolerance is 10 to the negative power of error_tol.  The default
       is 8, so the default tolerance is 10^(-8).
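
 The relation between error_tol and the tolerance can be sketched in Python as
 follows. The function checksum_ok is hypothetical, and comparing the relative
 difference against the tolerance is an assumption; this README does not state
 how the comparison is made in check_sum.c.

```python
def checksum_ok(current, reference, error_tol=8):
    """Return True if `current` agrees with `reference` within 10^(-error_tol)."""
    tolerance = 10.0 ** (-error_tol)
    # Assumption: the difference is measured relative to the reference sum.
    return abs(current - reference) <= tolerance * abs(reference)
```

 Under this assumption a checksum of 1.001 against a reference of 1.0 fails
 with the default error_tol of 8 but passes with error_tol 2.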

    --report_diffusion - (>= 0) none if 0
       This determines if the checksums are printed when they are calculated.
       The default is 0, which is no printing.

    --report_perf - (0 .. 15)
       This determines how the performance output is displayed.  The default
       is YAML output (value of 1).  There are four output modes and each is
       controlled by a bit in the value.  The YAML output (to a file called
       results.yaml) is controlled by the first bit (report_perf & 1), the
       text output file (results.txt) is controlled by the second bit
       (report_perf & 2), the output to standard out is controlled by the
       third bit (report_perf & 4), and the output of block decomposition
       at each refine step is controlled by the fourth bit (report_perf & 8).
       These options can be combined in any way desired and zero to four
       of these options can be used in any run.  Setting report_perf to 0
       will result in no output.
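
 The bit layout of report_perf described above can be decoded with a few
 lines of Python. This is only an illustration: the table REPORT_MODES and
 the function decode_report_perf are hypothetical names, and the mode labels
 are taken from the description above.

```python
# One entry per output mode: (bit mask, where that output goes).
REPORT_MODES = (
    (1, "results.yaml"),
    (2, "results.txt"),
    (4, "standard output"),
    (8, "block decomposition at each refine step"),
)


def decode_report_perf(value):
    """List the outputs enabled by a report_perf value in the range 0..15."""
    if not 0 <= value <= 15:
        raise ValueError("report_perf must be between 0 and 15")
    return [label for mask, label in REPORT_MODES if value & mask]
```

 For example, a value of 5 (1 + 4) selects the YAML file and standard output,
 and a value of 0 selects nothing.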

    --refine_freq - frequency (timesteps) of refinement (0 for none)
       This determines how frequently (in timesteps) the mesh is checked
       and refinement is done.  The default is every 5 timesteps.  If
       uniform refinement is turned on, the setting of refine_freq does
       not matter and the mesh will be refined before the first timestep.

    --refine_ghosts - (no argument)
       The default is to not use the ghost cells of a block to determine if
       that block will be refined.  Specifying this flag will allow those
       ghost cells to be used.

    --num_objects - (>= 0) number of objects to cause refinement
       The number of objects on which refinement is based.  Default is zero.

    --object - type, position, movement, size, size rate of change
       The object keyword has 14 arguments.  The first two are integers
       and the rest are floating point numbers.  They are:
       type - The type of object.  There are 16 types of objects.  They include
              the surface of a rectangle (0), a solid rectangle (1),
              the surface of a spheroid (2), a solid spheroid (3),
              the surface of a hemispheroid (+/- with 3 cutting planes)
              (4, 6, 8, 10, 12, 14),
              a solid hemispheroid (+/- with 3 cutting planes)
              (5, 7, 9, 11, 13, 15),
              the surface of a cylinder (20, 22, 24),
              and the volume of a cylinder (21, 23, 25).
       bounce - If this is 1 then an object will bounce off of the walls
                when the center hits an edge of the unit cube.  If it is
                zero, then the object can leave the mesh.
       center - Three doubles that determine the center of the object in the
                x, y, and z directions.
       move - Three doubles that determine the rate of movement of the center
              of the object in the x, y, and z directions.  The object moves
              this far at each timestep.
       size - The initial size of the object in the x, y, and z directions.
              If any of these becomes negative, the object will not be used
              in the calculations to determine refinement.  These sizes are
              from the center to the edge in the specified direction.
       inc - The change in size of the object in the x, y, and z directions.
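
 The layout of the 14 values that follow --object can be illustrated with a
 small Python sketch. The ObjectSpec tuple and the parse_object function are
 hypothetical and only mirror the field order listed above; they are not the
 parsing code in main.c.

```python
from collections import namedtuple

# Field order follows the list above: type, bounce, center, move, size, inc.
ObjectSpec = namedtuple("ObjectSpec", "type bounce center move size inc")


def parse_object(args):
    """Turn the 14 strings that follow --object into an ObjectSpec."""
    if len(args) != 14:
        raise ValueError("--object takes exactly 14 arguments")
    # The first two arguments are integers.
    obj_type, bounce = int(args[0]), int(args[1])
    # The remaining twelve are floating point numbers, in groups of three.
    values = [float(a) for a in args[2:]]
    return ObjectSpec(
        obj_type,
        bounce,
        tuple(values[0:3]),   # center x, y, z
        tuple(values[3:6]),   # movement per timestep x, y, z
        tuple(values[6:9]),   # size from center to edge x, y, z
        tuple(values[9:12]),  # change in size x, y, z
    )
```

 For example, the argument string "2 0 -1.71 -1.71 -1.71 0.04 0.04 0.04 1.7 1.7
 1.7 0.0 0.0 0.0" describes the surface of a spheroid (type 2) that does not
 bounce, starts centered at (-1.71, -1.71, -1.71), moves by 0.04 in each
 direction per timestep, and keeps a constant size of 1.7 in each direction.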


 Examples of run scripts for a Cray XE6 that illustrate several of the options:

 One sphere moving diagonally on 27 processors:

 mpirun -np 27 -N 7 miniAMR.x --num_refine 4 --max_blocks 9000 --npx 3 --npy 3 --npz 3 --nx 8 --ny 8 --nz 8 --num_objects 1 --object 2 0 -1.71 -1.71 -1.71 0.04 0.04 0.04 1.7 1.7 1.7 0.0 0.0 0.0 --num_tsteps 100 --checksum_freq 1

 An expanding sphere on 64 processors:

 mpirun -np 64 miniAMR.x --num_refine 4 --max_blocks 6000 --init_x 1 --init_y 1 --init_z 1 --npx 4 --npy 4 --npz 4 --nx 8 --ny 8 --nz 8 --num_objects 1 --object 2 0 -0.01 -0.01 -0.01 0.0 0.0 0.0 0.0 0.0 0.0 0.0009 0.0009 0.0009 --num_tsteps 200 --comm_vars 2

 Two moving spheres on 16 processors:

 mpirun -np 16 miniAMR.x --num_refine 4 --max_blocks 4000 --init_x 1 --init_y 1 --init_z 1 --npx 4 --npy 2 --npz 2 --nx 8 --ny 8 --nz 8 --num_objects 2 --object 2 0 -1.10 -1.10 -1.10 0.030 0.030 0.030 1.5 1.5 1.5 0.0 0.0 0.0 --object 2 0 0.5 0.5 1.76 0.0 0.0 -0.025 0.75 0.75 0.75 0.0 0.0 0.0 --num_tsteps 100 --checksum_freq 4 --stages_per_ts 16

 -------------------
 5. The code:

    block.c         Routines to split and recombine blocks
    check_sum.c     Calculates check_sum for the arrays
    comm_block.c    Communicate new location for block during refine
    comm.c          General routine to do interblock communication
    comm_parent.c   Communicate refine/unrefine information to parents/children
    comm_refine.c   Communicate block refine/unrefine to neighbors during refine
    comm_util.c     Utilities to manage communication lists
    driver.c        Main driver
    init.c          Initialization routine
    main.c          Main routine that reads command line and launches program
    move.c          Routines that check overlap of objects and blocks
    pack.c          Pack and unpack blocks to move
    plot.c          Write out block information for plotting
    profile.c       Write out performance data
    rcb.c           Load balancing routines
    refine.c        Routines to direct refinement step
    stencil.c       Perform stencil calculations
    target.c        Add/subtract blocks to reach a target number
    util.c          Utility routines for timing and allocation

 -- End README file.

 Courtenay T. Vaughan
 (ctvaugh@sandia.gov)