the lambda or closure passed as a parameter will run in kernel mode.
Every callback should be rewritten to that interface at some point.
+ MC
+ * BC breaks:
+ - The option "model-check/sparse-checkpoint" was renamed to
+ "model-check/sparse_checkpoint" as we attempt to unify our naming
+ schemes.
+
Surf
* Reorganizing and cleaning the internals all over the place.
SimGrid requires Lua 5.3; it will not work with Lua 5.2 or Lua 5.1,
-as Lua 5.3 breaks backwards compatibility.
+as Lua 5.3 breaks backwards compatibility.
+Versions 5.3.2, 5.3.3 or any other 5.3.x release are fine, though.
-However, installing Lua 5.3 is easy. (If you're an administrator)
+However, installing Lua 5.3 is easy if you are an administrator of your machine:
Step 1: Go to http://www.lua.org/download.html and download the 5.3 package.
Step 3: cd into the new directory
-Step 4: Apply the patch in "<simgrid-source-dir>/contrib/lua/lualib.patch" to the
+Step 4: Apply the patch in "<simgrid-source-dir>/tools/lualib.patch" to the
lua source:
For instance, if you unpacked the Lua source code to /tmp/lua-5.3.1, use
the following commands:
- cp contrib/lua/lualib.patch /tmp/lua-5.3.1
+ cp tools/lualib.patch /tmp/lua-5.3.1
cd /tmp/lua-5.3.1/
patch -p1 < lualib.patch
-Step 5: make <platform>, for instance "make linux"
+Step 5: make linux
Step 6: sudo make install
-Step 7: Run ccmake (or supply the config option to cmake) to enable Lua in SimGrid. Done!
+Step 7: Go back to the SimGrid source, and run ccmake again. Try removing CMakeCache.txt if it still complains about Lua being not found.
INPUT = doxygen/index.doc \
doxygen/getting_started.doc \
doxygen/getting_started_index.doc \
- doxygen/introduction.doc \
+ doxygen/tutorial.doc \
doxygen/install.doc \
doxygen/examples.doc \
doxygen/platform.doc \
doxygen/advanced.doc \
doxygen/pls.doc \
doxygen/bindings.doc \
- doxygen/internals.doc \
+ doxygen/inside.doc \
doxygen/inside_doxygen.doc \
doxygen/inside_extending.doc \
doxygen/inside_cmake.doc \
- doxygen/inside_ci.doc \
+ doxygen/inside_tests.doc \
doxygen/inside_release.doc \
doxygen/contributing.doc \
doxygen/tracing.doc \
you could also get in real settings to not hinder the realism of your
simulation.
-\verbatim
+\code
double get_host_load() {
   m_task_t task = MSG_task_create("test", 0.001, 0, NULL);
   double date = MSG_get_clock();

   MSG_task_execute(task);
   date = MSG_get_clock() - date;

   MSG_task_destroy(task);
   return (0.001/date);
}
-\endverbatim
+\endcode
Of course, it may not match your personal definition of "host load". In this
case, please detail what you mean on the mailing list, and we will extend
ready). However, getting the *real* communication time is not really
hard either. The following solution is a good starting point.
-\verbatim
+\code
int sender()
{
  m_task_t task = MSG_task_create("Task", task_comp_size, task_comm_size, NULL);
  /* ... send the task and measure the elapsed time around the send ... */
  MSG_task_destroy(task);
return 0;
}
-\endverbatim
+\endcode
\subsection faq_MIA_SimDag SimDag related questions
create 3 SD_tasks: t1, t2 and c and add dependencies in the following
way:
-\verbatim
+\code
SD_task_dependency_add(NULL, NULL, t1, c);
SD_task_dependency_add(NULL, NULL, c, t2);
-\endverbatim
+\endcode
This way, task t2 cannot start before the termination of communication c,
which in turn cannot start before t1 ends.
distributed. Here is an example of how you could do that. Assume T1
has to be done before T2.
-\verbatim
+\code
int your_agent(int argc, char *argv[]) {
...
T1 = MSG_task_create(...);
}
}
}
-\endverbatim
+\endcode
If you decide that the distributed part is not that important and that
DAG is really the level of abstraction you want to work with, then you should
don't assume that it's here because we don't care. It survived only
because nobody told us. We unfortunately cannot endlessly review our
large code and documentation base. So please, <b>report any issue you
-find to us</b>, be it a typo in the documentation, a paragraph that
+find</b>, be it a typo in the documentation, a paragraph that
needs to be reworded, a bug in the code or any other problem. The best
way to do so is to open a bug on our
<a href="https://gforge.inria.fr/tracker/?atid=165&group_id=12&func=browse">Bug
\section gs_introduction Introduction, Installation and how we can help
-| Document name | Description |
-| ----------------- | ------------------------------------------------- |
-| \ref introduction | Introduces the user to basic features of SimGrid. |
-| \ref install | Explains how SimGrid can be installed; this covers Windows as well as Linux; plus, it shows how to install from a package or how to install from source. |
-| \ref FAQ | Our FAQ |
-| \ref help | There are many ways to find answers to your questions. This document lists them. |
+| Document name | Description |
+| --------------- | ------------------------------------------------- |
+| \ref tutorial | Introduces the user to basic features of SimGrid. |
+| \ref install | Explains how SimGrid can be installed; this covers Windows as well as Linux; plus, it shows how to install from a package or how to install from source. |
+| \ref FAQ | Our FAQ |
+| \ref help | There are many ways to find answers to your questions. This document lists them. |
\section gs_new_users Documentation for new users
| Document name | Description |
| ----------------- | ------------------------------------------------- |
-| \ref introduction | Introduces the user to basic features of SimGrid. |
+| \ref tutorial | Introduces the user to basic features of SimGrid. |
| \ref install | Explains how SimGrid can be installed; this covers Windows as well as Linux; plus, it shows how to install from a package or how to install from source. |
| [Tutorials](http://simgrid.gforge.inria.fr/tutorials.html) | These tutorials cover most of the basics and might be valuable for what you want to do, especially the [SimGrid User 101](http://simgrid.gforge.inria.fr/tutorials/simgrid-use-101.pdf). |
| \ref MSG_examples | This document explains several tests that we wrote for MSG; these tests are working simulations and you may learn something from looking at them. |
| \ref tracing | Shows how the behavior of a program can be written to a file so that it can be analyzed. |
| \ref bindings | SimGrid supports many different bindings for languages such as Lua, Ruby, Java, ... You can run your simulations with those! |
| \ref pls | Although SimGrid is not a packet level simulator, it does have bindings to two such simulators. |
-| \ref internals | If you want to contribute or obtain a deeper understanding of SimGrid, this is the right location. |
+| \ref inside | If you want to contribute or obtain a deeper understanding of SimGrid, this is the right location. |
\section gs_examples Examples shipped with SimGrid
<tr><td width="50%">
@endhtmlonly
-- @subpage introduction
+- @subpage tutorial
- @subpage platform
- @subpage options
- @subpage deployment
- @ref bindings
- @ref pls
- @ref tracing
- - @ref contributing
- @subpage FAQ
-- @subpage internals
+- @subpage inside
+ - @ref inside_tests
+ - @ref inside_doxygen
+ - @ref inside_extending
+ - @ref inside_cmake
+ - @ref inside_release
- @subpage contributing
@htmlonly
-/*! @page internals SimGrid Developer Guide
+/*! @page inside SimGrid Developer Guide
This page does not exist yet, sorry. We are currently refurbishing the
user documentation -- the internal documentation will follow (FIXME).
automatic tests are run, and so on. This information is split across
several pages, as follows:
+ - @subpage inside_tests
- @subpage inside_doxygen
- @subpage inside_extending
- @subpage inside_cmake
- @subpage inside_release
- - @subpage inside_ci
\htmlonly
+++ /dev/null
-/*!
-\page inside_ci Continous Integration (with Jenkins)
-
-\section ci_jenkins Jenkins Interface
-
-\subsection inside_jenkins_basics Where can I find Jenkins?
-
-The SimGrid team currently uses Jenkins to automate testing. Our Jenkins
-interface can be found here: https://ci.inria.fr/simgrid/
-
-If you need an account, talk to Martin Quinson.
-
-\subsection inside_jenkins_add_host How can I add a new host?
-
-You have to login to the CI interface of INRIA: https://ci.inria.fr
-There you can manage the project and add new nodes.
-
-\subsection inside_jenkins_reboot_host How can I restart / reboot a host?
-
-Exactly the same as in \ref inside_jenkins_add_host. The only exception
-is that you have to click on "restart" of the host you want to restart.
-
-
-\subsection inside_jenkins_config_matrix Disable a certain build in the configuration matrix
-
-Jenkins uses a configuration matrix, i.e., a matrix consisting of configurations
-on the one axis and hosts on the other. Normally, every host will try to build
-every configuration but this may not be desirable.
-
-In order to disable a single configuration for a specific host (but leave
-other configurations for the same host enabled), go to your Project and click
-on "Configuration". There, find the field "combination filter" (if your interface
-language is English) and tick the checkbox; then add a groovy-expression to
-disable a specific configuration. For example, in order to disable the "ModelChecker"
-build on host "small-freebsd-64-clang", use:
-
-\verbatim
-(label=="small-freebsd-64-clang").implies(build_mode!="ModelChecker")
-\endverbatim
-
-\section ci_servers CI Servers
-
-\subsection ci_servers_build_dir Where is SimGrid built?
-
-SimGrid gets built in /builds/workspace/$PROJECT/build_mode/$CONFIG/label/$SERVER/build
-
-Here, $PROJECT could be for instance "SimGrid-Multi", $CONFIG "DEBUG" or "ModelChecker"
-and $SERVER for instance "simgrid-fedora20-64-clang".
-
-*/
to the cmake files: it checks whether all necessary files are present
in the distribution.
-\section cmake_dev_guide_ex How to add examples?
-
-First of all, are you sure that you want to create an example, or is
-it merely a new test case? The examples located in examples/ must be
-interesting to the users. It is expected that the users will take one
-of these examples and start editing it to make it fit their needs. If
-what you are about to write is merly a test, exercising a specific
-part of the tool suite but not really interesting to the users, then
-you want to add it to the teshsuite/ directory.
-
-Actually, the examples are also used as regresion tests by adding tesh
-files and registering them to the testing infrastructure (for that,
-don't forget to add a tesh file and follow the instructions of
-section \ref inside_cmake_addtest). The main difference is that
-examples must be interesting to the users in addition.
-
-In both cases, you have to create a CMakeList.txt in the chosen source
+\section inside_cmake_examples How to add an example?
+
+The first rule is that the content of examples/ must be interesting to
+the users. It is expected that the users will take one of these
+examples and start editing it to make it fit their needs.
+So, it should be self-contained, informative and should use only the
+public APIs.
+
+To ensure that all examples actually work as expected, every example
+is also used as an integration test (see \ref inside_tests), but you
+should still strive to keep the code under examples/ as informative as
+possible for the users. In particular, torture test cases should be
+placed in teshsuite/, not examples/, so that the users don't stumble
+upon them by mistake.
+
+To add a new example, create a CMakeLists.txt in the chosen source
directory. It must specify where to create the executable, the source
list, dependencies and the name of the binary.
\verbatim
-cmake_minimum_required(VERSION 2.6)
-
set(EXECUTABLE_OUTPUT_PATH "${CMAKE_CURRENT_BINARY_DIR}")
add_executable(Hello Hello.c)
target_link_libraries(Hello simgrid)
\endverbatim
-\li if you add tesh files (and you should), please refer to the
-following section to register them to the testing infrastructure.
+\li Add the tesh file and register your example to the testing
+ infrastructure. See \ref inside_tests_add_integration for more
+ details.
-Once you're done, you should check with "make distcheck" that you did
+Once you're done, you must run "make distcheck" to ensure that you did
not forget to add any file to the distributed archives.
-\section inside_cmake_addtest How to add tests?
-
-\subsection inside_cmake_addtest_unit Unit testing in SimGrid
-
-If you want to test a specific function or set of functions, you need
-a unit test. Edit
-<project/directory>/tools/cmake/UnitTesting.cmake to add your
-source file to the TEST_CFILES list, and add the corresponding unit
-file to the TEST_UNITS list. For example, if your file is toto.c,
-your unit file will be toto_unit.c. The full path to your file must be
-provided, but the unit file will always be in src/ directly.
-
-Then, you want to actually add your tests in the source file. All the
-tests must be protected by "#ifdef SIMGRID_TEST" so that they don't
-get included in the regular build. Then, you want to add a test suite
-that will contain a bunch of tests (in Junit, that would be a test
-unit) with the macro #XBT_TEST_SUITE, and populate it with a bunch of
-actual tests with the macro #XBT_TEST_UNIT (sorry for the mischosen
-names if you are used to junit). Just look at the dynar example (or
-any other) to see how it works in practice. Do not hesitate to stress
-test your code this way, but make sure that it runs reasonably fast,
-or nobody will run "ctest" before commiting code.
-
-If you are interested in the mechanic turning this into an actual
-test, check the <project/directory>/tools/sg_unit_extractor.pl script.
-
-To actually run your tests once you're done, run "make testall", that
-builds the binary containing all our unit tests and run it. This
-binary allows you to chose which suite you want to test:
-
-\verbatim
-$ testall --help # revise how it goes if you forgot
-$ testall --dump-only # learn about all existing test suites
-$ testall --tests=-all # run no test at all
-$ testall --tests=-all,+foo # run only the foo test suite.
-$ testall --tests=-all,+foo:bar # run only the bar test from the foo suite.
-$ testall --tests=-foo:bar # run all tests but the bar test from the foo suite.
-\endverbatim
-
-\subsection inside_cmake_addtest_integration Integration testing in SimGrid
-
-If you want to test a full binary (such as an example), then you
-probably want to use the tesh binary that ensures that the output
-produced by a command perfectly matches the expected output. In
-particular, this is very precious to ensure that no change modifies
-the timings computed by the models without notice.
-
-The first step is to write a tesh file for your test, containing the
-command to run, the provided input (if any, but almost no SimGrid test
-provide such an input) and the expected output. Check the tesh man
-page for more details.
-
-Tesh is sometimes annoying as you have to ensure that the expected
-output will always be exactly the same. In particular, your should not
-output machine dependent informations, nor memory adresses as they
-would change on each run. Several steps can be used here, such as the
-obfucation of the memory adresses unless the verbose logs are
-displayed (using the #XBT_LOG_ISENABLED() macro), or the modification
-of the log formats to hide the timings when they depend on the host
-machine.
-
-Then you have to request cmake to run your test when "ctest" is
-launched. For that, you have to modify source
-<project/directory>/tools/cmake/Tests.cmake. Make sure to pick
-a wise name for your test. It is often useful to check a category of
-tests together. The only way to do so in ctest is to use the -R
-argument that specifies a regular expression that the test names must
-match. For example, you can run all MSG test with "ctest -R msg" That
-explains the importance of the test names.
-
-Once the name is chosen, create a new test by adding a line similar to
-the following (assuming that you use tesh as expected).
-
-\verbatim
-# ADD_TEST(test-name ${CMAKE_BINARY_DIR}/bin/tesh <options> <tesh-file>)
-# option --setenv bindir set the directory containing the binary
-# --setenv srcdir set the directory containing the source file
-# --cd set the working directory
-ADD_TEST(my-test-name ${CMAKE_BINARY_DIR}/bin/tesh
- --setenv bindir=${CMAKE_BINARY_DIR}/examples/my-test/
- --setenv srcdir=${CMAKE_HOME_DIRECTORY}/examples/my-test/
- --cd ${CMAKE_HOME_DIRECTORY}/examples/my-test/
- ${CMAKE_HOME_DIRECTORY}/examples/msg/io/io.tesh
-)
-\endverbatim
-
-If you prefer to not use tesh for some reasons, prefer the following
-form:
-
-\verbatim
-# ADD_TEST(NAME <name>]
-# [WORKING_DIRECTORY dir]
-# COMMAND <command> [arg1 [arg2 ...]])
-ADD_TEST(NAME my-test-name
- WORKING_DIRECTORY ${CMAKE_BINARY_DIR}/examples/my-test/
- COMMAND Hello
-)
-\endverbatim
-
-As usual, you must run "make distcheck" after modifying the cmake files,
-to ensure that you did not forget any files in the distributed archive.
-
-\subsection inside_cmake_ci Continous Integration
-
-We are using Continous Integration to help us provide a stable build
-across as many platforms as possible. %As this is not related to cmake,
-you have to head over to Section \ref inside_ci.
*/
/*!
-\page inside_release SimGrid Developer Guide - Releasing
+\page inside_release Releasing SimGrid
\section inside_release_c Releasing the main library
--- /dev/null
+/*!
+@page inside_tests Testing SimGrid
+
+This page explains how to run the tests (and select the ones you
+want), and how to add new tests to the archive.
+
+\tableofcontents
+
+SimGrid's code coverage is usually between 70% and 80%, which is much
+higher than in most projects. SimGrid is a rather complex piece of
+software, and this coverage lets us modify it with less fear.
+
+We have two sets of tests in SimGrid: each of the 10,000+ unit tests
+checks one specific case for one specific function, while each of the
+500+ integration tests runs a given simulation specifically intended
+to exercise a larger number of functions together. Every example
+provided in examples/ is used as an integration test, while other
+torture tests and corner-case integration tests are located in
+teshsuite/. For each integration test, we ensure that the output
+exactly matches the defined expectations. Since SimGrid displays the
+timestamp of every logged line, this ensures that any change in the
+models' predictions will be noticed. All these tests should ensure
+that SimGrid is safe to use and to depend on.
+
+\section inside_tests_runintegration Running the tests
+
+Running the tests is done using the ctest binary that comes with
+cmake. These tests are run for every commit and the result is publicly
+<a href="https://ci.inria.fr/simgrid/">available</a>.
+
+\verbatim
+ctest # Launch all tests
+ctest -R msg # Launch only the tests whose name matches the string "msg"
+ctest -j4 # Launch all tests in parallel, at most 4 at the same time
+ctest --verbose # Display all details on what's going on
+ctest --output-on-failure # Only get verbose for the tests that fail
+
+ctest -R msg- -j5 --output-on-failure # You changed MSG and want to check that you didn't break anything, huh?
+ # That's fine, I do so all the time myself.
+\endverbatim
+
+\section inside_tests_rununit Running the unit tests
+
+All unit tests are packed into the testall binary, which lives in src/.
+These tests are run when you launch ctest, don't worry.
+
+\verbatim
+make testall # Rebuild the test runner on need
+./src/testall # Launch all tests
+./src/testall --help # revise how it goes if you forgot
+./src/testall --tests=-all # run no test at all (yeah, that's useless)
+./src/testall --dump-only # Display all existing test suites
+./src/testall --tests=-all,+dict # Only launch the tests from the dict testsuite
+./src/testall --tests=-all,+foo:bar # run only the bar test from the foo suite.
+\endverbatim
+
+
+\section inside_tests_add_units Adding unit tests
+
+If you want to test a specific function or set of functions, you need
+a unit test. Edit
+<project/directory>/tools/cmake/UnitTesting.cmake to add your
+source file to the TEST_CFILES list, and add the corresponding unit
+file to the TEST_UNITS list. For example, if your file is toto.c,
+your unit file will be toto_unit.c. The full path to your file must be
+provided, but the unit file will always be in src/ directly.
+
+If you want to create unit tests in the file src/xbt/toto.c, your
+changes should look similar to:
+
+\verbatim
+--- a/tools/cmake/UnitTesting.cmake
++++ b/tools/cmake/UnitTesting.cmake
+@@ -11,6 +11,7 @@ set(TEST_CFILES
+ src/xbt/xbt_strbuff.c
+ src/xbt/xbt_sha.c
+ src/xbt/config.c
++ src/xbt/toto.c
+ )
+ set(TEST_UNITS
+ ${CMAKE_CURRENT_BINARY_DIR}/src/cunit_unit.c
+@@ -22,6 +23,7 @@ set(TEST_UNITS
+ ${CMAKE_CURRENT_BINARY_DIR}/src/xbt_strbuff_unit.c
+ ${CMAKE_CURRENT_BINARY_DIR}/src/xbt_sha_unit.c
+ ${CMAKE_CURRENT_BINARY_DIR}/src/config_unit.c
++ ${CMAKE_CURRENT_BINARY_DIR}/src/toto_unit.c
+
+ ${CMAKE_CURRENT_BINARY_DIR}/src/simgrid_units_main.c
+ )
+\endverbatim
+
+Then, you want to actually add your tests in the source file. All the
+tests must be protected by "#ifdef SIMGRID_TEST" so that they don't
+get included in the regular build. The string SIMGRID_TEST must also
+appear on the endif line for the extraction script to work properly.
+
+Tests are subdivided into three levels. The top-level, called <b>test
+suite</b>, is created with the macro #XBT_TEST_SUITE. There can be
+only one suite per source file. A suite contains <b>test units</b>
+that you create with the #XBT_TEST_UNIT macro. Finally, you start
+<b>actual tests</b> with #xbt_test_add. There is no closing marker of
+any sort: a unit is closed when the next unit starts, or when the
+end of file is reached.
+
+Once a given test is started with #xbt_test_add, you use
+#xbt_test_assert to specify that it was actually an assert, or
+#xbt_test_fail to specify that it failed (if your test cannot easily
+be written as an assert). #xbt_test_exception can be used to report
+that it failed with an exception. There is nothing to do to report
+that a given test succeeded, just start the next test without
+reporting any issue. Finally, #xbt_test_log can be used to report
+intermediate steps. The messages will be shown only if the
+corresponding test fails.
+
+Here is a recapping example, inspired by the dynar implementation.
+@code
+/* The rest of your module implementation */
+
+#ifdef SIMGRID_TEST
+
+XBT_TEST_SUITE("dynar", "Dynar data container");
+XBT_LOG_EXTERNAL_DEFAULT_CATEGORY(xbt_dyn); // Just the regular logging stuff
+
+XBT_TEST_UNIT("int", test_dynar_int, "Dynars of integers")
+{
+ int i, cpt;
+ unsigned int cursor;
+
+ xbt_test_add("==== Traverse the empty dynar");
+ xbt_dynar_t d = xbt_dynar_new(sizeof(int), NULL);
+ xbt_dynar_foreach(d, cursor, i) {
+ xbt_test_fail( "Damnit, there is something in the empty dynar");
+ }
+ xbt_dynar_free(&d);
+
+ xbt_test_add("==== Push %d int and re-read them", NB_ELEM);
+ d = xbt_dynar_new(sizeof(int), NULL);
+ for (cpt = 0; cpt < NB_ELEM; cpt++) {
+ xbt_test_log("Push %d, length=%lu", cpt, xbt_dynar_length(d));
+ xbt_dynar_push_as(d, int, cpt);
+ }
+
+ for (cursor = 0; cursor < NB_ELEM; cursor++) {
+ int *iptr = xbt_dynar_get_ptr(d, cursor);
+ xbt_test_assert(cursor == *iptr,
+ "The retrieved value is not the same as the injected one (%u!=%d)", cursor, *iptr);
+ }
+
+ xbt_test_add("==== Check the size of that %d-long dynar", NB_ELEM);
+ xbt_test_assert(xbt_dynar_size(d) == NB_ELEM);
+ xbt_dynar_free(&d);
+}
+
+XBT_TEST_UNIT("insert",test_dynar_insert,"Using the xbt_dynar_insert and xbt_dynar_remove functions")
+{
+ xbt_dynar_t d = xbt_dynar_new(sizeof(unsigned int), NULL);
+ unsigned int cursor;
+ int cpt;
+
+ xbt_test_add("==== Insert %d int, traverse them, remove them",NB_ELEM);
+ // BLA BLA BLA
+}
+
+#endif /* SIMGRID_TEST <-- that string must appear on the endif line */
+@endcode
+
+For more details on the macro used to write unit tests, see their
+reference guide: @ref XBT_cunit. For details on how the tests are
+extracted from the module source, check the tools/sg_unit_extractor.pl
+script directly.
+
+Last note: please try to keep your tests fast. We run them very
+often, and slow tests upset the other developers. Do not hesitate to
+stress test your code with such unit tests, but make sure that it
+runs reasonably fast, or nobody will run "ctest" before committing
+code.
+
+\section inside_tests_add_integration Adding integration tests
+
+TESH (the TEsting SHell) is the test runner that we wrote for our
+integration tests. It is distributed with the SimGrid sources, and
+even comes with a man page. TESH ensures that the output produced by a
+command perfectly matches the expected output. This is very precious
+to ensure that no change modifies the timings computed by the models
+without notice.
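For illustration, a tesh file simply interleaves each command to run (on a "$ " line) with the output it is expected to produce (on "> " lines). The binary name and log lines in this sketch are made up:

```
$ ./my-simulator platform.xml deployment.xml
> [Tremblay:master:(1) 0.000000] [test/INFO] Got 2 workers and 10 tasks to process
> [Tremblay:master:(1) 0.010000] [test/INFO] Sending "Task_0" to "worker-0"
```

If any output line differs, including the timestamps, the test fails; this is exactly how model regressions get caught.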
+
+To add a new integration test, you thus have 3 things to do:
+
+ - <b>Write the code exercising the feature you target</b>. You should
+ strive to make this code clear, well documented and informative for
+ the users. If you manage to do so, put this somewhere under
+ examples/ and modify the cmake files as explained on this page:
+ \ref inside_cmake_examples. If you feel like you should write a
+ torture test that is not interesting to the users (because nobody
+ would sanely write something similar in user code), then put it under
+ teshsuite/ somewhere.
+ - <b>Write the tesh file</b>, containing the command to run, the
+ provided input (if any, but almost no SimGrid test provides such an
+ input) and the expected output. Check the tesh man page for more
+ details. \n
+ Tesh is sometimes annoying as you have to ensure that the expected
+ output will always be exactly the same. In particular, you should
+ not output machine-dependent information, nor memory addresses, as
+ they would change on each run. Several tricks can be used here, such
+ as the obfuscation of the memory addresses unless the verbose logs
+ are displayed (using the #XBT_LOG_ISENABLED() macro), or the
+ modification of the log formats to hide the timings when they
+ depend on the host machine.
+ - <b>Add your test in the cmake infrastructure</b>. For that, modify
+ the file <project/directory>/tools/cmake/Tests.cmake. Make sure to
+ pick a wise name for your test. It is often useful to check a
+ category of tests together. The only way to do so in ctest is to
+ use the -R argument that specifies a regular expression that the
+ test names must match. For example, you can run all MSG test with
+ "ctest -R msg". That explains the importance of the test names.
+
+Once the name is chosen, create a new test by adding a line similar to
+the following (assuming that you use tesh as expected).
+
+\verbatim
+# Usage: ADD_TEST(test-name ${CMAKE_BINARY_DIR}/bin/tesh <options> <tesh-file>)
+# option --setenv bindir set the directory containing the binary
+# --setenv srcdir set the directory containing the source file
+# --cd set the working directory
+ADD_TEST(my-test-name ${CMAKE_BINARY_DIR}/bin/tesh
+ --setenv bindir=${CMAKE_BINARY_DIR}/examples/my-test/
+ --setenv srcdir=${CMAKE_HOME_DIRECTORY}/examples/my-test/
+ --cd ${CMAKE_HOME_DIRECTORY}/examples/my-test/
+ ${CMAKE_HOME_DIRECTORY}/examples/msg/io/io.tesh
+)
+\endverbatim
+
+As usual, you must run "make distcheck" after modifying the cmake files,
+to ensure that you did not forget any files in the distributed archive.
+
+\section inside_tests_ci Continuous Integration
+
+We use several systems to automatically test SimGrid with a large set
+of parameters, across as many platforms as possible.
+We use <a href="https://ci.inria.fr/simgrid/">Jenkins on Inria
+servers</a> as a workhorse: it runs all of our tests for many
+configurations. It takes a long time to answer, and it often reports
+issues, but when it's green, you know that SimGrid is very fit!
+We use <a href="https://travis-ci.org/mquinson/simgrid">Travis</a> to
+quickly run some tests on Linux and Mac. It answers quickly but may
+miss issues. And we use <a href="https://ci.appveyor.com/project/mquinson/simgrid">AppVeyor</a>
+to build and somewhat test SimGrid on Windows.
+
+\subsection inside_tests_jenkins Jenkins on the Inria CI servers
+
+You should not have to change the configuration of the Jenkins tool
+yourself, although you could have to change the slaves' configuration
+using the <a href="https://ci.inria.fr">CI interface of INRIA</a> --
+refer to the <a href="https://wiki.inria.fr/ciportal/">CI documentation</a>.
+
+The result can be seen here: https://ci.inria.fr/simgrid/
+
+We have 3 projects on Jenkins:
+\li <a href="https://ci.inria.fr/simgrid/job/SimGrid-Multi/">SimGrid-Multi</a>
+ is the main project, running the tests that we spoke about.\n It is
+ configured (on Jenkins) to run the script <tt>tools/jenkins/build.sh</tt>
+\li <a href="https://ci.inria.fr/simgrid/job/SimGrid-DynamicAnalysis/">SimGrid-DynamicAnalysis</a>
+ runs the tests both under valgrind to find the memory errors and
+ under gcovr to report the achieved test coverage.\n It is configured
+ (on Jenkins) to run the script <tt>tools/jenkins/DynamicAnalysis.sh</tt>
+\li <a href="https://ci.inria.fr/simgrid/job/SimGrid-Windows/">SimGrid-Windows</a>
+ is an ongoing attempt to get Windows tested on Jenkins too.
+
+In each case, SimGrid gets built in
+/builds/workspace/$PROJECT/build_mode/$CONFIG/label/$SERVER/build
+with $PROJECT being for instance "SimGrid-Multi", $CONFIG "DEBUG" or
+"ModelChecker" and $SERVER for instance "simgrid-fedora20-64-clang".
+
+If some configurations are known to fail on some systems (such as
+model-checking on non-linux systems), go to your Project and click on
+"Configuration". There, find the field "combination filter" (if your
+interface language is English) and tick the checkbox; then add a
+groovy-expression to disable a specific configuration. For example, in
+order to disable the "ModelChecker" build on host
+"small-freebsd-64-clang", use:
+
+\verbatim
+(label=="small-freebsd-64-clang").implies(build_mode!="ModelChecker")
+\endverbatim
+
+\subsection inside_tests_travis Travis
+
+Travis is a free (as in free beer) Continuous Integration system that
+open-source projects can use freely. It is very well integrated in the
+GitHub ecosystem. There is plenty of documentation out there. Our
+configuration is in the file .travis.yml as it should be, and the
+result is here: https://travis-ci.org/mquinson/simgrid
+
+\subsection inside_tests_appveyor AppVeyor
+
+AppVeyor aims at becoming the Travis of Windows. It is maybe less
+mature than Travis, or maybe it is just that I'm less trained in
+Windows. Our configuration is in the file appveyor.yml as it should
+be, and the result is here: https://ci.appveyor.com/project/mquinson/simgrid
+
+It should be noted that I miserably failed to use the environment
+provided by AppVeyor, since SimGrid does not build with Microsoft
+Visual Studio. Instead, we download a whole development environment
+from the internet at each build. That's an archive of already compiled
+binaries that are unpacked on the AppVeyor systems each time we start.
+We re-use the ones from the
+<a href="https://github.com/symengine/symengine">symengine</a>
+project. Thanks to them for compiling sane tools and assembling that
+archive; it saved my sanity!
+
+*/
The easiest way to install SimGrid is to go for a binary package.
Under Debian or Ubuntu, this is very easy as SimGrid is directly
-integrated to the official repositories. Under Windows, SimGrid can be
-installed in a few clicks once you downloaded the installer from
-gforge. If you just want to use Java, simply copy the jar file on your
-disk and you're set.
+integrated into the official repositories. If you just want to use
+Java, simply copy the jar file onto your disk and you're set. Note that
+under Windows, you should go for Java, as the native C interface is
+not supported on that OS.
Recompiling an official archive is not much more complex, actually.
SimGrid has very few dependencies and relies only on very standard
Please contact us if you want to contribute the build scripts for your
preferred distribution.
-@subsection install_binary_win Installation wizard for Windows
-
-Before starting the installation, make sure that you have the following dependencies:
- @li cmake 2.8 <a href="http://www.cmake.org/cmake/resources/software.html">(download page)</a>
- @li MinGW <a href="http://sourceforge.net/projects/mingw/files/MinGW/">(download page)</a>
- @li perl <a href="http://www.activestate.com/activeperl/downloads">(download page)</a>
- @li git <a href="http://msysgit.googlecode.com/files/Git-1.7.4-preview20110204.exe">(download page)</a>
-
-Then download the package <a href="https://gforge.inria.fr/frs/?group_id=12">SimGrid Installer</a>,
-execute it and follow instructions.
-
-@image html win_install_01.png Step 1: Accept the license.
-@image html win_install_02.png Step 2: Select packets to install.
-@image html win_install_03.png Step 3: Choice where to install packets previously selected. Please don't use spaces in path.
-@image html win_install_04.png Step 4: Add CLASSPATH to environment variables.
-@image html win_install_05.png Step 5: Add PATH to environment variables.
-@image html win_install_06.png Step 6: Restart your computer to take in consideration environment variables.
-
@subsection install_binary_java Using the binary jar file
The easiest way to install the Java bindings of SimGrid is to grab the
supported, drop us an email: we may extend the jarfile for you, if we
have access to your architecture to build SimGrid on it.
+If the error message is about the boost-context library, then you
+should install that library on your machine. This is a known issue in
+the 3.12 release that will be fixed in the next release.
+
+You can retrieve a nightly build of the jar file from our autobuilders.
+For Windows, head to
+<a href="https://ci.appveyor.com/project/mquinson/simgrid">AppVeyor</a>.
+Click on the artefact link on the right, and grab your file. If the
+latest build failed, there will be no artefact, so you will need to
+first click on "History" at the top to search for the last successful
+build.
+For non-Windows systems (Linux, Mac or FreeBSD), head to
+<a href="https://ci.inria.fr/simgrid/job/SimGrid-Multi">Jenkins</a>.
+In the build history, pick the last green (or at least yellow) build
+that is not blinking (i.e., that is done building). In the list, pick
+a system that is close to yours, and click on the ball in the Debug
+row. The build artefacts appear at the top of the resulting page.
+
@section install_src Installing from source
@subsection install_src_deps Resolving the dependencies
- perl (but you may try to go without it)
- We use cmake to configure our compilation
(<a href="http://www.cmake.org/cmake/resources/software.html">download page</a>).
- You need cmake version 2.8 or higher. You may want to use ccmake
+ You need cmake version 2.8.8 or higher. You may want to use ccmake
for a graphical interface over cmake.
- LibBoost:
- osX: with <a href="http://www.finkproject.org/">fink</a>: `sudo fink install boost1.53.nopython`
- - debian: `apt-get install libboost-dev`
+ - debian: `apt-get install libboost-dev libboost-context-dev`
On MacOSX, it is advised to use the clang compiler (version 3.0 or
-higher), from either MacPort or XCode. If you insist on using gcc on
-this system, you still need a recent version of this compiler, so you
-need an unofficial gcc47 from MacPort because the version provided by
-Apple is ways to ancient to suffice. See also @ref install_cmake_mac.
-
-On Windows, it is strongly advised to use the
-<a href="http://sourceforge.net/projects/mingw/files/MinGW/">MinGW
-environment</a> to build SimGrid, with <a href="http://www.mingw.org/wiki/MSYS">
-MSYS tools</a> installed. Any other compilers are not tested
-(and thus probably broken). We usually use the
+higher), from either MacPorts or Xcode. See also @ref install_cmake_mac.
+
+Building from source on Windows may be something of an adventure:
+we never managed to compile SimGrid with anything other than MinGW-64
+ourselves. We usually use the
<a href="http://www.activestate.com/activeperl/downloads">activestate</a>
version of Perl, and the
<a href="http://msysgit.googlecode.com/files/Git-1.7.4-preview20110204.exe">msys</a>
-version of git on this architecture, but YMMV. See also @ref install_cmake_win.
+version of git on this architecture, but YMMV. You can have a look at
+the configuration scripts in the appveyor.yml file, but you are
+basically on your own here. Sorry: we are not fluent with Windows, so
+we cannot really help.
@subsection install_src_fetch Retrieving the source
make
@endverbatim
-\subsubsection install_cmake_win Cmake on Windows (with MinGW + MSYS)
-
-Cmake can produce several kind of of makefiles. Under Windows, it has
-no way of determining what kind you want to use, so you have to hint it:
-
-@verbatim
-cmake -G "MSYS Makefiles" (other options) .
-make
-@endverbatim
-
\subsubsection install_cmake_mac Cmake on Mac OS X
SimGrid compiles like a charm with clang on Mac OS X:
Once everything is built, you may want to test the result. SimGrid
comes with an extensive set of regression tests (see @ref
-inside_cmake_addtest "that page of the insider manual" for more
+inside_tests "that page of the insider manual" for more
details). Running the tests is done using the ctest binary that comes
-with cmake. These tests are run every night and the result is publicly
-<a href="http://cdash.inria.fr/CDash/index.php?project=Simgrid">available</a>.
+with cmake. These tests are run for every commit and the result is
+publicly <a href="https://ci.inria.fr/simgrid/">available</a>.
\verbatim
ctest # Launch all tests
-ctest -D Experimental # Launch all tests and report the result to
- # http://cdash.inria.fr/CDash/index.php?project=SimGrid
ctest -R msg # Launch only the tests whose name matches the string "msg"
ctest -j4 # Launch all tests in parallel, at most 4 at the same time
ctest --verbose # Display all details on what's going on
\section install_setting_own Setting up your own code
-\subsection install_setting_MSG MSG code on Unix (Linux or Mac OSX)
+\subsection install_setting_MSG MSG code on Unix
Do not build your simulator by modifying the SimGrid examples. Go
outside the SimGrid source tree and create your own working directory
perform some more complex compilations...
-\subsection install_setting_win_provided Compile the "HelloWorld" project on Windows
-
-In the SimGrid install directory you should have an HelloWorld project to explain you how to start
-compiling a source file. There are:
-\verbatim
-- HelloWorld.c The example source file.
-- CMakeLists.txt It allows to configure the project.
-- README This explanation.
-\endverbatim
-
-Now let's compile this example:
-\li Run windows shell "cmd".
-\li Open HelloWorld Directory ('cd' command line).
-\li Create a build directory and change directory. (optional)
-\li Type 'cmake -G"MinGW Makefiles" \<path_to_HelloWorld_project\>'
-\li Run mingw32-make
-\li You should obtain a runnable example ("HelloWorld.exe").
-
-For compiling your own code you can simply copy the HelloWorld project and rename source name. It will
-create a target with the same name of the source.
-
-
-\subsection install_setting_win_new Adding and Compiling a new example on Windows
-
-\li Put your source file into the helloWord directory.
-\li Edit CMakeLists.txt by removing the Find Targets section and add those two lines into this section
-\verbatim
-################
-# FIND TARGETS #
-################
-#It creates a target called 'TARGET_NAME.exe' with the sources 'SOURCES'
-add_executable(TARGET_NAME SOURCES)
-#Links TARGET_NAME with simgrid
-target_link_libraries(TARGET_NAME simgrid)
-\endverbatim
-\li To initialize and build your project, you'll need to run
-\verbatim
-cmake -G"MinGW Makefiles" <path_to_HelloWorld_project>
-\endverbatim
-\li Run "mingw32-make"
-\li You should obtain "TARGET_NAME.exe".
-
-\subsection install_Win_ruby Setup a virtualbox to use SimGrid-Ruby on windows
-
-Allan Espinosa made these set of Vagrant rules available so that you
-can use the SimGrid Ruby bindings in a virtual machine using
-VirtualBox. Thanks to him for that. You can find his project here:
-https://github.com/aespinosa/simgrid-vagrant
-
-
-
*/
change much between different snapshots and taking a complete copy of each
snapshot is a waste of memory.
-The \b model-check/sparse-checkpoint option item can be set to \b yes in order
+The \b model-check/sparse_checkpoint option can be set to \b yes in order
to avoid making a complete copy of each snapshot: instead, each snapshot will be
decomposed into blocks which will be stored separately.
If multiple snapshots share the same block (or if the same block
consumption of the snapshots to be \f$ \mbox{number of processes}
\times \mbox{stack size} \times \mbox{number of states} \f$.
-The \b model-check/sparse-checkpoint can be used to reduce the memory
+The \b model-check/sparse_checkpoint option can be used to reduce the memory
consumption by trying to share memory between the different snapshots.
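The effect of this block sharing can be illustrated with a small, self-contained sketch. This is purely illustrative: the class and function names, the block size, and the hashing scheme are mine, and SimGrid's real implementation works at a much lower level. The idea is simply that when two snapshots differ in a single block, the store pays for the shared blocks only once:

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size; not SimGrid's actual granularity

def split_blocks(snapshot, block_size=BLOCK_SIZE):
    """Cut a snapshot (a bytes object) into fixed-size blocks."""
    return [snapshot[i:i + block_size]
            for i in range(0, len(snapshot), block_size)]

class SparseStore:
    """Keep each distinct block once; snapshots are lists of block hashes."""
    def __init__(self):
        self.blocks = {}     # hash -> block content, shared between snapshots
        self.snapshots = []  # one list of block hashes per snapshot

    def add_snapshot(self, snapshot):
        hashes = []
        for block in split_blocks(snapshot):
            h = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(h, block)  # stored only if not seen before
            hashes.append(h)
        self.snapshots.append(hashes)

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())

# Two snapshots of 16 blocks that differ in a single block: the naive
# approach pays for 32 blocks, the sparse store for only 17.
base = b''.join(bytes([i]) * BLOCK_SIZE for i in range(16))
changed = base[:BLOCK_SIZE] + b'\xff' * BLOCK_SIZE + base[2 * BLOCK_SIZE:]

store = SparseStore()
store.add_snapshot(base)
store.add_snapshot(changed)
naive_bytes = len(base) + len(changed)
```

With the naive approach, the cost grows with the full snapshot size times the number of states; with sharing, only the blocks that actually change between states add to the bill.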
When compiled against the model checker, the stacks are not
- \c model-check/reduction: \ref options_modelchecking_reduction
- \c model-check/replay: \ref options_modelchecking_recordreplay
- \c model-check/send_determinism: \ref options_modelchecking_sparse_checkpoint
-- \c model-check/sparse-checkpoint: \ref options_modelchecking_sparse_checkpoint
+- \c model-check/sparse_checkpoint: \ref options_modelchecking_sparse_checkpoint
- \c model-check/termination: \ref options_modelchecking_termination
- \c model-check/timeout: \ref options_modelchecking_timeout
- \c model-check/visited: \ref options_modelchecking_visited
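For instance, sparse checkpointing is enabled by passing the corresponding \c --cfg item on the command line (the binary and file names below are placeholders for your own simulator and platform files):

\verbatim
simgrid-mc ./my_simulator platform.xml deploy.xml --cfg=model-check/sparse_checkpoint:yes
\endverbatim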
<b>Example:</b>
-\verbatim
+\code
<AS id="AS0" routing="Full">
<host id="host1" power="1000000000"/>
<host id="host2" power="1000000000"/>
<link id="link1" bandwidth="125000000" latency="0.000100"/>
<route src="host1" dst="host2"><link_ctn id="link1"/></route>
</AS>
-\endverbatim
+\endcode
In this example, AS0 contains two hosts (host1 and host2). The route
between the hosts goes through link1.
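To get a feeling for these numbers, here is a back-of-the-envelope computation. This is only a first-order approximation that I am using for illustration (latency plus size over bandwidth); SimGrid's actual network models are considerably more sophisticated:

```python
# Values taken from the platform example above (bytes/s and seconds).
bandwidth = 125_000_000   # link1: 125 MB/s, i.e. 1 Gb/s
latency = 0.000100        # link1: 100 microseconds

def naive_transfer_time(size_bytes):
    """First-order estimate of the time to send a message over link1.
    SimGrid's real network models refine this considerably."""
    return latency + size_bytes / bandwidth

one_mb_time = naive_transfer_time(1_000_000)  # about 8.1 ms for 1 MB
```

Under this simple model, a 1 MB message between host1 and host2 takes about 8.1 ms, dominated by the bandwidth term.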
-/*! @page introduction Introduction to SimGrid
+/*! @page tutorial SimGrid First Tutorial
+SimGrid is a toolkit providing the core functionalities for the
+simulation of distributed applications in heterogeneous distributed
+environments.
-[SimGrid](http://simgrid.gforge.inria.fr/) is a toolkit
-that provides core functionalities for the simulation of distributed
-applications in heterogeneous distributed environments.
-
-The specific goal of the project is to facilitate research in the area of
-distributed and parallel application scheduling on distributed computing
-platforms ranging from simple network of workstations to Computational
-Grids.
+The project goal is both to facilitate research and to help improve
+real applications in the area of distributed and parallel systems,
+ranging from simple networks of workstations to Computational Grids,
+Clouds, and supercomputers.
\tableofcontents
! expect return 2
! timeout 10
-$ ${bindir:=.}/../../../bin/simgrid-mc ${bindir:=.}/bugged1_liveness ${srcdir:=.}/../../platforms/platform.xml ${srcdir:=.}/deploy_bugged1_liveness.xml --log=xbt_cfg.thresh:warning "--log=root.fmt:[%10.6r]%e(%i:%P@%h)%e%m%n" --cfg=contexts/factory:ucontext --cfg=contexts/stack_size:256 --cfg=model-check/sparse-checkpoint:yes --cfg=model-check/property:promela_bugged1_liveness
+$ ${bindir:=.}/../../../bin/simgrid-mc ${bindir:=.}/bugged1_liveness ${srcdir:=.}/../../platforms/platform.xml ${srcdir:=.}/deploy_bugged1_liveness.xml --log=xbt_cfg.thresh:warning "--log=root.fmt:[%10.6r]%e(%i:%P@%h)%e%m%n" --cfg=contexts/factory:ucontext --cfg=contexts/stack_size:256 --cfg=model-check/sparse_checkpoint:yes --cfg=model-check/property:promela_bugged1_liveness
> [ 0.000000] (0:maestro@) Check the liveness property promela_bugged1_liveness
> [ 0.000000] (2:client@Boivin) Ask the request
> [ 0.000000] (3:client@Fafard) Ask the request
! expect return 2
! timeout 10
-$ ${bindir:=.}/../../../bin/simgrid-mc ${bindir:=.}/bugged1_liveness ${srcdir:=.}/../../platforms/platform.xml ${srcdir:=.}/deploy_bugged1_liveness_visited.xml --log=xbt_cfg.thresh:warning "--log=root.fmt:[%10.6r]%e(%i:%P@%h)%e%m%n" --cfg=contexts/factory:ucontext --cfg=model-check/visited:100 --cfg=contexts/stack_size:256 --cfg=model-check/sparse-checkpoint:yes --cfg=model-check/property:promela_bugged1_liveness
+$ ${bindir:=.}/../../../bin/simgrid-mc ${bindir:=.}/bugged1_liveness ${srcdir:=.}/../../platforms/platform.xml ${srcdir:=.}/deploy_bugged1_liveness_visited.xml --log=xbt_cfg.thresh:warning "--log=root.fmt:[%10.6r]%e(%i:%P@%h)%e%m%n" --cfg=contexts/factory:ucontext --cfg=model-check/visited:100 --cfg=contexts/stack_size:256 --cfg=model-check/sparse_checkpoint:yes --cfg=model-check/property:promela_bugged1_liveness
> [ 0.000000] (0:maestro@) Check the liveness property promela_bugged1_liveness
> [ 0.000000] (2:client@Boivin) Ask the request
> [ 0.000000] (3:client@Fafard) Ask the request
if args.topology ~= "TORUS" and args.topology ~= "FAT_TREE" then
args.topology = "Cluster"
end
- --if args.core==nil then args.core = 1 end
-- Check the mode = Cluster here
return function()
+++ /dev/null
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-
-#include "DGraph.h"
-
-DGArc *newArc(DGNode *tl,DGNode *hd){
- DGArc *ar=(DGArc *)malloc(sizeof(DGArc));
- ar->tail=tl;
- ar->head=hd;
- return ar;
-}
-void arcShow(DGArc *ar){
- DGNode *tl=(DGNode *)ar->tail,
- *hd=(DGNode *)ar->head;
- fprintf(stderr,"%d. |%s ->%s\n",ar->id,tl->name,hd->name);
-}
-
-DGNode *newNode(char *nm){
- DGNode *nd=(DGNode *)malloc(sizeof(DGNode));
- nd->attribute=0;
- nd->color=0;
- nd->inDegree=0;
- nd->outDegree=0;
- nd->maxInDegree=SMALL_BLOCK_SIZE;
- nd->maxOutDegree=SMALL_BLOCK_SIZE;
- nd->inArc=(DGArc **)malloc(nd->maxInDegree*sizeof(DGArc*));
- nd->outArc=(DGArc **)malloc(nd->maxOutDegree*sizeof(DGArc*));
- nd->name=strdup(nm);
- nd->feat=NULL;
- return nd;
-}
-void nodeShow(DGNode* nd){
- fprintf( stderr,"%3d.%s: (%d,%d)\n",
- nd->id,nd->name,nd->inDegree,nd->outDegree);
-/*
- if(nd->verified==1) fprintf(stderr,"%ld.%s\t: usable.",nd->id,nd->name);
- else if(nd->verified==0) fprintf(stderr,"%ld.%s\t: unusable.",nd->id,nd->name);
- else fprintf(stderr,"%ld.%s\t: notverified.",nd->id,nd->name);
-*/
-}
-
-DGraph* newDGraph(char* nm){
- DGraph *dg=(DGraph *)malloc(sizeof(DGraph));
- dg->numNodes=0;
- dg->numArcs=0;
- dg->maxNodes=BLOCK_SIZE;
- dg->maxArcs=BLOCK_SIZE;
- dg->node=(DGNode **)malloc(dg->maxNodes*sizeof(DGNode*));
- dg->arc=(DGArc **)malloc(dg->maxArcs*sizeof(DGArc*));
- dg->name=strdup(nm);
- return dg;
-}
-int AttachNode(DGraph* dg, DGNode* nd) {
- int i=0,j,len=0;
- DGNode **nds =NULL, *tmpnd=NULL;
- DGArc **ar=NULL;
-
- if (dg->numNodes == dg->maxNodes-1 ) {
- dg->maxNodes += BLOCK_SIZE;
- nds =(DGNode **) calloc(dg->maxNodes,sizeof(DGNode*));
- memcpy(nds,dg->node,(dg->maxNodes-BLOCK_SIZE)*sizeof(DGNode*));
- free(dg->node);
- dg->node=nds;
- }
-
- len = strlen( nd->name);
- for (i = 0; i < dg->numNodes; i++) {
- tmpnd =dg->node[ i];
- ar=NULL;
- if ( strlen( tmpnd->name) != len ) continue;
- if ( strncmp( nd->name, tmpnd->name, len) ) continue;
- if ( nd->inDegree > 0 ) {
- tmpnd->maxInDegree += nd->maxInDegree;
- ar =(DGArc **) calloc(tmpnd->maxInDegree,sizeof(DGArc*));
- memcpy(ar,tmpnd->inArc,(tmpnd->inDegree)*sizeof(DGArc*));
- free(tmpnd->inArc);
- tmpnd->inArc=ar;
- for (j = 0; j < nd->inDegree; j++ ) {
- nd->inArc[ j]->head = tmpnd;
- }
- memcpy( &(tmpnd->inArc[ tmpnd->inDegree]), nd->inArc, nd->inDegree*sizeof( DGArc *));
- tmpnd->inDegree += nd->inDegree;
- }
- if ( nd->outDegree > 0 ) {
- tmpnd->maxOutDegree += nd->maxOutDegree;
- ar =(DGArc **) calloc(tmpnd->maxOutDegree,sizeof(DGArc*));
- memcpy(ar,tmpnd->outArc,(tmpnd->outDegree)*sizeof(DGArc*));
- free(tmpnd->outArc);
- tmpnd->outArc=ar;
- for (j = 0; j < nd->outDegree; j++ ) {
- nd->outArc[ j]->tail = tmpnd;
- }
- memcpy( &(tmpnd->outArc[tmpnd->outDegree]),nd->outArc,nd->outDegree*sizeof( DGArc *));
- tmpnd->outDegree += nd->outDegree;
- }
- free(nd);
- return i;
- }
- nd->id = dg->numNodes;
- dg->node[dg->numNodes] = nd;
- dg->numNodes++;
-return nd->id;
-}
-int AttachArc(DGraph *dg,DGArc* nar){
-int arcId = -1;
-int i=0,newNumber=0;
-DGNode *head = nar->head,
- *tail = nar->tail;
-DGArc **ars=NULL,*probe=NULL;
-/*fprintf(stderr,"AttachArc %ld\n",dg->numArcs); */
- if ( !tail || !head ) return arcId;
- if ( dg->numArcs == dg->maxArcs-1 ) {
- dg->maxArcs += BLOCK_SIZE;
- ars =(DGArc **) calloc(dg->maxArcs,sizeof(DGArc*));
- memcpy(ars,dg->arc,(dg->maxArcs-BLOCK_SIZE)*sizeof(DGArc*));
- free(dg->arc);
- dg->arc=ars;
- }
- for(i = 0; i < tail->outDegree; i++ ) { /* parallel arc */
- probe = tail->outArc[ i];
- if(probe->head == head
- &&
- probe->length == nar->length
- ){
- free(nar);
- return probe->id;
- }
- }
-
- nar->id = dg->numArcs;
- arcId=dg->numArcs;
- dg->arc[dg->numArcs] = nar;
- dg->numArcs++;
-
- head->inArc[ head->inDegree] = nar;
- head->inDegree++;
- if ( head->inDegree >= head->maxInDegree ) {
- newNumber = head->maxInDegree + SMALL_BLOCK_SIZE;
- ars =(DGArc **) calloc(newNumber,sizeof(DGArc*));
- memcpy(ars,head->inArc,(head->inDegree)*sizeof(DGArc*));
- free(head->inArc);
- head->inArc=ars;
- head->maxInDegree = newNumber;
- }
- tail->outArc[ tail->outDegree] = nar;
- tail->outDegree++;
- if(tail->outDegree >= tail->maxOutDegree ) {
- newNumber = tail->maxOutDegree + SMALL_BLOCK_SIZE;
- ars =(DGArc **) calloc(newNumber,sizeof(DGArc*));
- memcpy(ars,tail->outArc,(tail->outDegree)*sizeof(DGArc*));
- free(tail->outArc);
- tail->outArc=ars;
- tail->maxOutDegree = newNumber;
- }
-/*fprintf(stderr,"AttachArc: head->in=%d tail->out=%ld\n",head->inDegree,tail->outDegree);*/
-return arcId;
-}
-void graphShow(DGraph *dg,int DetailsLevel){
- int i=0,j=0;
- fprintf(stderr,"%d.%s: (%d,%d)\n",dg->id,dg->name,dg->numNodes,dg->numArcs);
- if ( DetailsLevel < 1) return;
- for (i = 0; i < dg->numNodes; i++ ) {
- DGNode *focusNode = dg->node[ i];
- if(DetailsLevel >= 2) {
- for (j = 0; j < focusNode->inDegree; j++ ) {
- fprintf(stderr,"\t ");
- nodeShow(focusNode->inArc[ j]->tail);
- }
- }
- nodeShow(focusNode);
- if ( DetailsLevel < 2) continue;
- for (j = 0; j < focusNode->outDegree; j++ ) {
- fprintf(stderr, "\t ");
- nodeShow(focusNode->outArc[ j]->head);
- }
- fprintf(stderr, "---\n");
- }
- fprintf(stderr,"----------------------------------------\n");
- if ( DetailsLevel < 3) return;
-}
-
-
-
+++ /dev/null
-#ifndef _DGRAPH
-#define _DGRAPH
-
-#define BLOCK_SIZE 128
-#define SMALL_BLOCK_SIZE 32
-
-typedef struct{
- int id;
- void *tail,*head;
- int length,width,attribute,maxWidth;
-}DGArc;
-
-typedef struct{
- int maxInDegree,maxOutDegree;
- int inDegree,outDegree;
- int id;
- char *name;
- DGArc **inArc,**outArc;
- int depth,height,width;
- int color,attribute,address,verified;
- void *feat;
-}DGNode;
-
-typedef struct{
- int maxNodes,maxArcs;
- int id;
- char *name;
- int numNodes,numArcs;
- DGNode **node;
- DGArc **arc;
-} DGraph;
-
-DGArc *newArc(DGNode *tl,DGNode *hd);
-void arcShow(DGArc *ar);
-DGNode *newNode(char *nm);
-void nodeShow(DGNode* nd);
-
-DGraph* newDGraph(char *nm);
-int AttachNode(DGraph *dg,DGNode *nd);
-int AttachArc(DGraph *dg,DGArc* nar);
-void graphShow(DGraph *dg,int DetailsLevel);
-
-#endif
+++ /dev/null
-SHELL=/bin/sh
-BENCHMARK=dt
-BENCHMARKU=DT
-
-include ../config/make.def
-
-include ../sys/make.common
-#Override PROGRAM
-DTPROGRAM = $(BINDIR)/$(BENCHMARK)-folding.$(CLASS)
-
-OBJS = dt.o DGraph.o \
- ${COMMON}/c_print_results.o ${COMMON}/c_timers.o ${COMMON}/c_randdp.o
-
-
-${PROGRAM}: config ${OBJS}
- ${CLINK} ${CLINKFLAGS} -o ${DTPROGRAM} ${OBJS} ${CMPI_LIB}
-
-.c.o:
- ${CCOMPILE} $<
-
-dt.o: dt.c npbparams.h
-DGraph.o: DGraph.c DGraph.h
-
-clean:
- - rm -f *.o *~ mputil*
- - rm -f dt npbparams.h core
+++ /dev/null
-Data Traffic benchmark DT is new in the NPB suite
-(released as part of NPB3.x-MPI package).
-----------------------------------------------------
-
-DT is written in C and same executable can run on any number of processors,
-provided this number is not less than the number of nodes in the communication
-graph. DT benchmark takes one argument: BH, WH, or SH. This argument
-specifies the communication graph Black Hole, White Hole, or SHuffle
-respectively. The current release contains verification numbers for
-CLASSES S, W, A, and B only. Classes C and D are defined, but verification
-numbers are not provided in this release.
-
-The following table summarizes the number of nodes in the communication
-graph based on CLASS and graph TYPE.
-
-CLASS N_Source N_Nodes(BH,WH) N_Nodes(SH)
- S 4 5 12
- W 8 11 32
- A 16 21 80
- B 32 43 192
- C 64 85 448
- D 128 171 1024
+++ /dev/null
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-
-#include "DGraph.h"
-
-DGArc *newArc(DGNode *tl,DGNode *hd){
- DGArc *ar=(DGArc *)malloc(sizeof(DGArc));
- ar->tail=tl;
- ar->head=hd;
- return ar;
-}
-void arcShow(DGArc *ar){
- DGNode *tl=(DGNode *)ar->tail,
- *hd=(DGNode *)ar->head;
- fprintf(stderr,"%d. |%s ->%s\n",ar->id,tl->name,hd->name);
-}
-
-DGNode *newNode(char *nm){
- DGNode *nd=(DGNode *)malloc(sizeof(DGNode));
- nd->attribute=0;
- nd->color=0;
- nd->inDegree=0;
- nd->outDegree=0;
- nd->maxInDegree=SMALL_BLOCK_SIZE;
- nd->maxOutDegree=SMALL_BLOCK_SIZE;
- nd->inArc=(DGArc **)malloc(nd->maxInDegree*sizeof(DGArc*));
- nd->outArc=(DGArc **)malloc(nd->maxOutDegree*sizeof(DGArc*));
- nd->name=strdup(nm);
- nd->feat=NULL;
- return nd;
-}
-void nodeShow(DGNode* nd){
- fprintf( stderr,"%3d.%s: (%d,%d)\n",
- nd->id,nd->name,nd->inDegree,nd->outDegree);
-/*
- if(nd->verified==1) fprintf(stderr,"%ld.%s\t: usable.",nd->id,nd->name);
- else if(nd->verified==0) fprintf(stderr,"%ld.%s\t: unusable.",nd->id,nd->name);
- else fprintf(stderr,"%ld.%s\t: notverified.",nd->id,nd->name);
-*/
-}
-
-DGraph* newDGraph(char* nm){
- DGraph *dg=(DGraph *)malloc(sizeof(DGraph));
- dg->numNodes=0;
- dg->numArcs=0;
- dg->maxNodes=BLOCK_SIZE;
- dg->maxArcs=BLOCK_SIZE;
- dg->node=(DGNode **)malloc(dg->maxNodes*sizeof(DGNode*));
- dg->arc=(DGArc **)malloc(dg->maxArcs*sizeof(DGArc*));
- dg->name=strdup(nm);
- return dg;
-}
-int AttachNode(DGraph* dg, DGNode* nd) {
- int i=0,j,len=0;
- DGNode **nds =NULL, *tmpnd=NULL;
- DGArc **ar=NULL;
-
- if (dg->numNodes == dg->maxNodes-1 ) {
- dg->maxNodes += BLOCK_SIZE;
- nds =(DGNode **) calloc(dg->maxNodes,sizeof(DGNode*));
- memcpy(nds,dg->node,(dg->maxNodes-BLOCK_SIZE)*sizeof(DGNode*));
- free(dg->node);
- dg->node=nds;
- }
-
- len = strlen( nd->name);
- for (i = 0; i < dg->numNodes; i++) {
- tmpnd =dg->node[ i];
- ar=NULL;
- if ( strlen( tmpnd->name) != len ) continue;
- if ( strncmp( nd->name, tmpnd->name, len) ) continue;
- if ( nd->inDegree > 0 ) {
- tmpnd->maxInDegree += nd->maxInDegree;
- ar =(DGArc **) calloc(tmpnd->maxInDegree,sizeof(DGArc*));
- memcpy(ar,tmpnd->inArc,(tmpnd->inDegree)*sizeof(DGArc*));
- free(tmpnd->inArc);
- tmpnd->inArc=ar;
- for (j = 0; j < nd->inDegree; j++ ) {
- nd->inArc[ j]->head = tmpnd;
- }
- memcpy( &(tmpnd->inArc[ tmpnd->inDegree]), nd->inArc, nd->inDegree*sizeof( DGArc *));
- tmpnd->inDegree += nd->inDegree;
- }
- if ( nd->outDegree > 0 ) {
- tmpnd->maxOutDegree += nd->maxOutDegree;
- ar =(DGArc **) calloc(tmpnd->maxOutDegree,sizeof(DGArc*));
- memcpy(ar,tmpnd->outArc,(tmpnd->outDegree)*sizeof(DGArc*));
- free(tmpnd->outArc);
- tmpnd->outArc=ar;
- for (j = 0; j < nd->outDegree; j++ ) {
- nd->outArc[ j]->tail = tmpnd;
- }
- memcpy( &(tmpnd->outArc[tmpnd->outDegree]),nd->outArc,nd->outDegree*sizeof( DGArc *));
- tmpnd->outDegree += nd->outDegree;
- }
- free(nd);
- return i;
- }
- nd->id = dg->numNodes;
- dg->node[dg->numNodes] = nd;
- dg->numNodes++;
-return nd->id;
-}
-int AttachArc(DGraph *dg,DGArc* nar){
-int arcId = -1;
-int i=0,newNumber=0;
-DGNode *head = nar->head,
- *tail = nar->tail;
-DGArc **ars=NULL,*probe=NULL;
-/*fprintf(stderr,"AttachArc %ld\n",dg->numArcs); */
- if ( !tail || !head ) return arcId;
- if ( dg->numArcs == dg->maxArcs-1 ) {
- dg->maxArcs += BLOCK_SIZE;
- ars =(DGArc **) calloc(dg->maxArcs,sizeof(DGArc*));
- memcpy(ars,dg->arc,(dg->maxArcs-BLOCK_SIZE)*sizeof(DGArc*));
- free(dg->arc);
- dg->arc=ars;
- }
- for(i = 0; i < tail->outDegree; i++ ) { /* parallel arc */
- probe = tail->outArc[ i];
- if(probe->head == head
- &&
- probe->length == nar->length
- ){
- free(nar);
- return probe->id;
- }
- }
-
- nar->id = dg->numArcs;
- arcId=dg->numArcs;
- dg->arc[dg->numArcs] = nar;
- dg->numArcs++;
-
- head->inArc[ head->inDegree] = nar;
- head->inDegree++;
- if ( head->inDegree >= head->maxInDegree ) {
- newNumber = head->maxInDegree + SMALL_BLOCK_SIZE;
- ars =(DGArc **) calloc(newNumber,sizeof(DGArc*));
- memcpy(ars,head->inArc,(head->inDegree)*sizeof(DGArc*));
- free(head->inArc);
- head->inArc=ars;
- head->maxInDegree = newNumber;
- }
- tail->outArc[ tail->outDegree] = nar;
- tail->outDegree++;
- if(tail->outDegree >= tail->maxOutDegree ) {
- newNumber = tail->maxOutDegree + SMALL_BLOCK_SIZE;
- ars =(DGArc **) calloc(newNumber,sizeof(DGArc*));
- memcpy(ars,tail->outArc,(tail->outDegree)*sizeof(DGArc*));
- free(tail->outArc);
- tail->outArc=ars;
- tail->maxOutDegree = newNumber;
- }
-/*fprintf(stderr,"AttachArc: head->in=%d tail->out=%ld\n",head->inDegree,tail->outDegree);*/
-return arcId;
-}
-void graphShow(DGraph *dg,int DetailsLevel){
- int i=0,j=0;
- fprintf(stderr,"%d.%s: (%d,%d)\n",dg->id,dg->name,dg->numNodes,dg->numArcs);
- if ( DetailsLevel < 1) return;
- for (i = 0; i < dg->numNodes; i++ ) {
- DGNode *focusNode = dg->node[ i];
- if(DetailsLevel >= 2) {
- for (j = 0; j < focusNode->inDegree; j++ ) {
- fprintf(stderr,"\t ");
- nodeShow(focusNode->inArc[ j]->tail);
- }
- }
- nodeShow(focusNode);
- if ( DetailsLevel < 2) continue;
- for (j = 0; j < focusNode->outDegree; j++ ) {
- fprintf(stderr, "\t ");
- nodeShow(focusNode->outArc[ j]->head);
- }
- fprintf(stderr, "---\n");
- }
- fprintf(stderr,"----------------------------------------\n");
- if ( DetailsLevel < 3) return;
-}
-
-
-
+++ /dev/null
-#ifndef _DGRAPH
-#define _DGRAPH
-
-#define BLOCK_SIZE 128
-#define SMALL_BLOCK_SIZE 32
-
-typedef struct{
- int id;
- void *tail,*head;
- int length,width,attribute,maxWidth;
-}DGArc;
-
-typedef struct{
- int maxInDegree,maxOutDegree;
- int inDegree,outDegree;
- int id;
- char *name;
- DGArc **inArc,**outArc;
- int depth,height,width;
- int color,attribute,address,verified;
- void *feat;
-}DGNode;
-
-typedef struct{
- int maxNodes,maxArcs;
- int id;
- char *name;
- int numNodes,numArcs;
- DGNode **node;
- DGArc **arc;
-} DGraph;
-
-DGArc *newArc(DGNode *tl,DGNode *hd);
-void arcShow(DGArc *ar);
-DGNode *newNode(char *nm);
-void nodeShow(DGNode* nd);
-
-DGraph* newDGraph(char *nm);
-int AttachNode(DGraph *dg,DGNode *nd);
-int AttachArc(DGraph *dg,DGArc* nar);
-void graphShow(DGraph *dg,int DetailsLevel);
-
-#endif
+++ /dev/null
-SHELL=/bin/sh
-BENCHMARK=dt
-BENCHMARKU=DT
-
-include ../config/make.def
-
-include ../sys/make.common
-#Override PROGRAM
-DTPROGRAM = $(BINDIR)/$(BENCHMARK)-trace.$(CLASS)
-
-OBJS = dt.o DGraph.o \
- ${COMMON}/c_print_results.o ${COMMON}/c_timers.o ${COMMON}/c_randdp.o
-
-
-${PROGRAM}: config ${OBJS}
- ${CLINK} ${CLINKFLAGS} -o ${DTPROGRAM} ${OBJS} ${CMPI_LIB}
-
-.c.o:
- ${CCOMPILE} $<
-
-dt.o: dt.c npbparams.h
-DGraph.o: DGraph.c DGraph.h
-
-clean:
- - rm -f *.o *~ mputil*
- - rm -f dt npbparams.h core
+++ /dev/null
-Data Traffic benchmark DT is new in the NPB suite
-(released as part of NPB3.x-MPI package).
-----------------------------------------------------
-
-DT is written in C and same executable can run on any number of processors,
-provided this number is not less than the number of nodes in the communication
-graph. DT benchmark takes one argument: BH, WH, or SH. This argument
-specifies the communication graph Black Hole, White Hole, or SHuffle
-respectively. The current release contains verification numbers for
-CLASSES S, W, A, and B only. Classes C and D are defined, but verification
-numbers are not provided in this release.
-
-The following table summarizes the number of nodes in the communication
-graph based on CLASS and graph TYPE.
-
-CLASS N_Source N_Nodes(BH,WH) N_Nodes(SH)
- S 4 5 12
- W 8 11 32
- A 16 21 80
- B 32 43 192
- C 64 85 448
- D 128 171 1024
+++ /dev/null
-/*************************************************************************
- * *
- * N A S P A R A L L E L B E N C H M A R K S 3.3 *
- * *
- * D T *
- * *
- *************************************************************************
- * *
- * This benchmark is part of the NAS Parallel Benchmark 3.3 suite. *
- * *
- * Permission to use, copy, distribute and modify this software *
- * for any purpose with or without fee is hereby granted. We *
- * request, however, that all derived work reference the NAS *
- * Parallel Benchmarks 3.3. This software is provided "as is" *
- * without express or implied warranty. *
- * *
- * Information on NPB 3.3, including the technical report, the *
- * original specifications, source code, results and information *
- * on how to submit new results, is available at: *
- * *
- * http: www.nas.nasa.gov/Software/NPB *
- * *
- * Send comments or suggestions to npb@nas.nasa.gov *
- * Send bug reports to npb-bugs@nas.nasa.gov *
- * *
- * NAS Parallel Benchmarks Group *
- * NASA Ames Research Center *
- * Mail Stop: T27A-1 *
- * Moffett Field, CA 94035-1000 *
- * *
- * E-mail: npb@nas.nasa.gov *
- * Fax: (650) 604-3957 *
- * *
- *************************************************************************
- * *
- * Author: M. Frumkin * *
- * *
- *************************************************************************/
-
-#include <stdlib.h>
-#include <stdio.h>
-#include <string.h>
-
-#include "mpi.h"
-#include "npbparams.h"
-
-#include "simgrid/instr.h" //TRACE_
-
-#ifndef CLASS
-#define CLASS 'S'
-#define NUM_PROCS 1
-#endif
-
-//int passed_verification;
-extern double randlc( double *X, double *A );
-extern
-void c_print_results( char *name,
- char class,
- int n1,
- int n2,
- int n3,
- int niter,
- int nprocs_compiled,
- int nprocs_total,
- double t,
- double mops,
- char *optype,
- int passed_verification,
- char *npbversion,
- char *compiletime,
- char *mpicc,
- char *clink,
- char *cmpi_lib,
- char *cmpi_inc,
- char *cflags,
- char *clinkflags );
-
-void timer_clear( int n );
-void timer_start( int n );
-void timer_stop( int n );
-double timer_read( int n );
-int timer_on=0,timers_tot=64;
-
-int verify(char *bmname,double rnm2){
- double verify_value=0.0;
- double epsilon=1.0E-8;
- char cls=CLASS;
- int verified=-1;
- if (cls != 'U') {
- if(cls=='S') {
- if(strstr(bmname,"BH")){
- verify_value=30892725.0;
- }else if(strstr(bmname,"WH")){
- verify_value=67349758.0;
- }else if(strstr(bmname,"SH")){
- verify_value=58875767.0;
- }else{
- fprintf(stderr,"No such benchmark as %s.\n",bmname);
- }
- verified = 0;
- }else if(cls=='W') {
- if(strstr(bmname,"BH")){
- verify_value = 4102461.0;
- }else if(strstr(bmname,"WH")){
- verify_value = 204280762.0;
- }else if(strstr(bmname,"SH")){
- verify_value = 186944764.0;
- }else{
- fprintf(stderr,"No such benchmark as %s.\n",bmname);
- }
- verified = 0;
- }else if(cls=='A') {
- if(strstr(bmname,"BH")){
- verify_value = 17809491.0;
- }else if(strstr(bmname,"WH")){
- verify_value = 1289925229.0;
- }else if(strstr(bmname,"SH")){
- verify_value = 610856482.0;
- }else{
- fprintf(stderr,"No such benchmark as %s.\n",bmname);
- }
- verified = 0;
- }else if(cls=='B') {
- if(strstr(bmname,"BH")){
- verify_value = 4317114.0;
- }else if(strstr(bmname,"WH")){
- verify_value = 7877279917.0;
- }else if(strstr(bmname,"SH")){
- verify_value = 1836863082.0;
- }else{
- fprintf(stderr,"No such benchmark as %s.\n",bmname);
- verified = 0;
- }
- }else if(cls=='C') {
- if(strstr(bmname,"BH")){
- verify_value = 0.0;
- }else if(strstr(bmname,"WH")){
- verify_value = 0.0;
- }else if(strstr(bmname,"SH")){
- verify_value = 0.0;
- }else{
- fprintf(stderr,"No such benchmark as %s.\n",bmname);
- verified = -1;
- }
- }else if(cls=='D') {
- if(strstr(bmname,"BH")){
- verify_value = 0.0;
- }else if(strstr(bmname,"WH")){
- verify_value = 0.0;
- }else if(strstr(bmname,"SH")){
- verify_value = 0.0;
- }else{
- fprintf(stderr,"No such benchmark as %s.\n",bmname);
- }
- verified = -1;
- }else{
- fprintf(stderr,"No such class as %c.\n",cls);
- }
- fprintf(stderr," %s L2 Norm = %f\n",bmname,rnm2);
- if(verified==-1){
- fprintf(stderr," No verification was performed.\n");
- }else if( rnm2 - verify_value < epsilon &&
- rnm2 - verify_value > -epsilon) { /* abs here does not work on ALTIX */
- verified = 1;
- fprintf(stderr," Deviation = %f\n",(rnm2 - verify_value));
- }else{
- verified = 0;
- fprintf(stderr," The correct verification value = %f\n",verify_value);
- fprintf(stderr," Got value = %f\n",rnm2);
- }
- }else{
- verified = -1;
- }
- return verified;
- }
-
-int ipowMod(int a,long long int n,int md){
- int seed=1,q=a,r=1;
- if(n<0){
- fprintf(stderr,"ipowMod: exponent must be nonnegative exp=%lld\n",n);
- n=-n; /* temp fix */
-/* return 1; */
- }
- if(md<=0){
-    fprintf(stderr,"ipowMod: modulus must be positive mod=%d\n",md);
- return 1;
- }
- if(n==0) return 1;
- while(n>1){
- int n2 = n/2;
- if (n2*2==n){
- seed = (q*q)%md;
- q=seed;
- n = n2;
- }else{
- seed = (r*q)%md;
- r=seed;
- n = n-1;
- }
- }
- seed = (r*q)%md;
- return seed;
-}
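The deleted ipowMod above is square-and-multiply modular exponentiation with an odd/even split of the exponent. A minimal Python sketch of the same scheme (the helper name ipow_mod is ours), cross-checked against Python's built-in three-argument pow:

```python
def ipow_mod(a, n, md):
    """Compute (a ** n) % md by square-and-multiply, mirroring ipowMod."""
    if n < 0:
        raise ValueError("exponent must be nonnegative")
    if md <= 0:
        raise ValueError("modulus must be positive")
    result = 1
    base = a % md
    while n > 0:
        if n & 1:                      # odd exponent: fold the base into the result
            result = (result * base) % md
        base = (base * base) % md      # even step: square the base
        n >>= 1
    return result
```

Unlike the C version, Python's big integers make overflow a non-issue here.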
-
-#include "DGraph.h"
-DGraph *buildSH(char cls){
-/*
- Nodes of the graph must be topologically sorted
- to avoid MPI deadlock.
-*/
- DGraph *dg;
- int numSources=NUM_SOURCES; /* must be power of 2 */
- int numOfLayers=0,tmpS=numSources>>1;
- int firstLayerNode=0;
- DGArc *ar=NULL;
- DGNode *nd=NULL;
- int mask=0x0,ndid=0,ndoff=0;
- int i=0,j=0;
- char nm[BLOCK_SIZE];
-
- sprintf(nm,"DT_SH.%c",cls);
- dg=newDGraph(nm);
-
- while(tmpS>1){
- numOfLayers++;
- tmpS>>=1;
- }
- for(i=0;i<numSources;i++){
- sprintf(nm,"Source.%d",i);
- nd=newNode(nm);
- AttachNode(dg,nd);
- }
- for(j=0;j<numOfLayers;j++){
- mask=0x00000001<<j;
- for(i=0;i<numSources;i++){
- sprintf(nm,"Comparator.%d",(i+j*firstLayerNode));
- nd=newNode(nm);
- AttachNode(dg,nd);
- ndoff=i&(~mask);
- ndid=firstLayerNode+ndoff;
- ar=newArc(dg->node[ndid],nd);
- AttachArc(dg,ar);
- ndoff+=mask;
- ndid=firstLayerNode+ndoff;
- ar=newArc(dg->node[ndid],nd);
- AttachArc(dg,ar);
- }
- firstLayerNode+=numSources;
- }
- mask=0x00000001<<numOfLayers;
- for(i=0;i<numSources;i++){
- sprintf(nm,"Sink.%d",i);
- nd=newNode(nm);
- AttachNode(dg,nd);
- ndoff=i&(~mask);
- ndid=firstLayerNode+ndoff;
- ar=newArc(dg->node[ndid],nd);
- AttachArc(dg,ar);
- ndoff+=mask;
- ndid=firstLayerNode+ndoff;
- ar=newArc(dg->node[ndid],nd);
- AttachArc(dg,ar);
- }
-return dg;
-}
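buildSH wires each comparator layer as a butterfly: the two inputs of comparator i at layer j are the source indices obtained by clearing and setting bit j, which is exactly what the ndoff/mask arithmetic above computes. A sketch of that pairing (the helper name butterfly_partners is ours):

```python
def butterfly_partners(i, layer):
    """Indices of the two nodes feeding comparator i at a given layer,
    mirroring the ndoff/mask arithmetic in buildSH."""
    mask = 1 << layer
    lo = i & ~mask      # index with the layer bit cleared
    hi = lo | mask      # same index with the layer bit set
    return lo, hi
```

Because the partner differs in a single bit, every comparator reads one "near" and one "far" stream, giving the shuffle graph its sorting-network shape.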
-DGraph *buildWH(char cls){
-/*
- Nodes of the graph must be topologically sorted
- to avoid MPI deadlock.
-*/
- int i=0,j=0;
- int numSources=NUM_SOURCES,maxInDeg=4;
- int numLayerNodes=numSources,firstLayerNode=0;
- int totComparators=0;
- int numPrevLayerNodes=numLayerNodes;
- int id=0,sid=0;
- DGraph *dg;
- DGNode *nd=NULL,*source=NULL,*tmp=NULL,*snd=NULL;
- DGArc *ar=NULL;
- char nm[BLOCK_SIZE];
-
- sprintf(nm,"DT_WH.%c",cls);
- dg=newDGraph(nm);
-
- for(i=0;i<numSources;i++){
- sprintf(nm,"Sink.%d",i);
- nd=newNode(nm);
- AttachNode(dg,nd);
- }
- totComparators=0;
- numPrevLayerNodes=numLayerNodes;
- while(numLayerNodes>maxInDeg){
- numLayerNodes=numLayerNodes/maxInDeg;
- if(numLayerNodes*maxInDeg<numPrevLayerNodes)numLayerNodes++;
- for(i=0;i<numLayerNodes;i++){
- sprintf(nm,"Comparator.%d",totComparators);
- totComparators++;
- nd=newNode(nm);
- id=AttachNode(dg,nd);
- for(j=0;j<maxInDeg;j++){
- sid=i*maxInDeg+j;
- if(sid>=numPrevLayerNodes) break;
- snd=dg->node[firstLayerNode+sid];
- ar=newArc(dg->node[id],snd);
- AttachArc(dg,ar);
- }
- }
- firstLayerNode+=numPrevLayerNodes;
- numPrevLayerNodes=numLayerNodes;
- }
- source=newNode("Source");
- AttachNode(dg,source);
- for(i=0;i<numPrevLayerNodes;i++){
- nd=dg->node[firstLayerNode+i];
- ar=newArc(source,nd);
- AttachArc(dg,ar);
- }
-
- for(i=0;i<dg->numNodes/2;i++){ /* Topological sorting */
- tmp=dg->node[i];
- dg->node[i]=dg->node[dg->numNodes-1-i];
- dg->node[i]->id=i;
- dg->node[dg->numNodes-1-i]=tmp;
- dg->node[dg->numNodes-1-i]->id=dg->numNodes-1-i;
- }
-return dg;
-}
-DGraph *buildBH(char cls){
-/*
- Nodes of the graph must be topologically sorted
- to avoid MPI deadlock.
-*/
- int i=0,j=0;
- int numSources=NUM_SOURCES,maxInDeg=4;
- int numLayerNodes=numSources,firstLayerNode=0;
- DGraph *dg;
- DGNode *nd=NULL, *snd=NULL, *sink=NULL;
- DGArc *ar=NULL;
- int totComparators=0;
- int numPrevLayerNodes=numLayerNodes;
- int id=0, sid=0;
- char nm[BLOCK_SIZE];
-
- sprintf(nm,"DT_BH.%c",cls);
- dg=newDGraph(nm);
-
- for(i=0;i<numSources;i++){
- sprintf(nm,"Source.%d",i);
- nd=newNode(nm);
- AttachNode(dg,nd);
- }
- while(numLayerNodes>maxInDeg){
- numLayerNodes=numLayerNodes/maxInDeg;
- if(numLayerNodes*maxInDeg<numPrevLayerNodes)numLayerNodes++;
- for(i=0;i<numLayerNodes;i++){
- sprintf(nm,"Comparator.%d",totComparators);
- totComparators++;
- nd=newNode(nm);
- id=AttachNode(dg,nd);
- for(j=0;j<maxInDeg;j++){
- sid=i*maxInDeg+j;
- if(sid>=numPrevLayerNodes) break;
- snd=dg->node[firstLayerNode+sid];
- ar=newArc(snd,dg->node[id]);
- AttachArc(dg,ar);
- }
- }
- firstLayerNode+=numPrevLayerNodes;
- numPrevLayerNodes=numLayerNodes;
- }
- sink=newNode("Sink");
- AttachNode(dg,sink);
- for(i=0;i<numPrevLayerNodes;i++){
- nd=dg->node[firstLayerNode+i];
- ar=newArc(nd,sink);
- AttachArc(dg,ar);
- }
-return dg;
-}
-
-typedef struct{
- int len;
- double* val;
-} Arr;
-Arr *newArr(int len){
- Arr *arr=(Arr *)SMPI_SHARED_MALLOC(sizeof(Arr));
- arr->len=len;
- arr->val=(double *)SMPI_SHARED_MALLOC(len*sizeof(double));
- return arr;
-}
-void arrShow(Arr* a){
- if(!a) fprintf(stderr,"-- NULL array\n");
- else{
- fprintf(stderr,"-- length=%d\n",a->len);
- }
-}
-double CheckVal(Arr *feat){
- double csum=0.0;
- int i=0;
- for(i=0;i<feat->len;i++){
- csum+=feat->val[i]*feat->val[i]/feat->len; /* The truncation does not work since
- result will be 0 for large len */
- }
- return csum;
-}
-int GetFNumDPar(int* mean, int* stdev){
- *mean=NUM_SAMPLES;
- *stdev=STD_DEVIATION;
- return 0;
-}
-int GetFeatureNum(char *mbname,int id){
- double tran=314159265.0;
- double A=2*id+1;
- double denom=randlc(&tran,&A);
- char cval='S';
- int mean=NUM_SAMPLES,stdev=128;
- int rtfs=0,len=0;
- GetFNumDPar(&mean,&stdev);
- rtfs=ipowMod((int)(1/denom)*(int)cval,(long long int) (2*id+1),2*stdev);
- if(rtfs<0) rtfs=-rtfs;
- len=mean-stdev+rtfs;
- return len;
-}
-Arr* RandomFeatures(char *bmname,int fdim,int id){
- int len=GetFeatureNum(bmname,id)*fdim;
- Arr* feat=newArr(len);
- int nxg=2,nyg=2,nzg=2,nfg=5;
- int nx=421,ny=419,nz=1427,nf=3527;
- long long int expon=(len*(id+1))%3141592;
- int seedx=ipowMod(nxg,expon,nx),
- seedy=ipowMod(nyg,expon,ny),
- seedz=ipowMod(nzg,expon,nz),
- seedf=ipowMod(nfg,expon,nf);
- int i=0;
- if(timer_on){
- timer_clear(id+1);
- timer_start(id+1);
- }
- for(i=0;i<len;i+=fdim){
- seedx=(seedx*nxg)%nx;
- seedy=(seedy*nyg)%ny;
- seedz=(seedz*nzg)%nz;
- seedf=(seedf*nfg)%nf;
- feat->val[i]=seedx;
- feat->val[i+1]=seedy;
- feat->val[i+2]=seedz;
- feat->val[i+3]=seedf;
- }
- if(timer_on){
- timer_stop(id+1);
- fprintf(stderr,"** RandomFeatures time in node %d = %f\n",id,timer_read(id+1));
- }
- return feat;
-}
-void Resample(Arr *a,int blen){
- long long int i=0,j=0,jlo=0,jhi=0;
- double avval=0.0;
- double *nval=(double *)SMPI_SHARED_MALLOC(blen*sizeof(double));
- Arr *tmp=newArr(10);
- for(i=0;i<blen;i++) nval[i]=0.0;
- for(i=1;i<a->len-1;i++){
- jlo=(int)(0.5*(2*i-1)*(blen/a->len));
- jhi=(int)(0.5*(2*i+1)*(blen/a->len));
-
- avval=a->val[i]/(jhi-jlo+1);
- for(j=jlo;j<=jhi;j++){
- nval[j]+=avval;
- }
- }
- nval[0]=a->val[0];
- nval[blen-1]=a->val[a->len-1];
- SMPI_SHARED_FREE(a->val);
- a->val=nval;
- a->len=blen;
-}
-#define fielddim 4
-Arr* WindowFilter(Arr *a, Arr* b,int w){
- int i=0,j=0,k=0;
- double rms0=0.0,rms1=0.0,rmsm1=0.0;
- double weight=((double) (w+1))/(w+2);
-
- w+=1;
- if(timer_on){
- timer_clear(w);
- timer_start(w);
- }
- if(a->len<b->len) Resample(a,b->len);
- if(a->len>b->len) Resample(b,a->len);
- for(i=fielddim;i<a->len-fielddim;i+=fielddim){
- rms0=(a->val[i]-b->val[i])*(a->val[i]-b->val[i])
- +(a->val[i+1]-b->val[i+1])*(a->val[i+1]-b->val[i+1])
- +(a->val[i+2]-b->val[i+2])*(a->val[i+2]-b->val[i+2])
- +(a->val[i+3]-b->val[i+3])*(a->val[i+3]-b->val[i+3]);
- j=i+fielddim;
- rms1=(a->val[j]-b->val[j])*(a->val[j]-b->val[j])
- +(a->val[j+1]-b->val[j+1])*(a->val[j+1]-b->val[j+1])
- +(a->val[j+2]-b->val[j+2])*(a->val[j+2]-b->val[j+2])
- +(a->val[j+3]-b->val[j+3])*(a->val[j+3]-b->val[j+3]);
- j=i-fielddim;
- rmsm1=(a->val[j]-b->val[j])*(a->val[j]-b->val[j])
- +(a->val[j+1]-b->val[j+1])*(a->val[j+1]-b->val[j+1])
- +(a->val[j+2]-b->val[j+2])*(a->val[j+2]-b->val[j+2])
- +(a->val[j+3]-b->val[j+3])*(a->val[j+3]-b->val[j+3]);
- k=0;
- if(rms1<rms0){
- k=1;
- rms0=rms1;
- }
- if(rmsm1<rms0) k=-1;
- if(k==0){
- j=i+fielddim;
- a->val[i]=weight*b->val[i];
- a->val[i+1]=weight*b->val[i+1];
- a->val[i+2]=weight*b->val[i+2];
- a->val[i+3]=weight*b->val[i+3];
- }else if(k==1){
- j=i+fielddim;
- a->val[i]=weight*b->val[j];
- a->val[i+1]=weight*b->val[j+1];
- a->val[i+2]=weight*b->val[j+2];
- a->val[i+3]=weight*b->val[j+3];
- }else { /*if(k==-1)*/
- j=i-fielddim;
- a->val[i]=weight*b->val[j];
- a->val[i+1]=weight*b->val[j+1];
- a->val[i+2]=weight*b->val[j+2];
- a->val[i+3]=weight*b->val[j+3];
- }
- }
- if(timer_on){
- timer_stop(w);
- fprintf(stderr,"** WindowFilter time in node %d = %f\n",(w-1),timer_read(w));
- }
- return a;
-}
-
-int SendResults(DGraph *dg,DGNode *nd,Arr *feat){
- int i=0,tag=0;
- DGArc *ar=NULL;
- DGNode *head=NULL;
- if(!feat) return 0;
- TRACE_smpi_set_category ("SendResults");
- for(i=0;i<nd->outDegree;i++){
- ar=nd->outArc[i];
- if(ar->tail!=nd) continue;
- head=ar->head;
- tag=ar->id;
- if(head->address!=nd->address){
- MPI_Send(&feat->len,1,MPI_INT,head->address,tag,MPI_COMM_WORLD);
- MPI_Send(feat->val,feat->len,MPI_DOUBLE,head->address,tag,MPI_COMM_WORLD);
- }
- }
- TRACE_smpi_set_category (NULL);
- return 1;
-}
-Arr* CombineStreams(DGraph *dg,DGNode *nd){
- Arr *resfeat=newArr(NUM_SAMPLES*fielddim);
- int i=0,len=0,tag=0;
- DGArc *ar=NULL;
- DGNode *tail=NULL;
- MPI_Status status;
- Arr *feat=NULL,*featp=NULL;
-
- if(nd->inDegree==0) return NULL;
- for(i=0;i<nd->inDegree;i++){
- ar=nd->inArc[i];
- if(ar->head!=nd) continue;
- tail=ar->tail;
- if(tail->address!=nd->address){
- len=0;
- tag=ar->id;
- MPI_Recv(&len,1,MPI_INT,tail->address,tag,MPI_COMM_WORLD,&status);
- feat=newArr(len);
- MPI_Recv(feat->val,feat->len,MPI_DOUBLE,tail->address,tag,MPI_COMM_WORLD,&status);
- resfeat=WindowFilter(resfeat,feat,nd->id);
- SMPI_SHARED_FREE(feat);
- }else{
- featp=(Arr *)tail->feat;
- feat=newArr(featp->len);
- memcpy(feat->val,featp->val,featp->len*sizeof(double));
- resfeat=WindowFilter(resfeat,feat,nd->id);
- SMPI_SHARED_FREE(feat);
- }
- }
- for(i=0;i<resfeat->len;i++) resfeat->val[i]=((int)resfeat->val[i])/nd->inDegree;
- nd->feat=resfeat;
- return nd->feat;
-}
-double Reduce(Arr *a,int w){
- double retv=0.0;
- if(timer_on){
- timer_clear(w);
- timer_start(w);
- }
-  retv=(int)(w*CheckVal(a));/* The cast is needed for node-
-                               and array-dependent verification */
- if(timer_on){
- timer_stop(w);
- fprintf(stderr,"** Reduce time in node %d = %f\n",(w-1),timer_read(w));
- }
- return retv;
-}
-
-double ReduceStreams(DGraph *dg,DGNode *nd){
- double csum=0.0;
- int i=0,len=0,tag=0;
- DGArc *ar=NULL;
- DGNode *tail=NULL;
- Arr *feat=NULL;
- double retv=0.0;
-
- TRACE_smpi_set_category ("ReduceStreams");
-
- for(i=0;i<nd->inDegree;i++){
- ar=nd->inArc[i];
- if(ar->head!=nd) continue;
- tail=ar->tail;
- if(tail->address!=nd->address){
- MPI_Status status;
- len=0;
- tag=ar->id;
- MPI_Recv(&len,1,MPI_INT,tail->address,tag,MPI_COMM_WORLD,&status);
- feat=newArr(len);
- MPI_Recv(feat->val,feat->len,MPI_DOUBLE,tail->address,tag,MPI_COMM_WORLD,&status);
- csum+=Reduce(feat,(nd->id+1));
- SMPI_SHARED_FREE(feat);
- }else{
- csum+=Reduce(tail->feat,(nd->id+1));
- }
- }
- if(nd->inDegree>0)csum=(((long long int)csum)/nd->inDegree);
- retv=(nd->id+1)*csum;
- return retv;
-}
-
-int ProcessNodes(DGraph *dg,int me){
- double chksum=0.0;
- Arr *feat=NULL;
- int i=0,verified=0,tag;
- DGNode *nd=NULL;
- double rchksum=0.0;
- MPI_Status status;
-
- TRACE_smpi_set_category ("ProcessNodes");
-
-
- for(i=0;i<dg->numNodes;i++){
- nd=dg->node[i];
- if(nd->address!=me) continue;
- if(strstr(nd->name,"Source")){
- nd->feat=RandomFeatures(dg->name,fielddim,nd->id);
- SendResults(dg,nd,nd->feat);
- }else if(strstr(nd->name,"Sink")){
- chksum=ReduceStreams(dg,nd);
-      tag=dg->numArcs+nd->id; /* offset sink tags to avoid clashing with arc tags */
- MPI_Send(&chksum,1,MPI_DOUBLE,0,tag,MPI_COMM_WORLD);
- }else{
- feat=CombineStreams(dg,nd);
- SendResults(dg,nd,feat);
- }
- }
-
- TRACE_smpi_set_category ("ProcessNodes");
-
-
- if(me==0){ /* Report node */
- rchksum=0.0;
- chksum=0.0;
- for(i=0;i<dg->numNodes;i++){
- nd=dg->node[i];
- if(!strstr(nd->name,"Sink")) continue;
-        tag=dg->numArcs+nd->id; /* offset sink tags to avoid clashing with arc tags */
- MPI_Recv(&rchksum,1,MPI_DOUBLE,nd->address,tag,MPI_COMM_WORLD,&status);
- chksum+=rchksum;
- }
- verified=verify(dg->name,chksum);
- }
-return verified;
-}
-
-int main(int argc,char **argv ){
- int my_rank,comm_size;
- int i;
- DGraph *dg=NULL;
- int verified=0, featnum=0;
- double bytes_sent=2.0,tot_time=0.0;
-
-
-
- MPI_Init( &argc, &argv );
- MPI_Comm_rank( MPI_COMM_WORLD, &my_rank );
- MPI_Comm_size( MPI_COMM_WORLD, &comm_size );
- TRACE_smpi_set_category ("begin");
-
- if(argc!=2||
- ( strncmp(argv[1],"BH",2)!=0
- &&strncmp(argv[1],"WH",2)!=0
- &&strncmp(argv[1],"SH",2)!=0
- )
- ){
- if(my_rank==0){
- fprintf(stderr,"** Usage: mpirun -np N ../bin/dt.S GraphName\n");
- fprintf(stderr,"** Where \n - N is integer number of MPI processes\n");
- fprintf(stderr," - S is the class S, W, or A \n");
- fprintf(stderr," - GraphName is the communication graph name BH, WH, or SH.\n");
-      fprintf(stderr,"   - the number of MPI processes N should not be less than \n");
- fprintf(stderr," the number of nodes in the graph\n");
- }
- MPI_Finalize();
- exit(0);
- }
- if(strncmp(argv[1],"BH",2)==0){
- dg=buildBH(CLASS);
- }else if(strncmp(argv[1],"WH",2)==0){
- dg=buildWH(CLASS);
- }else if(strncmp(argv[1],"SH",2)==0){
- dg=buildSH(CLASS);
- }
-
- if(timer_on&&dg->numNodes+1>timers_tot){
- timer_on=0;
- if(my_rank==0)
-      fprintf(stderr,"Not enough timers. Node timing is off.\n");
- }
- if(dg->numNodes>comm_size){
- if(my_rank==0){
- fprintf(stderr,"** The number of MPI processes should not be less than \n");
- fprintf(stderr,"** the number of nodes in the graph\n");
- fprintf(stderr,"** Number of MPI processes = %d\n",comm_size);
-      fprintf(stderr,"** Number of nodes in the graph = %d\n",dg->numNodes);
- }
- MPI_Finalize();
- exit(0);
- }
- for(i=0;i<dg->numNodes;i++){
- dg->node[i]->address=i;
- }
- if( my_rank == 0 ){
- printf( "\n\n NAS Parallel Benchmarks 3.3 -- DT Benchmark\n\n" );
- graphShow(dg,0);
- timer_clear(0);
- timer_start(0);
- }
-
- verified=ProcessNodes(dg,my_rank);
- TRACE_smpi_set_category ("end");
-
- featnum=NUM_SAMPLES*fielddim;
- bytes_sent=featnum*dg->numArcs;
- bytes_sent/=1048576;
- if(my_rank==0){
- timer_stop(0);
- tot_time=timer_read(0);
- c_print_results( dg->name,
- CLASS,
- featnum,
- 0,
- 0,
- dg->numNodes,
- 0,
- comm_size,
- tot_time,
- bytes_sent/tot_time,
- "bytes transmitted",
- verified,
- NPBVERSION,
- COMPILETIME,
- MPICC,
- CLINK,
- CMPI_LIB,
- CMPI_INC,
- CFLAGS,
- CLINKFLAGS );
- }
- MPI_Finalize();
- return 1;
-}
SHELL=/bin/sh
BENCHMARK=dt
-BENCHMARKU=DT
include ../config/make.def
#Override PROGRAM
DTPROGRAM = $(BINDIR)/$(BENCHMARK).$(CLASS)
-OBJS = dt.o DGraph.o \
+OBJS = dt.o DGraph.o \
+ ${COMMON}/c_print_results.o ${COMMON}/c_timers.o ${COMMON}/c_randdp.o
+
+OBJS-F = dt-folding.o DGraph.o \
${COMMON}/c_print_results.o ${COMMON}/c_timers.o ${COMMON}/c_randdp.o
-${PROGRAM}: config ${OBJS}
+${PROGRAM}: config ${OBJS} ${OBJS-F}
${CLINK} ${CLINKFLAGS} -o ${DTPROGRAM} ${OBJS} ${CMPI_LIB}
+ ${CLINK} ${CLINKFLAGS} -o ${DTPROGRAM}-folding ${OBJS-F} ${CMPI_LIB}
.c.o:
${CCOMPILE} $<
dt.o: dt.c npbparams.h
+dt-folding.o: dt-folding.c npbparams.h
DGraph.o: DGraph.c DGraph.h
clean:
- rm -f *.o *~ mputil*
- - rm -f dt npbparams.h core
+ - rm -f dt dt-folding npbparams.h
#include "mpi.h"
#include "npbparams.h"
+#include "simgrid/instr.h" //TRACE_
+
#ifndef CLASS
#define CLASS 'S'
#define NUM_PROCS 1
DGArc *ar=NULL;
DGNode *head=NULL;
if(!feat) return 0;
+ TRACE_smpi_set_category ("SendResults");
for(i=0;i<nd->outDegree;i++){
ar=nd->outArc[i];
if(ar->tail!=nd) continue;
MPI_Send(feat->val,feat->len,MPI_DOUBLE,head->address,tag,MPI_COMM_WORLD);
}
}
+ TRACE_smpi_set_category (NULL);
return 1;
}
Arr* CombineStreams(DGraph *dg,DGNode *nd){
Arr *feat=NULL;
double retv=0.0;
+ TRACE_smpi_set_category ("ReduceStreams");
+
for(i=0;i<nd->inDegree;i++){
ar=nd->inArc[i];
if(ar->head!=nd) continue;
double rchksum=0.0;
MPI_Status status;
+ TRACE_smpi_set_category ("ProcessNodes");
+
for(i=0;i<dg->numNodes;i++){
nd=dg->node[i];
if(nd->address!=me) continue;
SendResults(dg,nd,feat);
}
}
+
+ TRACE_smpi_set_category ("ProcessNodes");
+
if(me==0){ /* Report node */
rchksum=0.0;
chksum=0.0;
MPI_Init( &argc, &argv );
MPI_Comm_rank( MPI_COMM_WORLD, &my_rank );
MPI_Comm_size( MPI_COMM_WORLD, &comm_size );
+ TRACE_smpi_set_category ("begin");
if(argc!=2||
( strncmp(argv[1],"BH",2)!=0
timer_start(0);
}
verified=ProcessNodes(dg,my_rank);
-
+ TRACE_smpi_set_category ("end");
+
featnum=NUM_SAMPLES*fielddim;
bytes_sent=featnum*dg->numArcs;
bytes_sent/=1048576;
+++ /dev/null
-SHELL=/bin/sh
-BENCHMARK=ep
-BENCHMARKU=EP
-
-include ../config/make.def
-
-#OBJS = ep.o ${COMMON}/print_results.o ${COMMON}/${RAND}.o ${COMMON}/timers.o
-OBJS = ep.o randlc.o
-
-include ../sys/make.common
-
-${PROGRAM}: config ${OBJS}
-# ${FLINK} ${FLINKFLAGS} -o ${PROGRAM} ${OBJS} ${FMPI_LIB}
- ${CLINK} ${CLINKFLAGS} -o ${PROGRAM} ${OBJS} ${CMPI_LIB}
-
-
-#ep.o: ep.f mpinpb.h npbparams.h
-# ${FCOMPILE} ep.f
-
-ep.o: ep.c randlc.c ../EP/mpinpb.h npbparams.h
- ${CCOMPILE} ep.c
-
-clean:
- - rm -f *.o *~
- - rm -f npbparams.h core
-
-
-
+++ /dev/null
-This code implements the random-number generator described in the
-NAS Parallel Benchmark document RNR Technical Report RNR-94-007.
-The code is "embarrassingly" parallel in that no communication is
-required for the generation of the random numbers itself. There is
-no special requirement on the number of processors used for running
-the benchmark.
+++ /dev/null
-
-/*
- * FUNCTION RANDLC (X, A)
- *
- * This routine returns a uniform pseudorandom double precision number in the
- * range (0, 1) by using the linear congruential generator
- *
- * x_{k+1} = a x_k (mod 2^46)
- *
- * where 0 < x_k < 2^46 and 0 < a < 2^46. This scheme generates 2^44 numbers
- * before repeating. The argument A is the same as 'a' in the above formula,
- * and X is the same as x_0. A and X must be odd double precision integers
- * in the range (1, 2^46). The returned value RANDLC is normalized to be
- * between 0 and 1, i.e. RANDLC = 2^(-46) * x_1. X is updated to contain
- * the new seed x_1, so that subsequent calls to RANDLC using the same
- * arguments will generate a continuous sequence.
- *
- * This routine should produce the same results on any computer with at least
- * 48 mantissa bits in double precision floating point data. On Cray systems,
- * double precision should be disabled.
- *
- * David H. Bailey October 26, 1990
- *
- * IMPLICIT DOUBLE PRECISION (A-H, O-Z)
- * SAVE KS, R23, R46, T23, T46
- * DATA KS/0/
- *
- * If this is the first call to RANDLC, compute R23 = 2 ^ -23, R46 = 2 ^ -46,
- * T23 = 2 ^ 23, and T46 = 2 ^ 46. These are computed in loops, rather than
- * by merely using the ** operator, in order to insure that the results are
- * exact on all systems. This code assumes that 0.5D0 is represented exactly.
- */
-
-
-/*****************************************************************/
-/************* R A N D L C ************/
-/************* ************/
-/************* portable random number generator ************/
-/*****************************************************************/
-
-double randlc( double *X, double *A )
-{
- static int KS=0;
- static double R23, R46, T23, T46;
- double T1, T2, T3, T4;
- double A1;
- double A2;
- double X1;
- double X2;
- double Z;
- int i, j;
-
- if (KS == 0)
- {
- R23 = 1.0;
- R46 = 1.0;
- T23 = 1.0;
- T46 = 1.0;
-
- for (i=1; i<=23; i++)
- {
- R23 = 0.50 * R23;
- T23 = 2.0 * T23;
- }
- for (i=1; i<=46; i++)
- {
- R46 = 0.50 * R46;
- T46 = 2.0 * T46;
- }
- KS = 1;
- }
-
-/* Break A into two parts such that A = 2^23 * A1 + A2 and set X = N. */
-
- T1 = R23 * *A;
- j = T1;
- A1 = j;
- A2 = *A - T23 * A1;
-
-/* Break X into two parts such that X = 2^23 * X1 + X2, compute
- Z = A1 * X2 + A2 * X1 (mod 2^23), and then
- X = 2^23 * Z + A2 * X2 (mod 2^46). */
-
- T1 = R23 * *X;
- j = T1;
- X1 = j;
- X2 = *X - T23 * X1;
- T1 = A1 * X2 + A2 * X1;
-
- j = R23 * T1;
- T2 = j;
- Z = T1 - T23 * T2;
- T3 = T23 * Z + A2 * X2;
- j = R46 * T3;
- T4 = j;
- *X = T3 - T46 * T4;
- return(R46 * *X);
-}
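randlc implements x_{k+1} = a·x_k (mod 2^46) with split 23-bit halves so that every intermediate product fits exactly in a double. With arbitrary-precision integers the same generator reduces to one line; a hedged sketch (constants are the usual NPB defaults, the Python names are ours):

```python
MOD = 1 << 46                       # modulus 2^46

def randlc(x, a=1220703125):
    """One LCG step: returns (new seed, uniform deviate in (0, 1))."""
    x = (a * x) % MOD
    return x, x / MOD
```

A typical call sequence threads the seed through, e.g. `seed = 271828183; seed, r = randlc(seed)`, matching how the C routine updates *X in place.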
-
-
-
-/*****************************************************************/
-/************ F I N D _ M Y _ S E E D ************/
-/************ ************/
-/************ returns parallel random number seq seed ************/
-/*****************************************************************/
-
+++ /dev/null
-
-double randlc( double *X, double *A );
-
+++ /dev/null
-SHELL=/bin/sh
-BENCHMARK=ep
-BENCHMARKU=EP
-
-include ../config/make.def
-
-#OBJS = ep-trace.o ${COMMON}/print_results.o ${COMMON}/${RAND}.o ${COMMON}/timers.o
-OBJS = ep-trace.o randlc.o
-
-include ../sys/make.common
-
-${PROGRAM}: config ${OBJS}
-# ${FLINK} ${FLINKFLAGS} -o ${PROGRAM} ${OBJS} ${FMPI_LIB}
- ${CLINK} ${CLINKFLAGS} -o ${PROGRAM} ${OBJS} ${CMPI_LIB}
-
-
-#ep-trace.o: ep-trace.f mpinpb.h npbparams.h
-# ${FCOMPILE} ep-trace.f
-
-ep-trace.o: ep-trace.c randlc.c mpinpb.h npbparams.h
- ${CCOMPILE} ep-trace.c
-
-clean:
- - rm -f *.o *~
- - rm -f npbparams.h core
-
-
-
+++ /dev/null
-This code implements the random-number generator described in the
-NAS Parallel Benchmark document RNR Technical Report RNR-94-007.
-The code is "embarrassingly" parallel in that no communication is
-required for the generation of the random numbers itself. There is
-no special requirement on the number of processors used for running
-the benchmark.
+++ /dev/null
-#include <stdlib.h>
-#include <stdio.h>
-#include <string.h>
-#include <math.h>
-
-#include "mpi.h"
-#include "npbparams.h"
-
-#include "simgrid/instr.h" //TRACE_
-
-#include "randlc.h"
-
-#ifndef CLASS
-#define CLASS 'S'
-#define NUM_PROCS 1
-#endif
-#define true 1
-#define false 0
-
-
-//---NOTE: all the timer functions have been modified to
-//         avoid global timers (these are privatized).
- // ----------------------- timers ---------------------
- void timer_clear(double *onetimer) {
- //elapsed[n] = 0.0;
- *onetimer = 0.0;
- }
-
- void timer_start(double *onetimer) {
- *onetimer = MPI_Wtime();
- }
-
- void timer_stop(int n,double *elapsed,double *start) {
- double t, now;
-
- now = MPI_Wtime();
- t = now - start[n];
- elapsed[n] += t;
- }
-
-    double timer_read(int n, double *elapsed) { /* trivial, but kept to preserve the timer_read call interface */
- return(elapsed[n]);
- }
- /********************************************************************
- ***************** V R A N L C ******************
- ***************** *****************/
- double vranlc(int n, double x, double a, double *y)
- {
- int i;
- long i246m1=0x00003FFFFFFFFFFF;
- long LLx, Lx, La;
- double d2m46;
-
-// This doesn't work, because the compiler does the calculation in 32
-// bits and overflows. No standard way (without f90 stuff) to specify
-// that the rhs should be done in 64 bit arithmetic.
-// parameter(i246m1=2**46-1)
-
- d2m46=pow(0.5,46);
-
-// c Note that the v6 compiler on an R8000 does something stupid with
-// c the above. Using the following instead (or various other things)
-// c makes the calculation run almost 10 times as fast.
-//
-// c save d2m46
-// c data d2m46/0.0d0/
-// c if (d2m46 .eq. 0.0d0) then
-// c d2m46 = 0.5d0**46
-// c endif
-
- Lx = (long)x;
- La = (long)a;
- //fprintf(stdout,("================== Vranlc ================");
- //fprintf(stdout,("Before Loop: Lx = " + Lx + ", La = " + La);
- LLx = Lx;
- for (i=0; i< n; i++) {
- Lx = Lx*La & i246m1 ;
- LLx = Lx;
- y[i] = d2m46 * (double)LLx;
- /*
- if(i == 0) {
- fprintf(stdout,("After loop 0:");
- fprintf(stdout,("Lx = " + Lx + ", La = " + La);
- fprintf(stdout,("d2m46 = " + d2m46);
- fprintf(stdout,("LLX(Lx) = " + LLX.doubleValue());
- fprintf(stdout,("Y[0]" + y[0]);
- }
- */
- }
-
- x = (double)LLx;
- /*
- fprintf(stdout,("Change: Lx = " + Lx);
- fprintf(stdout,("=============End Vranlc ================");
- */
- return x;
- }
-
-
-
-//-------------- the core (unique function) -----------
- void doTest(int argc, char **argv) {
- double dum[3] = {1.,1.,1.};
- double x1, x2, sx, sy, tm, an, tt, gc;
- double Mops;
- double epsilon=1.0E-8, a = 1220703125., s=271828183.;
- double t1, t2, t3, t4;
- double sx_verify_value, sy_verify_value, sx_err, sy_err;
-
-#include "npbparams.h"
- int mk=16,
- // --> set by make : in npbparams.h
- //m=28, // for CLASS=A
- //m=30, // for CLASS=B
- //npm=2, // NPROCS
- mm = m-mk,
- nn = (int)(pow(2,mm)),
- nk = (int)(pow(2,mk)),
- nq=10,
- np,
- node,
- no_nodes,
- i,
- ik,
- kk,
- l,
- k, nit, no_large_nodes,
- np_add, k_offset, j;
- int me, nprocs, root=0, dp_type;
- int verified,
- timers_enabled=true;
- char size[500]; // mind the size of the string to represent a big number
-
- //Use in randlc..
- int KS = 0;
- double R23, R46, T23, T46;
-
- double *qq = (double *) malloc (10000*sizeof(double));
- double *start = (double *) malloc (64*sizeof(double));
- double *elapsed = (double *) malloc (64*sizeof(double));
-
- double *x = (double *) malloc (2*nk*sizeof(double));
- double *q = (double *) malloc (nq*sizeof(double));
-
- TRACE_smpi_set_category ("start");
-
- MPI_Init( &argc, &argv );
- MPI_Comm_size( MPI_COMM_WORLD, &no_nodes);
- MPI_Comm_rank( MPI_COMM_WORLD, &node);
-
-#ifdef USE_MPE
- MPE_Init_log();
-#endif
- root = 0;
- if (node == root ) {
-
- /* Because the size of the problem is too large to store in a 32-bit
- * integer for some classes, we put it into a string (for printing).
- * Have to strip off the decimal point put in there by the floating
- * point print statement (internal file)
- */
-    fprintf(stdout," NAS Parallel Benchmarks 3.2 -- EP Benchmark\n");
-    sprintf(size,"%.0f",pow(2,m+1));
- //size = size.replace('.', ' ');
- fprintf(stdout," Number of random numbers generated: %s\n",size);
- fprintf(stdout," Number of active processes: %d\n",no_nodes);
-
- }
- verified = false;
-
- /* c Compute the number of "batches" of random number pairs generated
- c per processor. Adjust if the number of processors does not evenly
- c divide the total number
-*/
-
- np = nn / no_nodes;
- no_large_nodes = nn % no_nodes;
- if (node < no_large_nodes) np_add = 1;
- else np_add = 0;
- np = np + np_add;
-
- if (np == 0) {
-    fprintf(stdout,"Too many nodes: %d %d\n",no_nodes,nn);
- MPI_Abort(MPI_COMM_WORLD,1);
- exit(0);
- }
-
-/* c Call the random number generator functions and initialize
- c the x-array to reduce the effects of paging on the timings.
- c Also, call all mathematical functions that are used. Make
- c sure these initializations cannot be eliminated as dead code.
-*/
-
- //call vranlc(0, dum[1], dum[2], dum[3]);
- // Array indexes start at 1 in Fortran, 0 in Java
- vranlc(0, dum[0], dum[1], &(dum[2]));
-
- dum[0] = randlc(&(dum[1]),&(dum[2]));
- /////////////////////////////////
- for (i=0;i<2*nk;i++) {
- x[i] = -1e99;
- }
- Mops = log(sqrt(abs(1)));
-
- /*
- c---------------------------------------------------------------------
- c Synchronize before placing time stamp
- c---------------------------------------------------------------------
- */
- MPI_Barrier( MPI_COMM_WORLD );
-
-
- TRACE_smpi_set_category ("ep");
-
- timer_clear(&(elapsed[1]));
- timer_clear(&(elapsed[2]));
- timer_clear(&(elapsed[3]));
- timer_start(&(start[1]));
-
- t1 = a;
- //fprintf(stdout,("(ep.f:160) t1 = " + t1);
- t1 = vranlc(0, t1, a, x);
- //fprintf(stdout,("(ep.f:161) t1 = " + t1);
-
-
-/* c Compute AN = A ^ (2 * NK) (mod 2^46). */
-
- t1 = a;
- //fprintf(stdout,("(ep.f:165) t1 = " + t1);
- for (i=1; i <= mk+1; i++) {
- t2 = randlc(&t1, &t1);
- //fprintf(stdout,("(ep.f:168)[loop i=" + i +"] t1 = " + t1);
- }
- an = t1;
- //fprintf(stdout,("(ep.f:172) s = " + s);
- tt = s;
- gc = 0.;
- sx = 0.;
- sy = 0.;
- for (i=0; i < nq ; i++) {
- q[i] = 0.;
- }
-
-/*
- Each instance of this loop may be performed independently. We compute
- the k offsets separately to take into account the fact that some nodes
- have more numbers to generate than others
-*/
-
- if (np_add == 1)
- k_offset = node * np -1;
- else
- k_offset = no_large_nodes*(np+1) + (node-no_large_nodes)*np -1;
-
- int stop = false;
- for(k = 1; k <= np; k++) {
- stop = false;
- kk = k_offset + k ;
- t1 = s;
- //fprintf(stdout,("(ep.f:193) t1 = " + t1);
- t2 = an;
-
-// Find starting seed t1 for this kk.
-
- for (i=1;i<=100 && !stop;i++) {
- ik = kk / 2;
- //fprintf(stdout,("(ep.f:199) ik = " +ik+", kk = " + kk);
- if (2 * ik != kk) {
- t3 = randlc(&t1, &t2);
- //fprintf(stdout,("(ep.f:200) t1= " +t1 );
- }
- if (ik==0)
- stop = true;
- else {
- t3 = randlc(&t2, &t2);
- kk = ik;
- }
- }
-// Compute uniform pseudorandom numbers.
-
- //if (timers_enabled) timer_start(3);
- timer_start(&(start[3]));
- //call vranlc(2 * nk, t1, a, x) --> t1 and y are modified
-
- //fprintf(stdout,">>>>>>>>>>>Before vranlc(l.210)<<<<<<<<<<<<<");
- //fprintf(stdout,"2*nk = " + (2*nk));
- //fprintf(stdout,"t1 = " + t1);
- //fprintf(stdout,"a = " + a);
- //fprintf(stdout,"x[0] = " + x[0]);
- //fprintf(stdout,">>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<");
-
- t1 = vranlc(2 * nk, t1, a, x);
-
- //fprintf(stdout,(">>>>>>>>>>>After Enter vranlc (l.210)<<<<<<");
- //fprintf(stdout,("2*nk = " + (2*nk));
- //fprintf(stdout,("t1 = " + t1);
- //fprintf(stdout,("a = " + a);
- //fprintf(stdout,("x[0] = " + x[0]);
- //fprintf(stdout,(">>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<");
-
- //if (timers_enabled) timer_stop(3);
- timer_stop(3,elapsed,start);
-
-/* Compute Gaussian deviates by acceptance-rejection method and
- * tally counts in concentric square annuli. This loop is not
- * vectorizable.
- */
- //if (timers_enabled) timer_start(2);
- timer_start(&(start[2]));
- for(i=1; i<=nk;i++) {
- x1 = 2. * x[2*i-2] -1.0;
- x2 = 2. * x[2*i-1] - 1.0;
- t1 = x1*x1 + x2*x2;
- if (t1 <= 1.) {
- t2 = sqrt(-2. * log(t1) / t1);
- t3 = (x1 * t2);
- t4 = (x2 * t2);
-          l = (int)(fabs(t3) > fabs(t4) ? fabs(t3) : fabs(t4));
- q[l] = q[l] + 1.;
- sx = sx + t3;
- sy = sy + t4;
- }
- /*
- if(i == 1) {
- fprintf(stdout,"x1 = " + x1);
- fprintf(stdout,"x2 = " + x2);
- fprintf(stdout,"t1 = " + t1);
- fprintf(stdout,"t2 = " + t2);
- fprintf(stdout,"t3 = " + t3);
- fprintf(stdout,"t4 = " + t4);
- fprintf(stdout,"l = " + l);
- fprintf(stdout,"q[l] = " + q[l]);
- fprintf(stdout,"sx = " + sx);
- fprintf(stdout,"sy = " + sy);
- }
- */
- }
- //if (timers_enabled) timer_stop(2);
- timer_stop(2,elapsed,start);
- }
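The acceptance-rejection loop above is the Marsaglia polar method: draw points uniformly in (-1,1)^2, keep those that fall inside the unit disk, and rescale each accepted pair into two independent Gaussian deviates. A sketch under that reading (the stdlib generator is substituted for vranlc, and a guard against t == 0 is our addition):

```python
import math
import random

def gaussian_pairs(n, rng=random.random):
    """Marsaglia polar method, mirroring the EP inner loop above."""
    out = []
    for _ in range(n):
        x1 = 2.0 * rng() - 1.0          # map uniforms into (-1, 1)
        x2 = 2.0 * rng() - 1.0
        t = x1 * x1 + x2 * x2
        if 0.0 < t <= 1.0:              # accept only points inside the unit disk
            f = math.sqrt(-2.0 * math.log(t) / t)
            out.append((x1 * f, x2 * f))  # two independent Gaussian deviates
    return out
```

Roughly pi/4 of the candidate pairs are accepted, which is why EP tallies counts rather than assuming one deviate per draw.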
-
- TRACE_smpi_set_category ("finalize");
-
- //int MPI_Allreduce(void *sbuf, void *rbuf, int count, MPI_Datatype dtype, MPI_Op op, MPI_Comm comm)
- MPI_Allreduce(&sx, x, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
- sx = x[0]; //FIXME : x[0] or x[1] => x[0] because fortran starts with 1
- MPI_Allreduce(&sy, x, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
- sy = x[0];
- MPI_Allreduce(q, x, nq, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
-
- for(i = 0; i < nq; i++) {
- q[i] = x[i];
- }
- for(i = 0; i < nq; i++) {
- gc += q[i];
- }
-
- timer_stop(1,elapsed,start);
- tm = timer_read(1,elapsed);
- MPI_Allreduce(&tm, x, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
- tm = x[0];
-
- if(node == root) {
- nit = 0;
- verified = true;
-
- if(m == 24) {
- sx_verify_value = -3.247834652034740E3;
- sy_verify_value = -6.958407078382297E3;
- } else if(m == 25) {
- sx_verify_value = -2.863319731645753E3;
- sy_verify_value = -6.320053679109499E3;
- } else if(m == 28) {
- sx_verify_value = -4.295875165629892E3;
- sy_verify_value = -1.580732573678431E4;
- } else if(m == 30) {
- sx_verify_value = 4.033815542441498E4;
- sy_verify_value = -2.660669192809235E4;
- } else if(m == 32) {
- sx_verify_value = 4.764367927995374E4;
- sy_verify_value = -8.084072988043731E4;
- } else if(m == 36) {
- sx_verify_value = 1.982481200946593E5;
- sy_verify_value = -1.020596636361769E5;
- } else {
- verified = false;
- }
-
- /*
- fprintf(stdout,("sx = " + sx);
- fprintf(stdout,("sx_verify = " + sx_verify_value);
- fprintf(stdout,("sy = " + sy);
- fprintf(stdout,("sy_verify = " + sy_verify_value);
- */
- if(verified) {
- sx_err = abs((sx - sx_verify_value)/sx_verify_value);
- sy_err = abs((sy - sy_verify_value)/sy_verify_value);
- /*
- fprintf(stdout,("sx_err = " + sx_err);
- fprintf(stdout,("sy_err = " + sx_err);
- fprintf(stdout,("epsilon= " + epsilon);
- */
- verified = ((sx_err < epsilon) && (sy_err < epsilon));
- }
-
- Mops = (pow(2.0, m+1))/tm/1000;
-
- fprintf(stdout,"EP Benchmark Results:\n");
- fprintf(stdout,"CPU Time=%d\n",tm);
- fprintf(stdout,"N = 2^%d\n",m);
- fprintf(stdout,"No. Gaussain Pairs =%d\n",gc);
- fprintf(stdout,"Sum = %f %ld\n",sx,sy);
- fprintf(stdout,"Count:");
- for(i = 0; i < nq; i++) {
- fprintf(stdout,"%d\t %ld\n",i,q[i]);
- }
-
- /*
- print_results("EP", _class, m+1, 0, 0, nit, npm, no_nodes, tm, Mops,
- "Random numbers generated", verified, npbversion,
- compiletime, cs1, cs2, cs3, cs4, cs5, cs6, cs7) */
- fprintf(stdout,"\nEP Benchmark Completed\n");
- fprintf(stdout,"Class = %s\n", _class);
- fprintf(stdout,"Size = %s\n", size);
- fprintf(stdout,"Iteration = %d\n", nit);
- fprintf(stdout,"Time in seconds = %f\n",(tm/1000));
- fprintf(stdout,"Total processes = %d\n",no_nodes);
- fprintf(stdout,"Mops/s total = %f\n",Mops);
- fprintf(stdout,"Mops/s/process = %f\n", Mops/no_nodes);
- fprintf(stdout,"Operation type = Random number generated\n");
- if(verified) {
- fprintf(stdout,"Verification = SUCCESSFUL\n");
- } else {
- fprintf(stdout,"Verification = UNSUCCESSFUL\n");
- }
- fprintf(stdout,"Total time: %f\n",(timer_read(1,elapsed)/1000));
- fprintf(stdout,"Gaussian pairs: %f\n",(timer_read(2,elapsed)/1000));
- fprintf(stdout,"Random numbers: %f\n",(timer_read(3,elapsed)/1000));
- }
-#ifdef USE_MPE
- MPE_Finish_log(argv[0]);
-#endif
-
- MPI_Finalize();
- }
-
- int main(int argc, char **argv) {
- doTest(argc,argv);
- }
+++ /dev/null
-
-c---------------------------------------------------------------------
-c---------------------------------------------------------------------
-
- include 'mpif.h'
-
- integer me, nprocs, root, dp_type
- common /mpistuff/ me, nprocs, root, dp_type
-
+++ /dev/null
-
-/*
- * FUNCTION RANDLC (X, A)
- *
- * This routine returns a uniform pseudorandom double precision number in the
- * range (0, 1) by using the linear congruential generator
- *
- * x_{k+1} = a x_k (mod 2^46)
- *
- * where 0 < x_k < 2^46 and 0 < a < 2^46. This scheme generates 2^44 numbers
- * before repeating. The argument A is the same as 'a' in the above formula,
- * and X is the same as x_0. A and X must be odd double precision integers
- * in the range (1, 2^46). The returned value RANDLC is normalized to be
- * between 0 and 1, i.e. RANDLC = 2^(-46) * x_1. X is updated to contain
- * the new seed x_1, so that subsequent calls to RANDLC using the same
- * arguments will generate a continuous sequence.
- *
- * This routine should produce the same results on any computer with at least
- * 48 mantissa bits in double precision floating point data. On Cray systems,
- * double precision should be disabled.
- *
- * David H. Bailey October 26, 1990
- *
- * IMPLICIT DOUBLE PRECISION (A-H, O-Z)
- * SAVE KS, R23, R46, T23, T46
- * DATA KS/0/
- *
- * If this is the first call to RANDLC, compute R23 = 2 ^ -23, R46 = 2 ^ -46,
- * T23 = 2 ^ 23, and T46 = 2 ^ 46. These are computed in loops, rather than
- * by merely using the ** operator, in order to insure that the results are
- * exact on all systems. This code assumes that 0.5D0 is represented exactly.
- */
-
-
-/*****************************************************************/
-/************* R A N D L C ************/
-/************* ************/
-/************* portable random number generator ************/
-/*****************************************************************/
-
-double randlc( double *X, double *A )
-{
- static int KS=0;
- static double R23, R46, T23, T46;
- double T1, T2, T3, T4;
- double A1;
- double A2;
- double X1;
- double X2;
- double Z;
- int i, j;
-
- if (KS == 0)
- {
- R23 = 1.0;
- R46 = 1.0;
- T23 = 1.0;
- T46 = 1.0;
-
- for (i=1; i<=23; i++)
- {
- R23 = 0.50 * R23;
- T23 = 2.0 * T23;
- }
- for (i=1; i<=46; i++)
- {
- R46 = 0.50 * R46;
- T46 = 2.0 * T46;
- }
- KS = 1;
- }
-
-/* Break A into two parts such that A = 2^23 * A1 + A2 and set X = N. */
-
- T1 = R23 * *A;
- j = T1;
- A1 = j;
- A2 = *A - T23 * A1;
-
-/* Break X into two parts such that X = 2^23 * X1 + X2, compute
- Z = A1 * X2 + A2 * X1 (mod 2^23), and then
- X = 2^23 * Z + A2 * X2 (mod 2^46). */
-
- T1 = R23 * *X;
- j = T1;
- X1 = j;
- X2 = *X - T23 * X1;
- T1 = A1 * X2 + A2 * X1;
-
- j = R23 * T1;
- T2 = j;
- Z = T1 - T23 * T2;
- T3 = T23 * Z + A2 * X2;
- j = R46 * T3;
- T4 = j;
- *X = T3 - T46 * T4;
- return(R46 * *X);
-}
-
-
-
-/*****************************************************************/
-/************ F I N D _ M Y _ S E E D ************/
-/************ ************/
-/************ returns parallel random number seq seed ************/
-/*****************************************************************/
-
+++ /dev/null
-
-double randlc( double *X, double *A );
-
include ../config/make.def
-#OBJS = ep.o ${COMMON}/print_results.o ${COMMON}/${RAND}.o ${COMMON}/timers.o
OBJS = ep.o randlc.o
+OBJS-S = ep-sampling.o randlc.o
include ../sys/make.common
-${PROGRAM}: config ${OBJS}
-# ${FLINK} ${FLINKFLAGS} -o ${PROGRAM} ${OBJS} ${FMPI_LIB}
- ${CLINK} ${CLINKFLAGS} -o ${PROGRAM} ${OBJS} ${CMPI_LIB}
-
-
-#ep.o: ep.f mpinpb.h npbparams.h
-# ${FCOMPILE} ep.f
+${PROGRAM}: config ${OBJS} ${OBJS-S}
+ ${CLINK} ${CLINKFLAGS} -o ${PROGRAM} ${OBJS} ${CMPI_LIB} -lm
+ ${CLINK} ${CLINKFLAGS} -o ${PROGRAM}-sampling ${OBJS-S} ${CMPI_LIB} -lm
ep.o: ep.c randlc.c mpinpb.h npbparams.h
${CCOMPILE} ep.c
+ep-sampling.o: ep-sampling.c randlc.c mpinpb.h npbparams.h
+ ${CCOMPILE} ep-sampling.c
clean:
- rm -f *.o *~
- - rm -f npbparams.h core
+ - rm -f npbparams.h
* point print statement (internal file)
*/
fprintf(stdout," NAS Parallel Benchmarks 3.2 -- EP Benchmark");
- sprintf(size,"%d",pow(2,m+1));
+ sprintf(size,"%d",(int) pow(2,m+1));
//size = size.replace('.', ' ');
fprintf(stdout," Number of random numbers generated: %s\n",size);
fprintf(stdout," Number of active processes: %d\n",no_nodes);
Mops = (pow(2.0, m+1))/tm/1000;
fprintf(stdout,"EP Benchmark Results:\n");
- fprintf(stdout,"CPU Time=%d\n",tm);
+ fprintf(stdout,"CPU Time=%d\n",(int) tm);
fprintf(stdout,"N = 2^%d\n",m);
- fprintf(stdout,"No. Gaussain Pairs =%d\n",gc);
- fprintf(stdout,"Sum = %f %ld\n",sx,sy);
+ fprintf(stdout,"No. Gaussian Pairs =%d\n",(int) gc);
+ fprintf(stdout,"Sum = %f %ld\n",sx,(long) sy);
fprintf(stdout,"Count:");
for(i = 0; i < nq; i++) {
- fprintf(stdout,"%d\t %ld\n",i,q[i]);
+ fprintf(stdout,"%d\t %ld\n",i,(long) q[i]);
}
/*
#include "mpi.h"
#include "npbparams.h"
+#include "simgrid/instr.h" //TRACE_
+
#include "randlc.h"
#ifndef CLASS
#define true 1
#define false 0
-
//---NOTE : all the timers function have been modified to
// avoid global timers (privatize these).
// ----------------------- timers ---------------------
double *x = (double *) malloc (2*nk*sizeof(double));
double *q = (double *) malloc (nq*sizeof(double));
+ TRACE_smpi_set_category ("start");
+
MPI_Init( &argc, &argv );
MPI_Comm_size( MPI_COMM_WORLD, &no_nodes);
MPI_Comm_rank( MPI_COMM_WORLD, &node);
* point print statement (internal file)
*/
fprintf(stdout," NAS Parallel Benchmarks 3.2 -- EP Benchmark");
- sprintf(size,"%d",pow(2,m+1));
+ sprintf(size,"%d",(int)pow(2,m+1));
//size = size.replace('.', ' ');
fprintf(stdout," Number of random numbers generated: %s\n",size);
fprintf(stdout," Number of active processes: %d\n",no_nodes);
*/
MPI_Barrier( MPI_COMM_WORLD );
+ TRACE_smpi_set_category ("ep");
+
timer_clear(&(elapsed[1]));
timer_clear(&(elapsed[2]));
timer_clear(&(elapsed[3]));
timer_stop(2,elapsed,start);
}
+ TRACE_smpi_set_category ("finalize");
+
//int MPI_Allreduce(void *sbuf, void *rbuf, int count, MPI_Datatype dtype, MPI_Op op, MPI_Comm comm)
MPI_Allreduce(&sx, x, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
sx = x[0]; //FIXME : x[0] or x[1] => x[0] because fortran starts with 1
Mops = (pow(2.0, m+1))/tm/1000;
fprintf(stdout,"EP Benchmark Results:\n");
- fprintf(stdout,"CPU Time=%d\n",tm);
+ fprintf(stdout,"CPU Time=%d\n",(int) tm);
fprintf(stdout,"N = 2^%d\n",m);
- fprintf(stdout,"No. Gaussain Pairs =%d\n",gc);
- fprintf(stdout,"Sum = %f %ld\n",sx,sy);
+ fprintf(stdout,"No. Gaussian Pairs =%d\n",(int) gc);
+ fprintf(stdout,"Sum = %f %ld\n",sx,(long) sy);
fprintf(stdout,"Count:");
for(i = 0; i < nq; i++) {
- fprintf(stdout,"%d\t %ld\n",i,q[i]);
+ fprintf(stdout,"%d\t %ld\n",i,(long) q[i]);
}
/*
+++ /dev/null
-!-------------------------------------------------------------------------!
-! !
-! N A S P A R A L L E L B E N C H M A R K S 3.3 !
-! !
-! E P !
-! !
-!-------------------------------------------------------------------------!
-! !
-! This benchmark is part of the NAS Parallel Benchmark 3.3 suite. !
-! It is described in NAS Technical Reports 95-020 and 02-007 !
-! !
-! Permission to use, copy, distribute and modify this software !
-! for any purpose with or without fee is hereby granted. We !
-! request, however, that all derived work reference the NAS !
-! Parallel Benchmarks 3.3. This software is provided "as is" !
-! without express or implied warranty. !
-! !
-! Information on NPB 3.3, including the technical report, the !
-! original specifications, source code, results and information !
-! on how to submit new results, is available at: !
-! !
-! http://www.nas.nasa.gov/Software/NPB/ !
-! !
-! Send comments or suggestions to npb@nas.nasa.gov !
-! !
-! NAS Parallel Benchmarks Group !
-! NASA Ames Research Center !
-! Mail Stop: T27A-1 !
-! Moffett Field, CA 94035-1000 !
-! !
-! E-mail: npb@nas.nasa.gov !
-! Fax: (650) 604-3957 !
-! !
-!-------------------------------------------------------------------------!
-
-
-c---------------------------------------------------------------------
-c
-c Authors: P. O. Frederickson
-c D. H. Bailey
-c A. C. Woo
-c R. F. Van der Wijngaart
-c---------------------------------------------------------------------
-
-c---------------------------------------------------------------------
- program EMBAR
-c---------------------------------------------------------------------
-C
-c This is the MPI version of the APP Benchmark 1,
-c the "embarassingly parallel" benchmark.
-c
-c
-c M is the Log_2 of the number of complex pairs of uniform (0, 1) random
-c numbers. MK is the Log_2 of the size of each batch of uniform random
-c numbers. MK can be set for convenience on a given system, since it does
-c not affect the results.
-
- implicit none
-
- include 'npbparams.h'
- include 'mpinpb.h'
-
- double precision Mops, epsilon, a, s, t1, t2, t3, t4, x, x1,
- > x2, q, sx, sy, tm, an, tt, gc, dum(3),
- > timer_read
- double precision sx_verify_value, sy_verify_value, sx_err, sy_err
- integer mk, mm, nn, nk, nq, np, ierr, node, no_nodes,
- > i, ik, kk, l, k, nit, ierrcode, no_large_nodes,
- > np_add, k_offset, j
- logical verified, timers_enabled
- parameter (timers_enabled = .false.)
- external randlc, timer_read
- double precision randlc, qq
- character*15 size
-
- parameter (mk = 16, mm = m - mk, nn = 2 ** mm,
- > nk = 2 ** mk, nq = 10, epsilon=1.d-8,
- > a = 1220703125.d0, s = 271828183.d0)
-
- common/storage/ x(2*nk), q(0:nq-1), qq(10000)
- data dum /1.d0, 1.d0, 1.d0/
-
- call mpi_init(ierr)
- call mpi_comm_rank(MPI_COMM_WORLD,node,ierr)
- call mpi_comm_size(MPI_COMM_WORLD,no_nodes,ierr)
-
- root = 0
-
- if (.not. convertdouble) then
- dp_type = MPI_DOUBLE_PRECISION
- else
- dp_type = MPI_REAL
- endif
-
- if (node.eq.root) then
-
-c Because the size of the problem is too large to store in a 32-bit
-c integer for some classes, we put it into a string (for printing).
-c Have to strip off the decimal point put in there by the floating
-c point print statement (internal file)
-
- write(*, 1000)
- write(size, '(f15.0)' ) 2.d0**(m+1)
- j = 15
- if (size(j:j) .eq. '.') j = j - 1
- write (*,1001) size(1:j)
- write(*, 1003) no_nodes
-
- 1000 format(/,' NAS Parallel Benchmarks 3.3 -- EP Benchmark',/)
- 1001 format(' Number of random numbers generated: ', a15)
- 1003 format(' Number of active processes: ', 2x, i13, /)
-
- endif
-
- verified = .false.
-
-c Compute the number of "batches" of random number pairs generated
-c per processor. Adjust if the number of processors does not evenly
-c divide the total number
-
- np = nn / no_nodes
- no_large_nodes = mod(nn, no_nodes)
- if (node .lt. no_large_nodes) then
- np_add = 1
- else
- np_add = 0
- endif
- np = np + np_add
-
- if (np .eq. 0) then
- write (6, 1) no_nodes, nn
- 1 format ('Too many nodes:',2i6)
- call mpi_abort(MPI_COMM_WORLD,ierrcode,ierr)
- stop
- endif
-
-c Call the random number generator functions and initialize
-c the x-array to reduce the effects of paging on the timings.
-c Also, call all mathematical functions that are used. Make
-c sure these initializations cannot be eliminated as dead code.
-
- call vranlc(0, dum(1), dum(2), dum(3))
- dum(1) = randlc(dum(2), dum(3))
- do 5 i = 1, 2*nk
- x(i) = -1.d99
- 5 continue
- Mops = log(sqrt(abs(max(1.d0,1.d0))))
-
-c---------------------------------------------------------------------
-c Synchronize before placing time stamp
-c---------------------------------------------------------------------
- call mpi_barrier(MPI_COMM_WORLD, ierr)
-
- call timer_clear(1)
- call timer_clear(2)
- call timer_clear(3)
- call timer_start(1)
-
- t1 = a
- call vranlc(0, t1, a, x)
-
-c Compute AN = A ^ (2 * NK) (mod 2^46).
-
- t1 = a
-
- do 100 i = 1, mk + 1
- t2 = randlc(t1, t1)
- 100 continue
-
- an = t1
- tt = s
- gc = 0.d0
- sx = 0.d0
- sy = 0.d0
-
- do 110 i = 0, nq - 1
- q(i) = 0.d0
- 110 continue
-
-c Each instance of this loop may be performed independently. We compute
-c the k offsets separately to take into account the fact that some nodes
-c have more numbers to generate than others
-
- if (np_add .eq. 1) then
- k_offset = node * np -1
- else
- k_offset = no_large_nodes*(np+1) + (node-no_large_nodes)*np -1
- endif
-
- do 150 k = 1, np
- kk = k_offset + k
- t1 = s
- t2 = an
-
-c Find starting seed t1 for this kk.
-
- do 120 i = 1, 100
- ik = kk / 2
- if (2 * ik .ne. kk) t3 = randlc(t1, t2)
- if (ik .eq. 0) goto 130
- t3 = randlc(t2, t2)
- kk = ik
- 120 continue
-
-c Compute uniform pseudorandom numbers.
- 130 continue
-
- if (timers_enabled) call timer_start(3)
- call vranlc(2 * nk, t1, a, x)
- if (timers_enabled) call timer_stop(3)
-
-c Compute Gaussian deviates by acceptance-rejection method and
-c tally counts in concentric square annuli. This loop is not
-c vectorizable.
-
- if (timers_enabled) call timer_start(2)
-
- do 140 i = 1, nk
- x1 = 2.d0 * x(2*i-1) - 1.d0
- x2 = 2.d0 * x(2*i) - 1.d0
- t1 = x1 ** 2 + x2 ** 2
- if (t1 .le. 1.d0) then
- t2 = sqrt(-2.d0 * log(t1) / t1)
- t3 = (x1 * t2)
- t4 = (x2 * t2)
- l = max(abs(t3), abs(t4))
- q(l) = q(l) + 1.d0
- sx = sx + t3
- sy = sy + t4
- endif
- 140 continue
-
- if (timers_enabled) call timer_stop(2)
-
- 150 continue
-
- call mpi_allreduce(sx, x, 1, dp_type,
- > MPI_SUM, MPI_COMM_WORLD, ierr)
- sx = x(1)
- call mpi_allreduce(sy, x, 1, dp_type,
- > MPI_SUM, MPI_COMM_WORLD, ierr)
- sy = x(1)
- call mpi_allreduce(q, x, nq, dp_type,
- > MPI_SUM, MPI_COMM_WORLD, ierr)
-
- do i = 1, nq
- q(i-1) = x(i)
- enddo
-
- do 160 i = 0, nq - 1
- gc = gc + q(i)
- 160 continue
-
- call timer_stop(1)
- tm = timer_read(1)
-
- call mpi_allreduce(tm, x, 1, dp_type,
- > MPI_MAX, MPI_COMM_WORLD, ierr)
- tm = x(1)
-
- if (node.eq.root) then
- nit=0
- verified = .true.
- if (m.eq.24) then
- sx_verify_value = -3.247834652034740D+3
- sy_verify_value = -6.958407078382297D+3
- elseif (m.eq.25) then
- sx_verify_value = -2.863319731645753D+3
- sy_verify_value = -6.320053679109499D+3
- elseif (m.eq.28) then
- sx_verify_value = -4.295875165629892D+3
- sy_verify_value = -1.580732573678431D+4
- elseif (m.eq.30) then
- sx_verify_value = 4.033815542441498D+4
- sy_verify_value = -2.660669192809235D+4
- elseif (m.eq.32) then
- sx_verify_value = 4.764367927995374D+4
- sy_verify_value = -8.084072988043731D+4
- elseif (m.eq.36) then
- sx_verify_value = 1.982481200946593D+5
- sy_verify_value = -1.020596636361769D+5
- elseif (m.eq.40) then
- sx_verify_value = -5.319717441530D+05
- sy_verify_value = -3.688834557731D+05
- else
- verified = .false.
- endif
- if (verified) then
- sx_err = abs((sx - sx_verify_value)/sx_verify_value)
- sy_err = abs((sy - sy_verify_value)/sy_verify_value)
- verified = ((sx_err.le.epsilon) .and. (sy_err.le.epsilon))
- endif
- Mops = 2.d0**(m+1)/tm/1000000.d0
-
- write (6,11) tm, m, gc, sx, sy, (i, q(i), i = 0, nq - 1)
- 11 format ('EP Benchmark Results:'//'CPU Time =',f10.4/'N = 2^',
- > i5/'No. Gaussian Pairs =',f15.0/'Sums = ',1p,2d25.15/
- > 'Counts:'/(i3,0p,f15.0))
-
- call print_results('EP', class, m+1, 0, 0, nit, npm,
- > no_nodes, tm, Mops,
- > 'Random numbers generated',
- > verified, npbversion, compiletime, cs1,
- > cs2, cs3, cs4, cs5, cs6, cs7)
-
- endif
-
- if (timers_enabled .and. (node .eq. root)) then
- print *, 'Total time: ', timer_read(1)
- print *, 'Gaussian pairs: ', timer_read(2)
- print *, 'Random numbers: ', timer_read(3)
- endif
-
- call mpi_finalize(ierr)
-
- end
+++ /dev/null
-SHELL=/bin/sh
-BENCHMARK=is
-BENCHMARKU=IS
-
-include ../config/make.def
-
-include ../sys/make.common
-
-OBJS = is-trace.o ${COMMON}/c_print_results.o
-
-
-${PROGRAM}: config ${OBJS}
- ${CLINK} ${CLINKFLAGS} -o ${PROGRAM} ${OBJS} ${CMPI_LIB}
-
-.c.o:
- ${CCOMPILE} $<
-
-is-trace.o: is-trace.c npbparams.h
-
-clean:
- - rm -f *.o *~ mputil*
- - rm -f is-trace npbparams.h core
+++ /dev/null
-/*************************************************************************
- * *
- * N A S P A R A L L E L B E N C H M A R K S 3.3 *
- * *
- * I S *
- * *
- *************************************************************************
- * *
- * This benchmark is part of the NAS Parallel Benchmark 3.3 suite. *
- * It is described in NAS Technical Report 95-020. *
- * *
- * Permission to use, copy, distribute and modify this software *
- * for any purpose with or without fee is hereby granted. We *
- * request, however, that all derived work reference the NAS *
- * Parallel Benchmarks 3.3. This software is provided "as is" *
- * without express or implied warranty. *
- * *
- * Information on NPB 3.3, including the technical report, the *
- * original specifications, source code, results and information *
- * on how to submit new results, is available at: *
- * *
- * http://www.nas.nasa.gov/Software/NPB *
- * *
- * Send comments or suggestions to npb@nas.nasa.gov *
- * Send bug reports to npb-bugs@nas.nasa.gov *
- * *
- * NAS Parallel Benchmarks Group *
- * NASA Ames Research Center *
- * Mail Stop: T27A-1 *
- * Moffett Field, CA 94035-1000 *
- * *
- * E-mail: npb@nas.nasa.gov *
- * Fax: (650) 604-3957 *
- * *
- *************************************************************************
- * *
- * Author: M. Yarrow *
- * H. Jin *
- * *
- *************************************************************************/
-
-#include "mpi.h"
-#include "npbparams.h"
-#include <stdlib.h>
-#include <stdio.h>
-
-#include "simgrid/instr.h" //TRACE_
-
-/******************/
-/* default values */
-/******************/
-#ifndef CLASS
-#define CLASS 'S'
-#define NUM_PROCS 1
-#endif
-#define MIN_PROCS 1
-
-
-/*************/
-/* CLASS S */
-/*************/
-#if CLASS == 'S'
-#define TOTAL_KEYS_LOG_2 16
-#define MAX_KEY_LOG_2 11
-#define NUM_BUCKETS_LOG_2 9
-#endif
-
-
-/*************/
-/* CLASS W */
-/*************/
-#if CLASS == 'W'
-#define TOTAL_KEYS_LOG_2 20
-#define MAX_KEY_LOG_2 16
-#define NUM_BUCKETS_LOG_2 10
-#endif
-
-/*************/
-/* CLASS A */
-/*************/
-#if CLASS == 'A'
-#define TOTAL_KEYS_LOG_2 23
-#define MAX_KEY_LOG_2 19
-#define NUM_BUCKETS_LOG_2 10
-#endif
-
-
-/*************/
-/* CLASS B */
-/*************/
-#if CLASS == 'B'
-#define TOTAL_KEYS_LOG_2 25
-#define MAX_KEY_LOG_2 21
-#define NUM_BUCKETS_LOG_2 10
-#endif
-
-
-/*************/
-/* CLASS C */
-/*************/
-#if CLASS == 'C'
-#define TOTAL_KEYS_LOG_2 27
-#define MAX_KEY_LOG_2 23
-#define NUM_BUCKETS_LOG_2 10
-#endif
-
-
-/*************/
-/* CLASS D */
-/*************/
-#if CLASS == 'D'
-#define TOTAL_KEYS_LOG_2 29
-#define MAX_KEY_LOG_2 27
-#define NUM_BUCKETS_LOG_2 10
-#undef MIN_PROCS
-#define MIN_PROCS 4
-#endif
-
-
-#define TOTAL_KEYS (1 << TOTAL_KEYS_LOG_2)
-#define MAX_KEY (1 << MAX_KEY_LOG_2)
-#define NUM_BUCKETS (1 << NUM_BUCKETS_LOG_2)
-#define NUM_KEYS (TOTAL_KEYS/NUM_PROCS*MIN_PROCS)
-
-/*****************************************************************/
-/* On larger number of processors, since the keys are (roughly) */
-/* gaussian distributed, the first and last processor sort keys */
-/* in a large interval, requiring array sizes to be larger. Note */
-/* that for large NUM_PROCS, NUM_KEYS is, however, a small number*/
-/* The required array size also depends on the bucket size used. */
-/* The following values are validated for the 1024-bucket setup. */
-/*****************************************************************/
-#if NUM_PROCS < 256
-#define SIZE_OF_BUFFERS 3*NUM_KEYS/2
-#elif NUM_PROCS < 512
-#define SIZE_OF_BUFFERS 5*NUM_KEYS/2
-#elif NUM_PROCS < 1024
-#define SIZE_OF_BUFFERS 4*NUM_KEYS
-#else
-#define SIZE_OF_BUFFERS 13*NUM_KEYS/2
-#endif
-
-/*****************************************************************/
-/* NOTE: THIS CODE CANNOT BE RUN ON ARBITRARILY LARGE NUMBERS OF */
-/* PROCESSORS. THE LARGEST VERIFIED NUMBER IS 1024. INCREASE */
-/* MAX_PROCS AT YOUR PERIL */
-/*****************************************************************/
-#if CLASS == 'S'
-#define MAX_PROCS 128
-#else
-#define MAX_PROCS 1024
-#endif
-
-#define MAX_ITERATIONS 10
-#define TEST_ARRAY_SIZE 5
-
-
-/***********************************/
-/* Enable separate communication, */
-/* computation timing and printout */
-/***********************************/
-/* #define TIMING_ENABLED */
-
-
-/*************************************/
-/* Typedef: if necessary, change the */
-/* size of int here by changing the */
-/* int type to, say, long */
-/*************************************/
-typedef int INT_TYPE;
-typedef long INT_TYPE2;
-#define MP_KEY_TYPE MPI_INT
-
-
-typedef struct {
-
-/********************/
-/* MPI properties: */
-/********************/
-int my_rank,
- comm_size;
-
-
-/********************/
-/* Some global info */
-/********************/
-INT_TYPE *key_buff_ptr_global, /* used by full_verify to get */
- total_local_keys, /* copies of rank info */
- total_lesser_keys;
-
-
-int passed_verification;
-
-
-
-/************************************/
-/* These are the three main arrays. */
-/* See SIZE_OF_BUFFERS def above */
-/************************************/
-INT_TYPE key_array[SIZE_OF_BUFFERS],
- key_buff1[SIZE_OF_BUFFERS],
- key_buff2[SIZE_OF_BUFFERS],
- bucket_size[NUM_BUCKETS+TEST_ARRAY_SIZE], /* Top 5 elements for */
- bucket_size_totals[NUM_BUCKETS+TEST_ARRAY_SIZE], /* part. ver. vals */
- bucket_ptrs[NUM_BUCKETS],
- process_bucket_distrib_ptr1[NUM_BUCKETS+TEST_ARRAY_SIZE],
- process_bucket_distrib_ptr2[NUM_BUCKETS+TEST_ARRAY_SIZE];
-int send_count[MAX_PROCS], recv_count[MAX_PROCS],
- send_displ[MAX_PROCS], recv_displ[MAX_PROCS];
-
-
-/**********************/
-/* Partial verif info */
-/**********************/
-INT_TYPE2 test_index_array[TEST_ARRAY_SIZE],
- test_rank_array[TEST_ARRAY_SIZE];
-
-/**********/
-/* Timers */
-/**********/
-double start[64], elapsed[64];
-
-} global_data;
-
-
-const INT_TYPE2
- S_test_index_array[TEST_ARRAY_SIZE] =
- {48427,17148,23627,62548,4431},
- S_test_rank_array[TEST_ARRAY_SIZE] =
- {0,18,346,64917,65463},
-
- W_test_index_array[TEST_ARRAY_SIZE] =
- {357773,934767,875723,898999,404505},
- W_test_rank_array[TEST_ARRAY_SIZE] =
- {1249,11698,1039987,1043896,1048018},
-
- A_test_index_array[TEST_ARRAY_SIZE] =
- {2112377,662041,5336171,3642833,4250760},
- A_test_rank_array[TEST_ARRAY_SIZE] =
- {104,17523,123928,8288932,8388264},
-
- B_test_index_array[TEST_ARRAY_SIZE] =
- {41869,812306,5102857,18232239,26860214},
- B_test_rank_array[TEST_ARRAY_SIZE] =
- {33422937,10244,59149,33135281,99},
-
- C_test_index_array[TEST_ARRAY_SIZE] =
- {44172927,72999161,74326391,129606274,21736814},
- C_test_rank_array[TEST_ARRAY_SIZE] =
- {61147,882988,266290,133997595,133525895},
-
- D_test_index_array[TEST_ARRAY_SIZE] =
- {1317351170,995930646,1157283250,1503301535,1453734525},
- D_test_rank_array[TEST_ARRAY_SIZE] =
- {1,36538729,1978098519,2145192618,2147425337};
-
-
-
-/***********************/
-/* function prototypes */
-/***********************/
-double randlc( double *X, double *A );
-
-void full_verify( global_data* gd );
-
-void c_print_results( char *name,
- char class,
- int n1,
- int n2,
- int n3,
- int niter,
- int nprocs_compiled,
- int nprocs_total,
- double t,
- double mops,
- char *optype,
- int passed_verification,
- char *npbversion,
- char *compiletime,
- char *mpicc,
- char *clink,
- char *cmpi_lib,
- char *cmpi_inc,
- char *cflags,
- char *clinkflags );
-
-void timer_clear(global_data* gd, int n );
-void timer_start(global_data* gd, int n );
-void timer_stop(global_data* gd, int n );
-double timer_read(global_data* gd, int n );
-
-void timer_clear(global_data* gd, int n ) {
- gd->elapsed[n] = 0.0;
-}
-
-void timer_start(global_data* gd, int n ) {
- gd->start[n] = MPI_Wtime();
-}
-
-void timer_stop(global_data* gd, int n ) {
- gd->elapsed[n] += MPI_Wtime() - gd->start[n];
-}
-
-double timer_read(global_data* gd, int n ) {
- return gd->elapsed[n];
-}
-
-
-/*
- * FUNCTION RANDLC (X, A)
- *
- * This routine returns a uniform pseudorandom double precision number in the
- * range (0, 1) by using the linear congruential generator
- *
- * x_{k+1} = a x_k (mod 2^46)
- *
- * where 0 < x_k < 2^46 and 0 < a < 2^46. This scheme generates 2^44 numbers
- * before repeating. The argument A is the same as 'a' in the above formula,
- * and X is the same as x_0. A and X must be odd double precision integers
- * in the range (1, 2^46). The returned value RANDLC is normalized to be
- * between 0 and 1, i.e. RANDLC = 2^(-46) * x_1. X is updated to contain
- * the new seed x_1, so that subsequent calls to RANDLC using the same
- * arguments will generate a continuous sequence.
- *
- * This routine should produce the same results on any computer with at least
- * 48 mantissa bits in double precision floating point data. On Cray systems,
- * double precision should be disabled.
- *
- * David H. Bailey October 26, 1990
- *
- * IMPLICIT DOUBLE PRECISION (A-H, O-Z)
- * SAVE KS, R23, R46, T23, T46
- * DATA KS/0/
- *
- * If this is the first call to RANDLC, compute R23 = 2 ^ -23, R46 = 2 ^ -46,
- * T23 = 2 ^ 23, and T46 = 2 ^ 46. These are computed in loops, rather than
- * by merely using the ** operator, in order to insure that the results are
- * exact on all systems. This code assumes that 0.5D0 is represented exactly.
- */
-
-
-/*****************************************************************/
-/************* R A N D L C ************/
-/************* ************/
-/************* portable random number generator ************/
-/*****************************************************************/
-
-double randlc( double *X, double *A )
-{
- static int KS=0;
- static double R23, R46, T23, T46;
- double T1, T2, T3, T4;
- double A1;
- double A2;
- double X1;
- double X2;
- double Z;
- int i, j;
-
- if (KS == 0)
- {
- R23 = 1.0;
- R46 = 1.0;
- T23 = 1.0;
- T46 = 1.0;
-
- for (i=1; i<=23; i++)
- {
- R23 = 0.50 * R23;
- T23 = 2.0 * T23;
- }
- for (i=1; i<=46; i++)
- {
- R46 = 0.50 * R46;
- T46 = 2.0 * T46;
- }
- KS = 1;
- }
-
-/* Break A into two parts such that A = 2^23 * A1 + A2 and set X = N. */
-
- T1 = R23 * *A;
- j = T1;
- A1 = j;
- A2 = *A - T23 * A1;
-
-/* Break X into two parts such that X = 2^23 * X1 + X2, compute
- Z = A1 * X2 + A2 * X1 (mod 2^23), and then
- X = 2^23 * Z + A2 * X2 (mod 2^46). */
-
- T1 = R23 * *X;
- j = T1;
- X1 = j;
- X2 = *X - T23 * X1;
- T1 = A1 * X2 + A2 * X1;
-
- j = R23 * T1;
- T2 = j;
- Z = T1 - T23 * T2;
- T3 = T23 * Z + A2 * X2;
- j = R46 * T3;
- T4 = j;
- *X = T3 - T46 * T4;
- return(R46 * *X);
-}
-
-
-
-/*****************************************************************/
-/************ F I N D _ M Y _ S E E D ************/
-/************ ************/
-/************ returns parallel random number seq seed ************/
-/*****************************************************************/
-
-/*
- * Create a random number sequence of total length nn residing
- * on np number of processors. Each processor will therefore have a
- * subsequence of length nn/np. This routine returns that random
- * number which is the first random number for the subsequence belonging
- * to processor rank kn, and which is used as seed for proc kn ran # gen.
- */
-
-double find_my_seed( int kn, /* my processor rank, 0<=kn<=num procs */
- int np, /* np = num procs */
- long nn, /* total num of ran numbers, all procs */
- double s, /* Ran num seed, for ex.: 314159265.00 */
- double a ) /* Ran num gen mult, try 1220703125.00 */
-{
-
- long i;
-
- double t1,t2,t3,an;
- long mq,nq,kk,ik;
-
-
-
- nq = nn / np;
-
- for( mq=0; nq>1; mq++,nq/=2 )
- ;
-
- t1 = a;
-
- for( i=1; i<=mq; i++ )
- t2 = randlc( &t1, &t1 );
-
- an = t1;
-
- kk = kn;
- t1 = s;
- t2 = an;
-
- for( i=1; i<=100; i++ )
- {
- ik = kk / 2;
- if( 2 * ik != kk )
- t3 = randlc( &t1, &t2 );
- if( ik == 0 )
- break;
- t3 = randlc( &t2, &t2 );
- kk = ik;
- }
-
- return( t1 );
-
-}
-
-
-
-
-/*****************************************************************/
-/************* C R E A T E _ S E Q ************/
-/*****************************************************************/
-
-void create_seq( global_data* gd, double seed, double a )
-{
- double x;
- int i, k;
-
- k = MAX_KEY/4;
-
- for (i=0; i<NUM_KEYS; i++)
- {
- x = randlc(&seed, &a);
- x += randlc(&seed, &a);
- x += randlc(&seed, &a);
- x += randlc(&seed, &a);
-
- gd->key_array[i] = k*x;
- }
-}
-
-
-
-
-/*****************************************************************/
-/************* F U L L _ V E R I F Y ************/
-/*****************************************************************/
-
-
-void full_verify( global_data* gd )
-{
- MPI_Status status;
- MPI_Request request;
-
- INT_TYPE i, j;
- INT_TYPE k, last_local_key;
-
-
-/* Now, finally, sort the keys: */
- for( i=0; i<gd->total_local_keys; i++ )
- gd->key_array[--gd->key_buff_ptr_global[gd->key_buff2[i]]-
- gd->total_lesser_keys] = gd->key_buff2[i];
- last_local_key = (gd->total_local_keys<1)? 0 : (gd->total_local_keys-1);
-
-/* Send largest key value to next processor */
- if( gd->my_rank > 0 )
- MPI_Irecv( &k,
- 1,
- MP_KEY_TYPE,
- gd->my_rank-1,
- 1000,
- MPI_COMM_WORLD,
- &request );
- if( gd->my_rank < gd->comm_size-1 )
- MPI_Send( &gd->key_array[last_local_key],
- 1,
- MP_KEY_TYPE,
- gd->my_rank+1,
- 1000,
- MPI_COMM_WORLD );
- if( gd->my_rank > 0 )
- MPI_Wait( &request, &status );
-
-/* Confirm that neighbor's greatest key value
- is not greater than my least key value */
- j = 0;
- if( gd->my_rank > 0 && gd->total_local_keys > 0 )
- if( k > gd->key_array[0] )
- j++;
-
-
-/* Confirm keys correctly sorted: count incorrectly sorted keys, if any */
- for( i=1; i<gd->total_local_keys; i++ )
- if( gd->key_array[i-1] > gd->key_array[i] )
- j++;
-
-
- if( j != 0 )
- {
- printf( "Processor %d: Full_verify: number of keys out of sort: %d\n",
- gd->my_rank, j );
- }
- else
- gd->passed_verification++;
-
-
-}
-
-
-
-
-/*****************************************************************/
-/************* R A N K ****************/
-/*****************************************************************/
-
-
-void rank( global_data* gd, int iteration )
-{
-
- INT_TYPE i, k;
-
- INT_TYPE shift = MAX_KEY_LOG_2 - NUM_BUCKETS_LOG_2;
- INT_TYPE key;
- INT_TYPE2 bucket_sum_accumulator, j, m;
- INT_TYPE local_bucket_sum_accumulator;
- INT_TYPE min_key_val, max_key_val;
- INT_TYPE *key_buff_ptr;
-
-
-
-
-/* Iteration alteration of keys */
- if(gd->my_rank == 0 )
- {
- gd->key_array[iteration] = iteration;
- gd->key_array[iteration+MAX_ITERATIONS] = MAX_KEY - iteration;
- }
-
-
-/* Initialize */
- for( i=0; i<NUM_BUCKETS+TEST_ARRAY_SIZE; i++ )
- {
- gd->bucket_size[i] = 0;
- gd->bucket_size_totals[i] = 0;
- gd->process_bucket_distrib_ptr1[i] = 0;
- gd->process_bucket_distrib_ptr2[i] = 0;
- }
-
-
-/* Determine where the partial verify test keys are, load into */
-/* top of array bucket_size */
- for( i=0; i<TEST_ARRAY_SIZE; i++ )
- if( (gd->test_index_array[i]/NUM_KEYS) == gd->my_rank )
- gd->bucket_size[NUM_BUCKETS+i] =
- gd->key_array[gd->test_index_array[i] % NUM_KEYS];
-
-
-/* Determine the number of keys in each bucket */
- for( i=0; i<NUM_KEYS; i++ )
- gd->bucket_size[gd->key_array[i] >> shift]++;
-
-
-/* Accumulative bucket sizes are the bucket pointers */
- gd->bucket_ptrs[0] = 0;
- for( i=1; i< NUM_BUCKETS; i++ )
- gd->bucket_ptrs[i] = gd->bucket_ptrs[i-1] + gd->bucket_size[i-1];
-
-
-/* Sort into appropriate bucket */
- for( i=0; i<NUM_KEYS; i++ )
- {
- key = gd->key_array[i];
- gd->key_buff1[gd->bucket_ptrs[key >> shift]++] = key;
- }
-
-#ifdef TIMING_ENABLED
- timer_stop(gd, 2 );
- timer_start(gd, 3 );
-#endif
-
-/* Get the bucket size totals for the entire problem. These
- will be used to determine the redistribution of keys */
- MPI_Allreduce( gd->bucket_size,
- gd->bucket_size_totals,
- NUM_BUCKETS+TEST_ARRAY_SIZE,
- MP_KEY_TYPE,
- MPI_SUM,
- MPI_COMM_WORLD );
-
-#ifdef TIMING_ENABLED
- timer_stop(gd, 3 );
- timer_start(gd, 2 );
-#endif
-
-/* Determine redistribution of keys: accumulate the bucket size totals
- till this number surpasses NUM_KEYS (which is the average number of keys
- per processor). Then all keys in these buckets go to processor 0.
- Continue accumulating again until surpassing 2*NUM_KEYS. All keys
- in these buckets go to processor 1, etc. This algorithm guarantees
- that all processors have ranking work to do; none is left idle.
- Fewer buckets would give a less even distribution of keys and hence
- poorer load balancing, while more buckets mean more computation per
- processor; 1024 buckets turned out to be the optimum on the machines
- tested.
- Note that process_bucket_distrib_ptr1 and ..._ptr2 hold the bucket
- number of first and last bucket which each processor will have after
- the redistribution is done. */
-
- bucket_sum_accumulator = 0;
- local_bucket_sum_accumulator = 0;
- gd->send_displ[0] = 0;
- gd->process_bucket_distrib_ptr1[0] = 0;
- for( i=0, j=0; i<NUM_BUCKETS; i++ )
- {
- bucket_sum_accumulator += gd->bucket_size_totals[i];
- local_bucket_sum_accumulator += gd->bucket_size[i];
- if( bucket_sum_accumulator >= (j+1)*NUM_KEYS )
- {
- gd->send_count[j] = local_bucket_sum_accumulator;
- if( j != 0 )
- {
- gd->send_displ[j] = gd->send_displ[j-1] + gd->send_count[j-1];
- gd->process_bucket_distrib_ptr1[j] =
- gd->process_bucket_distrib_ptr2[j-1]+1;
- }
- gd->process_bucket_distrib_ptr2[j++] = i;
- local_bucket_sum_accumulator = 0;
- }
- }
-
-/* When NUM_PROCS approaches NUM_BUCKETS, it is quite possible that
- the last few processors get no buckets at all, so we must set their
- counts explicitly to avoid using stale values. */
- while( j < gd->comm_size )
- {
- gd->send_count[j] = 0;
- gd->process_bucket_distrib_ptr1[j] = 1;
- j++;
- }
-
-#ifdef TIMING_ENABLED
- timer_stop(gd, 2 );
- timer_start(gd, 3 );
-#endif
-
-/* This is the redistribution section: first find out how many keys
- each processor will send to every other processor: */
- MPI_Alltoall( gd->send_count,
- 1,
- MPI_INT,
- gd->recv_count,
- 1,
- MPI_INT,
- MPI_COMM_WORLD );
-
-/* Determine the receive array displacements for the buckets */
- gd->recv_displ[0] = 0;
- for( i=1; i<gd->comm_size; i++ )
- gd->recv_displ[i] = gd->recv_displ[i-1] + gd->recv_count[i-1];
-
-
-/* Now send the keys to respective processors */
- MPI_Alltoallv( gd->key_buff1,
- gd->send_count,
- gd->send_displ,
- MP_KEY_TYPE,
- gd->key_buff2,
- gd->recv_count,
- gd->recv_displ,
- MP_KEY_TYPE,
- MPI_COMM_WORLD );
-
-#ifdef TIMING_ENABLED
- timer_stop(gd, 3 );
- timer_start(gd, 2 );
-#endif
-
-/* The starting and ending bucket numbers on each processor are
- multiplied by the interval size of the buckets to obtain the
- smallest possible min and greatest possible max value of any
- key on each processor */
- min_key_val = gd->process_bucket_distrib_ptr1[gd->my_rank] << shift;
- max_key_val = ((gd->process_bucket_distrib_ptr2[gd->my_rank] + 1) << shift)-1;
-
-/* Clear the work array */
- for( i=0; i<max_key_val-min_key_val+1; i++ )
- gd->key_buff1[i] = 0;
-
-/* Determine the total number of keys on all other
- processors holding keys of lesser value */
- m = 0;
- for( k=0; k<gd->my_rank; k++ )
- for( i= gd->process_bucket_distrib_ptr1[k];
- i<=gd->process_bucket_distrib_ptr2[k];
- i++ )
- m += gd->bucket_size_totals[i]; /* m has total # of lesser keys */
-
-/* Determine total number of keys on this processor */
- j = 0;
- for( i= gd->process_bucket_distrib_ptr1[gd->my_rank];
- i<=gd->process_bucket_distrib_ptr2[gd->my_rank];
- i++ )
- j += gd->bucket_size_totals[i]; /* j has total # of local keys */
-
-
-/* Ranking of all keys occurs in this section: */
-/* shift it backwards so no subtractions are necessary in loop */
- key_buff_ptr = gd->key_buff1 - min_key_val;
-
-/* In this section, the keys themselves are used as their
- own indexes to determine how many of each there are: their
- individual population */
- for( i=0; i<j; i++ )
- key_buff_ptr[gd->key_buff2[i]]++; /* Now they have individual key */
- /* population */
-
-/* To obtain ranks of each key, successively add the individual key
- population, not forgetting the total of lesser keys, m.
- NOTE: Since the total of lesser keys would be subtracted later
- in verification, it is no longer added to the first key population
- here, but still needed during the partial verify test. This is to
- ensure that 32-bit key_buff can still be used for class D. */
-/* key_buff_ptr[min_key_val] += m; */
- for( i=min_key_val; i<max_key_val; i++ )
- key_buff_ptr[i+1] += key_buff_ptr[i];
-
-
-/* This is the partial verify test section */
-/* Observe that test_rank_array vals are */
-/* shifted differently for different cases */
- for( i=0; i<TEST_ARRAY_SIZE; i++ )
- {
- k = gd->bucket_size_totals[i+NUM_BUCKETS]; /* Keys were hidden here */
- if( min_key_val <= k && k <= max_key_val )
- {
- /* Add the total of lesser keys, m, here */
- INT_TYPE2 key_rank = key_buff_ptr[k-1] + m;
- int failed = 0;
-
- switch( CLASS )
- {
- case 'S':
- if( i <= 2 )
- {
- if( key_rank != gd->test_rank_array[i]+iteration )
- failed = 1;
- else
- gd->passed_verification++;
- }
- else
- {
- if( key_rank != gd->test_rank_array[i]-iteration )
- failed = 1;
- else
- gd->passed_verification++;
- }
- break;
- case 'W':
- if( i < 2 )
- {
- if( key_rank != gd->test_rank_array[i]+(iteration-2) )
- failed = 1;
- else
- gd->passed_verification++;
- }
- else
- {
- if( key_rank != gd->test_rank_array[i]-iteration )
- failed = 1;
- else
- gd->passed_verification++;
- }
- break;
- case 'A':
- if( i <= 2 )
- {
- if( key_rank != gd->test_rank_array[i]+(iteration-1) )
- failed = 1;
- else
- gd->passed_verification++;
- }
- else
- {
- if( key_rank != gd->test_rank_array[i]-(iteration-1) )
- failed = 1;
- else
- gd->passed_verification++;
- }
- break;
- case 'B':
- if( i == 1 || i == 2 || i == 4 )
- {
- if( key_rank != gd->test_rank_array[i]+iteration )
- failed = 1;
- else
- gd->passed_verification++;
- }
- else
- {
- if( key_rank != gd->test_rank_array[i]-iteration )
- failed = 1;
- else
- gd->passed_verification++;
- }
- break;
- case 'C':
- if( i <= 2 )
- {
- if( key_rank != gd->test_rank_array[i]+iteration )
- failed = 1;
- else
- gd->passed_verification++;
- }
- else
- {
- if( key_rank != gd->test_rank_array[i]-iteration )
- failed = 1;
- else
- gd->passed_verification++;
- }
- break;
- case 'D':
- if( i < 2 )
- {
- if( key_rank != gd->test_rank_array[i]+iteration )
- failed = 1;
- else
- gd->passed_verification++;
- }
- else
- {
- if( key_rank != gd->test_rank_array[i]-iteration )
- failed = 1;
- else
- gd->passed_verification++;
- }
- break;
- }
- if( failed == 1 )
- printf( "Failed partial verification: "
- "iteration %d, processor %d, test key %d\n",
- iteration, gd->my_rank, (int)i );
- }
- }
-
-
-
-
-/* Make copies of rank info for use by full_verify: these variables
- in rank are local; making them global slows down the code, probably
- since they cannot be made register by compiler */
-
- if( iteration == MAX_ITERATIONS )
- {
- gd->key_buff_ptr_global = key_buff_ptr;
- gd->total_local_keys = j;
- gd->total_lesser_keys = 0; /* no longer set to 'm', see note above */
- }
-
-}
-
-
-/*****************************************************************/
-/************* M A I N ****************/
-/*****************************************************************/
-
-int main( int argc, char **argv )
-{
-
- int i, iteration, itemp;
-
- double timecounter, maxtime;
-
- global_data* gd = malloc(sizeof(global_data));
-/* Initialize MPI */
- MPI_Init( &argc, &argv );
- MPI_Comm_rank( MPI_COMM_WORLD, &gd->my_rank );
- MPI_Comm_size( MPI_COMM_WORLD, &gd->comm_size );
-
-/* Initialize the verification arrays if a valid class */
- for( i=0; i<TEST_ARRAY_SIZE; i++ )
- switch( CLASS )
- {
- case 'S':
- gd->test_index_array[i] = S_test_index_array[i];
- gd->test_rank_array[i] = S_test_rank_array[i];
- break;
- case 'A':
- gd->test_index_array[i] = A_test_index_array[i];
- gd->test_rank_array[i] = A_test_rank_array[i];
- break;
- case 'W':
- gd->test_index_array[i] = W_test_index_array[i];
- gd->test_rank_array[i] = W_test_rank_array[i];
- break;
- case 'B':
- gd->test_index_array[i] = B_test_index_array[i];
- gd->test_rank_array[i] = B_test_rank_array[i];
- break;
- case 'C':
- gd->test_index_array[i] = C_test_index_array[i];
- gd->test_rank_array[i] = C_test_rank_array[i];
- break;
- case 'D':
- gd->test_index_array[i] = D_test_index_array[i];
- gd->test_rank_array[i] = D_test_rank_array[i];
- break;
- };
-
-
-
-/* Print out initial NPB info */
- if( gd->my_rank == 0 )
- {
- printf( "\n\n NAS Parallel Benchmarks 3.3 -- IS Benchmark\n\n" );
- printf( " Size: %ld (class %c)\n", (long)TOTAL_KEYS*MIN_PROCS, CLASS );
- printf( " Iterations: %d\n", MAX_ITERATIONS );
- printf( " Number of processes: %d\n",gd->comm_size );
- }
-
-/* Check that actual and compiled number of processors agree */
- if( gd->comm_size != NUM_PROCS )
- {
- if( gd->my_rank == 0 )
- printf( "\n ERROR: compiled for %d processes\n"
- " Number of active processes: %d\n"
- " Exiting program!\n\n", NUM_PROCS, gd->comm_size );
- MPI_Finalize();
- exit( 1 );
- }
-
-/* Check to see whether total number of processes is within bounds.
- This could in principle be checked in setparams.c, but it is more
- convenient to do it here */
- if( gd->comm_size < MIN_PROCS || gd->comm_size > MAX_PROCS)
- {
- if( gd->my_rank == 0 )
- printf( "\n ERROR: number of processes %d not within range %d-%d"
- "\n Exiting program!\n\n", gd->comm_size, MIN_PROCS, MAX_PROCS);
- MPI_Finalize();
- exit( 1 );
- }
-
-
-/* Generate random number sequence and subsequent keys on all procs */
- create_seq(gd, find_my_seed( gd->my_rank,
- gd->comm_size,
- 4*(long)TOTAL_KEYS*MIN_PROCS,
- 314159265.00, /* Random number gen seed */
- 1220703125.00 ), /* Random number gen mult */
- 1220703125.00 ); /* Random number gen mult */
-
-/* Do one iteration for free (i.e., untimed) to guarantee initialization of
- all data and code pages and respective tables */
- rank(gd, 1 );
-
-/* Start verification counter */
- gd->passed_verification = 0;
-
- if( gd->my_rank == 0 && CLASS != 'S' ) printf( "\n iteration\n" );
-
-/* Initialize timer */
- timer_clear(gd, 0 );
-
-/* Initialize separate communication, computation timing */
-#ifdef TIMING_ENABLED
- for( i=1; i<=3; i++ ) timer_clear(gd, i );
-#endif
-
-/* Start timer */
- timer_start(gd, 0 );
-
-#ifdef TIMING_ENABLED
- timer_start(gd, 1 );
- timer_start(gd, 2 );
-#endif
-
- char smpi_category[100];
- snprintf (smpi_category, 100, "%d", gd->my_rank);
- TRACE_smpi_set_category (smpi_category);
-
-/* This is the main iteration */
- for( iteration=1; iteration<=MAX_ITERATIONS; iteration++ )
- {
- if( gd->my_rank == 0 && CLASS != 'S' ) printf( " %d\n", iteration );
- rank(gd, iteration );
- }
- TRACE_smpi_set_category (NULL);
-
-#ifdef TIMING_ENABLED
- timer_stop(gd, 2 );
- timer_stop(gd, 1 );
-#endif
-
-/* Stop timer, obtain time for processors */
- timer_stop(gd, 0 );
-
- timecounter = timer_read(gd, 0 );
-
-/* End of timing, obtain maximum time of all processors */
- MPI_Reduce( &timecounter,
- &maxtime,
- 1,
- MPI_DOUBLE,
- MPI_MAX,
- 0,
- MPI_COMM_WORLD );
-
-#ifdef TIMING_ENABLED
- {
- double tmin, tsum, tmax;
-
- if( gd->my_rank == 0 )
- {
- printf( "\ntimer 1/2/3 = total/computation/communication time\n");
- printf( " min avg max\n" );
- }
- for( i=1; i<=3; i++ )
- {
- timecounter = timer_read(gd, i );
- MPI_Reduce( &timecounter,
- &tmin,
- 1,
- MPI_DOUBLE,
- MPI_MIN,
- 0,
- MPI_COMM_WORLD );
- MPI_Reduce( &timecounter,
- &tsum,
- 1,
- MPI_DOUBLE,
- MPI_SUM,
- 0,
- MPI_COMM_WORLD );
- MPI_Reduce( &timecounter,
- &tmax,
- 1,
- MPI_DOUBLE,
- MPI_MAX,
- 0,
- MPI_COMM_WORLD );
- if( gd->my_rank == 0 )
- printf( "timer %d: %f %f %f\n",
- i, tmin, tsum/((double) gd->comm_size), tmax );
- }
- if( gd->my_rank == 0 )
- printf( "\n" );
- }
-#endif
-
-/* This tests that keys are in sequence: sorting of last ranked key seq
- occurs here, but is an untimed operation */
- full_verify(gd);
-
-
-/* Obtain verification counter sum */
- itemp = gd->passed_verification;
- MPI_Reduce( &itemp,
- &gd->passed_verification,
- 1,
- MPI_INT,
- MPI_SUM,
- 0,
- MPI_COMM_WORLD );
-
-
-
-/* The final printout */
- if( gd->my_rank == 0 )
- {
- if( gd->passed_verification != 5*MAX_ITERATIONS + gd->comm_size )
- gd->passed_verification = 0;
- c_print_results( "IS",
- CLASS,
- (int)(TOTAL_KEYS),
- MIN_PROCS,
- 0,
- MAX_ITERATIONS,
- NUM_PROCS,
- gd->comm_size,
- maxtime,
- ((double) (MAX_ITERATIONS)*TOTAL_KEYS*MIN_PROCS)
- /maxtime/1000000.,
- "keys ranked",
- gd->passed_verification,
- NPBVERSION,
- COMPILETIME,
- MPICC,
- CLINK,
- CMPI_LIB,
- CMPI_INC,
- CFLAGS,
- CLINKFLAGS );
- }
-
- MPI_Finalize();
- free(gd);
-
- return 0;
- /**************************/
-} /* E N D P R O G R A M */
- /**************************/
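The rank() routine in the file above is, at heart, a distributed counting sort: each key indexes a histogram, and a prefix sum over the histogram yields every key's rank. A minimal single-process sketch of that step (hypothetical names, no bucketing or key redistribution):

```c
#include <assert.h>
#include <stddef.h>

/* Histogram the keys, then prefix-sum: afterwards count[k] is the number
 * of keys <= k, i.e. the (1-based) rank of the last occurrence of k --
 * the same "keys as their own indexes" trick used in rank(). */
static void rank_keys(const int *keys, size_t n, int *count, int max_key) {
    for (int k = 0; k < max_key; k++) count[k] = 0;
    for (size_t i = 0; i < n; i++) count[keys[i]]++;            /* population */
    for (int k = 1; k < max_key; k++) count[k] += count[k - 1]; /* ranks      */
}
```

In the parallel version each rank only histograms its own bucket range and adds m, the global count of lesser keys held by lower-numbered ranks.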
#include <stdlib.h>
#include <stdio.h>
+#include "simgrid/instr.h" //TRACE_
+
/******************/
/* default values */
/******************/
timer_start(gd, 2 );
#endif
+ char smpi_category[100];
+ snprintf (smpi_category, 100, "%d", gd->my_rank);
+ TRACE_smpi_set_category (smpi_category);
+
/* This is the main iteration */
for( iteration=1; iteration<=MAX_ITERATIONS; iteration++ )
{
if( gd->my_rank == 0 && CLASS != 'S' ) printf( " %d\n", iteration );
rank(gd, iteration );
}
-
+ TRACE_smpi_set_category (NULL);
#ifdef TIMING_ENABLED
timer_stop(gd, 2 );
+++ /dev/null
-# Makefile for MPI dummy library.
-# Must be edited for a specific machine. Does NOT read in
-# the make.def file of NPB 2.3
-F77 = f77
-CC = cc
-AR = ar
-
-# Enable if either Cray or IBM: (no such flag for most machines: see wtime.h)
-# MACHINE = -DCRAY
-# MACHINE = -DIBM
-
-libmpi.a: mpi_dummy.o mpi_dummy_c.o wtime.o
- $(AR) r libmpi.a mpi_dummy.o mpi_dummy_c.o wtime.o
-
-mpi_dummy.o: mpi_dummy.f mpif.h
- $(F77) -c mpi_dummy.f
-# For a Cray C90, try:
-# cf77 -dp -c mpi_dummy.f
-# For an IBM 590, try:
-# xlf -c mpi_dummy.f
-
-mpi_dummy_c.o: mpi_dummy.c mpi.h
- $(CC) -c ${MACHINE} -o mpi_dummy_c.o mpi_dummy.c
-
-wtime.o: wtime.c
-# For most machines or CRAY or IBM
- $(CC) -c ${MACHINE} wtime.c
-# For a precise timer on an SGI Power Challenge, try:
-# $(CC) -o wtime.o -c wtime_sgi64.c
-
-test: test.f
- $(F77) -o test -I. test.f -L. -lmpi
-
-
-
-clean:
- - rm -f *~ *.o
- - rm -f test libmpi.a
+++ /dev/null
-###########################################
-# NAS Parallel Benchmarks 2&3 #
-# MPI/F77/C #
-# Revision 3.3 #
-# NASA Ames Research Center #
-# npb@nas.nasa.gov #
-# http://www.nas.nasa.gov/Software/NPB/ #
-###########################################
-
-MPI Dummy Library
-
-
-The MPI dummy library is supplied as a convenience for people who do
-not have an MPI library but would like to try running on one processor
-anyway. The NPB 2.x/3.x benchmarks are designed so that they do not
-actually try to do any message passing when run on one node. The MPI
-dummy library is just that - a set of dummy MPI routines which don't
-do anything, but allow you to link the benchmarks. Actually they do a
-few things, but nothing important. Note that the dummy library is
-sufficient only for the NPB 2.x/3.x benchmarks. It probably won't be
-useful for anything else because it implements only a handful of
-functions.
-
-Because the dummy library is just an extra goody, and since we don't
-have an infinite amount of time, it may be a bit trickier to configure
-than the rest of the benchmarks. You need to:
-
-1. Find out how C and Fortran interact on your machine. On most machines,
-the Fortran function foo(x) is declared in C as foo_(xp), where xp is
-a pointer, not a value. On IBMs, it's just foo(xp). On Cray C90s, it's
-FOO(xp). You can define CRAY or IBM to get these, or you need to
-edit wtime.c if you've got something else.
-
-2. Edit the Makefile to compile mpi_dummy.f and wtime.c correctly
-for your machine (including -DCRAY or -DIBM if necessary).
-
-3. The substitute MPI timer gives wall clock time, not CPU time.
-If you're running on a timeshared machine, you may want to
-use a CPU timer. Edit the function mpi_wtime() in mpi_dummy.f
-to change this timer. (NOTE: for official benchmark results,
-ONLY wall clock times are valid. Using a CPU timer is ok
-if you want to get things running, but don't report any results
-measured with a CPU timer. )
-
-TROUBLESHOOTING
-
-o Compiling or linking of the benchmark aborts because the dummy MPI
- header file or the dummy MPI library cannot be found.
- - the file make.dummy in subdirectory config relies on the use
- of the -I"path" and -L"path" -l"library" constructs to pass
- information to the compilers and linkers. Edit this file to conform
- to your system.
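Point 1 of the README above is the classic C/Fortran symbol-naming problem. The wtime.h removed below handles it with a macro; a self-contained sketch of that dispatch (the 42.0 stub value is purely illustrative):

```c
#include <assert.h>

/* Most Unix Fortran compilers emit the symbol for `subroutine wtime` as
 * `wtime_`; IBM xlf keeps `wtime`, and Cray cf77 upcases it to `WTIME`.
 * wtime.h hides the difference behind a macro so the C definition below
 * links against whichever name the Fortran side expects. */
#if defined(IBM)
#  define wtime wtime
#elif defined(CRAY)
#  define wtime WTIME
#else
#  define wtime wtime_  /* the common "append an underscore" convention */
#endif

/* Fortran passes all arguments by reference, hence the pointer. */
void wtime(double *t) { *t = 42.0; }
```

Calling wtime(&t) from C then resolves to the mangled name transparently.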
+++ /dev/null
-#define MPI_DOUBLE 1
-#define MPI_INT 2
-#define MPI_BYTE 3
-#define MPI_FLOAT 4
-#define MPI_LONG 5
-
-#define MPI_COMM_WORLD 0
-
-#define MPI_MAX 1
-#define MPI_SUM 2
-#define MPI_MIN 3
-
-#define MPI_SUCCESS 0
-#define MPI_ANY_SOURCE -1
-#define MPI_ERR_OTHER -1
-#define MPI_STATUS_SIZE 3
-
-
-/*
- Status object. It is the only user-visible MPI data-structure
- The "count" field is PRIVATE; use MPI_Get_count to access it.
- */
-typedef struct {
- int count;
- int MPI_SOURCE;
- int MPI_TAG;
- int MPI_ERROR;
-} MPI_Status;
-
-
-/* MPI request objects */
-typedef int MPI_Request;
-
-/* MPI datatype */
-typedef int MPI_Datatype;
-
-/* MPI comm */
-typedef int MPI_Comm;
-
-/* MPI operation */
-typedef int MPI_Op;
-
-
-
-/* Prototypes: */
-void mpi_error( void );
-
-int MPI_Irecv( void *buf,
- int count,
- MPI_Datatype datatype,
- int source,
- int tag,
- MPI_Comm comm,
- MPI_Request *request );
-
-int MPI_Send( void *buf,
- int count,
- MPI_Datatype datatype,
- int dest,
- int tag,
- MPI_Comm comm );
-
-int MPI_Wait( MPI_Request *request,
- MPI_Status *status );
-
-int MPI_Init( int *argc,
- char ***argv );
-
-int MPI_Comm_rank( MPI_Comm comm,
- int *rank );
-
-int MPI_Comm_size( MPI_Comm comm,
- int *size );
-
-double MPI_Wtime( void );
-
-int MPI_Barrier( MPI_Comm comm );
-
-int MPI_Finalize( void );
-
-int MPI_Allreduce( void *sendbuf,
- void *recvbuf,
- int nitems,
- MPI_Datatype type,
- MPI_Op op,
- MPI_Comm comm );
-
-int MPI_Reduce( void *sendbuf,
- void *recvbuf,
- int nitems,
- MPI_Datatype type,
- MPI_Op op,
- int root,
- MPI_Comm comm );
-
-int MPI_Alltoall( void *sendbuf,
- int sendcount,
- MPI_Datatype sendtype,
- void *recvbuf,
- int recvcount,
- MPI_Datatype recvtype,
- MPI_Comm comm );
-
-int MPI_Alltoallv( void *sendbuf,
- int *sendcounts,
- int *senddispl,
- MPI_Datatype sendtype,
- void *recvbuf,
- int *recvcounts,
- int *recvdispl,
- MPI_Datatype recvtype,
- MPI_Comm comm );
+++ /dev/null
-#include <stdlib.h>
-#include "mpi.h"
-#include "wtime.h"
-
-void mpi_error( void )
-{
- printf( "mpi_error called\n" );
- abort();
-}
-
-
-
-
-int MPI_Irecv( void *buf,
- int count,
- MPI_Datatype datatype,
- int source,
- int tag,
- MPI_Comm comm,
- MPI_Request *request )
-{
- mpi_error();
- return( MPI_ERR_OTHER );
-}
-
-
-
-
-int MPI_Recv( void *buf,
- int count,
- MPI_Datatype datatype,
- int source,
- int tag,
- MPI_Comm comm,
- MPI_Status *status )
-{
- mpi_error();
- return( MPI_ERR_OTHER );
-}
-
-
-
-
-int MPI_Send( void *buf,
- int count,
- MPI_Datatype datatype,
- int dest,
- int tag,
- MPI_Comm comm )
-{
- mpi_error();
- return( MPI_ERR_OTHER );
-}
-
-
-
-
-int MPI_Wait( MPI_Request *request,
- MPI_Status *status )
-{
- mpi_error();
- return( MPI_ERR_OTHER );
-}
-
-
-
-
-int MPI_Init( int *argc,
- char ***argv )
-{
- return( MPI_SUCCESS );
-}
-
-
-
-
-int MPI_Comm_rank( MPI_Comm comm,
- int *rank )
-{
- *rank = 0;
- return( MPI_SUCCESS );
-}
-
-
-
-
-int MPI_Comm_size( MPI_Comm comm,
- int *size )
-{
- *size = 1;
- return( MPI_SUCCESS );
-}
-
-
-
-
-double MPI_Wtime( void )
-{
- void wtime();
-
- double t;
- wtime( &t );
- return( t );
-}
-
-
-
-
-int MPI_Barrier( MPI_Comm comm )
-{
- return( MPI_SUCCESS );
-}
-
-
-
-
-int MPI_Finalize( void )
-{
- return( MPI_SUCCESS );
-}
-
-
-
-
-int MPI_Allreduce( void *sendbuf,
- void *recvbuf,
- int nitems,
- MPI_Datatype type,
- MPI_Op op,
- MPI_Comm comm )
-{
- int i;
- if( type == MPI_INT )
- {
- int *pd_sendbuf, *pd_recvbuf;
- pd_sendbuf = (int *) sendbuf;
- pd_recvbuf = (int *) recvbuf;
- for( i=0; i<nitems; i++ )
- *(pd_recvbuf+i) = *(pd_sendbuf+i);
- }
- if( type == MPI_LONG )
- {
- long *pd_sendbuf, *pd_recvbuf;
- pd_sendbuf = (long *) sendbuf;
- pd_recvbuf = (long *) recvbuf;
- for( i=0; i<nitems; i++ )
- *(pd_recvbuf+i) = *(pd_sendbuf+i);
- }
- if( type == MPI_DOUBLE )
- {
- double *pd_sendbuf, *pd_recvbuf;
- pd_sendbuf = (double *) sendbuf;
- pd_recvbuf = (double *) recvbuf;
- for( i=0; i<nitems; i++ )
- *(pd_recvbuf+i) = *(pd_sendbuf+i);
- }
- return( MPI_SUCCESS );
-}
-
-
-
-
-int MPI_Reduce( void *sendbuf,
- void *recvbuf,
- int nitems,
- MPI_Datatype type,
- MPI_Op op,
- int root,
- MPI_Comm comm )
-{
- int i;
- if( type == MPI_INT )
- {
- int *pi_sendbuf, *pi_recvbuf;
- pi_sendbuf = (int *) sendbuf;
- pi_recvbuf = (int *) recvbuf;
- for( i=0; i<nitems; i++ )
- *(pi_recvbuf+i) = *(pi_sendbuf+i);
- }
- if( type == MPI_LONG )
- {
- long *pi_sendbuf, *pi_recvbuf;
- pi_sendbuf = (long *) sendbuf;
- pi_recvbuf = (long *) recvbuf;
- for( i=0; i<nitems; i++ )
- *(pi_recvbuf+i) = *(pi_sendbuf+i);
- }
- if( type == MPI_DOUBLE )
- {
- double *pd_sendbuf, *pd_recvbuf;
- pd_sendbuf = (double *) sendbuf;
- pd_recvbuf = (double *) recvbuf;
- for( i=0; i<nitems; i++ )
- *(pd_recvbuf+i) = *(pd_sendbuf+i);
- }
- return( MPI_SUCCESS );
-}
-
-
-
-
-int MPI_Alltoall( void *sendbuf,
- int sendcount,
- MPI_Datatype sendtype,
- void *recvbuf,
- int recvcount,
- MPI_Datatype recvtype,
- MPI_Comm comm )
-{
- int i;
- if( recvtype == MPI_INT )
- {
- int *pd_sendbuf, *pd_recvbuf;
- pd_sendbuf = (int *) sendbuf;
- pd_recvbuf = (int *) recvbuf;
- for( i=0; i<sendcount; i++ )
- *(pd_recvbuf+i) = *(pd_sendbuf+i);
- }
- if( recvtype == MPI_LONG )
- {
- long *pd_sendbuf, *pd_recvbuf;
- pd_sendbuf = (long *) sendbuf;
- pd_recvbuf = (long *) recvbuf;
- for( i=0; i<sendcount; i++ )
- *(pd_recvbuf+i) = *(pd_sendbuf+i);
- }
- return( MPI_SUCCESS );
-}
-
-
-
-
-int MPI_Alltoallv( void *sendbuf,
- int *sendcounts,
- int *senddispl,
- MPI_Datatype sendtype,
- void *recvbuf,
- int *recvcounts,
- int *recvdispl,
- MPI_Datatype recvtype,
- MPI_Comm comm )
-{
- int i;
- if( recvtype == MPI_INT )
- {
- int *pd_sendbuf, *pd_recvbuf;
- pd_sendbuf = (int *) sendbuf;
- pd_recvbuf = (int *) recvbuf;
- for( i=0; i<sendcounts[0]; i++ )
- *(pd_recvbuf+i+recvdispl[0]) = *(pd_sendbuf+i+senddispl[0]);
- }
- if( recvtype == MPI_LONG )
- {
- long *pd_sendbuf, *pd_recvbuf;
- pd_sendbuf = (long *) sendbuf;
- pd_recvbuf = (long *) recvbuf;
- for( i=0; i<sendcounts[0]; i++ )
- *(pd_recvbuf+i+recvdispl[0]) = *(pd_sendbuf+i+senddispl[0]);
- }
- return( MPI_SUCCESS );
-}
-
-
-
-
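The copy loops in mpi_dummy.c above all encode one fact: with a single rank, any reduction or all-to-all exchange simply returns its own input. A hedged sketch that collapses the per-datatype switches into one byte copy (`dummy_allreduce` is a hypothetical name, not part of the real library):

```c
#include <string.h>

/* With exactly one rank, reduce(op, {x}) == x for any associative op,
 * so a single-process stand-in for MPI_Allreduce is a plain copy.
 * `elem_size` replaces the MPI_Datatype switch used in mpi_dummy.c. */
static int dummy_allreduce(const void *sendbuf, void *recvbuf,
                           int nitems, size_t elem_size) {
    memcpy(recvbuf, sendbuf, (size_t)nitems * elem_size);
    return 0; /* MPI_SUCCESS */
}
```

The real dummy library keeps separate int/long/double loops instead, which avoids any aliasing or alignment questions on 1990s compilers.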
+++ /dev/null
- subroutine mpi_isend(buf,count,datatype,source,
- & tag,comm,request,ierror)
- integer buf(*), count,datatype,source,tag,comm,
- & request,ierror
- call mpi_error()
- return
- end
-
- subroutine mpi_irecv(buf,count,datatype,source,
- & tag,comm,request,ierror)
- integer buf(*), count,datatype,source,tag,comm,
- & request,ierror
- call mpi_error()
- return
- end
-
- subroutine mpi_send(buf,count,datatype,dest,tag,comm,ierror)
- integer buf(*), count,datatype,dest,tag,comm,ierror
- call mpi_error()
- return
- end
-
- subroutine mpi_recv(buf,count,datatype,source,
- & tag,comm,status,ierror)
- integer buf(*), count,datatype,source,tag,comm,
- & status(*),ierror
- call mpi_error()
- return
- end
-
- subroutine mpi_comm_split(comm,color,key,newcomm,ierror)
- integer comm,color,key,newcomm,ierror
- return
- end
-
- subroutine mpi_comm_rank(comm, rank,ierr)
- implicit none
- integer comm, rank,ierr
- rank = 0
- return
- end
-
- subroutine mpi_comm_size(comm, size, ierr)
- implicit none
- integer comm, size, ierr
- size = 1
- return
- end
-
- double precision function mpi_wtime()
- implicit none
- double precision t
-c This function must measure wall clock time, not CPU time.
-c Since there is no portable timer in Fortran (77)
-c we call a routine compiled in C (though the C source may have
-c to be tweaked).
- call wtime(t)
-c The following is not ok for "official" results because it reports
-c CPU time not wall clock time. It may be useful for developing/testing
-c on timeshared Crays, though.
-c call second(t)
-
- mpi_wtime = t
-
- return
- end
-
-
-c may be valid to call this in single processor case
- subroutine mpi_barrier(comm,ierror)
- return
- end
-
-c may be valid to call this in single processor case
- subroutine mpi_bcast(buf, nitems, type, root, comm, ierr)
- implicit none
- integer buf(*), nitems, type, root, comm, ierr
- return
- end
-
- subroutine mpi_comm_dup(oldcomm, newcomm,ierror)
- integer oldcomm, newcomm,ierror
- newcomm= oldcomm
- return
- end
-
- subroutine mpi_error()
- print *, 'mpi_error called'
- stop
- end
-
- subroutine mpi_abort(comm, errcode, ierr)
- implicit none
- integer comm, errcode, ierr
- print *, 'mpi_abort called'
- stop
- end
-
- subroutine mpi_finalize(ierr)
- return
- end
-
- subroutine mpi_init(ierr)
- return
- end
-
-
-c assume double precision, which is all SP uses
- subroutine mpi_reduce(inbuf, outbuf, nitems,
- $ type, op, root, comm, ierr)
- implicit none
- include 'mpif.h'
- integer nitems, type, op, root, comm, ierr
- double precision inbuf(*), outbuf(*)
-
- if (type .eq. mpi_double_precision) then
- call mpi_reduce_dp(inbuf, outbuf, nitems,
- $ type, op, root, comm, ierr)
- else if (type .eq. mpi_double_complex) then
- call mpi_reduce_dc(inbuf, outbuf, nitems,
- $ type, op, root, comm, ierr)
- else if (type .eq. mpi_complex) then
- call mpi_reduce_complex(inbuf, outbuf, nitems,
- $ type, op, root, comm, ierr)
- else if (type .eq. mpi_real) then
- call mpi_reduce_real(inbuf, outbuf, nitems,
- $ type, op, root, comm, ierr)
- else if (type .eq. mpi_integer) then
- call mpi_reduce_int(inbuf, outbuf, nitems,
- $ type, op, root, comm, ierr)
- else
- print *, 'mpi_reduce: unknown type ', type
- end if
- return
- end
-
-
- subroutine mpi_reduce_real(inbuf, outbuf, nitems,
- $ type, op, root, comm, ierr)
- implicit none
- integer nitems, type, op, root, comm, ierr, i
- real inbuf(*), outbuf(*)
- do i = 1, nitems
- outbuf(i) = inbuf(i)
- end do
-
- return
- end
-
- subroutine mpi_reduce_dp(inbuf, outbuf, nitems,
- $ type, op, root, comm, ierr)
- implicit none
- integer nitems, type, op, root, comm, ierr, i
- double precision inbuf(*), outbuf(*)
- do i = 1, nitems
- outbuf(i) = inbuf(i)
- end do
-
- return
- end
-
- subroutine mpi_reduce_dc(inbuf, outbuf, nitems,
- $ type, op, root, comm, ierr)
- implicit none
- integer nitems, type, op, root, comm, ierr, i
- double complex inbuf(*), outbuf(*)
- do i = 1, nitems
- outbuf(i) = inbuf(i)
- end do
-
- return
- end
-
-
- subroutine mpi_reduce_complex(inbuf, outbuf, nitems,
- $ type, op, root, comm, ierr)
- implicit none
- integer nitems, type, op, root, comm, ierr, i
- complex inbuf(*), outbuf(*)
- do i = 1, nitems
- outbuf(i) = inbuf(i)
- end do
-
- return
- end
-
- subroutine mpi_reduce_int(inbuf, outbuf, nitems,
- $ type, op, root, comm, ierr)
- implicit none
- integer nitems, type, op, root, comm, ierr, i
- integer inbuf(*), outbuf(*)
- do i = 1, nitems
- outbuf(i) = inbuf(i)
- end do
-
- return
- end
-
- subroutine mpi_allreduce(inbuf, outbuf, nitems,
- $ type, op, comm, ierr)
- implicit none
- integer nitems, type, op, comm, ierr
- double precision inbuf(*), outbuf(*)
-
- call mpi_reduce(inbuf, outbuf, nitems,
- $ type, op, 0, comm, ierr)
- return
- end
-
- subroutine mpi_alltoall(inbuf, nitems, type, outbuf, nitems_dum,
- $ type_dum, comm, ierr)
- implicit none
- include 'mpif.h'
- integer nitems, type, comm, ierr, nitems_dum, type_dum
- double precision inbuf(*), outbuf(*)
- if (type .eq. mpi_double_precision) then
- call mpi_alltoall_dp(inbuf, outbuf, nitems,
- $ type, comm, ierr)
- else if (type .eq. mpi_double_complex) then
- call mpi_alltoall_dc(inbuf, outbuf, nitems,
- $ type, comm, ierr)
- else if (type .eq. mpi_complex) then
- call mpi_alltoall_complex(inbuf, outbuf, nitems,
- $ type, comm, ierr)
- else if (type .eq. mpi_real) then
- call mpi_alltoall_real(inbuf, outbuf, nitems,
- $ type, comm, ierr)
- else if (type .eq. mpi_integer) then
- call mpi_alltoall_int(inbuf, outbuf, nitems,
- $ type, comm, ierr)
- else
- print *, 'mpi_alltoall: unknown type ', type
- end if
- return
- end
-
- subroutine mpi_alltoall_dc(inbuf, outbuf, nitems,
- $ type, comm, ierr)
- implicit none
- integer nitems, type, comm, ierr, i
- double complex inbuf(*), outbuf(*)
- do i = 1, nitems
- outbuf(i) = inbuf(i)
- end do
-
- return
- end
-
-
- subroutine mpi_alltoall_complex(inbuf, outbuf, nitems,
- $ type, comm, ierr)
- implicit none
- integer nitems, type, comm, ierr, i
- double complex inbuf(*), outbuf(*)
- do i = 1, nitems
- outbuf(i) = inbuf(i)
- end do
-
- return
- end
-
- subroutine mpi_alltoall_dp(inbuf, outbuf, nitems,
- $ type, comm, ierr)
- implicit none
- integer nitems, type, comm, ierr, i
- double precision inbuf(*), outbuf(*)
- do i = 1, nitems
- outbuf(i) = inbuf(i)
- end do
-
- return
- end
-
- subroutine mpi_alltoall_real(inbuf, outbuf, nitems,
- $ type, comm, ierr)
- implicit none
- integer nitems, type, comm, ierr, i
- real inbuf(*), outbuf(*)
- do i = 1, nitems
- outbuf(i) = inbuf(i)
- end do
-
- return
- end
-
- subroutine mpi_alltoall_int(inbuf, outbuf, nitems,
- $ type, comm, ierr)
- implicit none
- integer nitems, type, comm, ierr, i
- integer inbuf(*), outbuf(*)
- do i = 1, nitems
- outbuf(i) = inbuf(i)
- end do
-
- return
- end
-
- subroutine mpi_wait(request,status,ierror)
- integer request,status,ierror
- call mpi_error()
- return
- end
-
- subroutine mpi_waitall(count,requests,status,ierror)
- integer count,requests(*),status(*),ierror
- call mpi_error()
- return
- end
-
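The dummy MPI routines deleted above all follow one pattern: with a single rank, the collectives degenerate to copying the input buffer to the output buffer (and mpi_allreduce simply forwards to mpi_reduce with root 0). A minimal, hypothetical sketch of that idea, not part of the benchmark source:

```python
# Hypothetical single-rank "dummy MPI" sketch: with one process,
# alltoall and allreduce both degenerate to copying the input buffer.
def dummy_alltoall(inbuf, nitems):
    # Each rank exchanges only with itself, so the "exchange" is a copy.
    return list(inbuf[:nitems])

def dummy_allreduce(inbuf, nitems):
    # Reducing over a single rank leaves the values unchanged, mirroring
    # the mpi_allreduce -> mpi_reduce(root=0) forwarding in the source above.
    return list(inbuf[:nitems])
```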
+++ /dev/null
- integer mpi_comm_world
- parameter (mpi_comm_world = 0)
-
- integer mpi_max, mpi_min, mpi_sum
- parameter (mpi_max = 1, mpi_sum = 2, mpi_min = 3)
-
- integer mpi_byte, mpi_integer, mpi_real,
- > mpi_double_precision, mpi_complex,
- > mpi_double_complex
- parameter (mpi_double_precision = 1,
- $ mpi_integer = 2,
- $ mpi_byte = 3,
- $ mpi_real= 4,
- $ mpi_complex = 5,
- $ mpi_double_complex = 6)
-
- integer mpi_any_source
- parameter (mpi_any_source = -1)
-
- integer mpi_err_other
- parameter (mpi_err_other = -1)
-
- double precision mpi_wtime
- external mpi_wtime
-
- integer mpi_status_size
- parameter (mpi_status_size=3)
+++ /dev/null
- program
- implicit none
- double precision t, mpi_wtime
- external mpi_wtime
- t = 0.0
- t = mpi_wtime()
- print *, t
- t = mpi_wtime()
- print *, t
- end
+++ /dev/null
-#include "wtime.h"
-#include <sys/time.h>
-
-void wtime(double *t)
-{
- static int sec = -1;
- struct timeval tv;
- gettimeofday(&tv, (void *)0);
- if (sec < 0) sec = tv.tv_sec;
- *t = (tv.tv_sec - sec) + 1.0e-6*tv.tv_usec;
-}
-
-
+++ /dev/null
- subroutine wtime(tim)
- real*8 tim
- dimension tarray(2)
- call etime(tarray)
- tim = tarray(1)
- return
- end
-
-
-
-
-
+++ /dev/null
-/* C/Fortran interface is different on different machines.
- * You may need to tweak this.
- */
-
-
-#if defined(IBM)
-#define wtime wtime
-#elif defined(CRAY)
-#define wtime WTIME
-#else
-#define wtime wtime_
-#endif
+++ /dev/null
-#include <sys/types.h>
-#include <fcntl.h>
-#include <sys/mman.h>
-#include <sys/syssgi.h>
-#include <sys/immu.h>
-#include <errno.h>
-#include <stdio.h>
-
-/* The following works on SGI Power Challenge systems */
-
-typedef unsigned long iotimer_t;
-
-unsigned int cycleval;
-volatile iotimer_t *iotimer_addr, base_counter;
-double resolution;
-
-/* address_t is an integer type big enough to hold an address */
-typedef unsigned long address_t;
-
-
-
-void timer_init()
-{
-
- int fd;
- char *virt_addr;
- address_t phys_addr, page_offset, pagemask, pagebase_addr;
-
- pagemask = getpagesize() - 1;
- errno = 0;
- phys_addr = syssgi(SGI_QUERY_CYCLECNTR, &cycleval);
- if (errno != 0) {
- perror("SGI_QUERY_CYCLECNTR");
- exit(1);
- }
- /* rel_addr = page offset of physical address */
- page_offset = phys_addr & pagemask;
- pagebase_addr = phys_addr - page_offset;
- fd = open("/dev/mmem", O_RDONLY);
-
- virt_addr = mmap(0, pagemask, PROT_READ, MAP_PRIVATE, fd, pagebase_addr);
- virt_addr = virt_addr + page_offset;
- iotimer_addr = (iotimer_t *)virt_addr;
- /* cycleval in picoseconds to this gives resolution in seconds */
- resolution = 1.0e-12*cycleval;
- base_counter = *iotimer_addr;
-}
-
-void wtime_(double *time)
-{
- static int initialized = 0;
- volatile iotimer_t counter_value;
- if (!initialized) {
- timer_init();
- initialized = 1;
- }
- counter_value = *iotimer_addr - base_counter;
- *time = (double)counter_value * resolution;
-}
-
-
-void wtime(double *time)
-{
- static int initialized = 0;
- volatile iotimer_t counter_value;
- if (!initialized) {
- timer_init();
- initialized = 1;
- }
- counter_value = *iotimer_addr - base_counter;
- *time = (double)counter_value * resolution;
-}
-
-
SHELL=/bin/sh
-CLASS=U
+CLASS=S
NPROCS=1
SUBTYPE=
VERSION=
is: header
cd IS; $(MAKE) NPROCS=$(NPROCS) CLASS=$(CLASS)
-IS-trace: is-trace
-is-trace: header
- cd IS-trace; $(MAKE) NPROCS=$(NPROCS) CLASS=$(CLASS)
EP: ep
ep: header
cd EP; $(MAKE) NPROCS=$(NPROCS) CLASS=$(CLASS)
-EP-trace: ep-trace
-ep-trace: header
- cd EP-trace; $(MAKE) NPROCS=$(NPROCS) CLASS=$(CLASS)
-
-EP-sampling: ep-sampling
-ep-sampling: header
- cd EP-sampling; $(MAKE) NPROCS=$(NPROCS) CLASS=$(CLASS)
-
DT: dt
dt: header
cd DT; $(MAKE) CLASS=$(CLASS)
-DT-trace: dt-trace
-dt-trace: header
- cd DT-trace; $(MAKE) CLASS=$(CLASS)
-
-DT-folding: dt-folding
-dt-folding: header
- cd DT-folding; $(MAKE) CLASS=$(CLASS)
-
# Awk script courtesy cmg@cray.com, modified by Haoqiang Jin
suite:
@ awk -f sys/suite.awk SMAKE=$(MAKE) $(SFILE) | $(SHELL)
# are defined) but on a really clean system this won't work
# because those makefiles need config/make.def
clean:
- - rm -f core
- - rm -f *~ */core */*~ */*.o */npbparams.h */*.obj */*.exe
- - rm -f MPI_dummy/test MPI_dummy/libmpi.a
+ - rm -f *~ */*~ */*.o */npbparams.h
- rm -f sys/setparams sys/makesuite sys/setparams.h
- - rm -f btio.*.out*
veryclean: clean
- rm -f config/make.def config/suite.def
- - rm -f bin/sp.* bin/lu.* bin/mg.* bin/ft.* bin/bt.* bin/is.*
- - rm -f bin/ep.* bin/cg.* bin/dt.*
+ - rm -f bin/is.* bin/ep.* bin/dt.*
header:
@ sys/print_header
+++ /dev/null
-
- subroutine print_results(name, class, n1, n2, n3, niter,
- > nprocs_compiled, nprocs_total,
- > t, mops, optype, verified, npbversion,
- > compiletime, cs1, cs2, cs3, cs4, cs5, cs6, cs7)
-
- implicit none
- character*2 name
- character*1 class
- integer n1, n2, n3, niter, nprocs_compiled, nprocs_total, j
- double precision t, mops
- character optype*24, size*15
- logical verified
- character*(*) npbversion, compiletime,
- > cs1, cs2, cs3, cs4, cs5, cs6, cs7
-
- write (*, 2) name
- 2 format(//, ' ', A2, ' Benchmark Completed.')
-
- write (*, 3) Class
- 3 format(' Class = ', 12x, a12)
-
-c If this is not a grid-based problem (EP, FT, CG), then
-c we only print n1, which contains some measure of the
-c problem size. In that case, n2 and n3 are both zero.
-c Otherwise, we print the grid size n1xn2xn3
-
- if ((n2 .eq. 0) .and. (n3 .eq. 0)) then
- if (name(1:2) .eq. 'EP') then
- write(size, '(f15.0)' ) 2.d0**n1
- j = 15
- if (size(j:j) .eq. '.') j = j - 1
- write (*,42) size(1:j)
- 42 format(' Size = ',9x, a15)
- else
- write (*,44) n1
- 44 format(' Size = ',12x, i12)
- endif
- else
- write (*, 4) n1,n2,n3
- 4 format(' Size = ',9x, i4,'x',i4,'x',i4)
- endif
-
- write (*, 5) niter
- 5 format(' Iterations = ', 12x, i12)
-
- write (*, 6) t
- 6 format(' Time in seconds = ',12x, f12.2)
-
- write (*,7) nprocs_total
- 7 format(' Total processes = ', 12x, i12)
-
- write (*,8) nprocs_compiled
- 8 format(' Compiled procs = ', 12x, i12)
-
- write (*,9) mops
- 9 format(' Mop/s total = ',12x, f12.2)
-
- write (*,10) mops/float( nprocs_total )
- 10 format(' Mop/s/process = ', 12x, f12.2)
-
- write(*, 11) optype
- 11 format(' Operation type = ', a24)
-
- if (verified) then
- write(*,12) ' SUCCESSFUL'
- else
- write(*,12) 'UNSUCCESSFUL'
- endif
- 12 format(' Verification = ', 12x, a)
-
- write(*,13) npbversion
- 13 format(' Version = ', 12x, a12)
-
- write(*,14) compiletime
- 14 format(' Compile date = ', 12x, a12)
-
-
- write (*,121) cs1
- 121 format(/, ' Compile options:', /,
- > ' MPIF77 = ', A)
-
- write (*,122) cs2
- 122 format(' FLINK = ', A)
-
- write (*,123) cs3
- 123 format(' FMPI_LIB = ', A)
-
- write (*,124) cs4
- 124 format(' FMPI_INC = ', A)
-
- write (*,125) cs5
- 125 format(' FFLAGS = ', A)
-
- write (*,126) cs6
- 126 format(' FLINKFLAGS = ', A)
-
- write(*, 127) cs7
- 127 format(' RAND = ', A)
-
- write (*,130)
- 130 format(//' Please send the results of this run to:'//
- > ' NPB Development Team '/
- > ' Internet: npb@nas.nasa.gov'/
- > ' '/
- > ' If email is not available, send this to:'//
- > ' MS T27A-1'/
- > ' NASA Ames Research Center'/
- > ' Moffett Field, CA 94035-1000'//
- > ' Fax: 650-604-3957'//)
-
-
- return
- end
-
+++ /dev/null
-c---------------------------------------------------------------------
-c---------------------------------------------------------------------
-
- double precision function randlc (x, a)
-
-c---------------------------------------------------------------------
-c---------------------------------------------------------------------
-
-c---------------------------------------------------------------------
-c
-c This routine returns a uniform pseudorandom double precision number in the
-c range (0, 1) by using the linear congruential generator
-c
-c x_{k+1} = a x_k (mod 2^46)
-c
-c where 0 < x_k < 2^46 and 0 < a < 2^46. This scheme generates 2^44 numbers
-c before repeating. The argument A is the same as 'a' in the above formula,
-c and X is the same as x_0. A and X must be odd double precision integers
-c in the range (1, 2^46). The returned value RANDLC is normalized to be
-c between 0 and 1, i.e. RANDLC = 2^(-46) * x_1. X is updated to contain
-c the new seed x_1, so that subsequent calls to RANDLC using the same
-c arguments will generate a continuous sequence.
-c
-c This routine should produce the same results on any computer with at least
-c 48 mantissa bits in double precision floating point data. On 64 bit
-c systems, double precision should be disabled.
-c
-c David H. Bailey October 26, 1990
-c
-c---------------------------------------------------------------------
-
- implicit none
-
- double precision r23,r46,t23,t46,a,x,t1,t2,t3,t4,a1,a2,x1,x2,z
- parameter (r23 = 0.5d0 ** 23, r46 = r23 ** 2, t23 = 2.d0 ** 23,
- > t46 = t23 ** 2)
-
-c---------------------------------------------------------------------
-c Break A into two parts such that A = 2^23 * A1 + A2.
-c---------------------------------------------------------------------
- t1 = r23 * a
- a1 = int (t1)
- a2 = a - t23 * a1
-
-c---------------------------------------------------------------------
-c Break X into two parts such that X = 2^23 * X1 + X2, compute
-c Z = A1 * X2 + A2 * X1 (mod 2^23), and then
-c X = 2^23 * Z + A2 * X2 (mod 2^46).
-c---------------------------------------------------------------------
- t1 = r23 * x
- x1 = int (t1)
- x2 = x - t23 * x1
- t1 = a1 * x2 + a2 * x1
- t2 = int (r23 * t1)
- z = t1 - t23 * t2
- t3 = t23 * z + a2 * x2
- t4 = int (r46 * t3)
- x = t3 - t46 * t4
- randlc = r46 * x
-
- return
- end
-
-
-
-
-c---------------------------------------------------------------------
-c---------------------------------------------------------------------
-
- subroutine vranlc (n, x, a, y)
-
-c---------------------------------------------------------------------
-c---------------------------------------------------------------------
-
-c---------------------------------------------------------------------
-c
-c This routine generates N uniform pseudorandom double precision numbers in
-c the range (0, 1) by using the linear congruential generator
-c
-c x_{k+1} = a x_k (mod 2^46)
-c
-c where 0 < x_k < 2^46 and 0 < a < 2^46. This scheme generates 2^44 numbers
-c before repeating. The argument A is the same as 'a' in the above formula,
-c and X is the same as x_0. A and X must be odd double precision integers
-c in the range (1, 2^46). The N results are placed in Y and are normalized
-c to be between 0 and 1. X is updated to contain the new seed, so that
-c subsequent calls to VRANLC using the same arguments will generate a
-c continuous sequence. If N is zero, only initialization is performed, and
-c the variables X, A and Y are ignored.
-c
-c This routine is the standard version designed for scalar or RISC systems.
-c However, it should produce the same results on any single processor
-c computer with at least 48 mantissa bits in double precision floating point
-c data. On 64 bit systems, double precision should be disabled.
-c
-c---------------------------------------------------------------------
-
- implicit none
-
- integer i,n
- double precision y,r23,r46,t23,t46,a,x,t1,t2,t3,t4,a1,a2,x1,x2,z
- dimension y(*)
- parameter (r23 = 0.5d0 ** 23, r46 = r23 ** 2, t23 = 2.d0 ** 23,
- > t46 = t23 ** 2)
-
-
-c---------------------------------------------------------------------
-c Break A into two parts such that A = 2^23 * A1 + A2.
-c---------------------------------------------------------------------
- t1 = r23 * a
- a1 = int (t1)
- a2 = a - t23 * a1
-
-c---------------------------------------------------------------------
-c Generate N results. This loop is not vectorizable.
-c---------------------------------------------------------------------
- do i = 1, n
-
-c---------------------------------------------------------------------
-c Break X into two parts such that X = 2^23 * X1 + X2, compute
-c Z = A1 * X2 + A2 * X1 (mod 2^23), and then
-c X = 2^23 * Z + A2 * X2 (mod 2^46).
-c---------------------------------------------------------------------
- t1 = r23 * x
- x1 = int (t1)
- x2 = x - t23 * x1
- t1 = a1 * x2 + a2 * x1
- t2 = int (r23 * t1)
- z = t1 - t23 * t2
- t3 = t23 * z + a2 * x2
- t4 = int (r46 * t3)
- x = t3 - t46 * t4
- y(i) = r46 * x
- enddo
-
- return
- end
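The header comments above fully specify the generator: x_{k+1} = a x_k (mod 2^46), with the 23-bit splitting only needed to avoid overflow in 48-bit floating-point arithmetic. A hypothetical Python sketch of the same recurrence (not part of the deleted source), where arbitrary-precision integers make the splitting unnecessary:

```python
# Hypothetical sketch of the NPB linear congruential generator
# x_{k+1} = a * x_k (mod 2^46), returning values normalized to (0, 1).
MASK46 = (1 << 46) - 1   # plays the role of "mod 2^46"
D2M46 = 0.5 ** 46        # 2^(-46), normalizes the seed into (0, 1)

def randlc(x, a):
    """Advance the seed once; return (value in (0, 1), new seed)."""
    x = (x * a) & MASK46
    return D2M46 * x, x

def vranlc(n, x, a):
    """Generate n successive values; return (list, new seed)."""
    y = []
    for _ in range(n):
        v, x = randlc(x, a)
        y.append(v)
    return y, x
```

As in the Fortran version, calling randlc repeatedly with the returned seed produces a continuous sequence, and odd seeds stay odd because the product of two odd integers is odd.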
+++ /dev/null
-c---------------------------------------------------------------------
- double precision function randlc (x, a)
-c---------------------------------------------------------------------
-
-c---------------------------------------------------------------------
-c
-c This routine returns a uniform pseudorandom double precision number in the
-c range (0, 1) by using the linear congruential generator
-c
-c x_{k+1} = a x_k (mod 2^46)
-c
-c where 0 < x_k < 2^46 and 0 < a < 2^46. This scheme generates 2^44 numbers
-c before repeating. The argument A is the same as 'a' in the above formula,
-c and X is the same as x_0. A and X must be odd double precision integers
-c in the range (1, 2^46). The returned value RANDLC is normalized to be
-c between 0 and 1, i.e. RANDLC = 2^(-46) * x_1. X is updated to contain
-c the new seed x_1, so that subsequent calls to RANDLC using the same
-c arguments will generate a continuous sequence.
-c
-c This routine should produce the same results on any computer with at least
-c 48 mantissa bits in double precision floating point data. On 64 bit
-c systems, double precision should be disabled.
-c
-c David H. Bailey October 26, 1990
-c
-c---------------------------------------------------------------------
-
- implicit none
-
- double precision r23,r46,t23,t46,a,x,t1,t2,t3,t4,a1,a2,x1,x2,z
- parameter (r23 = 0.5d0 ** 23, r46 = r23 ** 2, t23 = 2.d0 ** 23,
- > t46 = t23 ** 2)
-
-c---------------------------------------------------------------------
-c Break A into two parts such that A = 2^23 * A1 + A2.
-c---------------------------------------------------------------------
- t1 = r23 * a
- a1 = int (t1)
- a2 = a - t23 * a1
-
-c---------------------------------------------------------------------
-c Break X into two parts such that X = 2^23 * X1 + X2, compute
-c Z = A1 * X2 + A2 * X1 (mod 2^23), and then
-c X = 2^23 * Z + A2 * X2 (mod 2^46).
-c---------------------------------------------------------------------
- t1 = r23 * x
- x1 = int (t1)
- x2 = x - t23 * x1
-
-
- t1 = a1 * x2 + a2 * x1
- t2 = int (r23 * t1)
- z = t1 - t23 * t2
- t3 = t23 * z + a2 * x2
- t4 = int (r46 * t3)
- x = t3 - t46 * t4
- randlc = r46 * x
- return
- end
-
-
-
-c---------------------------------------------------------------------
-c---------------------------------------------------------------------
-
- subroutine vranlc (n, x, a, y)
-
-c---------------------------------------------------------------------
-c---------------------------------------------------------------------
-
-c---------------------------------------------------------------------
-c This routine generates N uniform pseudorandom double precision numbers in
-c the range (0, 1) by using the linear congruential generator
-c
-c x_{k+1} = a x_k (mod 2^46)
-c
-c where 0 < x_k < 2^46 and 0 < a < 2^46. This scheme generates 2^44 numbers
-c before repeating. The argument A is the same as 'a' in the above formula,
-c and X is the same as x_0. A and X must be odd double precision integers
-c in the range (1, 2^46). The N results are placed in Y and are normalized
-c to be between 0 and 1. X is updated to contain the new seed, so that
-c subsequent calls to RANDLC using the same arguments will generate a
-c continuous sequence.
-c
-c This routine generates the output sequence in batches of length NV, for
-c convenience on vector computers. This routine should produce the same
-c results on any computer with at least 48 mantissa bits in double precision
-c floating point data. On Cray systems, double precision should be disabled.
-c
-c David H. Bailey August 30, 1990
-c---------------------------------------------------------------------
-
- integer n
- double precision x, a, y(*)
-
- double precision r23, r46, t23, t46
- integer nv
- parameter (r23 = 2.d0 ** (-23), r46 = r23 * r23, t23 = 2.d0 ** 23,
- > t46 = t23 * t23, nv = 64)
- double precision xv(nv), t1, t2, t3, t4, an, a1, a2, x1, x2, yy
- integer n1, i, j
- external randlc
- double precision randlc
-
-c---------------------------------------------------------------------
-c Compute the first NV elements of the sequence using RANDLC.
-c---------------------------------------------------------------------
- t1 = x
- n1 = min (n, nv)
-
- do i = 1, n1
- xv(i) = t46 * randlc (t1, a)
- enddo
-
-c---------------------------------------------------------------------
-c It is not necessary to compute AN, A1 or A2 unless N is greater than NV.
-c---------------------------------------------------------------------
- if (n .gt. nv) then
-
-c---------------------------------------------------------------------
-c Compute AN = AA ^ NV (mod 2^46) using successive calls to RANDLC.
-c---------------------------------------------------------------------
- t1 = a
- t2 = r46 * a
-
- do i = 1, nv - 1
- t2 = randlc (t1, a)
- enddo
-
- an = t46 * t2
-
-c---------------------------------------------------------------------
-c Break AN into two parts such that AN = 2^23 * A1 + A2.
-c---------------------------------------------------------------------
- t1 = r23 * an
- a1 = aint (t1)
- a2 = an - t23 * a1
- endif
-
-c---------------------------------------------------------------------
-c Compute N pseudorandom results in batches of size NV.
-c---------------------------------------------------------------------
- do j = 0, n - 1, nv
- n1 = min (nv, n - j)
-
-c---------------------------------------------------------------------
-c Compute up to NV results based on the current seed vector XV.
-c---------------------------------------------------------------------
- do i = 1, n1
- y(i+j) = r46 * xv(i)
- enddo
-
-c---------------------------------------------------------------------
-c If this is the last pass through the 140 loop, it is not necessary to
-c update the XV vector.
-c---------------------------------------------------------------------
- if (j + n1 .eq. n) goto 150
-
-c---------------------------------------------------------------------
-c Update the XV vector by multiplying each element by AN (mod 2^46).
-c---------------------------------------------------------------------
- do i = 1, nv
- t1 = r23 * xv(i)
- x1 = aint (t1)
- x2 = xv(i) - t23 * x1
- t1 = a1 * x2 + a2 * x1
- t2 = aint (r23 * t1)
- yy = t1 - t23 * t2
- t3 = t23 * yy + a2 * x2
- t4 = aint (r46 * t3)
- xv(i) = t3 - t46 * t4
- enddo
-
- enddo
-
-c---------------------------------------------------------------------
-c Save the last seed in X so that subsequent calls to VRANLC will generate
-c a continuous sequence.
-c---------------------------------------------------------------------
- 150 x = xv(n1)
-
- return
- end
-
-c----- end of program ------------------------------------------------
-
+++ /dev/null
- double precision function randlc(x, a)
-
-c---------------------------------------------------------------------
-c
-c This routine returns a uniform pseudorandom double precision number in the
-c range (0, 1) by using the linear congruential generator
-c
-c x_{k+1} = a x_k (mod 2^46)
-c
-c where 0 < x_k < 2^46 and 0 < a < 2^46. This scheme generates 2^44 numbers
-c before repeating. The argument A is the same as 'a' in the above formula,
-c and X is the same as x_0. A and X must be odd double precision integers
-c in the range (1, 2^46). The returned value RANDLC is normalized to be
-c between 0 and 1, i.e. RANDLC = 2^(-46) * x_1. X is updated to contain
-c the new seed x_1, so that subsequent calls to RANDLC using the same
-c arguments will generate a continuous sequence.
-
- implicit none
- double precision x, a
- integer*8 i246m1, Lx, La
- double precision d2m46
-
- parameter(d2m46=0.5d0**46)
-
- save i246m1
- data i246m1/X'00003FFFFFFFFFFF'/
-
- Lx = X
- La = A
-
- Lx = iand(Lx*La,i246m1)
- randlc = d2m46*dble(Lx)
- x = dble(Lx)
- return
- end
-
-
-c---------------------------------------------------------------------
-c---------------------------------------------------------------------
-
-
- SUBROUTINE VRANLC (N, X, A, Y)
- implicit none
- integer n, i
- double precision x, a, y(*)
- integer*8 i246m1, Lx, La
- double precision d2m46
-
-c This doesn't work, because the compiler does the calculation in 32
-c bits and overflows. No standard way (without f90 stuff) to specify
-c that the rhs should be done in 64 bit arithmetic.
-c parameter(i246m1=2**46-1)
-
- parameter(d2m46=0.5d0**46)
-
- save i246m1
- data i246m1/X'00003FFFFFFFFFFF'/
-
-c Note that the v6 compiler on an R8000 does something stupid with
-c the above. Using the following instead (or various other things)
-c makes the calculation run almost 10 times as fast.
-c
-c save d2m46
-c data d2m46/0.0d0/
-c if (d2m46 .eq. 0.0d0) then
-c d2m46 = 0.5d0**46
-c endif
-
- Lx = X
- La = A
- do i = 1, N
- Lx = iand(Lx*La,i246m1)
- y(i) = d2m46*dble(Lx)
- end do
- x = dble(Lx)
-
- return
- end
-
+++ /dev/null
- double precision function randlc(x, a)
-
-c---------------------------------------------------------------------
-c
-c This routine returns a uniform pseudorandom double precision number in the
-c range (0, 1) by using the linear congruential generator
-c
-c x_{k+1} = a x_k (mod 2^46)
-c
-c where 0 < x_k < 2^46 and 0 < a < 2^46. This scheme generates 2^44 numbers
-c before repeating. The argument A is the same as 'a' in the above formula,
-c and X is the same as x_0. A and X must be odd double precision integers
-c in the range (1, 2^46). The returned value RANDLC is normalized to be
-c between 0 and 1, i.e. RANDLC = 2^(-46) * x_1. X is updated to contain
-c the new seed x_1, so that subsequent calls to RANDLC using the same
-c arguments will generate a continuous sequence.
-
- implicit none
- double precision x, a
- integer*8 Lx, La, a1, a2, x1, x2, xa
- double precision d2m46
- parameter(d2m46=0.5d0**46)
-
- Lx = x
- La = A
- a1 = ibits(La, 23, 23)
- a2 = ibits(La, 0, 23)
- x1 = ibits(Lx, 23, 23)
- x2 = ibits(Lx, 0, 23)
- xa = ishft(ibits(a1*x2+a2*x1, 0, 23), 23) + a2*x2
- Lx = ibits(xa,0, 46)
- x = dble(Lx)
- randlc = d2m46*x
- return
- end
-
-
-c---------------------------------------------------------------------
-c---------------------------------------------------------------------
-
-
- SUBROUTINE VRANLC (N, X, A, Y)
- implicit none
- integer n, i
- double precision x, a, y(*)
- integer*8 Lx, La, a1, a2, x1, x2, xa
- double precision d2m46
- parameter(d2m46=0.5d0**46)
-
- Lx = X
- La = A
- a1 = ibits(La, 23, 23)
- a2 = ibits(La, 0, 23)
- do i = 1, N
- x1 = ibits(Lx, 23, 23)
- x2 = ibits(Lx, 0, 23)
- xa = ishft(ibits(a1*x2+a2*x1, 0, 23), 23) + a2*x2
- Lx = ibits(xa,0, 46)
- y(i) = d2m46*dble(Lx)
- end do
- x = dble(Lx)
- return
- end
-
+++ /dev/null
-c---------------------------------------------------------------------
-c---------------------------------------------------------------------
-
- subroutine timer_clear(n)
-
-c---------------------------------------------------------------------
-c---------------------------------------------------------------------
-
- implicit none
- integer n
-
- double precision start(64), elapsed(64)
- common /tt/ start, elapsed
-
- elapsed(n) = 0.0
- return
- end
-
-
-c---------------------------------------------------------------------
-c---------------------------------------------------------------------
-
- subroutine timer_start(n)
-
-c---------------------------------------------------------------------
-c---------------------------------------------------------------------
-
- implicit none
- integer n
- include 'mpif.h'
- double precision start(64), elapsed(64)
- common /tt/ start, elapsed
-
- start(n) = MPI_Wtime()
-
- return
- end
-
-
-c---------------------------------------------------------------------
-c---------------------------------------------------------------------
-
- subroutine timer_stop(n)
-
-c---------------------------------------------------------------------
-c---------------------------------------------------------------------
-
- implicit none
- integer n
- include 'mpif.h'
- double precision start(64), elapsed(64)
- common /tt/ start, elapsed
- double precision t, now
- now = MPI_Wtime()
- t = now - start(n)
- elapsed(n) = elapsed(n) + t
-
- return
- end
-
-
-c---------------------------------------------------------------------
-c---------------------------------------------------------------------
-
- double precision function timer_read(n)
-
-c---------------------------------------------------------------------
-c---------------------------------------------------------------------
-
- implicit none
- integer n
- double precision start(64), elapsed(64)
- common /tt/ start, elapsed
-
- timer_read = elapsed(n)
- return
- end
-
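The timer interface above keeps per-slot start times and accumulated elapsed times, so repeated start/stop pairs on the same slot add up. A hypothetical Python sketch of the same interface (not part of the deleted source), substituting time.perf_counter() for MPI_Wtime():

```python
# Hypothetical sketch of the timer_clear/start/stop/read interface,
# keyed by slot number, accumulating elapsed time across start/stop pairs.
import time

_start = {}
_elapsed = {}

def timer_clear(n):
    _elapsed[n] = 0.0

def timer_start(n):
    _start[n] = time.perf_counter()

def timer_stop(n):
    # Accumulate, so several start/stop pairs on one slot sum up.
    _elapsed[n] = _elapsed.get(n, 0.0) + time.perf_counter() - _start[n]

def timer_read(n):
    return _elapsed.get(n, 0.0)
```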
+++ /dev/null
-This directory contains examples of make.def files that were used
-by the NPB team in testing the benchmarks on different platforms.
-They can be used as starting points for make.def files for your
-own platform, but you may need to tailor them for best performance
-on your installation. A clean template can be found in directory
-`config'.
-Some examples of suite.def files are also provided.
\ No newline at end of file
+++ /dev/null
-#This is for a DEC Alpha 8400. The code will execute on a
-#single processor
-#Warning: parallel make does not work properly in general
-MPIF77 = f77
-FLINK = f77
-#Optimization -O5 breaks SP; works fine for all other codes
-FFLAGS = -O4
-
-MPICC = cc
-CLINK = cc
-CFLAGS = -O5
-
-include ../config/make.dummy
-
-CC = cc -g
-BINDIR = ../bin
-
-RAND = randi8
+++ /dev/null
-#This is for a generic single-processor SGI workstation
-MPIF77 = f77
-FLINK = f77
-FFLAGS = -O3
-
-MPICC = cc
-CLINK = cc
-CFLAGS = -O3
-
-include ../config/make.dummy
-
-CC = cc -g
-BINDIR = ../bin
-
-RAND = randi8
-
+++ /dev/null
-# This is for an SGI Origin 2000 or 3000 with vendor MPI. The Fortran
-# record length is specified, so it can be used for the I/O benchmark
-# as well.
-MPIF77 = f77
-FMPI_LIB = -lmpi
-FLINK = f77 -64
-FFLAGS = -O3 -64
-
-MPICC = cc
-CMPI_LIB = -lmpi
-CLINK = cc
-CFLAGS = -O3
-
-CC = cc -g
-BINDIR = ../bin
-
-RAND = randi8
-
-CONVERTFLAG = -DFORTRAN_REC_SIZE=4
-
+++ /dev/null
-# This is for the SGI PowerChallenge Array at NASA Ames. mrf77 and
-# mrcc are local scripts that invoke the proper MPI library.
-MPIF77 = mrf77
-FLINK = mrf77
-FFLAGS = -O3 -OPT:fold_arith_limit=1204
-
-MPICC = mrcc
-CLINK = mrcc
-CFLAGS = -O3 -OPT:fold_arith_limit=1204
-
-CC = cc -g
-BINDIR = ../bin
-
-RAND = randi8
-
-
+++ /dev/null
-#This is for the IBM SP2 at Ames; mrf77 and mrcc are local scripts
-MPIF77 = mrf77
-FLINK = mrf77
-FFLAGS = -O3
-FLINKFLAGS = -bmaxdata:0x60000000
-
-MPICC = mrcc
-CLINK = mrcc
-CFLAGS = -O3
-CLINKFLAGS = -bmaxdata:0x60000000
-
-CC = cc -g
-
-BINDIR = ../bin
-
-RAND = randi8
-
+++ /dev/null
-# This is for a Sun SparcCenter or UltraEnterprise machine
-MPIF77 = f77
-FLINK = f77
-FMPI_LIB = -L<your mpich installation tree>/lib/solaris/ch_lfshmem -lmpi
-FMPI_INC = -I<your mpich installation tree>/include
-# sparc10,20 SparcCenter{1,2}000 (uname -m returns sun4m)
-# and f77 -V returns 4.0 or greater
-# FFLAGS = -fast -xtarget=super -xO4 -depend
-# Ultra1,2, UltraEnterprise servers (uname -m returns sun4u)
-FFLAGS = -fast -xtarget=ultra -xarch=v8plus -xO4 -depend
-FLINKFLAGS = -lmopt -lcopt -lsunmath
-
-MPICC = cc
-CLINK = cc
-CMPI_LIB = -L<your mpich installation tree>/lib/solaris/ch_lfshmem -lmpi
-CMPI_INC = -I<your mpich installation tree>/include
-# sparc10,20 SparcCenter{1,2}000 (uname -m returns sun4m)
-# and cc -V returns 4.0 or greater
-#CFLAGS = -fast -xtarget=super -xO4 -xdepend
-# Ultra1,2, UltraEnterprise servers (uname -m returns sun4u)
-CFLAGS = -fast -xtarget=ultra -xarch=v8plus -xO4 -xdepend
-CLINKFLAGS = -fast
-
-CC = cc -g
-
-BINDIR = ../bin
-
-# Cannot use randi8 or randi8-safe on a 32-bit machine. Use double precision
-RAND = randdp
-
+++ /dev/null
-#This is for the Cray T3D at the Jet Propulsion Laboratory
-MPIF77 = cf77
-FLINK = cf77
-FMPI_LIB = -L/usr/local/mpp/lib -lmpi
-FMPI_INC = -I/usr/local/mpp/lib/include/mpp
-FFLAGS = -dp -Wf-onoieeedivide -C cray-t3d
-#The following flags provide more effective optimization, but may
-#cause the random number generator randi8(_safe) to break in EP
-#FFLAGS = -dp -Wf-oaggress -Wf-onoieeedivide -C cray-t3d
-FLINKFLAGS = -Wl-Drdahead=on -C cray-t3d
-
-MPICC = cc
-CLINK = cc
-CMPI_LIB = -L/usr/local/mpp/lib -lmpi
-CMPI_INC = -I/usr/local/mpp/lib/include/mpp
-CFLAGS = -O3 -Tcray-t3d
-CLINKFLAGS = -Tcray-t3d
-
-CC = cc -g -Tcray-ymp
-BINDIR = ../bin
-
-CONVERTFLAG= -DCONVERTDOUBLE
-
-RAND = randi8
-
+++ /dev/null
-#---------------------------------------------------------------------------
-#
-# SITE- AND/OR PLATFORM-SPECIFIC DEFINITIONS.
-#
-#---------------------------------------------------------------------------
-
-#---------------------------------------------------------------------------
-# Items in this file will need to be changed for each platform.
-# (Note these definitions are inconsistent with NPB2.1.)
-#---------------------------------------------------------------------------
-
-#---------------------------------------------------------------------------
-# Parallel Fortran:
-#
-# For CG, EP, FT, MG, LU, SP and BT, which are in Fortran, the following must
-# be defined:
-#
-# MPIF77 - Fortran compiler
-# FFLAGS - Fortran compilation arguments
-# FMPI_INC - any -I arguments required for compiling MPI/Fortran
-# FLINK - Fortran linker
-# FLINKFLAGS - Fortran linker arguments
-# FMPI_LIB - any -L and -l arguments required for linking MPI/Fortran
-#
-# compilations are done with $(MPIF77) $(FMPI_INC) $(FFLAGS) or
-# $(MPIF77) $(FFLAGS)
-# linking is done with $(FLINK) $(FMPI_LIB) $(FLINKFLAGS)
-#---------------------------------------------------------------------------
-
-#---------------------------------------------------------------------------
-# This is the fortran compiler used for MPI programs
-#---------------------------------------------------------------------------
-MPIF77 = mpif77
-# This links MPI fortran programs; usually the same as ${MPIF77}
-FLINK = $(MPIF77)
-
-#---------------------------------------------------------------------------
-# These macros are passed to the linker to help link with MPI correctly
-#---------------------------------------------------------------------------
-FMPI_LIB =
-
-#---------------------------------------------------------------------------
-# These macros are passed to the compiler to help find 'mpif.h'
-#---------------------------------------------------------------------------
-FMPI_INC =
-
-#---------------------------------------------------------------------------
-# Global *compile time* flags for Fortran programs
-#---------------------------------------------------------------------------
-FFLAGS = -fast
-# FFLAGS = -g
-
-#---------------------------------------------------------------------------
-# Global *link time* flags. Flags for increasing maximum executable
-# size usually go here.
-#---------------------------------------------------------------------------
-FLINKFLAGS = -fast
-
-
-#---------------------------------------------------------------------------
-# Parallel C:
-#
-# For IS, which is in C, the following must be defined:
-#
-# MPICC - C compiler
-# CFLAGS - C compilation arguments
-# CMPI_INC - any -I arguments required for compiling MPI/C
-# CLINK - C linker
-# CLINKFLAGS - C linker flags
-# CMPI_LIB - any -L and -l arguments required for linking MPI/C
-#
-# compilations are done with $(MPICC) $(CMPI_INC) $(CFLAGS) or
-# $(MPICC) $(CFLAGS)
-# linking is done with $(CLINK) $(CMPI_LIB) $(CLINKFLAGS)
-#---------------------------------------------------------------------------
-
-#---------------------------------------------------------------------------
-# This is the C compiler used for MPI programs
-#---------------------------------------------------------------------------
-MPICC = mpicc
-# This links MPI C programs; usually the same as ${MPICC}
-CLINK = $(MPICC)
-
-#---------------------------------------------------------------------------
-# These macros are passed to the linker to help link with MPI correctly
-#---------------------------------------------------------------------------
-CMPI_LIB =
-
-#---------------------------------------------------------------------------
-# These macros are passed to the compiler to help find 'mpi.h'
-#---------------------------------------------------------------------------
-CMPI_INC =
-
-#---------------------------------------------------------------------------
-# Global *compile time* flags for C programs
-#---------------------------------------------------------------------------
-CFLAGS = -fast
-# CFLAGS = -g
-
-#---------------------------------------------------------------------------
-# Global *link time* flags. Flags for increasing maximum executable
-# size usually go here.
-#---------------------------------------------------------------------------
-CLINKFLAGS = -fast
-
-
-#---------------------------------------------------------------------------
-# MPI dummy library:
-#
-# Uncomment if you want to use the MPI dummy library supplied by NAS instead
-# of the true message-passing library. The include file redefines several of
-# the above macros. It also invokes make in subdirectory MPI_dummy. Make
-# sure that no spaces or tabs precede include.
-#---------------------------------------------------------------------------
-# include ../config/make.dummy
-
-
-#---------------------------------------------------------------------------
-# Utilities C:
-#
-# This is the C compiler used to compile C utilities. Flags required by
-# this compiler go here also; typically there are few flags required; hence
-# there are no separate macros provided for such flags.
-#---------------------------------------------------------------------------
-CC = cc -g
-
-
-#---------------------------------------------------------------------------
-# Destination of executables, relative to subdirs of the main directory. .
-#---------------------------------------------------------------------------
-BINDIR = ../bin
-
-
-#---------------------------------------------------------------------------
-# Some machines (e.g. Crays) have 128-bit DOUBLE PRECISION numbers, which
-# is twice the precision required for the NPB suite. A compiler flag
-# (e.g. -dp) can usually be used to change DOUBLE PRECISION variables to
-# 64 bits, but the MPI library may continue to send 128 bits. Short of
-# recompiling MPI, the solution is to use MPI_REAL to send these 64-bit
-# numbers, and MPI_COMPLEX to send their complex counterparts. Uncomment
-# the following line to enable this substitution.
-#
-# NOTE: IF THE I/O BENCHMARK IS BEING BUILT, WE USE CONVERTFLAG TO
-# SPECIFIY THE FORTRAN RECORD LENGTH UNIT. IT IS A SYSTEM-SPECIFIC
-# VALUE (USUALLY 1 OR 4). UNCOMMENT THE SECOND LINE AND SUBSTITUTE
-# THE CORRECT VALUE FOR "length".
-# IF BOTH 128-BIT DOUBLE PRECISION NUMBERS AND I/O ARE TO BE ENABLED,
-# UNCOMMENT THE THIRD LINE AND SUBSTITUTE THE CORRECT VALUE FOR
-# "length"
-#---------------------------------------------------------------------------
-# CONVERTFLAG = -DCONVERTDOUBLE
-CONVERTFLAG = -DFORTRAN_REC_SIZE=1
-# CONVERTFLAG = -DCONVERTDOUBLE -DFORTRAN_REC_SIZE=length
-
-
-#---------------------------------------------------------------------------
-# The variable RAND controls which random number generator
-# is used. It is described in detail in Doc/README.install.
-# Use "randi8" unless there is a reason to use another one.
-# Other allowed values are "randi8_safe", "randdp" and "randdpvec"
-#---------------------------------------------------------------------------
-RAND = randi8
-# The following is highly reliable but may be slow:
-# RAND = randdp
-
+++ /dev/null
-bt S 1
-bt S 4
-bt S 9
-bt S 16
-bt A 1
-bt A 4
-bt A 9
-bt A 16
-bt A 25
-bt A 36
-bt A 49
-bt A 64
-bt A 81
-bt A 100
-bt A 121
-bt B 1
-bt B 4
-bt B 9
-bt B 16
-bt B 25
-bt B 36
-bt B 49
-bt B 64
-bt B 81
-bt B 100
-bt B 121
-bt C 1
-bt C 4
-bt C 9
-bt C 16
-bt C 25
-bt C 36
-bt C 49
-bt C 64
-bt C 81
-bt C 100
-bt C 121
+++ /dev/null
-cg S 1
-cg S 2
-cg S 4
-cg S 8
-cg S 16
-cg A 1
-cg A 2
-cg A 4
-cg A 8
-cg A 16
-cg A 32
-cg A 64
-cg A 128
-cg B 1
-cg B 2
-cg B 4
-cg B 8
-cg B 16
-cg B 32
-cg B 64
-cg B 128
-cg C 1
-cg C 2
-cg C 4
-cg C 8
-cg C 16
-cg C 32
-cg C 64
-cg C 128
+++ /dev/null
-ft S 1
-ft S 2
-ft S 4
-ft S 8
-ft S 16
-ft A 1
-ft A 2
-ft A 4
-ft A 8
-ft A 16
-ft A 32
-ft A 64
-ft A 128
-ft B 1
-ft B 2
-ft B 4
-ft B 8
-ft B 16
-ft B 32
-ft B 64
-ft B 128
-ft C 1
-ft C 2
-ft C 4
-ft C 8
-ft C 16
-ft C 32
-ft C 64
-ft C 128
+++ /dev/null
-lu S 1
-lu S 2
-lu S 4
-lu S 8
-lu S 16
-lu A 1
-lu A 2
-lu A 4
-lu A 8
-lu A 16
-lu A 32
-lu A 64
-lu A 128
-lu B 1
-lu B 2
-lu B 4
-lu B 8
-lu B 16
-lu B 32
-lu B 64
-lu B 128
-lu C 1
-lu C 2
-lu C 4
-lu C 8
-lu C 16
-lu C 32
-lu C 64
-lu C 128
+++ /dev/null
-mg S 1
-mg S 2
-mg S 4
-mg S 8
-mg S 16
-mg A 1
-mg A 2
-mg A 4
-mg A 8
-mg A 16
-mg A 32
-mg A 64
-mg A 128
-mg B 1
-mg B 2
-mg B 4
-mg B 8
-mg B 16
-mg B 32
-mg B 64
-mg B 128
-mg C 1
-mg C 2
-mg C 4
-mg C 8
-mg C 16
-mg C 32
-mg C 64
-mg C 128
+++ /dev/null
-bt S 1
-cg S 1
-ep S 1
-ft S 1
-is S 1
-lu S 1
-mg S 1
-sp S 1
+++ /dev/null
-sp S 1
-sp S 4
-sp S 9
-sp S 16
-sp A 1
-sp A 4
-sp A 9
-sp A 16
-sp A 25
-sp A 36
-sp A 49
-sp A 64
-sp A 81
-sp A 100
-sp A 121
-sp B 1
-sp B 4
-sp B 9
-sp B 16
-sp B 25
-sp B 36
-sp B 49
-sp B 64
-sp B 81
-sp B 100
-sp B 121
-sp C 1
-sp C 4
-sp C 9
-sp C 16
-sp C 25
-sp C 36
-sp C 49
-sp C 64
-sp C 81
-sp C 100
-sp C 121
+++ /dev/null
-FMPI_LIB = -L../MPI_dummy -lmpi
-FMPI_INC = -I../MPI_dummy
-CMPI_LIB = -L../MPI_dummy -lmpi
-CMPI_INC = -I../MPI_dummy
-default:: ${PROGRAM} libmpi.a
-libmpi.a:
- cd ../MPI_dummy; $(MAKE) F77=$(MPIF77) CC=$(MPICC)
# Typing "make suite" in the main directory will build all the benchmarks
# specified in this file.
# Each line of this file contains a benchmark name, class, and number
-# of nodes. The name is one of "cg", "is", "ep", mg", "ft", "sp", "bt",
-# "lu", and "dt".
+# of nodes. The name is one of "is", "ep", and "dt".
# The class is one of "S", "W", "A", "B", "C", "D", and "E"
# (except that no classes C, D and E for DT, and no class E for IS).
# The number of nodes must be a legal number for a particular
# Comments start with "#" as the first character on a line.
# No blank lines.
# The following example builds 1 processor sample sizes of all benchmarks.
-ft S 1
-mg S 1
-sp S 1
-lu S 1
-bt S 1
is S 1
ep S 1
-cg S 1
dt S 1
##-------------------------------- DEFAULT APPLICATION --------------------------------------
-APPLICATIONTMP=$(echo ${PROC_ARGS}|cut -d' ' -f2)
+APPLICATIONTMP=$(echo ${PROC_ARGS}|cut -d' ' -f2 -s)
cat > ${APPLICATIONTMP} <<APPLICATIONHEAD
<?xml version='1.0'?>
p Test the replay with multiple instances
p first generate the deployment file
-$ ${srcdir:=.}/generate_multiple_deployment.sh -platform ${srcdir:=.}/../../platforms/small_platform_with_routers.xml -hostfile ${srcdir:=.}/../hostfile ${srcdir:=.}/description_file deployment.xml
+$ ${srcdir:=.}/generate_multiple_deployment.sh -platform ${srcdir:=.}/../../platforms/small_platform_with_routers.xml -hostfile ${srcdir:=.}/../hostfile ${srcdir:=.}/description_file ${srcdir:=.}/deployment.xml
! timeout 120
-$ ./replay_multiple description_file ${srcdir:=.}/../../platforms/small_platform_with_routers.xml deployment.xml --log=smpi.:info
+$ ./replay_multiple description_file ${srcdir:=.}/../../platforms/small_platform_with_routers.xml ${srcdir:=.}/deployment.xml --log=smpi.:info
> [0.000000] [msg_test/INFO] Initializing instance 1 of size 32
> [0.000000] [msg_test/INFO] Initializing instance 2 of size 32
> [0.000000] [smpi_kernel/INFO] You did not set the power of the host running the simulation. The timings will certainly not be accurate. Use the option "--cfg=smpi/running_power:<flops>" to set its value.Check http://simgrid.org/simgrid/latest/doc/options.html#options_smpi_bench for more information.
class Host;
}
namespace surf {
+ class Resource;
class Cpu;
class NetCard;
class As;
}
typedef simgrid::s4u::Host simgrid_Host;
+typedef simgrid::surf::As surf_As;
typedef simgrid::surf::Cpu surf_Cpu;
typedef simgrid::surf::NetCard surf_NetCard;
-typedef simgrid::surf::As surf_As;
typedef simgrid::surf::Link Link;
+typedef simgrid::surf::Resource surf_Resource;
typedef simgrid::trace_mgr::future_evt_set sg_future_evt_set;
#else
typedef struct simgrid_Host simgrid_Host;
+typedef struct surf_As surf_As;
typedef struct surf_Cpu surf_Cpu;
typedef struct surf_NetCard surf_NetCard;
-typedef struct surf_As surf_As;
+typedef struct surf_Resource surf_Resource;
typedef struct Link Link;
typedef struct future_evt_set sg_future_evt_set;
#endif
typedef simgrid_Host* sg_host_t;
+typedef surf_As *AS_t;
typedef surf_Cpu *surf_cpu_t;
typedef surf_NetCard *sg_netcard_t;
-typedef surf_As *AS_t;
+typedef surf_Resource *sg_resource_t;
typedef sg_future_evt_set *sg_future_evt_set_t;
// Types which are in fact dictelmt:
/**
* @addtogroup XBT_cunit
- * @brief Unit test mechanism (to test a set of functions)
+ * @brief Unit testing implementation (see @ref inside_tests_add_units)
*
* This module is mainly intended to allow the tests of SimGrid
* itself and may lack the level of genericity that you would expect
* feature of SimGrid (and this code is sufficient to cover our
* needs, actually, so why should we bother switching?)
*
- * Note that if you want to test a full binary (such as an example),
- * you want to use our integration testing mechanism, not our unit
- * testing one. Please refer to Section \ref
- * inside_cmake_addtest_integration
 + * Unit testing is not intended for writing integration tests.
+ * Please refer to \ref inside_tests_add_integration for that instead.
*
- * Some more information on our unit testing is available in Section @ref inside_cmake_addtest_unit.
- *
- * All code intended to be executed as a unit test will be extracted
- * by a script (tools/sg_unit_extract.pl), and must thus be protected
- * between preprocessor definitions, as follows. Note that
- * SIMGRID_TEST string must appear on the endif line too for the
- * script to work, and that this script does not allow to have more
- * than one suite per file. For now, but patches are naturally
- * welcome.
- *
-@verbatim
-#ifdef SIMGRID_TEST
-
-<your code>
-
-#endif // SIMGRID_TEST
-@endverbatim
*
*
* @{
/** @brief Provide informations about the suite declared in this file
* @hideinitializer
*
- * Actually, this macro is not used by C, but by the script
- * extracting the test units, but that should be transparent for you.
+ * Actually, this macro is only used by the script extracting the test
+ * units, but that should be transparent for you.
*
* @param suite_name the short name of this suite, to be used in the --tests argument of testall afterward. Avoid spaces and any other strange chars
* @param suite_title instructive title that testall should display when your suite is run
__VA_ARGS__)
#define _xbt_test_assert_CHECK(cond, ...) \
do { if (!(cond)) xbt_test_fail(__VA_ARGS__); } while (0)
+/** @brief Report some details to aid debugging when the test fails (shown only on failure)
+ * @hideinitializer */
#define xbt_test_log(...) _xbt_test_log(__FILE__, __LINE__, __VA_ARGS__)
/** @brief Declare that the lastly started test failed because of the provided exception */
public static void nativeInit() {
if (isNativeInited)
return;
-
+
+ if (System.getProperty("os.name").toLowerCase().startsWith("win"))
+ NativeLib.nativeInit("winpthread-1");
+
NativeLib.nativeInit("simgrid");
NativeLib.nativeInit("simgrid-java");
isNativeInited = true;
XBT_ERROR("Attribute 'speed' must be specified for host and must either be a string (in the correct format; check documentation) or a number.");
}
host.speed_peak = xbt_dynar_new(sizeof(double), NULL);
- xbt_dynar_push_as(host.speed_peak, double, get_cpu_speed(lua_tostring(L, -1)));
+ xbt_dynar_push_as(host.speed_peak, double, parse_cpu_speed(lua_tostring(L, -1)));
lua_pop(L, 1);
// get core
/* For the trace and trace:connect tag (store their content till the end of the parsing) */
XBT_PUBLIC_DATA(xbt_dict_t) traces_set_list;
-XBT_PUBLIC_DATA(xbt_dict_t) trace_connect_list_host_avail;
-XBT_PUBLIC_DATA(xbt_dict_t) trace_connect_list_power;
+XBT_PUBLIC_DATA(xbt_dict_t) trace_connect_list_host_speed;
XBT_PUBLIC_DATA(xbt_dict_t) trace_connect_list_link_avail;
-XBT_PUBLIC_DATA(xbt_dict_t) trace_connect_list_bandwidth;
-XBT_PUBLIC_DATA(xbt_dict_t) trace_connect_list_latency;
+XBT_PUBLIC_DATA(xbt_dict_t) trace_connect_list_link_bw;
+XBT_PUBLIC_DATA(xbt_dict_t) trace_connect_list_link_lat;
-
-XBT_PUBLIC(double) get_cpu_speed(const char *power);
+XBT_PUBLIC(double) parse_cpu_speed(const char *str_speed);
XBT_PUBLIC(xbt_dict_t) get_as_router_properties(const char* name);
-int surf_get_nthreads(void);
-void surf_set_nthreads(int nthreads);
-
/*
* Returns the initial path. On Windows the initial path is
* the current directory for the current process in the other
{
return remote_ptr<T>(address_ - n * sizeof(T));
}
- remote_ptr<T>& operator+=(std::uint64_t n) const
+ remote_ptr<T>& operator+=(std::uint64_t n)
{
address_ += n * sizeof(T);
return *this;
}
- remote_ptr<T>& operator-=(std::uint64_t n) const
+ remote_ptr<T>& operator-=(std::uint64_t n)
{
address_ -= n * sizeof(T);
return *this;
current_var1 = &stack1->local_variables[cursor];
current_var2 = &stack1->local_variables[cursor];
if (current_var1->name != current_var2->name
- || current_var1->subprogram != current_var1->subprogram
+ || current_var1->subprogram != current_var2->subprogram
|| current_var1->ip != current_var2->ip) {
// TODO, fix current_varX->subprogram->name to include name if DW_TAG_inlined_subprogram
XBT_VERB
mc_snapshot_stack_t stack1, stack2;
while (cursor < s1->stacks.size()) {
stack1 = &s1->stacks[cursor];
- stack2 = &s1->stacks[cursor];
+ stack2 = &s2->stacks[cursor];
if (stack1->process_index != stack2->process_index) {
diff_local = 1;
/* This program is free software; you can redistribute it and/or modify it
* under the terms of the license (GNU LGPL) which comes with this package. */
-#include "src/surf/surf_interface.hpp"
-#include "src/simdag/simdag_private.h"
#include "instr/instr_interface.h"
-#include "xbt/sysdep.h"
-#include "xbt/dynar.h"
-#include "surf/surf.h"
#include "simgrid/sg_config.h"
#include "simgrid/host.h"
-#include "xbt/ex.h"
+#include "src/simdag/simdag_private.h"
+#include "src/surf/surf_interface.hpp"
+
+#include "xbt/dynar.h"
#include "xbt/log.h"
-#include "xbt/str.h"
-#include "xbt/config.h"
-#include "surf/surfxml_parse.h"
+#include "xbt/sysdep.h"
#ifdef HAVE_JEDULE
#include "simgrid/jedule/jedule_sd_binding.h"
/** \brief set a configuration variable
*
- * Do --help on any simgrid binary to see the list of currently existing
- * configuration variables, and see Section @ref options.
+ * Do --help on any simgrid binary to see the list of currently existing configuration variables, and
+ * see Section @ref options.
*
* Example:
* SD_config("host/model","default");
void SD_application_reinit(void)
{
xbt_die("This function is not working since the C++ links and others. Please report the problem if you really need that function.");
-
-#ifdef HAVE_JEDULE
- jedule_sd_cleanup();
- jedule_sd_init();
-#endif
}
/**
#ifdef HAVE_JEDULE
jedule_setup_platform();
#endif
+ XBT_VERB("Starting simulation...");
+ surf_presolve(); /* Takes traces into account */
}
/**
SD_dependency_t dependency;
surf_action_t action;
unsigned int iter, depcnt;
- static int first_time = 1;
-
- if (first_time) {
- XBT_VERB("Starting simulation...");
-
- surf_presolve(); /* Takes traces into account */
- first_time = 0;
- }
XBT_VERB("Run simulation for %f seconds", how_long);
sd_global->watch_point_reached = 0;
while (elapsed_time >= 0.0 && (how_long < 0.0 || 0.00001 < (how_long -total_time)) &&
!sd_global->watch_point_reached) {
surf_model_t model = NULL;
- /* dumb variables */
-
XBT_DEBUG("Total time: %f", total_time);
task->surf_action = NULL;
/* the state has changed. Add it only if it's the first change */
- if (xbt_dynar_search_or_negative(sd_global->return_set, &task) < 0) {
+ if (!xbt_dynar_member(sd_global->return_set, &task)) {
xbt_dynar_push(sd_global->return_set, &task);
}
/* remove the dependencies after this task */
xbt_dynar_foreach(task->tasks_after, depcnt, dependency) {
dst = dependency->dst;
- if (dst->unsatisfied_dependencies > 0)
- dst->unsatisfied_dependencies--;
+ dst->unsatisfied_dependencies--;
if (dst->is_not_ready > 0)
dst->is_not_ready--;
XBT_DEBUG("Released a dependency on %s: %d remain(s). Became schedulable if %d=0",
- SD_task_get_name(dst), dst->unsatisfied_dependencies,
- dst->is_not_ready);
+ SD_task_get_name(dst), dst->unsatisfied_dependencies, dst->is_not_ready);
if (!(dst->unsatisfied_dependencies)) {
if (SD_task_get_state(dst) == SD_SCHEDULED)
if (!sd_global->watch_point_reached && how_long<0){
if (!xbt_dynar_is_empty(sd_global->initial_task_set)) {
- XBT_WARN("Simulation is finished but %zu tasks are still not done",
+ XBT_WARN("Simulation is finished but %lu tasks are still not done",
xbt_dynar_length(sd_global->initial_task_set));
static const char* state_names[] =
{ "SD_NOT_SCHEDULED", "SD_SCHEDULABLE", "SD_SCHEDULED", "SD_RUNNABLE", "SD_RUNNING", "SD_DONE","SD_FAILED" };
return sd_global->return_set;
}
-/**
- * \brief Returns the current clock
- *
- * \return the current clock, in second
- */
+/** @brief Returns the current clock, in seconds */
double SD_get_clock(void) {
return surf_get_clock();
}
/**
* \brief Destroys all SD internal data
*
- * This function should be called when the simulation is over. Don't forget
- * to destroy too.
+ * This function should be called when the simulation is over. Don't forget to destroy too.
*
* \see SD_init(), SD_task_destroy()
*/
void SD_exit(void)
{
TRACE_surf_resource_utilization_release();
+ TRACE_end();
- xbt_mallocator_free(sd_global->task_mallocator);
+#ifdef HAVE_JEDULE
+ jedule_sd_cleanup();
+ jedule_sd_exit();
+#endif
- XBT_DEBUG("Destroying the dynars ...");
+ xbt_mallocator_free(sd_global->task_mallocator);
xbt_dynar_free_container(&(sd_global->initial_task_set));
xbt_dynar_free_container(&(sd_global->executable_task_set));
xbt_dynar_free_container(&(sd_global->completed_task_set));
xbt_dynar_free_container(&(sd_global->return_set));
-
- TRACE_end();
-
xbt_free(sd_global);
sd_global = NULL;
-#ifdef HAVE_JEDULE
- jedule_sd_cleanup();
- jedule_sd_exit();
-#endif
-
- XBT_DEBUG("Exiting Surf...");
surf_exit();
}
#ifndef SIMDAG_PRIVATE_H
#define SIMDAG_PRIVATE_H
-#include "xbt/base.h"
#include "xbt/dynar.h"
#include "simgrid/simdag.h"
#include "surf/surf.h"
xbt_cfg_setdefault_int(_sg_cfg_set, "model-check/checkpoint", 0);
/* do stateful model-checking */
- xbt_cfg_register(&_sg_cfg_set, "model-check/sparse-checkpoint",
+ xbt_cfg_register(&_sg_cfg_set, "model-check/sparse_checkpoint",
"Use sparse per-page snapshots.",
xbt_cfgelm_boolean, 1, 1, _mc_cfg_cb_sparse_checkpoint, NULL);
- xbt_cfg_setdefault_boolean(_sg_cfg_set, "model-check/sparse-checkpoint", "no");
+ xbt_cfg_setdefault_boolean(_sg_cfg_set, "model-check/sparse_checkpoint", "no");
/* do stateful model-checking */
xbt_cfg_register(&_sg_cfg_set, "model-check/soft-dirty",
return;
called = 1;
- /* connect all traces relative to hosts */
- xbt_dict_foreach(trace_connect_list_host_avail, cursor, trace_name, elm) {
+ /* connect host speed traces */
+ xbt_dict_foreach(trace_connect_list_host_speed, cursor, trace_name, elm) {
tmgr_trace_t trace = (tmgr_trace_t) xbt_dict_get_or_null(traces_set_list, trace_name);
- CpuCas01 *host = static_cast<CpuCas01*>(sg_host_by_name(elm)->pimpl_cpu);
+ Cpu *cpu = sg_host_by_name(elm)->pimpl_cpu;
- xbt_assert(host, "Host %s undefined", elm);
+ xbt_assert(cpu, "Host %s undefined", elm);
xbt_assert(trace, "Trace %s undefined", trace_name);
- host->p_stateEvent = future_evt_set->add_trace(trace, 0.0, host);
- }
-
- xbt_dict_foreach(trace_connect_list_power, cursor, trace_name, elm) {
- tmgr_trace_t trace = (tmgr_trace_t) xbt_dict_get_or_null(traces_set_list, trace_name);
- CpuCas01 *host = static_cast<CpuCas01*>(sg_host_by_name(elm)->pimpl_cpu);
-
- xbt_assert(host, "Host %s undefined", elm);
- xbt_assert(trace, "Trace %s undefined", trace_name);
-
- host->p_speedEvent = future_evt_set->add_trace(trace, 0.0, host);
+ cpu->set_speed_trace(trace);
}
}
************/
class CpuCas01 : public Cpu {
- friend CpuCas01Model;
public:
CpuCas01(CpuCas01Model *model, simgrid::s4u::Host *host, xbt_dynar_t speedPeak,
int pstate, double speedScale, tmgr_trace_t speedTrace, int core,
protected:
void onSpeedChange() override;
-
-private:
-
- tmgr_trace_iterator_t p_stateEvent = nullptr;
- tmgr_trace_iterator_t p_speedEvent = nullptr;
};
/**********
return m_core;
}
+void Cpu::set_state_trace(tmgr_trace_t trace)
+{
+ xbt_assert(p_stateEvent==NULL,"Cannot set a second state trace to Host %s", m_host->name().c_str());
+
+ p_stateEvent = future_evt_set->add_trace(trace, 0.0, this);
+}
+void Cpu::set_speed_trace(tmgr_trace_t trace)
+{
+ xbt_assert(p_speedEvent==NULL,"Cannot set a second speed trace to Host %s", m_host->name().c_str());
+
+ p_speedEvent = future_evt_set->add_trace(trace, 0.0, this);
+}
+
+
/**********
* Action *
**********/
virtual void setPState(int pstate_index);
virtual int getPState();
- void addTraces(void);
simgrid::s4u::Host* getHost() { return m_host; }
public:
lmm_constraint_t *p_constraintCore=NULL;
void **p_constraintCoreId=NULL;
+public:
+ void set_state_trace(tmgr_trace_t trace); /**< set up the trace file with state events (ON or OFF) */
+ void set_speed_trace(tmgr_trace_t trace); /**< set up the trace file with availability events (peak speed changes due to external load) */
+protected:
+ tmgr_trace_iterator_t p_stateEvent = nullptr;
+ tmgr_trace_iterator_t p_speedEvent = nullptr;
};
/**********
}
}
-/*************
- * CallBacks *
- *************/
-
-static void cpu_ti_define_callbacks()
-{
- simgrid::surf::on_postparse.connect([]() {
- surf_cpu_model_pm->addTraces();
- });
-}
-
/*********
* Model *
*********/
xbt_assert(!surf_cpu_model_vm,"CPU model already initialized. This should not happen.");
surf_cpu_model_pm = new simgrid::surf::CpuTiModel();
+ xbt_dynar_push(all_existing_models, &surf_cpu_model_pm);
+
surf_cpu_model_vm = new simgrid::surf::CpuTiModel();
+ xbt_dynar_push(all_existing_models, &surf_cpu_model_vm);
- cpu_ti_define_callbacks();
- simgrid::surf::Model *model_pm = static_cast<simgrid::surf::Model*>(surf_cpu_model_pm);
- simgrid::surf::Model *model_vm = static_cast<simgrid::surf::Model*>(surf_cpu_model_vm);
- xbt_dynar_push(all_existing_models, &model_pm);
- xbt_dynar_push(all_existing_models, &model_vm);
+ simgrid::surf::on_postparse.connect([]() {
+ surf_cpu_model_pm->addTraces();
+ });
}
namespace simgrid {
called = 1;
/* connect all traces relative to hosts */
- xbt_dict_foreach(trace_connect_list_host_avail, cursor, trace_name, elm) {
- tmgr_trace_t trace = (tmgr_trace_t) xbt_dict_get_or_null(traces_set_list, trace_name);
- CpuTi *cpu = static_cast<CpuTi*>(sg_host_by_name(elm)->pimpl_cpu);
-
- xbt_assert(cpu, "Host %s undefined", elm);
- xbt_assert(trace, "Trace %s undefined", trace_name);
-
- if (cpu->p_stateEvent) {
- XBT_DEBUG("Trace already configured for this CPU(%s), ignoring it",
- elm);
- continue;
- }
- XBT_DEBUG("Add state trace: %s to CPU(%s)", trace_name, elm);
- cpu->p_stateEvent = future_evt_set->add_trace(trace, 0.0, cpu);
- }
-
- xbt_dict_foreach(trace_connect_list_power, cursor, trace_name, elm) {
+ xbt_dict_foreach(trace_connect_list_host_speed, cursor, trace_name, elm) {
tmgr_trace_t trace = (tmgr_trace_t) xbt_dict_get_or_null(traces_set_list, trace_name);
CpuTi *cpu = static_cast<CpuTi*>(sg_host_by_name(elm)->pimpl_cpu);
xbt_dynar_get_cpy(trace->s_list.event_list,
xbt_dynar_length(trace->s_list.event_list) - 1, &val);
if (val.delta == 0) {
- cpu->p_speedEvent =
- future_evt_set->add_trace(tmgr_empty_trace_new(), cpu->p_availTrace->m_lastTime, cpu);
+ cpu->set_speed_trace(tmgr_empty_trace_new());
}
}
}
void modified(bool modified);
CpuTiTgmr *p_availTrace; /*< Structure with data needed to integrate trace file */
- tmgr_trace_iterator_t p_stateEvent = NULL; /*< trace file with states events (ON or OFF) */
- tmgr_trace_iterator_t p_speedEvent = NULL; /*< trace file with availability events */
ActionTiList *p_actionSet; /*< set with all actions running on cpu */
double m_sumPriority; /*< the sum of actions' priority that are running on cpu */
double m_lastUpdate = 0; /*< last update of actions' remaining amount done */
int nb_used_host = 0; /* Only the hosts with something to compute (>0 flops) are counted) */
double latency = 0.0;
- xbt_dict_t ptask_parallel_task_link_set = xbt_dict_new_homogeneous(NULL);
this->p_netcardList->reserve(host_nb);
for (int i = 0; i<host_nb; i++)
/* Compute the number of affected resources... */
if(bytes_amount != NULL) {
+ xbt_dict_t ptask_parallel_task_link_set = xbt_dict_new_homogeneous(NULL);
+
for (int i = 0; i < host_nb; i++) {
for (int j = 0; j < host_nb; j++) {
xbt_dynar_t route=NULL;
}
}
}
- }
- nb_link = xbt_dict_length(ptask_parallel_task_link_set);
- xbt_dict_free(&ptask_parallel_task_link_set);
+ nb_link = xbt_dict_length(ptask_parallel_task_link_set);
+ xbt_dict_free(&ptask_parallel_task_link_set);
+ }
for (int i = 0; i < host_nb; i++)
if (flops_amount[i] > 0)
nb_used_host++;
- XBT_DEBUG("Creating a parallel task (%p) with %d cpus and %d links.",
- this, host_nb, nb_link);
+ XBT_DEBUG("Creating a parallel task (%p) with %d hosts and %d unique links.", this, host_nb, nb_link);
this->p_computationAmount = flops_amount;
this->p_communicationAmount = bytes_amount;
this->m_latency = latency;
this->m_rate = rate;
this->p_variable = lmm_variable_new(model->getMaxminSystem(), this, 1.0,
- (rate > 0 ? rate : -1.0),
- host_nb + nb_link);
+ (rate > 0 ? rate : -1.0),
+ host_nb + nb_link);
if (this->m_latency > 0)
lmm_update_variable_weight(model->getMaxminSystem(), this->getVariable(), 0.0);
for (int i = 0; i < host_nb; i++)
- lmm_expand(model->getMaxminSystem(),
- host_list[i]->pimpl_cpu->getConstraint(),
- this->getVariable(), flops_amount[i]);
+ lmm_expand(model->getMaxminSystem(), host_list[i]->pimpl_cpu->getConstraint(),
+ this->getVariable(), flops_amount[i]);
if(bytes_amount != NULL) {
for (int i = 0; i < host_nb; i++) {
host_list[1] = sg_host_by_name(dst->getName());
bytes_amount[1] = size;
- res = p_hostModel->executeParallelTask(2, host_list,
- flops_amount,
- bytes_amount, rate);
+ res = p_hostModel->executeParallelTask(2, host_list, flops_amount, bytes_amount, rate);
return res;
}
xbt_dict_cursor_t cursor = NULL;
char *trace_name, *elm;
- if (!trace_connect_list_host_avail)
+ if (!trace_connect_list_host_speed)
return;
/* Connect traces relative to cpu */
- xbt_dict_foreach(trace_connect_list_host_avail, cursor, trace_name, elm) {
- tmgr_trace_t trace = (tmgr_trace_t) xbt_dict_get_or_null(traces_set_list, trace_name);
- CpuL07 *host = static_cast<CpuL07*>(sg_host_by_name(elm)->pimpl_cpu);
-
- xbt_assert(host, "Host %s undefined", elm);
- xbt_assert(trace, "Trace %s undefined", trace_name);
-
- host->p_stateEvent = future_evt_set->add_trace(trace, 0.0, host);
- }
-
- xbt_dict_foreach(trace_connect_list_power, cursor, trace_name, elm) {
+ xbt_dict_foreach(trace_connect_list_host_speed, cursor, trace_name, elm) {
tmgr_trace_t trace = (tmgr_trace_t) xbt_dict_get_or_null(traces_set_list, trace_name);
- CpuL07 *host = static_cast<CpuL07*>(sg_host_by_name(elm)->pimpl_cpu);
+ Cpu *cpu = sg_host_by_name(elm)->pimpl_cpu;
- xbt_assert(host, "Host %s undefined", elm);
+ xbt_assert(cpu, "Host %s undefined", elm);
xbt_assert(trace, "Trace %s undefined", trace_name);
- host->p_speedEvent = future_evt_set->add_trace(trace, 0.0, host);
+ cpu->set_speed_trace(trace);
}
/* Connect traces relative to network */
link->p_stateEvent = future_evt_set->add_trace(trace, 0.0, link);
}
- xbt_dict_foreach(trace_connect_list_bandwidth, cursor, trace_name, elm) {
+ xbt_dict_foreach(trace_connect_list_link_bw, cursor, trace_name, elm) {
tmgr_trace_t trace = (tmgr_trace_t) xbt_dict_get_or_null(traces_set_list, trace_name);
LinkL07 *link = static_cast<LinkL07*>(Link::byName(elm));
link->p_bwEvent = future_evt_set->add_trace(trace, 0.0, link);
}
- xbt_dict_foreach(trace_connect_list_latency, cursor, trace_name, elm) {
+ xbt_dict_foreach(trace_connect_list_link_lat, cursor, trace_name, elm) {
tmgr_trace_t trace = (tmgr_trace_t) xbt_dict_get_or_null(traces_set_list, trace_name);
LinkL07 *link = static_cast<LinkL07*>(Link::byName(elm));
{
sg_host_t*host_list = xbt_new0(sg_host_t, 1);
double *flops_amount = xbt_new0(double, 1);
- double *bytes_amount = xbt_new0(double, 1);
host_list[0] = getHost();
flops_amount[0] = size;
return static_cast<CpuL07Model*>(getModel())->p_hostModel
- ->executeParallelTask( 1, host_list, flops_amount, bytes_amount, -1);
+ ->executeParallelTask( 1, host_list, flops_amount, NULL, -1);
}
Action *CpuL07::sleep(double duration)
return 0;
}
-void L07Action::suspend()
-{
- XBT_IN("(%p))", this);
- if (m_suspended != 2) {
- m_suspended = 1;
- lmm_update_variable_weight(getModel()->getMaxminSystem(), getVariable(), 0.0);
- }
- XBT_OUT();
-}
-
-void L07Action::resume()
-{
- XBT_IN("(%p)", this);
- if (m_suspended != 2) {
- lmm_update_variable_weight(getModel()->getMaxminSystem(), getVariable(), 1.0);
- m_suspended = 0;
- }
- XBT_OUT();
-}
-
-double L07Action::getRemains()
-{
- XBT_IN("(%p)", this);
- XBT_OUT();
- return m_remains;
-}
-
}
}
************/
class CpuL07 : public Cpu {
- friend void HostL07Model::addTraces();
- tmgr_trace_iterator_t p_stateEvent = nullptr;
- tmgr_trace_iterator_t p_speedEvent = nullptr;
public:
CpuL07(CpuL07Model *model, simgrid::s4u::Host *host, xbt_dynar_t speedPeakList, int pstate,
double power_scale, tmgr_trace_t power_trace,
void updateBound();
int unref() override;
- void suspend() override;
- void resume() override;
- double getRemains() override;
std::vector<NetCard*> * p_netcardList = new std::vector<NetCard*>();
double *p_computationAmount;
link->p_stateEvent = future_evt_set->add_trace(trace, 0.0, link);
}
- xbt_dict_foreach(trace_connect_list_bandwidth, cursor, trace_name, elm) {
+ xbt_dict_foreach(trace_connect_list_link_bw, cursor, trace_name, elm) {
tmgr_trace_t trace = (tmgr_trace_t) xbt_dict_get_or_null(traces_set_list, trace_name);
NetworkCm02Link *link = static_cast<NetworkCm02Link*>( Link::byName(elm) );
link->p_speed.event = future_evt_set->add_trace(trace, 0.0, link);
}
- xbt_dict_foreach(trace_connect_list_latency, cursor, trace_name, elm) {
+ xbt_dict_foreach(trace_connect_list_link_lat, cursor, trace_name, elm) {
tmgr_trace_t trace = (tmgr_trace_t) xbt_dict_get_or_null(traces_set_list, trace_name);
NetworkCm02Link *link = static_cast<NetworkCm02Link*>(Link::byName(elm));;
simgrid::xbt::signal<void(simgrid::surf::Link*)> Link::onCreation;
simgrid::xbt::signal<void(simgrid::surf::Link*)> Link::onDestruction;
-simgrid::xbt::signal<void(simgrid::surf::Link*, int, int)> Link::onStateChange; // signature: wasOn, currentlyOn
+simgrid::xbt::signal<void(simgrid::surf::Link*)> Link::onStateChange;
simgrid::xbt::signal<void(simgrid::surf::NetworkAction*, e_surf_action_state_t, e_surf_action_state_t)> networkActionStateChangedCallbacks;
simgrid::xbt::signal<void(simgrid::surf::NetworkAction*, simgrid::surf::NetCard *src, simgrid::surf::NetCard *dst, double size, double rate)> networkCommunicateCallbacks;
void Link::turnOn(){
if (isOff()) {
Resource::turnOn();
- onStateChange(this, 0, 1);
+ onStateChange(this);
}
}
void Link::turnOff(){
if (isOn()) {
Resource::turnOff();
- onStateChange(this, 1, 0);
+ onStateChange(this);
}
}
* Signature: void(Link*) */
static simgrid::xbt::signal<void(simgrid::surf::Link*)> onDestruction;
- /** @brief Callback signal fired when the state of a Link changes
- * Signature: `void(LinkAction *action, int previouslyOn, int currentlyOn)` */
- static simgrid::xbt::signal<void(simgrid::surf::Link*, int, int)> onStateChange;
+ /** @brief Callback signal fired when the state of a Link changes (when it is turned on or off)
+ * Signature: `void(Link*)` */
+ static simgrid::xbt::signal<void(simgrid::surf::Link*)> onStateChange;
/** @brief Get the bandwidth in bytes per second of current Link */
while ((next_event_date = future_evt_set->next_date()) != -1.0) {
if (next_event_date > NOW)
break;
- while ((event = future_evt_set->pop_leq(next_event_date,
- &value,
- (void **) &resource))) {
+
+ while ((event = future_evt_set->pop_leq(next_event_date, &value, &resource))) {
if (value >= 0){
resource->updateState(event, value, NOW);
}
surf_min = next_event_virt;
}
- XBT_DEBUG("Min for resources (remember that NS3 don't update that value) : %f", surf_min);
+ XBT_DEBUG("Min for resources (remember that NS3 doesn't update that value): %f", surf_min);
XBT_DEBUG("Looking for next trace event");
- do {
- XBT_DEBUG("Next TRACE event : %f", next_event_date);
-
+ while (1) { // Handle next occurring events until none remains
next_event_date = future_evt_set->next_date();
+ XBT_DEBUG("Next TRACE event: %f", next_event_date);
if(! surf_network_model->shareResourcesIsIdempotent()){ // NS3, I see you
- if(next_event_date!=-1.0 && surf_min!=-1.0) {
+ if (next_event_date!=-1.0 && surf_min!=-1.0) {
surf_min = MIN(next_event_date - NOW, surf_min);
} else{
surf_min = MAX(next_event_date - NOW, surf_min);
}
- XBT_DEBUG("Run for network at most %f", surf_min);
+ XBT_DEBUG("Run the NS3 network at most %fs", surf_min);
// run until min or next flow
model_next_action_end = surf_network_model->shareResources(surf_min);
break;
}
- if ((surf_min == -1.0) || (next_event_date > NOW + surf_min)) break;
+ if ((surf_min == -1.0) || (next_event_date > NOW + surf_min))
+ break; // next event occurs after the next resource change, bail out
XBT_DEBUG("Updating models (min = %g, NOW = %g, next_event_date = %g)", surf_min, NOW, next_event_date);
- while ((event = future_evt_set->pop_leq(next_event_date,
- &value,
- (void **) &resource))) {
+
+ while ((event = future_evt_set->pop_leq(next_event_date, &value, &resource))) {
if (resource->isUsed() || xbt_dict_get_or_null(watched_hosts_lib, resource->getName())) {
surf_min = next_event_date - NOW;
- XBT_DEBUG
- ("This event will modify model state. Next event set to %f",
- surf_min);
+ XBT_DEBUG("This event will modify model state. Next event set to %f", surf_min);
}
- /* update state of model_obj according to new value. Does not touch lmm.
+ /* update state of the corresponding resource to the new value. Does not touch lmm.
It will be modified if needed when updating actions */
- XBT_DEBUG("Calling update_resource_state for resource %s with min %f",
- resource->getName(), surf_min);
+ XBT_DEBUG("Calling update_resource_state for resource %s with min %f", resource->getName(), surf_min);
resource->updateState(event, value, next_event_date);
}
- } while (1);
+ }
/* FIXME: Moved this test to here to avoid stopping simulation if there are actions running on cpus and all cpus are with availability = 0.
* This may cause an infinite loop if one cpu has a trace with periodicity = 0 and the other a trace with periodicity > 0.
* The options are: all traces with same periodicity(0 or >0) or we need to change the way how the events are managed */
if (surf_min == -1.0) {
- XBT_DEBUG("No next event at all. Bail out now.");
+ XBT_DEBUG("No next event at all. Bail out now.");
return -1.0;
}
XBT_DEBUG("Duration set to %f", surf_min);
+ // Bump the time: jump into the future
NOW = NOW + surf_min;
- /* FIXME: model_list or model_list_invoke? revisit here later */
- /* sequential version */
+
+ // Inform the models of the date change
xbt_dynar_foreach(all_existing_models, iter, model) {
model->updateActionsState(NOW, surf_min);
}
/* For the trace and trace:connect tag (store their content till the end of the parsing) */
XBT_PUBLIC_DATA(xbt_dict_t) traces_set_list;
XBT_PUBLIC_DATA(xbt_dict_t) trace_connect_list_host_avail;
-XBT_PUBLIC_DATA(xbt_dict_t) trace_connect_list_power;
+XBT_PUBLIC_DATA(xbt_dict_t) trace_connect_list_host_speed;
XBT_PUBLIC_DATA(xbt_dict_t) trace_connect_list_link_avail;
-XBT_PUBLIC_DATA(xbt_dict_t) trace_connect_list_bandwidth;
-XBT_PUBLIC_DATA(xbt_dict_t) trace_connect_list_latency;
+XBT_PUBLIC_DATA(xbt_dict_t) trace_connect_list_link_bw;
+XBT_PUBLIC_DATA(xbt_dict_t) trace_connect_list_link_lat;
/**********
* Action *
xbt_strdup(trace_connect->element), NULL);
break;
case SURF_TRACE_CONNECT_KIND_POWER:
- xbt_dict_set(trace_connect_list_power, trace_connect->trace,
+ xbt_dict_set(trace_connect_list_host_speed, trace_connect->trace,
xbt_strdup(trace_connect->element), NULL);
break;
case SURF_TRACE_CONNECT_KIND_LINK_AVAIL:
xbt_strdup(trace_connect->element), NULL);
break;
case SURF_TRACE_CONNECT_KIND_BANDWIDTH:
- xbt_dict_set(trace_connect_list_bandwidth,
+ xbt_dict_set(trace_connect_list_link_bw,
trace_connect->trace,
xbt_strdup(trace_connect->element), NULL);
break;
case SURF_TRACE_CONNECT_KIND_LATENCY:
- xbt_dict_set(trace_connect_list_latency, trace_connect->trace,
+ xbt_dict_set(trace_connect_list_link_lat, trace_connect->trace,
xbt_strdup(trace_connect->element), NULL);
break;
default:
XBT_DEBUG("Buffer: %s", buf);
host.speed_peak = xbt_dynar_new(sizeof(double), NULL);
if (strchr(buf, ',') == NULL){
- double speed = get_cpu_speed(A_surfxml_host_power);
+ double speed = parse_cpu_speed(A_surfxml_host_power);
xbt_dynar_push_as(host.speed_peak,double, speed);
}
else {
xbt_dynar_get_cpy(pstate_list, i, &speed_str);
xbt_str_trim(speed_str, NULL);
- speed = get_cpu_speed(speed_str);
+ speed = parse_cpu_speed(speed_str);
xbt_dynar_push_as(host.speed_peak, double, speed);
XBT_DEBUG("Speed value: %f", speed);
}
* With XML parser
*/
-double get_cpu_speed(const char *str_speed)
+double parse_cpu_speed(const char *str_speed)
{
double speed = 0.0;
const char *p, *q;
#include "xbt/dict.h"
#include "simgrid/platf.h"
#include "surf/surfxml_parse.h"
+#include "src/surf/cpu_interface.hpp"
#include "src/surf/surf_private.h"
#ifdef HAVE_LUA
+extern "C" {
#include "src/bindings/lua/simgrid_lua.h"
#include <lua.h> /* Always include this when calling Lua */
#include <lauxlib.h> /* Always include this when calling Lua */
#include <lualib.h> /* Prototype for luaL_openlibs(), */
+}
#endif
XBT_LOG_EXTERNAL_DEFAULT_CATEGORY(surf_parse);
XBT_IMPORT_NO_EXPORT(unsigned int) surfxml_buffer_stack_stack_ptr;
XBT_IMPORT_NO_EXPORT(unsigned int) surfxml_buffer_stack_stack[1024];
-void surfxml_bufferstack_push(int new)
+void surfxml_bufferstack_push(int new_one)
{
- if (!new)
+ if (!new_one)
old_buff = surfxml_bufferstack;
else {
xbt_dynar_push(surfxml_bufferstack_stack, &surfxml_bufferstack);
}
}
-void surfxml_bufferstack_pop(int new)
+void surfxml_bufferstack_pop(int new_one)
{
- if (!new)
+ if (!new_one)
surfxml_bufferstack = old_buff;
else {
free(surfxml_bufferstack);
xbt_dict_t traces_set_list = NULL;
xbt_dict_t trace_connect_list_host_avail = NULL;
-xbt_dict_t trace_connect_list_power = NULL;
+xbt_dict_t trace_connect_list_host_speed = NULL;
xbt_dict_t trace_connect_list_link_avail = NULL;
-xbt_dict_t trace_connect_list_bandwidth = NULL;
-xbt_dict_t trace_connect_list_latency = NULL;
+xbt_dict_t trace_connect_list_link_bw = NULL;
+xbt_dict_t trace_connect_list_link_lat = NULL;
/* ***************************************** */
traces_set_list = xbt_dict_new_homogeneous(NULL);
trace_connect_list_host_avail = xbt_dict_new_homogeneous(free);
- trace_connect_list_power = xbt_dict_new_homogeneous(free);
+ trace_connect_list_host_speed = xbt_dict_new_homogeneous(free);
trace_connect_list_link_avail = xbt_dict_new_homogeneous(free);
- trace_connect_list_bandwidth = xbt_dict_new_homogeneous(free);
- trace_connect_list_latency = xbt_dict_new_homogeneous(free);
+ trace_connect_list_link_bw = xbt_dict_new_homogeneous(free);
+ trace_connect_list_link_lat = xbt_dict_new_homogeneous(free);
/* Init my data */
if (!surfxml_bufferstack_stack)
/* Do the actual parsing */
parse_status = surf_parse();
+ /* connect all traces relative to hosts */
+ xbt_dict_cursor_t cursor = NULL;
+ char *trace_name, *elm;
+
+ xbt_dict_foreach(trace_connect_list_host_avail, cursor, trace_name, elm) {
+ tmgr_trace_t trace = (tmgr_trace_t) xbt_dict_get_or_null(traces_set_list, trace_name);
+ xbt_assert(trace, "Trace %s undefined", trace_name);
+
+ simgrid::s4u::Host *host = sg_host_by_name(elm);
+ xbt_assert(host, "Host %s undefined", elm);
+ simgrid::surf::Cpu *cpu = host->pimpl_cpu;
+
+ cpu->set_state_trace(trace);
+ }
+
/* Free my data */
xbt_dict_free(&trace_connect_list_host_avail);
- xbt_dict_free(&trace_connect_list_power);
+ xbt_dict_free(&trace_connect_list_host_speed);
xbt_dict_free(&trace_connect_list_link_avail);
- xbt_dict_free(&trace_connect_list_bandwidth);
- xbt_dict_free(&trace_connect_list_latency);
+ xbt_dict_free(&trace_connect_list_link_bw);
+ xbt_dict_free(&trace_connect_list_link_lat);
xbt_dict_free(&traces_set_list);
xbt_dict_free(&random_data_list);
xbt_dynar_free(&surfxml_bufferstack_stack);
free(trace);
}
-/** Register a new trace into the future event set, and get an iterator over the integrated trace */
+/** @brief Registers a new trace into the future event set, and get an iterator over the integrated trace */
tmgr_trace_iterator_t simgrid::trace_mgr::future_evt_set::add_trace(
- tmgr_trace_t trace,
- double start_time,
- void *resource)
+ tmgr_trace_t trace, double start_time, surf::Resource *resource)
{
tmgr_trace_iterator_t trace_iterator = NULL;
return trace_iterator;
}
-double simgrid::trace_mgr::future_evt_set::next_date()
+/** @brief returns the date of the next occurring event (pure function) */
+double simgrid::trace_mgr::future_evt_set::next_date() const
{
if (xbt_heap_size(p_heap))
return (xbt_heap_maxkey(p_heap));
return -1.0;
}
+/** @brief Retrieves the next occurring event, or NULL if none happens before #date */
tmgr_trace_iterator_t simgrid::trace_mgr::future_evt_set::pop_leq(
- double date,
- double *value,
- void** resource)
+ double date, double *value, simgrid::surf::Resource **resource)
{
double event_date = next_date();
- tmgr_trace_iterator_t trace_iterator = NULL;
- tmgr_event_t event = NULL;
- tmgr_trace_t trace = NULL;
- double event_delta;
-
if (event_date > date)
return NULL;
- if (!(trace_iterator = (tmgr_trace_iterator_t)xbt_heap_pop(p_heap)))
+ tmgr_trace_iterator_t trace_iterator = (tmgr_trace_iterator_t)xbt_heap_pop(p_heap);
+ if (trace_iterator == NULL)
return NULL;
- trace = trace_iterator->trace;
+ tmgr_trace_t trace = trace_iterator->trace;
*resource = trace_iterator->resource;
- switch(trace->type) {
- case e_trace_list:
+ if (trace->type == e_trace_list) {
- event = (tmgr_event_t)xbt_dynar_get_ptr(trace->s_list.event_list, trace_iterator->idx);
+ tmgr_event_t event = (tmgr_event_t)xbt_dynar_get_ptr(trace->s_list.event_list, trace_iterator->idx);
*value = event->value;
} else { /* We don't need this trace_event anymore */
trace_iterator->free_me = 1;
}
- break;
-
- case e_trace_probabilist:
- //FIXME : not tested yet
+ } else if (trace->type == e_trace_probabilist) { // FIXME: not tested yet
+ double event_delta;
if(trace->s_probabilist.is_state_trace) {
*value = (double) trace->s_probabilist.next_event;
if(trace->s_probabilist.next_event == 0) {
xbt_heap_push(p_heap, trace_iterator, event_date + event_delta);
XBT_DEBUG("Generating a new event at date %f, with value %f", event_date + event_delta, *value);
- break;
- }
+ } else
+ THROW_IMPOSSIBLE;
return trace_iterator;
}
typedef struct tmgr_trace_iterator {
tmgr_trace_t trace;
unsigned int idx;
- void *resource;
+ sg_resource_t resource;
int free_me;
} s_tmgr_trace_event_t;
public:
future_evt_set();
virtual ~future_evt_set();
- double next_date();
- tmgr_trace_iterator_t pop_leq(double date, double *value, void** resource);
- tmgr_trace_iterator_t add_trace(
- tmgr_trace_t trace,
- double start_time,
- void *model);
+ double next_date() const;
+ tmgr_trace_iterator_t pop_leq(double date, double *value, simgrid::surf::Resource** resource);
+ tmgr_trace_iterator_t add_trace(tmgr_trace_t trace, double start_time, simgrid::surf::Resource *resource);
private:
// TODO: use a boost type for the heap (or a ladder queue)
#bsendalign 2
#bsendpending 2
isendself 1
-issendselfcancel 1
+#issendselfcancel 1
#needs MPI_Buffer_attach, MPI_Bsend, MPI_Buffer_detach
#bsendfrag 2
#needs MPI_Intercomm_create
for(j=0; j<2;j++ )
for(i=0; i<3;i++ )
- printf("%d ", tab[j][i]);
+ printf("%d ", tab[j][i]);
- printf("\n");
+ printf("\n");
/* Clean up the type */
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
+#include "src/surf/network_cm02.hpp"
#include "src/surf/trace_mgr.hpp"
XBT_LOG_NEW_DEFAULT_CATEGORY(surf_test,
"Messages specific for surf example");
+class DummyTestResource
+ : public simgrid::surf::Resource {
+public:
+ DummyTestResource(const char *name) : Resource(nullptr,name) {}
+ bool isUsed() override {return false;}
+ void updateState(tmgr_trace_iterator_t it, double date, double value) override {}
+};
+
static void test(void)
{
simgrid::trace_mgr::future_evt_set *fes = new simgrid::trace_mgr::future_evt_set();
tmgr_trace_t trace_B = tmgr_trace_new_from_file("trace_B.txt");
double next_event_date = -1.0;
double value = -1.0;
- char *resource = NULL;
- char *host_A = strdup("Host A");
- char *host_B = strdup("Host B");
+ simgrid::surf::Resource *resource = NULL;
+ simgrid::surf::Resource *hostA = new DummyTestResource("Host A");
+ simgrid::surf::Resource *hostB = new DummyTestResource("Host B");
- fes->add_trace(trace_A, 1.0, host_A);
- fes->add_trace(trace_B, 0.0, host_B);
+ fes->add_trace(trace_A, 1.0, hostA);
+ fes->add_trace(trace_B, 0.0, hostB);
while ((next_event_date = fes->next_date()) != -1.0) {
XBT_DEBUG("%g" " : \n", next_event_date);
- while (fes->pop_leq(next_event_date, &value, (void **) &resource)) {
- XBT_DEBUG("\t %s : " "%g" "\n", resource, value);
+ while (fes->pop_leq(next_event_date, &value, &resource)) {
+ XBT_DEBUG("\t %s : " "%g" "\n", resource->getName(), value);
}
if (next_event_date > 1000)
break;
}
delete fes;
- free(host_B);
- free(host_A);
+ delete hostA;
+ delete hostB;
}
int main(int argc, char **argv)
src/xbt/probes.h
src/xbt/win32_ucontext.c
tools/tesh/generate_tesh
+ tools/lualib.patch
examples/smpi/mc/only_send_deterministic.tesh
examples/smpi/mc/non_deterministic.tesh
)
src/surf/surf_routing_none.cpp
src/surf/surf_routing_vivaldi.cpp
src/surf/surfxml_parse.c
- src/surf/surfxml_parseplatf.c
+ src/surf/surfxml_parseplatf.cpp
src/surf/trace_mgr.hpp
src/surf/trace_mgr.cpp
src/surf/vm_hl13.cpp
doc/doxygen/header.html
doc/doxygen/help.doc
doc/doxygen/index.doc
- doc/doxygen/inside_ci.doc
+ doc/doxygen/inside.doc
+ doc/doxygen/inside_tests.doc
doc/doxygen/inside_cmake.doc
doc/doxygen/inside_doxygen.doc
doc/doxygen/inside_extending.doc
doc/doxygen/inside_release.doc
doc/doxygen/install.doc
- doc/doxygen/internals.doc
- doc/doxygen/introduction.doc
+ doc/doxygen/tutorial.doc
doc/doxygen/module-msg.doc
doc/doxygen/module-sd.doc
doc/doxygen/module-simix.doc
${CMAKE_HOME_DIRECTORY}/doc/webcruft/simgrid_logo_2011_small.png
${CMAKE_HOME_DIRECTORY}/doc/webcruft/simgrid_logo_win.bmp
${CMAKE_HOME_DIRECTORY}/doc/webcruft/simgrid_logo_win_2011.bmp
- ${CMAKE_HOME_DIRECTORY}/doc/webcruft/win_install_01.png
- ${CMAKE_HOME_DIRECTORY}/doc/webcruft/win_install_02.png
- ${CMAKE_HOME_DIRECTORY}/doc/webcruft/win_install_03.png
- ${CMAKE_HOME_DIRECTORY}/doc/webcruft/win_install_04.png
- ${CMAKE_HOME_DIRECTORY}/doc/webcruft/win_install_05.png
- ${CMAKE_HOME_DIRECTORY}/doc/webcruft/win_install_06.png
${CMAKE_HOME_DIRECTORY}/doc/webcruft/smpi_simgrid_alltoall_pair_16.png
${CMAKE_HOME_DIRECTORY}/doc/webcruft/smpi_simgrid_alltoall_ring_16.png
)
#COMMAND ${STRIP_COMMAND} ${JAVA_NATIVE_PATH}/${LIBSIMGRID_JAVA_SO} || true
COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_BINARY_DIR}/lib/${LIBSIMGRID_SO} ${JAVA_NATIVE_PATH}
- COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_BINARY_DIR}/lib/${LIBSIMGRID_JAVA_SO} ${JAVA_NATIVE_PATH}
+ COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_BINARY_DIR}/lib/${LIBSIMGRID_JAVA_SO} ${JAVA_NATIVE_PATH}
+ # There is no way to disable the dependency of mingw-w64 on that lib, unfortunately
+ # nor to script cmake -E properly, so let's be brutal
+ COMMAND ${CMAKE_COMMAND} -E copy C:/mingw64/bin/libwinpthread-1.dll ${JAVA_NATIVE_PATH} || true
COMMAND ${JAVA_ARCHIVE} -uvf ${SIMGRID_JAR} ${JAVA_NATIVE_PATH}
COMMAND ${CMAKE_COMMAND} -E remove_directory ${JAVA_NATIVE_PATH}
COMMAND ${CMAKE_COMMAND} -E echo "-- Cmake put the native code in ${JAVA_NATIVE_PATH}"
COMMAND "${Java_JAVA_EXECUTABLE}" -classpath "${SIMGRID_JAR}" org.simgrid.NativeLib
- )
+ )
endif(enable_lib_in_jar)
### Check the node installation
-for pkg in xsltproc valgrind
+for pkg in xsltproc valgrind gcovr
do
if dpkg -l |grep -q $pkg
then
fi
done
-if [ -e /usr/local/gcovr-3.1/scripts/gcovr ]
-then
- echo "gcovr is installed, good."
-else
- die "Please install /usr/local/gcovr-3.1/scripts/gcovr"
-fi
-
### Cleanup previous runs
! [ -z "$WORKSPACE" ] || die "No WORKSPACE"
ctest -D ExperimentalCoverage || true
if [ -f Testing/TAG ] ; then
- /usr/local/gcovr-3.1/scripts/gcovr -r .. --xml-pretty -e teshsuite.* -u -o $WORKSPACE/xml_coverage.xml
+ gcovr -r .. --xml-pretty -e teshsuite.* -u -o $WORKSPACE/xml_coverage.xml
xsltproc $WORKSPACE/tools/jenkins/ctest2junit.xsl Testing/`head -n 1 < Testing/TAG`/Test.xml > CTestResults_memcheck.xml
mv CTestResults_memcheck.xml $WORKSPACE
fi
+This patch is to be applied to the Lua 5.3 sources to get a shared
+library. This is because the authors of Lua don't bother distributing
+a working build system for their software, so we have to make one...
+
+As unfortunate as it may be, there is nothing else we can do.
+
+ -- Da SimGrid team.
+
diff --git a/Makefile b/Makefile
index 5ee5601..93830a3 100644
--- a/Makefile
#! /usr/bin/env perl
-# Copyright (c) 2005-2012, 2014. The SimGrid Team.
-# All rights reserved.
+# Copyright (c) 2005-2012, 2014. The SimGrid Team. All rights reserved.
# This program is free software; you can redistribute it and/or modify it
# under the terms of the license (GNU LGPL) which comes with this package.
+
+
use strict;
sub usage($) {
my $ret;
- print "USAGE: $progname [--root=part/to/cut] [--path=where/to/search NOT WORKING] [--outdir=where/to/generate/files] infile [infile+]\n";
+ print "USAGE: $progname [--root=part/to/cut] [--outdir=where/to/generate/files] infile [infile+]\n\n";
+ print "This program is in charge of extracting the unit tests out of the SimGrid source code.\n";
+ print "See http://simgrid.gforge.inria.fr/doc/latest/inside_tests.html for more details.\n";
exit $ret;
}
-my $path=undef;
my $outdir=undef;
my $root;
my $help;
GetOptions(
'help|h' => sub {usage(0)},
'root=s' =>\$root,
- 'path=s' =>\$path,
'outdir=s' =>\$outdir) or usage(1);
usage(1) if (scalar @ARGV == 0);