README
======

This is a small example of an MPI+OMP application that can run with
DLB (LeWI) and OMPT support.

The application has not been modified. DLB acts as an OpenMP Tool and
can receive callbacks from the OpenMP runtime on some defined events that
allow the DLB library to act transparently to the user.

Compilation
-----------
Check that the Makefile paths are correct and run `make`.

Also, check that the example is linked to a OMPT compatible OpenMP runtime
library. If you doubt, run first the 'OMPT' example.

Execution
---------
* Edit the script 'run.sh'
    * Check that the variables detected at configure time are correct
    * Review the variables to be modified by the user and change them as you like
    * Make sure that either the variable MPIEXEC_BIND_FLAG is set or that
      the mpiexec command binds processes by default. This example must run
      with all the MPI processes pinned to a non-overlapping subset of CPUs.
      See Annex for more info.

* Run the script 'run.sh'
* Review the results

You can change the values of DLB and TRACE variables to modify the script behaviour.
By default both options are set to 0, which means that the example program will run
without DLB nor tracing capabilities. Setting DLB to 1 should enable DLB to perform
some basic balancing techniques. And setting TRACE to 1 will enable the tracing
through the EXTRAE tracing library. The TRACE options assumes that you have EXTRAE
correctly installed, and your environment variable EXTRAE_HOME points to that path.
This option will generate a tracefile.prv which can be opened using the PARAVER
application.

By default, DLB will track all the process CPUs that are not being used to
execute useful code (user code, tasks, etc.) and will use them instead to
balance the workload of other processes. Blocking MPI calls are not considered
useful computation, so those CPUs waiting for MPI messages will also be used by
DLB. To keep those CPUs for the current process set the script variable
LEWI_KEEP_ONE_CPU to 1. This variable will enable the option
`--lewi-keep-one-cpu`.

References
----------
If you don't know about EXTRAE and PARAVER and want to test the tracing capabilities
of the example program you can download the sources and pre-built binaries at:
* https://tools.bsc.es/downloads


Annex: check MPI binding support
--------------------------------
DLB is able to manage the thread affinity on OpenMP threads when running with
OMPT support. For this reason, in this example all processes must start the execution
with a disjoint set of CPUs.

If your are not sure how to run MPI with binding enabled try the following commands:

$ mpirun -n 2 dlb --affinity                    # check default behaviour
$ mpirun -n 2 --bind-to core dlb --affinity     # OpenMPI specific flag
$ mpirun -n 2 -genv I_MPI_PIN 1 \
            -genv I_MPI_PIN_CELL core \
            -genv I_MPI_PIN_DOMAIN auto \
            dlb --affinity                      # Intel MPI pinning variables

And figure which flag gives you an output where both processes do not share any
CPU. Then, put that flag in the MPIEXEC_BIND_FLAG variable.

If none of the above works for you, check the documentation of the MPI implementation
that you are using.
