README
======

This is a small example of an MPI+OmpSs application that can run with
with DLB and the LeWI module.

The application has not been modified. Nanos++, the OmpSs runtime, has
native support with DLB and will invoke all DLB functions when needed.

Compilation
-----------
Check that the Makefile paths are correct and run `make`. The Makefile assumes you
have the Mercurium binaries directory in your PATH.

Execution
---------
* Edit the script 'run.sh'
    * Check that the variables detected at configure time are correct
    * Review the variables to be modified by the user and change them as you like
    * Make sure that either the variable MPIEXEC_BIND_FLAG is set or that
      the mpiexec command binds processes by default. This example must run
      with all the MPI processes pinned to a non-overlapping subset of CPUs.
      See Annex for more info.

* Run the script 'run.sh'
* Review the results

You can change the values of DLB and TRACE variables to modify the script behaviour.
By default both options are set to 0, which means that the example program will run
without DLB nor tracing capabilities. Setting DLB to 1 should enable DLB to perform
some basic balancing techniques. And setting TRACE to 1 will enable the tracing
through the EXTRAE tracing library. The TRACE options assumes that you have EXTRAE
correctly installed, and your environment variable EXTRAE_HOME points to that path.
This option will generate a tracefile.prv which can be opened using the PARAVER
application.

By default, DLB will track all the process CPUs that are not being used to
execute useful code (user code, tasks, etc.) and will use them instead to
balance the workload of other processes. Blocking MPI calls are not considered
useful computation, so those CPUs waiting for MPI messages will also be used by
DLB. To keep those CPUs for the current process set the script variable
LEWI_KEEP_ONE_CPU to 1. This variable will enable the option
`--lewi-keep-one-cpu`.


References
----------
If you don't know about EXTRAE and PARAVER and want to test the tracing capabilities
of the example program you can download the sources and pre-built binaries at:
* https://tools.bsc.es/downloads


Annex: check MPI binding support
--------------------------------
DLB will manage exactly which CPUs must be used for each process running with OmpSs.
For this reason, in this example all processes must start the executionwith a disjoint
set of CPUs.

If your are not sure how to run MPI with binding enabled try the following commands:

$ mpirun -n 2 dlb --affinity                    # check default behaviour
$ mpirun -n 2 --bind-to core dlb --affinity     # OpenMPI specific flag
$ mpirun -n 2 -genv I_MPI_PIN 1 \
            -genv I_MPI_PIN_CELL core \
            -genv I_MPI_PIN_DOMAIN auto \
            dlb --affinity                      # Intel MPI pinning variables

And figure which flag gives you an output where both processes do not share any
CPU. Then, put that flag in the MPIEXEC_BIND_FLAG variable.

If none of the above works for you, check the documentation of the MPI implementation
that you are using.
