Add simple autotuning selector for collectives
For now, it loops over all existing algorithms for the collective and benchmarks each one, measuring the time spent on each process as well as the maximum time across processes.
It then reports the fastest algorithm found for each process, as well as the globally fastest one.
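The selection step described above can be sketched as follows. This is a minimal C++ sketch under stated assumptions, not SMPI's actual code: the AlgoTiming and Selection types and the select_fastest function are hypothetical, the timings are assumed to be already collected, and the global choice is assumed to be made on the maximum time across processes (a collective only completes when its slowest process does).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical benchmark record: one entry per candidate algorithm,
// holding the time measured on each process (index = rank).
struct AlgoTiming {
  std::string name;
  std::vector<double> per_rank_time;
};

// Hypothetical result of the selection.
struct Selection {
  std::vector<std::string> best_per_rank;  // fastest algorithm seen by each rank
  std::string global_best;                 // algorithm with the lowest maximum time
};

// Assumes every entry has the same (non-zero) number of ranks.
Selection select_fastest(const std::vector<AlgoTiming>& timings) {
  assert(!timings.empty() && !timings.front().per_rank_time.empty());
  const std::size_t nranks = timings.front().per_rank_time.size();
  Selection sel;
  sel.best_per_rank.resize(nranks);
  std::vector<double> best_time(nranks, 0.0);
  double best_max = 0.0;

  for (std::size_t a = 0; a < timings.size(); ++a) {
    const AlgoTiming& t = timings[a];
    // Per-rank choice: keep the algorithm with the smallest time on this rank.
    for (std::size_t r = 0; r < nranks; ++r) {
      if (a == 0 || t.per_rank_time[r] < best_time[r]) {
        best_time[r] = t.per_rank_time[r];
        sel.best_per_rank[r] = t.name;
      }
    }
    // Global choice (assumption): compare algorithms by their slowest rank.
    const double max_time =
        *std::max_element(t.per_rank_time.begin(), t.per_rank_time.end());
    if (a == 0 || max_time < best_max) {
      best_max = max_time;
      sel.global_best = t.name;
    }
  }
  return sel;
}
```

With two hypothetical algorithms timed at {1.0, 5.0} and {3.0, 2.0} on two ranks, rank 0 prefers the first, rank 1 prefers the second, and the second wins globally because its maximum (3.0) is lower than 5.0.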
A rollback feature still needs to be added so that the simulation can continue correctly after the benchmarking runs.
This is still experimental and no tests are generated yet. It can be enabled with --cfg=smpi/collname:automatic, where collname is the name of the collective to tune.
For now, input values are not checked, so some algorithms will fail (mainly because they require a power-of-2 or an even number of processes). Such checks should be added.
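A minimal sketch of the kind of check mentioned above, in C++: before benchmarking, skip the algorithms whose process-count requirement is not met. SizeRequirement, Candidate, is_power_of_two, algorithm_applicable and applicable_algorithms are all hypothetical names; SMPI does not necessarily describe algorithm constraints this way.

```cpp
#include <string>
#include <vector>

// Hypothetical constraint an algorithm may place on the number of processes.
enum class SizeRequirement { None, Even, PowerOfTwo };

// Hypothetical description of a candidate algorithm for the selector.
struct Candidate {
  std::string name;
  SizeRequirement req;
};

// True if n is a positive power of 2 (1, 2, 4, 8, ...).
bool is_power_of_two(int n) { return n > 0 && (n & (n - 1)) == 0; }

// True if an algorithm with requirement `req` can run on `nprocs` processes.
bool algorithm_applicable(SizeRequirement req, int nprocs) {
  switch (req) {
    case SizeRequirement::Even:       return nprocs > 0 && nprocs % 2 == 0;
    case SizeRequirement::PowerOfTwo: return is_power_of_two(nprocs);
    case SizeRequirement::None:       return nprocs > 0;
  }
  return false;
}

// Keeps only the candidates that may be benchmarked with `nprocs` processes.
std::vector<std::string> applicable_algorithms(const std::vector<Candidate>& all,
                                               int nprocs) {
  std::vector<std::string> out;
  for (const Candidate& c : all)
    if (algorithm_applicable(c.req, nprocs)) out.push_back(c.name);
  return out;
}
```

With 6 processes, an algorithm requiring an even count is kept while one requiring a power of 2 is skipped instead of failing at run time.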