Developer documentation
Version 3.0.3-105-gd3941f44
Thread-safe image looping

These functions allows arbitrary looping operations to be performed in parallel, using a versatile multi-threading framework. It works hand-in-hand with the single-threaded looping functions, and can be used to code up complex operations with relatively little effort.

The ThreadedLoop() function is generally used by first initialising an object that will determine the order of traversal, which axes will be looped over between synchronisation calls, and what message to display in the progress bar if one is needed. This is all performed in the various overloaded versions of ThreadedLoop(), which are template functions allowing the class to be initialised from any HeaderType class. These are described in more detail below.

The object returned by the ThreadedLoop() functions provide methods to loop over ImageType classes. To do this, the object returned by ThreadedLoop() keeps track of which axes will be managed by each thread independently (the inner axes), and which will be coordinated across threads (the outer axes). By default, the inner axes consist of a single axis, chosen to be the axis of smallest stride in the source HeaderType class provided at initialisation. The remaining axes are coordinated across threads: each invocation of the thread's functor is given a fresh position to operate from, in the form of an Iterator class.

To help illustrate the concept, consider an example whereby the inner axes have been set to the x & y axes (i.e. axes 0 & 1), and the outer axes have been set to the z and volume axes (i.e. axes 2 & 3). Each thread will do the following:

  1. lock the loop mutex, obtain a new set of z & volume coordinates, then release the mutex so other threads can obtain their own unique set of coordinates to process;
  2. set the position of all ImageType classes to be processed according to these coordinates;
  3. iterate over the x & y axes, invoking the user-supplied functor each time;
  4. repeat from step 1 until all the data have been processed.

Instantiating a ThreadedLoop() object

The ThreadedLoop() functions can be used to set up any reasonable multi-threaded looping structure. The various relevant aspects that can be influenced include:

The various overloaded functions allow these settings to be specified in different ways. A HeaderType class is always required; it is generally recommended to provide the (or one of the) input ImageType class that is to be processed, since its order of traversal will have the most influence on performance (by making the most efficient use of the hardware's RAM access and CPU cache). It is also possible to supply a vector<size_t> directly if required.

These axes can be restricted to a specific range to allow volume-wise processing, etc. The inner axes can be specified by supplying how many are needed; they will then be taken from the list of axes to be looped over, in order of increasing stride. It is also possible to provide the list of inners axes and outer axes as separate vector<size_t> arguments.

Running the ThreadedLoop() objects

The run() methods will run the ThreadedLoop, invoking the specified function or functor once per voxel in the loop. The different versions will expect different signatures for the function or functor, and manage looping over different numbers of ImageType classes. This should be clarified by examining the examples below.

This example can be used to compute the exponential of the voxel values in-place, while displaying a progress message:

void my_function (MyImageType& vox) {
vox.value() = std::exp (vox.value());
}
...
MyImageType vox;
ThreadedLoop ("computing exponential in-place", vox)
.run (my_function, vox);
ThreadedLoopRunOuter< decltype(Loop(vector< size_t >()))> ThreadedLoop(const HeaderType &source, const vector< size_t > &outer_axes, const vector< size_t > &inner_axes)
Multi-threaded loop object.

To make this operation more easily generalisable to any ImageType, use a functor with a template operator() method:

class MyFunction {
public:
template <class ImageType>
void operator() (ImageType& vox) {
vox.value() = std::exp (vox.value());
}
};
...
AnyImageType vox;
ThreadedLoop ("computing exponential in-place", vox)
.run (MyFunction(), vox);

Note that a simple operation such as the previous example can be written more compactly using C++11 lambda expressions:

MyImageType vox;
ThreadedLoop ("computing exponential in-place", vox)
.run ([](decltype(vox)& v) { v.value() = std::exp(v.value()); }, vox);

As a further example, the following snippet performs the addition of any ImageTypes vox1 and vox2, this time storing the results in vox_out, with no progress display, and looping according to the strides in vox1:

struct MyAdd {
template <class ImageType1, class ImageType2, class ImageType3>
void operator() (ImageType1& out, const ImageType2& in1, const ImageType3& in2) {
out.value() = in1.value() + in2.value();
}
};
...
ThreadedLoop (vox1).run (MyAdd(), vox_out, vox1, vox2);

Note that the ImageType arguments to run() must be provided in the same order as expected by the Functor's operator() method.

Again, such a simple operation can be written more compactly using C++11 lambda expressions:

auto f = [](decltype(vox_out)& out, decltype(vox1)& in1, decltype(vox2)& in2) {
out.value() = in1.value() + in2.value();
}
ThreadedLoop (vox1).run (f, vox_out, vox1, vox2);

This example uses a functor to computes the root-mean-square of vox:

class RMS {
public:
// We pass a reference to the same double to all threads.
// Each thread accumulates its own sum-of-squares, and
// updates the overal sum-of-squares in the destructor, which is
// guaranteed to be invoked after all threads have re-joined,
// thereby avoiding race conditions.
RMS (double& grand_SoS) : SoS (0.0), grand_SoS (grand_SoS) { }
~RMS () { grand_SoS += SoS; }
// accumulate the thread-local sum-of-squares:
template <class ImageType>
void operator() (const ImageType& in) {
SoS += Math::pow2 (in.value());
}
protected:
double SoS;
double& grand_SoS;
};
...
double SoS = 0.0;
ThreadedLoop ("computing RMS of \"" + vox.name() + "\"", vox)
.run (RMS(SoS), vox);
double rms = std::sqrt (SoS / voxel_count (vox));
constexpr T pow2(const T &v)
Definition: math.h:53

The run_outer() method

The run_outer() method can be used if needed to loop over the indices in the outer loop only. This method is used internally by the run() methods, and may prove useful in selected cases if each thread needs to handle its own looping over the inner axes. Usage is essentially identical to the run() method, and the function or functor provided will need to have a void operator() (const Iterator& pos) method defined. Note that in this case, any ImageType classes to be looped over will need to be stored as members to the functor, since it will only receive an Iterator for each invocation - the functor will need to then implement looping over the inner axes from the position provided in the Iterator.

See also
Loop
Thread::run()
Thread-safe queue