Thread-safe image looping

These functions allows arbitrary looping operations to be performed in parallel, using a versatile multi-threading framework. It works hand-in-hand with the single-threaded looping functions, and can be used to code up complex operations with relatively little effort.

The ThreadedLoop() function is generally used by first initialising an object that will determine the order of traversal, which axes will be looped over between synchronisation calls, and what message to display in the progress bar if one is needed. This is all performed in the various overloaded versions of ThreadedLoop(), which are template functions allowing the class to be initialised from any `HeaderType`

class. These are described in more detail below.

The object returned by the ThreadedLoop() functions provide methods to loop over ImageType classes. To do this, the object returned by ThreadedLoop() keeps track of which axes will be managed by each thread independently (the *inner axes*), and which will be coordinated across threads (the *outer axes*). By default, the inner axes consist of a single axis, chosen to be the axis of smallest stride in the *source* `HeaderType`

class provided at initialisation. The remaining axes are coordinated across threads: each invocation of the thread's functor is given a fresh position to operate from, in the form of an Iterator class.

To help illustrate the concept, consider an example whereby the inner axes have been set to the x & y axes (i.e. axes 0 & 1), and the outer axes have been set to the z and volume axes (i.e. axes 2 & 3). Each thread will do the following:

- lock the loop mutex, obtain a new set of z & volume coordinates, then release the mutex so other threads can obtain their own unique set of coordinates to process;
- set the position of all
`ImageType`

classes to be processed according to these coordinates; - iterate over the x & y axes, invoking the user-supplied functor each time;
- repeat from step 1 until all the data have been processed.

The ThreadedLoop() functions can be used to set up any reasonable multi-threaded looping structure. The various relevant aspects that can be influenced include:

- which axes will be iterated over; by default, all axes of the
*source*`HeaderType`

class will be included. - the order of traversal, in the form of a list of axes that will be iterated over in order; the first entry in the list will be iterated over in the inner-most loop, etc. By default, the axes are traversed in the order of increasing stride of the
*source*InfoType class. - the number of axes that will be managed independently by each thread. By default, this number is one.
- a string that will be displayed in the progress bar if one is desired.

The various overloaded functions allow these settings to be specified in different ways. A `HeaderType`

class is always required; it is generally recommended to provide the (or one of the) *input* `ImageType`

class that is to be processed, since its order of traversal will have the most influence on performance (by making the most efficient use of the hardware's RAM access and CPU cache). It is also possible to supply a `vector<size_t>`

directly if required.

These axes can be restricted to a specific range to allow volume-wise processing, etc. The inner axes can be specified by supplying how many are needed; they will then be taken from the list of axes to be looped over, in order of increasing stride. It is also possible to provide the list of inners axes and outer axes as separate `vector<size_t>`

arguments.

The run() methods will run the ThreadedLoop, invoking the specified function or functor once per voxel in the loop. The different versions will expect different signatures for the function or functor, and manage looping over different numbers of `ImageType`

classes. This should be clarified by examining the examples below.

This example can be used to compute the exponential of the voxel values in-place, while displaying a progress message:

void my_function (MyImageType& vox) {

vox.value() = std::exp (vox.value());

}

...

MyImageType vox;

ThreadedLoop ("computing exponential in-place", vox)

.run (my_function, vox);

ThreadedLoopRunOuter< decltype(Loop(vector< size_t >()))> ThreadedLoop(const HeaderType &source, const vector< size_t > &outer_axes, const vector< size_t > &inner_axes)

Multi-threaded loop object.

To make this operation more easily generalisable to any `ImageType`

, use a functor with a template operator() method:

class MyFunction {

public:

template <class ImageType>

void operator() (ImageType& vox) {

vox.value() = std::exp (vox.value());

}

};

...

AnyImageType vox;

ThreadedLoop ("computing exponential in-place", vox)

.run (MyFunction(), vox);

Note that a simple operation such as the previous example can be written more compactly using C++11 lambda expressions:

MyImageType vox;

ThreadedLoop ("computing exponential in-place", vox)

.run ([](decltype(vox)& v) { v.value() = std::exp(v.value()); }, vox);

As a further example, the following snippet performs the addition of any `ImageTypes`

*vox1* and *vox2*, this time storing the results in *vox_out*, with no progress display, and looping according to the strides in *vox1:*

struct MyAdd {

template <class ImageType1, class ImageType2, class ImageType3>

void operator() (ImageType1& out, const ImageType2& in1, const ImageType3& in2) {

out.value() = in1.value() + in2.value();

}

};

...

ThreadedLoop (vox1).run (MyAdd(), vox_out, vox1, vox2);

Note that the `ImageType`

arguments to run() must be provided in the same order as expected by the Functor's `operator()`

method.

Again, such a simple operation can be written more compactly using C++11 lambda expressions:

auto f = [](decltype(vox_out)& out, decltype(vox1)& in1, decltype(vox2)& in2) {

out.value() = in1.value() + in2.value();

}

ThreadedLoop (vox1).run (f, vox_out, vox1, vox2);

This example uses a functor to computes the root-mean-square of *vox:*

class RMS {

public:

// We pass a reference to the same double to all threads.

// Each thread accumulates its own sum-of-squares, and

// updates the overal sum-of-squares in the destructor, which is

// guaranteed to be invoked after all threads have re-joined,

// thereby avoiding race conditions.

RMS (double& grand_SoS) : SoS (0.0), grand_SoS (grand_SoS) { }

~RMS () { grand_SoS += SoS; }

// accumulate the thread-local sum-of-squares:

template <class ImageType>

void operator() (const ImageType& in) {

SoS += Math::pow2 (in.value());

}

protected:

double SoS;

double& grand_SoS;

};

...

double SoS = 0.0;

.run (RMS(SoS), vox);

double rms = std::sqrt (SoS / voxel_count (vox));

The run_outer() method can be used if needed to loop over the indices in the outer loop only. This method is used internally by the run() methods, and may prove useful in selected cases if each thread needs to handle its own looping over the inner axes. Usage is essentially identical to the run() method, and the function or functor provided will need to have a void operator() (const Iterator& pos) method defined. Note that in this case, any `ImageType`

classes to be looped over will need to be stored as members to the functor, since it will only receive an `Iterator`

for each invocation - the functor will need to then implement looping over the inner axes from the position provided in the `Iterator`

.

- See also
- Loop
- Thread::run()
- Thread-safe queue