Cris’ Image Analysis Blog

theory, methods, algorithms, applications

DIPimage 2.0 released

Last week the new release of DIPimage and DIPlib was made available at diplib.org. The change list is pretty substantial, though there should be no real compatibility concerns. One of the most important changes is that, on both Windows and Linux, some image processing functionality can now use multithreading to make the best use of multi-processor and multi-core systems. For example, all separable filters will use all available cores by default. You can override this behaviour with the new 'NumberOfThreads' preference setting. Let’s first see how many threads DIPimage uses by default on my computer:

dipgetpref('NumberOfThreads')
ans =
     4

I’m using an Intel Xeon with four cores. Now let’s see how this affects computation. Again, I’m using the timeit utility you can find here:

a = noise(newim(500,500));
f = @() gaussf(a,15);
dipsetpref('NumberOfThreads',4)
t4 = timeit(f)
dipsetpref('NumberOfThreads',2)
t2 = timeit(f)
dipsetpref('NumberOfThreads',1)
t1 = timeit(f)
t4 =
    0.0193

t2 =
    0.0266

t1 =
    0.0443
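
To put numbers on the scaling, the timings above can be turned into speed-up and efficiency figures. This is only a small sketch in plain MATLAB arithmetic: the values are copied from the output above, and the variable names are illustrative.

```matlab
% Timings copied from the output above, in seconds
t = [0.0443, 0.0266, 0.0193];   % measured with 1, 2 and 4 threads
n = [1, 2, 4];                  % corresponding thread counts

speedup    = t(1) ./ t;         % relative to the single-threaded run
reduction  = 1 - t ./ t(1);     % fraction of execution time saved
efficiency = speedup ./ n;      % 1 would mean perfect scaling

% One line per thread count (fprintf cycles through the columns)
fprintf('%d threads: speed-up %.2f, time saved %.0f%%, efficiency %.2f\n', ...
        [n; speedup; 100*reduction; efficiency]);
```

With four threads this gives a speed-up of about 2.3, against an ideal of 4.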

Using four cores decreases execution time by about 56%. Part of the execution time is the actual calculation, but a significant part is spent on memory access. All processor cores share the same memory, so that part does not get faster with more cores, which makes it impossible to obtain the ideal decrease of 75% of the computation time. Note how the transition from one to two cores produces a larger advantage (a 40% reduction) than the transition from two to four cores (a further 27%).

MATLAB has the maxNumCompThreads command, which changes the number of threads used for linear algebra calculations:

b = double(a);
f = @() b*b;
n = maxNumCompThreads(4);   % returns the previous setting, saved in n
t4 = timeit(f)
maxNumCompThreads(2);
t2 = timeit(f)
maxNumCompThreads(1);
t1 = timeit(f)
maxNumCompThreads(n);       % restore the original setting
t4 =
    0.0082

t2 =
    0.0142

t1 =
    0.0267

Expect the 'NumberOfThreads' preference to be linked to the maxNumCompThreads command in a future release of DIPimage.
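
Until that link exists, the two settings have to be changed separately. A minimal sketch, assuming you want DIPimage and MATLAB's own routines to use the same number of threads (the variable names and the value 2 are just examples):

```matlab
nthreads = 2;                              % desired number of threads (example value)

oldDip = dipgetpref('NumberOfThreads');    % remember the DIPimage setting
oldMat = maxNumCompThreads(nthreads);      % set MATLAB's count; returns the previous one
dipsetpref('NumberOfThreads', nthreads);   % set DIPimage's count to match

% ... run the computation here ...

dipsetpref('NumberOfThreads', oldDip);     % restore both settings afterwards
maxNumCompThreads(oldMat);
```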

Not all filters are affected by the number of threads. For example, the dilation with a circular structuring element is not separable:

f = @() dilation(a,15);
dipsetpref('NumberOfThreads',4)
t4 = timeit(f)
dipsetpref('NumberOfThreads',2)
t2 = timeit(f)
dipsetpref('NumberOfThreads',1)
t1 = timeit(f)
t4 =
    0.0310

t2 =
    0.0311

t1 =
    0.0311

The dilation with a rectangular structuring element, however, is separable:

f = @() dilation(a,15,'rectangular');
dipsetpref('NumberOfThreads',4)
t4 = timeit(f)
dipsetpref('NumberOfThreads',2)
t2 = timeit(f)
dipsetpref('NumberOfThreads',1)
t1 = timeit(f)
t4 =
    0.0065

t2 =
    0.0075

t1 =
    0.0102

Note how spreading the rectangular dilation over four cores reduced execution time by only 36%. The dilation needs relatively few computations, meaning it is limited more by memory access than by computation. The same is true for the dot-product on matrices, for example, which is therefore not implemented as a multi-threaded operation in MATLAB.
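
The timing experiments in this post all repeat the same steps for each thread count, so a small helper could wrap the pattern. The following is a sketch only: `timethreads` is a hypothetical name, not a DIPimage function, and it assumes the timeit utility is on the path. It restores the original preference when it finishes, even if the timed function throws an error.

```matlab
function t = timethreads(f, counts)
% TIMETHREADS  Time function handle F for each thread count in COUNTS.
%   Hypothetical helper, not part of DIPimage. Save as timethreads.m.

old = dipgetpref('NumberOfThreads');                        % current setting
cleanup = onCleanup(@() dipsetpref('NumberOfThreads', old)); % restore on exit

t = zeros(size(counts));
for k = 1:numel(counts)
    dipsetpref('NumberOfThreads', counts(k));  % switch thread count
    t(k) = timeit(f);                          % time with that setting
end
end
```

It could then be used like this:

```matlab
a = noise(newim(500,500));
t = timethreads(@() dilation(a,15,'rectangular'), [1 2 4]);
saved = 1 - t(end)/t(1)   % fraction of time saved with four threads
```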