Separable convolutions

The convolution is an important tool in image processing. All linear filters are convolutions (like, for example, the Gaussian filter). One of the reasons linear filters are so prevalent is that they imitate physical systems. For example, an analog electric circuit (containing resistors, capacitors and inductors) can be described by a convolution. The projection of a cell on a slide, through the microscope’s optical systems, onto the CCD sensor, can be described by a convolution. Even the sampling and averaging that occur in the CCD can be described by a convolution.

Two properties of the convolution are quite interesting when looking for an efficient implementation:

  • The convolution is a multiplication in the Fourier domain: f(x)⊗h(x) ⇒ F(ω)⋅H(ω) . This means that you can compute the convolution by applying the Fourier transform to both the image and the convolution kernel, multiplying the two results, then inverse transforming the result.
  • The convolution is associative: (fh1)⊗h2 = f⊗(h1h2) . This post is about the repercussions of this property.

Given this associativity property, if we have a filtering kernel h that can be decomposed into two simpler kernels such that h = h1h2, then we can compute the convolution as a sequence of two cheaper convolutions: fh = f⊗(h1h2) = (fh1)⊗h2 . The most trivial example of this is a square, uniform filter:

a = double(deltaim(40,40));
h1 = ones(15,1)/15;
h2 = ones(1,15)/15;
b = conv2(a,h1,'same');
c = conv2(b,h2,'same');

a b c

Convolving with two one-dimensional uniform filters in sequence is equivalent to convolving with a square, 2D uniform filter. But instead of doing 15×15=225 multiplications and additions per pixel, we’re only doing 15+15=30. Whenever we can decompose an n-dimensional filter into a set of n one-dimensional filters, we say that the filter is separable. Some of the most useful linear filters are separable.

Decomposing a separable filter kernel

Given that h1 is a column vector (a filter for the y-axis), and h2 a row vector, their product is a matrix and identical to the convolution of the one with the other. That is, h = h1h2 = h1 h2 . We can thus see how each column of a separable kernel is a scaled version of any other column (and conversely, each row is a scaled version of any other row). In MATLAB terms, h(:,i)=h1*h2(i). To decompose the kernel into two vectors, one thus can take any column of h as h1, and determine the values of h2 by simple division:

h = [2,1,0,-1,-2]'*[3,2,1];
h1 = h(:,1);
h2 = h(:,1)/h1(1);
ans =

Note, however, that the h1 and h2 computed in the code above are not identical to the two vectors that we multiplied together to obtain h. We can multiply h1 by any constant, as long as we divide h2 by that same constant.

The problem with this method is that if he happened to pick a column for h1 that is all zeros, h2 will be all infinity, and we wouldn’t get anywhere. Also, if some columns of h1 have very small values, the decomposition result would be less accurate if we used that column compared to any other column. By adding a little bit of logic to the code above, we can easily get the most accurate decomposition:

[~,I] = max(sum(abs(h))); % Pick the column with largest values
h1 = h(:,I);
[~,I] = max(sum(abs(h),2)); % Pick the row with largest values
h2 = h(I,:)/h1(I);
ans =

Let’s look at a slightly more complex example for a separable filter: the Gaussian derivative. We generate the kernel simply by applying the filter to a delta image, an image with a single pixel set:

h = deltaim(31,31);
h = dx(h,5);
h = double(h);
[~,I] = max(sum(abs(h)));
h1 = h(:,I);
[~,I] = max(sum(abs(h),2));
h2 = h(I,:)/h1(I);
ans =

The filter, though it is separable, failed the isequal test. We shouldn’t expect an exact equality, though, because we are dealing with floating-point values. The function dx returns single-precision floating point values, this is the maximum accuracy we should expect in the end result:

q = h1*h2-h;
q = abs(q(:))/max(abs(h(:)));
max(q) < eps('single')
ans =

A test for separability

Consider the matrix h as a system of linear equations. Because each row is a scaled version of every other row, this is actually a set of identical equations. That is, the rank of the system is 1:

h = [2,1,0,-1,-2]'*[3,2,1];
ans =

But, again, in general we cannot expect infinite accuracy, we should check the rank taking a certain tolerance into account:

h = double(dx(deltaim(31,31),5));
s = svd(h);
sum(s > eps('single'))
ans =

If we decide to check the separability of the kernel before computing its components, we might as well use svd to its full power. svd computes not only the singular values, but also two matrices such that h=u*s*v'. These two matrices, u and v, contain the h1 and h2 that we are looking for:

[u,s,v] = svd(h);
s = diag(s);
if sum(s > eps('single'))==1
   h1 = u(:,1)*s(1);
   h2 = v(:,1)';
   error('Kernel not separable!')
q = h1*h2-h;
q = abs(q(:))/max(abs(h(:)));
max(q) < eps('single')
ans =

This last bit of code, though more elegant, will produce slightly less accurate results than the first method in those cases where the first method can compute an exact result. But if there are round-off errors, the SVD method will, in general, produce a more accurate result.

To round off this post, I wanted to point you to two entries on the MATLAB File Exchange. One post, that I submitted, computes the kernel decomposition using the SVD method explained here. Interesting in that file is that it extends the method to n-dimensional kernels by iterating over the last dimension. The other post, by Dirk-Jan Kroon, introduces a whole different approach to kernel decomposition. I think his method is more complex and not any better than the two methods discussed here, but it is interesting because it can produce, using a least squares approach, an approximation for non-separable kernels.

2 Responses to “Separable convolutions”

  1. On October 27th, 2010, at 16:10, John said:

    I’m interested in some property I have read about convolution and it’s transformation in frequency domain.

    If Y(w)= X(w)/H(w)

    Can we say that

    y(t)=x(t)*ifft(1/h(t)) ?

    How does it work if it has been already filtered?

  2. On October 27th, 2010, at 16:33, Cris Luengo said:

    That’s almost correct:

       y(t) = x(t) * h(t)
       =>   Y(w) = X(w) H(w)
       =>   X(w) = Y(w) / H(w)
       =>   x(t) = y(t) * IFT( 1 / FT( h(t) ) )

    This is called deconvolution.

    However, that leads to bad behaviour for frequencies where H(w) is small, and even worse behaviour where it’s zero! Noise tends to get out of hand with this type of deconvolution. This is where the Wiener filter comes in. See

Leave a Reply

You can use these HTML tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

Note: I moderate all comments. Comments without a clear relation to the text above will not be published.