Gaussian filtering

In my recent lectures on filtering I was trying to convey only one thing to my students: do not use the uniform filter, use the Gaussian! The uniform (or “box”) filter is very easy to implement, and hence used often as a smoothing filter. But the uniform filter is a very poor choice for a smoothing filter, it simply does not suppress high frequencies strongly enough. And on top of that, it inverts some of the frequency bands that it is supposed to be suppressing (its Fourier transform has negative values). There really is no excuse ever to use a uniform filter, considering there is a very fine alternative that is very well behaved, perfectly isotropic, and separable: the Guassian. Sure, it’s not a perfect low-pass filter either, but it is as close as a spatial filter can get.

Because recently I found some (professionally written) code using Gaussian filtering in a rather awkward way, I realized even some seasoned image analysis professionals are not familiar and comfortable with Gaussian filtering. Hence this short tutorial.

First of all, for reference, this is the equation for a Gaussian:

g(x) = 1/(√(2π) σ) exp( -½ x2/σ2 )

An isotropic multi-variate Gaussian is defined as (with x an n-dimensional vector):

g(x) = 1/(√(2π) σ)n exp( -½ |x|2/σ 2 )

Scaling the various axes can be accomplished giving a different σ for each dimension. For example, in two dimensions this gives:

g(x,y) = 1/(2π σxσy) exp{ -½( x2/σx2 + y2/σy2 ) } = g(xg(y)

The Gaussian as a Smoothing Filter

The Gaussian can be used as a smoothing filter. Its Fourier transform is a Gaussian as well, with a sigma of (2πσ)-1, meaning it strongly suppresses frequencies above f ≈ (πσ)-1. What is more, it is monotonically decreasing for increasing f, like a good smoothing filter should be, and real and positive everywhere, meaning the filter does not change the phase of any frequency components. Even thought the Gaussian does not have a cutoff frequency, its Fourier transform is very close to zero after certain value of f, meaning it can be sampled with only a very small error if σ is larger than 0.9 or so.

This demo gives a good indication of what is wrong with the uniform filter:

a = cos(phiphi*100) * erfclip((rr-40)/2,0.5,1);
b = unif(a,9,'rectangular');
c = unif(a,9);
d = gaussf(a,3);
a =   a(0:127  ,0:127  );
b = 4*b(128:255,0:127  );
c = 4*c(128:255,128:255);
d = 4*d(0:127  ,128:255);
hold on

filter output

a is a test image with a line pattern that decreases in frequency as one moves from the center to the edge. b is filtered with a uniform filter with a 9×9 rectangular support. c is a uniform filter with a circular support. It shows the same characteristics, but is much more isotropic. d is filtered with a Gaussian. Note how the uniform filter inverts the line pattern at various points in the image.

Computing Derivatives

The derivative of a function is defined as its slope, which is equivalent to the difference between function values at two points an infinitesimal distance apart, divided by that distance. On a discrete grid, the smallest distance obtainable without interpolation is 1. This yields the finite difference approximation of the derivative: a linear filter with values [1,-1]. This approximation actually computes the derivative in between the two grid points, meaning that the output of the filter is shifted by half a pixel. This is often solved by using a distance of 2 instead of 1. We now have a filter [1,0,-1]/2, which is odd in size.

The problem with these filters is that they respond very strongly to noise. Hence the introduction of filters like Prewitt and Sobel, that add some smoothing in the direction orthogonal to the derivative direction. Of course, these filters had to use a uniform or a triangular filter, instead of a good smoothing filter. (The triangular filter has better properties than the uniform filter, but does not suppress high frequencies as well as the Gaussian.) But even if we ignore the poor choice of smoothing filter, this type of derivative operator is in general a bad idea. Why? Because each of the components of a gradient vector computed with these filters has a different smoothing. Let me explain this. A filter like Prewitt is separable, equivalent of applying the [1,0,-1]/2 along one axis, and the [1,1,1]/3 filter along the other axis. So basically we are computing the partial derivative along the x-axis of an image that was smoothed with a 1×3 uniform filter, and then computing the partial derivative along the y-axis of an image that was smoothed with a 3×1 uniform filter. The two partial derivatives are derivatives of different images! Needless to say, if we combine these into one gradient vector we obtain a nonsensical result (or a poor approximation at best). This means that these filters are not rotation invariant.

So what is so great about those “Gaussian derivatives” I keep hearing about? Don’t they do the same basic thing as Prewitt and Sobel, but with a proper smoothing? Well, yes, but there is more. We learned above that the derivative is a convolution. We also know that the convolution is associative and commutative. These properties show us that computing the gradient of an image blurred with a Gaussian is the same thing as convolving the image with the gradient of a Gaussian:

∂/∂x [ i(x,y) ⊗ g(x,y) ] = ∂/∂x ⊗ i(x,y) ⊗ g(x,y) = i(x,y) ⊗ ∂/∂x ⊗ g(x,y) = i(x,y) ⊗ [ ∂/∂x g(x,y) ]

And because we can compute the derivative of the Gaussian analytically, and because it has a bandlimit similar to that of the Gaussian itself, we can sample it and make a filter out of it. What is so exciting? We are now computing the exact gradient of a blurred version of our image. This is no longer a discrete approximation to the gradient, but an exact analytical gradient. Of course, we cannot compute the gradient of the image itself, only of its blurred version, which means we are still approximating the gradient. But in any application that requires accurate gradients, we needed to blur the image anyway because the gradient is so noise sensitive.

Convolutions with a Gaussian

One very important theorem in all the sciences is the central limit theorem. OK, officially it belongs to statistics, but this theorem is the reason we all have to deal with Normally distributed noise. It states that, if we add together a large enough set of random variables with whatever distribution, the resulting random variable is normally distributed. The Gaussian emerges again and again as a probability distribution in nature, Brownian motion is a classic example. The central limit theorem also tells us that, if we convolve any compact filtering kernel with itself often enough, we obtain a Gaussian kernel. Meaning that by convolving an image repeatedly with a uniform filter we can approximate a Gaussian filter. It also means that convolving an image twice with a Gaussian filter yields an image filtered with a larger Gaussian filter. The size parameter σ of this resulting larger Gaussian filter can be computed by quadratically adding the σs of the applied filters: σtotal = √(σ12+σ22). So when repeatedly convolving an image with the same Gaussian, we effectively increase the σ with a factor √2 with every convolution. Note that this is independent of the order in which the gradient convolution is applied: the convolution is commutative. This means that if we smooth the image with a Gaussian of σ=4, then compute a gradient with a Gaussian derivative of σ=3, we could just as well have computed the gradient straight away, using a Gaussian derivative of σ=5.

Furthermore, because the Gaussian is separable, you can compute the convolution with a 2D Gaussian by applying 1D convolutions, first along each of the rows of the image, then along each of the columns of the result:

i(x,y) ⊗ g(x,y) = i(x,y) ⊗ g(x) ⊗ g(y)

Or first along the columns and then along the rows, whatever floats your boat. And of course you can do this in your 5D image just as nicely as in a 2D image. But more importantly: this is also true for the gradient!

∂/∂x i(x,y) ⊗ g(x,y) = i(x,y) ⊗ ∂/∂x g(x) ⊗ g(y)

What I’m saying here is that you, yes, you who wrote that code that nicely smoothed the image with a 1D Gaussian first along rows and then along columns, and then computed the x and y derivatives by applying full 2D convolutions with 2D Gaussian derivatives, you were wasting a lot of computer clock cycles. Skip the first blurring, you don’t need it, and compute the gradient as the separable convolution that it is. Please.

22 Responses to “Gaussian filtering”

  1. On December 9th, 2008, at 15:06, Olechka-persik said:

    Огромное спасибо за потрясающие идеи!!! Буду следить за блогом, много всего интересного. А мой блог о науке, надеюсь, тоже понравится 😉

  2. On December 24th, 2008, at 0:43, Sylvain Costes said:

    Very nice explanation Cris…

  3. On January 1st, 2009, at 9:54, Sukhee said:

    Dear Prof. Cris Luengo

    Thank you very much for writing the interested topic.

    Please advise me that how can I make convolution very fast without Gaussian Filtering

  4. On January 5th, 2009, at 10:57, Cris Luengo said:


    What do you mean? Convolution is a very general operation. What the fastest implementation is depends very much on the kernel you want to use. If the kernel is separable, like the Gaussian, you can apply a series of one-dimensional convolutions. If the kernel is very large it often is best to compute the convolution through the Fourier Domain.

  5. On January 31st, 2009, at 6:03, Rodrigo said:

    I just run into this. Genius!!!!

  6. On March 31st, 2009, at 0:53, Will Dwinnell said:

    This is an interesting posting. Thanks!

  7. On May 6th, 2009, at 13:17, Maha Maabar said:

    Can we determine the degree of smoothing from the size of the Gaussian filter σ?

  8. On May 6th, 2009, at 13:51, Cris Luengo said:

    Hi Maha,

    The parameter σ determines the degree of smoothing. The size in pixels of a Gaussian filter is typically between 6 and 10 times the σ (cutting off the tail of the Gaussian curve at 3σ or 5σ). Smaller sizes can affect how well the filter dampens high frequencies, so it’s not recommended to go below the 3σ cutoff. But other than that, the size in pixels of the filter does not change the degree of smoothing: it is only determined by the parameter σ.

  9. On May 6th, 2009, at 19:59, Maha Maabar said:

    Thanks Cris for your quick reply. However, I think I didn’t state the problem clearly. My question is, can we, or is there a way to, predict the amount of noise reduction for a certain value of σ? For example, can we say how much reduction a kernel of σ = 5 would achieve on a signal of amplitude 100 pixels?

  10. On May 7th, 2009, at 9:22, Cris Luengo said:


    The answer is yes and no. You can calculate how much you’d reduce the energy of the noise in the image, but the energy of the signal is also reduced in some way that depends on image contents. Therefore it is not possible to estimate the increase in signal to noise ratio (SNR), which is the more relevant value quantifying noise.

    To calculate the reduction of energy of the noise portion of the image you’d use Fourier analysis. White noise has a uniform amplitude in the Fourier domain. The energy is the integral of the square of the amplitude. Convolution with a Gaussian is multiplication with a Gaussian in the Fourier domian. Therefore, white noise energy will be reduced by a factor equivalent to the integral of the square of the Gaussian in the Fourier domain.

    If the noise is not white, you can still calculate the energy reduction as long as you know the power spectrum. To calculate the SNR increase you also need to know the power spectrum of the noise-free signal, which typically is not obtainable unless you image is synthetic.

  11. On May 8th, 2009, at 12:42, Maha Maabar said:

    Hi Cris,

    Thanks very much for your idea.

    can you please point me to some good references which explain it in more details?

  12. On May 8th, 2009, at 13:58, Cris Luengo said:


    I don’t have a reference that studies this particular issue, but any book on Fourier analysis should give you all the equations and knowledge you need to compute these things. A good reference is, for example, “Signals and Systems” by Alan V. Oppenheim and Alan S. Willsky (Prentice Hall, 1997).

  13. On June 3rd, 2009, at 17:23, Ray said:

    Dear Prof. Cris Luengo:

    Thank you for your nice lecture. I am doing a project which requires accurate image gradient vector. It seems there are two ways. The first way is convolve the image with gaussian and then compute X and Y gradient from the blurred image. The second way is compute X and Y gradient respectively by convolving with derivative of gaussian. So which method is more widely used? Thank you!

  14. On June 3rd, 2009, at 17:29, Ray said:

    I hear that recursive gaussian filter runs very fast. But I could not understand it. Could you give some lecture on it? Thanks agian!

  15. On June 4th, 2009, at 10:10, Cris Luengo said:

    Hi Ray,

    Computing the Gaussian filtered image first and then applying the finite difference derivative defeats the purpose of the Gaussian gradients. The heavier the blurring, the more accurate this approximation is, but using the Gaussian derivatives directly is always better. And it’s not such an expensive operation.

    The recursive Gaussian filter is nice to use for large filters. If you’re looking to do a Gaussian derivative with a σ of 1 or 2, it’s faster to compute it with the finite impulse response filter (the “normal” way). If you want to learn more about it, you should read:

    – Young, I. T. & van Vliet, L. J. “Recursive implementation of the Gaussian filter,” Signal Processing 44:139-151, 1995.

    – van Vliet, L. J.; Young, I. T. & Verbeek, “Recursive Gaussian Derivative Filters,” Proceedings 14th Int. Conference on Pattern Recognition, vol. 1, 509-514, IEEE Computer Society Press, 1998.

  16. On October 14th, 2010, at 20:20, Krishna Prasad said:

    Dear Professor,

    Can you provide code for Gaussian Filter in ‘C’ Language for 512X512 lena image. thanks in advance.

  17. On October 15th, 2010, at 9:19, Cris Luengo said:

    Krishna Prasad,

    Sorry, I do not have any such code in C that I can distribute.

  18. On January 13th, 2012, at 0:08, Ricky Bangel said:

    Dear Professor,

    I am trying to use an artificially generated Gaussian white noise image to calculate the MTF of a camera. I will be using the Fourier Tranform to get my MTF value. However, I am having trouble generating such an image. Can you please provide me some guidance into how I should go about this?

  19. On January 13th, 2012, at 9:38, Cris Luengo said:

    Hi Ricky,

    Using DIPimage you can generate an image with Gaussian white noise very easily: noise(newim) does the trick. You can read some more about random numbers in this post.

  20. On August 17th, 2012, at 11:36, Anna said:

    Hello Cris,

    would you recomment to use 2D or 3D gaussian filter to smooth a set of 2D medical image slices (slice thickness 7mm, pixel size 2mm)?

  21. On August 19th, 2012, at 13:32, Cris Luengo said:


    That very much depends on the goal. The sigma along z, in pixels, will be 2/7 times that of the sigma along x and y. If you want to use a small sigma, then the sigma along z will be too small for correct computation. In this case it might be better to do the filtering on the 2D slices. But in general it is better to do the operation along all 3 dimensions, because the slices are correlated and you definitely want to use that correlation if you can.

  22. On May 8th, 2013, at 7:13, Zhao Zongyi said:

    It’s clear and helpful.Thank you.