Last month I asked the following question in an exam for the advanced image analysis course we teach here: “Given that interpolation is a convolution, describe how you would compute an interpolation using the Fourier Transform.” Unfortunately I can count on one finger the number of students that did not simply answer with something in the order of “convolution can be computed by multiplication in the Fourier domain.” And the one student that did not give this answer didn’t give an answer at all… Apparently this question is too difficult, though I thought it was interesting and only mildly challenging. In this post I’ll discuss interpolation and in passing give the correct answer to this question.
Sampling and interpolation
Let’s consider a sampled function such as this (top left):
The arrows represent pulses, their height represent the function value at each of the sample points. This is commonly referred to as a pulse train; it is a continuous function which only has non-zero values at the sample locations. We’re assuming in this case that the function is infinitely long, so we don’t have to worry about what happens when you have a limited set of samples.
In the right column are the Fourier transforms of the functions on the left. For our pulse train, the Fourier transform is an infinite repetition of the Fourier transform (frequency spectrum) of the function that was sampled. In the image you can see five repetitions, but this is also an infinitely long function with an infinite number of repetitions of the same shape. The shape, the frequency spectrum of the function before sampling, is zero above some frequency, called the cut-off frequency. The distance between repetitions is equal to the sampling frequency (), meaning that the closer together the samples are, the further away these repetitions in the frequency domain are. Half the sampling frequency is called the Nyquist frequency (); the cut-off frequency should be lower than the Nyquist frequency for correct sampling. If this is not the case, then the repetitions of the frequency spectrum will overlap. This is referred to as aliasing, and causes irreversible loss of information in the sampled signal. If these repetitions do not overlap, no information is lost by the sampling, and the original, continuous function can be recovered from its samples.
The way to recover the original, continuous function is by blanking out all these repetitions of the spectrum, keeping only the original copy centred at zero frequency. To do so, we multiply in the Fourier domain by the function shown on the middle row, right. This is called the ideal interpolator, and has a value of 1 for all frequencies up to the Nyquist frequency, and a value of 0 above this frequency. The result of this multiplication is shown in the bottom row.
The inverse Fourier transform of this ideal interpolator is a sinc function (middle row, left). A convolution of our pulse train with this sinc function is equivalent to the multiplication we did in the Fourier domain. This convolution, thus, gives the original, continuous function from the set of samples. Note how the sinc function has a value of 1 in the origin, and a value of 0 at the location of every other sample (pulse in the pulse train). This is a necessary feature of an interpolating function, such that the result of the convolution at the location of one of the samples is always equal to that sample.
Of course, the sinc function is infinitely long, which means that computing the interpolation requires an infinite number of input samples. It is much more practical to limit this to a few samples only. This is the reason we never use the ideal interpolator. The term ideal is often confused with perfect, but in this case it refers to the original meaning of theoretical/imaginary/impractical. More practical interpolators use only a few input samples. For example, the linear interpolator is a triangular function, also valued 1 at the origin, and valued 0 at the two nearest samples. You always need only two samples to compute the interpolated function at any one location. Another example is the Lanczos interpolator, which has a parameter a dictating its size; it always uses 2a samples to compute each output sample.
The discrete case
But all of this discussion assumes a pulse train, a continuous function, as the sampled signal. In reality, a sampled signal is just the set of values represented by the pulse train, the sequence of numbers that contain the relevant information of the sampled function. If we were to sample the interpolating function in the same manner, all we would have left is a single 1 surrounded by 0 values (see the image below). That is, the interesting parts of the interpolator is what happens in between the sample values. How do you represent that?
There are two main tasks for interpolation in image processing: shifting and zooming. To shift an image by a non-integer amount of pixels, we need to interpolate to find the new sample values. In this task, the output has the same number of pixels as the input. I ignore rotating the image here, as it is most efficiently computed by shifting rows and columns of the image in three steps, and thus is discussed under the shifting case. For zooming, we compute the value of the underlying continuous image at more locations than in the input image, resulting in an image with more pixels. Let’s consider these two tasks separately.
Shifting an image
If we wish to shift an image by, say, 0.3 pixels horizontally, we need to compute new pixel values at 0.3, 1.3, 2.3, etc., and we discard the original values. That is, each row in the image is replaced by new values computed by interpolation from a few existing values. As stated above, this is equivalent to convolving each row with a 1D interpolation kernel. The kernel then should be shifted by 0.3 pixels and sampled (as in the image below). Assuming the ideal interpolator, we thus have a convolution kernel , sampled at integer . But of course you can do the same with the linear interpolator or any other one. For a linear interpolator we would thus have a kernel with two values, [0.7,0.3]. This matches with what you expect for linear interpolation, but it also matches with sampling the triangular function I mentioned earlier. To compute this in the Fourier domain, you just need to know the shifting property of the Fourier transform: multiply each frequency component by .
Zooming an image
More interesting, and this is what I was hoping my students would think about when answering the question on the exam last month, is the case of zooming. When we zoom in on an image, we need to generate new pixels, their values determined by interpolation from surrounding pixels. This cannot be done by a simple, standard convolution, because the discrete convolution generates as many input values as output values. That is, the answer that my students gave of applying the Fourier transform of the input image and the interpolating kernel, multiply the two, and applying the inverse transform, is never going to produce a zoomed image. We need to increase the number of pixels somehow.
To zoom an image, the process with the kernel is identical to that described above for the shift, except that we need to sample the kernel differently for each output position. In the case of zooming by an integer amount, say a factor 4, we generate three new pixels in between each pair of pixels on each row of the image (and in a second step do the same thing along the columns). For example, the output pixels could be at location 0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, etc. The integer locations are already given by the input samples. The values at locations 0.25, 1.25, 2.25, etc. can all be computed simply by translating the image by 0.25, in the manner described above. The same is true for the group of pixels with a shift of 0.5, and that with a shift of 0.75. Thus, we generate three convolution kernels, compute 3 convolutions, and combine the results in an appropriate way.
But we can compute the same thing with a very simple trick in the Fourier domain! Look at the graph below. We can reconstruct the continuous function by erasing all the repetitions of the shape except the one at the origin (middle row). We would do this by zeroing out all frequencies above the Nyquist frequency. We would then sample the function again, at a higher sampling rate (bottom row), which would again produce repeated shapes in the frequency domain, but this time there would be more space in between the repeats (as the distance between these repeats is equal to the sampling frequency). That is, there will be more zero values around the middle peak.
In the discrete domain, the Fourier transform only computes those values between and , as everything else is redundant. To go from the discrete Fourier transform on the top row to that of the bottom row, we simply need to pad it with zeros, extending it to the new . The inverse transform of this padded frequency spectrum is an image sampled at a higher sampling rate, a zoomed image.
This concept is so trivial to me, I was expecting my students would be able to figure it out during the exam. I still think they should have at least noticed that the recipe they gave does not produce any more pixels than were in the input image, or that sampling the interpolating kernel produces a not-so-useful function unless one wants to shift the image. In any case, it is clear that the question was not clear at all. I should have clarified better what I was thinking of. I guess I should compensate them in some way for this?