Evaluating noise filters
Most of the new papers that I come across that propose a new or improved way of filtering out noise from images use the Peak Signal-to-Noise Ratio (PSNR) as a means to evaluate their results. It has been shown again and again that this is not a good way of evaluating the performance of a filter. When people compute the PSNR for a filtered image, what they actually do is compare this filtered image to an undistorted one (i.e. known ground truth). This is very different from what the name PSNR implies: the ratio of peak signal power to noise power. Of course, that is something that cannot be measured: if we’d be able to separate the noise from the signal and measure the power of the two components, then we wouldn’t need to write so many papers about filters that remove noise! So instead, the typical PSNR measure in image analysis uses the difference between the filtered image and the original (supposedly noise-free) image, calls this difference the noise, and computes its power (in dB):
a = readim('cameraman') b = noise(a,'gaussian',20) c = bilateralf(b,2,30) c_psnr = 10*log10(255^2/mean((a-c)^2))
c_psnr = 28.6511
The problem is that this does not compute the amount of noise in the image, and therefore is misleading. If the filter affects the signal as well as removing noise (which is always the case), both these changes will be taken as noise. What is more, this method assumes that we know what the image looks like without noise. If we use, as I did above, a natural image as a base for testing our filter, then we do not know what that image looks like without noise (all natural images have noise!). Sure, we added noise to the image and watched the filter reduce this noise. But we cannot expect our filter to know the difference between the noise we added to the image, and the noise that was already there. If we had a perfect filter that removed all the noise without affecting the signal at all, its result would be different from the original image (which has some unknown small amount of noise), and therefore we’d be underestimating the filter’s performance!
In my opinion, if what we really want to do is compare the filtered image with the original “noise-free” image, we should simply use the Mean Square Error (MSE). The MSE is the interesting part of the computation for PSNR, doesn’t make assumptions about the maximum signal power, and doesn’t claim to compute something we’re not computing:
c_mse = mean((a-c)^2)
c_mse = 88.7099
If you compare this value to the MSE of the noisy image b (404.0), you get a very good idea of how much the filter undoes the noise we added to the image. But of course we do also need to make sure the input image really is noise-free for this to be correct, and the only way of doing so is by using a synthetic image.
To really characterise the performance of a noise filter, one should not simplify the comparison to a single number: part of the difference between the images is noise that wasn’t removed, part of the difference is distortion added to the image. And these two effects are not equal. For example, the two images below have the same MSE (and, consequently, the same PSNR):
b = gaussf(a,3) b_mse = mse(b,a) c = noise(a,'gaussian',sqrt(b_mse)) c_mse = mse(c,a)
b_mse = 502.8580 c_mse = 506.3268
I don’t believe these two images should get the same “score,” the noisy one looks better to me!
The main complaint against the MSE and similar measures for error is that they’re too “simplistic.” For example, adding an offset to the image (which doesn’t affect its signal-to-noise ratio at all), has a huge effect on the MSE:
d_mse = mse(a+sqrt(b_mse),a)
d_mse = 502.8581
So how can we more properly study the effect of a noise-reduction filter? One way is to look at what the filter removed. How much signal is in there? Ideally, there would be none. All the pixels should be independent of its neighbours. If so, all that the filter removed was noise, without affecting the signal at all. Of course, this is not possible, so typically you’d see some part of the signal in there. This signal has some autocorrelation (i.e. the pixel values are not independent of each other). We can look for autocorrelation in the difference between the input and output of the filter:
N1 = autocorrelation(a-b); dipshow(N1(96:159,96:159),'lin','notruesize') N2 = autocorrelation(a-c); dipshow(N2(96:159,96:159),'lin','notruesize')
I’m showing only the central part of the autocorrelation, as there typically is no correlation between points further away. Clearly, image b has a disturbed signal, therefore a-b has strong autocorrelation. For image c this is not the case. One simple way to quantify the autocorrelation is to take the root mean square value of the autocorrelation at short distances (excluding the distance 0, of course):
m = rr(a)<2 & rr(a)>0; b_ac = sqrt(sum(N1(m)^2)) c_ac = sqrt(sum(N2(m)^2))
b_ac = 1.9094e+05 c_ac = 1.1608e+03
As we can see, the one has a much larger value than the other, as expected. Ideally, we would normalize the input to the autocorrelation function, because stronger noise will have a larger value in this measure. The autocorrelation measure, in combination with the standard deviation of the difference (which is equivalent to the MSE), will quantify both the amount of noise and the amount of signal removed from the image. I’d rather give both figures separately, as it is application dependent how these two filter effects should be weighted.
A different approach is the Structure SIMilarity index (SSIM), which tries to provide a measure for how perceptually similar two images are. It also tries to summarize differences with a single number, but does so by measuring three different similarity cues, and combining them with empirically determined weights. There’s quite a few resources relating to SSIM on the inventor’s website. On our two images we get:
ans = 0.6437 ans = 0.4244
And we see that the noisy image has a lower value, which I don’t agree with (as I mentioned above). And that is exactly the problem with perceptual measures: not everybody agrees on what is better! In any case, there are many, many variants on the SSIM theme, the nice thing about SSIM is that it is so simple.
What is you take on this? Is there a better way of evaluating a noise-reduction filter?