Archive for the ‘analyses’ Category

Union-Find

Wednesday, March 21st, 2018

The Union-Find data structure is well known in the image processing community because of its use in efficient connected component labeling algorithms. It is also an important part of Kruskal’s algorithm for the minimum spanning tree. It is used to keep track of equivalences: are these two objects equivalent/connected/joined? You can think of it as a forest of trees. The nodes in the trees are the objects. If two nodes are in the same tree, they are equivalent. It is called Union-Find because it is optimized for those two operations, Union (joining two trees) and Find (determining which tree an object belongs to, and thus whether two objects are in the same tree). Both operations take essentially constant amortized time — more precisely O(α(n)), where α is the inverse Ackermann function, which grows so slowly that it is less than 5 for any number you can write down.
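As a rough sketch of the idea, here is a minimal Union-Find in Python with the two standard optimizations (path compression and union by rank) that give the near-constant amortized time mentioned above. The class and method names are illustrative, not taken from any particular library:

```python
# Minimal Union-Find sketch: path compression + union by rank.
# Names (UnionFind, find, union) are illustrative only.

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))  # each object starts as its own tree root
        self.rank = [0] * n           # upper bound on tree height

    def find(self, x):
        # Walk up to the root, halving the path as we go (path compression).
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return                    # already in the same tree
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx           # attach the shallower tree to the deeper one
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

# Usage: track equivalences among 6 objects.
uf = UnionFind(6)
uf.union(0, 1)
uf.union(1, 2)
uf.union(4, 5)
print(uf.find(0) == uf.find(2))  # True: 0 and 2 are in the same tree
print(uf.find(0) == uf.find(4))  # False: different trees
```

In connected component labeling this is exactly what resolves label equivalences: each provisional label is a node, and touching labels are merged with `union`.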

Here I’ll describe the data structure and show how its use can significantly speed up certain types of operations.

Efficient algorithms vs efficient code

Saturday, October 21st, 2017

Since I have spent quite a bit of time porting 25-year-old code, I have been confronted with the significant changes in CPU architecture over that time. Code in DIPlib was very efficient back then, but some optimizations did not age well at all. I want to show some examples here. It is nice to see that a little bit of optimization effort can go a long way.

I also want to give a quick example of a highly optimized implementation of an inefficient algorithm, which highlights the difference between algorithms and code.

Proper counting

Tuesday, September 23rd, 2014

I just came across an editorial in the Journal of the American Society of Nephrology (Kirsten M. Madsen, Am Soc Nephrol 10(5):1124-1125, 1999), which states:

A considerable number of manuscripts submitted to the Journal include quantitative morphologic data based on counts and measurements of profiles observed in tissue sections or projected images. Quite often these so-called morphometric analyses are based on assumptions and approximations that cannot be verified and therefore may be incorrect. Moreover, many manuscripts have insufficient descriptions of the sampling procedures and statistical analyses in the Methods section, or it is apparent that inappropriate (biased) sampling techniques were used. Because of the availability today of many new and some old stereologic methods and tools that are not based on undeterminable assumptions about size, shape, or orientation of structures, the Editors of the Journal believe that it is time to dispense with the old, often biased, model-based stereology and change the way we count and measure.

It then goes on to say that the journal would require appropriate stereological methods be employed for quantitative morphologic studies. I have never read a paper in this journal, but certainly hope that they managed to hold on to this standard during the 15 years since this editorial was written. Plenty of journals have not come this far yet.

Automated image-based diagnosis

Saturday, August 24th, 2013

Nowhere is it as difficult to get a fully automatic image analysis system accepted and used in practice as in the clinic. Not only are physicians sceptical of technology that makes them irrelevant, but an automated system has to produce a perfect result, a correct diagnosis in 100% of the cases, to be trusted without supervision. And of course this is impossible to achieve. In fact, even if the system has a better record than an average (or a good) physician, it is unlikely that the system and the physician are wrong on the same cases. Therefore, the combination of machine + physician is better than the machine alone, and thus the machine should not be used without the physician.

What often happens then is that the system is tuned to yield a near 100% sensitivity (to miss only very few positives), and thus has a very low specificity (that is, it marks a lot of negative tests as positive). The system is heavily biased to the positives. The samples marked by the system as negative are almost surely negative, whereas the samples marked as positive (or, rather, suspect) are reviewed by the physician. This is supposed to lighten the workload of the physician. This seems nice and useful, no? What is the problem?

Cryptography or steganography?

Friday, May 8th, 2009

RSA and DES keys keep growing in length, to keep up with increasing computational power. Keys we were using 15 years ago are laughable now. And according to a nice graph in April’s IEEE Spectrum, ridiculous amounts of computing power are cheaper than ever. I wonder how long before people give up on encryption altogether.