Graph cut segmentation
For a while I’ve been interested in adding the popular graph cut segmentation algorithm to DIPlib. But I was not able to find an implementation I could adopt (the most commonly used implementation has an incompatible open-source license). So I finally sat down and re-implemented the algorithm myself from scratch. I learned a lot about the algorithm; hopefully with this blog post you’ll learn something too.
The graph cut segmentation algorithm sees the image as a weighted graph, where each pixel is a node (or vertex), and each pixel is connected to its neighbors with an edge. The weight of an edge is given by the difference in intensity of the two pixels it connects. If we now find a way to split the graph in two by cutting edges with a high weight (i.e. edges connecting pixels with a large difference in intensity), then we will have identified a region in the image with a large contrast to its surroundings.
This algorithm requires two markers: one for the background and one for the foreground. The algorithm will find a cut of the graph that separates these two markers: each marker will be in a different region. These markers therefore direct the segmentation. Typically, this is used in a situation where the user can draw a set of markers on the image to give a rough indication of what the object is and what the background is.
How is this optimal cut found? And what is really being optimized? First we need to learn about the max-flow problem, the min-cut problem, and their relation.
The min-cut problem
Let’s forget about images for a moment, and think of a network of tubes. Each tube has a specific cross-section, which will determine how much water can flow through it (its capacity). The network has a source, where water flows into the network, and a sink, where water flows out. We can represent this network as a graph: each tube is an edge in the graph, its weight is the tube cross-section. Where the tubes are joined together we have a node in the graph. The source and sink are two nodes like any other. One thing to note: the flow of water going into a node must be equal to the flow going out, except for the source and sink nodes. The flow originating in the source must be equal to the flow exiting at the sink.
There is a maximum amount of water that can flow through this network from source to sink. Finding out how much this is, is the max-flow problem.
When this amount of water is flowing, there must be a set of tubes flowing at maximum capacity (they’re saturated). Any path that you can find from source to sink must go through one of these tubes flowing at maximum capacity, otherwise we’d be able to increase the amount of water flowing (it can flow along the non-saturated path we just found). If we were to cut the set of saturated tubes, we would separate the source from the sink: the graph would be cut into two parts, one containing the source and one containing the sink.
The set of edges (tubes) we cut constitutes the minimal cut. That is, the sum of their weights (the cross-sections of the tubes) is not larger than for any other cut: this sum equals the total flow, and every other cut must have a total capacity at least as large, because all of the water has to pass through it. Finding this cut is the min-cut problem. It is clear, I think, from the explanation above that the min-cut problem is equivalent to the max-flow problem: the maximum amount of water that can flow from source to sink is equal to the minimal possible sum of capacities of tubes we need to cut to separate the source from the sink.
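For the record, this equivalence is the classical max-flow min-cut theorem. With $c(u,v)$ the capacity of the edge between nodes $u$ and $v$, and a cut splitting the nodes into a set $S$ containing the source and a set $T$ containing the sink, it reads:

$$\max_{\text{flows } f} |f| \;=\; \min_{\text{cuts } (S,T)} \sum_{u \in S,\ v \in T} c(u,v).$$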
Finding a solution to the min-cut problem
To find the maximum amount of water that can flow through our network of tubes, and which tubes will be saturated, we need to find all possible paths through the network. For each path in turn, we find the minimum capacity along it, and add that amount of flow to each tube along the path. That is, we find a path and let as much water as possible flow along that path. Do that for all possible paths and we’re done.
Here’s a simple example graph:
One path from source (S) to sink (T) is S–A–B–T. The minimal capacity along this path is 0.1, so we subtract that amount from each of the edge weights along that path:
The numbers attached to each edge are now not the total capacity of the edge, but the residual capacity: how much more water can still flow through it.
Another path is S–C–B–T. The minimal residual along this path is 0.6, so we again subtract that amount from each of the edge weights along that path:
Another path is S–C–D–T. Residual to subtract is 0.5:
Now we have a set of edges that are saturated (their residual is 0): A–B, C–B, C–D. This is our minimal cut (1.2 was our total flow). By removing these edges, nodes A and C belong to S, and B and D to T.
But there is another path we haven’t yet explored: S–A–B–C–D–T. We didn’t need to consider this path because at least one of its edges is already saturated. But I need to point it out, for two reasons.
First, the B–C edge is used in reverse in this path, compared to the earlier S–C–B–T path. If we were to add water flow along this path, the flow from C to B would be reduced, or even reversed. Thus, when computing flow, we must also consider the direction of flow. We either need to remember in which direction the water flows, or (as is commonly done) we keep a separate residual for each direction: when flow is added in one direction, that direction’s residual decreases and the opposite direction’s residual increases.
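To make the bookkeeping concrete, here is a minimal Python sketch of this augmenting-path idea (a plain breadth-first-search variant, not the Boykov–Kolmogorov algorithm discussed below, and not what DIPlib does). The capacities are made up: I chose them only to be consistent with the walk-through above (bottlenecks of 0.1, 0.6 and 0.5, a total flow of 1.2, and saturated edges A–B, C–B and C–D); they are not the values from the figure.

```python
from collections import deque

# Hypothetical capacities, consistent with the walk-through above.
capacity = {
    ('S', 'A'): 0.5, ('A', 'B'): 0.1, ('B', 'T'): 1.0,
    ('S', 'C'): 1.5, ('C', 'B'): 0.6, ('C', 'D'): 0.5, ('D', 'T'): 0.8,
}

# Keep a residual capacity in *each* direction: pushing flow along u->v
# decreases the residual of u->v and increases the residual of v->u.
residual = {}
neighbors = {}
for (u, v), c in capacity.items():
    residual[(u, v)] = c
    residual.setdefault((v, u), 0.0)
    neighbors.setdefault(u, []).append(v)
    neighbors.setdefault(v, []).append(u)

def find_path(source, sink):
    """Breadth-first search for a path from source to sink with positive residuals."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in parent and residual[(u, v)] > 1e-12:
                parent[v] = u
                if v == sink:
                    path = [v]            # walk back to the source
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]
                queue.append(v)
    return None                           # no augmenting path left

total_flow = 0.0
while (path := find_path('S', 'T')) is not None:
    # The bottleneck is the smallest residual capacity along the path.
    bottleneck = min(residual[(u, v)] for u, v in zip(path, path[1:]))
    for u, v in zip(path, path[1:]):
        residual[(u, v)] -= bottleneck
        residual[(v, u)] += bottleneck
    total_flow += bottleneck

saturated = [e for e in capacity if residual[e] < 1e-12]
print(total_flow, saturated)   # ~1.2, [('A', 'B'), ('C', 'B'), ('C', 'D')]
```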
Second, the number of paths through a graph is deceptively large. For any practical problem, the number is so large that the trivial algorithm described above is impossible to compute. We need to use a more efficient algorithm. Many such algorithms have been described in the literature. The algorithm commonly used in image processing is due to Boykov and Kolmogorov [1].
An efficient algorithm
Boykov and Kolmogorov [1] described an algorithm with a higher worst-case time complexity than that of earlier algorithms. But they showed that, in practice, it is faster for graphs derived from images (where each node has few edges).
The algorithm is based on building two trees, one rooted in the source and one in the sink. Each node in the tree points to its parent (so that one can navigate from any node in the tree to the root, but not the other way around). As these trees grow, we’ll eventually find one edge that joins the two trees. From this edge, we can travel backwards to the source and forward to the sink, identifying a complete path from source to sink. Crucially, the trees can later continue growing to identify all the other paths.
When a path is identified, we update the residuals along it. At least one edge will become saturated, and this is not necessarily the edge that joins the two trees. In fact, it is possible for multiple edges to become saturated. The subtree hanging off a saturated edge is cut off from its tree (we mark the node just below the saturated edge as an orphan). Next, we attempt to re-attach the orphan nodes by looking at their neighbors: any non-saturated edge that connects an orphan node back to the tree it originally belonged to can be used. If no such edge exists, the node becomes a free node (one that the trees can eventually grow into), and its children are marked as orphans. When there are no more orphans, the tree growing phase continues.
The algorithm finishes when all the free nodes have become part of one of the two trees, and no more paths can be found. Note that it is possible for there to be more saturated edges than strictly required to cut the graph in two. And the set of saturated edges can even yield different graph cuts! So we cannot simply look at all the saturated edges to find our solution. Instead, we traverse the graph from the source (or from the sink) and find all nodes that we can reach without using a saturated edge. This set of nodes forms one of the graph components, all the other nodes form the other component. The graph cut is the set of edges that join a node in one component to a node in the other.
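Continuing the sketch from before, that last step is just a graph traversal from the source that refuses to cross saturated edges, followed by collecting the edges that have one end in each component:

```python
def extract_cut(capacity, neighbors, residual, source):
    """Find the source-side component and the cut edges from the residuals
    left behind by the max-flow computation in the earlier sketch."""
    reached = {source}
    stack = [source]
    while stack:
        u = stack.pop()
        for v in neighbors[u]:
            if v not in reached and residual[(u, v)] > 1e-12:
                reached.add(v)
                stack.append(v)
    # The cut: original edges with exactly one end in the source-side component.
    cut = [(u, v) for (u, v) in capacity if (u in reached) != (v in reached)]
    return reached, cut

# For the example graph this gives the source side {'S', 'A', 'C'}
# and the cut [('A', 'B'), ('C', 'B'), ('C', 'D')].
print(extract_cut(capacity, neighbors, residual, 'S'))
```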
Application to image processing
The above can be directly applied to an image: we construct a graph by turning each pixel into a node, and adding edges between each pair of neighboring pixels. In a 2D image, using a 4-connected neighborhood, each node would have 4 edges (fewer at the image border). Edge weights have to be computed in some way from the difference in pixel intensity, such that a large intensity difference gives a small weight, and a small intensity difference gives a large weight. The minimal cut will then go between pixels with a larger difference in intensity. We’d mark one pixel as the source and one as the sink, or, in the case of many marker pixels, we’d add a new source node connected to all the source marker pixels with a very high weight, and similarly for the sink.
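As an illustration of the neighbor edges only (this is not DIPlib’s implementation, and the Gaussian weighting with its σ parameter is just one common choice), the edge weights of a 4-connected 2D grid could be computed like this:

```python
import numpy as np

def neighbor_weights(image, sigma=10.0):
    """Weights for the horizontal and vertical edges of a 4-connected 2D grid.
    A large intensity difference gives a small weight, so that the minimal cut
    prefers to pass between dissimilar pixels."""
    image = np.asarray(image, dtype=float)
    dx = np.diff(image, axis=1)          # difference with the right neighbor
    dy = np.diff(image, axis=0)          # difference with the bottom neighbor
    wx = np.exp(-dx**2 / (2 * sigma**2))
    wy = np.exp(-dy**2 / (2 * sigma**2))
    return wx, wy
```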
But Boykov and Jolly [2] suggested a more elaborate scheme, where the source and sink nodes connect to all pixels in the image. The weights from the source and sink nodes to each pixel are given by the difference between the pixel’s intensity and the expected foreground or background intensities. They actually use histograms of the pixel values under the foreground and background markers as the model for these expected intensities. What this accomplishes is striking a balance between just propagating a marker to the nearest edges and just selecting pixels based on color (as you’d do with a threshold).
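And a rough sketch of such terminal weights, in the spirit of Boykov and Jolly’s regional term: the negative logarithm of histograms estimated from the marker pixels. The bin count, the normalization and the small epsilon here are arbitrary choices for this example, not values from the paper:

```python
import numpy as np

def terminal_weights(image, fg_values, bg_values, bins=64, eps=1e-6):
    """Weights of the edges from the source (foreground) and sink (background)
    terminals to every pixel, from marker-pixel intensity histograms."""
    image = np.asarray(image, dtype=float)
    rng = (image.min(), image.max())
    fg_hist, edges = np.histogram(fg_values, bins=bins, range=rng)
    bg_hist, _ = np.histogram(bg_values, bins=bins, range=rng)
    fg_prob = fg_hist / max(fg_hist.sum(), 1)
    bg_prob = bg_hist / max(bg_hist.sum(), 1)
    idx = np.clip(np.digitize(image, edges) - 1, 0, bins - 1)
    # The source edge is cut when a pixel ends up in the background component,
    # so its weight is the cost of calling that pixel background (and vice versa).
    w_source = -np.log(np.clip(bg_prob[idx], eps, 1.0))
    w_sink = -np.log(np.clip(fg_prob[idx], eps, 1.0))
    return w_source, w_sink
```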
In any case, this segmentation approach provides a global optimum (as opposed to the local one found by algorithms such as level sets) given the constraints (markers) and the cost function (edge weights). This makes it easier to tune the markers and edge weights to obtain the desired result, without depending on the algorithm’s initialization.
The next release of DIPlib will have a graph cut function.
Literature
- [1] Y. Boykov and V. Kolmogorov, “An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision”, IEEE Transactions on Pattern Analysis and Machine Intelligence 26(9):1124–1137, 2004.
- [2] Y. Boykov and M.-P. Jolly, “Interactive Graph Cuts for Optimal Boundary & Region Segmentation of Objects in N-D Images”, Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001) 1:105–112, 2001.