Image Anal Stereol 2001;20:163-168 Original Research Paper BAYESIAN IMAGE SEGMENTATION THROUGH LEVEL LINES SELECTION Charles Kervrann INRA - Biome´trie, Domaine de Vilvert, 78352 Jouy-en-Josas, France e-mail: ck@jouy.inra.fr (Accepted October 16, 2001) ABSTRACT Bayesian statistical theory is a convenient way of taking a priori information into consideration when inference is made from images. In Bayesian image segmentation, the a priori distribution should capture the knowledge about objects. Taking inspiration from (Alvarez et al., 1999), we design a prior density that penalizes the area of homogeneous parts in images. The segmentation problem is further formulated as the estimation of the set of curves that maximizes the posterior distribution. In this paper, we explore a posterior distribution model for which its maximal mode is given by a subset of level curves, that is the boundaries of image level sets. For the completeness of the paper, we present a stepwise greedy algorithm for computing partitions with connected components. Keywords: area distribution, connected components, energy minimization, image segmentation, level curves. INTRODUCTION Image segmentation and object boundaries estimation are among the most challenging and fundamental addressed problems in image analysis (Mumford and Shah, 1989; Vincent and Soille, 1991; Morel and Solimini, 1994). Segmentation can be achieved by minimizing an energy model designed in conjunction with Bayes’s theorem as shown by Mumford and Shah (1989) and Zhu and Yuille (1996). Indeed, it is straightforward to transfer a Bayesian criterion into an energy minimization criterion (Morel and Solimini, 1994; Zhu and Yuille, 1996). Thereby, the discrete (Geman and Geman, 1984; Blake and Zisserman, 1987) or continuous (Mumford and Shah, 1989; Morel and Solimini, 1994) energy functional is traditionally comprised of two terms: the first term is a fidelity term describing the interaction between the observed data and the model (data model) and the second is a regularity term (prior model). In Bayesian image segmentation, the prior model should capture the knowledge about objects. In particular, the incorporation of prior information about the outline of objects can be applied. There has been a growing interest in this field, particularly along the guidelines of Grenander’s general pattern theory using deformable templates (Grenander and Miller, 1994). In a separate way, Zhu and Yuille (1996) attempted to unify snakes (Kass et al., 1987) and region growing methods within a general energy/Bayes framework. Both approaches estimate the curves that maximally separate unknown statistics inside and outside the curves. Finally, the designed energy functionals are complex, for which it is difficult to specify global minimizers corresponding to “best” segmentations. The maximum a posteriori (MAP) estimate is generally determined by prohibitive stochastic search procedures (Grenander and Miller, 1994) or other variants of steepest ascent algorithms (Zhu and Yuille, 1996). With these tools, additional a priori knowledge may be specified to ease the segmentation task: statistics inside region boundaries are assumed to be known (Grenander and Miller, 1994; Chan and Vese, 1999) or estimated using ad-hoc methods (Paragios and Deriche, 2000). In practical imaging, these methods still suffer from the problem of initialization of curves (Zhu and Yuille, 1996), offline estimation of the mixture model of Gaussians approximating the probability density function of the image (Paragios and Deriche, 2000), or selection of hyperparameters weighting the contribution of energy terms (Zhu and Yuille, 1996; Chan and Vese, 1999; Paragios and Deriche, 2000). In this paper, we address these problems and follow the Bayesian approach for recovering simply connected objects in the plane. The prior model focuses on how the area and number of objects can be varied in images. Unlike other approaches (Grenander and Miller, 1994; Zhu and Yuille, 1996), we shall see that maximizing the posterior distribution is here equivalent to selecting a subset of connected components of image bilevel sets. For the completeness of the paper, we present a stepwise greedy algorithm for computing partitions with connected components. Finally, we illustrate this approach with some experiments on satellite and medical images. 163 Kervrann C: Bayesian image segmentation THE BAYESIAN FRAMEWORK Let S be an open subset (rectangle) of M2 and f a grey-scale image treated as a function defined on S. Below we will work in the continuous setup, where S is a subset of a Euclidian space and f: S —> ffi+ represents the observed data function. We use the terminology “site” or “pixel” to denote a point of the image, even in the continuous case. Each point x G S is assigned a grey value f{x). According to Matheron (1975), we interpret the image fas a family of sets defined by LT(f) = {x e S : f(x) > t}, t G M+. Each level set LT(f) is assumed to be of finite perimeter. Therefore, f will belong to the bounded variation (BV) space as shown by Alvarez et al. (1999). Let m C S) be a set of disjoint and non-empty image domains or objects, and {dD.i} their boundaries. A partition of the image domain S consists in finding a set {mPi1 and a background D. defined as the complementary subset of the union of objects D. = S \ UPi1 £li , £1i r\ij Qj = 0/ and £1i ni D. = 0/. We assume that the observed image f has been produced by the model f = f true + £, where £ is a zero-mean Gaussian white noise: The true image ftrue{x) = £Pi=1fn 1^+ f^1^ is supposed piecewise constant, where fa and f¦=¦ denote respectively the unknown average values off over D.i and D., and 1xeE is the set indicator function of the set E. The variance o2 is assumed to be known and constant over the entire image. So, the likelihood for the data fgiven {Q.1, ¦ ¦ ¦, D.P} is specified by p(f\a1---,aP) - exp-12jx£(f(x)-f i)2dx + J_ 0 admissible, closed and connected objects: ^P = {{£l1...,£lP} C S ; S \U = uPi=1 D.i ; nin1 ij 0 the regularization parameter and A = log (/3). The penalty functional tends to regulate the emergence of objects D.i in the image. The regularization parameter A can be then interpreted as a scale parameter that only tunes the number of regions (Morel and Solimini, 1994; Kervrann et al., 2000). If A = 0, each point is potentially a region and D. = 0/ ; the global minimum coincides with zero and this segmentation is called the “trivial segmentation” (Morel and Solimini, 1994). By using classical arguments on lower semi-continuous functionals on the BV space, we assume here the existence of minimizers of E% (f, Q1, ¦ ¦ ¦, D.P) among functions of sets finite perimeter (or of bounded 164 Image Anal Stereol 2001;20:163-168 variation) (Morel and Solimini, 1994; Zhu and Yuille, 1996). Our MAP estimator is defined by (when it exists) («1,..., QP) = argmin0PT argmin{n ^}6

• 0, we obtain expressions for T2, T3 and T5 (higher order terms are neglected) Ay r IOI k 1 Q„\Q 3 |Q| Jas\a Ja |Q| yJas\a J |0|2 Jas\a \Ja J T5 = ^r-^\\2f ff f-(l f \S\ - \il\ [ Jas\a JS\a \Jas\a J ~Srw\L\n1{S\af) /¦ (6) We define the image moments m0 = Ja 1, m1 = /qf , K0 = S1, K1 = Sf. Using the mean value theorem for a double integral, which states that if f is continuous and a connected subset E is bounded by a simple curve, then for some point x0 in E we have JEf(x)dE = f(x0) ¦ \E\ where \E\ denotes the area of E, it follows that M„ AEa(f,0) i21 {K1-m12 ly M, '0J + K0-m0 m0 fx 0) 0rf(x 0)2 I 1 / 1-K-mn) Jas\a Jas\a m0(K0-m0) (7) Let xb be a fixed point of the border d£l. Choose D.g such that dD.s = dD. except on a small neighborhood of xb. The energy having a minimum for D., f{xb) needs to be solution of the following equation AEA(f,Q) A|fl| [M0+M1 f(xb)] + O(A\£l\) = 0. (8) 165 Kervrann C: Bayesian image segmentation By passing to the limit A|Q| —> 0, we obtain M0 + M1f(xb) = 0. This equation has a unique solution. The coefficients M0 and M1 depend on neither xb nor f{xb), and M0 / 0. The function f is continuous and dQ. is a connected curve. Therefore f(xb) is constant when xb covers d£l. This completes the proof. A STEPWISE GREEDY ALGORITHM FOR IMAGE SEGMENTATION To implement our level set image segmentation based on energy minimization, a four step method is used. The proposed algorithm is not a region growing algorithm (Morel and Solimini, 1994) since all objects are built once and for all. It differs from the watershed approach since regions that emerge from the watershed segmentation are not necessarily connected components within the image level sets (Vincent and Soille, 1991). Let K, A, |ßmn| be the input parameters set by the user. Bilevel set construction. The first step completes a quantization of the function f G [fminfmax] in K = {4,¦¦¦,8} non-equal-sized and non-overlapping intervals [tl_1,tl), l = {1,---,K}. Given this set of intervals estimated using the maximum entropy sum method (Kapur et al., 1985), let bl be the bilevel set image with bl{x) = 1 if f(x) G tl_15tl) and blx) = 0 otherwise. The connected components of bilevel sets can be characterized by their surrounding curves, that is the level lines. If we map these level lines for a given set of K levels, we get a segmentation of the image also called topographic map (Caselles et al., 1999). More generally, one can consider a segmentation achieved using only some connected components of bilevel sets, which is the philosophy of our approach. The most perceptible level lines could be determined by the detection of T-junctions of level lines (Caselles et al, 1999). Instead, we use herein a simpler criterion where perceptually significant level lines are the bilevel sets boundaries of an quantized image by using K quantizers and an entropy method. The entropy method due to Kapur et al. (1985) chooses the thresholds {tl} to be the values at which the information is maximum. Object extraction. A crude way to build pixels sets corresponding to objects is to proceed to a connected components labeling of bilevel set images {bl} and to associate each label with an object £li. Though this process may work in the noise-free case, in general we would also need some smoothing effect of the connected components labeling. So, we consider a size-oriented morphological operator acting on sets that consists in keeping all connected components of the output of area larger than a limit |ßmn|. This connected operator in mathematical morphology will never introduce new features or edges and boundaries of remain connected components are preserved (Salembier and Serra, 1995). The list of connected components then forms the bank T of admissible objects {Q1,...QT} with |m > |nmn|. Configuration determination. The connected components are then combined during the third step to form object configurations. For instance, these configurations can be built by enumeration of all possible object combinations, i.e. 2T configurations. Each configuration is made of a subset of connected components taken in the bank ^T. The background D. corresponds to the complementary set of objects selected for each configuration. Energy computation and object configuration selection. Energy calculations take the image intensities of the original (not quantized) image to establish piecewise-constant approximation errors. Energies of the form {Ja (f(x) — fai2 dx} are computed once and stored _on a RAM memory. The energy term /^(f(x) — f-^)2 dx is efficiently updated for each configuration. For a fixed bank fT = {D.1:---,D.T}, one way to choose the optimal set of of objects {f^, ¦ ¦ ¦, QP}, P < T, is to search for all possible combinations of P objects and compute the corresponding energy E% (f, £l1, ¦ • ¦, D.P). Enumerating all possible sets of objects in the object bank and comparing their energies is computationally too expensive if T is large. Instead of a such brute force search, we propose a stepwise greedy algorithm for minimizing E^ (f, Q1, ¦ • ¦, D.P). We start from P = 0 and introduce one object D. at a time. At the first step, we compute the T energies with one single object D.j at once against the complementary subset D. = S\ uj,i=1£2i. Let Q1 be the estimated object that best lowers E^(f,D.1l---,D.P). This object is stored on a RAM memory as an object of the optimal configuration. It is removed from the initial bank ^T. At any step of the algorithm, a new object is chosen to maximally decrease the energy E^(f1D.1,---,D.P). Suppose that at the .P-th step, P and D. are not known but we have estimated P objects {Q1,---,QP} and a current background D. = S \ {Q15---,QP}. Let Ex(f,D.1,--- 7D.P) be the current computed energy. Then at the (P + 1)-th step, we choose the object Qj G ^T \ {Q1-'-jQP} which has the maximal difference, i.e. QP1 = arg max ^ [Ek{f,a1,---,SiP) Q.jeT\{Q.1,-,Q.P} \ -EA(f,fl1,-,flP,flj)V (9) 166 Image Anal Stereol 2001;20:163-168 The algorithm stops at the P-th step when the addition of any object does not decrease El(f1W 1,---,WP). This means that the optimal number of objects is P = P and the remaining objects of the bank are a part of the estimated background W = S\ {W 1,•¦•,W P}. This algorithm selects a suboptimal configuration of objects corresponding to a local minimum. Using this algorithm, at most (T x (T + 1))/2 object configurations are examined. EXPERIMENTAL RESULTS Experiments were conducted on satellite and medical images. The number of bilevel sets K was set fairly low (K = 4 or K = 8) to obtain large regions and to improve robustness to noise and artifacts in the image. Regions with areas |W i| < [0.0001,0.001] x \S\ are discarded. To estimate A and g, we consider the sets of observations {log(|W i),log(g(|W i)), 1 < i < T}. We perform a linear regression on this set so as to find the straight line (in the log-log coordinates) log(g(|W i)) = A- glog(|W i) closest to the data in the least squares sense (Alvarez et al, 1999). The choice of the hyperparameter l determines mostly Finally, the performance of the segmentation procedure is demonstrated for a (181 x 217) MR image shown in Fig. 2 and a microscopic medical breast (256 x 256) image shown in Fig. 3. In Fig. 2, the dark background has been previously eliminated before processing. Segmentation of both grey and white matter is achieved using the set of parameters K= 8, l = 0.5 x 2552 and |Wmn| = 0.0001 x \S\. Fig. 3 exhibits an inflammatory carcinoma with metastases. Fig. 3b shows the segmentation results when K = 8, |Wmn| = 0.001 x \S\ and l = 0.5 x 2552. the properties of the segmentation result. If f is a function from S to [0,255], a default choice for the hyperparameter is l G [0.1,1.] x 2552. Fig. 1a shows an aerial 256 x 256 image (in the visual spectrum) depicting the region of Saint-Louis during the rising of the Mississippi and Missouri rivers in July 1993. We are interested in extracting the rivers and a background corresponding to textured urban areas. Fig. 1 shows the segmentation results when K = 8, |Wmn| = 0.00025 x 1S1 and l = 0.25 x 2552. In this experiment, the maximum number of significant components is T = 291. The corresponding topographic map is shown in Fig. 1d. The image histogram has been quantized with K = 8 quantizers and an entropic method (Fig. 1b). We estimated the values of parameters A = 3.727 and g= 1.486 by linear regression (Fig. 1c). In that case, the residual sum of squares is 2.007 and 17 < W ii < 2.886 ¦ 104 pixels. It takes about 15 seconds (25 095 < (T x (T + 1))/2 = 42486 iterations) of computing time for building ^T and selecting the best configuration shown in Fig. 1e (P = 105 objects) using the stepwise greedy algorithm (l = 0.25 x 2552). Enumerating all the configurations is infeasible since 2T = 3.98 ¦ 1087 iterations! The non-connected background is labeled in “white” in Fig. 1e and the objects are filled with their mean gray values {fW}. a) MR b) Image c) Boun- d) Image image. segmen- daries. histogram. tation. Fig. 2. Segmentation of a MR image. . JiiLliLlijjk^j__L_ a) Original image. b) Image histogram. c) Area distribution. d) Topographic e) Optimal map. segmentation. Fig. 1. Segmentation of a (256 x 256) satellite image (K=8, |Wmn| = 0.00025 x 1S, l = 0.25 x 2552). 167 Kervrann C: Bayesian image segmentation f J ff^ / J A a) Original b) Image c) Image image. segmentation. histogram. Fig. 3. Segmentation of a microscopic image. CONCLUSION AND PERSPECTIVES In this paper, we have presented a Bayesian approach for extracting structures in images. Although our work is related to morphological approaches based on connected operators (Salembier and Serra, 1995), it is an independent approach since we seek minimizers of a global objective functional. In addition, we proved that our MAP estimator can be determined by selecting a subset of image level lines. A total CPU time of a few seconds on a 296 MHz workstation makes the method attractive for many time-critical applications. In terms of future directions for research, we propose to create a non-linear scale-space by successive applications of an area morphology operator to select most meaningful regions in the image. REFERENCES Alvarez L, Gousseau Y, Morel JM (1999). Scales in natural images and a consequence on their bounded variation. In: Nielsen M, Johansen P, Olsen OF, Weickert J, eds. Scale-Space Theories in Computer Vision. Berlin, Heidelberg: Springer, 247-58. Blake A, Zisserman A (1987). Visual Reconstruction. Cambridge, Mass.: MIT Press. Caselles V, Coll B, Morel JM (1999). Topographic maps and local contrast changes in natural images. Int J Comput Vision 33(1):5-27. Chan T, Vese L (1999). Active contour model without edges. In: Nielsen M, Johansen P, Olsen OF, Weickert J, eds. Scale-Space Theories in Computer Vision. Berlin, Heidelberg: Springer, 141-51. Geman S, Geman D (1984). Stochastic relaxation, gibbs distributions, and the bayesian restoration of images. IEEE T Pattern Anal 6(6):721-41. Grenander U, Miller MI (1994). Representations of knowledge in complex systems. J Roy Stat Soc B Met 56(4):549-603. Kapur JN, Sahoo PK, Wong AKC (1985). A new method for gray-level picture thresholding using the entropy of the histogram. Comput Vision Graph 29:273-85. Kass M, Witkin A, Terzopoulos D (1987). Snakes: active contour models. Int J Comput Vision 12(1):321-31. Kervrann C, Hoebeke M, Trubuil A (2000). Level lines as global minimizers of energy functionals in image segmentation. In: Vernon D, ed. Computer Vision -ECCV 2000, 6th European Conference on Computer Vision, Dublin, Ireland, July 26 - July 1, 2000, Proceedings, Part II. Berlin, Heidelberg: Springer, 241-56. Matheron G (1975). Random Sets and Integral Geometry. New York: John Wiley. Møller J, Waagepertersen RP (1998). Markov connected component fields. Adv Appl Probab 30: 1-35. Morel JM, Solimini S (1994). Variational methods in image segmentation. Boston: Birkhauser. Mumford D, Shah J (1989). Optimal approximations by piecewise smooth functions and variational problems. Commun Pur Appl Math 42(5):577-685. Paragios N, Deriche R (2000). Coupled geodesic active regions for image segmentation: a level set approach. In: Vernon D, ed. Computer Vision - ECCV 2000, 6th European Conference on Computer Vision, Dublin, Ireland, July 26 - July 1, 2000, Proceedings, Part II. Berlin, Heidelberg: Springer, 224-40. Salembier P, Serra J (1995). Flat zones filtering, connected operators, and filters by reconstruction. IEEE T Image Process 4(8):1153-60. Vincent L, Soille P (1991). Watershed in digital spaces: an efficient algorithm based on immersion simulations. IEEE T Pattern Anal 13(6):583-98. Zhu SC, Yuille A (1996). Region competition: unifying snakes, region growing, and bayes/mdl for multiband image segmentation. IEEE T Pattern Anal 18(9):884-900. 168