Image Anal Stereol 2001;20:163-168 Original Research Paper
BAYESIAN IMAGE SEGMENTATION THROUGH LEVEL LINES SELECTION
Charles Kervrann
INRA - Biome´trie, Domaine de Vilvert, 78352 Jouy-en-Josas, France e-mail: ck@jouy.inra.fr (Accepted October 16, 2001)
ABSTRACT
Bayesian statistical theory is a convenient way of taking a priori information into consideration when inference is made from images. In Bayesian image segmentation, the a priori distribution should capture the knowledge about objects. Taking inspiration from (Alvarez et al., 1999), we design a prior density that penalizes the area of homogeneous parts in images. The segmentation problem is further formulated as the estimation of the set of curves that maximizes the posterior distribution. In this paper, we explore a posterior distribution model for which its maximal mode is given by a subset of level curves, that is the boundaries of image level sets. For the completeness of the paper, we present a stepwise greedy algorithm for computing partitions with connected components.
Keywords: area distribution, connected components, energy minimization, image segmentation, level curves.
INTRODUCTION
Image segmentation and object boundaries estimation are among the most challenging and fundamental addressed problems in image analysis (Mumford and Shah, 1989; Vincent and Soille, 1991; Morel and Solimini, 1994). Segmentation can be achieved by minimizing an energy model designed in conjunction with Bayes’s theorem as shown by Mumford and Shah (1989) and Zhu and Yuille (1996). Indeed, it is straightforward to transfer a Bayesian criterion into an energy minimization criterion (Morel and Solimini, 1994; Zhu and Yuille, 1996). Thereby, the discrete (Geman and Geman, 1984; Blake and Zisserman, 1987) or continuous (Mumford and Shah, 1989; Morel and Solimini, 1994) energy functional is traditionally comprised of two terms: the first term is a fidelity term describing the interaction between the observed data and the model (data model) and the second is a regularity term (prior model).
In Bayesian image segmentation, the prior model should capture the knowledge about objects. In particular, the incorporation of prior information about the outline of objects can be applied. There has been a growing interest in this field, particularly along the guidelines of Grenander’s general pattern theory using deformable templates (Grenander and Miller, 1994). In a separate way, Zhu and Yuille (1996) attempted to unify snakes (Kass et al., 1987) and region growing methods within a general energy/Bayes framework. Both approaches estimate the curves that maximally separate unknown statistics inside and outside the curves. Finally, the designed energy  functionals  are  complex,  for  which  it  is
difficult to specify global minimizers corresponding to “best” segmentations. The maximum a posteriori (MAP) estimate is generally determined by prohibitive stochastic search procedures (Grenander and Miller, 1994) or other variants of steepest ascent algorithms (Zhu and Yuille, 1996). With these tools, additional a priori knowledge may be specified to ease the segmentation task: statistics inside region boundaries are assumed to be known (Grenander and Miller, 1994; Chan and Vese, 1999) or estimated using ad-hoc methods (Paragios and Deriche, 2000). In practical imaging, these methods still suffer from the problem of initialization of curves (Zhu and Yuille, 1996), offline estimation of the mixture model of Gaussians approximating the probability density function of the image (Paragios and Deriche, 2000), or selection of hyperparameters weighting the contribution of energy terms (Zhu and Yuille, 1996; Chan and Vese, 1999; Paragios and Deriche, 2000).
In this paper, we address these problems and follow the Bayesian approach for recovering simply connected objects in the plane. The prior model focuses on how the area and number of objects can be varied in images. Unlike other approaches (Grenander and Miller, 1994; Zhu and Yuille, 1996), we shall see that maximizing the posterior distribution is here equivalent to selecting a subset of connected components of image bilevel sets. For the completeness of the paper, we present a stepwise greedy algorithm for computing partitions with connected components. Finally, we illustrate this approach with some experiments on satellite and medical images.
163
Kervrann C: Bayesian image segmentation
THE BAYESIAN FRAMEWORK
Let S be an open subset (rectangle) of M2 and f a grey-scale image treated as a function defined on S. Below we will work in the continuous setup, where S is a subset of a Euclidian space and f: S —> ffi+ represents the observed data function. We use the terminology “site” or “pixel” to denote a point of the image, even in the continuous case. Each point x G S is assigned a grey value f{x). According to Matheron (1975), we interpret the image fas a family of sets defined by LT(f) = {x e S : f(x) > t}, t G M+. Each level set LT(f) is assumed to be of finite perimeter. Therefore, f will belong to the bounded variation (BV) space as shown by Alvarez et al. (1999).
Let m C S) be a set of disjoint and non-empty image domains or objects, and {dD.i} their boundaries. A partition of the image domain S consists in finding a set {mPi1 and a background D. defined as the complementary subset of the union of objects D. = S \ UPi1 £li , £1i r\ij Qj = 0/ and £1i ni D. = 0/. We assume that the observed image f has been produced by the model f = f true + £, where £ is a zero-mean Gaussian white noise: The true image ftrue{x) = £Pi=1fn 1^+ f^1^ is
supposed piecewise constant, where fa and f¦=¦ denote respectively the unknown average values off over D.i and D., and 1xeE is the set indicator function of the set E. The variance o2 is assumed to be known and constant over the entire image. So, the likelihood for the data fgiven {Q.1, ¦ ¦ ¦, D.P} is specified by
p(f\a1---,aP) - exp-12jx£(f(x)-f i)2dx
+ J_<if(x)-fu)2dxY   (1)
We seek a partition of the rectangle S into a finite set of objects D.i, each of which corresponds to a part of the image where fis constant. Given the objects {Qi}, the unknown background is explicitly determined as the complementary subset of the union of estimated objects. Therefore, we define the following collection ^p of P > 0 admissible, closed and connected objects: ^P = {{£l1...,£lP} C S ; S \U = uPi=1 D.i ; nin1 ij<Pnj = 0/ }. When P = 0, there is no object in the image. Following the Bayesian approach, we use some functional of the posterior distribution
p(n1-,nP\f) ocpfm,...,^)^^,...,^).
The likelihood p(f \ Q1; ¦ • ¦ ,QP) is given by (1) and n(D.1,---,D.P) is the prior distribution of objects. The posterior distribution is used in a further inferential issue concerning the objects within the Bayesian paradigm. The a priori distribution should
capture the knowledge about {Q15---,QP}. We define a density that penalizes the area |Qi| of objects. Additionally, the variables {|^i|} may be considered as independent random variables with density g(|m). Hence, the prior distribution is of the
form w(fl1,...,flP) = Z-1nPi1g(im)a where Zp is a normalization constant and a a real positive value. The density g(|m) is chosen to be a non-negative monotically decreasing function of the object area |Qi|. For instance, Alvarez et al. (1999) have empirically observed that the area distribution of homogeneous parts in images follows a power law j8|Qi|~r. In what follows, we shall consider this model for the density g(|m). There are other possible choices of g(im): the case of g(\£li\) ~ exp-{ß\Q^} related to the Markov connected components fields, has been already discussed by Kervrann et al. (2000), Alvarez et al. (1999) and Møller and Waagepetersen (1998).
BAYESIAN INFERENCE
All kinds of inference are made from p(Q1- ¦ ¦ ,QP | f). Finding the maximum a posteriori (MAP) estimate is herein our choice of inference. As a consequence, the MAP estimation of objects is equivalent to the minimization of a global energy function E% (f £l1,..., D.P) defined as
E(fA.-.ßP    =  Ep(Q1...,QP) +
+ AEd(f,«1,...,flP), (2)
where
Ep(Q1...:QP) = P^(rlog(Qi\)-A)
i=1
is the penalty functional,
Ed(f,n1...,nP)  = X/ (f(x)-fni2dx +
+ f_(f(x)-fu)2dx
the data model, A = 2a o2 > 0 the regularization parameter and A = log (/3). The penalty functional tends to regulate the emergence of objects D.i in the image. The regularization parameter A can be then interpreted as a scale parameter that only tunes the number of regions (Morel and Solimini, 1994; Kervrann et al., 2000). If A = 0, each point is potentially a region and D. = 0/ ; the global minimum coincides with zero and this segmentation is called the “trivial segmentation” (Morel and Solimini, 1994).
By using classical arguments on lower semi-continuous functionals on the BV space, we assume here the existence of minimizers of E% (f, Q1, ¦ ¦ ¦, D.P) among functions of sets finite perimeter (or of bounded
164
Image Anal Stereol 2001;20:163-168
variation) (Morel and Solimini, 1994; Zhu and Yuille, 1996). Our MAP estimator is defined by (when it exists)
(«1,..., QP)    =   argmin0PT argmin{n     ^}6<P
E(fA.-.ßP),  (3)
where ^P C ^T ,\/P < T, and T is the maximum number of admissible objects registered in a bank   T.
We recall that Q = S \ UPi 1Öi is the complementary subset of estimated objects {Q15...,QP}. A direct minimization with respect to all unknown domains D.i and parameters fa is a very intricate problem, even if T is low since objects are not designated. In what follows (Lemma 1), we prove that the object boundaries that minimize E% (f, £l1,..., D.P) are level lines of the function f, which makes the problem tractable.
LEMMA 1 If there exist minimizers and no pathological minimum exists, then the energy minimizing set of curves is a subset of level lines of f i.e. the border dD.i of each D.i is a boundary of a connected component of a level set off.
Proof of Lemma 1. Let D.s be a variation of a set D., i.e. the Hausdorff distance d00(D.s,D.) < 5 . To prove Lemma 1, we assume that, for any connected perturbation of D. such that d00(D.s,D) < <5, two neighboring sets D. and D.' do not merge into one single set flUfl' and, for any connected perturbation of D. such d00(D.s,D.) < 5, Q. does not split into two new sets. This corresponds to prohibited topological changes. Without loss of generality, we prove Lemma 1 for one object D. and a background D., that is the closure of the complementary set of £X For two sets A and B such that B C A, we denote JA\Bf= Jaf~ Ibf . Then, we have
AIQI
Jc
Q„\Q
1= |Q5|-|Q|
and
Wßj   I      \Ja J        Ja   Jas\a      \Jas\a   I
WedefineAEx(fQ)=Ex(fQs)-Ex(fQ) = T1 +
T2 + T3 + T4 + T5, where
T1    =     I   f- f f,
Jas        Ja
T4    =     f      f-f    f,
JS\as        JS\a
T5    =    ~]sT\Ö;\ [Saf) +S^S\qf]
(5)
By definition we have T1 + T4 = 0. Passing to the limit A|Q| —>• 0, we obtain expressions for T2, T3 and T5 (higher order terms are neglected)
Ay   r IOI k
1
Q„\Q
3              |Q| Jas\a   Ja       |Q| yJas\a J
|0|2 Jas\a   \Ja J
T5  =  ^r-^\\2f    ff f-(l    f
\S\ - \il\ [  Jas\a   JS\a       \Jas\a J
~Srw\L\n1{S\af) /¦
(6)
We define the image moments m0 = Ja 1, m1 =
/qf , K0 =   S1, K1 =   Sf. Using the mean value theorem for a double integral, which states that if f
is continuous and a connected subset E is bounded by
a simple curve, then for some point x0 in E we have
JEf(x)dE = f(x0) ¦ \E\ where \E\ denotes the area of E, it follows that
M„
AEa(f,0)
i21     {K1-m12     ly
M,
'0J
+
K0-m0        m0
fx 0)
0rf(x 0)2 I       1      /       1-K-mn)          Jas\a      Jas\a
m0(K0-m0)
(7)
Let xb be a fixed point of the border d£l. Choose D.g such that dD.s = dD. except on a small neighborhood of xb. The energy having a minimum for D., f{xb) needs to be solution of the following equation
AEA(f,Q)
A|fl|
[M0+M1 f(xb)] + O(A\£l\) = 0.    (8)
165
Kervrann C: Bayesian image segmentation
By passing to the limit A|Q| —> 0, we obtain M0 + M1f(xb) = 0. This equation has a unique solution. The coefficients M0 and M1 depend on neither xb nor f{xb), and M0 / 0. The function f is continuous and dQ. is a connected curve. Therefore f(xb) is constant when xb covers d£l. This completes the proof.
A STEPWISE GREEDY ALGORITHM FOR IMAGE SEGMENTATION
To implement our level set image segmentation based on energy minimization, a four step method is used. The proposed algorithm is not a region growing algorithm (Morel and Solimini, 1994) since all objects are built once and for all. It differs from the watershed approach since regions that emerge from the watershed segmentation are not necessarily connected components within the image level sets (Vincent and Soille, 1991). Let K, A, |ßmn| be the input parameters set by the user.
Bilevel set construction. The first step completes a quantization of the function f G [fminfmax] in K = {4,¦¦¦,8} non-equal-sized and non-overlapping intervals [tl_1,tl), l = {1,---,K}. Given this set of intervals estimated using the maximum entropy sum method (Kapur et al., 1985), let bl be the bilevel set image with bl{x) = 1 if f(x) G tl_15tl) and blx) = 0 otherwise. The connected components of bilevel sets can be characterized by their surrounding curves, that is the level lines. If we map these level lines for a given set of K levels, we get a segmentation of the image also called topographic map (Caselles et al., 1999). More generally, one can consider a segmentation achieved using only some connected components of bilevel sets, which is the philosophy of our approach. The most perceptible level lines could be determined by the detection of T-junctions of level lines (Caselles et al, 1999). Instead, we use herein a simpler criterion where perceptually significant level lines are the bilevel sets boundaries of an quantized image by using K quantizers and an entropy method. The entropy method due to Kapur et al. (1985) chooses the thresholds {tl} to be the values at which the information is maximum.
Object extraction. A crude way to build pixels sets corresponding to objects is to proceed to a connected components labeling of bilevel set images {bl} and to associate each label with an object £li. Though this process may work in the noise-free case, in general we would also need some smoothing effect of the connected components labeling. So, we consider a size-oriented morphological operator acting on sets that consists in keeping all connected components of the output of area larger than a limit |ßmn|. This connected operator in mathematical
morphology will never introduce new features or edges and boundaries of remain connected components are preserved (Salembier and Serra, 1995). The list of connected components then forms the bank T of admissible objects {Q1,...QT} with |m > |nmn|.
Configuration determination. The connected components are then combined during the third step to form object configurations. For instance, these configurations can be built by enumeration of all possible object combinations, i.e. 2T configurations. Each configuration is made of a subset of connected components taken in the bank ^T. The background D. corresponds to the complementary set of objects selected for each configuration.
Energy computation and object configuration selection. Energy calculations take the image intensities of the original (not quantized) image to establish piecewise-constant approximation errors. Energies of the form {Ja (f(x) — fai2 dx} are computed once and stored _on a RAM memory. The energy term /^(f(x) — f-^)2 dx is efficiently updated for each configuration. For a fixed bank fT = {D.1:---,D.T}, one way to choose the optimal set of of objects {f^, ¦ ¦ ¦, QP}, P < T, is to search for all possible combinations of P objects and compute the corresponding energy E% (f, £l1, ¦ • ¦, D.P). Enumerating all possible sets of objects in the object bank and comparing their energies is computationally too expensive if T is large. Instead of a such brute force search, we propose a stepwise greedy algorithm for minimizing E^ (f, Q1, ¦ • ¦, D.P). We start from P = 0 and introduce one object D. at a time. At the first step, we compute the T energies with one single object D.j at once against the complementary subset
D. = S\ uj,i=1£2i. Let Q1 be the estimated object that best lowers E^(f,D.1l---,D.P). This object is stored on a RAM memory as an object of the optimal configuration. It is removed from the initial bank ^T. At any step of the algorithm, a new object is chosen to maximally decrease the energy E^(f1D.1,---,D.P).
Suppose that at the .P-th step, P and D. are not known but we have estimated P objects {Q1,---,QP} and a current background D. = S \ {Q15---,QP}. Let Ex(f,D.1,--- 7D.P) be the current computed energy. Then at the (P + 1)-th step, we choose the object Qj G ^T \ {Q1-'-jQP} which has the maximal difference, i.e.
QP1   =   arg        max   ^    [Ek{f,a1,---,SiP)
Q.jeT\{Q.1,-,Q.P} \ -EA(f,fl1,-,flP,flj)V       (9)
166
Image Anal Stereol 2001;20:163-168
The algorithm stops at the P-th step when the addition of any object does not decrease El(f1W 1,---,WP). This means that the optimal number of objects is P = P and the remaining objects of the bank are a part of the estimated background W = S\ {W 1,•¦•,W P}. This algorithm selects a suboptimal configuration of objects corresponding to a local minimum. Using this algorithm, at most (T x (T + 1))/2 object configurations are examined.
EXPERIMENTAL RESULTS
Experiments were conducted on satellite and medical images. The number of bilevel sets K was set fairly low (K = 4 or K = 8) to obtain large regions and to improve robustness to noise and artifacts in the image. Regions with areas |W i| < [0.0001,0.001] x \S\ are discarded. To estimate A and g, we consider the sets of observations {log(|W i),log(g(|W i)), 1 < i < T}. We perform a linear regression on this set so as to find the straight line (in the log-log coordinates) log(g(|W i)) = A- glog(|W i) closest to the data in the least squares sense (Alvarez et al, 1999). The choice of the hyperparameter l determines mostly
Finally, the performance of the segmentation procedure is demonstrated for a (181 x 217) MR image shown in Fig. 2 and a microscopic medical breast (256 x 256) image shown in Fig. 3. In Fig. 2, the dark background has been previously eliminated before processing. Segmentation of both grey and white matter is achieved using the set of parameters K= 8, l = 0.5 x 2552 and |Wmn| = 0.0001 x \S\. Fig. 3 exhibits an inflammatory carcinoma with metastases. Fig. 3b shows the segmentation results when K = 8, |Wmn| = 0.001 x \S\ and l = 0.5 x 2552.
the properties of the segmentation result. If f is a function from S to [0,255], a default choice for the hyperparameter is l G [0.1,1.] x 2552. Fig. 1a shows an aerial 256 x 256 image (in the visual spectrum) depicting the region of Saint-Louis during the rising of the Mississippi and Missouri rivers in July 1993. We are interested in extracting the rivers and a background corresponding to textured urban areas. Fig. 1 shows the segmentation results when K = 8, |Wmn| = 0.00025 x 1S1 and l = 0.25 x 2552. In this experiment, the maximum number of significant components is T = 291. The corresponding topographic map is shown in Fig. 1d. The image histogram has been quantized with K = 8 quantizers and an entropic method (Fig. 1b). We estimated the values of parameters A = 3.727 and g= 1.486 by linear regression (Fig. 1c). In that case, the residual sum of squares is 2.007 and 17 < W ii < 2.886 ¦ 104 pixels. It takes about 15 seconds (25 095 < (T x (T + 1))/2 = 42486 iterations) of computing time for building ^T and selecting the best configuration shown in Fig. 1e (P = 105 objects) using the stepwise greedy algorithm (l = 0.25 x 2552). Enumerating all the configurations is infeasible since 2T = 3.98 ¦ 1087 iterations! The non-connected background is labeled in “white” in Fig. 1e and the objects are filled with their mean gray values {fW}.
a) MR             b) Image         c) Boun-         d) Image
image.             segmen-          daries.             histogram.
tation.
Fig. 2. Segmentation of a MR image.
. JiiLliLlijjk^j__L_
a) Original image.            b) Image histogram.            c) Area distribution.           d) Topographic               e) Optimal
map.                              segmentation.
Fig. 1. Segmentation of a (256 x 256) satellite image (K=8, |Wmn| = 0.00025 x 1S, l = 0.25 x 2552).
167
Kervrann C: Bayesian image segmentation
		f	J	ff^	/ J	A
a) Original          b) Image            c) Image
image.                segmentation.     histogram.
Fig. 3. Segmentation of a microscopic image.
CONCLUSION AND PERSPECTIVES
In this paper, we have presented a Bayesian approach for extracting structures in images. Although our work is related to morphological approaches based on connected operators (Salembier and Serra, 1995), it is an independent approach since we seek minimizers of a global objective functional. In addition, we proved that our MAP estimator can be determined by selecting a subset of image level lines. A total CPU time of a few seconds on a 296 MHz workstation makes the method attractive for many time-critical applications. In terms of future directions for research, we propose to create a non-linear scale-space by successive applications of an area morphology operator to select most meaningful regions in the image.
REFERENCES
Alvarez L, Gousseau Y, Morel JM (1999). Scales in natural images and a consequence on their bounded variation. In: Nielsen M, Johansen P, Olsen OF, Weickert J, eds. Scale-Space Theories in Computer Vision. Berlin, Heidelberg: Springer, 247-58.
Blake A, Zisserman A (1987). Visual Reconstruction. Cambridge, Mass.: MIT Press.
Caselles V, Coll B, Morel JM (1999). Topographic maps and local contrast changes in natural images. Int J Comput Vision 33(1):5-27.
Chan T, Vese L (1999). Active contour model without edges. In: Nielsen M, Johansen P, Olsen OF, Weickert J, eds. Scale-Space Theories in Computer Vision. Berlin, Heidelberg: Springer, 141-51.
Geman S, Geman D (1984). Stochastic relaxation, gibbs distributions, and the bayesian restoration of images. IEEE T Pattern Anal 6(6):721-41.
Grenander U, Miller MI (1994). Representations of knowledge in complex systems. J Roy Stat Soc B Met 56(4):549-603.
Kapur JN, Sahoo PK, Wong AKC (1985). A new method for gray-level picture thresholding using the entropy of the histogram. Comput Vision Graph 29:273-85.
Kass M, Witkin A, Terzopoulos D (1987). Snakes: active contour models. Int J Comput Vision 12(1):321-31.
Kervrann C, Hoebeke M, Trubuil A (2000). Level lines as global minimizers of energy functionals in image segmentation. In: Vernon D, ed. Computer Vision -ECCV 2000, 6th European Conference on Computer Vision, Dublin, Ireland, July 26 - July 1, 2000, Proceedings, Part II. Berlin, Heidelberg: Springer, 241-56.
Matheron G (1975). Random Sets and Integral Geometry. New York: John Wiley.
Møller J, Waagepertersen RP (1998). Markov connected component fields. Adv Appl Probab 30: 1-35.
Morel JM, Solimini S (1994). Variational methods in image segmentation. Boston: Birkhauser.
Mumford D, Shah J (1989). Optimal approximations by piecewise smooth functions and variational problems. Commun Pur Appl Math 42(5):577-685.
Paragios N, Deriche R (2000). Coupled geodesic active regions for image segmentation: a level set approach. In: Vernon D, ed. Computer Vision - ECCV 2000, 6th European Conference on Computer Vision, Dublin, Ireland, July 26 - July 1, 2000, Proceedings, Part II. Berlin, Heidelberg: Springer, 224-40.
Salembier P, Serra J (1995). Flat zones filtering, connected operators, and filters by reconstruction. IEEE T Image Process 4(8):1153-60.
Vincent L, Soille P (1991). Watershed in digital spaces: an efficient algorithm based on immersion simulations. IEEE T Pattern Anal 13(6):583-98.
Zhu SC, Yuille A (1996). Region competition: unifying snakes, region growing, and bayes/mdl for multiband image segmentation. IEEE T Pattern Anal 18(9):884-900.


168