Image Anal Stereol 2014;33:83-94 doi:10.5566/ias.v33.p83-94 
Review Article 
NUMERICAL EXPERIMENTS FOR THE ESTIMATION OF MEAN DENSITIES OF RANDOM SETS 
FEDERICO CAMERLENGHI 1, VINCENZO CAPASSO 2,3 AND ELENA VILLAC,2 
1Dept. of Mathematics, Universit`

a degli Studi di Pavia, via Ferrata 1, 27100 Pavia, Italy; 2Dept. of Mathematics, Universit`
a degli Studi di Milano, via Saldini 50, 20133 Milano, Italy ; 3Escuela Politecnica Superior, 
Universidad Carlos III de Madrid, Av. de la Universidad 30 28911 Leganes, Spain 
e-mail: federico.camerlenghi01@ateneopv.it, vincenzo.capasso@unimi.it, elena.villa@unimi.it 
(Received November 13, 2013; revised February 25, 2014; accepted April 24, 2014) 
ABSTRACT 

Many real phenomena may be modelled as random closed sets in Rd , of different Hausdorff dimensions. The problem of the estimation of pointwise mean densities of absolutely continuous, and spatially inhomogeneous, random sets with Hausdorff dimension n < d, has been the subject of extended mathematical analysis by the authors. In particular, two different kinds of estimators have been recently proposed, the .rst one is based on the notion of Minkowski content, the second one is a kernel-type estimator generalizing the well-known kernel density estimator for random variables. The speci.c aim of the present paper is to validate the theoretical results on statistical properties of those estimators by numerical experiments. We provide a set of simulations which illustrates their valuable properties via typical examples of lower dimensional random sets. 
Keywords: density estimator, Hausdorff dimension, Hausdorff measure, kernel estimate, Minkowski content, random closed set, stochastic geometry. 
INTRODUCTION 
Given an Euclidean space Rd , the problem of the evaluation and estimation of the mean density of lower dimensional random closed sets (i.e., with Hausdorff dimension less than d), such as .ber processes, boundaries of germ-grain models, n-facets of random tessellations, and surfaces of full dimensional random sets, has been of great interest in many different scienti.c and technological .elds over the last decades (see Camerlenghi et al., 2014 and references therein). Recently, in Villa (2014), and Camerlenghi et al. (2014), two different kinds of estimators have been proposed by the authors, the .rst one is based on the notion of Minkowski content, the second one is a kernel-type estimator generalizing the well-known kernel density estimator for random variables. 
The speci.c aim of the present paper is to validate the theoretical results on statistical properties of those estimators by numerical experiments. We provide a set of simulations which illustrates their valuable properties via typical examples of lower dimensional random sets. To complete the picture, we have included an additional estimator that naturally derives from the Besicovitch derivation theorem (Ambrosio et al., 2000). 
The required background regarding the global and local approximation of mean densities of random closed sets has been presented in a series of papers by Capasso and Villa (Capasso and Villa, 2006; 2007; 2008; Ambrosio et al., 2009; Villa, 2014); we will report here the basic de.nitions, while for a detailed mathematical analysis of the proposed estimators we refer to the paper by Camerlenghi et al. (2014). 
In Section Basics and notation we introduce basic notations and de.nitions; in Section Estimation of ..n (x) and Section Optimal bandwidth we recall our proposed estimators as mentioned above, together with their statistical properties; in Section Particular cases we discuss the problem of the optimal bandwidth for the three cases and provide a set of simulations in several cases of interest; .nally, in the concluding remarks we offer some comments based on the compared results of the numerical simulations. 


BASICS AND NOTATION 
We remind that, given a probability space (.,F,P),a random closed set . in Rd is a measurable map . : (.,F) › (F,.F), where F denotes the class of the closed subsets in Rd , and .F is the .-algebra generated by the so called Fell topology, or hit-or­miss topology (Matheron, 1975). We say that a random closed set . : (.,F) › (F,.F) satis.es a certain property (e.g., . has Hausdorff dimension n) if . satis.es that property P-a.s. 
Here H n is the n-dimensional Hausdorff measure, dx stands for H d(dx), and BX is the Borel .-algebra of any topological space X . Br(x) and bn will denote the closed ball with centre x and radius r > 0 and the volume of the unit ball in Rn, respectively. For any function f , disc f will denote the set of its discontinuity points. We remind that a compact set A . Rd is called countably H n-recti.able if there exist countably many n-dimensional Lipschitz graphs .i . Rd such that A \.i.i is H n-negligible. Throughout the paper we shall deal with countably H n-recti.able random closed sets .n. For de.nitions and basic properties of Hausdorff measure and recti.able sets (Federer, 1969; Falconer, 1985; Ambrosio et al., 2000). 

We brie.y recall here that, by means of marked point processes in Rd with marks in the class of compact subset of Rd , every random closed set in Rd can be represented as a germ-grain model (Hug et al., 2002; Baddeley et al., 2007, p. 192 and references therein). Therefore, throughout the paper we shall consider random sets . described by marked point processes . = {(.i,Si)}i.N in Rd with marks in a suitable mark space K so that Zi = Z(Si), i . N is a random set containing the origin (i.e., Z : K › F): 
.(.)= 
 xi + Z(si) , . . . . (1) (xi,si)..(.) 

We assume that . has intensity measure .(d(x,s)) = f (x,s)dxQ(ds) and second factorial moment measure .[2](d(x,s,y,t)) = g(x,s,y,t) dxdyQ[2](d(s,t)) (Karr, 1986; Daley et al., 1988; Stoyan et al., 1995, for general theory of point processes). 
For the reader’s convenience we have put in the Appendix basic notation and assumptions on ., which will appear throughout the paper (see also Villa, 2010; 2014; Camerlenghi et al., 2014 for further details). 
Given a RACS .n of integer Hausdorff dimension n . d, whenever E[H n(.n .·)] is absolutely continuous with respect to the measure H d on Rd , its density (i.e., its Radon-Nikodym derivative) with respect to H d has been called mean density of .n, and it is denoted by ..n (for an exhaustive discussion about the existence of ..n , see Capasso and Villa, 2007; 2008). In the representation via point processes, as in Section Basics and notation, we may write (see Villa, 2014) 
11
..n (x)= f (y,s)H n(dy)Q(ds), (2)
K x-Z(s) 

where -Z(s) is the re.ection of Z(s) at the origin. It is easy to see that if n = 0 and .0 = X is a random vector with pdf fX , then .X (x)= fX (x). 

ESTIMATION OF ..n (x) 

In the sequel we will assume that an i.i.d. random sample .1 n,...,.N is available for the random closed 
n set .n, with mean density ..n . 
We list here three different kinds of estimators for ..n (x). (See also Camerlenghi et al., 2014). 
A natural estimator 

By the Besicovitch derivation theorem (Ambrosio et al., 2000, Theorem 2.22), we know that 
E[H n(.n . Br(x))] 

..n (x)= lim H d-a.e. x . Rd ; 
r¡0 bdrd 

such approximation suggests the following natural estimator for the mean density ..n (x) of .n, at a point x . Rd ,
N 
..,N 1 
(x) := H n(.i . BrN (x)) . (3)
 .nd . n
Nbdr
Ni=1 

Here and later the scaling parameter rN will be called the bandwidth associated with the sample size N, as usual in literature. 
Kernel estimator 

Kernel density estimation was .rstly introduced by Parzen (1962) and Rosenblatt (1956) (see also Wertz, 1978, Deheuvels and Mason, 2004 for a survey of additional foundational papers). 
We recall here that a measurable function k : Rd › R is said to be a multivariate kernel if it satis.es the following conditions: 
1. 
0 . k(z) . M for all z . Rd , for some M > 0; 

2. 
k is radially symmetric; 

3. 
IRdk(z)dz = 1. 


As a natural extension of the kernel density estimation theory for random vectors, the following kernel estimator for the mean density of .n has been introduced in Camerlenghi et al. (2014)
N
1
 ...n ,N (x) := .krN * HLn .i (x)
Nn
i=1 N
1 
= .1 k(x - y)H n(dy) , (4)
Nrd .i rN
Ni=1 n 

where * stands for the usual convolution product. 
Image Anal Stereol 2014;33:83-94 
Remark 1 By choosing the kernel function 
1
k(z)= 1B1(0)(z) ,bd 
it is easy to see that we may reobtain the natural density estimator as a particular case of the kernel estimator, i.e., ..,N (x)= ..,N (x).
.n .n 
“Minkowski content”-based estimator 
Within the mathematical framework provided in Ambrosio et al. (2009) and in Villa (2014, Theorem 7), based on a stochastic version of the Minkowski content notion, it is proved that if .n satis.es (A1), (A2) and (A3), given in the Appendix, then 
P(x . .n.r )
..n (x)= lim , H d-a.e. x . Rd . (5) 
r¡0 bd-nrd-n 
As a natural byproduct, the following “Minkowski content”-based estimator of ..n (x) has been proposed in Villa (2014): 
.Ni=1 1.i = 0/
.µ,Nn.BrN (x)
(x) := . (6)
.nd-n
Nbd-nrN 
The statistical properties of the “Minkowski content”-based estimator ..µn ,N (x), and of the kernel 
estimator ..,N (x) (and so of ..,N (x) as well, by 
.n .n
Remark 1), can be summarized as follows (we refer to Camerlenghi et al., 2014 for a) and b), and to Villa, 2014, Corollary 13 for c)) : 
Theorem 2 Assume that .n satis.es assumptions (A1) and (A2), that rN › 0, as N › ., and that k is a kernel with compact support. Then, for almost every x . Rd, 
..,N
a) (x) is asymptotically unbiased, 
.n 
..,N
b) (x) is weakly consistent if (A1) and (A3), and 
.n 
lim Nrd-n = . (7)
NN›. 
hold, 
.µ,N
c) (x) is asymptotically unbiased and weakly 
.n
consistent if (A3) and Eq. 7 hold. 


OPTIMAL BANDWIDTH 
A problem of statistical interest is to .nd an optimal bandwidth rN . By proceeding along the same lines as what is commonly done for the kernel density estimator fXN (x) of the pdf fX (x) of a random variable X (where rN is de.ned as the quantity which minimizes the asymptotic mean square error (AMSE) of fXN (x)), in Camerlenghi et al. (2014) optimal bandwidths for the proposed estimators have been provided. In order to do this, asymptotic approximations of bias and variance are needed. 
OPTIMAL BANDWIDTH FOR ...n ,N 
The next theorem provides asymptotic approximations for the bias and the variance of the kernel estimator. 
Theorem 3 (Camerlenghi et al., 2014) In addition to the hypotheses in b ) of Theorem 2, we assume that the kernel k is continuous and (A2bis) holds for |.| = 2. Then, for H d-a.e. x . Rd, 
22
Bias(...n ,N (x)) = CBias(x)rN + o(rN ) 
CVar(x) 1
Var(...n ,N (x)) = + o(),
NrNd-n NrNd-n 
with 
1 1
CBias(x) := . k(z)z. dz
.! Rd 
|.|=2 
11 
· D. f (y,s)H n(dy)Q(ds) ,
yK x-Z(s) 
and 
111 1
CVar(x) := 
.x,s
K Rdx-Z(s) y 
k(z)k(z + w) f (y,s)H n(dw)H n(dy)dzQ(ds) , 
where .yx,s . Gn is the approximate tangent space to x - Z(s) at y . x - Z(s). 
For the notion of approximate tangent space to a H n-recti.able compact set A of Rd at a point x . A we refer to Ambrosio et al. (2000, De.nition 2.79). 
By remembering that 
MSE(..,N (x)) = [Bias(..,N (x))]2 +Var(..,N (x)) ,
.n .n .n 
it follows that the asymptotic approximation of the mean square error of ...n ,N (x) is given by 
4
AMSE(..,N (x)) = C21 
.n Bias(x)rN + CVar(x) ,
NrNd-n 

so that, for any suf.ciently large sample size N, 
o,AMSE 
rN (x) := argmin AMSE(rN ) 
rN 
 (d - n)CVar(x)
4+d-n
= , H d-a.e. x . Rd . (8)
4NC2 
Bias(x) 

We also remind that a criterion for obtaining a uniform choice of the optimal bandwidth is based on the integrated mean square error (MISE) (Hardle, 1991), so de.ned: 
1
MISE[..,N (W )] := MSE[..,N (x)]dx ,
.n .n
W 

for any compact W . Rd . Under the same assumptions, by the asymptotic approximation of the 
integrated mean square error (AMISE) of ...n ,N (W ), as N › +., for any compact W . Rd and any given suf.ciently large sample size N, we obtain 
r o,AMISE := argmin AMISE(..,N (W ))
N,W .n rN 
 (d - n)IW CVar(x)dx
4+d-n
= . (9)
4N IWC2 
Bias(x) dx 

We observe that the cases in which CBias(x)= 0, might complicate the identi.cation of an optimal bandwidth (we refer to Camerlenghi et al. (2014), and to Schucany (1989) for a more detailed discussion). A case of particular interest where CBias(x)= 0, is given by assuming that .n is stationary (see Section The case of stationary .n). 
OPTIMAL BANDWIDTH FOR ...n ,N 
..,N
By Remark 1 we know that ..,N (x)= (x)
.n .n 
1

with the kernel k(z)= 1B1(0)(z). The hypothesis 
bd 

of continuity of k in Theorem 3 can be weakened (Camerlenghi et al., 2014), provided that 
HLn .x,s (disc(k(z + ·)) = 0 , (10)
y 

for any s . K, z . supp(k), and H n-a.e. y . x - Z(s). Such a condition is trivially ful.lled in several cases of interest in applications. Therefore the general formulas for the pointwise and global optimal bandwidth rN given in Eq. 8 and Eq. 9, respectively, apply for 
..,N 
(x) too, provided that HLn .x,s (disc( 1 1B1(0)(z +

.n ybd 
·)) = 0, for any s . K, z . B1(0), and H n-a.e. y . x - Z(s). 
As an example we consider an inhomogeneous Boolean model of segments of the type [0,l] ×{0}in R2, with random length l ~ U(0,L) (we have chosen L = 0.2 for numerical studies) in the compact window W =[0,1]2, where the underlying Poisson point process has intensity f (x1,x2)= cx12. From Eq. 2 it follows that 
11 1
..1 (x1,x2)= L3c - L2cx1 + Lcx21 ,
1232

and by Eq. 8 
 
 L3 - 11  
.5256 1 L2x1 + Lx2 
123 21o,AMSE  
r (x1,x2)= .
N 3NcL2.2 

Fig. 1 shows, for c = 700, the natural estimator for this type of random closed set as a function of the bandwidth, expressed in pixel (1 pixel = 0.0029). To carry out the numerical experiments, we have studied the estimator at a .xed point (0.5, 0.5) of the compact window W =[0,1]2 . 
For N = 10, Fig. 1a shows that the choice r = o,AMSE 
r provides a good estimation of the theoretical 
N 
mean density; in fact, for this value of the bandwidth, |..,N (0.5,0.5) - ..1 (0.5,0.5)| = 0.2973. For N = 
.1 

100, Fig. 1b shows that the theoretical optimal value of r is still one of the best choices for the estimation of the mean density. One may notice that the estimation improves with respect to the case N = 10; in fact for 
o,AMSE 

r = r we have |..,N (0.5,0.5) -..1 (0.5,0.5)| = 
100 .1 

0.0614. We conclude that the optimal bandwidth is one of the best choices for the estimation of the mean density, and as N › . the estimation improves. Finally we may observe that the natural estimator has good stability properties with respect to the choice of the bandwidth. 
OPTIMAL BANDWIDTH FOR ..µn ,N 
It is not dif.cult to see that 
P(x . .n.rN )Bias(..µn ,N (x)) = - ..n (x) , (11)
bd-nrd-n 
..n (x) 1
()
Var(.µ,N (x))= + o. (12).n Nrd-nbd-n Nrd-n 
NN 

Therefore, it would be necessary to provide a Taylor series expansion of Bias(..µn ,N (x)), or equivalently of P(x . .n.rN ). 
Image Anal Stereol 2014;33:83-94 


..,N 
(0.5,0.5)= 13.5973. In (b) N = 100; for
.1 
o,AMSE 

r100 . 49pixel(0.1425) ...1 ,N (0.5,0.5)= 13.2386. 
A particular class of germ-grain models .n for which an explicit expression of P(x . .n.rN ) is available, is the class of Boolean models; in that case we get: 
P(x . .n.rN )= 
11
{- }
1 - expf (y, s)dyQ(ds). (13)K x-Z(s).rN 
.µ,N
For numerical experiments of consider the
.n
Boolean model of segments analyzed in Section 
Optimal bandwidth for ...n ,N . 
Since a general formula for the optimal bandwidth is not yet available in the literature, we will minimize the asymptotic approximation of the mean square error directly in this particular example. To this aim, a standard calculation of the integral in Eq. 13 leads to: 
P(x . .2.rN )= 1 - exp{- cLrN (1L2 - 2Lx1
6311
2 (42 ))} ;
+ x1 - L.rN + rN + rN .x1(14)
623
then, by a Taylor series expansion of the exponential term in Eq. 14, we get that Bias(..µ2 ,N (x)) = CBrN + o(rN ), where 
CB(x1,x2) := - 1 L2.c + 1.x1cL
12412L2(12 2)2 
- cL2 - Lx1 + x,
1
463
2
and so AMSE(..µ2 ,N (x)) = CB2rN + ..1 (x1,x2)/2NrN ; thus it follows 
o,AMSE 3 ..1 (x1,x2)
r (x1,x2)= .
N 
4NCB2 (x1,x2) 
Fig. 2 shows the estimator ..µn ,N at point (0.5,0.5), as a function of the bandwidth r (in pixel) for two different sample sizes (N = 10, and N = 100). In Fig. 2a, where the sample size N is equal to 10, we can observe that, near the optimal bandwidth, ..µ1 ,N provides a very good estimation of the mean density; in fact for the optimal value of r we have |.µ,N (0.5,0.5) - ..1 (0.5,0.5)| = 
.1 

1.8556. In Fig. 2b, where N = 100, the estimation improves; indeed |.µ,N (0.5,0.5) - ..1 (0.5,0.5)| = 
.1 
0.70, for r equal to the asymptotic optimal bandwidth. We can conclude that the optimal bandwidth leads to a good estimation of the mean density, which improves as N increases. 
Finally observe that as r › +. the estimator decreases as the function 1 (Fig. 3), in accordance 
2r with the de.nition of the estimator. 
We wish to mention that general results for the optimal bandwidth for inhomogeneous (and not-necessarily Boolean) models of random closed sets are 

not yet available in literature. 



(b) 


Fig. 2. Comparison of the “Minkowski content”-based estimator and the theoretical value (..1 (0.5,0.5)= 13.3) at point (0.5,0.5) for an inhomogeneous Boolean model of segments with intensity f (x1,x2)= 
2

700x1. In (a) N = 10; for ro,AMSE . 9pixel(0.0271) 
r . 4pixel(0.0126) .µ,N (0.5,0.5)= 14. 
10  
. µ,N .1 (0.5,0.5)  =  15.1556.  In  (b)  N  =  100;  for  
o,AMSE  

100 .1 


PARTICULAR CASES 

As a con.rmation of the validity of our results, in this section we wish to present particular cases (for n = 0, and for stationary .n) which have already been treated in literature. 

Fig. 3. The “Minkowski content”-based estimator (for N = 10) at point (0.5,0.5) for an inhomogeneous Boolean model of segments with intensity f (x1,x2)= 
2

700x1, compared with the function 1 (in green). 
2r 
RANDOM VARIABLES AND POINT 
PROCESSES (n=0) 

Let .0 . X be a continuous random variable with pdf fX (equivalently, with mean density .X = fX ). In order to apply the above results, X may be considered as the trivial germ-grain process driven by the marked point process . = {(X,s)} in Rwith mark space K = Rd , consisting of one point (X) only, with grain Z(s) := s, and intensity measure .(d(y, s)) = f (y)dy.0(s)ds, being .0 the usual Dirac delta function 
in 0. In this case the kernel estimator ..,N (x) de.ned
X 

in Eq. 4 reduces, as expected, to usual kernel density estimator for random variable well known in classical literature. Known results on the optimal bandwidth (Parzen, 1962; Silverman, 1986; Hardle, 1991, p. 59) follows now as particular case by Eq. 8 and Eq. 9. For a more detaild discussion, see Camerlenghi et al. (2014, Section 3.3.1). With regard to the natural estimator 
..,N 
(x) and the “Minkowski content”-based estimator 

X .µ,N 
(x), we notice that both estimators reduce, in this 

X 
case, to the usual histogram density estimator, also known in literature as naive estimator, 
N
1
..,N 
X (x)= .X µ,N (x)= .1[x-rN ,x+rN ](Xi) ,N2rN i=1 

where X1,...,XN is an i.i.d. random sample for X. 
As a more signi.cant example of a random set .0 with dimension n = 0, let us consider a point process 
Image Anal Stereol 2014;33:83-94 
. in Rd with intensity function f.. In Camerlenghi et al. (2014) the following statement has been proven: 
Proposition 4 Let {.i}i.N be a sequence of point processes in Rd , i.i.d. as ., with intensity function .. . C2, and locally bounded second moment density g, and let k be a kernel with compact support, continuous at 0. Then the kernel density estimator 
N 
..,Nk(x - xj )
(x)= 1 .. (15)
. Nrd rN
Ni=1 xj..i 
of ..(x) is asymptotically unbiased and weakly consistent for H d-a.e. x . Rd if rN is such that 
lim rN = 0 and lim Nrd = . .
NN›. N›. 
Moreover, the pointwise optimal bandwidth ro,AMSE(x)
N is given, for H d-a.e. x . Rd, by 
o,AMSE 
r (x)= 
N 
1
d..(x)k2(z)dz 
Rd 
D.. dz
4+d 4N( . 1 ..(x) 1 k(z)z)2 , 
x
.!Rd 
|.|=2 
while the global optimal bandwidth ro,AMISE is given, 
N,W for any compact window W . Rd, by 
o,AMISE 
r = 
N,W 
1
dE[.(W )]k2(z)dz 
Rd 
4+d . 
4N 1( . 1 D...(x)1 k(z)z. dz)2dx W .!y Rd 
|.|=2 
1
Note that by choosing k(z) := 1B1(0)(z) in Eq. 15, 
bd 
with N = 1, we reobtain the well known classic and widely used Berman-Diggle estimator (Diggle, 1985; Berman and Diggle, 1989; van Lieshout, 2012) 
.(Br(x)) 
..,N 
(x)= .
. bdrd 
In order to show numerical results, let . be a 
Poisson point process in R2 with intensity function 22
..(x1,x2)= x1 + x2. We want to estimate the intensity function of . in the compact window W :=[-2,2]2, by means of ..,N (x) and ..,N (x), for a sample size 
.. 

N = 1000. 
Fig. 4a shows the .rst estimator, where we have adopted the kernel of Epanechnikov:  2 22 
. (1 - x1 - x2) , if (x1,x2) . B1(0)k(t)=
0 , otherwise 
and the optimal bandwidth at each point of estimation, that is 
6(x21 + x2 
o,AMSE 62)
r (x1,x2)= ;
N 
N. 
on the other hand, by Proposition 4, it is easy to see that the uniform optimal bandwidth at all points in W is 
 64
o,AMISE 6
r = 
N,WN. . 
We would like to compare the kernel estimation 
at (1.8,1.8), obtained by employing the optimal bandwidth at this point (...,,No (1.8,1.8)), and the corresponding estimation generated by using the uniform optimal bandwidth in W (...,,Nu (1.8,1.8)). At point (1.8,1.8) the theoretical value of the intensity function is ..(1.8,1.8)= 6.48, the optimal 
o,AMSE
bandwidth is r1000 = 0.4809 and the corresponding ..,N
kernel estimation is (1.8,1.8)= 6.4636
.,o (|..,N (1.8,1.8) - .(1.8,1.8)| = 0.0164); instead the 
.,o 
o,AMISE

uniform optimal bandwidth is r = 0.5226,
1000,W and the corresponding estimation ...,,Nu (1.8,1.8)= 
6.4350 (|..,N (1.8,1.8) - .(1.8,1.8)| = 0.045). Both 
.,u 
estimations are accurate, but the .rst one is better, since it employs the optimal bandwidth at the .xed point (1.8,1.8). 
Fig. 4b shows the natural estimator in the compact window W , generated by employing the theoretical optimal bandwidth at each point in the compact window: 
o,AMSE 62(x12 + x22)
r (x1,x2)= ;
N 
N. 
as in the previous case, it is easy to obtain the uniform optimal bandwidth: 
 16
o,AMISE 6
r = .
N,W 
3N. As before, we will analyze the behavior of 
..,N 
(1.8,1.8), obtained by employing the optimal 
. 
bandwidth in the point (1.8,1.8), and the same estimation generated by using the uniform optimal bandwidth in W . At the chosen point (1.8,1.8), 
o,AMSE
the optimal bandwidth is r1000 = 0.4005 and the ..,N
corresponding natural estimation (1.8,1.8)= 
.,o 

6.4821 (|..,N (1.8,1.8) - .(1.8,1.8)| = 0.0021); the 
.,o o,AMISE

uniform optimal bandwidth is r = 0.3454, and 
1000,W the corresponding estimation ...,,Nu (1.8,1.8)= 6.5133 (|..,N (1.8,1.8) - .(1.8,1.8)| = 0.0333). As before, 
.,u 

the .rst estimation is more accurate, since it employs the optimal bandwidth at the relevant point. 


(b) 


Fig. 4. Estimators for the intensity function of a Poisson point process, for N = 1000, for two different kernels. (a) shows ..,N (x1,x2), where k is the kernel 
. 

of Epanechnikov and we have used the optimal bandwidth ro,AMSE(x1,x2) at each point. (b) shows the 
N 
THE CASE OF STATIONARY .n 

Let .n be stationary; we assume here that, in the point process representation, . = {(xi,si)}i.N is an independent marking of the marginal process {xi}i.N, which is itself stationary, so that .(d(x,s)) = cdxQ(ds), i.e., f (x,s) . c, for any (x,s) . Rd × K. Thus, ..n (x) . cE[H n(Z)] for H d -a.e. x . Rd, and the optimal bandwidth rN associated with the proposed estimators will be independent of x as well. 
Optimal bandwidth for ..,N and ..,N 
.n .n 

We point out that in the stationary case a kernel type estimation would be irrelevant, since the intensity of the point process is constant; though we treat this case too in order to show the full compatibility of our approach with the standard one, in which we may just take global “means” in the observation window (see, e.g., Bene¡s and Rataj (2004) and the next paragraphs for further details). 
In (Camerlenghi et al., 2014) the following implications have been shown: 
– 	
(A1) . Bias(...n ,N (x)) = 0 for any bandwidth r > 0, and any sample size N; 

– 	
(A1) and (A3) . ...n ,N is strongly consistent for H d -a.e. x . Rd , as N › .. 


It is worth noting that whenever .n is a Boolean model such that E[(H n(Z))2] < ., and the kernel k is assumed to be continuous in the interior of its support, then ro,MSE =+. (Camerlenghi et al., 2014). 
..,N
The same conclusions hold for too, by 
.n 
1

choosing k(z) := 1B1(0)(z). This is in accordance 
bd 

with both intuition and known results in literature for the optimal bandwidth of the kernel estimators of the intensity of homogeneous Poisson point processes. In particular, if W is the observation window of any realization of a homogeneous Poisson point process . in Rd (and so N = 1), and |W | its volume, being 
..,1 (3) 1
(x)= bdrd .(Br(0)) for any x . Rd , we reobtain 

. 
that the best unbiased estimator of the intensity .. of . is given by (taking ro,MSE =+.) .. = .(W )/|W |, with |W |› .. 
In order to carry out numerical experiments for 

..,N 
in the stationary case, we have considered a

.n 
Boolean model of segments of the type [0,l] ×{0}with random length l ~ U(0,L), where L = 0.2, in the compact window W =[0,1]2. Furthermore, 
assume that the underlying Poisson point process has 
natural estimator ..,N (x1,x2); here we have used the 
. 	constant intensity f (x1,x2)= c > 0. It is obvious that 
optimal bandwidth ro,AMSE(x) at each point. 
N 	..1 = cL/2. 
Image Anal Stereol 2014;33:83-94 
Fig. 5 shows the natural estimator at the .xed point (0.5,0.5), for different values of the bandwidth r (in pixel). Since the optimal bandwidth is +., we expect that, as r grows to in.nity, the estimation improves, which is con.rmed: the estimator seems to stabilize after a certain value of r. By comparing Fig. 5a, where N = 10, and b, where N = 100, the estimation improves as the sample size grows to in.nity; in fact for N = 100 the value after which the estimator stabilizes is less than the corresponding value for N = 10. 





5.2.2 Optimal bandwidth for ..µn ,N 
We denote by .i(A) the i-th total curvature measure of any compact set A . Rd with positive reach, as introduced in Federer (1959), for i = 0,...,d - 1. 
Proposition 5 (Camerlenghi et al., 2014) Let .n be a Boolean model with intensity measure .(d(y,s)) = cdyQ(ds), satisfying Assumption (A1), and such that, for any s . K, reachZ(s) > R, for some R > 0. Let us assume also that E[.i(Z)] < . for all i = 0,...,n - 1. 
Then, the optimal bandwidth associated with ..µn ,N is given by 
. cE[H n(Z)]
3 N(.cE[.n-1(Z)] - 2(cE[H n(Z)])2)2 
if d - n = 1, 
o,AMSE . 

.

r := 
N 
(d - n)bd-ncE[H n(Z)]

d-n+2 
2N(cbd-n+1E[.n-1(Z)])2 
.

. if d - n > 1, 
(16) 
independent of x . Rd . 
To understand further the behavior of ..µn ,N in the stationary case, consider the Boolean model of segments of the previous section. It is easy to calculate the optimal bandwidth, that is: 
cEL
o,AMSE 3
r = . (17)
N 
N(c. - 2c2(EL)2))2 
Fig. 6 shows ..µn ,N (0.5,0.5) as a function of the 
bandwidth r (in pixel). In Fig. 6a (N = 10) observe o,AMSE
that for r = r the estimation approaches very
N 
well the theoretical value of the mean density, even if N is low (|.µ,10(0.5,0.5) - ..1 (0.5,0.5)| = 
.1 
2.72). In Fig. 6b (N = 100) the best value of the estimation is achieved when r is equal to the theoretical optimal bandwidth, furthermore as N grows 
o,AMSE
the estimation improves; in fact, for r = r100 , |.µ,100
(0.5,0.5) - ..1 (0.5,0.5)| = 1.5833. We can
.1 
conclude that the estimator is optimal for the choice o,AMSE 
r = r and the estimation improves as the
N 
dimension of the sample size diverges to in.nity. 
Finally, observe that as r › +., the estimator 1
decreases as 2r , which is the same conclusion we have reached for the inhomogeneous case. 



(b) 

Fig. 6. Comparison of the “Minkowski content”-based estimator and the theoretical value (..1 (0.5,0.5)= 30) at the point (0.5,0.5) for a homogeneous Boolean model of segments with intensity f (x1,x2)= 300. In (a) we have chosen N = 10; for ro,AMSE . 5pixel(0.016),
10 we have ..µ1 ,N (0.5,0.5)= 27.28. In (b) we have chosen o,AMSE
N = 100, for r100 . 3pixel(0.0074), we have .µ,N 
(0.5,0.5)= 28.4167.

.1 


CONCLUDING REMARKS 

Based on the numerical simulations we may now offer here some comparison about the computational advantages / disadvantages of the estimators proposed by the authors in Villa (2014), and Camerlenghi et al. (2014). 
From a purely computationally point of view, it emerges the “Minkowski content”-estimator as the most treatable, as one may easily realize by considering that for this estimator we just need to count relevant pixels of the random object (Eq. 6), while for the kernel estimator a, generally nontrivial, computation of integrals is required (Eq. 4). This is the main reason why we have reduced our numerical simulation to the sole point process case. 
The natural estimator, which is a particular case of the kernel estimator, seems to be computationally easier to handle; further for point processes the choice of the kernel does not seem to be of much in.uence. The stationary case has been extensively studied in the literature. It is worth noticing that the optimal bandwidth for a generic kernel estimator is in.nity, whenever .n is a stationary Boolean model, in accordance to well known results in the literature. In applied problems, an in.nite optimal bandwidth is equivalent to the choice of an observation window as large as possible. 
As far as the behaviour of the proposed estimators with respect to the choice of the bandwidth is concerned, we have in particular realized that the natural estimator results to be more stable; i.e., the “Minkowski content”-based estimators are quite sensitive to the choice of the bandwidth, while for kernel estimators it is only important that the bandwidth has the correct order of magnitude (Fig. 1­
2). 
For the time being we have not yet taken into account possible edge effects, which require further analysis. 
ACKNOWLEDGMENTS 

FC and EV are members of the Gruppo Nazionale per l’Analisi Matematica, la Probabilita`e le loro Applicazioni (GNAMPA) of the Istituto Nazionale di Alta Matematica (INdAM); VC wishes to acknowledge the continued collaboration with ADAMSS, the Milan Interdisciplinary Centre for Advanced Applied Mathematical and Statistical Sciences. 
It is our duty and a pleasure to acknowledge the anonymous referees for the accurate reading of the paper, and their valuable comments and suggestions which lead to an effective improvement of the presentation of our results. 
REFERENCES 

Ambrosio L , Capasso V, Villa E (2009). On the 
approximation of mean densities of random closed sets. 
Bernoulli 15:1222–42. 
Image Anal Stereol 2014;33:83-94 
Ambrosio L, Fusco N, Pallara D (2000). Functions of bounded variation and free discontinuity problems. Oxford: Clarendon Press. 
Baddeley A, Barany I, Schneider R, Weil W (2007). Stochastic geometry. Lect Not Math 1982. Berlin: Springer. 
Bene¡s V, Rataj J (2004). Stochastic geometry: selected topics. Dordrecht: Kluwer. 
Berman M, Diggle P (1989). Estimating weighted integrals of the second-order intensity of a spatial point process. J Roy Statist Soc B 51:81–92. 
Camerlenghi F, Capasso V, Villa E (2014). On the estimation of the mean density of random closed sets. J Multivariate Anal 125:65–88. 
Capasso V, Villa E (2006). On the continuity and absolute continuity of random closed sets. Stoch An Appl 24:381–97. 
Capasso V, Villa E (2007). On mean densities of inhomogeneous geometric processes arising in material science and medicine. Image Anal Stereol 26:23–36. 
Capasso V, Villa E (2008). On the geometric densities of random closed sets. Stoch Anal Appl 26:784–808. 
Daley DJ, Vere-Jones D (1988). An introduction to the theory of point processes. Probab Appl 2003. New York: Springer. 
David G, Semmes S (1997). Fractured fractals and broken dreams-self-similar geometry through metric and measure. London: Oxford Univ Press. 
Deheuvels P, Mason DM (2004). General asymptotic con.dence bands based on kernel-type function estimators. Stat Inference Stoch Process 7:225–77. 
Diggle PJ (1985). A kernel method for smoothing point process data. Appl Statist 34:138–47. 
Falconer KJ (1985). , The geometry of fractal sets. Cambridge: Cambridge University Press. 
Falconer KJ (2004). One-sided multifractal analysis and points of non-differentiability of devil’s staircases. Math Proc Cambridge Philos Soc 136:167–74. 
Federer H (1959). Curvature measures. Trans Amer Math Soc 93:418–91. 
Federer H (1969). Geometric measure theory. Berlin: Spriger. 
H¨W Smoothing techniques with
ardle (1991). 
implementation in S. New York: Springer-Verlag. 


Hug D, Last G, Weil W (2002). A survey on contact distributions. In: Mecke KR, Stoyan D, Eds. Morphology of condensed matter. Physics and geometry of spatially complex systems. Lect Not Phys 600:317–57. Berlin: Springer. 
Hug D, Last G, Weil W (2004). A local Steiner-type formula for general closed sets and applications. Math Z 
246:237–72. 
Karr AF (1986). Point processes and their statistical inference. New York: Marcel Dekker. 
Matheron G (1975). Random sets and integral geometry. New York: John Wiley & Sons. 
Parzen E (1962). On the estimation of a probability density function and the mode. Ann Math Statist 33:1065–76. 
Rosenblatt M (1956). Remarks on some nonparametric estimates of a density function. Ann Math Statist 27:832–7. 
Schucany WR (1989). Locally optimal window widths for kernel density estimation with large samples. Statist Probab Lett 7:401–5. 
Silverman BW (1986). Density estimation for statistics and data analysis. London: Chapman & Hall. 
Stoyan D, Kendall WS, Mecke J (1995). Stochastic geometry and its application. New York: John Wiley & Sons. 
van Lieshout MNM (2012). On estimation of the intensity function of a point process. Meth Comput Appl Probab 14:567-78. 
Villa E (2010). Mean densities and spherical contact distribution function of inhomogeneous Boolean models. Stoch An Appl 28:480–504. 
Villa E (2014) On the local approximation of mean densities of random closed sets. Bernoulli 20:1–27 
Wertz W (1978). Statistical density estimation: a survey. ¨ 
Angewandte Statistik und Okonometrie [Applied statistics and econometrics] 13 G¨ottingen: Vandenhoeck & Ruprecht. 


APPENDIX 
To .x the notation, . :=(.1,...,.d ) will be a multi-index of Nd 0; we will denote 
|.| := .1 + ··· + .d .!:= .1!···.d! .1 .d
y. := y ···y
1 d 
.|.| f (y,s)D. yf (y,s) := .1 .d ;.y1 ···.yd 
furthermore, for all s . K, we will denote 
D(.)(s) := disc(D. f (y,s)) , D(s) := disc( f (·,s)) .
y 
Furthermore, we list here the assumptions on . which have been adopted in the text. 
(A1) for any (y,s) . Rd × K, y + Z(s) is a countably H n-recti.able and compact subset of Rd, such that there exists a closed set .(s) . Z(s) such that 

IK H n(.(s))Q(ds) < . and (A3) for any (s,y,t) . K×Rd ×K, H n(disc(g(·,s,y,t))) = 0 and g(·,s,y,t) is locally bounded such that for 
n
H n(.(s) . Br(x)) . .r.x . Z(s), .r . (0,1) 
any compact K . Rd and a . Rd ,
(18) 
for some . > 0 independent of s; 


1(a-Z(t)).1 (y) sup g(x,s,y,t) . .a,K (s,y,t) (A1) as (A1), replacing (18) with x.K.diam(Z(s)) 
n
.rn . H n(.(s).Br(x)) . .yr.x . Z(s), r . (0,1) 

for some .a,K (s,y,t) with 
for some .,.y> 0 independent of s; 

1

(A2) for any s . K, H n(disc( f (·,s))) = 0 and f (·,s) is H n(.(s)).a,K (s,y,t)dyQ[2](ds,dt) < . . 
Rd ×K2
locally bounded such that for any compact K . Rd (19) 
sup f (x,s) . .yK(s) 

(A3) for any s,t . K , g(·,s,·,t) is locally bounded such 
x.K.diam(Z(s)) 
that, for any C,C . Rd compact sets: 

for some .yK (s) with 
sup sup g(x,s,y,t) . .C,C(s,t)

1 

H n(.(s)).yK (s)Q(ds) < . ; y.C.diamZ(t) x.C.diamZ(s) 
K 
for some .C,C(s,t) with

(A2bis) for any s . K, H n(D(.)(s)) = 0 and D. f (y,s) is locally bounded such that, for any 
y 1compact C . Rd , H n(.(s))H n(.(t)).C,C(s,t)Q[2](ds,dt) < . . 
K2 
(20)
sup |D. f (y,s)|. .y(.)(s)
yC 
y.C.diamZ(s) 


For a discussion on the above assumptions and on for some .y(.)(s) with how they simplify in certain particular cases, for 
C 

instance whenever .n is a Boolean model (i.e., . is1 H n(.(s)).y(.)(s)Q(ds) < . ; an independently marked Poisson point process), see K C Villa (2014, Sec. 3.1) and Camerlenghi et al. (2014).