Image Anal Stereol 2010;29:133-141 Original Research Paper INHOMOGENEITY IN SPATIAL COX POINT PROCESSES - LOCATION DEPENDENT THINNING IS NOT THE ONLY OPTION Michaela ProkesovaA Department of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, Charles University, Sokolovska 83, 18675 Praha, Czech Republic e-mail: prokesov@karlin.mff.cuni.cz (Accepted June 30, 2010) ABSTRACT In the literature on point processes the by far most popular option for introducing inhomogeneity into a point process model is the location dependent thinning (resulting in a second-order intensity-reweighted stationary point process). This produces a very tractable model and there are several fast estimation procedures available. Nevertheless, this model dilutes the interaction (or the geometrical structure) of the original homogeneous model in a special way. When concerning the Markov point processes several alternative inhomogeneous models were suggested and investigated in the literature. But it is not so for the Cox point processes, the canonical models for clustered point patterns. In the contribution we discuss several other options how to define inhomogeneous Cox point process models that result in point patterns with different types of geometric structure. We further investigate the possible parameter estimation procedures for such models. Keywords: Cox processes, inhomogeneity, moment estimation, parameter estimation, spatial point processes. INTRODUCTION In this paper we want to discuss the problem of introducing inhomogeneity into spatial Cox point processes. In contrast with the Markov point processes, where several models with different types of inhomogeneities (or different mechanisms how to introduce inhomogeneity into the homogeneous model) were introduced (see, e.g., Jensen and Nielsen, 2001 for the overview), looking through the papers about Cox point processes we see that the only used type of inhomogeneous model is the location dependent thinning (resulting in the so-called second order intensity reweighted point process, Baddeley et a^l., 2000). There are two reasons for this situation. The first is the simplicity and tractability of moment estimation procedures for the parameters of the location dependent thinning model whereas for other kind of inhomogeneities it is not clear what to do since the models are highly "nonlinear" in nature. We will show in the sequel that certain approximations can deal with this problem and it is possible to derive reasonable parameter estimators also for the more complicated inhomogeneities. The second reason is that by Markov point processes with strong negative interactions (e.g., hardcore interaction) it is clearly visible that different types of inhomogeneities produce point patterns with very different geometrical structure. Which means that for fitting the data with such strong interactions (like, e.g^., cell tissues or advanced materials like sinter filters with gradient structure - see, e.g., Hahn et al., 2003 for examples) it is necessary to have a correct inhomogeneous model. On the other side, the type of data usually modelled by Cox point processes (like different plant communities in ecology) do not usually exhibit interactions strong enough to make the misfit we get by fitting the location dependent thinning type inhomogeneity obvious at first sight. Thus the misfit is pragmatically ignored in the applications. Let us start by explaining what we mean by different geometrical structure. In Fig. 1, we see examples of realizations of four different inhomogeneous hardcore point processes with the same intensity function. . • • • • • • • • ••• t ••••• : : Fig. 1. Examples of different inhomogeneous Markov point processes w^ith the same intensity function. For details see the text. The inhomogeneity was introduced into the model by (from left to right) an inhomogeneous firstorder potential (Ogata and Tanemura, 1986), location dependent thinning (Baddeley et al, 2000), a 1-1 spatial transformation of the homogeneous point pattern (Nielsen and Jensen, 2004) and by location dependent scaling (Hahn et al, 2003). When examining the point pattern produced by location dependent thinning we see that this model dilutes the interaction (or the geometrical structure) of the original homogeneous model in a special way - the repulsive interactions so obvious in the area with high intensity are hardly recognizable in the area with low intensity - the point pattern there looks almost like a Poisson point process. All the other models preserve more interactions in the area with low intensity, i.e., more from the geometrical structure of the homogeneous version of the hard-core model. The other extreme is the point pattern produced by location dependent scaling which keeps the same geometrical structure in any part of the observation window independently of the intensity fiinction. The fact, that the situation is similar also for the Cox point processes is illustrated in Fig. 2 where we can see realizations of several shot-noise Cox processes with the same first order intensity function. All three processes were derived from the same homogeneous Thomas process but different types of inhomogeneity were used. Here we can see the location dependent thinning type inhomogeneity in the middle panel and again we observe the dilution of the clusters in the area with low intensity. The point pattern in the low intensity area looks not like a cluster process but almost like a Poisson process. The other two point patterns are obviously different - the left one having the same clusters but distributed inhomogeneously in the space, the right one showing a location dependent scaling type of behaviour where the point patterns in areas with different intensity look like scaled copies of the same homogeneous template process. The clustered nature of the point pattern is preserved also in the areas of low intensity. We will discuss the exact models from which the point patterns were generated in detail later in the paper. Nevertheless the differences between the different inhomogeneities become less pronounced when the interactions are weaker {i.e., the clustering is weaker). Fig. 3 shows what happens if we double the scale of the clusters (of the homogeneous template) and keep the rest of the model parameters the same as in Fig. 2. Here we still can see the differences in the geometrical structure in the areas of low intensity but the difference is not as obvious as in Fig. 2. r- ■^■•8.-WW' " * t t -4- sv. ft- a- 4 y %: ^ ■Jf a' t' Fig. 2. Examples of different inhomogeneous shot-noise Cox point processes with strong interactions and the same intensity function. For details see the text. r. •<.■:. i.' ■■■>:■. nisi-i '»"■A-., 'ä'- Fig. 3. Examples of different inhomogeneous shot-noise Cox point processes with moderate interactions and the same intensity function. For details see the text. Concerning the estimation of the models we already said that the reason, why the location dependent thinning models are so popular, is the simplicity of moment estimation in such models. The higher order intensity functions ... k >2 are namely equal to the product k ... ,Xk) = PH{XI,. . .,Xk)Ylp{xi) , (1) i=l of the l-th order intensity function of some homogeneous point process and the product of the first order intensity function of the thinned process p{xj) = for the points Xj from the point configuration X = {xi,...xk). And this fact makes it possible to estimate first the inhomogeneity - i.e., the (first order) intensity function p, and then conditionally on this estimate to proceed with the estimation of the interaction parameters from the higher order intensity functions in the same way as in the homogeneous case. However this strategy is not applicable to the other types of inhomogeneities. There the inhomogeneity enters the model in a more "nonlinear" way, thus we do not have the product structure for the higher order intensity functions. But we can use other properties of the models for a 2-step estimation of the inhomogeneity parameters and subsequently the interaction parameters. For example in the transformation model we can estimate the transformation from the intensity function, transform the point pattern back by the estimated transformation and then estimate the interaction parameters like in the homogeneous case (Nielsen and Jensen, 2004). Or for the locally scaled model we can estimate the local scaling function from the intensity function and then conditionally on this estimate of the inhomogeneity parameters (parametrizing the local scaling function) use the pseudolikelihood methods to estimate the interaction parameters of the model (Prokesova et al, 2006). Inspired by some of these ideas we derive two step moment estimation procedures for several kinds of inhomogeneous Cox processes. PRELIMINARIES We first recall the basic notions and introduce our notation. For more detailed information see the standard references Daley and Vere-Jones (1988) and Stoyan etal. (1995). Let C R'^ and Xhe ä point process on . For a Borel set A in R'^ will denote the volume of A and \Xri A\ the number of points X has in A. For an i? > 0 we will denote B{0,R) the ball centered in the origin with radius R. For any given point u G R'^ let du be the infinitesimal region that contains the point u. Following Diggle (2003) we can define the (first-order) intensity function p of A'by p{u) = hm ' I \ (2) i.e., the mean number of points from X occurring in du; and the second-order intensity function v) by v)= lim fE\{Xndu){Xndv)\^^ \du\\dii (3) When X is simple (it does not have multiple points) then the intuitive meaning of is that p''^Xu,v)\du\\dv\ is the approximate probability that du and d v each contain a point from X, where u ^ V. Higher order intensity functions are defined analogically. If X is stationary {i.e., its distribution is invariant with respect to the simultaneous shifts of all the points in X), then p{u) = p = const, and (4) Thus all the k-ih order intensity functions can be reduced to equivalent functions of only - 1) arguments. In the stationary case two important summary statistics, which are often used in the applications, are defined by means of the second order intensity function p(^). The so-called pair correlation function (sometimes called simply the ^-function) is defined by p(')iu,v) (5) and because of the reducibility (Eq. 4) of p(^) it is equivalent to a function of one argument [u,v) =g{u-v), U,ve (6) The fact that the correlation function of a stationary point process depends only on the difference of the locations uand I'and not on the locations themselves is an important property of the stationary point processes. In the sequel we will denote the ^-function of a stationary process as a function of just one argument, as is usual in the literature. For the Poisson point process {i.e., a process with no interactions) the g-flinction is identically equal to 1. ^-function values larger than 1 indicate positive correlation between the occurrence of the points of the point process in the locations u, v, smaller values than 1 indicate negative correlations. The second important statistic is the i^-function defined by K(r)= f pP)(0,ü)dü/p2 J||u||0, ,u\\ 0. Thus it corresponds to a superposition of clusters [jwe(i>'^w where O' is a stationary Poisson process with intensity ß (mother intensity) and each mother w generates a Poisson cluster X„ of daughter points with mean number of daughter points in the cluster equal to V and probability distribution Sanction of the location of the daughter points relative to its parent is a bivariate radially symmetric normal distribution. We can easily derive that p{u) = ßv and /XV 1 + V exp V \u— I'll 4C72 // (14) LOCATION DEPENDENT THINNING Now we review the most popular type of inhomogeneity in spatial point processes and the moment estimation methods available for this type of (parametric) models. Let C R'^ and X he a homogeneous point process on (by homogeneous we mean stationary - for = R'^, or a restriction of a stationary point process to Jf C R"') and let p : Jf ^ [0,1] be a Sanction on S". Let F be the point process Y={xeX-.R{x)0 J\\u\\ 2. Let us now suppose that the point process model is parametrized by a vector parameter y/ consisting of two subvectors y/ = {ß, 6), where ß parametrizes the thinning function Pß{x) (inhomogeneity parameter) and 6 parametrizes the original homogeneous point process Xe with fixed constant intensity. Example. Inhomogeneous Thomas process with log-linear intensity function (Waagepetersen, 2007): Let C Xq be a Thomas process with mother intensity ß, mean number of daughter points in a cluster equal to v and k the bivariate normal kernel with scale parameter o. And let z{u) G R^ be a vector of covariates that are recorded in the location u G and let us define the retention probability where M= max^g^r (exp(z(u)j8^). Then p^^\u,v)=p(u)p(v)g{u-v) , (19) (20) (21) and gi( u— v) = ge( u — v) is equal to the ^-fianction of the homogeneous Thomas process 1 1 exp u — 4C72 (22) After the reparametrization ßo = log(ßv/Af) we have p(u)=pß(u)=exp{ß,>+z(u)ß^} . (23) Our inhomogeneity parameter is then ß = (jSq , • • • jSjt) and interaction parameter 6 = {ß,o). The estimation in such models then proceeds as follows: we estimate the inhomogeneity parameter ß from the first order moment properties i.e., by maximization of the Bernoulli composite likelihood Lb derived from the first order intensity function LB{ß)=Qw(-[ Pß{u)du) n P/3«- (24) \ J^ J Arem^r If there is a unique ß which maximizes Lb we can get it as the solution of the estimating equation I xeYnSC d{logpß{x)) f d{logpß{x)) dß dß pß{u)du=0. (25) This is an unbiased estimating equation for processes of location dependent thinning type. Note that Lb is actually equal to the likelihood of an inhomogeneous Poisson process with intensity function Pß{u). Thus the estimate of the first-order parameter ß is obtained by ignoring the interactions in the point process Y. In the sequel we will denote p = pß. Asymptotic properties of the estimator ß were investigated in Waagepetersen (2007) and Waagepetersen and Guan (2009). In the second step we estimate the interaction parameter 6 conditionally on ß = ß. There are several options, how to do this. The oldest one is the minimum contrast estimation for the gi or Kj function (Diggle, 2003; Waagepetersen and Guan, 2009). Or we can use the composite likelihood approach (Guan, 2006) and find 0 as the argument ofmaxima for the composite (log) likelihood fianction logCL(e) = S x^yeYnSr,\\x-y\\ 0 is a user specified tuning constant. Another version of the composite log likelihood fianction may be defined by logCL'{9)= X log{p{x)p{y)ge{y-x))- x^yeYnSr ,\\x-y\\ 0 is a tuning constant. The maximum Palm likelihood estimate of 6 is the value which maximizes logZ,p = log Z,p(6). Further details and a simulation study comparing the methods introduced above can be found in (Prokesova and Jensen, in preparation). Example. For the inhomogeneous Thomas process with log-linear intensity function the (vector) first order estimation equation (Eq. 25) becomes simply X il,ziu))exp{ßo + ziu)ß^)du. xeYnär (26) The implementation of the composite or Palm likelihood estimation of 6 = {ß,o) is then straightforward when using the closed form of the ^-function (Eq. 22). The last parameter v is obtained from ßo = log(/xv/M) and the value M = INTRODUCING INHOMOGENEITY INTO COX PROCESSES Let us start again with the log Gaussian Cox processes. We said already that their distribution is completely determined by the mean value field and the covariance function of the corresponding Gaussian field or equivalently it is completely determined by the first and second order intensity function (since we have the 1-1 correspondence given by the Eqs. 8 and 9). This means that as far as the covariance function c( u, v) is translation invariant, the (inhomogeneous) log Gaussian Cox process X is of the location dependent thinning type. For the case of the log-linear intensity function dependent on the covariates z{u) we can reparametrize A(ü)=exp(z(ü)j8^ + Y(ü)) , (27) with = -c(0,0)/2 which means £'exp(*F(u)) = 1 is now a stationary Gaussian field) and having the covariance function parametrized by 0 we can apply the estimation procedures from the preceding chapter Since in the literature there are not any cases with covariance function which is not translation invariant we can conclude that all practically usable log Gaussian Cox processes are of the location dependent thinning type. We would like to remark here, that the main advantage of the log Gaussian Cox processes, i.e., the simplicity of the specification of the model, changes to a disadvantage from the viewpoint of the modelling of point patterns with diverse geometrical structure. Namely simple specification by only ß and c corresponds to the full determination of the model by only p(^) and It follows that all the higher (> 3) order product intensity functions are determined by the first and second order intensity functions and consequently the possible geometrical structure of the resulting point patterns is highly restricted. Thus, e.g., for modelling the first and third point pattern from Fig. 2 log Gaussian Cox processes are not a good choice. Here one should use the shot noise Cox processes which are more flexible. We can introduce inhomogeneity in the shot noise Cox process in several ways. Let us suppose that /ß : Š" ^ R+ is a function bounded away from zero and infinity on S", parametrized by the inhomogeneity parameter ß and k and ^ are a translation invariant kernel and an intensity measure on (0,°°) x invariant with respect to shifts in the spatial coordinate A-G jr. We can define a model with a new kernel k'ß{w,u) = fß{u)k{w,u) , (28) (thus the kernel k! no longer integrates to 1, nevertheless it is possible to reparametrize this model in such a way, that we still have k! a probability kernel, but the formulas are more complicated (Hellmund et al, 2008). Then we obtain a point process of the location dependent thinning type with p{u) = fß{ii) and a well defined inhomogeneous ^-function equal to the ^-function of the homogeneous model and we can use the estimation procedures from the preceding chapter. An example of such a point process can be seen in the middle panel of Fig. 2. This is a realization of the inhomogeneous Thomas process model from the preceding chapter with ß = {ßo,ßi), z{u) = U2 and fß = exp(j8i U2). We see clearly how the strength of the clustering disappears in the areas with low intensity since the clusters are gradually thinned out even to individual points in the bottom part of the panel. Such a point pattern would correspond, e.g., to a situation when we want to model a biological population of seedlings, produced by a homogeneous population of older plants, where the seedlings were thinned by different ecological conditions at their respective location. A completely different point pattern can be seen in the first panel of Fig. 2 - here we have the same clusters (same size of clusters and number of points in the clusters) but their number (=intensity) in different parts of the observation window is different. Keeping to our plant community example this would correspond to a situation, where we have homogeneous outer ecological conditions but the sources of seedlings (older plants) are unevenly distributed in space. This point process was derived from the homogeneous one by setting C'(dr,diy) = /^(iy)C(dr,diy), (29) and keeping the same kernel k. Concerning the question of parameter estimation of the model (Eq. 29) by moment methods we can plug in into Eqs. 11 and 12 p{u) =11 rk{w, u) fß{w) C(dr,diy) , (30) p^^\u,v)=p{u)p{v) (31) w,u)k{w,v) fß{w) ^(dr,div) , and see that the separation of the inhomogeneity parameters is no longer possible. Nevertheless we can still accept the pragmatic attitude and under the assumption that the scale of change of the inhomogeneity function is larger than the size of the clusters {i.e., the scale/practical range of the kernel function k) we can use the approximation Pß,e{u) - fß{u) JIrke{w, u) l;,e{dr,dw) , (32) to obtain from the estimation (Eq. 25) a (biased, but under the above mentioned assumption reasonable) estimate ß of the inhomogeneity parameter ß and subsequently with the approximation + JJi^keiw,u)keiw,v)Udr,dw), we can use the composite or Palm likelihood to obtain the estimate of the interaction parameter 6. Example. Let us consider the inhomogeneous Thomas process with ^(dr,dw) = fß{w)ß5y{dr)dw, i.e., the mothers are distributed inhomogeneously according to the intensity function fßß. Then by the approximation (Eq. 32) we get Pß,e{u) -- - const, (33) and by choosing, e.g., fß{u) = exp{z{u}ß^)/Mwe see that we obtain the same estimate of ß like for the log-linear inhomogeneous Thomas process of location dependent thinning type from the previous chapter. Nevertheless for the second order intensity function we get the approximation fßCi 4(T2 \ fß{u)fß{v) , (34) / thus the obtained estimate of 6 by either the composite or Palm likelihood method will be influenced by the weightmg factor of fß{u)fß{v)). When constructing the inhomogeneous models it is also possible to include the inhomogeneity function fß into several ingredients of the shot noise Cox process specification, thus obtaining point patterns witii more complicated geometrical structure. Our last example is the point pattern in the right hand panel of Fig. 2. We can notice a certain resemblance with the right hand panel in Fig. 1 - the locally scaled point pattern. Here also the different parts of the figure look like a scaled version of the same clustered point pattern, where the intensity is high, the clusters are more compact and where the intensity is low, they are more loose. This type of inhomogeneous shot noise Cox process was derived from the homogeneous model by defining the kernel k k!{w, u) = kg w V fßi^)' fßi^)J fi^) 1 (35) (note that k' is again a probability density in u) and the intensity measure of O' by C'(dr,diy) = C9(dr,dw)/fß(w)'. (36) Here d is the dimension of the space S", i.e., c/ = 2 for the planar case and fß has the meaning of the scaling function - i.e., the point pattern looks at different places like a scaled version of some homogeneous template point process scaled with the factor fß{u). Concerning the parameter estimation we would suggest to proceed analogically to the Markov point process case (Prokešova et al, 2006), i.e., with a two step estimation procedure, first estimating the function fß from the Poisson likelihood (Eq. 24) using the approximation Pßiu)^jr,JJrke{w, u) C0(dr,diy) , (37) and then instead of the pseudolikelihood (used in the Markov point process case) use the composite likelihood or Palm likelihood with the approximation for the second order intensity function 1 fß{uYfß{vY / JI r^e(w;u)Ce(dr,dw) \ 2 / + h'ke W ■ I\H Vi V ^ß^ 2 )J V ^ß^ 2 )J / pßi")pßiy)gx ke Ce(dr,dw) \ ^ftm' ftmj where gx denotes the homogeneous version of the g-flinction for the homogeneous shot noise Cox process with kernel Rq and intensity measure i^e. CONCLUSIONS In this paper we discussed the problem of defining an inhomogeneous model for clustered point process data, which possess a specific geometrical structure which makes the modelling by the locally thinned point process models inappropriate. The disadvantage of such models is a complex structure of the intensity functions which makes the parameter estimation by moment methods (be it minimum contrast of K-flinction, Poisson likelihood, composite likelihood or Palm likelihood estimation) intractable without accepting certain approximations which lead to biased estimates. But parameter estimation in these models is possible, which was illustrated by the methods in this paper. Nevertheless the suggested methods and their properties deserve further investigation. ACKNOWLEDGEMENTS This work was supported by projects GACR 201/08/P100 from the Czech Science Foundation and MSM0021620839 financed by the Ministry of Education of the Czech Republic. REFERENCES Baddeley AJ, Meiler J, Waagepetersen R (2000). Non-and semiparametric estimation of interaction in inhomogeneous point patterns. Stat Neerl 54:329-50. Daley DJ, Vere-Jones D (1988). An introduction to the theoiy of point processes. New York: Springer Verlag. Diggle P (2003). Statistical analysis of spatial point patterns. New York: Oxford University Press. Guan Y (2006). A composite likelihood approach in fitting spatial point process models. J Am Stat Assoc 101:1502-12. Halm U, Jensen EBV, van Lieshout MNM, Nielsen LS (2003). Inhomogeneous spatial point processes by location-dependent scaling. Adv Appl Probab 35:31936. Hellmund G, Prokešova M, Jensen EBV (2008). Levy-based Cox point processes. Adv Appl Probab 40:603-29. Jensen EBV, Nielsen LSN (2001). A review on inhomogeneous point processes. In: Basawa 1, Heyde CC, Taylor RL, eds. Select Proc Symp Infer Stoch Proces. Beachwood, Ohio: Institute of Mathematical Statistics, 297-318. Moller J (2003). Shot noise Cox processes. Adv Appl Probab 35:614-40. Moller J, Syversveen AR, Waagepetersen RP (1998). Log Gaussian Cox processes. Scand J Statist 25:451-82. Nielsen LS, Jensen EBV (2004). Statistical inference for tiansformation inhomogeneous point processes. Scand J Statist 31:13 l-f2. Ogata Y, Tanemura M (1986). Likelihood estimation of interaction potentials and external fields of inhomogeneous spatial point pattems. In: Francis IS, Manly BFJ, Lam FC, eds. Proc Pacific Statist Congr - 1985. Amsterdam: Elsevier, 150^. Prokešova M, Hahn U, Jensen EBV (2006). Statistics for locally scaled point processes. In: Baddeley A, Gregori P, Mateu J, Stoica R, Stoyan D, eds. Case Studies in Spatial Point Process Modelling. New York: Springer Verlag, 99-123. Stoyan D, Kendall WS, Mecke J (1995). Stochastic geometiy and its applications. 2nd ed. Chichester: Wiley. TanakaU, Ogata Y, Stoyan D. (2007). Parameter estimation and model selection for Neyman-Scott point processes. Biometrical J49:l-f5. Thomas M (1949). A generalization of Poisson's binomial processes. Biometrics 63:252-258. limit for use in ecology. Biometrika 36:18-25. Waagepetersen RP, Guan Y (2009). Two-step estimation for Waagepetersen RP (2007). An estimating function approach inhomogeneous spatial point processes. J Roy Stat Soc to inference for inhomogeneous Neyman-Scott B 71:685-702.