Image Anal Stereol 2000;19:199-204
Original Research Paper

ACCURACY OF ESTIMATES OF VOLUME FRACTION

Joanne Chia and Adrian Baddeley
Department of Mathematics and Statistics, University of Western Australia, Nedlands, W.A. 6907, Perth, Australia
e-mail: chia@maths.uwa.edu.au
(Accepted August 3, 2000)

ABSTRACT
When a volume fraction $V_V$ is estimated from the point count fraction $P_P$ using Delesse's principle $V_V = P_P$, the basic theory of stereology gives very little information on the accuracy of the estimator. Existing methods for quantifying the variability of $P_P$ are mainly large-sample approximations, such as Cochran's formula for the variance of a ratio. Cruz-Orive proposed an alternative method, but it requires statistical assumptions on the point counts $P$ that do not hold in general. We introduce two alternative methods for quantifying the variability of $P_P$: the bootstrap method and explicit statistical modelling of the bivariate distribution of the counts. The bootstrap method requires few statistical assumptions about the point counts but needs a large sample size. The explicit modelling method does make assumptions, but they can be checked directly from the data. To explore this approach, we propose a statistical model, the Type I Bivariate Binomial (BVB) distribution, for the pairs of count data $(P(\mathrm{ref}), P(\mathrm{obj}))$. We show how to fit the BVB model to the data and how to assess its goodness of fit, and we derive a formula for the variance of $P_P$ under the BVB model. The three approaches are compared on nine example data sets taken from macroscopic sections of the cerebral hemispheres of selected domesticated animals. The BVB model appears to fit these data sets well. Implications for stereological estimation are discussed.

Keywords: bootstrap method, delta method, Monte Carlo, stereology, type I bivariate binomial distribution, volume fraction.
INTRODUCTION
Many parameters of interest in stereology are ratios of geometrical quantities, such as the volume fraction $V_V = V(\mathrm{obj})/V(\mathrm{ref})$ of a phase of interest, obj, within the three-dimensional reference space, ref. Stereological theory (Weibel, 1980; Baddeley, 1991) shows how to estimate $V(\mathrm{obj})$ and $V(\mathrm{ref})$ by unbiased estimators based on test point counts $P(\mathrm{obj})$ and $P(\mathrm{ref})$, such as the Cavalieri estimators
$$\widehat{V}(\mathrm{obj}) = t \cdot a \cdot \sum P(\mathrm{obj}), \qquad \widehat{V}(\mathrm{ref}) = t \cdot a \cdot \sum P(\mathrm{ref}),$$
where $a$ is the area per test point and $t$ is the distance between sections. Usually, $V_V$ is estimated by the ratio of these estimators,
$$\widehat{V}_V = \frac{\widehat{V}(\mathrm{obj})}{\widehat{V}(\mathrm{ref})} = \frac{\sum P(\mathrm{obj})}{\sum P(\mathrm{ref})}.$$
The important question for us is the accuracy of this estimator. Stereological theory (Weibel, 1980) provides little information about the variance of the individual estimators $\widehat{V}(\mathrm{obj})$ and $\widehat{V}(\mathrm{ref})$, or of their ratio.

In a more general context, suppose we are interested in the parameter
$$\theta = \frac{E(Y)}{E(X)} \qquad (1)$$
where $X$ and $Y$ are geometrical measurements, such as length, area or the number of test points falling within a planar section through an object. The parameter $\theta$ is usually estimated by the ratio estimator
$$\hat\theta = \frac{\bar{y}}{\bar{x}} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}. \qquad (2)$$
Our aim is to estimate the variance $\mathrm{Var}(\hat\theta)$ of this estimator. The usual estimator of $\theta$ in stereology is the ratio estimator (2), which does not require any specific model assumptions. Cruz-Orive (1980) proposed a semi-parametric approach to estimating $\theta$; in this paper we use a parametric approach. We consider bivariate count data $(X, Y)$ in the situation where $n$ integer-valued bivariate observations $(x_1, y_1), \ldots, (x_n, y_n)$ are obtained from $n$ different images or sections of the material under study. These observations are assumed to be independent and identically distributed.
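The Cavalieri estimators and the ratio estimator (2) can be illustrated with a minimal Python sketch. The function names `cavalieri_volume` and `ratio_estimate` and the toy counts are hypothetical and are not taken from the study; the sketch assumes the per-section counts are stored as lists of equal length, and it shows that $t$ and $a$ cancel in the ratio, so only the counts matter.

```python
from typing import Sequence


def cavalieri_volume(t: float, a: float, counts: Sequence[int]) -> float:
    """Cavalieri estimate t * a * sum(P) from point counts on parallel sections.

    t: distance between sections; a: area associated with each test point.
    """
    return t * a * sum(counts)


def ratio_estimate(x: Sequence[int], y: Sequence[int]) -> float:
    """Ratio estimator theta_hat = sum(y) / sum(x), as in equation (2).

    x: points hitting the reference space per section (P(ref));
    y: points hitting the phase of interest per section (P(obj)).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    total_x = sum(x)
    if total_x == 0:
        raise ZeroDivisionError("no test points hit the reference space")
    return sum(y) / total_x


# Hypothetical counts from three sections.
x = [10, 12, 8]
y = [5, 6, 4]
print(ratio_estimate(x, y))
print(cavalieri_volume(2.0, 0.5, y) / cavalieri_volume(2.0, 0.5, x))
```

Both printed values are the same number: the common factor $t \cdot a$ divides out, which is why $\widehat{V}_V$ depends only on the point counts.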
EXAMPLE DATA SETS
The data were extracted from a comparative study of neuroanatomy by Mayhew et al. (1990) and were kindly provided by Professor Mayhew. The samples were taken from the cerebral hemispheres of nine selected domesticated animals: three horses, one dog, one pig and four rabbits. A test system consisting of a grid of test points and a lattice of cycloid arcs was used to record the quantities of interest for the study. The volume fraction of cortex in the whole brain was estimated by $\widehat{V}_V(\mathrm{cort/ref}) = \sum P(\mathrm{cort})/\sum P(\mathrm{ref})$, where "cort" denotes cortex. In this paper we denote by $X$ the number of test points falling on the whole slice, $P(\mathrm{ref})$, and by $Y$ the number of test points falling on the cortex alone, $P(\mathrm{cort})$.

VARIANCE ESTIMATION
We consider three approaches to estimating the variance of $\hat\theta = \bar{y}/\bar{x}$: non-parametric, semi-parametric and parametric.

Bootstrap method
A non-parametric bootstrap (Efron, 1982) was applied to each of the nine data sets. For each data set, 1000 bootstrap samples of the same size as the original sample were drawn with replacement from the observed pairs $(x_i, y_i)$, and $\hat\theta = \bar{y}/\bar{x}$ was computed for each. The variance of $\hat\theta$ was estimated by the sample variance of these bootstrap replications.

Delta/Cochran method
The delta method (Lehmann, 1999, p. 85) assumes that the sample means $(\bar{x}, \bar{y})$ are asymptotically jointly normally distributed and approximates the variance of $\hat\theta$ by linearising the function $f(x, y) = y/x$ around $(E(X), E(Y))$. This yields the variance estimate
$$\mathrm{Var}(\hat\theta) \approx \frac{1}{n}\left(\frac{\bar{y}}{\bar{x}}\right)^2 \left[\frac{s_x^2}{\bar{x}^2} + \frac{s_y^2}{\bar{y}^2} - \frac{2 s_{xy}}{\bar{x}\,\bar{y}}\right],$$
where $s_x^2$, $s_y^2$ and $s_{xy}$ denote the sample variance of $x$, the sample variance of $y$ and the sample covariance of $x$ and $y$, respectively. This estimate of the variance of $\hat\theta$ is equivalent to the one given by Cochran (1977).

Cruz-Orive method
The semi-parametric approach we used to estimate the variance of $\hat\theta$ is the one proposed by Cruz-Orive (1980).
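The bootstrap and delta/Cochran procedures described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: it assumes the counts are held in two equal-length lists `x` (points on ref) and `y` (points on cortex), and the function names and defaults (`b = 1000` replications, a fixed `seed`) are illustrative assumptions. Both functions return standard errors, the quantity reported in Table 1. The delta estimate is written in the algebraically equivalent form $(s_y^2 - 2\hat\theta s_{xy} + \hat\theta^2 s_x^2)/(n\bar{x}^2)$.

```python
import random
import statistics
from typing import Sequence


def ratio(x: Sequence[float], y: Sequence[float]) -> float:
    """Ratio estimator theta_hat = sum(y) / sum(x)."""
    return sum(y) / sum(x)


def bootstrap_se(x: Sequence[float], y: Sequence[float],
                 b: int = 1000, seed: int = 0) -> float:
    """Nonparametric bootstrap standard error of theta_hat.

    Each replicate resamples n pairs (x_i, y_i) with replacement, keeping
    x_i and y_i together so that their correlation is preserved.
    """
    rng = random.Random(seed)
    n = len(x)
    reps = []
    for _ in range(b):
        idx = [rng.randrange(n) for _ in range(n)]
        reps.append(ratio([x[i] for i in idx], [y[i] for i in idx]))
    # Sample standard deviation of the bootstrap replications.
    return statistics.stdev(reps)


def delta_se(x: Sequence[float], y: Sequence[float]) -> float:
    """Delta-method (Cochran) standard error of theta_hat = ybar / xbar."""
    n = len(x)
    xbar = statistics.fmean(x)
    ybar = statistics.fmean(y)
    r = ybar / xbar
    sx2 = statistics.variance(x)  # sample variances with divisor n - 1
    sy2 = statistics.variance(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    var = (sy2 - 2 * r * sxy + r * r * sx2) / (n * xbar * xbar)
    # Guard against a tiny negative value caused by floating-point rounding.
    return max(var, 0.0) ** 0.5


# Hypothetical counts for six sections.
x = [20, 25, 18, 30, 22, 27]
y = [9, 14, 8, 15, 12, 13]
print(bootstrap_se(x, y), delta_se(x, y))
```

Resampling whole pairs, rather than $x$ and $y$ separately, is what makes the bootstrap respect the strong positive correlation between $P(\mathrm{ref})$ and $P(\mathrm{cort})$; the fixed seed only makes the illustration reproducible.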
This method is based on a conditional regression model which assumes a linear regression of $y$ on $x$, with the variance of $y$ given $x$ proportional to $x$. Under this model,
$$\mathrm{Var}(\hat\theta) \approx \frac{1}{(n-1)\sum_{i=1}^{n} x_i}\left(\sum_{i=1}^{n} \frac{y_i^2}{x_i} - \hat\theta^2 \sum_{i=1}^{n} x_i\right). \qquad (3)$$

Parametric method
For the parametric approach, we assume that the Type I Bivariate Binomial (BVB) distribution is an appropriate model for the data. Details of the BVB model are given in a later section. We fitted this model to the data and used the variance of $\hat\theta$ predicted by the fitted model. The following expression for the variance of $\hat\theta$ was derived from the BVB model, where $P_A$ and $P_X$ are parameters of the distribution (see below), $n$ is the number of sections or fields and $m$ is the number of test points in the test system:
$$\mathrm{Var}(\hat\theta) \approx \frac{(P_A/P_X)\,(1 - P_A/P_X)}{(1 - 2/nm)\left[nm\,P_X - (1 - P_X)\right]}. \qquad (4)$$

RESULTS
We applied the four variance estimation methods described above to the nine example data sets. The resulting estimates of the standard error of $\hat\theta$ are shown in Table 1 and Fig. 1. The bootstrap method, the delta/Cochran method and Cruz-Orive's formula give results that roughly agree with each other for all the data sets, but they disagree with the estimates obtained from the BVB model, particularly for the rabbit data sets R2, R3 and R4, which have small sample sizes. Furthermore, Fig. 1 shows that for data sets R2, R3 and R4 the bootstrap, delta/Cochran and Cruz-Orive methods produce estimates that fluctuate more than those from the BVB model.

A simulation study was then carried out to investigate the variance of the estimates of $\mathrm{Var}(\hat\theta)$ obtained by the four techniques, assuming that the BVB model is true. For each data set, we simulated 1000 data sets of the same size from the BVB model, using the maximum likelihood estimates of the parameters. The four methods of estimating $\mathrm{Var}(\hat\theta)$ were then applied to the simulated data, and the sample variance/covariance matrix of the four estimates was computed.
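Cruz-Orive's variance estimate (3) is straightforward to compute. The Python sketch below is a minimal illustration with a hypothetical function name, assuming every $x_i > 0$ (each section hits the reference space), since (3) divides by $x_i$. It evaluates the bracket in (3) as the weighted residual sum of squares $\sum_i (y_i - \hat\theta x_i)^2 / x_i$, which equals $\sum_i y_i^2/x_i - \hat\theta^2 \sum_i x_i$ because $\sum_i y_i = \hat\theta \sum_i x_i$.

```python
from typing import Sequence


def cruz_orive_var(x: Sequence[float], y: Sequence[float]) -> float:
    """Variance estimate (3) for theta_hat = sum(y) / sum(x).

    Assumes the regression model E(Y | x) = theta * x with Var(Y | x)
    proportional to x, and every x_i > 0.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two sections")
    if any(xi <= 0 for xi in x):
        raise ValueError("every x_i must be positive")
    sx = sum(x)
    theta = sum(y) / sx
    # Weighted residual sum of squares; algebraically equal to
    # sum(y_i**2 / x_i) - theta**2 * sum(x_i), the bracket in (3).
    rss = sum((yi - theta * xi) ** 2 / xi for xi, yi in zip(x, y))
    return rss / ((n - 1) * sx)


# Two hypothetical sections: theta_hat = 10/20 = 0.5 and
# rss = (4 - 5)**2/10 + (6 - 5)**2/10 = 0.2, so the estimate is 0.2/20.
print(cruz_orive_var([10, 10], [4, 6]))
```

The residual form avoids subtracting two nearly equal sums, which is numerically kinder when $\hat\theta$ is estimated precisely.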
The nine variance/covariance matrices all showed that the variance of the BVB estimates of $\mathrm{Var}(\hat\theta)$ is much lower than that of the bootstrap, delta and Cruz-Orive estimates. For example, for data set H1 the variances were $3.6 \times 10^{-7}$ for the BVB model, $7.4 \times 10^{-6}$ for the bootstrap method, $4.4 \times 10^{-6}$ for the delta/Cochran method and $3.8 \times 10^{-6}$ for the Cruz-Orive method. In addition, the latter three estimates were found to be highly correlated. The results of the four variance estimation methods and of the simulation study indicate that, if we believe the BVB model is true for our data, it provides the most reliable estimate of $\mathrm{Var}(\hat\theta)$ among the four methods considered.

Fig. 1. Estimates of the standard error of $\hat\theta$ using the bootstrap method, the delta method, Cruz-Orive's formula and the BVB model.

Table 1. Estimates of the standard error of $\hat\theta = \bar{y}/\bar{x}$ using the bootstrap, delta/Cochran and Cruz-Orive methods and the BVB model (4). The last four columns are estimated standard errors.

Data set  Species  Sample size  $\hat\theta$  Bootstrap  Delta   Cruz-Orive's formula  BVB model
H1        Horse    72           0.532         0.019      0.020   0.022                 0.025
H2        Horse    76           0.485         0.022      0.022   0.023                 0.023
H3        Horse    68           0.400         0.024      0.024   0.024                 0.022
Dg        Dog      64           0.602         0.035      0.035   0.034                 0.032
Pg        Pig      68           0.470         0.028      0.027   0.028                 0.028
R1        Rabbit   40           0.640         0.037      0.038   0.039                 0.041
R2        Rabbit   32           0.590         0.032      0.031   0.032                 0.039
R3        Rabbit   40           0.592         0.049      0.047   0.046                 0.039
R4        Rabbit   36           0.496         0.036      0.036   0.035                 0.042

TYPE I BIVARIATE BINOMIAL DISTRIBUTION
The Type I bivariate binomial distribution (BVB) (Kocherlakota and Kocherlakota, 1992) is the joint distribution of two random variables $X$ and $Y$ with joint probability mass function
$$P(X = x, Y = y) = \sum_{a=\max(0,\,x+y-m)}^{\min(x,y)} \frac{m!}{a!\,(x-a)!\,(y-a)!\,(m-x-y+a)!}\, P_{11}^{a}\, P_{10}^{x-a}\, P_{01}^{y-a}\, P_{00}^{m-x-y+a} \qquad (5)$$
where 0 1) = 0.37, which is not significant; here "Bin" denotes the binomial distribution.
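The pmf (5) and the simulation from the BVB model used in the study above can both be sketched in Python. This is a minimal illustration under stated assumptions: the four cell probabilities are labelled here `p11` (a test point counted in both $X$ and $Y$), `p10` (in $X$ only), `p01` (in $Y$ only) and `p00` (in neither); these labels and the function names are our own, and parameter fitting is not shown. The sampler uses the fact that a Type I BVB variable is the sum of $m$ independent bivariate Bernoulli trials, one per test point. Since every cortex point also lies on the slice, one would expect `p01 = 0` in the cortex/reference application, which forces $Y \le X$.

```python
import math
import random
from typing import Tuple


def bvb_pmf(x: int, y: int, m: int,
            p11: float, p10: float, p01: float, p00: float) -> float:
    """Type I bivariate binomial pmf, equation (5): a sum over the number a
    of test points counted in both X and Y."""
    if not (0 <= x <= m and 0 <= y <= m):
        return 0.0
    total = 0.0
    for a in range(max(0, x + y - m), min(x, y) + 1):
        # Multinomial coefficient m! / (a! (x-a)! (y-a)! (m-x-y+a)!).
        coef = math.factorial(m) // (math.factorial(a) * math.factorial(x - a)
                                     * math.factorial(y - a)
                                     * math.factorial(m - x - y + a))
        total += coef * p11**a * p10**(x - a) * p01**(y - a) * p00**(m - x - y + a)
    return total


def bvb_sample(m: int, p11: float, p10: float, p01: float, p00: float,
               rng: random.Random) -> Tuple[int, int]:
    """Draw one (X, Y) by classifying each of m test points into a cell.

    p00 is implied by the other three (the probabilities sum to one).
    """
    x = y = 0
    for _ in range(m):
        u = rng.random()
        if u < p11:
            x += 1
            y += 1
        elif u < p11 + p10:
            x += 1
        elif u < p11 + p10 + p01:
            y += 1
    return x, y


# Hypothetical parameters: simulate n = 40 sections with m = 10 test points each.
rng = random.Random(0)
data = [bvb_sample(10, 0.3, 0.2, 0.0, 0.5, rng) for _ in range(40)]
print(bvb_pmf(3, 2, 10, 0.3, 0.2, 0.0, 0.5))
```

Feeding many such simulated data sets to the variance estimators gives a Monte Carlo study of the kind described above; the marginal of $X$ is binomial with parameters $m$ and `p11 + p10`, which offers a simple check of the pmf.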
Hence we conclude that the BVB model is applicable to $(P(\mathrm{ref}), P(\mathrm{obj}))$ data in this context. A preliminary report on some of the data was presented at the Xth International Congress for Stereology, Melbourne, Australia, 1-4 November 1999.

ACKNOWLEDGEMENTS
We thank Professors T.M. Mayhew, G.L.M. Wamengele and V. Dantzer for providing the stereological data, and Dr U. Hahn for valuable feedback.

REFERENCES
Baddeley A (1991). Stereology. In: Spatial Statistics and Digital Image Analysis, Ch. 10. Washington, D.C.: National Academy Press, 181-216.
Besag J, Diggle P (1977). Simple Monte Carlo tests for spatial pattern. Appl Statistics 26:327-33.
Brooks S, Morgan B, Ridout M, Pack S (1997). Finite mixture models for proportions. Biometrics 53:1097-115.
Cochran W (1977). Sampling Techniques. 3rd ed. New York: Wiley and Sons.
Cruz-Orive L (1980). Best linear unbiased estimators for stereology. Biometrics 36:595-605.
Efron B (1982). The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics.
Hope A (1968). A simplified Monte Carlo significance test procedure. J Roy Stat Soc B 30:582-98.
Kocherlakota S, Kocherlakota K (1992). Bivariate Discrete Distributions. New York: Marcel Dekker.
Lehmann E (1999). Elements of Large-Sample Theory. New York: Springer-Verlag.
Mayhew T, Wamengele G, Dantzer V (1990). Comparative morphometry of the brain: estimates of cerebral volumes and cortical surface areas obtained from macroscopic slices. J Anatomy 172:191-200.
Weibel E (1980). Stereological Methods, Vol. 2: Theoretical Foundations. London: Academic Press.