Unevenly spaced time series analysis: Case study using calcimetry data from BV-1 and BV-2 boreholes in Ljubljansko barje (central Slovenia) Analiza neenakomernih časovnih vrst Študija kalcimetrijskih podatkov iz vrtin BV-1 in BV-2 na Ljubljanskem barju Mihael BRENČIČ12 1University of Ljubljana, Faculty of Natural Sciences and Engineering, Department of Geology, Chair of Karst Geology and Hydrogeology, Privoz 11, SI-1000 Ljubljana, Slovenia; e-mail: mihael.brencic@ntf.uni-lj.si 2Geological Survey of Slovenia, Department of Hydrogeology, Dimičeva ulica 14, SI-1000 Ljubljana, Slovenia Prejeto / Received 9. 6. 2009; Sprejeto / Accepted 3. 8. 2009 Key words: unevenly spaced time series, interpolation, weighted influence function, autocorrelation, power spectrum, maximum entropy, calcimetry, sedimentation, Quaternary, Ljubljansko barje, Slovenia Ključne besede: neenakomerne časovne vrste, interpolacija, utežna funkcija vpliva, avtokorelacija, spekter, maksimalna entropija, kalcimetrija, sedimentacija, kvartar, Ljubljansko barje, Slovenija Abstract Statistical analyses of calcimetric data from boreholes BV-1 (north of Podpeč) and BV-2 (south of Črna vas) on Ljubljansko barje in central Slovenia are given. The original data are represented as unevenly spaced time series that are translated into evenly spaced time series. To calculate the interpolation weighted influence function, a model based on the power correlated influence is defined. Parameter selection is performed based on the maximum entropy principle. In the reconstructed time series, autocorrelation and Fourier power spectrum analyses are performed. In both time series, a transition from white noise to red noise was detected. Such behaviour can be described by a Lorentz process. Red noise is the result of a stochastic process with long-term memory. This effect can be seen predominantly in the autocorrelation function of borehole BV-1. In the calcimetric time series of borehole BV-2, periodicity with a period between 10.0 m and 12.5 m was also detected. We suppose that this period reflects climatic fluctuations during the Quaternary Period. Izvleček V članku je obravnavana statistična analiza kalcimetrijskih podatkov iz vrtin BV-1 (severno od Podpeči) in BV-2 (južno od Črne vasi) na Ljubljanskem barju. Objavljeni podatki predstavljajo neenakomerno časovno vrsto, ki jo je bilo potrebno prevesti v enakomerno časovno vrsto. V ta namen je bil postavljen intepolacijski model, ki temelji na parametrični utežni funkciji s potenčnim vplivom. Parameter utežne funkcije je bil izbran na podlagi principa maksimalne entropije. Na rekonstruiranih časovnih vrstah je bil izveden izračun autokorelacijske funkcije in Fou-rierjeve spektralne analize. V obeh časovnih vrstah je bil ugotovljen prehod od belega do rdečega šuma. Takšen prehod je opisan z Lorentzovim procesom. Rdeči šum je rezultat stohastičnega procesa z dolgim spominom. Ti efekti se odražajo predvsem na avtokorelogramu vrtine BV-1. Na kalcimetrijski časovni vrsti vrtine BV-2 je ugotovljena tudi periodičnost s periodo med 10 m do 12,5 m. Predpostavljamo, da je ta perioda povezana s klimatskimi nihanji v kvartarju. Introduction The marshy plain of Ljubljansko barje that extends south of Ljubljana has been studied extensively since the dawn of modern science. It represents an important boundary limiting the development of Slovenia's capital city Ljubljana as well as for other activities in the area, related especially to agriculture. The sediments of Ljubljansko barje also constitute an important geological and climatic archive. Several authors have studied various aspects of Ljubljansko barje. The most recent overview of the state of the art can be found in Pavsic (2008) and references therein. In our attempt to study past and recent geological processes in Ljubljansko barje, a simple consolidation model has been investigated (Brencic, 2007). During this study it was envisaged that data from boreholes BV-1 and BV-2 (Grimsicar & Ocepek, 1967; Sovinc, 1965; Sercelj, 1965, 1966; Pohar, 1978) could be studied more thoroughly. Unfortunately, until now these two boreholes are the only published deep boreholes from Ljubljansko barje with more or less complete data sets. There are many more (Mencej, 1990) but data are sparse and no details are available. With many remaining open problems, the publication of other available data and even the drilling of new boreholes is urgently needed. In 1959, the BV-1 borehole was drilled down to the dolomite basement to a depth of 103.80 m in the area between Notranje gorice and Podpec, and, in 1962, the BV-2 borehole was drilled south of Crna vas (Fig. 1) to the dolomite basement at 116.80m (Grimsicar & Ocepek, 1967; Pohar, 1978). In the present paper, data of carbonate concentrations (represented as mass ratio of CO2 or CaCO3 in the sample) from boreholes BV-1 (Grimsicar & Ocepek, 1967) and BV-2 (Pohar, 1978) are analysed based on statistical time series techniques. The article consists of two parts that are equally important. In the first part, an unevenly spaced time series was studied by an interpolation technique based on a weighted influence function. Data of CO2 or CaCO3 concentrations along the borehole core (borehole depth) in BV-1 and BV-2 can be described as an unevenly spaced time series where time (length) distances between particular data are not equal. These differences have some empirical distributions. For this type of data, several statistical techniques have been developed (e.g. Diggle, 1990), however our calculations were defined by independent consideration and we have developed our own statistical model. Therefore, the method also represents a new contribution to the data reduction of unevenly spaced time series. Based on this model, we have reconstructed an equally spaced data time series which was the basis for the second part of the paper. In the second part, the structure of the time series was explored with an autocorrelation function in the time domain and with a power spectrum in the frequency domain. The periodicity of the data was extracted and the types of noise contained in the signal were analysed. An attempt was made to interpret the results from a sedimentation point of view. Methods Data reconstruction Data for the analyses were obtained by digitization of originally published borehole profiles given in the works of Grimsicar & Ocepek (1967) and Pohar (1978). The diagram showing the CO2 profile in borehole BV-1 was digitized by scanning all the data points on the borehole diagram (Grimsicar & Ocepek, 1967). In the diagram of the CaCO3 profile in borehole BV-2 (Pohar, 1978), data points were not given. In this case digitization was performed along a line and data points chosen where a discontinuity was observed in the line. These values are probably not exactly the same values as the data obtained in the laboratory by the authors of the original paper. Unfortunately, data especially for borehole BV-2 are not published and are not available to the author. However, because we are not particularly interested in the particular values of the carbonate concentration, but more in the general shape of the curve along the borehole, we believe that data obtained with digitization of published diagrams are suitable for our analyses. From the published data, it follows that values of the CO2 ratio in borehole BV-1 (Grimsicar & Ocepek, 1967) and of CaCO3 ratio in BV-2 (Pohar, 1978) are not directly comparable. Unfortunately information about analytical methods for CO2 and CaCO3 determination are very sparse. Grimsicar & Ocepek (1967) report a calcimetric analysis with HCl acid in the ratio of 1:2.5 where they probably measured the mass ratio between sample and diluting agent. Readings were controlled for pure calcite and the precision of the reading was 0.1%. No information about the grain size of the sampled sediment is available. Pohar (1978) report that samples were weighed and sieved and grains smaller than 1 mm were used for the CaCO3 determination. How the determination of CaCO3 was performed is not reported. Because both boreholes BV-1 and BV-2 were drilled within a short period of time and, in both cases, experts from the same institutions were involved and were cooperating on the project, we can suppose that determination of CO2 and CaCO3 were performed by calcimetry on the same or similar fraction of the sediment sample. Therefore we have to recalculate concentrations of CaCO3 in BV-2 based on molar masses derived from the ratio of CO2 concentrations. By such a transformation, we obtain data on comparable scales. Time series analyses Suppose that X = {xb x2, ..., xh ..., xn} is a random vector where values xi, X2, ..., Xj, ..., xn are defined at time coordinates t = 1,2, ..., n. If At = ti-1 - ti = ti - ti+1, the time series is evenly spaced. If At = ti-1 - ti ^ t—t^ then the time series is uneven and data are not equally spaced. Irregular time series data are common when equally spaced data cannot be obtained owing to limitations of data access, or more often in cases when natural conditions do not allow equally spaced sampling. Such cases are very frequent in stratigraphy. The reader interested in regular time series analyses can find more information elsewhere (e.g Diggle, 1990; We-edon, 2003). For the regular time series, the r-th autocovari-ance coefficient is defined as: where the mean value of random vector X is defined as: ri .V = Index r represents a time lag. The definition of the empirical r-th autocorrelation coefficient rr follows from this result. It represents an estimate of the function p(t) of the stochastic process X that gave rise to sample defined as X. The r-th autocorrelation coefficient rT is defined as K = St ^ So where g0 is the autocovariance at time lag r= 0. A diagram of rr versus r is defined as an autocorrelation diagram that defines the structure of the time series in the time domain. Time series analysis can also be performed in the frequency domain, where components of periodic trends inside the time series can be detected by harmonic analysis. We assume that the time series yt can be modelled as: n/2 y,=X k cos(0, the influence of particular data at ti reduces with distance. The level of reduction is defined by parameter a. At large a, the influence of xi diminishes quickly and at small a, it reduces slowly. Parameter a can be defined based on the expert's judgment. However, it is better to find an independent procedure for selection of the parameter a. There are many possible procedures. For the selection process in our analyses, we have used the maximum entropy principle based on a definition of the entropy E (e.g. Ross, 1989): E = JZtPilogpi, where p., is the probability of a particular event from stochastic processes X estimated from the frequencies in random vector Xe. It was hypothesized that a maximum amount of information is available in the reconstructed time series Xe obtained with parameter at when the entropy Emax I (Eai,Ea2, ... , Ean) is maximum. Kernel densities In a classical statistical analysis, the empirical distribution of the data is usually represented by an histogram. Although many approaches for the determination of histogram class width can be found in the literature, they are still biased representations of the real data. Alternative ways for graphically representing the data is the kernel density approach (e.g. Williams, 1997; Reiss & Thomas, 1997). According to this method, the probability density gb(x,k(x)) for particular data is estimated as: j V t X-X; where k(x) is the kernel such that \k(y)dy = and where b is the bandwidth and b>0. We have used the Epanechnikov kernel (Reiss & Thomas, 1997): =—(1 -X2)I{-\