https://doi.or g/10.31449/inf.v48i14.5176 Informatica 48 (2024) 1–26 1 Methods for Repr esenting Earthquake T ime Series with Networks Romi Koželj 1 , Lovro Šubelj 1, 2 and Jurij Bajc 3 1 University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia 2 University of Ljubljana, Faculty of Social Sciences, Ljubljana, Slovenia 3 University of Ljubljana, Faculty of Education, Ljubljana, Slovenia E-mail: romi.kozelj@gmail.com, lovro.subelj@fri.uni-lj.si, jurij.bajc@pef.uni-lj.si Keywords: network analysis, seismic activity , earthquakes, time series Received: September 10, 2023 An earthquake is a natural phenomenon that occurs as a r esult of the internal dynamics of the Earth. It originates deep below the surface of the planet and cannot be pr edicted with our curr ent knowledge. In this paper , we use a network analysis appr oach to analyze the characteristics and evolution of seismic activity over time. W e implement several network models based on temporal and spatial interactions between earthquakes and on the assumption of self-similarity of seismic activity in selected geographic ar eas. W e cr eate sequences of networks generated in consecutive time windows and compar e the networks between differ ent models and time intervals. Additionally , we calculate a set of network structural characteristics and study their changes over time. The analysis shows that most models pr oduce networks with such a structur e that changes consistently with the intensity of seismic activity . Thus, based on the structural changes of networks, we can r eliably identify the time windows with incr eased seismic activity . Povzetek: V članku je pr edstavljena analiza značilnosti in razvoja seizmične aktivnosti skozi čas z uporabo analize omr ežij. 1 Intr oduction Earthquakes are typically caused by the release of stress that builds up on faults at the junctions of lithospheric plates. Block movement during an earthquake changes the stress field, af fecting seismic activity development in the wider area [1]. The occurrence of earthquakes is influenced by various factors, such as the size of the fault and the amount of stress accumulated next to it. T o directly measure the likelihood of earthquake occurrences in a specific area, one would need to measure the amount of stress accumulated several kilometers under the surface of the Earth, which is technologically challenging. Additionally , the values of other parameters that af fect the amount of stress required to cause plate slippage vary from fault to fault [2]. Some pro- cedures have been developed that can help determine the areas where an earthquake may occur and estimate the max- imum expected magnitude with a certain probability [3, 4]. Nevertheless, modern science does not yet have the tools to determine the location, the magnitude, and the time of an earthquake with an accuracy that would have practical significance [2]. This study focuses on analyzing seismic activity over time from the network science perspective. W e analyze and compare seismic activity patterns in consecutive time win- dows of data, where we use various methods of capturing and representing the data. The observations give interest- ing insights into the development of seismicity , considering the ef fect of earthquakes on each other . The proposed ap- proach may of fer a new way of processing seismic data that could potentially be used to reason about seismic behaviour in the near future. Because it is dif ficult to consider all the factors that in- fluence the occurrence of earthquakes [5], it is helpful to use new approaches that do not require an understanding of the entire dynamics of earthquake occurrence. The seis- mic data analyzed in this paper consist of the locations of the focuses, the times when the earthquakes occurred, and their magnitudes. Such data is usually represented in tab- ular form, where the hidden information about their inter - actions are ignored. The advantage of representing seismic data with a network is that it can capture the information about interactions and reveal hidden patterns in correlations between earthquakes. In this study we use public seismic catalogs for Italy , Japan, Southern California and Northern California. The span of earthquakes in each catalog is divided evenly into several consecutive time intervals. For each time interval, we construct multiple networks, which are built using the existing methods [6, 7, 8, 9, 5, 10, 1 1] and also a new method created as a combination of the published ones. An- alyzing one model at the time, networks constructed in time intervals with higher seismic activity are compared to those constructed in time intervals with lower seismic activity . Networks constructed in the same time intervals are also compared across dif ferent network models. Finally , using generated sequences of networks, we cal- culate a set of network characteristics and assemble them 2 Informatica 48 (2024) 1–26 R. Koželj et al. into time series. W e study whether changes in time se- ries can indicate in which time intervals seismic activity in- creases and if time series show any specific change in seis- micity before a stronger earthquake occurs. The study is contrived to also enable the application of predictive mod- els for time series analysis to test whether values in time series can be predicted. The rest of the paper is or ganized as follows. Section 2 surveys related works. In section 3, we introduce the data and preprocessing techniques to obtain complete datasets. In section 4, we present the approach for generating se- quences of networks. Firstly , section 4.1 proposes a method for dividing time ranges of catalogs into smaller intervals. In section 4.2, we explain the technique for constructing nodes of networks, and lastly , in section 4.3, we present the adopted network models. Section 5 compares the generated networks and section 6 analyzes time series from selected network characteristics. In section 7, we end the paper with conclusions and a discussion of future work. 2 Related works In this section, we summarize the results of other stud- ies that used network constructions analyzed in this paper , omitting the explanation of these networks as they are ex- plained in section 4.3. For clarity , studies are further sum- marised in T able 1. Abe Sumiyoshi et al. [6] generated networks according to the NTS(C) construction (section 4.3.1), where each net- work was built using a lar ge portion of the catalog. They analyzed the dependence of the clustering coef ficientC and the exponentγ of the power -law distribution on the size of the cell. Results show that both parameters stop changing when the cell size exceeds a certain threshold value. They also observed that generated NTSC(C) networks are scale- free and small-word. Abe Sumiyoshi et al. [14] constructed NTS(C) networks in successive time intervals around times of lar ger earth- quakes and observed the evolution of the clustering coef- ficient of these networks. They show that around the time of a lar ge event, the clustering coef ficient displays a char - acteristic behavior; it suddenly jumps at the time of a lar ge earthquake and then slowly decays until it becomes station- ary again following the power law distribution. T elesca Luciano et al. [9] generated VG(E) networks (section 4.3.4) and analyzed the derived degree distribu- tions. Networks were constructed using only earthquakes above a chosen threshold magnitude value, increasing this value from 1.9 to 3.5 with the step of 0.1. Acquired de- gree distributions do not change significantly for dif ferent threshold magnitude values and are power law shaped, in- dicating a scale-free nature of these networks. They also investigated the dependency of the VG(E) structure with respect to time and considered series of earth- quakes occurring regularly and those with randomly shuf- fled inter -event times. Study shows that there is no signifi- cant dif ference between the degree distributions of the orig- inal earthquake sequence and the shuf fled sequences. They ar gue that VG(E) structure depends only on the values of the magnitude. Soghra Rezaei et al. [5] generated NTS(C) and VG(C/E) (section 4.3.4) networks. For each NTS(C) network they also considered only the events above some threshold mag- nitude value, increasing this value for each generated net- work. They observed that the number of links in these net- works has a Gutenber g-Richter law relationship with the threshold magnitude value. Gutenber g-Richter law was also obtained for the VG(C) network, where the power law relationship with the threshold magnitude value was ob- served for the number of nodes in the network. For VG(E) and VG(C) networks, they analyzed the fre- quency with which new links connected to some lar ger earthquake are added to the network when the network grows over time with new occurring earthquakes. Study indicates that this frequency , with respect to time, mimics the Omori law . Joel N. T enenbaum et al. [10] constructed networks ac- cording to the NCV(C) construction (section 4.3.6), where they analyzed the strength of the obtained connections. The time interval in which networks were built was divided into 90-day long non-overlapping subintervals representing the cell vectors. V ector values were constructed by calculating the sum of earthquake magnitudes (SM), the average mag- nitude (AM), the number of events (NE), and the ener gy released by earthquakes (ER) in each of the subintervals. The analysis shows lar ge correlations between cells more than 1000 km apart in these networks. They also split the catalog into two parts along the time axis of the catalog and constructed partly overlapping networks. Study shows that the obtained networks consist of significant structural sim- ilarities. Nastaran Lofti et al. [1 1] divided the time span of earth- quakes in a catalog in multiple overlapping intervals in which they constructed the MLN-NTS(C) networks (sec- tion 4.3.7). They calculated the eigenvector centrality (a network property) and the total magnitude of earthquakes (an earthquake property) from each of the networks and analyzed the correlation between these properties. Anal- ysis was done on networks with one, two, three, and four layers. Results show a higher correlation in networks with three or four layers versus one-layer networks, which in- dicates that multilayer structure better captures earthquake dynamics over time. They also verified that adjacent cells present similar activity patterns by projecting cells into two-dimensional space. This was done using the Principal Component Anal- ysis (PCA) which removed the correlation between the node centrality in dif ferent layers. Xuan He et al. [7] constructed NTSW(C) networks (sec- tion 4.3.2). They show that the degree distribution of these networks obeys power law , consequently making net- works scale-free. Since the frequency of earthquakes with lar ge magnitudes also decays as a power law , the scale- Methods for Representing Earthquake T ime Series with Networks Informatica 48 (2024) 1–26 3 T able 1: Summary of related works. The notationC d denotes the cell size (cell edge length) in a network,N t denotes the number of time intervals,|t| denotes the length of the time intervals, and|t lag | denotes the length of the time lag between the time intervals for a series of networks. The abbreviation G-R stands for the Gutenber g-Richter law . Only the sources of the catalogs that are used in this paper are cited. ref. region network sequence findings model C d [ km] N t |t| |t lag | [6] S. California [12] Japan [13] Iran Chile NTS(C) various sizes (3D) 1 23 years 5 years 3 years 7 years 0 • scale-free and small-world • C andγ take the universal invariant values 0.85 and 1, respectively , when cell size becomes lar ger than a certain threshold [14] S. California [12] NTS(C) 5, 10 (3D) ≈ 40 to 70 240 days, 24 days 0 • C remains stationary before the lar ge earthquake, suddenly jumps at the event, then slowly decays to stationar - ity again [9] Italy [15] VG(E) / 1 5.7 years 0 • scale-free • similar degree distributions for dif- ferent threshold magnitudes • VG(E) structure depends only on the magnitude values [5] Iran, California, Italy-Greece NTS(C), VG(E), VG(C) ≈ 4 to 220 (2D) ≈ 10 to 20 for the Omori law , otherwise 1 7 years for the G-R law , other - wise less but grows network grows over time for the Omori law with the lag≈ 100 to 1000 hrs • G-R law obtained for the NTS(C) and VG(C) networks • Omori law obtained for the VG(C/E) networks [10] Japan [16] NCV(C) SM T=54 , NCV(C) ER T=54 , NCV(C) NE T=54 , NCV(C) AM T=54 NCV(C) T=28 100 (2D) 1 2 entire catalog (13.5 years) 7 years 0 not stated • strong links found representing lar ge correlations between patterns located more than 1000 km apart • networks of dif ferent periods are structurally similar [1 1] N. California [17] Iran MLN-NTS(C)T=1 , MLN-NTS(C)T=2 , MLN-NTS(C)T=3 , MLN-NTS(C)T=4 75 (2D) 66× 55 (2D) 8 3 years 1 year • earthquake dynamic is better cap- tured with a multilayer structure • nearby regions have similar seismic activity patters [7] S. California [12] NTSW(C) 5, 10, 15, 20 (2D) 10 for the small-world concept, otherwise 1 1 year (small-world), 10 years, 20 years 0 • scale-free and small-world • G-R law obtained • Omori law obtained [8] S. California [12] NHTSW(E) / 1 17 years 0 • aftershocks identification • Omori law obtained • power law distribution of link weights and distances found 4 Informatica 48 (2024) 1–26 R. Koželj et al. free nature of the degree distribution is consistent with the Gutenber g-Richter law . They calculated the clustering coef ficient and the average shortest path for networks constructed in consecutive one year time intervals. The clustering coef ficient showed to be much lar ger than in random networks, while the average shortest path was very small. Both results indicate a small- world structure of NTSW(C) networks. Furthermore, the Omori law emer ged as a result of counting the out-degree links of nodes containing lar ge earthquakes over time in one year networks. Marco Baiesi et al. [8] generated NHTSW(C) networks (section 4.3.3), where they dealt with the subject of after - shocks identification. Each earthquake in this network is connected to those earthquakes back in time that have trig- gered the event with a high probability . The links in the network were determined by selecting the lar gest (thresh- old) value (n ij ) max , before which the fully connected net- work breaks into clusters. The threshold value was obtained from the graph showing the size distribution of the con- nected components depending on the selected values ofn ij . They determined the point at which the smaller components mer ge into one lar ger one and set the value of n ij at this point as (n ij ) max . The Omori law was obtained for this network, as the number of outgoing links was found to be scale-free. The distribution of link weights and the distribution of distances between earthquakes and their aftershocks exhibits a power law behaviour . They state that the fat tail of the distance distribution suggests that aftershocks collection with fixed space windows is not appropriate. It can be observed that studies relevant to this paper fo- cus primarily on investigating one network or , at most, a few networks from the entire catalog. However , the main feature of this study is examining the time evolution of net- works and their characteristics where we consider a signif- icant number of time points. 3 Data W e analyze the data of detected seismic activity in Japan, Italy , and California. The data was obtained from four on- line sources (T able 2). T able 2: Seismic catalogs and online sources. region source Japan High Sensitivity Seismograph Network Laboratory [13] Italy National Institute of Geophysics and V olcanology [15] N. California Northern California Earthquake Data Center [17] S. California Southern California Earthquake Data Center [12] Since the time span of California catalogs goes far back in time (from 1932 for Southern California and from 1966 for Northern California), we examine the number of earth- quakes in individual years. Due to updates in seismolog- ical equipment and increase in the density of the network of seismic stations, the number of detected earthquakes in- creases from year to year up to 1981, when the seismic sta- tions throughout California improved. W e consider the data recorded after 1981 as more reliable and suitable for further analysis, and therefore limit these two catalogs only to the data from 1981 onwards. Since both California catalogs cover the same geograph- ical area, despite one containing earthquakes detected at stations in Northern California and the other at stations in Southern California, we also examine how much the num- ber of earthquakes changes if the areas are reduced more to the north or to the south. W e conclude that the earth- quakes that occurred in Northern California were not well detected by the stations in Southern California, and vice versa. In the following, we therefore limit the area of North- ern California to 35 ◦ ≤ latitude ≤ 42 ◦ and − 125 ◦ ≤ longitude≤− 116 ◦ , and the area of Southern California to 32 ◦ ≤ latitude ≤ 37 ◦ and− 122 ◦ ≤ longitude ≤ − 114 ◦ . If the areas are reduced further , the number of earthquakes decreases considerably , omitting relevant data. One of the fundamental empirical laws in seismology is the Gutenber g-Richter law [18], which states that the num- ber of earthquakes decreases exponentially with increasing magnitude. This is described by the relationship log 10 N =a+bM, (1) whereN denotes the number of earthquakes of magnitude M , anda andb < 0 are the corresponding constants. Fig- ure 1 shows the number of detected earthquakes of a partic- ular magnitude for Japanese and Italian catalogs. Figure 1: The number of detected earthquakes N of dif- ferent magnitudesM for Japanese (left) and Italian (right) catalogs. The two vertical lines indicate the minimum and maximum magnitude thresholds that are also listed in square brackets at the top of the Figure. One can observe that the linear relationship (1) does not apply to very weak earthquakes. Even though weaker earth- quakes occur more frequently than stronger ones, they are often not detected by seismic observatories, since ground Methods for Representing Earthquake T ime Series with Networks Informatica 48 (2024) 1–26 5 shaking during weak earthquakes is only measurable close to the hypocenters. Thus, catalogs containing such earth- quakes are incomplete. In order to obtain more complete catalogs, we determine a minimum magnitude threshold for each of the catalogs and keep only earthquakes with mag- nitudes above the threshold. Although this significantly re- duces the amount of data, we know with a considerable de- gree of reliability that all earthquakes above the threshold are included in the data. The minimum magnitude thresh- olds are determined by maximizing the goodness of fitR 2 of the regression lines (1), while still trying to keep as much data as possible. Since strong earthquakes are very rare, deviations from the linear dependence (1) occurs also at high magnitudes. For this reason, we also define the maximum magnitude threshold for each catalog and omit all earthquakes above this threshold when determining the regression lines. W e employ the fact that earthquakes occur according to the Poisson distribution λ k e λ k! [19, 20], which indicates the probability of the occurrence ofk events in a certain time in- terval if the expected number of events in an interval equals toλ . The confidence interval for the Poisson distribution is λ ± σ with the standard deviation ofσ = √ λ . W e determine a maximum magnitude threshold for each of the catalogs such thatσ /λ = 1/ √ λ ≤ 0. 5 . T able 3 shows the number of earthquakes in catalogs before and after preprocessing, the minimum magnitude thresholds and the catalogs’ time ranges. T able 3: The size of the catalogs N before and after pre- processing of the data, the minimum magnitude thresholds M min and the time ranges of complete catalogs. region N (before) M min N (after) time [ years] Japan 1 109 140 1.4 316 558 12 Italy 370 564 2.3 61 708 35 N. California 967 897 1.6 213 061 39 S. California 751 491 1.6 233 250 39 4 Methodology In this section, we describe the generation of sequences of networks. This includes explanation of division of cata- logs’ time ranges into smaller consecutive time intervals, description of the technique used to cut the Earth’ s interior into smaller pieces to obtain nodes of generated networks, and presentation of network models. 4.1 T ime intervals The earthquake catalogs contain a dif ferent number of earthquakes that occurred in dif ferent time ranges (T able 2). When determining the length of time intervals in which we generate individual networks, we try to ensure the best pos- sible comparability between networks across all earthquake catalogs. In addition, we want to create enough time in- tervals so that the time series containing the feature values calculated from the generated networks will contain enough points for a successful learning of the time series predictive model. W e decide that the time series should contain at least 200 points. One way to ensure the comparability of the gen- erated networks is to predetermine the average number of earthquakes within individual time intervals⟨ e t ⟩ and keep this number the same for all earthquake catalogs. In this way , all generated networks on average consist of the same number of earthquakes. Because a strong earthquake triggers many aftershocks, time intervals in which a stronger earthquake occurs contain a significantly higher number of events (Figure 2). Gener - ating networks from a lar ge number of events can result in slow execution of network model algorithms and slow cal- culation of network characteristics. Conversely , if the num- ber of events is too small, the generated networks could be meaningless or incomparable since they would contain too little data. Figure 2: The number of earthquakes in each of the time intervals in the Japanese (left) and the S. California catalog (right). The time intervals are generated according to the data in T able 4. Therefore, the search for the appropriate value ⟨ e t ⟩ is started from lar ger values to smaller ones. W e stop at values 600 ≲ ⟨ e t ⟩ ≲ 800 , which we accept as reasonable given the execution time of the algorithms within time intervals containing a lar ge number of events. If the number of inter - vals calculated using such average number of earthquakes per interval is insuf ficient, the intervals are of fset so that they partially overlap. This method increases the number of time intervals while maintaining the same average num- ber of earthquakes per interval. The obtained time intervals are shown in T able 4. 6 Informatica 48 (2024) 1–26 R. Koželj et al. T able 4: The number of time intervals N t , the length of time intervals |t| , the length of the lag between the time intervals|t lag | , and the average number of earthquakes⟨ e t ⟩ within time intervals. The average number of earthquakes is rounded to a whole number . region N t |t| [ days] |t lag | [ days] ⟨ e t ⟩ Japan 447 10 0 707 Italy 256 150 100 718 N. California 359 40 0 589 S. California 359 40 0 649 Figure 3: An example of generated cells from the areal view (right) and the vertical cross-section view (left). 4.2 Division of Earth into cells The nodes of most of the generated networks represent smaller , non-overlapping parts of the Earth’ s interior . W e obtain them by cutting the area defined by the earthquake hypocenters of each catalog into 3D or 2D cells of approx- imately the same size. 3D cells are roughly cubes with ap- proximately equally long edges in all three directions. They are generated by cutting the Earth along parallels, meridi- ans, and in depth. 2D cells have a shape of upright prisms with a base face of a roughly square shape. They are gener - ated by cutting the area only along the Earth’ s surface, i.e., along parallels and meridians, where each obtained part is stretched inward to the maximum depth of the area. Because connections in networks are formed according to the interactions between the earthquakes located in the cells, we want the volumes of the cells to be as compa- rable as possible. Otherwise, lar ger cells could contain more earthquakes simply because of their size, thus form- ing more active nodes and a greater number of connections than smaller cells. The area is therefore cut so that the cells closer to the poles cover a lar ger angle of longitude than those closer to the equator , and the cells deeper in the Earth cover a lar ger angle of longitude and latitude than those closer to the surface (Figure 3). T able 5 shows the number of cells obtained when cutting geographic areas into 3D and 2D cells with edge lengths of about 5 km and 20 km. When generating networks, we use 3D cells of sizes 5, 10, and 20 km and 2D cells of sizes 5 and 10 km. The cho- sen cell sizes are comparable to those used in the literature. It is important to note that if the cells used are too small, only the hypocenters, which are point size, are considered, while the earthquakes are much lar ger . On the other hand, using too lar ge cells can cause a deterioration of spatial res- olution. T able 5: The number of cells containing at least one earth- quake when cutting the areas into 3D and 2D cells with edge lengths of roughly 5 km and 20 km. region 5 km (3D) 5 km (2D) 20 km (3D) 20 km (2D) Japan 31 647 14 827 2 920 1 821 Italy 22 261 12 131 3 812 2 182 N. California 22 083 9 616 1 924 1 159 S. California 13 731 6 21 1 1 012 691 4.3 Network models In this chapter , we describe network models used f or con- structing seismological networks. The nodes of some net- works are represented by earthquakes that connect to each other based on their interactions. In other networks, nodes are represented by cells of a geographic area that are con- nected based on interactions between earthquakes located in the cells. For clarity , we label networks consisting of earthquakes with a mark (E) at the end of the network name and networks consisting of cells with a mark (C). All net- works are analyzed as simple undirected unweighted net- works. 4.3.1 T ime sequence of earthquakes The construction of the network based on time sequence of earthquakes NTS(C) [6] is built on the assumption that the spatial connectivity of seismic events is long-range since earthquakes can also appear as the cause of preceding very distant earthquakes [21, 22]. The nodes of this network are presented by cells of the geographic area. The connections between cells are generated by earthquakes that occur in time directly one after the other , regardless of their distance to each other . When implementing this network, we first ar - range the earthquakes according to the time of their occur - rence. Then, for each successive pair of earthquakes, we generate a connection between the corresponding cells 1 . 4.3.2 T ime-space windows The construction of the network based on the time-space windows NTSW(C/E) [7] is created on the assumption that the elastic ener gy released during an earthquake causes the resulting seismic activity in the surrounding areas for a time interval t from the time of the earthquake. The area sub- jected to the influence of seismic waves is defined as a sphere with the center at the earthquake’ s focus and the radius R . Earthquakes with higher magnitudes af fect the 1 The nodes of NTS network cannot be presented by earthquakes since forming such connections between earthquakes would result in a “chain” network. Methods for Representing Earthquake T ime Series with Networks Informatica 48 (2024) 1–26 7 seismic activity of a wider area and for a longer time than earthquakes with lower magnitudes. The radius R of the surrounding area af fected by an earthquake of magnitude M and the length of the time interval t are calculated ac- cording to equations log 10 R =a R M +b R , (2) log 10 t =a t M +b t , (3) where a R , b R , a t and b t denote the corresponding con- stants [7]. Using equations (2) and (3), we generate a time-space window around the focal point of each earthquake in the catalog. The size of the window depends on the magnitude of the earthquake. An earthquake that generates a window triggers a link to each earthquake within that window . T o calculate the distance between two earthquake focuses, we use the Euclidean distance. The values of parameters a R , b R , a t and b t for the area of S. California are determined in[23] by analyzing the seismic catalog and developing a new method for identifying the timet(M) and spaceR(M) intervals that include most of the aftershocks related to an earthquake of magnitudeM . Figure 4 shows earthquakes’ spatial and temporal influ- ence depending on the magnitude in the area of S. California with the associated parameters resulting from works [23, 7]. Figure 4: Spatial (left) and temporal (right) influence of earthquakes as a function of magnitude in the area of S. Cal- ifornia. Adapted from [23, 7]. The dependencies show , for example, that an earthquake with a magnitude of M = 6. 5 af fects an area up to R = 61 km from the focus of the earthquake for a period of t = 790 days from the time of the earthquake. Since the curve presenting the time influence as a function of the magnitude breaks at the magnitude value M = 6. 5 , the time dependence is described by two pairs of parameters. In [7] it is stated that the dif ferent dependences of t(M) at M > 6. 5 and M ≤ 6. 5 are determined empirically . Ar guments for the dual nature of this dependence are not provided. The obtained parameter values depend on the observed geographic area [7]. The relevance of these parameters for Japan, Italy , or even N. California is, therefore, question- able. Since we do not have the data on aftershocks or the sizes of time-space windows for the rest of the Earth, and due to dif ficulty of obtaining these parameters, this network construction and the obtained parameters are only used on the S. California catalog. The nodes of this network can be presented by earthquakes (NTSW(E)) or by cells of the geographic area (NTSW(C)). 4.3.3 Hyperbolic time-space windows The construction of the network based on hyperbolic time- space windows NHTSW(C/E) [8] is designed on a similar metric as the NTSW(C/E) construction, but instead of rect- angular windows, the hyperbolic windows are introduced. The metrics used to define the hyperbolic windows takes into account that the impact of earthquakes decreases with time and distance (Figure 5). Figure 5: Schematic representation of the rectangular space-time window used in NTSW(C/E) (dashed rectangle) and the hyperbolic window used in NHTSW(C/E) (solid area). Units are arbitrary . Adapted from [8]. T o determine the connections between earthquakes in a network, we need to calculate the expected number of earth- quakes n ij =HtD fd 10 − bMi (4) within the time-space window defined by each pair of earth- quakesi andj . In equation (4),t represents the time dif fer - ence between the occurrence of the two earthquakes (t i < t j ), D is the distance between earthquakes’ focuses, fd is the fractal dimension [24] of the geographic area where the two earthquakes occurred, andH is the corresponding con- stant. V ariableM i is the magnitude of the earthquakei and is represented via the Gutenber g Richter ’ s lawN ∝ 10 bMi (equation (1)),b< 0 ). Equation (4) shows that a shorter time t and a shorter distanceD between two earthquakes both reduce the value ofn ij . The correlation value between earthquakesi andj is, consequently , inversely proportional to the value ofn ij . Using equation (4), we calculate the correlation value for each pair of earthquakes. Since the parameterst andD in equation (4) appear in the product, earthquakei can also be strongly correlated with earthquakej , which occurred at a greater distance in time and a smaller distance in space, or vice versa. Therefore, the resulting window is hyperbolic (Figure 5). 8 Informatica 48 (2024) 1–26 R. Koželj et al. Fractal dimension fd is calculated using the box- counting method [8], where grids of dif ferent sizes are placed on the geographic area. For each grid size, we count the number of grid cells that contain at least one earthquake. In our case, the grid is represented by an arrangement of non-empty 3D cells. Thefd value is calculated as the slope of the line representinglog(N) values as a function of the log(Cd − 1 ) values, where N is the number of non-empty grid cells, andCd is the length of cell edges (Figure 6). W e calculate thefd value for each of the catalogs. From Figure 6 one can see that the number of non-empty cells in denser grids (the right part of graphs) changes dif- ferently as in the case of sparser grids (the left part of graphs). Because the relationship between the variablesN and Cd − 1 on a log-log scale must be linear , we use only the points corresponding to the linear relationship between the variables when determining the slope of the regression lines (Figure 6). Figure 6: The number of non-empty cells N along the inverse value of the length of cell edges Cd − 1 for the Japanese (left) and the Italian (right) catalog when the area is cut into 3D cells. The vertical dashed lines indicate the part of the data used to calculate the regression line. Since we want to avoid generating networks that are to densely connected, we keep only a share of the most corre- lated connections. The number of connections is calculated by determining the average degree⟨ k⟩ in each network and keeping⟨ k⟩ the same for all constructed networks. This is a simple but valid way to ensure the comparability of the ob- tained networks. W e use the relationship⟨ k⟩ = 2m n and cal- culatem at a value of⟨ k⟩ = 10 [25], thus keepingm = 5n of the most correlated links in the networks. The nodes of this network can be earthquakes (NHTSW(E)) or cells of the geographic area (NHTSW(C)). 4.3.4 V isibility graph In the construction of the visibility graph VG(C/E) [9, 5], the earthquakes represent the points of the graph that shows the time of the earthquakes on the x -axis and the magni- tude of the earthquakes on they -axis. T wo earthquakes are connected if no other earthquake happened in the time be- tween two earthquakes with a magnitude above the straight line connecting the points representing the initial two earth- quakes. If such an earthquake would exist, it would “over - shadow” the influence of the first earthquake on the second one; thus, the earthquakes would not “see” each other . Solid lines in Figure 7 show the links corresponding to the visibility graph since no point lies above any of the links. T o make distinguishing between correct and incorrect con- nections in the network easier , we also show dotted vertical lines connecting the points with thex -axis. Solid lines show all possible connections between these points since no link can be made between any other pair of points. Dashed lines in Figure 7 show two additional links added to the network that are inadequate since the link between earthquakes 1 and 4 is overshadowed by earthquakes 2 and 3, and the link between earthquakes 3 and 7 is overshad- owed by earthquake 5. Therefore, the network containing these two links does not correspond to the construction of the visibility graph. The nodes of the visibility graph can be earthquakes (VG(E)) or cells of the geographic area where instead of earthquakes we connect their corresponding cells (VG(C)). 4.3.5 Spatially upgraded visibility graph Connections in the visibility graph are created without tak- ing into account distances between earthquakes. T wo earth- quakes are connected if they “see” each other , even if they are very far apart. W e upgrade the visibility g raph by generating a spatial window defined by equation (2) around the focal point of each earthquake. The size of the window depends on the magnitude of the earthquake. The parameters for generat- ing the window are listed in the left part of Figure 4. Each earthquake is connected with all those earthquakes that are “visible” with the given earthquake and are also located within the corresponding spatial window . Consequently , the “visibility” between earthquakes is limited by their dis- tance from each other . Earthquakes located outside of the Figure 7: An example of a network corresponding to the visibility graph (links of a network are shown with solid lines) and an example of a network that does not correspond to the visibility graph (links of a network are shown with solid and dashed lines). Dots represent nodes of the net- work. Units are arbitrary . Methods for Representing Earthquake T ime Series with Networks Informatica 48 (2024) 1–26 9 window cannot prevent connections in the window . W e name this construction a spatially upgraded visibility graph SVG(C/E). Nodes can be earthquakes (SVG(E)) or cells of the ge- ographic area (SVG(C)). Due to the dif ficulty of obtaining the appropriate parameters in equation (2), the described network construction is only used on the S. California cat- alog (as the construction described in section 4.3.3). 4.3.6 Corr elation between cell characteristics In the construction of the network based on a corre- lation between vectors that represent cell characteristics NCV(C) [10], we assign a vector to each of the cells of the geographic area that contains at least one earthquake. The vector elements represent a chosen cell characteristics in shorter consecutive time intervals. Cell vectors are con- structed as follows. Firstly , we divide the time interval in which the network is generated into T equally long non- overlapping subintervals. Then, for each of the cells, we calculate a value fors using the earthquakes located in the cell within each subinterval. Those values represent vector elements. Cells are connected based on the Pearson correlation co- ef ficient r [26] between the corresponding vectors, where the correlation value represents the strength of the connec- tion between two vectors. The value forr between vectors x andy is calculated using the equation r = ∑ j (x j −⟨ x⟩ )(y j −⟨ y⟩ ) √ ∑ j (x j −⟨ x⟩ ) 2 √ ∑ j (y j −⟨ y⟩ ) 2 (5) with values expanding in the range [-1,1]. Because a strong correlation is represented in negative and positive correlation values, the most correlated links are those with a high absolute value ofr . T o avoid constructing too dense networks, we keep only those connections with |r| greater than some predetermined threshold value r min . The value for r min is determined by analyzing the number of pairs of cells within interval0. 5<|r|≤ 1 . In each of the catalogs, the number of cell pairs withr = 1 is much higher than the number of pairs with 0. 5 <|r| < 1 . Additionally , the number of pairs with r values in 0. 5 < |r| < 1 are very low . Thus, choosing any r value in interval 0. 5 < |r| < 1 represents an equally good choice forr min since the generated networks would contain approximately the same number of connections. W e setr min to 0.95. W e use two characteristics of cells – the sum of earth- quake magnitudes ( ∑ i M i ) and the amount of ener gy re- leased by earthquakes ( ∑ i 10 1. 5Mi+16. 1 ). W e set the length of vectors toT = 10 . In one case, we insert the sum of mag- nitudes as a characteristic into the cell vectors, and in the other , we insert the ener gy released by earthquakes. Then we increase T to 20 and repeat the process, getting four types of NCV(C) networks in total. 4.3.7 Multilayer network The multilayer network MLN-NTS(C) [1 1] consists of sev- eral NTS(C) networks constructed in shorter consecutive time intervals. The time window in which the multilayer network is generated is again divided into T equally long non-overlapping subintervals. These subintervals represent layers of a mutlilayer network. W ithin each subinterval, we build a NTS(C) network. By connecting identical network cells in successive time intervals, we generate the MLN- NTS(C) network (Figure 8). Figure 8: Representation of the multilayer network. The nodes of individual NTS(C) i networks are shown with the same node shape. Empty cells are shown in white color . The nodes of the multilayer network are cells of the geo- graphic area. All resulting NTS(C) networks consist of the same set of nodes. Since the NTS(C) networks have the same number of nodes, the MLN-NTS(C) network could also contain empty cells, i.e., cells that do not contain earth- quakes and do not form connections with the rest of the cells within individual layers. However , each node represents a non-empty cell in at least one of the layers of the entire net- work. By expanding the NTS(C) network into a multilayer net- work, the nodes representing the same cell are split into sev- eral sequentially connected nodes. In this manner , earth- quakes located in the same cell are distributed over several cells with the same location depending on the time of their occurrence. Thus, earthquakes that occurred further apart in time no longer belong to the same node, even if they lie very close to each other in space. W e construct multilayer networks with T = 1 , T = 2 andT = 3 layers 2 . 5 V isualisation and comparison of networks In this section, we describe the structure of the resulting net- works, focusing on the number of components, the number of connections, and the connectivity of the networks. 2 A multilayer network with one layer represents a NTS(C) network. 10 Informatica 48 (2024) 1–26 R. Koželj et al. When generating networks according to the NTS(C) con- struction, we arrange earthquakes by the time of their oc- currence and connect the cells corresponding to consecu- tive pairs of earthquakes. By doing so, none of the cells is disconnected from the rest of the network, and the network always consists of only one component (Figure 9: (a)-(c)). In the MLN-NTS(C) construction, the time interval in which the network is generated is divided into two or three equally long subintervals (layers). W ithin each layer , we generate individual NTS(C) networks that are then con- nected to each other via identical cells between succes- sive layers. The MLN-NTS(C) network also consists of only one component and does not contain isolated nodes (Figure 9: (d)-(i)). In time intervals in which a stronger earthquake occurs, the number of earthquakes and, thus, the number of connections in individual NTS(C) networks within the MLN-NTS(C) network increases. As a result, the cells within the layers are more densely connected than the cells between the layers (Figure 9: (e)-(i)). In the VG(E) construction, where earthquakes are con- nected according to the “visibility” condition, each earth- quake is always connected to its nearest neighbor in time. In the VG(C) network, instead of connections between earth- quakes, we generate connections between the correspond- ing cells. The network, according to the VG(E) construc- tion, contains all the connections contained in the NTS(C) network plus additional ones that arise due to the “visibil- ity” condition between earthquakes. Thus, VG(C/E) net- works also consist of only one component and do not con- tain isolated nodes. Figure 10 ((a)-(f)) shows some of the networks generated by the VG(C/E) construction. It can be seen from Figure 10 ((a), (b), (d), (e)) that (stronger) earthquakes in the VG(E) network “shadow” the connections between some (weaker) earthquakes, which results in clearly visible groups in a net- work. The “shadowing” ef fect is even more pronounced in the time intervals where a stronger earthquake occurs since such an earthquake triggers a lar ge number of aftershocks of various magnitudes. Because the number of earth- quakes with lower magnitudes is exponentially greater than the number of earthquakes with higher magnitudes (Gutenber g-Richter law , equation (1)), a lot of aftershocks have relatively low magnitudes. Since the occurrence of a strong earthquake results in a lot of aftershocks, there is a high probability that among the multitude of weaker earthquakes that lie close together in time, a stronger earth- quake will also occur , thus breaking the connections be- tween those weaker earthquakes. It is suf ficient that the magnitude of the “stronger” earthquake is only slightly higher than the magnitude of “weaker” earthquakes. The phenomenon of “shadowing” of connections is re- peated at dif ferent magnitude values, where stronger earth- quakes shadow the connections between weaker ones. Thus, weaker earthquakes that lie close together in time form a lar ge number of groups that are connected to each other through earthquakes of higher magnitudes (Figure 10: (b), (e)). When earthquakes are grouped into cells (VG(C) net- work), a spatial component is introduced into the network that groups nearby earthquakes into cells, regardless of their “visibility” to each other . Consequently , the branching of the network, which is the result of “visibility” between earthquakes, is somewhat destroyed. This is also the reason for the dif ferences in behavior of the time series consisting of characteristics calculated from the networks according to the VG(E) and VG(C) constructions (section 6). In the SVG(E) network, connections are generated simi- larly as in the VG(E) network, whereby the time-magnitude space in which the connections are generated is limited by a spatial window , the size of which is determined by the mag- nitude of the individual earthquakes. Due to the introduc- tion of a spatial component, earthquakes that are adjacent in time are no longer necessarily connected to each other . It can be seen from Figure 10 ((g)-(i)) that networks generated according to the SVG(E) construction break up into several components, among which quite a few consist of individ- ual nodes. In the component containing a lar ger number of nodes, the shape of the network is still very similar to the one obtained with the VG(E) network (Figure 10: (b), (e), (h)). When earthquakes are grouped into cells, the branching of the SVG(E) network which is a result of the “visibility” between earthquakes is rather destroyed (similar as in the VG(C) network). When generating networks with increas- ingly lar ger cells, the components consist of an increasingly lower number of nodes, where the number of components remains approximately the same or decreases very slowly . In the NTSW(E) construction, we generate two windows around each of the earthquakes - a spatial and a temporal window , the sizes of which are determined by the magni- tude of the individual earthquakes. W e connect each earth- quake with all the earthquakes within the intersection of the two windows. In the NTSW(E) network, as well as in the SVG(E) network, earthquakes are connected only to those earthquakes that occured spatially close together . However , the connection between earthquakes in the NTSW(E) network is also influenced by the temporal dis- tance between earthquakes, while in the SVG(E) network, the connection between earthquakes is influenced by their “visibility” to each other . Thus, certain pairs of earthquakes in the NTSW(E) network are connected because they oc- curred close together in time, while in the SVG(E) network, these same pairs of earthquakes might not be connected be- cause they might not “see” each other , and vice versa. By mer ging nearby earthquakes into cells, we gener - ate the NTSW(C) network. Figure 1 1 shows some of the networks generated by the NTSW(C/E) construction. The temporal and spatial windows belonging to earthquakes of higher magnitudes have a rather lar ge temporal and spatial range (Figure 4). For this reason, NTSW(E) networks gen- erated in time intervals in which a major earthquake oc- curs contain a higher number of connections than networks Methods for Representing Earthquake T ime Series with Networks Informatica 48 (2024) 1–26 1 1 Figure 9: Some networks generated by the NTS(C) ((a)-(c)) and MLN-NTS(C) ((d)-(i)) constructions composed of cells of dif ferent sizes. The MLN-NTS(C) networks consist of T = 2 and T = 3 layers. Networks are generated from the Japanese catalog. The times given in the titles of the images represent the sequential number of a 10-day interval in which a stronger earthquake occurred (t = 382 ,t = 385 ) or the interval in which a stronger earthquake did not occur (t = 300 ). The networks in this Figure and in the rest of the Figures in this section were drawn with the program Gephi [27]. Figure 10: Some networks generated by the VG(C/E) ((a)-(f)) and SVG(C/E) ((g)-(i)) constructions, where the nodes in the VG(C) and SVG(C) networks are composed of cells of dif ferent sizes. Networks in (a)-(c) are generated from the Japanese catalog and the rest from the S. California catalog. The times given in the title of the images represent the sequential number of a 10-day (Japan) or 40-day (S. California) interval in which a stronger earthquake occurred (t = 382 , t = 105 ) or the time interval in which a stronger earthquake did not occur (t = 300 ,t = 001 ). generated by other constructions. The number of connec- tions slowly decreases with the conversion of the NTSW(E) into the NTSW(C) network and with increasingly lar ger cell sizes in the NTSW(C) network (Figure 1 1). When generating networks according to the NHTSW(E) construction, we connect pairs of earthquakes that have a suf ficiently low value ofn ij calculated according to equa- tion (4) so that the resulting networks have an average de- gree of⟨ k⟩ = 10 . In equation (4),n ij depends on the time dif ference between the occurrence of two earthquakes, the distance between the hypocenters of these earthquakes, and on the magnitude of an earthquake that occurred first in the considered pair of earthquakes. Since the magnitude of the earthquake in equation (4) appears as a power of 10, it has a greater influence on the calculated value ofn ij compared to the product of time and distance. The results (Figure 12) show that in the time intervals in which a stronger earthquake occurs, the distances (and times) between connected earthquakes are shorter on aver - age than in the time intervals in which a stronger earthquake 12 Informatica 48 (2024) 1–26 R. Koželj et al. Figure 1 1: Some networks generated by the NTSW(E/C) construction, where the nodes consist of earthquakes or cells of dif ferent sizes. Networks are generated from the S. California catalog. The times given in the title of the im- ages represent the sequential number of a 40-day interval in which a stronger earthquake occurred (t = 105 ) or the time interval in which a stronger earthquake did not occur (t = 001 ). did not occur . This is because the highest magnitude in the time interval determines the lowest value ofn ij (the high- est correlation) among earthquakes connected in a network. Since we keep onlym = 5n of the most correlated connec- tions, we ignore connections with n ij above the specific threshold value that dif fers from network to network. This threshold value depends on the lowest value ofn ij in the network and, consequently , on the highest magni- tude in the time interval. In order for the other earthquakes (with magnitudes lower than the one of the strongest earth- quake in the time interval) to be able to form connections in a network, their distance in time and space from the rest of the earthquakes needs to be comparably smaller than distances in time and space corresponding to the strongest earthquake. Otherwise, these weaker earthquakes cannot produce the low enough n ij to form connections in a net- work. This means that most of the connected earthquakes in the NHTSW(E) network lie very close in time and space. For this reason, a very rough transition between the NHTSW(E) and NHTSW(C) constructions can be observed in Figure 12 ((d)→ (e)), where groups of connected earthquakes are sud- Figure 12: Some networks generated by the NHTSW(E/C) construction, where the nodes consist of earthquakes or cells of dif ferent sizes. Networks are generated from the Japanese ((d)-(f)) and the S. California catalog ((a)-(c)). The times given in the title of the images represent the se- quential number of a 10-day (Japan) or 40-day (S. Califor - nia) time interval in which a stronger earthquake occurred (t = 105 ,t = 385 ) or the time interval in which a stronger earthquake did not occur (t = 102 ). denly contained in individual cells. In the NTSW(C/E) construction (Figure 1 1: (c)→ (d)), in which the associated earthquakes are located within the spatial window , a tran- sition from a network with earthquakes to a network with cells is not so pronounced since the spatial windows are rel- atively wide (Figure 4). In the NCV(C) network, the time interval in which the network is generated is divided into T equally long time subintervals. For each of the cells, we generate a vector of length T . The elements of the vector represent some char - acteristic of the cell, calculated from the earthquakes that occurred within the cell in each of the time subintervals. Cells are connected based on the Pearson correlation coef- ficient between pairs of vectors. When generating the NCV(C) network, we also analyze the vectors belonging to each cell. From the analysis of vectors we o bserve that the obtained vectors consist of al- most only zero values. In the majority of the vectors, only one of the elements is non-zero. This means that the earth- quakes in most of the cells occurred within the same time subinterval. Methods for Representing Earthquake T ime Series with Networks Informatica 48 (2024) 1–26 13 Figure 13: Some networks generated by the NCV(C) construction consisting of cells of dif ferent sizes. Networks are generated from the J apanese ((a)-(e)) and S. California catalogs ((f)-(i)). The times given in the title of the images repre- sent the sequential number of a 10-day (Japanese) or 40-day (S. California) time interval in which a stronger earthquake occurred (t = 105 , t = 382 ) or the time interval in which the stronger earthquake did not occur (t = 300 ). In the im- ages, the abbreviation SM indicates networks where the characteristic used to calculate elements of the vectors is the sum of earthquake magnitudes, and the abbreviation ER indicates networks where the characteristic is the amount of ener gy released by earthquakes. As a result, when calculating the Pearson correlation co- ef ficient, we mostly consider two types of pairs of vectors – those that contain a non-zero element at the same posi- tion of the vector and those that contain a non-zero element at dif ferent position of the vector , where all other elements in the vector are equal to 0. In the first case, the value of the Pearson correlation coef ficient corresponding to a given pair of vectors is r = 1 , while in the second case, it is r = − 1/(T − 1) . This is true regardless of the value of the non-zero element (only positive values are considered), which can be deduced from the equation (5). Thus, most pairs of cells are connected by the Pearson correlation value of r = − 1/(T − 1) . However , these connections are not present in the network since|r| = |− 1/(T − 1)| ≤ 0. 95 forT > 2 . Figure 13 shows that NCV(C) networks consist of well- defined groups. These groups are the result of connections between cells with vectors composed of only one non-zero element. The non-zero element in all vectors in the same group appears at the same position. The number of well- defined groups in a network is equal to the number of ele- ments in vectors belonging to the cells. Cell vectors in one group dif fer from cell vectors in another group in the po- sition of the non-zero element. Pairs of cells in which the non-zero element appears at non-overlapping positions are not connected to each other . W ell-defined groups in the network are, in some cases, connected to each other (Figure 13: (b), (d), (h), (i)). The value of|r| > 0. 95 can also be obtained, for example, from a pair of vectors where one of the vectors contains only one non-zero element and the other contains several non- zero elements. In this case, the non-zero component in the first vector must match one of the non-zero components in the second vector 3 . Thus, cells corresponding to vectors with several non-zero elements can also be part of individ- ual well-defined groups. V ectors that contain more non-zero elements can be con- nected to other vectors that also contain several non-zero elements, but are not connected in well-defined groups. In this way , shorter or longer paths can emer ge in the network, starting in individual groups and connecting groups to each other . Whether the network will contain such paths depends on the activity of the area, the distribution of earthquakes in the area, and the size of the cells. In the time intervals in which a stronger earthquake oc- curs, some of the groups are more densely connected than others (Figure 13: (b)-(i)). This is due to the fact that the stronger earthquake did not occur at the beginning of the time interval in which the network is generated but later . If a stronger earthquake occurs in thei -th subinterval, groups consisting of vectors containing a non-zero element atj ≥ i -th index will be more densely connected than those containing a non-zero element atj