Advances in Methodology and Statistics, 2022, 19(1), 31–44. https://doi.org/10.51936/xhhj7682

Complementary results on sex-, age-, and cause-specific mortality in EU countries obtained by SYR symbolic data analysis software

Aleša Lotrič Dolinar (a,∗), Filipe Afonso (b), Simona Korenjak-Černe (a), Edwin Diday (c)

(a) University of Ljubljana, School of Economics and Business, Ljubljana, Slovenia
(b) Symbolic Data Lab, Roissy-Pôle, France
(c) Paris Dauphine University, Research Centre in Mathematics of Decision, Paris, France

Abstract

Different mortality patterns across countries require different health and demographic policies. Positioning of the countries according to their characteristic mortality pattern can help allocate scarce resources appropriately. We use symbolic data analysis within SYR software to analyse 28 European Union countries’ sex-, age-, and cause-specific mortality in 2015. There are two main advantages for using symbolic analysis: (i) it permits more transparent and informative data descriptions along with contextual relations, and (ii) advanced methods adapted for complex data representations can be employed to analyse such data, taking contextual relations into account. Clustering results based on symbolic data analysis show that groups of countries are strongly related to the geographical position of countries, with a clear east–west cut on the first-level partition and with an even more geographically consistent lower-level partition compared to the classical clustering result. Relations between the obtained clusters of countries and their external social and health indicators are well pronounced. We also identify the mortality rates as symbolic variables that discriminate the most between individual countries as well as between the resulting clusters.
Knowledge of a country’s mortality pattern and its position among comparable countries is valuable information for health and demographic policymakers and can be exploited to exchange good practices.

Keywords: cause of death, clustering, health policy, symbolic data, European Union

∗ Corresponding author
Email address: alesa.lotric.dolinar@ef.uni-lj.si (Aleša Lotrič Dolinar)
ORCID iD: 0000-0002-1574-5473 (Aleša Lotrič Dolinar)

1. Introduction

Countries with different sex- and age-specific mortality rate and structure suffer from different health problems and therefore face different health costs. It is important to know the country’s position among other countries to be able to implement proper health policies and to distribute limited resources accordingly and potentially take advantage of good practices of the countries from a more favourably positioned group. Because some illnesses or risk behaviours are more easily prevented or controlled, and consequential deaths therefore postponed (e.g., Crimmins et al., 2011; Heijink et al., 2013; Kerr et al., 2017; Lear et al., 2017; Stewart, 2012), it is essential to analyse what factors cause deaths in certain sex and age groups of the population. Analysis based on data supplemented by causes of death in our opinion produces more precise information about a country’s health situation compared to the more common analysis of sex- and age-specific mortality.

We study mortality in European Union countries using new analytical tools for more complex data representations of sex-, age-, and cause-specific mortality rates. As causes of death are strongly related to sex and age, the input data are provided separately for each sex and age group. One of the aims of our study is to upgrade the results presented in Lotrič Dolinar et al. (2019), where only classical analyses were made. We argue that more relevant results can be achieved with more informative data descriptions, where each sex–age combination is described with two so-called symbolic variables: structure over causes of death, and mortality rate classified into four levels (low, mid-low, mid-high, and high). As such a representation requires appropriate analytical tools, we performed symbolic data analysis (e.g., Afonso, Diday, & Toque, 2018; Billard & Diday, 2006; Diday, 2016; Noirhomme-Fraiture & Brito, 2011), more precisely the clustering algorithms implemented in SYR software (Afonso, Haddad, et al., 2018).

Some other clustering analyses for mortality rates have been conducted previously. For example, Meslé and Vallin (2002) found that at the end of the 20th century the major aspect of European demography was the divergence between East and West, with the line of separation between the two geographical groups corresponding to the former Iron Curtain. Although they analysed life expectancy at birth, not taking the causes of death into account, they explained their findings by the inability of Eastern countries to follow Western countries in the so-called “cardiovascular revolution” (Vallin et al., 2002). We, however, add the cause of death dimension and employ the symbolic data analysis approach in order to account for the variability of the data in a more appropriate way. We thus determine which groups of countries are formed and what the differences are between the groups. With this, we present our data as a symbolic object and identify groups of countries with similar characteristics with clustering methodology for symbolic data.
Since clustering is a very important topic in symbolic data analysis, several methods have been developed (e.g., Billard & Diday, 2020; Brito & Dias, 2022). We conducted our analysis with the SYR program, which also enables us to identify the most discriminating mortality rates or sex, age, and cause combinations between individual countries as well as between resulting clusters. Using these results, we can provide policy makers with more detailed information for properly adjusting and implementing a country’s health policy.

Our paper highlights two main points. First, we want to present the symbolic data analysis, which could be applied to the analysis of other inter-related multi-level data. And second, we believe that the contextual results for our specific example should be interesting and important for policy makers, which we elaborate in the concluding section.

The rest of the paper is organized as follows: in the second part we describe the original data and how they were transformed into the symbolic form, as well as the adapted methods implemented in SYR software that we used for our study. In the third part we discuss our results for the mortality data of EU countries for 2015 in more detail and compare them with the results obtained with classical clustering methods. Finally, we conclude the paper with a discussion and conclusion.

2. Methodology

2.1. Methods

Intensity of mortality by certain cause of death heavily depends on sex and age. Therefore, our initial data consist of the number of deaths by sex, age, and cause of death (Eurostat, 2018a) and the corresponding size of the population (Eurostat, 2018c).
In our study, we use 3-year-average standardized number of deaths by four causes of death within 36 sex–age groups (18 5-year age groups for each sex) for 28 EU countries in the year 2015. The analyzed causes are the three most important causes of death according to ICD-10 (World Health Organization, 2010), responsible for over 70% of all deaths in the EU (Eurostat, 2018b): neoplasms, diseases of the circulatory system, and diseases of the respiratory system, plus the residual group “other causes”. The same data were used in Lotrič Dolinar et al. (2019), where the study is based on the classical clustering methods.

Since the data that relate to each sex–age combination represent structure of deaths over causes of death, we transform them into a representation that takes this relation into account and present them in the more informative symbolic data table. Instead of units with single observed values, symbolic data analysis deals with classes of individuals that are considered as higher-level units and thus constitute a new population of higher-level units with their own structure (Diday, 2016). To preserve their internal variability, intervals, histograms, probability distributions, bar charts, etc. are used for descriptions of such units. These types of data are called “symbolic” as they cannot be reduced to single numbers without a loss of much information (Diday, 2016).

2.2. Data structure

Original data are presented with a classical data matrix that has 28 rows or units (representing 28 EU countries) and 144 columns or numerical variables (representing standardized number of deaths for both sexes, 18 age groups, and four categories for cause of death).
Since the values for sex-, age-, and cause-specific mortality (i.e., mortality related to specific sex, specific age group, and specific cause of death, calculated for all possible combinations) are statistically and also contextually related (by each sex–age combination they represent structure of deaths over cause of death), we argue that such information cannot be clearly seen from the data representation in a classical data table form where each cell contains only a number or category. Therefore, we transform the classical data matrix into a symbolic data table where each cell contains a bar chart. This form enables the inclusion of contextual dependence through the so-called symbolic variables. The symbolic data still consist of 28 units or rows (representing EU countries), but the number of columns is reduced to 72 symbolic variables, where each of 36 sex–age combinations (18 age groups for each sex) is represented with:

1. a bar chart showing relative structure over causes of death, described by the relative frequency of the four cause-of-death categories, i.e., 1 = Neoplasms, 2 = Circulatory, 3 = Respiratory, and 4 = Other; and
2. a numerical value representing mortality rate (i.e., the number of deaths per certain number of people (100 000 in our case) per year) discretized into bar charts with four categories: low, mid-low, mid-high, and high level.
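As a rough illustration of the two symbolic variables listed above, the following Python sketch converts one sex–age segment of the classical table into a cause structure and a discretized mortality level. The input values are the M[0–4] entries shown in Table 1. Treating the group's mortality rate as the sum of its four cause-specific values, the function names, and the cut points are assumptions made only for this example; the paper derives separate boundaries for each sex–age combination with an adapted Fisher algorithm, which is not reproduced here.

```python
CAUSES = ("Neoplasms", "Circulatory", "Respiratory", "Other")
LEVELS = ("low", "mid-low", "mid-high", "high")

# Standardized numbers of deaths for men aged 0-4 (M[0-4]) from Table 1.
classical_segment = {
    "AT": (0.05, 0.01, 0.04, 2.02),
    "BE": (0.06, 0.04, 0.08, 2.13),
    "BG": (0.10, 0.36, 0.53, 3.80),
}

# Hypothetical cut points between the four levels (assumption, not from the paper).
ASSUMED_BOUNDARIES = (2.5, 3.5, 4.5)


def cause_structure(values):
    """Relative frequency of each cause within one sex-age group."""
    total = sum(values)
    return {cause: value / total for cause, value in zip(CAUSES, values)}


def mortality_level(rate, boundaries=ASSUMED_BOUNDARIES):
    """Map a rate to one of four ordered levels using ascending cut points."""
    return LEVELS[sum(rate > b for b in boundaries)]


def to_symbolic(values):
    """Return the two symbolic descriptions for one sex-age group."""
    rate = sum(values)  # assumption: group rate = sum over the four causes
    return {"cause": cause_structure(values), "level": mortality_level(rate)}


symbolic_segment = {c: to_symbolic(v) for c, v in classical_segment.items()}

for country, description in symbolic_segment.items():
    shares = ", ".join(f"{k}={s:.2f}" for k, s in description["cause"].items())
    print(country, description["level"], shares)
```

Four classical columns thus collapse into two symbolic cells per country, which is how 144 numerical columns become 72 symbolic variables.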
The mortality rate variable for each sex–age combination was separately discretized into these four categories (there are four categories in order to discretize in the same manner as for the four causes) using an adapted Fisher algorithm (Diday et al., 2013) depending on the standardized number of deaths per 100 000 people; for every sex–age combination we got different boundaries between the four resulting categories in order to polarize the values as much as possible. To ensure cross-country comparability by controlling for different population structures by sex and age, and to compare our results with the classical results from Lotrič Dolinar et al. (2019), we used the standardized mortality rates, which we calculated by applying country-specific mortality rates to a standard population, using the combined actual population of all analysed countries as the standard population.

The transformation from classical to symbolic data is illustrated in Figure 1. The segment of the classical data table is presented in Table 1, while the produced symbolic data table for the symbolic variables of the type Sex[age-group]Cause is presented in Table 2.

Figure 1: Transformation of the original sex-, age-, and cause-specific mortality rates data table to a symbolic data table

Table 1: A segment of the classical data table for sex-, age-, and cause-specific mortality data for 28 EU countries in 2015

Cause          Neo   Circ  Resp  Othr  Neo   Circ  Resp  Othr  Neo   ⋯
Age            0–4   0–4   0–4   0–4   0–4   0–4   0–4   0–4   5–9   ⋯
Sex            M     M     M     M     W     W     W     W     M     ⋯
Country        M[0–4]                  W[0–4]                  ⋯
Austria (AT)   0.05  0.01  0.04  2.02  0.05  0.01  0.02  1.58  0.04  ⋯
Belgium (BE)   0.06  0.04  0.08  2.13  0.06  0.04  0.04  1.62  0.07  ⋯
Bulgaria (BG)  0.10  0.36  0.53  3.80  0.11  0.28  0.50  2.89  0.12  ⋯
⋮

Notes: Neo = Neoplasms, Circ = Circulatory, Resp = Respiratory, Othr = Other.

3. Application

3.1. Representation with symbolic data table

In this section we present some of the adapted methods of the SYR software that were used in our analysis.

One of the important advantages of symbolic data analysis (SDA) is the presentation of the input data and the resulting clusters. A symbolic data table includes contextual relations of the columns of the classical data matrix (i.e., for each sex–age combination we present structures of deaths by cause), while mortality rates are much easier to read if we use different colours for each of the four levels. Thus, the first tool that we used for this study is the data representation in the form of a symbolic data table. We present a segment of the symbolic data table in Table 2 where the ordering of the categories for causes of death is fixed and coloured as follows: Neoplasms (light gray), Circulatory (black), Respiratory (dark gray), Other (white). From the symbolic data table we can very easily notice, for example, that:

1. Bulgaria (BG) and Romania (RO) have rather similar bar charts for most of the (presented) sex–age combinations, especially in older age groups;
2. there is only one cause of death for Malta (MT) for 5–9-year-old boys;
3. the category Neoplasms (1) is more pronounced in the older age groups in Austria (AT), Belgium (BE), and Malta (MT) than in Bulgaria (BG) and Romania (RO), etc.

Clearly, such data representation is much more informative and intuitive than a classical data table containing just a number in each cell.

3.2. Data analysis with SYR software

Since we have many countries and still too many symbolic variables to search for similarities and differences just by “manually” observing the complete symbolic data table, we need additional tools to reduce the size of the data set into groups of similar countries.

Because many variables are strongly correlated, we first reduce the number of variables with principal component analysis (PCA) and present the data on a factor plane. To obtain the explanatory power of the factor axes, we produce the correlation circle and PCA values adapted for bar charts (Diday, 2013); see Figures 2 and 3. From here, the most discriminating input variable of the first and second principal component (PC) axis, PC1 and PC2, can also be detected as the one with the longest distance from the origin. From such observations, it is also possible to make inferences about some correlations between input variables, indirectly via the principal components. For example, in the correlation circle for the input variables representing mortality rate (Figure 2) the sex–age combinations with high mortality expectedly appear on the opposite side of the first axis from the sex–age combinations with low mortality. Observing categories of causes of death (Figure 3), we can see, for example, that the cause Circulatory appears most frequently on the left side of the factor plane of the first two axes, and the cause Neoplasms appears very frequently on the right side of the plane. The vertical axis (the upper part of the plane) is related to the cause Other. We also observe that along the first axis the longest distances from the origin appear for mortalities for older ages, while along the second axis the longest distances from the origin appear for mortalities for younger ages.
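To make the reading of a correlation circle concrete, the following sketch runs an ordinary PCA on a small correlation matrix and derives each variable's coordinates on the first two axes. It is an illustration only: the paper uses a PCA adapted for bar charts (Diday, 2013), whereas this code is plain PCA computed by power iteration, and the variable names and values are hypothetical toy data rather than the mortality data.

```python
from math import sqrt

# Toy data (hypothetical): category frequencies observed on six units.
data = {
    "old_high":   [0.9, 0.8, 0.7, 0.2, 0.1, 0.0],
    "old_low":    [0.1, 0.2, 0.2, 0.7, 0.9, 1.0],
    "old_mid":    [0.8, 0.9, 0.6, 0.3, 0.2, 0.1],
    "young_high": [0.6, 0.1, 0.5, 0.5, 0.1, 0.6],
    "young_low":  [0.3, 0.8, 0.4, 0.4, 0.9, 0.3],
}


def standardize(col):
    """Centre a column and scale it to unit (population) standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = sqrt(sum((x - mean) ** 2 for x in col) / n)
    return [(x - mean) / sd for x in col]


def correlation_matrix(columns):
    z = [standardize(c) for c in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def leading_eigenpair(mat, iters=1000):
    """Largest eigenvalue and eigenvector of a symmetric PSD matrix."""
    p = len(mat)
    v = [float(i + 1) for i in range(p)]  # asymmetric start vector
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * mat[i][j] * v[j] for i in range(p) for j in range(p))
    return lam, v


def correlation_circle(columns):
    """Correlation of every variable with PC1 and PC2."""
    r = correlation_matrix(columns)
    p = len(r)
    lam1, v1 = leading_eigenpair(r)
    # Deflation removes the first component so the second becomes the leading one.
    deflated = [[r[i][j] - lam1 * v1[i] * v1[j] for j in range(p)] for i in range(p)]
    lam2, v2 = leading_eigenpair(deflated)
    # For standardized variables, corr(x_j, PC_k) = sqrt(lambda_k) * v_k[j].
    return [(sqrt(lam1) * a, sqrt(lam2) * b) for a, b in zip(v1, v2)]


names = list(data)
coords = dict(zip(names, correlation_circle([data[n] for n in names])))
for name, (x, y) in coords.items():
    print(f"{name:11s} PC1={x:+.2f} PC2={y:+.2f} distance={sqrt(x * x + y * y):.2f}")
```

In this toy setting the variables farthest from the origin along an axis are the ones that axis separates best, and negatively correlated variables fall on opposite sides of it. The sign of each axis is arbitrary.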
Symbolic cause and mortality variables can be presented with linear combinations of projections of relevant original variables on principal components with the angles of these projections representing weights (Diday, 2013). In such a way, we can detect correlations (indirectly through correlations with principal component axes) between our symbolic variables and also evaluate their importance in the sense of discriminating power between individual countries as well as between the resulting clusters. Moreover, SYR software enables us to present each country on the factor plane, as well as to observe each symbolic variable in more detail.

Table 2: Several segments of the symbolic data table for sex-, age-, and cause-specific mortality data for 28 EU countries for symbolic variables of the type Sex[age-group]Cause

[Table body: one bar chart per country (Austria (AT), Belgium (BE), Bulgaria (BG), ⋯, Malta (MT), ⋯, Romania (RO), ⋯), sex (M, F), and age group (0–4, 5–9, ⋯, 75–79, 80–84, 85+).]

Notes: Neoplasms (light gray), Circulatory (black), Respiratory (dark gray), Other (white).

Figure 2: Correlation circle with some of the input variables representing mortality level

Figure 3: Correlation circle with some of the categories representing causes of death

From Figures 4 and 5 we can see that on the left side of the plane there are eastern EU countries and on the right side western EU countries.
Roughly speaking, relating to the correlation circles we can say that in 2015 eastern EU countries had higher mortality rates and their most pronounced cause of death was circulatory diseases, while western EU countries had lower mortality rates and their most pronounced cause of death was neoplasms. Because this claim is rather superficial, we were interested in observing the symbolic variables in more detail. We do this in two ways: by observing values for each symbolic variable separately, or by forming groups of similar countries and then finding characteristics of these groups to obtain common (symbolic) descriptions of the countries in each such group.

To demonstrate the first possibility, we present the more detailed results, for example, for women of age group 65–69 (this sex–age combination is chosen as one of the most discriminative symbolic variables of the “cause” type) in Figure 4. Here, we present structures over cause of death with pie charts, and we can notice that countries on the left side have the largest segments in light blue (Circulatory), but when we go from left to right the red segment (Neoplasms) becomes larger and larger.

Figure 4: Positioning of individual countries according to the first two PC axes by structure over cause of death for symbolic variable W[65–69]Cause (structure of deaths by four cause categories for women aged 65–69 years)

To detect groups of similar EU countries, we performed an adapted k-means clustering method based on the first two factor axes. We identify four main groups. Obtained clusters are presented in Figure 5. From Figure 5 it can be clearly seen that there are two groups of eastern EU countries on the left side, and two groups of western EU countries on the right.
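The clustering step can be sketched as follows. This is plain Lloyd-style k-means on two-dimensional factor scores with a deterministic farthest-first initialization, not the adapted k-means implemented in SYR, and the coordinates are hypothetical toy values rather than the countries' actual positions on the factor plane.

```python
from math import dist

# Hypothetical (PC1, PC2) scores for eight unnamed units; not the paper's data.
scores = [
    (-3.0, 1.0), (-2.8, 1.3),    # left, upper
    (-3.1, -1.2), (-2.7, -0.9),  # left, lower
    (2.5, 1.1), (2.9, 0.8),      # right, upper
    (2.6, -1.0), (3.0, -1.3),    # right, lower
]


def kmeans(points, k, iters=100):
    """Assign each 2-D point to one of k clusters (Lloyd's algorithm)."""
    # Farthest-first initialization: deterministic and spreads the seeds out.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centre for every point.
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


labels, centers = kmeans(scores, k=4)
for point, label in zip(scores, labels):
    print(point, "-> cluster", label)
```

With well-separated groups such as these, the four clusters correspond to the four quadrants of the plane, which mirrors the left/right and upper/lower reading of Figure 5.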
This implies that the groups of countries based on their mortality rates and main causes of death are very much related to the geographical position of the countries.

Figure 5: Four clusters of EU countries obtained with SYR clustering program based on the symbolic data description of sex-, age-, and cause-specific mortality in 2015

To observe characteristics of the country clusters we can present them in a symbolic data table, in a similar way as it was done in Table 2 for individual countries. Figure 6 shows a segment of this presentation with bar charts for the five most discriminating sex–age combinations across the countries in a certain cluster obtained with the SYR program. In bar charts, only the non-zero categories are presented. The numbers below each column are related to the category value, and each category is of a different colour for easier observation.

Figure 6: Symbolic data table for the four clusters of EU countries based on the sex-, age-, and cause-specific mortality in 2015, showing the first five most discriminating symbolic variables (columns), with the following mortality variable categories: 1 = low, 2 = mid-low, 3 = mid-high, and 4 = high

4. Results

Besides representing the input variables in a much more transparent manner, we focused also on the identification of groups of EU countries with similar mortality patterns, considering both dimensions: mortality rate and the mortality structure by main causes of death. At a glance, we can see clear division into eastern and western countries from a factor plane in Figure 5. We compare our results with the classical analysis results (Lotrič Dolinar et al., 2019), where the authors applied classical Ward and k-means methods on the same original data and obtained these four clusters of countries:

1. West 1 (8 countries): Belgium (BE), Denmark (DK), France (FR), Luxembourg (LU), the Netherlands (NL), Portugal (PT), Spain (ES), and the United Kingdom (UK);
2. West 2 (10 countries): Austria (AT), Cyprus (CY), Finland (FI), Germany (DE), Greece (EL), Ireland (IE), Italy (IT), Malta (MT), Slovenia (SI), and Sweden (SE);
3. East 1 (6 countries): Croatia (HR), Czechia (CZ), Estonia (EE), Hungary (HU), Poland (PL), and Slovakia (SK); and
4. East 2 (4 countries): Bulgaria (BG), Latvia (LV), Lithuania (LT), and Romania (RO).

With the presented SDA method we obtain exactly the same division into eastern and western countries. However, the two groups within the eastern cluster and two within the western cluster are identified differently:

1. West 1 (12 countries): Austria (AT), Belgium (BE), Denmark (DK), Finland (FI), France (FR), Germany (DE), Ireland (IE), Luxembourg (LU), Portugal (PT), Slovenia (SI), Sweden (SE), and the United Kingdom (UK);
2. West 2 (6 countries): Cyprus (CY), Greece (EL), Italy (IT), Malta (MT), the Netherlands (NL), and Spain (ES);
3. East 1 (8 countries): Croatia (HR), Czechia (CZ), Estonia (EE), Hungary (HU), Latvia (LV), Lithuania (LT), Poland (PL), and Slovakia (SK); and
4. East 2 (2 countries): Bulgaria (BG) and Romania (RO).

More detailed inspection of the resulting clusters of EU countries regarding the causes of death shows that the cause Circulatory is much more pronounced in both eastern groups, while Neoplasms are more pronounced in both western groups, especially compared to the cluster consisting of Bulgaria and Romania, which have the fewest deaths from neoplasms in all ages.
These two countries also have by far the largest number of deaths due to circulatory diseases in all ages. In these two countries we can also detect a much higher percentage of the Respiratory cause in the five youngest age groups (up to the age of 24), while this cause represents considerably more deaths in the three oldest age groups (above the age of 75) in the two western clusters. On the western side, there are more deaths due to neoplasms in the countries from cluster West 2 for men aged 55–69 and for women aged 30–59 compared to cluster West 1. Additionally, in cluster West 2 there are also more deaths from circulatory diseases for men up to age 69 and for women aged 30–39 compared to cluster West 1. The residual group Other causes dominates up to the age of 50 years for men and up to the age of 40 years for women in all four clusters and becomes very pronounced again after the age of 75 for both sexes of both western clusters, while in the Bulgaria-Romania cluster the share of deaths from other causes is considerably smaller compared to the other three clusters.

The most obvious difference in cluster characteristics when comparing the result obtained through the presented SYR method with the classical result is the number of deaths caused by respiratory diseases for children in the two eastern clusters. With only Bulgaria and Romania forming cluster East 2, its share of respiratory deaths for children is much higher compared to cluster East 1. Latvia and Lithuania, the two eastern countries that are placed differently than in the classical result, have overall mortality much closer to cluster East 1 obtained by SYR than to Bulgaria and Romania.
Moreover, the Neoplasms pattern of Latvia and Lithuania for women of all ages and men aged 15–49 is closer to cluster East 1, and that also holds for deaths due to respiratory diseases and the cause Other for both sexes and almost all ages, while deaths from circulatory diseases are somewhere in between the two eastern clusters.

On the western side, the grouping is even more different compared to the classical result. Cluster West 2 seems predominantly Mediterranean, apart from the Netherlands. The neoplasms mortality rates in the Netherlands, however, resemble much more the Mediterranean cluster than the rest of the western countries, while for circulatory diseases the opposite is true.

In short, the division between eastern and western countries is the same as with the classical approach, but the division into two lower-level groups for the western countries is quite different. Although the new partition of the 28 EU countries into four clusters is not directly related to some known grouping of these countries, it still represents geographically consistent clusters, even more so than with the classical result. Contrary to the classical result where the same two larger clusters (eastern and western) were each further divided in an east–west sense on a lower level, the partition within the same two larger clusters is now basically in the north–south direction.

Moreover, we also performed some matching between the resulting clusters and clusters of countries based on different social and health indicators.
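The paper measures such matching between partitions with the adjusted Rand index (Hubert & Arabie, 1985). As a minimal sketch of that index, the code below compares the two four-cluster partitions listed earlier in this section, the classical one and the SDA one. This particular comparison is an illustration and is not reported by the authors; the function names are chosen only for this example.

```python
from collections import Counter
from math import comb

# Cluster memberships as listed in the Results section (country codes).
classical = {
    "West 1": "BE DK FR LU NL PT ES UK",
    "West 2": "AT CY FI DE EL IE IT MT SI SE",
    "East 1": "HR CZ EE HU PL SK",
    "East 2": "BG LV LT RO",
}
symbolic = {
    "West 1": "AT BE DK FI FR DE IE LU PT SI SE UK",
    "West 2": "CY EL IT MT NL ES",
    "East 1": "HR CZ EE HU LV LT PL SK",
    "East 2": "BG RO",
}


def labels_by_country(partition):
    """Turn {cluster: 'codes ...'} into {country code: cluster}."""
    return {code: name for name, codes in partition.items() for code in codes.split()}


def adjusted_rand_index(a, b):
    """Adjusted Rand index of two labelings given as {unit: label} dicts."""
    units = sorted(a)
    cells = Counter((a[u], b[u]) for u in units)  # contingency table
    rows = Counter(a[u] for u in units)
    cols = Counter(b[u] for u in units)
    same_both = sum(comb(n, 2) for n in cells.values())
    same_a = sum(comb(n, 2) for n in rows.values())
    same_b = sum(comb(n, 2) for n in cols.values())
    expected = same_a * same_b / comb(len(units), 2)  # agreement expected by chance
    maximum = (same_a + same_b) / 2
    return (same_both - expected) / (maximum - expected)


def east_west(labels):
    """Coarsen a four-cluster labeling to the first-level East/West split."""
    return {unit: name.split()[0] for unit, name in labels.items()}


cl = labels_by_country(classical)
sy = labels_by_country(symbolic)
ari_four = adjusted_rand_index(cl, sy)
ari_two = adjusted_rand_index(east_west(cl), east_west(sy))
print(f"four clusters: {ari_four:.3f}, east-west split: {ari_two:.3f}")
```

Coarsening both partitions to the East/West level gives an index of 1, in line with the statement that the first-level division is identical, while the four-cluster index is clearly below 1 because the lower-level groups differ.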
We separately grouped individual countries for several social indicators: social system (Sapir, 2006), health expenditure, alcohol consumption, tobacco smoking (Eurostat, 2016; World Health Organization, 2015a, 2015b), and the EuroHealth Consumer Index (Björnberg, 2016). Based on these results, we evaluated the correspondence between each social indicator clustering result and our mortality-rate–cause-of-death clustering result using the adjusted Rand index (Hubert & Arabie, 1985). Comparing the findings of the same procedure also for the classical clustering result, the correspondence between social indicator clusters and the symbolic data clustering result is better than with the classical result.

Besides forming and comparing clusters of countries, the SYR program enables us to identify those symbolic variables that discriminate the most between the four resulting clusters, as well as between the individual countries. Based on the cluster representation (Figure 6), it turns out that the five most discriminating symbolic variables are all related to the mortality rate, not to a certain cause of death. The largest differences among the four clusters are in the mortality rate for men in the 25–29-year age group, the highest one being in the first cluster (Bulgaria and Romania). The same is true also for the mortality rate for women in the five-year age groups from 55 to 69, and for young women in the 20–24-year age group.

5. Discussion

The main aim of the paper is to show the usefulness of a new tool for dealing with complex data using symbolic analysis. This approach is presented on a case study of sex-, age-, and cause-specific mortality data for EU countries in 2015. The main advantages of the new methods are these:

1. more intuitive and informative presentation of the countries based on the input data with the symbolic data table that enables us to also present contextual relations;
2. use of adapted classical statistical and machine learning methods (e.g., PCA, clustering) for this type of data representation (Diday, 2020);
3. more informative presentation of the resulting clusters of countries in a symbolic data table for clusters; and
4. column ordering (in our case mortality rates and structures of deaths by cause at different sex–age combinations) from the most to the least discriminating.

Since the values for sex-, age-, and cause-specific mortality are statistically and also contextually related (for each sex–age combination they represent structure of deaths by cause of death), we argue that such information cannot be clearly observed from data representation in a classical data matrix form. Therefore, we suggest displaying the data in a symbolic table, as this enables the presentation of contextual dependence through symbolic variables. Such a presentation offers a much more intuitive and therefore relevant view of the data. To be able to also include these relations in a further analysis of the data, we used adapted tools implemented in the SYR software. We argue that based on the more informative data description, the results obtained using a symbolic data analysis approach are also more relevant compared to the classical result.

We observe countries in the plane of the first two obtained principal component axes where the contrast between mortality due to circulatory diseases and mortality due to neoplasms is particularly pronounced.
This division is clearly in line with the major east–west partition of the analyzed countries, with higher mortality from the diseases of the circulatory system in the eastern countries and higher mortality from neoplasms in the western countries. We identify the variables that best discriminate between the single countries, as well as between the clusters of countries; these variables are the mortality rates (not structures by cause) in both cases.

The clustering result shows that the two-cluster partition resulting from the presented approach is exactly the same as with classical clustering, but the lower-level partitions differ substantially. The new four resulting groups are even more geographically consistent, with a more north–south division within each of the two larger first-level clusters (East, West). Additionally, the new partition also corresponds better to different social and health indicators of the analysed countries and is still in line with findings about the East lagging in the “cardiovascular revolution” compared to the West (Meslé & Vallin, 2006); specifically, we found that lower mortality (from cardiovascular diseases) is related to a lower population share of individuals who smoke, lower alcohol consumption, and higher health expenditure per capita (Eurostat, 2016; World Health Organization, 2015a, 2015b).

However, because the symbolic representation of the data is complex, it is very difficult to observe correlations between the symbolic variables, and to the best of our knowledge there is not yet a definite method of adapting this concept to a symbolic approach. Therefore, we were only able to overcome this limitation by making inferences about the correlations indirectly via the principal components.
The symbolic data procedure used in the presented analysis could also be applied to the analysis of any other interrelated multi-level data, especially if clustering methods are also used. Concerning contextual findings for our specific example, we believe that they should be interesting and important for policymakers, which we elaborate on in the following concluding section.

6. Conclusion

In light of ever-aging populations and related increasing health costs, it is crucial to distribute limited health resources as optimally as possible. Looking toward better-performing countries can help health and demographic decision makers, especially in the area of preventable deaths, where policymakers can take advantage of the information we provide based on the presented analysis. When analysing data in a more complex sense using symbolic analysis, we still detect a clear East–West division along the former Iron Curtain as with the classical result, with the West performing better in the area of circulatory diseases and overall mortality rate, and with the East performing better in the area of neoplasms. Further partition, however, is substantially different from the classical result. Within each of the two broad groups of countries, eastern and western, we now observe a more geographically consistent division in the north–south direction. Symbolic data analysis methods also reveal the best discriminating sex–age combinations among countries and also among the four obtained clusters of countries for mortality rates and for distributions of deaths over cause of death.
In addition, matching between the resulting clusters and clusters of countries based on different social and health indicators confirmed that the SYR clusters are matched better than the classical clusters from the aspects of social system, health expenditure, alcohol consumption, tobacco smoking, and the EuroHealth Consumer Index. As these are all factors that can be directly influenced by appropriate policies, this finding can represent a strong incentive for the countries to look toward, and thus aspire to replicating the health indicators of, the countries from the better-performing clusters.

References

Afonso, F., Diday, E., & Toque, C. (2018). Data science par analyse des données symboliques. Editions Technip.
Afonso, F., Haddad, R., Toque, C., Eliezer, E.-S., & Diday, E. (2018). User manual of the SYR software. https://www.symbad.co/le-logiciel-syr/
Billard, L., & Diday, E. (2006). Symbolic data analysis: Conceptual statistics and data mining. Wiley.
Billard, L., & Diday, E. (2020). Clustering methodology for symbolic data. Wiley. https://doi.org/10.1002/9781119010401
Björnberg, A. (2016). EuroHealth Consumer Index 2015: Report. Health Consumer Powerhouse. https://healthpowerhouse.com/media/EHCI-2015/EHCI-2015-report.pdf
Brito, P., & Dias, S. (Eds.). (2022). Analysis of distributional data. Chapman & Hall. https://doi.org/10.1201/9781315370545
Crimmins, E. M., Preston, S. H., & Cohen, B. (Eds.). (2011). Explaining divergent levels of longevity in high-income countries. National Academies Press. https://doi.org/10.17226/13089
Diday, E. (2013). Principal component analysis for bar charts and metabins tables. Statistical Analysis and Data Mining, 6(5), 403–430. https://doi.org/10.1002/sam.11188
Diday, E. (2016). Thinking by classes in data science: The symbolic data analysis paradigm. Wiley Interdisciplinary Reviews: Computational Statistics, 8(5), 172–205. https://doi.org/10.1002/wics.1384
Diday, E. (2020). Explanatory tools for machine learning in the symbolic data analysis framework. In E. Diday, R. Guan, G. Saporta, & H. Wang (Eds.), Advances in data science: Symbolic, complex and network data (pp. 1–30). Wiley. https://doi.org/10.1002/9781119695110.ch1
Diday, E., Afonso, F., & Haddad, R. (2013). The symbolic data analysis paradigm, discriminant discretization and financial application. In R. Guan, Y. Lechevallier, G. Saporta, & H. Wang (Eds.), Revue des nouvelles technologies de l’information: Vol. RNTI-E-25. Advances in theory and applications of high dimensional and symbolic data analysis (pp. 1–14). Editions RNTI.
Eurostat. (2016). Smoking of tobacco products by sex, age and educational attainment level [Data Set]. European Commission. http://data.europa.eu/88u/dataset/varzwbq6vy3cfkw2ubitza
Eurostat. (2018a). Causes of death: Deaths by NUTS 2 region of residence and occurrence, 3 year average [Data Set]. European Commission. http://data.europa.eu/88u/dataset/m0ppsecjhfgxfmvng9gdg
Eurostat. (2018b). Causes of death: Deaths by country of residence and occurrence [Data Set]. European Commission. http://data.europa.eu/88u/dataset/uiak4pd0lanocottq4ebq
Eurostat. (2018c). Population on 1 January by age and sex [Data Set]. European Commission. http://data.europa.eu/88u/dataset/wjwcoscim2vainua6qufq
Heijink, R., Koolman, X., & Westert, G. P. (2013). Spending more money, saving more lives? The relationship between avoidable mortality and healthcare spending in 14 countries. The European Journal of Health Economics, 14, 527–538. https://doi.org/10.1007/s10198-012-0398-3
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193–218. https://doi.org/10.1007/BF01908075
Kerr, J., Anderson, C., & Lippman, S. M. (2017). Physical activity, sedentary behaviour, diet, and cancer: An update and emerging new evidence. The Lancet Oncology, 18(8), E457–E471. https://doi.org/10.1016/S1470-2045(17)30411-4
Lear, S. A., Hu, W., Rangarajan, S., Gasevic, D., Leong, D., Iqbal, R., Casanova, A., Swaminathan, S., & Yusuf, S. (2017). The effect of physical activity on mortality and cardiovascular disease in 130 000 people from 17 high-income, middle-income, and low-income countries: The PURE study. Lancet, 390(10113), 2643–2654. https://doi.org/10.1016/S0140-6736(17)31634-3
Lotrič Dolinar, A., Sambt, J., & Korenjak-Černe, S. (2019). Clustering EU countries by causes of death. 38, 157–172. https://doi.org/10.1007/s11113-019-09518-1
Meslé, F., & Vallin, J. (2002). Mortality in Europe: The divergence between East and West. Population, 57(1), 157–197. https://doi.org/10.3917/popu.201.0171
Meslé, F., & Vallin, J. (2006). The health transition: Trends and prospects. In G. Caselli, J. Vallin, & G. Wunsch (Eds.), Demography: Analysis and synthesis (pp. 247–602). Academic Press.
Noirhomme-Fraiture, M., & Brito, P. (2011). Far beyond the classical data models: Symbolic data analysis. Statistical Analysis and Data Mining, 4(2), 157–170. https://doi.org/10.1002/sam.10112
Sapir, A. (2006). Globalization and the reform of European social models. Journal of Common Market Studies, 44(2), 369–390. https://doi.org/10.1111/j.1468-5965.2006.00627.x
Stewart, B. W. (2012). Priorities for cancer prevention: Lifestyle choices versus unavoidable exposures. The Lancet Oncology, 13(3), e126–e133. https://doi.org/10.1016/S1470-2045(11)70221-2
Vallin, J., Meslé, F., & Valkonen, T. (2002). Trends in mortality and differential mortality. Council of Europe.
World Health Organization. (2010). International statistical classification of diseases and related health problems (10th ed.). https://icd.who.int/
World Health Organization. (2015a). WHO global report on trends in prevalence of tobacco smoking 2015. https://apps.who.int/iris/handle/10665/156262
World Health Organization. (2015b). World health statistics 2015. https://apps.who.int/iris/handle/10665/170250