Advances in Methodology and Statistics / Metodološki zvezki, Vol. 18, No. 2, 2021, 73–88
https://doi.org/10.51936/gktc3784
Time series clustering based on time-varying Hurst exponent
Alex Babiš∗, Beáta Stehlíková
Comenius University, Faculty of Mathematics, Physics and Informatics, Bratislava, Slovakia
Abstract
We consider the problem of clustering time series which are assumed to possess long-term memory. We propose an approach based on combining the results obtained by applying different methods for estimating the time-varying Hurst exponent, and we apply it to Euro exchange rates. Firstly, we fit AR-GARCH models to every time series to reduce the bias of the rescaled range analysis method. We only consider models whose residuals show neither autocorrelation nor ARCH effects; among them we choose the model with the lowest value of the Bayesian information criterion. Afterwards, we estimate the Hurst exponent from the residuals by means of the rolling window approach using four different estimation methods. Vectors of Hurst exponents are clustered for each of the four cases and the clusters are compared in order to obtain the final clustering.
Keywords: Hurst exponent, Clustering, Stock market, Time series, GARCH
1. Introduction
Clustering, also known as cluster analysis, is an important tool for analysing data. Clustering methods partition data into several homogeneous groups called clusters. Clusters are created so that similarity between objects within a specific cluster is maximized, while at the same time similarity between objects that do not belong to the same cluster is minimized. Clustering can be used as a part of exploratory data analysis, as it allows us to gain information from the underlying data without explicit knowledge about the relationships between the objects within. It can also provide useful insights into the structure of the data, and the identification of groups containing similar observations might be of interest on its own. Usually, information about objects is given by a vector of features. In many applications a vector of features arises by observing specific characteristics of an object at different time intervals. The resulting vectors therefore have the form of time series.
Clustering of time series data has found its way into a wide range of areas, such as astronomy, medicine, environmental analyses, etc. We refer the reader to the review paper by Aghabozorgi et al. (2015) for more applications and concrete references. In finance, where our data set also belongs, applications include, for example, clustering stocks for use in portfolio optimization (Han & Ge, 2020; Iorio et al., 2018; Massahi et al., 2020) or clustering aiming to discover the structure of the cryptocurrency market (Song et al., 2019).

∗ Corresponding author
Email addresses: alexbabis96@gmail.com (Alex Babiš), stehlikova@fmph.uniba.sk (Beáta Stehlíková)
Similarly to general cluster analysis, there is a great number of different methods and approaches. Time series clustering can be based on clustering the data directly, on extracting their features, or on models which were fitted to the data. There are also new distance metrics proposed specifically for time series. More details can be found, for example, in survey papers (Aghabozorgi et al., 2015; Fu, 2011) or in a recent book by Maharaj et al. (2019).
Our approach is based on estimating the Hurst exponent of the time series, which is a measure of long-range dependence in the time series. The origin of the Hurst exponent dates back to 1951, when the British hydrologist Harold E. Hurst proposed a method to optimize the storage capacity of reservoirs in an effort to regulate the natural contribution of the Nile river, keeping in mind cyclical trends such as periods of drought and floods. His statistical analysis of the hydrological data was not in accord with the standard models of that time and subsequently led to models describing behaviour that could be characterized as long-range dependence or long memory (O'Connell et al., 2016). Applications of the Hurst exponent in finance include analyses of interest rates (Cajueiro & Tabak, 2009), hedge fund performance (Auer, 2016), energy futures markets (Sensoy & Hacihasanoglu, 2014), cryptocurrencies (Jiang et al., 2018), efficiency of stock markets (Cajueiro & Tabak, 2004) and others.
In the same way as Cajueiro and Tabak (2004), we apply the Hurst exponent estimators to standardized residuals from an AR-GARCH model. It was shown that the presence of short memory can bias the estimated value of the Hurst exponent; using this procedure we filter out the short memory information. In this way our analysis of long-range memory is not affected by short memory effects present in the data. The paper by Lahmiri (2016) used different Hurst exponent estimators to cluster industrial sectors at the Casablanca Stock Exchange. We follow a similar idea in our approach. However, instead of using a single estimate of the Hurst exponent for the whole time series, we use the rolling window approach. In other settings it has been successfully used, among others, in Cajueiro and Tabak (2004), Jiang et al. (2018), and Sensoy and Hacihasanoglu (2014). We use this approach for subsequently clustering the time series of Hurst exponent estimates.
Our contribution therefore lies in combining several approaches which have been used in the literature on financial time series and their Hurst exponents individually, but not in this combination: using residuals from AR-GARCH models for the estimation of the Hurst exponent, using rolling window estimates, and clustering the time series. Furthermore, we propose a network-based method for clustering, based on the results from an arbitrary number of clustering algorithms applied to the data. It can be used in a more general setting, not only with our choices of methods for estimating the Hurst exponents and the clustering procedures.
The rest of the paper is organized as follows. In Section 2 we review the notion of the Hurst exponent and its estimators, which we will use in our analysis. Section 3 presents our data set and Section 4 summarizes the results of GARCH modelling applied to the data. Section 5 shows the Hurst exponent estimates and their clusterings. In Section 6 we compare these clusters and suggest the final clustering of the time series. We conclude the paper in Section 7 with remarks on the methodology, its advantages and shortcomings, and with ideas for future research.
2. Long-range dependence in time series and estimation of the Hurst exponent
If the dependence between observations of a stationary time series that are far apart from each other decreases very slowly as the time distance between them increases, then the time series is said to exhibit long-range dependence or long memory. More specifically, the autocorrelations $\rho(s) = \mathrm{Cor}(X_t, X_{t-s})$ decay to zero so slowly that they are not absolutely summable, i.e. $\sum_{s=0}^{\infty} |\rho(s)| = \infty$. This holds in contrast to ARMA models, for which the autocorrelation function decays exponentially and therefore the sum of its absolute values is finite. Typical long memory processes have $\rho(s) \sim |s|^{-\alpha}$ with $\alpha \in (0,1)$, as $s \to \infty$. Other models of long memory processes include ARFIMA models with fractional differences (in contrast to integer differences in ARIMA models). Long memory processes can also be characterized via the spectrum or via the so-called Hurst exponent. A detailed treatment of long memory processes can be found in Beran (2017).
The Hurst exponent, which we use in our analysis, attains values from the interval $(0,1)$ and, if different from $1/2$, it is linked to the asymptotic behaviour of the autocorrelation function by the relation $\rho(s) \sim H(2H-1)|s|^{2H-2}$. The case $H = 1/2$ corresponds to processes with exponentially decaying autocorrelations, i.e. without long memory. Values $H \in (1/2, 1)$ correspond to persistent processes, while values $H \in (0, 1/2)$ correspond to anti-persistent processes.
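The connection between the decay rate $\alpha$ of the autocorrelations and the Hurst exponent can be written out explicitly. The following lines are only a short worked reading of the asymptotic relation stated above, with the value $H = 0.7$ chosen arbitrarily for illustration:

```latex
\rho(s) \sim H(2H-1)\,|s|^{2H-2} = H(2H-1)\,|s|^{-\alpha},
\qquad \alpha = 2 - 2H .

% Persistent case: positive, non-summable autocorrelations
H \in (1/2, 1) \;\Longrightarrow\; \alpha \in (0,1),\quad H(2H-1) > 0,\quad
\sum_{s} |\rho(s)| = \infty .

% Anti-persistent case: negative autocorrelations for large lags
H \in (0, 1/2) \;\Longrightarrow\; \alpha \in (1,2),\quad H(2H-1) < 0,\quad
\sum_{s} |\rho(s)| < \infty .

% Numerical illustration
H = 0.7:\quad \alpha = 0.6,\quad H(2H-1) = 0.7 \cdot 0.4 = 0.28 .
```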
The oldest and probably the best-known method for estimation of the Hurst exponent is rescaled range (R/S) analysis, originally proposed by Hurst (1951) himself and further developed by Mandelbrot and Wallis (1969). We outline this method according to Weron (2002) and afterwards we explain its modifications which we have used in our analysis, using their implementation in the R package pracma (Borchers, 2019), in particular the function hurstexp().
Let $\{X_t\}_{t=1}^{L}$ be a stationary time series of length $L$. The Hurst exponent can be estimated as follows:
1. The time series of length $L$ is divided into $d$ sub-series of length $n$.
2. For each sub-series, indexed by $m$, the mean $E_m$ and the standard deviation $S_m$ are calculated.
3. The data $X_{i,m}$ are then normalized by subtracting the mean $E_m$:
   $\hat{X}_{i,m} = X_{i,m} - E_m \quad (i = 1, \dots, n)$.
4. The next step is to calculate a new time series of cumulative deviations from the mean value for each sub-period:
   $Y_{i,m} = \sum_{j=1}^{i} \hat{X}_{j,m} \quad (i = 1, \dots, n)$.
5. The range $R_m$ is calculated as
   $R_m = \max\{Y_{1,m}, \dots, Y_{n,m}\} - \min\{Y_{1,m}, \dots, Y_{n,m}\}$.
6. Each range $R_m$ is then rescaled (normalized) by the standard deviation of the corresponding sub-period as $\frac{R_m}{S_m}$.
7. Finally, the mean value of the rescaled range over all sub-series of length $n$ is computed:
   $\frac{R}{S}(n) = \frac{1}{d} \sum_{m=1}^{d} \frac{R_m}{S_m}$.
8. The steps above are repeated for increasing length $n$. Only the values of $n$ which include the first and last points of the time series are used, so $\frac{R}{S}(n)$ is calculated from the same number of observations for each $n$.
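Steps 1 to 7 can be sketched in a few lines of Python. This is a minimal illustration and not the pracma implementation used in the paper; the function name `rescaled_range`, the use of the population form of the standard deviation, and the handling of leftover observations and of constant sub-series are assumptions made here for the sake of a runnable example.

```python
import math

def rescaled_range(x, n):
    """Mean rescaled range R/S(n) over non-overlapping sub-series of length n."""
    d = len(x) // n  # number of complete sub-series (step 1)
    if n < 2 or d < 1:
        raise ValueError("need 2 <= n <= len(x)")
    ratios = []
    for m in range(d):
        sub = x[m * n:(m + 1) * n]
        mean = sum(sub) / n                                     # E_m (step 2)
        std = math.sqrt(sum((v - mean) ** 2 for v in sub) / n)  # S_m, population form (assumption)
        if std == 0:
            continue  # constant sub-series: R_m / S_m is undefined, skipped in this sketch
        cumulative, y = 0.0, []
        for v in sub:
            cumulative += v - mean                              # steps 3 and 4
            y.append(cumulative)
        ratios.append((max(y) - min(y)) / std)                  # steps 5 and 6
    if not ratios:
        raise ValueError("all sub-series are constant")
    return sum(ratios) / len(ratios)                            # step 7
```

If `len(x)` is not a multiple of `n`, this sketch silently drops the trailing observations; step 8 avoids that situation by restricting the admissible values of n.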
It was shown (cf. Di Matteo, 2007; Mandelbrot, 1975; Mandelbrot & Wallis, 1969; Taqqu et al., 1995) that the $\frac{R}{S}$ statistic asymptotically follows the relation
$$\frac{R}{S}(n) \sim c\,n^{H}.$$
Taking logarithms leads to
$$\log\left(\frac{R}{S}(n)\right) \sim H \log(n) + \log(c). \tag{2.1}$$
This means that in order to estimate the value of the Hurst exponent $H$ it is sufficient to run a simple linear regression of $\log\left(\frac{R}{S}(n)\right)$ on $\log(n)$ over a sample of increasing sub-series lengths $n$.
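The regression in Equation (2.1) amounts to an ordinary least-squares fit in log-log coordinates. The sketch below uses synthetic values that follow $c\,n^{H}$ exactly, so it only demonstrates that the slope recovers $H$ and the intercept recovers $\log(c)$; the function name `loglog_slope` and the numbers 1.3 and 0.62 are illustrative assumptions, not values from the paper.

```python
import math

def loglog_slope(ns, rs_values):
    """Least-squares slope and intercept of log(R/S(n)) regressed on log(n)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(r) for r in rs_values]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

ns = [8, 16, 32, 64, 128]
rs_values = [1.3 * n ** 0.62 for n in ns]  # synthetic: exactly c * n^H with c = 1.3, H = 0.62
h_hat, log_c = loglog_slope(ns, rs_values)
```

With real data the points scatter around the line, and the estimate depends on which lengths n enter the regression, which is what motivates the variants described next.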
The algorithm above has been modified in several ways in the literature. The simplest form of the rescaled range analysis is not to split the original time series into sub-series, but rather to consider the whole time series, as suggested originally in Hurst (1951). This leads to only one $\frac{R}{S}(n)$ statistic, which means that taking
$$\frac{\log\left(\frac{R}{S}(n)\right)}{\log(n)}$$
is sufficient to estimate the Hurst exponent $H$. This method is referred to as simplified rescaled range analysis.
Results of the rescaled range analysis can depend on the choice of the lengths of sub-series $n$ that are used as input into the regression. If the starting value of $n$ is chosen as the length of the original time series and then progressively halved, then, unless this starting value is a power of two ($n = 2^{i}$ for some $i$), the last sub-series may be of a different length than all the previous ones. The resulting statistic will be referred to as corrected rescaled range analysis.
A better way to estimate the Hurst exponent $H$ via classical rescaled range analysis is to consider only those lengths of sub-series $n$ for which the length of the original series is a multiple of $n$. This statistic will be referred to as empirical rescaled range analysis.
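The admissible sub-series lengths for this variant are simply the divisors of the series length. A small sketch follows, where the function name and the lower bound `min_length` are hypothetical choices made here (the text does not state a minimum length):

```python
def admissible_lengths(series_length, min_length=8):
    """Sub-series lengths n >= min_length that divide series_length exactly."""
    return [n for n in range(min_length, series_length + 1)
            if series_length % n == 0]

# Example: a window of 252 observations
lengths = admissible_lengths(252)
```

For a length with few divisors (a prime, in the extreme case) almost no values of n remain, which is one practical reason to choose the window length with this in mind.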
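As stated in Annis and Lloyd (1976) and Peters (1994), for small values of $n$, the deviation of the slope in the regression (Equation (2.1)) from its true value is significant even in a simple case, when the underlying process is a Gaussian noise. They approximate the theoretical values of $\frac{R}{S}(n)$ as
$$
E\left(\frac{R}{S}(n)\right) =
\begin{cases}
\dfrac{n-\frac{1}{2}}{n}\,\dfrac{\Gamma\left(\frac{n-1}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n}{2}\right)}\,\displaystyle\sum_{i=1}^{n-1}\sqrt{\dfrac{n-i}{i}} & \text{for } n \le 340,\\[2ex]
\dfrac{n-\frac{1}{2}}{n}\,\dfrac{1}{\sqrt{\frac{n\pi}{2}}}\,\displaystyle\sum_{i=1}^{n-1}\sqrt{\dfrac{n-i}{i}} & \text{for } n > 340,
\end{cases}
$$
where $\Gamma$ is the Euler gamma function. As pointed out in Weron (2002), the Hurst exponent can be estimated more precisely as 0.5 plus the slope from the regression of $\frac{R}{S}(n) - E\left(\frac{R}{S}(n)\right)$ on $\log(n)$. This is referred to as corrected empirical rescaled range analysis.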
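The two-branch formula for the expected value can be evaluated directly. The sketch below is an illustration of the formula as written above, not code from pracma; the function name `expected_rs` is an assumption, and the gamma ratio is computed through `math.lgamma` to avoid overflow for larger n.

```python
import math

def expected_rs(n):
    """Approximate E(R/S(n)) for Gaussian noise, following the two-branch formula."""
    if n < 2:
        raise ValueError("n must be at least 2")
    total = sum(math.sqrt((n - i) / i) for i in range(1, n))
    if n <= 340:
        # Gamma((n-1)/2) / (sqrt(pi) * Gamma(n/2)), via log-gamma for stability
        factor = math.exp(math.lgamma((n - 1) / 2) - math.lgamma(n / 2)) / math.sqrt(math.pi)
    else:
        # large-n approximation of the same gamma ratio
        factor = 1.0 / math.sqrt(n * math.pi / 2)
    return (n - 0.5) / n * factor * total
```

The second branch is the large-n approximation of the gamma ratio in the first branch, so the two branches nearly agree around n = 340.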
3. Data
The data used in our analysis are daily Euro foreign exchange rates in 2018–2020. They are based on a regular daily concertation procedure between central banks across Europe and are made available by the European Central Bank. In particular, we study the exchange rates for the following currencies: USD (United States dollar), JPY (Japanese yen), CZK (Czech koruna), DKK (Danish krone), GBP (Pound sterling), HUF (Hungarian forint), PLN (Polish złoty), RON (Romanian leu), SEK (Swedish krona), CHF (Swiss franc), ISK (Icelandic króna), NOK (Norwegian krone), HRK (Croatian kuna), RUB (Russian ruble), TRY (Turkish lira), AUD (Australian dollar), BRL (Brazilian real), CAD (Canadian dollar), CNY (Chinese yuan renminbi), HKD (Hong Kong dollar), IDR (Indonesian rupiah), ILS (Israeli shekel), INR (Indian rupee), KRW (South Korean won), MXN (Mexican peso), MYR (Malaysian ringgit), NZD (New Zealand dollar), PHP (Philippine peso), SGD (Singapore dollar), THB (Thai baht), and ZAR (South African rand).
In order to make the time series stationary, we follow the standard procedure of working with differences of logarithms of the rates. Figure 1 shows a selection of the data. We note that the volatility of the time series seems to vary in time, which motivates us to use GARCH models for their modelling. Finding particular reasons for nonconstant volatility in the exchange rate data would need a standalone analysis. Here, we only note that this is not a new phenomenon; it has been studied in many papers (e.g., Feng et al., 2021; Kido, 2016; Manasseh et al., 2019; You & Liu, 2020; Zhou et al., 2020).
Figure 1: Sample of the data, differences of logarithms of the selected exchange rates
4. GARCH models
Let us recall that a standard autoregressive AR(p) model for a stationary time series $x_t$ takes the form
$$x_t = \delta + a_1 x_{t-1} + \dots + a_p x_{t-p} + u_t,$$
where the error term $u_t$ is a white noise. The parameters are required to satisfy certain conditions to ensure stationarity of the process (Kirchgässner et al., 2013). However, in financial applications it is often the case that the assumption of a constant variance of the white noise is not consistent with the observed data. The time-varying variance of the data can be captured by GARCH processes, which model the variance $\sigma_t^2$ of the process $u_t$ by the equation
$$\sigma_t^2 = \omega + \alpha_1 u_{t-1}^2 + \dots + \alpha_p u_{t-p}^2 + \beta_1 \sigma_{t-1}^2 + \dots + \beta_q \sigma_{t-q}^2,$$
where again the parameters are required to satisfy stationarity conditions. This process is known as the GARCH(p,q) process; we refer the reader to Kirchgässner et al. (2013) for details.
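For the most common case $p = q = 1$, the variance equation is a simple recursion. The following sketch computes the conditional variances and the standardized residuals $u_t / \sigma_t$ for given parameters; it does not estimate anything (estimation in the paper is done with fGarch), and the function names, the starting value `sigma2_init` and the parameter values in the example are assumptions for illustration only.

```python
import math

def garch11_variance(u, omega, alpha, beta, sigma2_init):
    """Conditional variances sigma_t^2 of a GARCH(1,1) process for given errors u."""
    sigma2 = [sigma2_init]  # starting value is a modelling choice (assumption)
    for t in range(1, len(u)):
        sigma2.append(omega + alpha * u[t - 1] ** 2 + beta * sigma2[t - 1])
    return sigma2

def standardized_residuals(u, sigma2):
    """Errors divided by their conditional standard deviations."""
    return [e / math.sqrt(s) for e, s in zip(u, sigma2)]

u = [1.0, -2.0, 0.5]
sigma2 = garch11_variance(u, omega=0.1, alpha=0.2, beta=0.7, sigma2_init=1.0)
z = standardized_residuals(u, sigma2)
```

The large error at the second time point raises the conditional variance at the third one, which is the volatility clustering effect that the standardization is meant to remove before the Hurst exponent is estimated.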
We use the garchFit() function from the R package fGarch (Wuertz et al., 2020) to estimate GARCH models and to obtain results of the statistical tests necessary for evaluating the models. The residuals of the models are tested in order to assess the suitability of the proposed GARCH models. Following the standard procedures implemented in the fGarch package, we use the Ljung-Box test for the residuals and the squared residuals, and the heteroscedasticity test. In the model selection we consider autoregressive AR(p) processes with orders $p \le 3$ with GARCH(p,q) error terms with orders satisfying $p + q \le 3$. From the models with residuals passing the tests given above at the 5% significance level, we select the model with the lowest Bayesian information criterion. The exchange rates IDR (Indonesian rupiah) and ILS (Israeli shekel) were excluded from the data due to the fact that none of the models considered was suitable for them. The resulting models for the remaining exchange rates are given in Table 1.
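The selection rule described above (keep only models whose residual tests pass, then take the lowest BIC) can be expressed compactly. The sketch below assumes that each fitted candidate has already been summarized as a dictionary with hypothetical keys `name`, `bic` and `test_pvalues`; the numbers are invented for illustration and are not results from the paper.

```python
def select_model(candidates, level=0.05):
    """Lowest-BIC model among those whose residual tests all pass at the given level."""
    admissible = [c for c in candidates
                  if all(p > level for p in c["test_pvalues"])]
    if not admissible:
        return None  # no suitable model, as happened for IDR and ILS in the text
    return min(admissible, key=lambda c: c["bic"])

# Hypothetical summaries of fitted candidates (illustrative numbers only)
candidates = [
    {"name": "AR(0)+GARCH(1,1)", "bic": -7.10, "test_pvalues": [0.40, 0.22, 0.31]},
    {"name": "AR(1)+GARCH(1,1)", "bic": -7.25, "test_pvalues": [0.03, 0.50, 0.45]},
    {"name": "AR(0)+GARCH(1,2)", "bic": -7.05, "test_pvalues": [0.35, 0.60, 0.28]},
]
best = select_model(candidates)
```

Note that the second candidate has the lowest BIC overall but is discarded because one of its tests rejects at the 5% level; filtering comes before ranking.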
Table 1: Autoregressive models with GARCH errors

Model                 Exchange rates
AR(0) + GARCH(1,1)    USD, DKK, HUF, JPY, GBP, SEK, RUB, AUD, CNY, CHF, BRL, HKD, INR, MXN, NZD, CAD, SGD, THB, ZAR
AR(0) + GARCH(2,1)    CZK
AR(0) + GARCH(1,2)    NOK, KRW, MYR, PHP
AR(1) + GARCH(1,1)    HRK, TRY
AR(2) + GARCH(1,1)    PLN
AR(3) + GARCH(1,2)    RON
5. Time-varying Hurst exponents and their clustering
As outlined in the introduction, we use the approach from Cajueiro and Tabak (2004), Jiang et al. (2018), and Sensoy and Hacihasanoglu (2014), and we do not represent a time series by a single estimated Hurst exponent. Instead, we represent it by a sequence of Hurst exponents estimated from shorter time windows in order to capture regime changes within the data. The main reason is that in many financial time series we can observe cycles of irregular length in which the dynamics vary. It is reasonable to assume that this is also true for the Hurst exponent. Another reason for choosing a sequence of Hurst exponents over one particular Hurst exponent estimate is that we might also be interested in studying how the exchange rate dynamics during a specific time window react to the information that dominated during that time.
We choose a rolling window approach to estimate the sequence of Hurst exponents for each exchange rate, with the window size selected to be 252 days, which is approximately one year of data (since the data are available only on business days). This means that for each sequence $\{X_j\}_{j=i}^{i+w-1}$, with $w$ being the size of the window, $n$ the length of the series and $i = 1, \dots, n-w+1$, we estimate $H_i$ as the Hurst exponent for the particular time period. This results in a sequence of Hurst exponents $\{H_i\}_{i=1}^{n-w+1}$.
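The rolling window scheme itself is independent of the estimator plugged into it. A minimal sketch, in which `rolling_estimates` and the placeholder estimator `window_mean` are names introduced here for illustration (in the paper's setting the estimator would be one of the four Hurst exponent estimators):

```python
def rolling_estimates(x, window, estimator):
    """Apply estimator to every window x[i : i + window], i = 0, ..., n - window."""
    if not 1 <= window <= len(x):
        raise ValueError("window must be between 1 and len(x)")
    return [estimator(x[i:i + window]) for i in range(len(x) - window + 1)]

def window_mean(values):
    """Placeholder estimator used only to make the example runnable."""
    return sum(values) / len(values)

series = [1.0, 2.0, 3.0, 4.0, 5.0]
estimates = rolling_estimates(series, 3, window_mean)
```

A series of length n yields n - w + 1 estimates, so consecutive windows overlap in all but one observation and neighbouring estimates are strongly dependent by construction.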
For determining clusters, hierarchical clustering was employed with Ward's minimum variance method using the function hclust() from the stats R package. The distance was chosen as the squared Euclidean distance between vectors of time-varying Hurst exponents, which is required due to the usage of Ward's algorithm. We note that a popular similarity measure based on correlations is not applicable here. Two evolutions of Hurst exponents which differ by a constant have a perfect correlation. However, they might be on opposite sides of $H = 1/2$ and thus exhibit different characteristics, which we would like to take into account. To determine the number of clusters, we used the silhouette criterion (Rousseeuw, 1987) using the function silhouette() from the cluster R package (Maechler et al., 2019).
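The argument against a correlation-based measure can be checked numerically. In the sketch below, two hypothetical sequences of Hurst exponents differ by the constant 0.2, one lying below and the other above 1/2; the values and function names are invented for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    var_a = sum((x - a_bar) ** 2 for x in a)
    var_b = sum((y - b_bar) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def squared_euclidean(a, b):
    """Squared Euclidean distance between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

h_a = [0.40, 0.45, 0.42, 0.48]   # anti-persistent regime (hypothetical values)
h_b = [v + 0.2 for v in h_a]     # same shape, shifted above 1/2
correlation = pearson(h_a, h_b)
distance = squared_euclidean(h_a, h_b)
```

The correlation is 1 up to rounding, so a correlation-based dissimilarity would treat the two series as identical, whereas the squared Euclidean distance (here 4 × 0.2² = 0.16) registers the shift.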
We employed four different calculations of the Hurst exponent, as described in Section 2, resulting in four different vectors of time-varying Hurst exponents for each exchange rate. Examples of the time-varying Hurst exponents are shown in Figure 2. As can be seen, the time-varying Hurst exponent for a particular exchange rate differs significantly depending on the estimation technique used, so it is meaningful to carry out the cluster analysis for every one of them. This results in four clusterings of the exchange rate market. Dendrograms and the resulting clusters are presented in Figures 3–6.
Figure 2: Time-dependent estimates of the Hurst exponents for selected currencies (bottom figures), together with the original data (top) and standardized residuals (middle)
6. Comparison of clusterings and final clusters
The clusterings presented in the previous section are not identical; however, certain pairs of exchange rates were in the same cluster in all four clusterings. We consider this to be a strong indicator that the Hurst exponent has a similar evolution for these two rates. As the number of such cases decreases, the similarity can be seen as weaker. Naturally, in many cases, a given pair of rates was never in the same cluster.

Figure 3: Hierarchical clustering of the currencies based on the simplified Hurst exponent, clustered by Ward's algorithm using squared Euclidean distance as similarity measure. Optimal clusters selected via the silhouette criterion are visualised by the dashed frames.

Figure 4: Hierarchical clustering of the currencies based on the corrected Hurst exponent, clustered by Ward's algorithm using squared Euclidean distance as similarity measure. Optimal clusters selected via the silhouette criterion are visualised by the dashed frames.

Figure 5: Hierarchical clustering of the currencies based on the empirical Hurst exponent, clustered by Ward's algorithm using squared Euclidean distance as similarity measure. Optimal clusters selected via the silhouette criterion are visualised by the dashed frames.

Figure 6: Hierarchical clustering of the currencies based on the corrected empirical Hurst exponent, clustered by Ward's algorithm using squared Euclidean distance as similarity measure. Optimal clusters selected via the silhouette criterion are visualised by the dashed frames.
We associate the clustering results with a network whose nodes are the exchange rates. Two nodes are connected by an edge if they were in the same cluster at least once. The weight of the edge is given by the number of such clusterings. The resulting network is shown in Figure 7.
If we consider only the edges with a weight at or above a certain threshold, the network splits into several connected components. The nodes in these components therefore represent sets of exchange rates for which the evolution of the Hurst exponent is similar. Therefore, we take the connected components as the final clusters of our analysis. The choice of the threshold is subjective; we base it on visualizing the networks corresponding to different thresholds. Depending on the data, we might need to find a trade-off between a large number of small components with strong connections between the nodes, and a small number of large components with weaker ties. In our particular case we compare the components emerging from the thresholds 4 (the maximum possible weight of an edge) and 3 (which means that the exchange rates have to be in the same cluster at least 3 times out of 4 in order to be connected by an edge in the network). We do not consider lower values for the threshold; requiring the weight of the edge to be at least 3 means that the nodes connected by an edge must be in the same cluster in more than half of the cases. The clusters consisting of more than one node are presented in Tables 2 and 3.
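The construction of the network and of the final clusters can be sketched as follows. The code counts, for every pair of nodes, in how many clusterings the pair shares a cluster, keeps the edges whose weight reaches the threshold, and returns the connected components. The function names and the toy clusterings over the nodes A to D are assumptions made for illustration; they are not the currencies or results of the paper.

```python
from itertools import combinations

def cooccurrence_weights(clusterings):
    """Edge weight = number of clusterings in which both nodes share a cluster."""
    weights = {}
    for labels in clusterings:
        for a, b in combinations(sorted(labels), 2):
            if labels[a] == labels[b]:
                weights[(a, b)] = weights.get((a, b), 0) + 1
    return weights

def components(nodes, weights, threshold):
    """Connected components of the network restricted to edges with weight >= threshold."""
    neighbours = {v: set() for v in nodes}
    for (a, b), w in weights.items():
        if w >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, result = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # depth-first search from the start node
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(neighbours[v] - comp)
        seen |= comp
        result.append(sorted(comp))
    return result

# Four toy clusterings (node -> cluster label), purely hypothetical
clusterings = [
    {"A": 1, "B": 1, "C": 2, "D": 2},
    {"A": 1, "B": 1, "C": 1, "D": 2},
    {"A": 1, "B": 1, "C": 2, "D": 2},
    {"A": 1, "B": 1, "C": 2, "D": 3},
]
weights = cooccurrence_weights(clusterings)
strict = components("ABCD", weights, threshold=4)
loose = components("ABCD", weights, threshold=2)
```

In the toy example the strict threshold leaves C and D as isolated nodes, while the looser one merges them into a cluster, which is the trade-off discussed in the text.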
Both clusterings seem reasonable. We can identify nodes in the clusters which can be expected to be in the same cluster based on the dependence of the economies and financial markets in the given countries. In the network for the threshold 4, we see a small cluster containing the United States dollar and the Hong Kong dollar. A larger cluster, containing six nodes, includes exchange rates of currencies of countries located in South, Southeast and East Asia: Indian rupee, South Korean won, Malaysian ringgit, Philippine peso, Singapore dollar and Thai baht. We note, however, that being in the same cluster does not mean a similar evolution of the exchange rate itself. Instead, it means a similar evolution of the Hurst exponent. Therefore, a more detailed interpretation of the clusters would need a more careful look at the factors that might influence this feature of the exchange rates.
Table 2: Final clustering of the exchange rates for the threshold 4: nontrivial clusters (containing more than one exchange rate)

Cluster    Exchange rates
1          USD, HKD
2          JPY, DKK, RON, CHF
3          CZK, NOK, HRK, RUB, AUD
4          GBP, BRL
5          HUF, CNY
6          PLN, SEK
7          INR, KRW, MYR, PHP, SGD, THB
Figure 7: Network constructed from the clusterings in Figures 3–6, with the vertices corresponding to exchange rates and the weights of the edges corresponding to the number of times each pair of exchange rates ended up together in a cluster. The weight of the edge is visualized by the width of the line, the type of the line and by its colour (1 = thin, dashed, light-grey; 2 = thin, solid, grey; 3 = thick, dashed, green; 4 = thick, solid, red).
Table 3: Final clustering of the exchange rates for the threshold 3: nontrivial clusters (containing more than one exchange rate)

Cluster    Exchange rates
1          USD, HUF, CAD, CNY, HKD, INR, KRW, MXN, MYR, PHP, SGD, THB, ZAR
2          JPY, DKK, RON, CHF
3          CZK, NOK, HRK, RUB, AUD, NZD
4          GBP, BRL
5          PLN, SEK
7. Conclusions
Many financial time series exhibit long-range dependence. We used this property for clustering the time series based on the Hurst exponent, which measures this dependence. We proposed a clustering procedure which uses several different estimates of the Hurst exponent and clusters their values obtained by a rolling window method. As a final step of our procedure, the clusterings originating from the individual methods of estimating the Hurst exponent were compared. In our example of exchange rates, it turns out that we are able to create the final clustering by requiring that members of each cluster are in the same cluster at least a specified number of times in the individual clusterings. We expect the same to hold also in the case of other data, since "similar time series" should often appear in the same cluster when different details of the clustering procedure are considered. Therefore, our approach can be directly applied also to other time series.

Figure 8: Network obtained from Figure 7 by only considering edges with weights equal to 4. The vertices correspond to exchange rates and the weights of the edges correspond to the number of times each pair of exchange rates ended up together in a cluster.
The results which we have obtained provide a new application of the rolling window approach to Hurst exponent estimation, used earlier in Cajueiro and Tabak (2004), Jiang et al. (2018), and Sensoy and Hacihasanoglu (2014). Moreover, they make it possible to extend other clustering analyses, such as Lahmiri (2016), by allowing the use of more than one criterion at a time (e.g., several estimation methods in our particular case).
The extension of our results can go in two directions. The first one consists of a more detailed interpretation of the clustering. As we noted, the estimates of the Hurst exponent provide information about the underlying time series, and we might study the external factors which lead to this behaviour of the data. This might also give a better understanding of the clusters and of why certain exchange rates (or other data considered) appear in the same or in different clusters, respectively.

Figure 9: Network obtained from Figure 7 by only considering edges with weights greater than 2. The vertices correspond to exchange rates and the weights of the edges correspond to the number of times each pair of exchange rates ended up together in a cluster. The weight of the edge is visualized by the type of the line and by its colour (3 = dashed, green; 4 = solid, red).
The other direction involves using different methods to construct the individual clusterings. The final comparison of clusters does not impose any limitation on the number of clusterings which enter it, nor on the methods used to obtain them. There are many methods for estimating Hurst exponents, other distances between vectors of Hurst exponents may be considered, and we may use different clustering methods. Individual clusterings might also get different weights, so that instead of simply counting the number of occurrences in the same cluster, the occurrences are weighted. It might be insightful to see how these different approaches influence the final clusters.
A possible limitation might be the need to find a suitable trade-off between clearly distinguished components in the network and the number of isolated nodes, corresponding to clusters containing one time series. If the condition for the existence of an edge between nodes is not sufficiently strict, i.e., only a small number of occurrences in the same cluster is required, the components are often large, which may not always be desirable. On the other hand, a high threshold often leaves a lot of nodes without an edge, leading to one-element clusters. However, we may be interested in finding similar time series for most of the data, instead of concluding that they form a separate cluster. A possible solution might be a modification of our final clustering step. Instead of considering the components of the network, various methods for finding so-called communities in connected networks can be employed. They aim to divide the nodes into communities, which are characterized by many edges between the nodes within a community and a small number of edges between nodes in different communities. Reviews of such methods can be found in Fortunato (2010) and Javed et al. (2018). The proposed method and its possible modifications outlined above provide a new approach to clustering time series using networks and communities, considered in Ferreira and Zhao (2015).
To conclude, we note again that the proposed approach can be used to analyze any time series with long-range dependence, or time series for which the regimes (persistent, anti-persistent or having quickly decaying correlations) need to be distinguished. Therefore, we consider it to be an interesting addition to the topic of clustering time series with these properties.
Acknowledgment
We acknowledge the contribution of the Slovak Research and Development Agency under the project APVV-20-0311.
References
Aghabozorgi, S., Shirkhorshidi, A. S., & Wah, T. Y. (2015). Time-series clustering – a decade review. Information Systems, 53, 16–38. https://doi.org/10.1016/j.is.2015.04.007
Annis, A. A., & Lloyd, E. H. (1976). The expected value of the adjusted rescaled Hurst range of independent normal summands. Biometrika, 63(1), 111–116. https://doi.org/10.1093/biomet/63.1.111
Auer, B. R. (2016). Pure return persistence, Hurst exponents and hedge fund selection – a practical note. Journal of Asset Management, 17(5), 319–330. https://doi.org/10.1057/jam.2016.7
Beran, J. (2017). Statistics for long-memory processes. Routledge. https://doi.org/10.1201/9780203738481
Borchers, H. W. (2019). pracma: Practical numerical math functions (Version 2.2.9) [Computer software]. The Comprehensive R Archive Network. https://cran.r-project.org/package=pracma
Cajueiro, D. O., & Tabak, B. M. (2004). The Hurst exponent over time: Testing the assertion that emerging markets are becoming more efficient. Physica A: Statistical Mechanics and its Applications, 336(3-4), 521–537. https://doi.org/10.1016/j.physa.2003.12.031
Cajueiro, D. O., & Tabak, B. M. (2009). Testing for long-range dependence in the Brazilian term structure of interest rates. Chaos, Solitons & Fractals, 40(4), 1559–1573. https://doi.org/10.1016/j.chaos.2007.09.054
Di Matteo, T. (2007). Multi-scaling in finance. Quantitative Finance, 7(1), 21–36. https://doi.org/10.1080/14697680600969727
Feng, G.-F., Yang, H.-C., Gong, Q., & Chang, C.-P. (2021). What is the exchange rate volatility response to COVID-19 and government interventions? Economic Analysis and Policy, 69, 705–719. https://doi.org/10.1016/j.eap.2021.01.018
Ferreira, L. N., & Zhao, L. (2015). A time series clustering technique based on community detection in networks. Procedia Computer Science, 53, 183–190. https://doi.org/10.1016/j.procs.2015.07.293
Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3-5), 75–174. https://doi.org/10.1016/j.physrep.2009.11.002
Fu, T.-c. (2011). A review on time series data mining. Engineering Applications of Artificial Intelligence, 24(1), 164–181. https://doi.org/10.1016/j.engappai.2010.09.007
Han, J., & Ge, Z. (2020). Effect of dimensionality reduction on stock selection with cluster analysis in different market situations. Expert Systems with Applications, 147, 113226. https://doi.org/10.1016/j.eswa.2020.113226
Hurst, H. E. (1951). Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers, 116(1), 770–799. https://doi.org/10.1061/taceat.0006518
Iorio, C., Frasso, G., D'Ambrosio, A., & Siciliano, R. (2018). A P-spline based clustering approach for portfolio selection. Expert Systems with Applications, 95, 88–103. https://doi.org/10.1016/j.eswa.2017.11.031
Javed, M. A., Younis, M. S., Latif, S., Qadir, J., & Baig, A. (2018). Community detection in networks: A multidisciplinary review. Journal of Network and Computer Applications, 108, 87–111. https://doi.org/10.1016/j.jnca.2018.02.011
Jiang, Y., Nie, H., & Ruan, W. (2018). Time-varying long-term memory in Bitcoin market. Finance Research Letters, 25, 280–284. https://doi.org/10.1016/j.frl.2017.12.009
Kido, Y. (2016). On the link between the US economic policy uncertainty and exchange rates. Economics Letters, 144, 49–52. https://doi.org/10.1016/j.econlet.2016.04.022
Kirchgässner, G., Wolters, J., & Hassler, U. (2013). Introduction to modern time series analysis. Springer. https://doi.org/10.1007/978-3-642-33436-8
Lahmiri, S. (2016). Clustering of Casablanca stock market based on Hurst exponent estimates. Physica A: Statistical Mechanics and its Applications, 456, 310–318. https://doi.org/10.1016/j.physa.2016.03.069
Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., & Hornik, K. (2019). cluster: Cluster analysis basics and extensions (Version 2.1.0) [Computer software]. The Comprehensive R Archive Network. https://cran.r-project.org/package=cluster
Maharaj, E. A., D'Urso, P., & Caiado, J. (2019). Time series clustering and classification. Chapman and Hall/CRC. https://doi.org/10.1201/9780429058264
Manasseh, C. O., Chukwu, N. O., Abada, F. C., Ogbuabor, J. E., Lawal, A. I., & Alio, F. C. (2019). Interactions between stock prices and exchange rates: An application of multivariate VAR-GARCH model. Cogent Economics & Finance, 7(1), 1681573. https://doi.org/10.1080/23322039.2019.1681573
Mandelbrot, B. B. (1975). Limit theorems on the self-normalized range for weakly and strongly dependent processes. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 31(4), 271–285. https://doi.org/10.1007/bf00532867
Mandelbrot, B. B., & Wallis, J. R. (1969). Robustness of the rescaled range R/S in the measurement of noncyclic long run statistical dependence. Water Resources Research, 5(5), 967–988. https://doi.org/10.1029/wr005i005p00967
Massahi, M., Mahootchi, M., & Arshadi Khamseh, A. (2020). Development of an efficient cluster-based portfolio optimization model under realistic market conditions. Empirical Economics, 59(5), 2423–2442. https://doi.org/10.1007/s00181-019-01802-5
O'Connell, P., Koutsoyiannis, D., Lins, H. F., Markonis, Y., Montanari, A., & Cohn, T. (2016). The scientific legacy of Harold Edwin Hurst (1880–1978). Hydrological Sciences Journal, 61(9), 1571–1590. https://doi.org/10.1080/02626667.2015.1125998
Peters, E. E. (1994). Fractal market analysis: Applying chaos theory to investment and economics. John Wiley & Sons.
Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65. https://doi.org/10.1016/0377-0427(87)90125-7
Sensoy, A., & Hacihasanoglu, E. (2014). Time-varying long range dependence in energy futures markets. Energy Economics, 46, 318–327. https://doi.org/10.1016/j.eneco.2014.09.023
Song, J. Y., Chang, W., & Song, J. W. (2019). Cluster analysis on the structure of the cryptocurrency market via Bitcoin–Ethereum filtering. Physica A: Statistical Mechanics and its Applications, 527, 121339. https://doi.org/10.1016/j.physa.2019.121339
Taqqu, M. S., Teverovsky, V., & Willinger, W. (1995). Estimators for long-range dependence: An empirical study. Fractals. Complex Geometry, Patterns, and Scaling in Nature and Society, 03(04), 785–798. https://doi.org/10.1142/s0218348x95000692
Weron, R. (2002). Estimating long-range dependence: Finite sample properties and confidence intervals. Physica A: Statistical Mechanics and its Applications, 312(1-2), 285–299. https://doi.org/10.1016/s0378-4371(02)00961-5
Wuertz, D., Setz, T., Chalabi, Y., Boudt, C., Chausse, P., & Miklovac, M. (2020). fGarch: Rmetrics - autoregressive conditional heteroskedastic modelling (Version 3042.83.2) [Computer software]. The Comprehensive R Archive Network. https://cran.r-project.org/package=fGarch
You, Y., & Liu, X. (2020). Forecasting short-run exchange rate volatility with monetary fundamentals: A GARCH-MIDAS approach. Journal of Banking & Finance, 116, 105849. https://doi.org/10.1016/j.jbankfin.2020.105849
Zhou, Z., Fu, Z., Jiang, Y., Zeng, X., & Lin, L. (2020). Can economic policy uncertainty predict exchange rate volatility? New evidence from the GARCH-MIDAS model. Finance Research Letters, 34, 101258. https://doi.org/10.1016/j.frl.2019.08.006