ERK'2020, Portorož, 444-447
Time series classification using time-frequency analysis and
Convolutional Neural Networks
Domen Kavran
Faculty of Electrical Engineering and Computer Science, University of Maribor
Koroška cesta 46, 2000 Maribor, Slovenia
E-mail: domen.kavran@student.um.si
Abstract
Technological advances in various industries have led
to an increase in the amount of data in which patterns,
trends and useful information are hidden. These data are
large in volume, complex and generated at high speed,
which makes them impossible to process with traditional
methods. A common kind of sequential data are time
series, in which the time order is important. Time series
are often used in regression applications, but they can
also be used for classification. In recent years, deep
learning has proven very successful in classifying
complex time series data. In this paper, a new deep
learning method for time series classification is
presented. Different time-frequency analysis methods are
applied to time series to obtain time-frequency
representations, which serve as input data to
convolutional neural networks. By combining various
time-frequency representations, the proposed method
achieved an average classification accuracy of 92.28 %.
1 Introduction
Data collection is being introduced into areas of
industry where it has not been done before. Multitudes of
different sensors with high sampling frequencies gather
complex data. Such an environment produces large volumes
of varied data that are obtained at high velocity; such
data are called "big data". An important type of big data
is time series, which represent the values of a given
quantity over time. Physical quantities, such as electric
current, temperature, and velocity, are observed most
commonly [1]. Time series are obtained from different
sources, ranging from mobile phones and Internet of
Things (IoT) devices to industrial machines and medical
equipment [2]. Acquired time series can be noisy and may
have information gaps.
A sampled time series is treated as a signal consisting
of complex sinusoids that oscillate with different
frequencies. Each sinusoid has a certain amplitude and an
offset, the so-called phase, and sinusoids may be present
only at certain time points of a signal. The amplitude,
phase and energy density of a signal in time and
frequency are analyzed using time-frequency analysis
methods, which are presented in the following sections.
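As a minimal illustration of this signal model (not part of the proposed method), the sketch below builds a signal from two complex sinusoids with chosen amplitudes and phases and reads both back from the discrete Fourier transform. The sampling rate, frequencies, amplitudes and phases are arbitrary assumptions; the frequencies fall exactly on DFT bins, so no spectral leakage occurs. A plain Fourier transform recovers amplitude and phase but not when each sinusoid is present, which is what time-frequency methods add.

```python
import numpy as np

fs = 100                       # sampling frequency in Hz (assumed)
n = 100                        # one second of samples, so DFT bins are 1 Hz apart
t = np.arange(n) / fs

# (frequency in Hz, amplitude, phase in rad) of each complex sinusoid (assumed values)
components = [(5, 2.0, np.pi / 4), (12, 0.5, -np.pi / 3)]
x = sum(a * np.exp(1j * (2 * np.pi * f * t + phi)) for f, a, phi in components)

X = np.fft.fft(x)
freqs = np.fft.fftfreq(n, d=1 / fs)

for f, _, _ in components:
    k = int(np.argmin(np.abs(freqs - f)))   # DFT bin that holds this frequency
    amplitude = np.abs(X[k]) / n            # the unnormalized DFT scales by n
    phase = np.angle(X[k])                  # offset of the sinusoid at t = 0
    print(f"{f} Hz: amplitude {amplitude:.3f}, phase {phase:.3f} rad")
```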
Various methods calculate time-frequency
representations, all of which are limited by a trade-off
between time and frequency resolution. Time series are,
thus, represented as two-dimensional images in the
time-frequency plane, whose axes represent time t and
frequency f, with signal properties expressed as
grayscale intensities. These time-frequency
representations are used as input data to convolutional
neural networks that perform the classification. Most
time-frequency representation algorithms were developed
long ago, but they have only recently been introduced
into the field of deep learning as part of the data
preparation process [3]. Time series classification
problems arise in many research areas, from the
identification of anomalies in financial markets to the
automated recognition of brain and cardiovascular
diseases [2].
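The time-frequency resolution trade-off mentioned above can be quantified with the uncertainty (Gabor) limit: for any window, the product of its spread in time and its spread in frequency, measured as the standard deviations of |w(t)|² and |W(f)|², is at least 1/(4π), with equality for a Gaussian window. The numerical check below is a sketch under assumed values (window width, sampling step) and is not taken from the paper.

```python
import numpy as np

def spread(axis, density):
    """Standard deviation of a (not necessarily normalized) density on an axis."""
    p = density / density.sum()
    mean = np.sum(axis * p)
    return np.sqrt(np.sum((axis - mean) ** 2 * p))

dt = 1e-4                                  # sampling step in seconds (assumed)
t = np.arange(-0.5, 0.5, dt)               # time axis centred on the window
sigma = 0.01                               # Gaussian width parameter in seconds (assumed)
w = np.exp(-t**2 / (2 * sigma**2))         # Gaussian window

W = np.fft.fft(w)
f = np.fft.fftfreq(len(t), d=dt)           # frequency axis in Hz (FFT ordering)

sigma_t = spread(t, np.abs(w) ** 2)        # time spread of the window energy
sigma_f = spread(f, np.abs(W) ** 2)        # frequency spread of the window energy

print(sigma_t * sigma_f, 1 / (4 * np.pi))  # the two values agree closely
```

Narrowing the window (smaller `sigma`) shrinks `sigma_t` but widens `sigma_f` by the same factor, so the product stays at the limit.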
Feature selection is a challenging process, as it often
requires domain knowledge. An alternative approach is
offered by Convolutional Neural Networks (CNNs),
which learn information-rich features from
multidimensional input data using an array of adaptive
convolutional filters, called kernels [4]. Many methods
for time series classification using CNNs have been
proposed over the years. The residual neural network
(ResNet) is currently considered the state-of-the-art
deep neural network architecture for time series
classification [5]. A recently developed ensemble of five
deep CNNs, named InceptionTime, has proven to be highly
competitive with the state-of-the-art algorithm
HIVE-COTE [5].
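What a single convolutional kernel computes can be sketched as follows: the kernel slides over a time-frequency image and, at each position, sums the element-wise products (strictly a cross-correlation, which is what CNN layers compute under the name convolution). The optional zero padding keeps the output the same size as the input. In a CNN the kernel weights are learned; here they are fixed by hand, and `conv2d` is a hypothetical helper, not a library function.

```python
import numpy as np

def conv2d(image, kernel, pad=0):
    """Slide `kernel` over `image` with stride 1 and return the map of weighted sums.
    `pad` rows/columns of zeros are added on every side of the image first."""
    image = np.pad(image, pad, mode="constant", constant_values=0)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

tf_image = np.arange(16, dtype=float).reshape(4, 4)   # stand-in for a 4x4 representation
box = np.ones((3, 3))                                 # hand-picked 3x3 kernel

print(conv2d(tf_image, box))                 # values [[45, 54], [81, 90]]
print(conv2d(tf_image, box, pad=1).shape)    # (4, 4): zero padding preserves the size
```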
The proposed CNN architectures use two-dimensional and
three-dimensional convolutional kernels. Two-dimensional
kernels calculate features in the time-frequency space of
a single representation. By combining various
time-frequency representations, different local
characteristics of the signal are used for
three-dimensional feature extraction.
This paper consists of five sections. The next section
presents data preparation using the Hilbert transform and
time-frequency analysis methods. The third section
presents CNNs and the proposed architectures for time
series classification. Classification results are
presented in the fourth section, and conclusions are
given in the last section.
2 Time series analysis
The analysis of a time series, treated as a signal x, is
performed with a sequence of mutually independent
methods and mathematical operations, called a pipeline.
Each pipeline consists of several steps: the output of
the i-th step is the input to the (i+1)-th step, and the
output of the last step is the result of the pipeline.
The proposed pipeline has the following steps:
1. Calculation of the analytic signal $x_a$ using the
Hilbert transform,
2. Calculation of phase or energy density using one of
the following time-frequency analysis methods:
• Short-Time Fourier Transform,
• Smoothed Pseudo Wigner-Ville Distribution,
• Continuous Wavelet Transform,
3. Normalization of values to the interval [0, 1].
Using the analytic signal makes the frequency spectrum
more interpretable, and it is also required by some
time-frequency analysis methods [6]. Phase and energy
density are distinctive characteristics of a signal in
time and frequency. Normalization, as the final step,
brings all input values to the same scale without
distorting them, which helps CNNs learn faster. The
methods in the pipeline are described in the following
subsections.
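The pipeline above can be sketched as a list of functions applied in order, so that the output of each step becomes the input of the next. This sketch assumes the spectrogram as the time-frequency step, a Hann window with 32-sample segments and 50 % overlap, and min-max scaling for normalization; these settings and all function names are illustrative assumptions, not the configuration used in the paper.

```python
from functools import reduce
import numpy as np

def analytic_signal(x):
    """Step 1: analytic signal via the FFT (negative frequencies zeroed)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)

def spectrogram(xa, seg_len=32, hop=16):
    """Step 2: squared STFT magnitude with a Hann window and 50 % overlap."""
    window = np.hanning(seg_len)
    starts = range(0, len(xa) - seg_len + 1, hop)
    frames = np.array([xa[s:s + seg_len] * window for s in starts])
    return (np.abs(np.fft.fft(frames, axis=1)) ** 2).T   # rows: frequency, columns: time

def normalize(rep):
    """Step 3: min-max scaling of all values to [0, 1]."""
    lo, hi = rep.min(), rep.max()
    return (rep - lo) / (hi - lo) if hi > lo else np.zeros_like(rep)

def run_pipeline(x, steps):
    # the output of step i is the input of step i + 1
    return reduce(lambda data, step: step(data), steps, x)

series = np.sin(2 * np.pi * 0.1 * np.arange(140))   # toy series of length 140
rep = run_pipeline(series, [analytic_signal, spectrogram, normalize])
print(rep.shape, rep.min(), rep.max())
```

Swapping the middle function for another time-frequency method yields the other pipelines without touching the first and last steps.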
2.1 Hilbert transform
The Hilbert transform manipulates the phase of a signal
without affecting its amplitude spectrum: it shifts the
phase of every positive-frequency component by −π/2 and
of every negative-frequency component by +π/2. The
Hilbert transform $\mathcal{H}$ of a signal x is defined
by (1), where h(t) = 1/(πt) and P denotes the Cauchy
principal value [6]. The transform is used to calculate
the analytic signal, which is complex and contains no
negative-frequency content. The analytic signal is given
by (2) [6].
$$x_H(t) = \mathcal{H}[x(t)] = x(t) * h(t) = \frac{1}{\pi}\,\mathrm{P}\!\int_{-\infty}^{\infty} \frac{x(\tau)}{t - \tau}\,d\tau \tag{1}$$

$$x_a(t) = x(t) + i\,x_H(t) \tag{2}$$
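Equation (2) can be evaluated without computing the principal-value integral in (1): zeroing the negative-frequency half of the discrete Fourier transform and doubling the positive half yields the analytic signal directly (this is the frequency-domain construction also used by SciPy's `scipy.signal.hilbert`). The sketch below, with an assumed 50 Hz test tone, checks three consequences: the real part of the analytic signal is x, the Hilbert transform of a cosine is a sine, and the analytic signal has no negative-frequency content.

```python
import numpy as np

def analytic_signal(x):
    """Return x_a = x + i*H[x] by zeroing negative frequencies and doubling positive ones."""
    n = len(x)
    X = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                       # keep the DC component unchanged
    if n % 2 == 0:
        gain[n // 2] = 1.0              # keep the Nyquist component unchanged
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * gain)

fs, n = 1000, 1000
t = np.arange(n) / fs
x = np.cos(2 * np.pi * 50 * t)          # 50 whole periods, so the FFT is leakage-free

xa = analytic_signal(x)
x_h = xa.imag                           # Hilbert transform x_H(t), cf. (2)

print(np.allclose(xa.real, x))                        # True: real part is the signal
print(np.allclose(x_h, np.sin(2 * np.pi * 50 * t)))   # True: H[cos] = sin
neg = np.fft.fft(xa)[n // 2 + 1:]       # negative-frequency bins
print(np.allclose(neg, 0))              # True: no negative-frequency content
```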
2.2 Short-Time Fourier Transform
Short-Time Fourier Transform (STFT) is a
time-frequency analysis method that calculates the
Fourier transform on short segments of a non-stationary
signal. Multiplying the segments by a window function w
reduces spectral leakage. STFT has constant time and
frequency resolution, which depends on the length of the
segments. The squared magnitude of the STFT is called a
spectrogram and shows the energy density in time and
frequency. Short-Time Fourier Transform is defined by
(3) [7].
$$\mathrm{STFT}_x^{w}(t, f) = \int_{-\infty}^{\infty} x(\tau)\, w^{*}(\tau - t)\, e^{-i 2\pi f \tau}\, d\tau \tag{3}$$
Short-Time Fourier Transform was demonstrated on a
recording of a bird chirp sampled at 22050 Hz. A Hann
window was applied to the signal segments, which
overlapped by half of their length. The spectrogram of
the bird chirp is shown in Figure 1, and the thresholded
phase spectrum in time and frequency is shown in
Figure 2.
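A direct discretization of the STFT definition (3) can be sketched as follows, assuming a Hann window that starts at time t, a hop of half the segment length L, and frequencies f_k = k·fs/L. Because the exponential in (3) uses the absolute time τ, each frame's DFT is multiplied by a linear phase factor; this changes the phase spectrum but not the spectrogram. The loop at the end shows that the frequency bin spacing fs/L and the time step hop/fs are both fixed by L, i.e. the resolution is the same across the whole time-frequency plane. The function name `stft_eq3` and all parameter values are hypothetical.

```python
import numpy as np

def stft_eq3(x, fs, seg_len, hop=None, window=None):
    """Discrete version of Eq. (3). Returns (times, freqs, Z) with Z[k, m].
    For complex input, bins k >= seg_len / 2 correspond to negative frequencies."""
    hop = hop or seg_len // 2
    window = np.hanning(seg_len) if window is None else window
    starts = np.arange(0, len(x) - seg_len + 1, hop)
    k = np.arange(seg_len)
    Z = np.empty((seg_len, len(starts)), dtype=complex)
    for m, start in enumerate(starts):
        # DFT of the windowed frame, with time measured from the frame start ...
        frame_dft = np.fft.fft(x[start:start + seg_len] * np.conj(window))
        # ... shifted to absolute time tau = n / fs, as in e^{-i 2 pi f tau}
        Z[:, m] = frame_dft * np.exp(-2j * np.pi * k * start / seg_len)
    return starts / fs, k * fs / seg_len, Z

fs = 1000
t = np.arange(2000) / fs
x = np.exp(2j * np.pi * 125 * t)             # complex test tone at 125 Hz

for seg_len in (32, 256):
    times, freqs, Z = stft_eq3(x, fs, seg_len)
    spec = np.abs(Z) ** 2                    # spectrogram
    peak = freqs[np.argmax(spec[:, 0])]
    print(f"L={seg_len}: df={freqs[1]:.2f} Hz, dt={times[1] * 1000:.0f} ms, peak={peak:.1f} Hz")
```

Longer segments give finer frequency bins but coarser time steps, which is the constant trade-off described above.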
Figure 1: Spectrogram of the bird chirp.
Figure 2: Thresholded phase spectrum in time and
frequency of the bird chirp.
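The bird-chirp recording is not available here, so the sketch below substitutes a synthetic linear chirp sweeping from 1 kHz to 6 kHz in one second (an assumption) sampled at the same 22050 Hz, and computes its spectrogram and phase spectrum with a Hann window and 50 % segment overlap, as described above. The paper does not state how the phase spectrum was thresholded; this sketch assumes the phase is set to zero wherever the spectrogram falls below 1 % of its maximum, which hides the noise-like phase of low-energy bins.

```python
import numpy as np

fs = 22050                                 # sampling rate used in the text
t = np.arange(fs) / fs                     # one second of signal
f0, f1 = 1000.0, 6000.0                    # start/end frequency of the synthetic chirp (assumed)
x = np.cos(2 * np.pi * (f0 * t + (f1 - f0) / 2 * t**2))   # linear chirp

seg_len, hop = 512, 256                    # segments overlap by half of their length
window = np.hanning(seg_len)
starts = np.arange(0, len(x) - seg_len + 1, hop)
frames = np.array([x[s:s + seg_len] * window for s in starts])
Z = np.fft.rfft(frames, axis=1).T          # rows: frequency bins, columns: frames
freqs = np.fft.rfftfreq(seg_len, d=1 / fs)

spec = np.abs(Z) ** 2                      # spectrogram (energy density)
phase = np.angle(Z)                        # phase spectrum
# assumed thresholding rule: keep the phase only where the energy is significant
thresholded_phase = np.where(spec >= 0.01 * spec.max(), phase, 0.0)

peak_track = freqs[np.argmax(spec, axis=0)]   # dominant frequency in each frame
print(peak_track[0], peak_track[-1])          # rises from about 1 kHz to about 6 kHz
```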
2.3 Smoothed Pseudo Wigner-Ville Distribution
Among the many methods for time-frequency analysis,
the Wigner-Ville Distribution is considered one of the
most advanced. Like a spectrogram, it represents energy
density, but with higher resolution; however, it also
contains artifacts called cross terms. The Smoothed
Pseudo Wigner-Ville Distribution (SPWVD) reduces these
unwanted cross terms with time-frequency filtering, at
the cost of reduced time and frequency resolution. The
Smoothed Pseudo Wigner-Ville Distribution is defined by
(4), where g(s) and h(τ) are window functions for
smoothing in time and frequency, respectively [8]. The
result of the SPWVD on the recording of the bird chirp
is shown in Figure 3.
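$$\mathrm{SPWVD}_x(t, f) = \int_{-\infty}^{\infty} h(\tau) \left[ \int_{-\infty}^{\infty} g(s - t)\, x\!\left(s + \frac{\tau}{2}\right) x^{*}\!\left(s - \frac{\tau}{2}\right) ds \right] e^{-i 2\pi f \tau}\, d\tau \tag{4}$$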
Figure 3: SPWVD of the bird chirp.
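The double integral in (4) can be sketched as a direct, unoptimized discretization. The Hamming windows, their lengths, the number of frequency bins and the unit-sum normalization of g are assumptions, since the paper does not state its SPWVD settings or implementation. Because the lag enters (4) as τ/2, one sample of half-lag corresponds to two samples of τ, so the frequency axis is scaled by fs/(2·n_freq). The input is assumed to be analytic, as produced by the first pipeline step.

```python
import numpy as np

def spwvd(x, fs, n_freq=128, time_len=15, lag_len=63):
    """Direct, unoptimized discretization of Eq. (4) for an analytic signal x.
    Returns (freqs, tfr) with tfr[k, n] the energy density at freqs[k] and sample n."""
    x = np.asarray(x, dtype=complex)
    n_samples = len(x)
    if lag_len > n_freq:
        raise ValueError("lag window must not be longer than n_freq")
    g = np.hamming(time_len)
    g /= g.sum()                              # time-smoothing window g(s), unit sum
    h = np.hamming(lag_len)                   # lag window h(tau): smooths in frequency
    lg, lh = time_len // 2, lag_len // 2
    p = np.arange(-lg, lg + 1)                # time offsets s - t covered by g
    tfr = np.zeros((n_freq, n_samples))
    for n in range(n_samples):
        kernel = np.zeros(n_freq, dtype=complex)
        for m in range(-lh, lh + 1):          # discrete half-lag: tau / 2 -> m samples
            i, j = n + p + m, n + p - m
            ok = (i >= 0) & (i < n_samples) & (j >= 0) & (j < n_samples)
            acc = np.sum(g[ok] * x[i[ok]] * np.conj(x[j[ok]]))
            kernel[m % n_freq] = h[lh + m] * acc
        # kernel[-m] == conj(kernel[m]), so the DFT over the lag is real
        tfr[:, n] = np.fft.fft(kernel).real
    # x(s + tau/2) x*(s - tau/2) advances 2 samples per lag step -> spacing fs / (2 n_freq)
    freqs = np.arange(n_freq) * fs / (2 * n_freq)
    return freqs, tfr

fs = 1000
t = np.arange(256) / fs
x = np.exp(2j * np.pi * 125 * t)             # analytic test tone at 125 Hz
freqs, tfr = spwvd(x, fs)
print(freqs[np.argmax(tfr[:, 128])])         # 125.0
```

Longer `time_len` and shorter `lag_len` suppress more cross terms but blur the distribution, which is the resolution loss mentioned above.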
2.4 Continuous Wavelet Transform
Continuous Wavelet Transform (CWT) is an alternative
to Short-Time Fourier Transform. Unlike STFT, it is not
based on the Fourier transform; instead, the signal is
compared, through the inner product in (5), with scaled
and translated versions of a function called the mother
wavelet ψ. Time localization is achieved by translating
the mother wavelet by τ, and its length is adjusted with
the scale s, which affects the time and frequency
resolution. The squared magnitude of the CWT is called a
scalogram and shows the energy density in time and
scale. Continuous Wavelet Transform is defined by (5)
[9]. Results of the CWT of the bird chirp with a complex
Morlet wavelet are shown in Figures 4 and 5.
$$\mathrm{CWT}_x^{\psi}(\tau, s) = \frac{1}{\sqrt{|s|}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - \tau}{s}\right) dt \tag{5}$$
Figure 4: Scalogram of the bird chirp.
Figure 5: Thresholded phase spectrum in time and scale
of the bird chirp.
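A minimal sketch of (5), assuming a complex Morlet mother wavelet with ω0 = 6 and the common approximation f ≈ ω0/(2πs) for labelling scales with frequencies; the paper does not give its Morlet parameters or scale grid, and a 50 Hz test tone stands in for the bird chirp. For each scale, the integral is evaluated as a discrete convolution with the conjugated, time-reversed wavelet, which equals the correlation in (5).

```python
import numpy as np

def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet (admissibility correction neglected for w0 = 6)."""
    return np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-t**2 / 2)

def cwt(x, fs, scales, w0=6.0):
    """Direct discretization of Eq. (5); returns W[scale, tau]."""
    dt = 1 / fs
    W = np.empty((len(scales), len(x)), dtype=complex)
    for row, s in enumerate(scales):
        half = int(np.ceil(4 * s * fs))               # wavelet support of about +-4 s
        t = np.arange(-half, half + 1) * dt
        # convolving with conj(psi(-t/s)) equals correlating with psi((t - tau)/s)
        kernel = np.conj(morlet(-t / s, w0))
        W[row] = np.convolve(x, kernel, mode="same") * dt / np.sqrt(abs(s))
    return W

fs = 1000
t = np.arange(2000) / fs
x = np.cos(2 * np.pi * 50 * t)                        # 50 Hz test tone

target_freqs = np.arange(10, 101)                     # Hz
scales = 6.0 / (2 * np.pi * target_freqs)             # approximate relation f = w0 / (2 pi s)
scalogram = np.abs(cwt(x, fs, scales)) ** 2           # energy density in time and scale

print(target_freqs[np.argmax(scalogram[:, 1000])])    # close to 50 Hz
```

The peak lands at, or just below, 50 Hz because the 1/√|s| factor slightly favours larger scales; small scales (high frequencies) get short wavelets and fine time resolution, large scales the opposite.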
3 Convolutional Neural Networks
CNNs are among the most established algorithms in the
field of deep learning. They are designed to learn
spatial hierarchies of features automatically by using
convolutional layers, pooling layers and fully connected
layers. They are often used to solve computer vision
problems in industrial processes and various fields of
medicine. CNNs achieve exceptional results in
segmentation and recognition of objects in images and in
natural language processing [4].
3.1 Proposed CNN architectures
Figures 6 and 7 show the proposed CNN architectures
for classification of time series of length N. The CNN-I
architecture classifies time series based on an individual
time-frequency representation, while the CNN-C
architecture uses the combined time-frequency
representations as input. The input size W × H denotes
the width and height of an individual time-frequency
representation, and the input size N × N × 5 is intended
for the five combined time-frequency representations.
The hyperparameter K is the number of convolutional
kernels, and $F_p$ is the size of the pooling filters.
Convolutional and pooling layers use zero padding. The
number of neurons in a fully connected layer is defined
by the hyperparameter $N_{fc}$. The number of neurons in
the output layer is equal to the number of classes $N_C$
[4].
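With a pooling filter of size $F_p$, zero padding extends the input with zeros until each side is a multiple of $F_p$, so no border rows or columns are dropped and the output size is ⌈W/F_p⌉ × ⌈H/F_p⌉. The sketch assumes non-overlapping max pooling with stride $F_p$; the paper does not specify the pooling type or stride. The padded zeros cannot exceed the real values inside a window as long as the feature maps are non-negative (e.g. normalized inputs in [0, 1], or activations after a ReLU, which is also an assumption here). The helper name `max_pool_same` is hypothetical.

```python
import numpy as np

def max_pool_same(fmap, fp):
    """Non-overlapping max pooling with an fp x fp filter and zero padding."""
    h, w = fmap.shape
    out_h, out_w = -(-h // fp), -(-w // fp)          # ceiling division
    padded = np.zeros((out_h * fp, out_w * fp), dtype=fmap.dtype)
    padded[:h, :w] = fmap                            # zeros fill the right/bottom border
    blocks = padded.reshape(out_h, fp, out_w, fp)    # split into fp x fp blocks
    return blocks.max(axis=(1, 3))

fmap = np.arange(25, dtype=float).reshape(5, 5) / 24   # values in [0, 1]
pooled = max_pool_same(fmap, fp=2)
print(pooled.shape)   # (3, 3): ceil(5 / 2) in each direction
```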
4 Results
Classification was performed on the datasets of
pre-segmented time series described in Table 1 [10, 11].
All datasets except Epileptic Seizure Recognition had a
predetermined train and test set. For each time series
dataset, five pipelines were defined to calculate the
following normalized outputs for each time series of
length N:
• Phase spectrum obtained with STFT,
• Spectrogram,
• Phase spectrum obtained with CWT,
• Scalogram,
• Smoothed Pseudo Wigner-Ville Distribution.
The output of each pipeline is a time-frequency
representation of the time series, which is then used as
input data to one of five CNNs with the CNN-I
architecture. Because the representations differ in size,
they are scaled to size N × N with nearest-neighbor
interpolation before being combined and forwarded to a
CNN with the CNN-C architecture. The hyperparameters
of each CNN were tuned using 10-fold cross-validation.
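The resizing and stacking step can be sketched as follows. Nearest-neighbor interpolation here maps output index i to input index ⌊i·H/N⌋, which is one common convention (an assumption; the paper does not specify it), and the representation sizes in the example are hypothetical values for a series of length N = 140. The helper names are also hypothetical.

```python
import numpy as np

def resize_nearest(rep, n):
    """Scale a 2-D representation to n x n by nearest-neighbor index mapping."""
    h, w = rep.shape
    rows = np.minimum(np.arange(n) * h // n, h - 1)
    cols = np.minimum(np.arange(n) * w // n, w - 1)
    return rep[np.ix_(rows, cols)]

def combine(representations, n):
    """Resize each representation to n x n and stack them along a new last axis."""
    return np.stack([resize_nearest(r, n) for r in representations], axis=-1)

print(resize_nearest(np.array([[1, 2], [3, 4]]), 4))   # each value repeated 2 x 2

# hypothetical representation sizes for a series of length N = 140
rng = np.random.default_rng(0)
shapes = [(129, 9), (129, 9), (64, 140), (64, 140), (128, 140)]
reps = [rng.random(s) for s in shapes]
stacked = combine(reps, 140)
print(stacked.shape)   # (140, 140, 5)
```

Nearest-neighbor interpolation copies existing values instead of averaging them, so the [0, 1] range produced by normalization is preserved.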
Figure 6: Architecture CNN-I for time series classification, using individual time-frequency representations as the input.
Figure 7: Architecture CNN-C for time series classification, using combined time-frequency representations as the input.
Table 1: Time series datasets.

| Dataset name | Description | Time series length | No. of classes |
|---|---|---|---|
| ECG5000 | ECG recordings of 15 patients with five categorizations of cardiovascular diseases | 140 | 5 |
| InsectEPGRegularTrain | EPG signals of insect interaction with plants | 601 | 3 |
| Epileptic Seizure Recognition | Sections of EEG recordings of 500 individuals with epilepsy attacks | 178 | 5 |
| MelbournePedestrian | Hourly number of pedestrians at ten locations in the city of Melbourne | 24 | 10 |
| ElectricDevices | Electricity consumption of seven groups of appliances in UK households | 96 | 7 |
| Wafer | Control measurements of sensors in the processing of silicon wafers | 152 | 2 |
| PowerCons | Electricity consumption of households in summer and winter | 144 | 2 |
The results of the proposed classification method are
shown in Table 2. Each column contains the classification
accuracies obtained with the given time-frequency
representation (or the combination of all five) as input
to a CNN with the indicated architecture (CNN-I or
CNN-C). The highest classification accuracy for each
dataset is marked in bold. With individual time-frequency
representations as inputs to CNNs with the CNN-I
architecture, the highest classification accuracy on every
dataset was achieved with a phase spectrum or scalogram
obtained with Continuous Wavelet Transform (on
PowerCons, the scalogram ties with the spectrogram).
Convolutional Neural Networks with the CNN-C
architecture, which performed classification based on
combined time-frequency representations, achieved
higher classification accuracies than any individual
representation on the ECG5000, InsectEPGRegularTrain,
Epileptic Seizure Recognition and MelbournePedestrian
datasets.
Table 2: Time series classification accuracies.

| Dataset name | STFT - phase spectrum (CNN-I) | Spectrogram (CNN-I) | CWT - phase spectrum (CNN-I) | Scalogram (CNN-I) | SPWVD (CNN-I) | Combined representations (CNN-C) |
|---|---|---|---|---|---|---|
| ECG5000 | 93.29% | 92.98% | 93.69% | 93.51% | 93.33% | **94.27%** |
| InsectEPGRegularTrain | 95.58% | 83.13% | 99.60% | 99.60% | 83.13% | **100.00%** |
| Epileptic Seizure Recognition | 91.04% | 92.35% | 94.87% | 96.70% | 95.78% | **97.39%** |
| MelbournePedestrian | 61.96% | 63.56% | 83.22% | 75.30% | 68.36% | **88.39%** |
| ElectricDevices | 57.23% | 65.27% | 61.47% | **69.14%** | 63.91% | 68.28% |
| Wafer | 98.96% | 99.04% | 99.43% | **99.53%** | 98.60% | 99.33% |
| PowerCons | 91.11% | **98.89%** | 96.11% | **98.89%** | 96.67% | 98.33% |
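The comparisons in the text can be checked directly against Table 2. The sketch below recomputes, from the table values only, the datasets on which the combined representations beat every individual representation, the 92.28 % average of the CNN-C column, and an average over both CWT-based columns that agrees with the 90.07 % reported in the conclusion to within rounding.

```python
import numpy as np

columns = ["STFT phase", "Spectrogram", "CWT phase", "Scalogram", "SPWVD", "Combined"]
accuracy = {   # Table 2, in percent
    "ECG5000":                       [93.29, 92.98, 93.69, 93.51, 93.33, 94.27],
    "InsectEPGRegularTrain":         [95.58, 83.13, 99.60, 99.60, 83.13, 100.00],
    "Epileptic Seizure Recognition": [91.04, 92.35, 94.87, 96.70, 95.78, 97.39],
    "MelbournePedestrian":           [61.96, 63.56, 83.22, 75.30, 68.36, 88.39],
    "ElectricDevices":               [57.23, 65.27, 61.47, 69.14, 63.91, 68.28],
    "Wafer":                         [98.96, 99.04, 99.43, 99.53, 98.60, 99.33],
    "PowerCons":                     [91.11, 98.89, 96.11, 98.89, 96.67, 98.33],
}
acc = np.array(list(accuracy.values()))

# datasets where CNN-C beats the best individual (CNN-I) representation
combined_wins = [name for name, row in accuracy.items() if row[5] > max(row[:5])]
mean_combined = acc[:, 5].mean()      # CNN-C column over all seven datasets
mean_cwt = acc[:, 2:4].mean()         # both CWT-based columns over all datasets

print(combined_wins)
print(round(mean_combined, 2), round(mean_cwt, 2))
```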
5 Conclusion
In this paper, a new method for time series classification
was presented, in which data preparation pipelines
compute time-frequency representations that are then
used as input to Convolutional Neural Networks for
classification. Convolutional Neural Networks were most
successful at classifying phase spectra and scalograms
calculated with Continuous Wavelet Transform, which
achieved an average accuracy of 90.07 % over both
representations and all seven datasets. Combining
different time-frequency representations improved the
classification accuracy on four of the seven datasets,
achieving an average accuracy of 92.28 %. In the future,
the proposed Convolutional Neural Network architectures
will be adapted to perform multivariate time series
classification.
Acknowledgment
I thank Assist. Prof. Dr. Niko Lukač for guidance and
professional assistance in writing my Master's thesis,
which was the basis for this paper.
References
[1] N. Elgendy and A. Elragal. Big Data Analytics: A Literature
Review Paper. Advances in Data Mining: Applications and
Theoretical Aspects. 14th Industrial Conference on Data Mining,
St. Petersburg, 16–20 July 2014. Leipzig: Springer, 214–227,
2014.
[2] G. A. Susto, A. Cenedese and M. Terzi. Time-Series Classification
Methods: Review and Applications to Power Systems Data.
Big Data Application in Power Systems. Amsterdam: Elsevier,
179–220, 2018.
[3] J. Zhang, J. Tian, Y. Cao, Y. Yang and X. Xu. Deep
time–frequency representation and progressive decision fusion for
ECG classification. Knowledge-Based Systems, 190, 2020.
[4] R. Yamashita, M. Nishio, R. K. G. Do and K. Togashi.
Convolutional neural networks: an overview and application in
radiology. Insights into Imaging, 9(4):611–629, 2018.
[5] H. I. Fawaz, B. Lucas, G. Forestier, C. Pelletier, D. F.
Schmidt, et al. InceptionTime: Finding AlexNet for Time Series
Classification, 2019.
[6] M. Johansson. The Hilbert Transform. Växjö: Växjö University,
2005.
[7] F. Hlawatsch and F. Auger. Time-Frequency Analysis: Concepts
and Methods. London: ISTE, 2008.
[8] A. Djebbari and F. Bereksi-Reguig. Detection of the valvular split
within the second heart sound using the reassigned smoothed
pseudo Wigner–Ville distribution. BioMedical Engineering
OnLine, 12(1):1–21, 2013.
[9] L. Aguiar-Conraria and M. J. Soares. The Continuous Wavelet
Transform: moving beyond uni and bivariate analysis. Journal of
Economic Surveys, 28(2):344–375, 2014.
[10] A. Bagnall, E. Keogh, J. Lines, A. Bostrom and J. Large. UEA &
UCR Time Series Classification Repository, 2020.
[11] D. Dua and C. Graff. UCI Machine Learning Repository. Irvine:
University of California, School of Information and Computer
Science, 2020.