https://doi.org/10.31449/inf.v47i10.5268 Informatica 47 (2023) 109–122

Optimizing Deep LSTM Model through Hyperparameter Tuning for Sensor-Based Human Activity Recognition in Smart Home

Mariam El Ghazi, Noura Aknin
Information Technology and Modeling, Systems Research Unit, Abdelmalek Essaadi University, Tetouan, Morocco
E-mail: mariam.elghazi@etu.uae.ac.ma, noura.aknin@uae.ac.ma

Keywords: long short-term memory (LSTM), hyperparameter tuning, batch normalization, deep learning, wearable sensors, human activity recognition (HAR)

Received: October 10, 2023

Human Activity Recognition (HAR) holds significant potential in healthcare, smart homes, sports, and security, particularly for supporting the well-being of elderly individuals and dependents. This research introduces a deep learning-based approach to HAR using wearable sensors in smart home environments. We first conduct a comprehensive review of the state of the art, offering insights into existing methods, classification techniques, their performance, hyperparameter tuning strategies, findings, limitations, and future directions. We then propose an LSTM-based deep model enriched with batch normalization, perform hyperparameter tuning using Bayesian optimization, and evaluate the model on the public PAMAP2 dataset. The model outperforms previous studies, achieving an accuracy of 97.71%, with F1 score, precision, and recall of approximately 96.66%, 96.85%, and 96.55%, respectively. For future work, we plan to assess the model's generalization capability by training it on additional datasets such as Opportunity and WISDM, and to enhance the model by exploring hybrid deep architectures and alternative hyperparameter tuning approaches. These efforts aim to maximize the model's efficiency and adaptability in real-world scenarios.

1 Introduction

HAR has emerged as a crucial research area with wide-ranging applications in healthcare, smart homes, sports, and security [1]. Automatically recognizing and categorizing human activities can significantly enhance the well-being and independence of elderly individuals and those needing care [2]. In smart home settings, HAR systems are essential for delivering context-aware services, monitoring residents' activities, and promptly notifying caregivers in the event of unusual situations [3]. While video-based approaches can achieve HAR, they often raise privacy concerns due to continuous surveillance requirements [4]. Sensor-based HAR using wearable devices has gained popularity as a way to address these privacy issues: with wearable sensors such as accelerometers, gyroscopes, and temperature sensors, data can be collected discreetly without compromising individuals' privacy [5].

This study focuses on sensor-based HAR using deep learning models, specifically LSTM. This network is well suited to time series data, which is essential for HAR tasks since activities are characterized by sequential patterns over time. LSTM's ability to capture long-term dependencies and handle variable-length input sequences makes it an ideal choice for this time-sensitive problem.

The main contributions of this paper are as follows.

1) We conduct an in-depth review of the state of the art in sensor-based HAR using deep learning.
This review provides valuable insights for readers, offering a thorough understanding of existing methods, classification techniques, hyperparameter tuning approaches, key findings, limitations, and future research directions. This comprehensive overview serves as a benchmark for comparing advancements in this domain.

2) We systematically extract the performance metrics achieved by models in previous studies, including accuracy, F1 score, precision, and recall. Additionally, we assess whether these studies employed validation methods such as k-fold cross-validation.

3) We propose a novel LSTM-based model featuring batch normalization. To enhance its performance, we conduct hyperparameter tuning using Bayesian optimization.

4) We evaluate the efficacy of our proposed LSTM-based model on the publicly available wearable sensor dataset PAMAP2. We demonstrate the model's effectiveness through a rigorous assessment using accuracy, F1 score, precision, and recall metrics. Furthermore, we ensure the reliability and generalizability of our model by performing 10-fold cross-validation.

5) In addition to showcasing our experimental results, we compare them with those reported in the state of the art. This comparative analysis positions our proposed method within the broader context of existing research, highlighting its strengths and contributions.

The paper is organized as follows: Section 2 provides a thorough review of prior studies related to sensor-based HAR. Section 3 presents the proposed materials and methods. Section 4 details the experiments and presents the results. Finally, Section 5 concludes the paper.

Table 1: State of the art of sensor-based HAR using deep learning

| Study | Year | Classification method | Datasets | Hyperparameter (HP) tuning | Findings | Limitations / future work |
|---|---|---|---|---|---|---|
| Hammerla et al. [6] | 2016 | CNN, LSTM | Opportunity, PAMAP2, Daphnet Gait | fANOVA to investigate the impact of HP on model performance | The best-performing model is CNN; DL guidelines for practitioners | Explore more HP to optimize the models; explore more datasets and more complex models |
| Ma et al. [7] | 2019 | AttnSense (CNN, GRU, attention mechanism) | Heterogeneous, UniMiB-SHAR, PAMAP2 | The study explores HP impact, including CNN structure and sliding window width | The results confirm the model's effectiveness in capturing dependencies in the spatial and temporal domains of sensing signals | Not mentioned |
| Xu et al. [8] | 2019 | InnoHAR (Inception NN + RNN + GRU) | Opportunity, PAMAP2, Smartphone dataset | No mention of HP tuning | The proposed model exhibits superior performance, strong generalization, and significant potential for real-time applications | Consider adjusting the network structure, including kernel sizes and connection methods; address the problem of class imbalance for HAR |
| Wan et al. [9] | 2020 | CNN, LSTM, Bi-LSTM | UCI, PAMAP2 | Yes, but the HP tuning techniques and ranges are not clearly explained | The CNN model outperforms the other models | Explore new sensors for HAR; investigate transfer learning's impact; explore diverse HP impacts and identify optimal settings for varied datasets and applications |
| Gao et al. [10] | 2020 | DanHAR (CNN and attention mechanism) | WISDM, PAMAP2, UniMiB-SHAR, Opportunity | No mention of HP tuning | The proposed model provided good results | Explore the impact of different HP on DanHAR's performance and investigate its effectiveness on other datasets |
| Xu et al. [11] | 2022 | Inception-LSTM with attention mechanism | Self-built dataset, PAMAP2 | No HP tuning; the HP are set by the authors and kept consistent | The proposed model outperforms traditional algorithms in terms of accuracy and convergence speed | Limitation: requires substantial training data, posing potential overfitting risks. Future directions: explore alternative models and regularization techniques; extend the model to diverse contexts |
| Thakur et al. [12] | 2022 | ConvAE-LSTM (CNN, LSTM, autoencoder) | WISDM, UCI, PAMAP2, Opportunity | The authors examine the impact of hyperparameters, such as the optimizer, the number of epochs, and the batch size, on model performance | The proposed ConvAE-LSTM model provided good performance | Analyze the method's applicability in real-life settings; compare it with other DL models and examine other datasets; further HP tuning could optimize performance |
| Tehrani et al. [13] | 2023 | Bi-LSTM | AReM, MHEALTH, PAMAP2 | The study investigates the optimal window size and percentage of votes | The proposed method attained a higher performance | Improve the network's performance by optimizing the hyperparameters and exploring other types of neural networks |
| Challa et al. [14] | 2023 | CNN + Bi-LSTM + HP tuning | PAMAP2, UCI-HAR, MHEALTH | Rao-3 metaheuristic optimization to search for optimal HP values, which the authors consider crucial for best performance | The proposed model achieved good results | Full text not open access |
| Kumar et al. [15] | 2023 | GRU | WISDM, PAMAP2, KU-HAR | Full text not open access | The model achieved good results | The experimental outcomes offer an understanding of the practicality of the proposed model and suggest potential avenues for future research |

2 Related works

2.1 Human activity recognition overview:

Human Activity Recognition (HAR) has become a pivotal research area with widespread applications in healthcare, smart homes, sports, and security [1]. The automatic detection and classification of human activities are crucial for enhancing the quality of life of elderly individuals and dependents, especially in smart home environments [2]. As mentioned in the introduction, this study focuses on sensor-based HAR to preserve residents' privacy in smart homes.

2.2 Current trends of the state-of-the-art (SOTA):

In the realm of deep learning models, various architectures, including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRU), have been employed for HAR. These models have demonstrated promising results in recognizing human activities from sensor data. Table 1 summarizes the state of the art of sensor-based HAR using deep learning; the noticeable trends are outlined below.

Trends in deep learning models for HAR: CNN emerged as the most widely used deep learning model across studies (Hammerla et al. [6], Ma et al. [7], Wan et al. [9], Gao et al. [10], Challa et al. [14]).
CNNs are preferred for their ability to capture spatial features, making them suitable for sensor-based activity recognition. LSTM networks are also widely used (Wan et al. [9], Xu et al. [11], Tehrani et al. [13]); bi-directional LSTMs, in particular, are explored for their effectiveness in capturing temporal dependencies [13]. Some studies integrated attention mechanisms into their models (Ma et al. [7], Gao et al. [10], Xu et al. [11]) to enhance the focus on specific segments of the input sequences, contributing to improved performance. Another study, by Kalabakov et al. [16], used DeepConvLSTM to transfer knowledge between two datasets, revealing that transferring the weights of fewer convolutional layers is more effective. Practical applications of sensor-based HAR in areas such as sports [17], surveillance [18], and fall detection [19] aim to support a healthier lifestyle, particularly for older people.

Hyperparameter tuning: Several studies (Hammerla et al. [6] and Thakur et al. [12]) explicitly explore hyperparameters, emphasizing their impact on model performance. However, there is a lack of consistency across studies regarding hyperparameter tuning.

Dataset diversity: Researchers in human activity recognition have used publicly available datasets such as Opportunity [20], WISDM V1.1 [21], and PAMAP2 [22]. These datasets have allowed them to develop and test activity recognition methods on motion sensor data, reflecting an effort to generalize models across different contexts. The PAMAP2 dataset is among the most used for HAR due to its size, the variety of performed activities, and its multiple subjects.

Evaluation metrics and model performance: Table 2 presents a variety of evaluation metrics across different studies, reflecting a lack of standardized reporting practices. While some studies report accuracy, F1 score, precision, and recall (Wan et al. [9], Xu et al. [11]), others report incomplete metrics (Hammerla et al. [6], Gao et al. [10]). This inconsistency makes direct comparisons challenging and emphasizes the need for standardized evaluation practices in sensor-based HAR research. Two standout models in sensor-based HAR are the Inception-LSTM proposed by Xu et al. (2022), with an accuracy of 95.04% [11], and the hybrid CNN and Bi-LSTM model with hyperparameter tuning introduced by Challa et al. (2023), which achieved a slightly lower accuracy of 94.91% [14]. Challa et al. underscore the critical role of hyperparameter tuning in optimizing their model's performance on activity recognition tasks.

Table 2: Related works' model performance on the PAMAP2 dataset

| Study | Year | Classification model | Accuracy | F1 score | Precision | Recall | K-fold cross-validation |
|---|---|---|---|---|---|---|---|
| Hammerla et al. [6] | 2016 | CNN | – | 93.70% | – | – | No |
| Hammerla et al. [6] | 2016 | LSTM | – | 92.90% | – | – | No |
| Ma et al. [7] | 2019 | AttnSense | – | 89.30% | – | – | Yes (4 folds) |
| Xu et al. [8] | 2019 | InnoHAR | – | 93.50% | – | – | No |
| Wan et al. [9] | 2020 | CNN | 91.00% | 91.16% | 91.66% | 90.85% | No |
| Wan et al. [9] | 2020 | LSTM | 85.86% | 85.34% | 86.51% | 84.67% | No |
| Wan et al. [9] | 2020 | Bi-LSTM | 89.52% | 89.40% | 90.19% | 89.02% | No |
| Gao et al. [10] | 2020 | DanHAR | 93.16% | – | – | – | No |
| Xu et al. [11] | 2022 | Inception-LSTM | 95.04% | 95.13% | 95.06% | 95.21% | No |
| Thakur et al. [12] | 2022 | ConvAE-LSTM | 94.33% | 94.46% | – | – | Yes (5 folds) |
| Tehrani et al. [13] | 2023 | Bi-LSTM | – | 93.41% | 93.41% | 93.47% | No |
| Challa et al. [14] | 2023 | CNN + Bi-LSTM + HP tuning | 94.91% | – | – | – | No |
| Kumar et al. [15] | 2023 | GRU | 94.77% | – | – | – | No |
Most sensor-based HAR studies omit the essential practice of K-fold cross-validation for evaluating the reliability and generalizability of deep learning models. Notably, only Ma et al. [7] (2019) and Thakur et al. [12] (2022) incorporated 4-fold and 5-fold cross-validation, respectively, highlighting the need for a more standardized evaluation methodology. The absence of K-fold cross-validation across studies underscores the importance of a consistent approach for reliable comparisons.

2.3 Identified gaps in the literature:

Despite the progress in HAR using deep learning models, a critical analysis reveals several gaps in the existing literature:

• Limited hyperparameter tuning: Several studies (e.g., Ma et al. [7], Gao et al. [10]) place little emphasis on hyperparameter tuning, leaving optimal model configurations largely unexplored and potentially limiting the models' overall performance.

• Inconsistent evaluation metrics: The choice of evaluation metrics is inconsistent across studies. While some focus on accuracy, others neglect essential metrics such as F1 score, precision, and recall, leading to an incomplete assessment of model performance.

• Sparse adoption of K-fold cross-validation: Few studies employ K-fold cross-validation to validate their models rigorously. This approach provides a more robust understanding of a model's generalizability, yet it remains underutilized in the current literature.

2.4 Future directions:

In our state-of-the-art analysis, the future directions proposed by previous studies provide valuable insights into the evolving landscape of research and innovation.

• Standardized practices: There is a need for standardized practices, including consistent hyperparameter tuning and reporting guidelines, to facilitate reproducibility and comparison across studies.

• Transfer learning exploration: Future research could further explore the potential of transfer learning in sensor-based HAR, leveraging knowledge from pre-trained models to improve generalization.

• Handling class imbalance: Strategies to address class imbalance should be a focus of future work to enhance the robustness and applicability of models in real-world scenarios.

2.5 Addressing gaps in our proposed model:

In light of the identified gaps, our work makes significant strides in advancing the field:

• In-depth hyperparameter tuning: Our proposed model incorporates hyperparameter tuning using Bayesian optimization. This deliberate approach enhances our model's adaptability and performance, addressing the previously observed gap.

• Comprehensive evaluation metrics: To overcome the inconsistency in evaluation metrics, we conduct a comprehensive assessment, including accuracy, F1 score, precision, and recall. This ensures a thorough understanding of our model's performance across various dimensions.

• Rigorous K-fold cross-validation: Recognizing the importance of model validation, we implement a rigorous 10-fold cross-validation methodology. This validation strategy ensures the reliability and generalizability of our model's performance, addressing the underutilization of K-fold cross-validation in previous studies.
In summary, our work contributes to the evolution of sensor-based HAR by introducing an optimized LSTM model that specifically addresses gaps related to hyperparameter tuning, evaluation metrics, and model validation. Through these advancements, we offer a refined and optimized deep learning model tailored to the intricacies of wearable sensor data in smart home environments.

3 Material and methods

The study introduces an LSTM-based HAR framework (shown in Figure 1) that uses wearable sensor data from the PAMAP2 dataset. The sensors are placed on the chest, ankle, and hand to collect data on 12 activities performed by individuals. The methodology involves three main stages: data preprocessing and segmentation in the first stage; data splitting into training and testing sets in the second stage; and training and hyperparameter tuning, followed by model evaluation, in the final stage. During the model training and tuning phase, the data is split into 70% for the training set and 30% for the testing set. Our proposed LSTM model is assessed on the validation data while its hyperparameters are optimized with the Bayesian optimization approach. Subsequently, the hyperparameter-tuned models are evaluated on the test data to measure their recognition performance and compare their effectiveness.

3.1 PAMAP2 dataset:

This study uses the PAMAP2 dataset, which is widely employed in HAR research due to its relevance in this field. It contains sensor data from wearable devices, including inertial measurement units (IMUs) and physiological sensors. The dataset comprises recordings of various physical activities performed by participants, covering a wide range of movements and intensities. Multiple participants were involved, allowing the study of individual variations in activity recognition. Each activity recording is labeled, providing ground truth for training and evaluating HAR models. The dataset's structured format includes separate files for different sensor modalities, facilitating analysis and the combination of data for activity recognition. The PAMAP2 dataset is a valuable resource for advancing sensor-based activity recognition research. Table 3 presents an overview of the PAMAP2 dataset, incorporating information from the official documentation and our experiment [22].

Figure 1: The schematic diagram of the proposed LSTM-based model for sensor-based HAR

Table 3: PAMAP2 dataset description

| Dataset | Labels | Sampling rate | Window size | Overlap | Feature vector | Total segments | Training (70%) | Testing (30%) |
|---|---|---|---|---|---|---|---|---|
| PAMAP2 | 12 | 100 Hz | 1 s | 50% | (20588, 42) | 20588 | (14412, 100, 42) | (6176, 100, 42) |
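To make the segmentation in Table 3 concrete, the following is a minimal sketch of the 1 s (100-sample) sliding window with 50% overlap that produces tensors of shape (n_segments, 100, 42). The function name and placeholder arrays are ours for illustration, and the sketch assumes the 42 feature channels have already been cleaned and scaled; the majority-vote labeling rule for windows straddling an activity boundary is one common convention, not necessarily the authors' exact choice.

```python
import numpy as np

def segment_signal(data, labels, window=100, overlap=0.5):
    """Slide a fixed-length window over the feature matrix.

    data    : (n_samples, n_features) array of scaled sensor channels
    labels  : integer activity label per sample, shape (n_samples,)
    window  : samples per segment (1 s at 100 Hz -> 100)
    overlap : fraction of overlap between consecutive windows
    """
    step = int(window * (1 - overlap))  # 50 samples for 50% overlap
    segments, segment_labels = [], []
    for start in range(0, len(data) - window + 1, step):
        end = start + window
        segments.append(data[start:end])
        # Label each window by majority vote over the samples it covers.
        segment_labels.append(np.bincount(labels[start:end]).argmax())
    return np.asarray(segments), np.asarray(segment_labels)

# Placeholder data standing in for the preprocessed PAMAP2 channels.
rng = np.random.default_rng(0)
features = rng.normal(size=(5000, 42))         # 42 sensor channels at 100 Hz
activity_ids = rng.integers(0, 12, size=5000)  # encoded labels 0-11
X, y = segment_signal(features, activity_ids)
print(X.shape, y.shape)  # (99, 100, 42) (99,)
```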
Table 4 and Figure 2 show the number of instances per activity in the PAMAP2 dataset. Despite the imbalanced distribution, both the training and testing sets contain instances of all activities, ensuring that the model can be evaluated on the entire range of activities present in the dataset.

Table 4: PAMAP2 dataset instances per activity

| Activity no. | Class id | Activity label | # Instances |
|---|---|---|---|
| 1 | 0 | Lying | 142931 |
| 2 | 1 | Sitting | 83738 |
| 3 | 2 | Standing | 99973 |
| 4 | 3 | Walking | 122906 |
| 5 | 4 | Running | 43050 |
| 6 | 5 | Cycling | 91340 |
| 7 | 6 | Nordic walking | 111832 |
| 12 | 7 | Ascending stairs | 59314 |
| 13 | 8 | Descending stairs | 46830 |
| 16 | 9 | Vacuum cleaning | 86959 |
| 17 | 10 | Ironing | 125228 |
| 24 | 11 | Rope jumping | 15453 |

Figure 2: Instances per activity in the PAMAP2 dataset

3.2 Long short-term memory (LSTM)

LSTM networks are a subset of recurrent neural networks (RNNs) and play a crucial role in time series applications, especially in HAR, which categorizes activities based on sensor data such as accelerometer and gyroscope readings. The effectiveness of LSTM networks in HAR stems from their ability to capture and represent the long-term dependencies inherent in sensor data [23].

Figure 3 shows the internal structure of the LSTM cell, which consists of the following components:

Figure 3: The internal structure of the LSTM cell [31]

1) Input gate ($i_t$): This gate manages the flow of information from the input to the memory cell. It uses a sigmoid activation function that produces an output value between 0 and 1, determining the extent to which the input is allowed to pass through.

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \tag{1}$$

2) Forget gate ($f_t$): This gate manages the flow of information from the previous memory cell to the current memory cell. It also uses a sigmoid activation function producing a value between 0 and 1, denoting the degree to which the previous cell state is forgotten.

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \tag{2}$$

3) Output gate ($o_t$): This gate manages the flow of information from the memory cell to the output. Its sigmoid activation generates a value between 0 and 1, signifying the proportion of the cell state to be output.

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \tag{3}$$

4) Candidate cell content ($\tilde{C}_t$): A candidate vector of new content that can be added to the cell state.

$$\tilde{C}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \tag{4}$$

5) Cell state ($c_t$): The internal state of the memory cell, updated from the forget gate, the input gate, and the candidate content.

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{C}_t \tag{5}$$

6) Hidden state ($h_t$): The output of the LSTM cell, a filtered version of the cell state gated by the output gate.

$$h_t = o_t \odot \tanh(c_t) \tag{6}$$

In the above equations, $x_t$ is the input at time step $t$; $h_{t-1}$ is the previous hidden state (output) of the LSTM at time step $t-1$; $\sigma$ is the sigmoid activation function; $W_f, W_i, W_o, W_c$ are weight matrices for the input; $U_f, U_i, U_o, U_c$ are weight matrices for the previous hidden state; $b_f, b_i, b_o, b_c$ are bias terms; $\tanh$ is the hyperbolic tangent activation function; and $\odot$ denotes element-wise multiplication.

By employing these gate operations along with the memory cell, LSTM can capture long-range dependencies in sequential data and make precise predictions for future time steps.
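As a concrete illustration of Equations 1–6, the following is a minimal NumPy sketch of a single LSTM cell step. The function name, weight layout, and dimensions are our own illustrative choices, not code from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step following Equations 1-6.

    x_t    : input vector at time t, shape (n_in,)
    h_prev : previous hidden state h_{t-1}, shape (n_hidden,)
    c_prev : previous cell state c_{t-1}, shape (n_hidden,)
    W, U, b: per-gate weights W_g (n_hidden, n_in), U_g (n_hidden, n_hidden)
             and biases b_g (n_hidden,), for g in {"i", "f", "o", "c"}.
    """
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])      # Eq. 1: input gate
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])      # Eq. 2: forget gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])      # Eq. 3: output gate
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # Eq. 4: candidate content
    c_t = f_t * c_prev + i_t * c_tilde                          # Eq. 5: new cell state
    h_t = o_t * np.tanh(c_t)                                    # Eq. 6: new hidden state
    return h_t, c_t

# Tiny smoke test: 42 input features (one PAMAP2 frame), 8 hidden units.
rng = np.random.default_rng(0)
n_in, n_h = 42, 8
W = {g: rng.normal(scale=0.1, size=(n_h, n_in)) for g in "ifoc"}
U = {g: rng.normal(scale=0.1, size=(n_h, n_h)) for g in "ifoc"}
b = {g: np.zeros(n_h) for g in "ifoc"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h), W, U, b)
```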
LSTM process for HAR:

1) Data preparation: The initial step involves preprocessing the raw sensor data to extract essential features such as acceleration, velocity, and orientation. The data is then partitioned into fixed-length sequences, each representing a distinct activity.

2) Input encoding: To serve as input to the LSTM network, the sensor data sequences are encoded, typically by transforming the data into a three-dimensional tensor with dimensions (samples, time steps, features).

3) Model training: The LSTM network is trained on the encoded sensor data to discern patterns and correlations between input sequences and their respective activity labels. During training, the network's internal state is continually updated based on the input sequence, enabling predictions of the activity label.

4) Model prediction: Once trained, the LSTM network can predict activity labels for new sensor data sequences. These sequences are fed into the network, which dynamically updates its internal state according to the input and produces the predicted activity label.

3.3 Hyperparameter tuning with Bayesian optimization

Hyperparameters are critical parameters in deep learning approaches, as they directly influence the behavior of training algorithms and substantially impact the performance of deep learning models. Bayesian optimization is a practical and efficient method for function optimization problems, especially when seeking an optimal model configuration. It is particularly suited to objective functions that lack a closed analytical form, are computationally demanding to evaluate, have derivatives that are intricate to compute, or are non-convex [24], [25]. To apply Bayesian optimization to an LSTM for sensor-based HAR, the following steps can be followed:

1. Define the search space: Specify the hyperparameters to be optimized, such as the number of LSTM layers, the number of hidden units, the learning rate, and the dropout rate.

2. Define the objective function: This function evaluates the performance of the LSTM model under the given hyperparameters.

3. Initialize the Bayesian optimization algorithm: Set the initial hyperparameter values and their corresponding objective function values.

4. Iterate the optimization process: The Bayesian optimization algorithm leverages a probabilistic model and an acquisition function to determine the next set of hyperparameters to evaluate. The objective function is then assessed with these hyperparameters, and the outcome is used to update the probabilistic model.

5. Repeat step 4 until convergence: The optimization continues until a stopping condition is satisfied, such as completing a predetermined number of iterations or achieving a target level of performance.

3.4 Batch normalization

Batch normalization is a technique used in deep learning, including LSTM networks, to address the internal covariate shift problem during training. It normalizes each layer's inputs within a mini-batch, centering the data around zero with unit variance. This leads to improved training stability, faster convergence, and reduced sensitivity to weight initialization. Batch normalization is a crucial tool for enhancing the efficiency and accuracy of neural network models, including those used in HAR and time series classification tasks [26].

3.5 Validation protocol

The K-fold cross-validation protocol is a widely employed technique in machine learning for assessing model performance. It entails partitioning the dataset into K subsets, or folds [27]. In this study, we use 10-fold cross-validation: in each iteration, the model is trained on nine folds and validated on the remaining fold. This process is repeated ten times, and the performance metrics are averaged across iterations to obtain an overall estimate. This protocol provides a more robust and less biased assessment of the model's ability to generalize to new data, while making efficient use of the available data.
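The sketch below shows one way to implement this 10-fold protocol around a Keras-style model, using scikit-learn's KFold. The `build_model` factory is an assumption of ours (returning a freshly compiled model so every fold starts from scratch), and macro-averaged F1 is shown as one reasonable scoring choice; the paper does not publish its validation code.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import f1_score

def cross_validate(build_model, X, y, n_splits=10, epochs=50, batch_size=128):
    """10-fold cross-validation as described in Section 3.5.

    build_model : zero-argument factory returning a compiled Keras model
                  (trained with integer labels, e.g. sparse cross-entropy).
    X, y        : segmented windows and integer activity labels.
    """
    scores = []
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    for train_idx, val_idx in kf.split(X):
        model = build_model()  # fresh weights for each fold
        model.fit(X[train_idx], y[train_idx], epochs=epochs,
                  batch_size=batch_size, verbose=0)
        y_pred = np.argmax(model.predict(X[val_idx], verbose=0), axis=1)
        scores.append(f1_score(y[val_idx], y_pred, average="macro"))
    # Report the average and spread across folds, as in Table 9.
    return float(np.mean(scores)), float(np.std(scores))
```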
3.6 Evaluation metrics

The experiments used several evaluation metrics to assess the HAR model's performance: accuracy, F1 score, precision, recall, and the confusion matrix. Accuracy measures the overall correctness of the model's predictions. The F1 score balances precision and recall. Precision assesses the accuracy of the positive predictions made by the model, whereas recall evaluates the model's ability to correctly identify positive instances. The confusion matrix offers a comprehensive view of the model's performance across classes, providing insight into true positive, true negative, false positive, and false negative classifications. Together, these metrics comprehensively evaluate the HAR model's accuracy, reliability, and predictive capability over various human activities. Table 5 summarizes these evaluation metrics. Understanding them requires four fundamental terms: true positive (TP), true negative (TN), false positive (FP), and false negative (FN) [28].

Table 5: Evaluation metrics [28]

| Metric | Formula | Definition |
|---|---|---|
| Accuracy | $\frac{TP + TN}{TP + TN + FP + FN}$ | The ratio of correct predictions to all predictions |
| Precision | $\frac{TP}{TP + FP}$ | The ratio of true positives to all predicted positives |
| Recall (sensitivity) | $\frac{TP}{TP + FN}$ | The ratio of true positives to all samples of the actual class |
| F1 score / F-measure | $\frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$ | The harmonic mean of precision and recall, informative when the data is imbalanced |
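As a sketch, the metrics in Table 5 can be computed with scikit-learn as follows. The label arrays are placeholders, and macro averaging is shown as one common choice for multi-class HAR; the paper does not state which averaging mode it uses.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# y_true and y_pred are integer activity labels for the test segments
# (placeholders here; in the experiment they come from the trained model).
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)
# Macro averaging weights every activity class equally, which matters
# given the class imbalance shown in Table 4.
prec = precision_score(y_true, y_pred, average="macro")
rec = recall_score(y_true, y_pred, average="macro")
f1 = f1_score(y_true, y_pred, average="macro")
cm = confusion_matrix(y_true, y_pred)  # per-class breakdown, as in Figure 6
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```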
4 Experiments and results

4.1 Experimental design

The experimental design of this study systematically investigates the effectiveness of optimizing the proposed LSTM-based model for sensor-based HAR in smart homes through hyperparameter tuning. The research questions (RQ) and corresponding hypotheses (H) guide the study's objectives and validate the proposed model's performance.

RQ1: How does hyperparameter tuning impact the performance of LSTM models in sensor-based HAR?
H1: Systematic hyperparameter tuning significantly enhances the accuracy and robustness of LSTM models compared to default configurations.

RQ2: How does the proposed optimized LSTM model perform compared to previous studies?
H2: The optimized LSTM model will outperform other models in terms of accuracy, precision, recall, and F1 score.

RQ3: How does the inclusion of batch normalization in the LSTM model affect its convergence speed and overall performance in HAR tasks?
H3: The addition of batch normalization will contribute to faster convergence and improved model performance by mitigating internal covariate shift.

RQ4: How applicable are the findings of our optimized LSTM model to real-life smart home environments, considering practical challenges and variations in user behavior?
H4: The model's performance will remain robust in real-world scenarios, offering practical implications for smart home applications.

4.2 Experimental environment and hyperparameter optimization setup

This section presents the results of experiments performed on an NVIDIA T4 GPU on the Google Colab platform. The LSTM network's hyperparameters were optimized through Bayesian hyperparameter optimization using the Keras Tuner library [29]. The experiment setup is detailed in Table 6.

Table 6: Experiment environment setup

| Platform | Google Colab |
|---|---|
| GPU | NVIDIA T4 |
| RAM | 15 GB |
| TensorFlow version | 2.12.0 |
| Keras version | 2.12.0 |
| Keras Tuner version | 1.3.5 |

4.3 The proposed model

Our proposed model is an LSTM-based HAR model (shown in Figure 4). It consists of three LSTM layers, which are particularly effective at capturing sequential patterns in time series data. Dropout layers are incorporated between the LSTM layers to prevent overfitting and enhance generalization: dropout randomly deactivates neurons during training, reducing the model's reliance on specific features and encouraging more robust learning. Additionally, batch normalization is applied to stabilize the training process by normalizing the input to each layer, ensuring more consistent and faster convergence. The combination of dropout and batch normalization contributes to the model's ability to handle the complexity of sensor data, leading to improved accuracy in classifying human activities.

Figure 4: Structure of the proposed model
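A minimal Keras sketch consistent with this description is given below, using the tuned values later reported in Table 8 (64 LSTM units, 96 dense units, dropout 0.1, Adam at 1e-3). The exact ordering of batch normalization and dropout around each LSTM layer is not fully specified in the paper, so the layout here is one plausible reading rather than the authors' released code; the sparse cross-entropy loss assumes integer-encoded labels.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(timesteps=100, n_features=42, n_classes=12,
                lstm_units=64, dense_units=96, dropout=0.1, lr=1e-3):
    """Sketch of the proposed architecture: three stacked LSTM layers
    with batch normalization and dropout, then a softmax classifier."""
    model = models.Sequential([
        layers.Input(shape=(timesteps, n_features)),
        layers.LSTM(lstm_units, return_sequences=True),
        layers.BatchNormalization(),
        layers.Dropout(dropout),
        layers.LSTM(lstm_units, return_sequences=True),
        layers.BatchNormalization(),
        layers.Dropout(dropout),
        layers.LSTM(lstm_units),  # final LSTM returns only the last state
        layers.BatchNormalization(),
        layers.Dropout(dropout),
        layers.Dense(dense_units, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

A factory of this shape also plugs directly into the `cross_validate` sketch of Section 3.5, since each fold needs a model with fresh weights.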
4.4 Experiments process

The experiment is conducted in three distinct stages: data preprocessing; training and hyperparameter tuning; and model evaluation.

4.4.1 Data preprocessing

In the data preprocessing stage, the raw sensor data obtained from the wearable devices is prepared for the proposed model. The data is first cleaned by dropping the irrelevant orientation columns and removing transient activity rows. Non-numeric data is converted to numeric, and missing values are interpolated to ensure data completeness. The data is then scaled to normalize the input features, giving the data distribution uniformity. Next, the labels are encoded and converted into categorical variables; this step is crucial for classifying the activities during model training. The data is then split into training and testing sets, with 70% of the data used for training and 30% for testing. To match the LSTM model's input format, the data is segmented into overlapping windows, with a window size of 1 s and 50% overlap. The segmentation stage creates segments and corresponding labels for training and testing, which are reshaped for compatibility with the LSTM model's input format. Finally, the shapes of the training and testing segments are confirmed before proceeding to the model training and evaluation stages (see Table 3).

4.4.2 Training and hyperparameter tuning

This stage aims to optimize the model's performance. Hyperparameter tuning is conducted using Bayesian optimization, intelligently training the model while searching for the best combination of hyperparameters. This fine-tuning process enhances accuracy and generalization in classifying human activities.

Table 7: Hyperparameter ranges for model optimization

| Hyperparameter | Range |
|---|---|
| LSTM units | Integer from 64 to 256, step 32 |
| Dense units | Integer from 32 to 128, step 32 |
| Dropout rate | Float from 0.1 to 0.5, step 0.1 |
| Optimizer | Choice of Adam, RMSprop, or SGD |
| Learning rate | Choice of 1e-3, 1e-4, or 1e-5 |
| Batch size | Choice of 32, 64, or 128 |

Table 7 summarizes the hyperparameter ranges. These ranges form the search space explored during Bayesian hyperparameter optimization with Keras Tuner. The models are fine-tuned by adjusting several critical hyperparameters: LSTM units, dense units, dropout rate, optimizer, learning rate, and batch size. In this study, the Bayesian optimization process performs 10 trials to intelligently explore the hyperparameter space and identify the most promising configurations for the proposed LSTM-based HAR model. Each trial trains the model for 50 epochs to ensure comprehensive learning and convergence. The combination of 10 trials and 50 epochs provides a thorough search and fine-tuning of the model, leading to improved accuracy and robust performance on HAR tasks.

Table 8 summarizes the hyperparameters found by Keras Tuner using Bayesian optimization for the best LSTM-based HAR model: 64 LSTM units, 96 dense units, a dropout rate of 0.1, a batch size of 128, a learning rate of 0.001 with the Adam optimizer, and 50 training epochs. These optimized hyperparameters yield a well-balanced, efficient model that classifies human activities with improved precision and performance.

Table 8: Hyperparameters of the proposed model found by Keras Tuner

| | Hyperparameter | Value |
|---|---|---|
| Structure | LSTM units | 64 |
| Structure | Dense units | 96 |
| Structure | Dropout rate | 0.1 |
| Training | Batch size | 128 |
| Training | Learning rate | 0.001 |
| Training | Optimizer | Adam |
| Training | Epochs | 50 |
| Training | Loss function | Cross-entropy |
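The sketch below expresses the search space of Table 7 with Keras Tuner's BayesianOptimization. The class and variable names are ours, the architecture is a compact stand-in for the model of Section 4.3, and tuning the batch size by overriding `HyperModel.fit` follows the Keras Tuner pattern for non-model hyperparameters; it is an illustration under these assumptions, not the authors' exact script.

```python
import keras_tuner as kt
import tensorflow as tf
from tensorflow.keras import layers, models

class HARHyperModel(kt.HyperModel):
    def build(self, hp):
        # Search space taken from Table 7.
        units = hp.Int("lstm_units", min_value=64, max_value=256, step=32)
        dense = hp.Int("dense_units", min_value=32, max_value=128, step=32)
        drop = hp.Float("dropout_rate", min_value=0.1, max_value=0.5, step=0.1)
        opt_name = hp.Choice("optimizer", ["adam", "rmsprop", "sgd"])
        lr = hp.Choice("learning_rate", [1e-3, 1e-4, 1e-5])
        model = models.Sequential([
            layers.Input(shape=(100, 42)),
            layers.LSTM(units, return_sequences=True),
            layers.BatchNormalization(),
            layers.Dropout(drop),
            layers.LSTM(units),
            layers.Dense(dense, activation="relu"),
            layers.Dense(12, activation="softmax"),
        ])
        opt = {"adam": tf.keras.optimizers.Adam,
               "rmsprop": tf.keras.optimizers.RMSprop,
               "sgd": tf.keras.optimizers.SGD}[opt_name](learning_rate=lr)
        model.compile(optimizer=opt, loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    def fit(self, hp, model, *args, **kwargs):
        # Batch size is part of the search space (Table 7), so it is
        # tuned here rather than in build().
        return model.fit(*args,
                         batch_size=hp.Choice("batch_size", [32, 64, 128]),
                         **kwargs)

tuner = kt.BayesianOptimization(HARHyperModel(), objective="val_accuracy",
                                max_trials=10, overwrite=True,
                                directory="tuning", project_name="har_lstm")
# With segmented data available (X_train, y_train), the 10-trial,
# 50-epoch search of Section 4.4.2 would then be:
# tuner.search(X_train, y_train, epochs=50, validation_split=0.2)
# best_hp = tuner.get_best_hyperparameters(1)[0]
```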
4.4.3 Model evaluation

To evaluate the proposed model's performance on the PAMAP2 dataset, a 10-fold cross-validation approach was employed. The evaluation metrics used were accuracy, precision, recall, and F1 score. These metrics were compared against those reported in previous studies on the same dataset, enabling a comprehensive assessment of the proposed model's effectiveness and advancement of HAR.

4.5 Experimental results

In this section, we discuss the experimental results of the proposed method in terms of accuracy, F1 score, precision, and recall. To demonstrate the capability of our proposed model, we also compare its results with the approaches from the previous studies surveyed in Section 2. The performance of the proposed LSTM-based model was assessed using 10-fold cross-validation, as shown in Table 9. The results show consistent and reliable performance across all folds, with a mean cross-validation score of 97.71% and a small standard deviation of ±0.40%. The model achieved an average F1 score of 0.96660, accurately classifying positive and negative instances. The average precision score of 0.96855 reflects the model's ability to correctly predict positive instances, while the average recall of 0.96549 shows its capability to identify positive instances among all actual positives. The model's accuracy ranged from 97.16% to 98.54% across folds, with an average of 97.71%. The consistently high performance and minor variation in these metrics indicate that the proposed LSTM-based HAR model effectively classifies human activities.

Table 9: Model performance under 10-fold cross-validation

| Fold | Cross-validation score | F1 score | Precision | Recall | Accuracy |
|---|---|---|---|---|---|
| Fold 1 | 0.97365 | 0.96233 | 0.96435 | 0.96093 | 97.36% |
| Fold 2 | 0.97157 | 0.96127 | 0.96107 | 0.96171 | 97.16% |
| Fold 3 | 0.97641 | 0.96925 | 0.97149 | 0.96914 | 97.64% |
| Fold 4 | 0.97363 | 0.96312 | 0.96830 | 0.95911 | 97.36% |
| Fold 5 | 0.97363 | 0.96323 | 0.96558 | 0.96122 | 97.36% |
| Fold 6 | 0.98126 | 0.97075 | 0.97248 | 0.96923 | 98.13% |
| Fold 7 | 0.97918 | 0.96522 | 0.96495 | 0.96626 | 97.92% |
| Fold 8 | 0.98543 | 0.97364 | 0.97676 | 0.97135 | 98.54% |
| Fold 9 | 0.97918 | 0.96447 | 0.96671 | 0.96292 | 97.92% |
| Fold 10 | 0.97710 | 0.97267 | 0.97382 | 0.97302 | 97.71% |
| Mean | 0.9771 ± 0.0040 | 0.96660 | 0.96855 | 0.96549 | 97.71% |

The classification report in Table 10 presents the evaluation metrics of the multi-class classification model, showing precision, recall, and F1 score for each activity. Overall, the model demonstrates strong performance with an accuracy of 97%, indicating a high level of correct predictions across all classes.

Table 10: Classification report of the proposed model

The plots of accuracy and loss in Figure 5 illustrate the model's performance during training, showing accuracy increasing and loss decreasing over the epochs. The confusion matrix of the proposed model is shown in Figure 6.

Figure 5: Accuracy and loss training curves of the proposed model

Figure 6: Confusion matrix of the proposed model

4.6 Comparative results analysis

In this comparative analysis, we evaluate the performance of our proposed LSTM-based model with hyperparameter tuning and batch normalization against a selection of previous HAR studies, examining key metrics including accuracy, F1 score, precision, and recall, as shown in Table 11.

Figure 7: Accuracy comparison of our proposed model against previous studies

Table 11: Comparison with previous works

| Study | Year | Classification method | Accuracy | F1 score | Precision | Recall |
|---|---|---|---|---|---|---|
| Hammerla et al. [6] | 2016 | CNN | – | 93.70% | – | – |
| Hammerla et al. [6] | 2016 | LSTM | – | 92.90% | – | – |
| Ma et al. [7] | 2019 | AttnSense | – | 89.30% | – | – |
| Xu et al. [8] | 2019 | InnoHAR | – | 93.50% | – | – |
| Wan et al. [9] | 2020 | CNN | 91.00% | 91.16% | 91.66% | 90.85% |
| Wan et al. [9] | 2020 | LSTM | 85.86% | 85.34% | 86.51% | 84.67% |
| Wan et al. [9] | 2020 | Bi-LSTM | 89.52% | 89.40% | 90.19% | 89.02% |
| Gao et al. [10] | 2020 | DanHAR | 93.16% | – | – | – |
| Xu et al. [11] | 2022 | Inception-LSTM | 95.04% | 95.13% | 95.06% | 95.21% |
| Thakur et al. [12] | 2022 | ConvAE-LSTM | 94.33% | 94.46% | – | – |
| Tehrani et al. [13] | 2023 | Bi-LSTM | – | 93.41% | 93.41% | 93.47% |
| Challa et al. [14] | 2023 | CNN + Bi-LSTM + HP tuning | 94.91% | – | – | – |
| Kumar et al. [15] | 2023 | GRU | 94.77% | – | – | – |
| Our proposed model | – | LSTM-based model with HP tuning | 97.71% | 96.66% | 96.85% | 96.55% |
Figure 8: F1 score comparison of our proposed model against previous studies

Figure 9: Precision comparison of our proposed model against previous studies

Figure 10: Recall comparison of our proposed model against previous studies

Table 11 compares the performance of our proposed LSTM-based model with previous studies on the PAMAP2 dataset. The results demonstrate the remarkable efficacy of our model, which achieves an accuracy of 97.71%, an F1 score of 96.66%, a precision of 96.855%, and a recall of 96.549%. These metrics collectively outperform all of the state-of-the-art studies listed in Table 11. Particularly noteworthy is the improvement over the closest competitor, the Inception-LSTM of Xu et al. (2022) [11], with enhancements of approximately 2.67% in accuracy, 1.53% in F1 score, 1.795% in precision, and 1.34% in recall. Figures 7 to 10 chart our model against previous studies in terms of accuracy, F1 score, precision, and recall, respectively. This considerable advancement underscores the effectiveness of our model's architecture, hyperparameter tuning, and batch normalization in pushing the boundaries of sensor-based HAR, and the substantial improvement in accuracy has practical relevance for real-world applications.

4.7 Discussion

In addressing the complex challenges of sensor-based HAR in smart home environments, our study aimed to optimize LSTM models through hyperparameter tuning, batch normalization, and rigorous evaluation and validation. Here we analyze the results in detail to answer our research questions and validate our hypotheses.

4.7.1 Impact of hyperparameter tuning

Our investigation of the impact of hyperparameter tuning on the proposed LSTM model's performance yields nuanced insights into specific model parameters. The sensitivity of deep learning models such as LSTM to hyperparameter changes makes a manual search for optimal configurations challenging. Our study employed Keras Tuner, leveraging Bayesian optimization to systematically fine-tune critical hyperparameters: LSTM units, dense units, dropout rate, optimizer, learning rate, and batch size. Adjusting the LSTM units had a notable effect on the model's ability to capture long-term dependencies, contributing significantly to enhanced accuracy. Similarly, fine-tuning the dense units allowed a more nuanced representation of complex patterns within the sensor data, further improving the model's robustness. Our exploration of the dropout rate emphasized its role in regularization, mitigating overfitting risks and promoting generalization. The choice of optimizer played a pivotal role in convergence speed and overall performance, with the model achieving superior results under the selected optimization strategy. Adjustments to the learning rate and batch size likewise influenced the model's learning dynamics and computational efficiency. Bayesian optimization through Keras Tuner facilitated an efficient search for the best configuration, accounting for the intricate interplay of these hyperparameters.
The quantitative improvements observed across these individual parameters collectively underscore the importance of meticulous hyperparameter tuning. This validated our hypothesis and provided a granular understanding of how each parameter contributes to the overall robustness and accuracy of our LSTM-based model.

4.7.2 The effect of batch normalization

Our exploration of batch normalization's effect on the LSTM model reveals notable improvements. Batch normalization contributes to faster convergence and enhances overall model performance by mitigating internal covariate shift [26]. By normalizing the inputs of a layer during training, it leads to faster and more stable training of deep neural networks. This aligns with our hypothesis and underscores the significance of normalization techniques in optimizing deep learning models for HAR tasks.

4.7.3 Comparison with previous studies

A quantitative comparison of our optimized LSTM model with state-of-the-art studies on the PAMAP2 dataset demonstrates its superior performance. The model excels in accuracy, precision, recall, and F1 score, outperforming previous models. The model's mean accuracy across all folds was approximately 97.71%, with a small standard deviation of 0.40%. The mean F1 score was approximately 96.66%, indicating a good balance between precision and recall. Compared with the previous literature, the proposed model outperformed most other approaches. The practical implications of our optimized LSTM model suggest that its robustness extends to real-world smart home scenarios, including applications in healthcare, fitness tracking, and human-computer interaction. In discussing potential applications, risks, and ethical implications, we recognize the need for ongoing ethical discourse in the rapidly evolving landscape of this technology.

4.7.4 Statistical analysis

For this analysis, we opted for the Wilcoxon signed-rank test, a robust non-parametric method suited to evaluating a single algorithm's performance across diverse studies [30]. This choice aligns with our objective of comparing the efficacy of our proposed model against other studies in a setting where a single algorithm is tested across multiple contexts. Being non-parametric, the Wilcoxon test eliminates the need for strict assumptions about the distribution of the data, and the p-values it produces indicate the statistical significance of the observed differences [30]. At the conventional significance level of 0.05, the obtained p-value of 0.0079 for accuracy signifies a statistically significant distinction, underlining a significant divergence between our model's accuracy and that of the other studies. Similarly, the F1 score exhibits a p-value of 0.00195, emphasizing a substantial difference in our model's F1 score relative to the other studies. Precision and recall, with p-values of 0.0625 each, fall just short of the conventional significance level. Overall, the statistical analysis underscores the significant outperformance of our proposed model in terms of accuracy and F1 score; while the precision and recall differences are not statistically significant at the 0.05 threshold, they signal intriguing nuances deserving further exploration, contributing to a comprehensive understanding of our model's comparative performance.
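As a sketch of this test, the snippet below pairs our model's accuracy against each prior accuracy reported in Table 11 and applies SciPy's Wilcoxon signed-rank test. The pairing scheme is our assumption about how the comparison was set up; under it, the exact two-sided p-value matches the quoted 0.0079.

```python
import numpy as np
from scipy.stats import wilcoxon

# Accuracies (%) reported on PAMAP2 by prior studies in Table 11.
prior_accuracy = np.array([91.00, 85.86, 89.52, 93.16, 95.04,
                           94.33, 94.91, 94.77])
ours = 97.71  # proposed model, paired against each prior result

# Two-sided test on the paired differences; with 8 pairs and no ties,
# SciPy uses the exact distribution.
stat, p_value = wilcoxon(np.full_like(prior_accuracy, ours) - prior_accuracy)
print(f"W={stat}, p={p_value:.4f}")
# All differences are positive, so p = 2 / 2**8 = 0.0078, i.e. ~0.0079.
```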
In summary, our proposed framework, a deep multi-layer LSTM with Bayesian optimization and batch normalization, achieved outstanding results in accurately classifying human activities from data gathered by multiple wearable sensors. The three-stage experimental setup, comprising data preprocessing, hyperparameter tuning using Bayesian optimization, and model validation with 10-fold cross-validation, contributed to the model's robustness and effectiveness. The optimized hyperparameters, including 64 LSTM units, 96 dense units, and a dropout rate of 0.1, were identified through Bayesian optimization. The model's mean accuracy across all folds was approximately 97.71%, with a small standard deviation of 0.40%, and the mean F1 score was approximately 0.9666, indicating a good balance between precision and recall. Compared with the previous literature, the proposed model outperformed most other approaches. Its capability to handle multi-sensor data and its successful hyperparameter tuning make it highly applicable to real-world scenarios; however, further testing on diverse datasets and under real-world conditions is warranted to validate its generalizability. Overall, the proposed model shows remarkable performance and potential for practical applications in various domains.

5 Conclusion

In conclusion, our research significantly advances the field of HAR using wearable sensor data, with a particular focus on smart home environments. Through an exhaustive review of the state of the art, we have presented a comprehensive understanding of existing methods, classification techniques, hyperparameter tuning approaches, findings, limitations, and future directions. Our proposed LSTM-based deep model, enhanced by batch normalization and hyperparameter tuning using Bayesian optimization, demonstrated exceptional performance. Achieving an accuracy of 97.71% and impressive values for F1 score, precision, and recall (approximately 96.66%, 96.85%, and 96.55%, respectively), our model outperforms previous studies, underscoring the crucial role of hyperparameter optimization in activity classification. Looking ahead, we aim to evaluate our model on further datasets, such as Opportunity and WISDM, to enhance its generalization capability. Our commitment to ongoing optimization involves exploring more complex deep model architectures and alternative hyperparameter tuning approaches, in pursuit of maximal efficiency and adaptability in real-world scenarios.

References

[1] S. Zhang et al., "Deep learning in human activity recognition with wearable sensors: A review on advances," Sensors, vol. 22, no. 4, p. 1476, Feb. 2022. https://doi.org/10.3390/s22041476

[2] M. Gochoo, F. Alnajjar, T. H. Tan, and S. Khalid, "Towards privacy-preserved aging in place: A systematic review," Sensors, vol. 21, no. 9, p. 3082, Apr. 2021. https://doi.org/10.3390/s21093082

[3] A. Kristoffersson and M. Lindén, "A systematic review on the use of wearable body sensors for health monitoring: A qualitative synthesis," Sensors (Switzerland), vol. 20, no. 5, p. 1502, Mar. 2020. https://doi.org/10.3390/s20051502

[4] M. M. Islam, S. Nooruddin, F. Karray, and G. Muhammad, "Human activity recognition using tools of convolutional neural networks: A state of the art review, data sets, challenges, and future prospects," Comput. Biol. Med., vol. 149, p. 106060, Oct. 2022. https://doi.org/10.1016/j.compbiomed.2022.106060
[5] Y. Zhang, I. D'haeseleer, J. Coelho, V. Vanden Abeele, and B. Vanrumste, "Recognition of bathroom activities in older adults using wearable sensors: A systematic review and recommendations," Sensors, vol. 21, no. 6, pp. 1–23, Mar. 2021. https://doi.org/10.3390/s21062176

[6] N. Y. Hammerla, S. Halloran, and T. Plötz, "Deep, convolutional, and recurrent models for human activity recognition using wearables," in Proc. Int. Jt. Conf. Artif. Intell. (IJCAI), pp. 1533–1540, 2016. https://doi.org/10.48550/arXiv.1604.08880

[7] H. Ma, W. Li, X. Zhang, S. Gao, and S. Lu, "AttnSense: Multi-level attention mechanism for multimodal human activity recognition," in Proc. Int. Jt. Conf. Artif. Intell. (IJCAI), pp. 3109–3115, 2019. https://doi.org/10.24963/ijcai.2019/431

[8] C. Xu, D. Chai, J. He, X. Zhang, and S. Duan, "InnoHAR: A deep neural network for complex human activity recognition," IEEE Access, vol. 7, pp. 9893–9902, 2019. https://doi.org/10.1109/ACCESS.2018.2890675

[9] S. Wan, L. Qi, X. Xu, C. Tong, and Z. Gu, "Deep learning models for real-time human activity recognition with smartphones," Mob. Networks Appl., vol. 25, no. 2, pp. 743–755, Apr. 2020. https://doi.org/10.1007/s11036-019-01445-x

[10] W. Gao, L. Zhang, Q. Teng, J. He, and H. Wu, "DanHAR: Dual attention network for multimodal human activity recognition using wearable sensors," Appl. Soft Comput., vol. 111, pp. 1–11, Jun. 2021. https://doi.org/10.1016/j.asoc.2021.107728

[11] Y. Xu and L. Zhao, "Inception-LSTM human motion recognition with channel attention mechanism," Comput. Math. Methods Med., vol. 2022, 2022. https://doi.org/10.1155/2022/9173504

[12] D. Thakur, S. Biswas, E. S. L. Ho, and S. Chattopadhyay, "ConvAE-LSTM: Convolutional autoencoder long short-term memory network for smartphone-based human activity recognition," IEEE Access, vol. 10, pp. 4137–4156, Jun. 2022. https://doi.org/10.1109/ACCESS.2022.3140373

[13] A. Tehrani, M. Yadollahzadeh-Tabari, A. Zehtab-Salmasi, and R. Enayatifar, "Wearable sensor-based human activity recognition system employing Bi-LSTM algorithm," Comput. J., 2023. https://doi.org/10.1093/comjnl/bxad035

[14] S. K. Challa, A. Kumar, V. B. Semwal, and N. Dua, "An optimized deep learning model for human activity recognition using inertial measurement units," Expert Syst., vol. 40, no. 10, p. e13457, Dec. 2023. https://doi.org/10.1111/exsy.13457

[15] P. Kumar and S. Suresh, "RecurrentHAR: A novel transfer learning-based deep learning model for sequential, complex, concurrent, interleaved, and heterogeneous type human activity recognition," IETE Tech. Rev., vol. 40, no. 3, pp. 312–333, May 2023. https://doi.org/10.1080/02564602.2022.2101557

[16] S. Kalabakov, M. Gjoreski, H. Gjoreski, and M. Gams, "Analysis of deep transfer learning using DeepConvLSTM for human activity recognition from wearable sensors," Informatica, vol. 45, no. 2, pp. 289–296, 2021. https://doi.org/10.31449/inf.v45i2.3648

[17] H. A. Imran, "Khail-Net: A shallow convolutional neural network for recognizing sports activities using wearable inertial sensors," IEEE Sensors Lett., vol. 6, no. 9, 2022. https://doi.org/10.1109/LSENS.2022.3197396

[18] R. Piltaver, B. Cvetkovic, and B. Kaluža, "Denoising human-motion trajectories captured with ultra-wideband real-time location system," Informatica, vol. 39, no. 3, pp. 311–322, 2015.

[19] M. Luštrek and B. Kaluža, "Fall detection and activity recognition with machine learning," Informatica, vol. 33, no. 2, pp. 205–212, 2009.
[20] D. Roggen et al., "Collecting complex activity datasets in highly rich networked sensor environments," in Proc. 7th Int. Conf. Networked Sensing Systems (INSS), pp. 233–240, 2010. https://doi.org/10.1109/INSS.2010.5573462

[21] J. R. Kwapisz, G. M. Weiss, and S. A. Moore, "Activity recognition using cell phone accelerometers," ACM SIGKDD Explorations Newsletter, vol. 12, no. 2, pp. 74–82, 2011.

[22] "PAMAP2 Physical Activity Monitoring - UCI Machine Learning Repository." https://archive.ics.uci.edu/dataset/231/pamap2+physical+activity+monitoring (accessed Jul. 21, 2023).

[23] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997. https://doi.org/10.1162/NECO.1997.9.8.1735

[24] J. Suto, "The effect of hyperparameter search on artificial neural network in human activity recognition," Open Comput. Sci., vol. 11, no. 1, pp. 411–422, 2021. https://doi.org/10.1515/comp-2020-0227

[25] S. Raziani and M. Azimbagirad, "Deep CNN hyperparameter optimization algorithms for sensor-based human activity recognition," Neurosci. Informatics, vol. 2, no. 3, p. 100078, Sep. 2022. https://doi.org/10.1016/j.neuri.2022.100078

[26] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in Proc. 32nd Int. Conf. Mach. Learn. (ICML), vol. 1, pp. 448–456, 2015.

[27] X. Zhang and C. A. Liu, "Model averaging prediction by K-fold cross-validation," J. Econom., vol. 235, no. 1, pp. 280–301, 2023. https://doi.org/10.1016/j.jeconom.2022.04.007

[28] M. El Ghazi and N. Aknin, "A comparison of sampling methods for dealing with imbalanced wearable sensor data in human activity recognition using deep learning," Int. J. Adv. Comput. Sci. Appl. (IJACSA), vol. 14, no. 10, pp. 290–305, 2023. https://doi.org/10.14569/IJACSA.2023.0141032

[29] "KerasTuner." https://keras.io/keras_tuner/ (accessed Jul. 23, 2023).

[30] R. Woolson, "Wilcoxon signed-rank test," in Wiley Encyclopedia of Clinical Trials, 2008. https://doi.org/10.1002/9780471462422.eoct979

[31] "File:LSTM cell.svg - Wikimedia Commons." https://commons.wikimedia.org/wiki/File:LSTM_cell.svg (accessed Jul. 23, 2023).