https://doi.org/10.31449/inf.v48i16.6461 Informatica 48 (2024) 27–36 27 
Software Test Data Management Based on Knowledge Graph 
Li Gao*, Junlin Qiu, Guanhua Chen 
Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huai’an 223003, China 
E-mail: gaoli_edu@outlook.com 
*
Corresponding Author 
Keywords: knowledge graph, software testing, data management 
Received: June 18, 2024 
As software development models and methods mature, large-scale software systems emerge. However, a 
critical challenge remains: the lack of a comprehensive software test data management model that 
integrates basic data management with advanced knowledge reasoning. To address this issue, we 
developed a software test data management model based on knowledge graphs, enabling intelligent 
management and reasoning of software test data. The model incorporates an entity extraction model based 
on a feed-forward neural network, a knowledge graph integration method based on graph databases, and 
a knowledge reasoning submodule based on deep learning. To validate the effectiveness of our model, we 
evaluated the performance of each component individually. Our deep learning-based entity extraction 
model achieved an accuracy of 0.92, a recall of 0.88, and an F1 score of 0.90, significantly outperforming 
traditional methods such as regular expressions and dictionary-based approaches. Utilizing Cypher for 
graph database querying, our system provides accurate answers with a response time of 0.12 seconds, 
outperforming SQL and SPARQL-based querying methods. Furthermore, our approach excels in 
knowledge-based reasoning with an accuracy of 0.89 and site coverage of 0.81, surpassing both ontology-
based and graph-based reasoning methods. These results highlight the enhanced construction, querying, 
and reasoning capabilities of our knowledge graph-based approach for managing software testing data. 
Povzetek: Članek opisuje nov model za upravljanje testnih podatkov programske opreme, ki temelji na 
grafu znanja. Omogoča inteligentno organizacijo, shranjevanje in razumevanje testnih podatkov s 
pomočjo globokega učenja ter učinkovitejše iskanje in sklepanje v primerjavi s tradicionalnimi metodami. 
 
1 Introduction 
Software testing data management refers to the activities 
of effective organization, storage, maintenance, and 
utilization of these data, which aims to improve the 
efficiency and quality of software testing and to reduce the 
cost and risk of software testing [1]. These activities also 
support the automation and intelligence of software testing 
[2]. Software test data management, as an important part 
of software testing, has been receiving attention from both  
 
academia and industry. At present, there have been many 
methods and tools for software test data management 
proposed and developed, such as database-based methods, 
XML-based methods, ontology-based methods, cloud-
based methods, etc., and the market share of these methods 
this year is specifically shown in Figure 1. These methods 
and tools address some of the challenges of software test 
data management, such as data normalization, 
consistency, traceability, reusability, security, etc. to some 
extent [3].
 
 
Figure 1: Change in market share of methods and tools for software test data management. 
4,3
2,5
3,5
4,5
2,4
4,4
1,8
2,8
2
2
3
5
0 1 2 3 4 5 6
Database-based approaches
XML-based approaches
Ontology-based approaches
Cloud Computing Based Approach
Rate
Methods
2021 2020 2019
28 Informatica 48 (2024) 27–36 L. Gao et al. 
 
 
Figure 2: Comparison of approaches to software test data management. 
 
Software testing data contains a lot of unstructured 
data, and these unstructured data also have important 
value in software testing, such as for test requirement 
analysis, test case generation, test result evaluation, etc. 
On the other hand, traditional tools often can only realize 
data storage and query, but lack of semantic understanding 
of data and reasoning ability, which can not meet the 
intelligent needs of software testing, such as data-based 
knowledge discovery, knowledge sharing, knowledge 
application and so on. The comparison between traditional 
methods of software testing data management and 
intelligent methods based on knowledge graph is shown in 
Figure 2 [4, 5]. 
2 Literature review 
Software test data management plays a crucial role in 
software engineering, which covers a wide range of 
aspects from acquisition, processing, storage to migration, 
protection and utilization. This makes the process of data 
extraction, processing, storage, and migration time-
consuming and resource-intensive, as well as increasing 
technical difficulty and complexity [6, 7]. Secondly, due 
to the diverse sources of software testing data, such as 
manually generated, automatically collected, and 
externally imported, different data sources may have 
different data standards, formats, contents, and qualities, 
leading to data inconsistency, which brings difficulties 
and risks to software testing [8, 9]. In addition, software 
testing data may contain sensitive information, such as 
users’ personal information, business secrets, etc., which, 
if leaked or abused, may cause serious losses to users, 
enterprises and even national security. Therefore, the 
confidentiality, legality and security of data must be 
ensured [10]. Finally, there are still some problems in 
software testing data governance, such as the lack of 
unified standards and methods, which leads to obstacles 
and deficiencies in data management and utilization. In 
order to solve these problems, According to Durst and 
Zieba [11] suggests the need to optimize and improve the 
data testing strategy, process, use cases, execution, 
validation and summarization, and the use of professional 
testing tools and techniques to assist the data testing 
process. According to Ebert et al. [12] suggested the use 
of generative AI techniques to generate large amounts of 
synthetic data to address issues such as data volume, 
efficiency, coverage, and privacy, and the use of methods 
such as data analytics and machine learning to assess and 
improve data quality. According to Ekanayake et al. [13] 
emphasized the importance of establishing data 
governance connotations, elements, models, and 
frameworks to standardize aspects of data definition, 
classification, labeling, measurement, monitoring, and 
evaluation, and to develop data strategies, rules, standards, 
and processes for effective data management and 
utilization.In recent years, the construction of knowledge 
graphs has also made progress, specifically, According to 
Falát et al. [14] analyzed and sorted out the construction 
techniques of knowledge graphs and their combination 
with deep learning; According to Farooq [15] introduced 
common knowledge graph embedding models and 
analyzed the prospects of their application in interpretable 
prediction; These literatures provide valuable references 
and insights for the theory and application of knowledge 
graphs. 
Knowledge graph has significant advantages in 
software testing data management. First, it can provide a 
unified structured representation of structured, semi-
structured and unstructured software testing data, and 
construct a knowledge graph by extracting entities, 
attributes and relationships to achieve data normalization, 
consistency and reusability [16, 17]. Secondly, 
Knowledge Graph can provide semantic annotation and 
commentary for software testing data, and enhance the 
semantic information of the data by using knowledge 
resources such as ontology, lexicon, rules, etc., so as to 
realize the semanticization, comprehensibility and 
traceability of the data. In addition, Knowledge Graph can 
also mine implicit knowledge from data, such as 
correlation, anomalies, and data evolution through graph 
algorithms, machine learning, logical reasoning, and other 
techniques to achieve intelligence, predictability and 
optimization of data. The structure of knowledge graph is 
shown in Figure 3 [18, 19].
Traditional 
Methods
Intelligent 
approaches 
combining
Intelligent 
Processing
Ontology-based
Cloud-based
xml based
CNN
Feedforward 
neural networks
Graph database
Software 
testing process 
data
 Semi-
structured data 
Unstructured 
data
Structured data
Unprocessable
Unprocessable
Knowledge 
Discovery
Knowledge 
Sharing
Knowledge 
Application
Software Test Data Management Based on Knowledge Graph Informatica 48 (2024) 27–36 29 
 
 
Figure 3: Structure of the knowledge graph. 
 
At present, the research on software test data 
management based on knowledge graph is still in a blank 
state. This paper aims to fill this research gap and explore 
the application value and realization method of knowledge 
graph in software test data management [20]. The main 
research content includes designing and realizing a 
general software test data management knowledge graph 
model, defining entities, attributes and relationships as 
well as corresponding ontologies and rules [21]; proposing 
and realizing the knowledge extraction and knowledge 
integration process of extracting entities, attributes and 
relationships from different sources and types of software 
test data, and integrating them into the knowledge graph 
[22]; and finally designing and realizing a knowledge 
integration process that utilizes the knowledge graph’s 
data and knowledge in the Knowledge Graph to design 
and implement the knowledge query and knowledge 
reasoning process that provides intelligent support for 
various aspects of software testing requirement analysis, 
test case generation, and test result evaluation [23, 24].  
To address the challenges in software testing data 
management, various approaches have been proposed, as 
summarized in Table 1. Traditional data management 
relies on manual data handling and basic data storage and 
retrieval, which are familiar to practitioners and easy to 
implement but suffer from inconsistent data handling and 
require significant manual effort [11]. Rule-based data 
processing provides reliable data extraction for structured 
data and is easy to define rules but lacks flexibility for 
unstructured data and limited reasoning capabilities [12]. 
Dictionary-based data processing is suitable for known 
entities and offers fast data retrieval but results in 
incomplete data representation and no semantic 
enrichment [13]. 
Recent advances include the use of generative AI 
techniques for synthetic data generation and 
augmentation, which addresses data scarcity and enhances 
data diversity but is limited to data generation and lacks a 
data management framework [14]. Data analytics and 
machine learning offer automated insights and scalable 
processing but do not provide an integrated data 
management solution or semantic linking [15]. Data 
governance frameworks standardize data definitions and 
provide data classification and labeling, ensuring 
consistent data handling and enhancing data 
trustworthiness but do not offer a unified data 
management approach or integration with intelligent 
systems [16].
 
 
Software 
Test Data
Test 
Requirements
Test 
Objectives
Test 
Objects
Test 
Environme
nt
Test 
Strategy
Test Case Priority Use Case 
Number
Background
Test Script
Script 
Description
Script 
Number
Script 
Name
Script 
Code
30 Informatica 48 (2024) 27–36 L. Gao et al. 
 
Table 1: Comparison of state-of-the-art methods in software testing data management. 
Approach Key contributions Strengths Research gap addressed 
Traditional data 
management 
Manual data handlingBasic data 
storage and retrieval 
Familiarity among 
practitionersEase of 
implementation 
Inconsistent data 
handlingManual effort 
required 
Rule-based data 
processing 
Rule-based data extraction 
Reliable for structured 
dataEasy to define rules 
Inability to adapt to new 
data sourcesLimited 
reasoning capabilities 
Dictionary-
based data 
processing 
Dictionary lookup for data 
categorization 
Suitable for known 
entitiesFast data retrieval 
Incomplete data 
representationNo semantic 
enrichment 
Generative AI 
techniques 
Synthetic data generationData 
augmentation 
Addresses data 
scarcityEnhances data 
diversity 
Limited to data 
generationNo data 
management framework 
Data analytics 
and machine 
learning 
Data quality 
assessmentAutomated anomaly 
detection 
Automated insights 
Scalable processing 
No integrated data 
management solutionNo 
semantic linking 
Data governance 
frameworks 
Standardization of data 
definitionsData classification 
and labeling 
Consistent data 
handlingEnhanced data 
trustworthiness 
No unified data 
management approachNo 
integration with intelligent 
systems 
Knowledge 
graph-based 
approach 
(proposed) 
Unified structured 
representation of dataSemantic 
annotation and 
enrichmentIntelligent reasoning 
Comprehensive data 
managementSemantic 
consistencyIntelligent 
support 
Unified data management 
framework 
Integration of data and 
knowledgeIntelligent 
reasoning 
 
3 Knowledge graph-based software 
test data management modeling 
This chapter details our proposed and innovative 
knowledge graph-based software testing data management 
model [25]. The core idea of the model is to utilize the 
powerful expression and reasoning ability of knowledge 
graph to effectively organize, manage and apply all kinds 
of data in the software testing process. Its principle is 
mainly to transform all kinds of complex test data into 
forms that are easy to understand and process by 
constructing a knowledge graph containing software 
testing related knowledge, so as to realize intelligent 
management of test data [26, 27]. 
3.1 Modeling ideas 
The model idea of this paper is to consider the process of 
software test data management as a process of 
constructing and applying a knowledge graph, i.e., 
extracting entities, attributes and relationships from 
software test data, constructing a knowledge graph for 
software test data management, and then utilizing the data 
and knowledge in the knowledge graph to provide 
intelligent support for software testing. The modeling idea 
of this paper is based on the following facts: 
Firstly, software testing data contains rich knowledge, 
such as software testing requirements, use cases, results, 
etc., which can be represented in the form of entities, 
attributes and relationships, constituting a knowledge 
graph for software testing data management. Secondly, the 
knowledge graph of software testing data management can 
be structured, semanticized and intelligently processed to 
improve the quality and value of software testing data and 
provide effective support for all aspects of software testing 
[28]. Finally, the knowledge graph for software testing 
data management can be constructed from multi-source 
heterogeneous software testing data by knowledge 
extraction and knowledge integration methods, and can be 
intelligently applied by knowledge query and knowledge 
reasoning methods [29, 30].  
3.2 Modeling framework 
In this paper, we propose a framework for a software test 
data management model based on knowledge graph, as 
shown in Figure 4. The framework includes three main 
modules: knowledge graph construction module, 
knowledge graph storage module and knowledge graph 
application module [31]. 
The knowledge graph building module is the module 
responsible for extracting and integrating entities, 
attributes and relationships from software test data to build 
a knowledge graph for software test data management. 
Software Test Data Management Based on Knowledge Graph Informatica 48 (2024) 27–36 31 
The module includes two submodules: entity extraction 
submodule and knowledge graph integration submodule. 
The entity extraction submodule is a submodule that 
recognizes entities and their related attributes from 
different types and sources of software testing data using 
deep learning-based entity extraction methods [32]. The 
knowledge graph integration submodule is a submodule 
that utilizes a graph database-based knowledge graph 
integration approach to integrate entities, attributes and 
relationships extracted from software testing data into a 
unified knowledge graph for software testing data 
management [33]. 
 
 
Figure 4: Framework of the software test data 
management model. 
 
The Knowledge Graph Storage Module is the module 
responsible for storing and managing the knowledge graph 
of software test data management. The module uses a 
graph database as a storage for the knowledge graph, and 
utilizes the characteristics of a graph database, such as 
nodes, edges, labels, and attributes, to represent the 
entities, attributes, and relationships in the knowledge 
graph for software test data management, as well as the 
structure and semantics between them. 
The Knowledge Graph Application Module is the 
module responsible for providing intelligent support for 
all aspects of software testing by utilizing the knowledge 
graph of software testing data management. The module 
consists of two submodules: the knowledge query 
submodule and the knowledge reasoning submodule. The 
knowledge query submodule is a submodule that utilizes 
a query language for graphical databases, such as Cypher, 
to query the data and knowledge in the knowledge graph 
of software testing data management, and to achieve data 
retrieval and analysis. The Knowledge Reasoning 
submodule is a submodule that utilizes knowledge 
reasoning techniques, such as rule-based knowledge 
reasoning, graph-based knowledge reasoning, and 
learning-based knowledge reasoning, to derive implicit 
knowledge from the knowledge graph of software testing 
data management, and to realize knowledge discovery and 
application [34-36]. 
3.3 Modeling principles 
The model principle of this paper is based on the technique 
of knowledge graph, including knowledge extraction, 
knowledge integration, knowledge query and knowledge 
reasoning, to realize the construction and application of 
knowledge graph for software testing data management. 
The model schematic is specifically shown in Figure 5. 
Figure 5 shows a process of concept recognition and 
diagram construction. First of all, the input is "test data", 
after "encoder" processing to get "sentence vector". The 
sentence vector then interacts with the graph vector via a 
multilayer perceptron (MLP) to generate a feasibility 
score. At the same time, "sentence vectors" are also used 
for "concept recognition" and further converted into nodes 
in "graph construction". Finally, in the process of graph 
construction, a "pattern graph" is created using 
pathfinding methods. Convolutional neural networks 
(CNNs) may play a role in some aspects of this process. 
 
 
Figure 5: Model schematic. 
 
In our pursuit to automate entity extraction and 
knowledge integration for the creation of a structured 
knowledge graph, we have developed a sophisticated deep 
learning-based approach. This methodology leverages the 
strengths of both Convolutional Neural Networks (CNNs) 
and Recurrent Neural Networks, specifically Long Short-
Term Memory (LSTM) networks, to process and analyze 
textual data with the aim of extracting entities and their 
relationships from text. The architecture of our entity 
extraction model is meticulously designed, starting with 
text preprocessing to clean and tokenize the data, followed 
by the conversion of words into dense vector 
representations through word embeddings. CNNs are then 
applied to capture local features within sentences, with Bi-
LSTM layers subsequently employed to understand the 
long-range dependencies within the text. To ensure 
consistent entity tagging, we incorporate a Conditional 
Random Field (CRF) layer. The training process involves 
preparing labeled data, initializing the model with pre-
Knowledge graph building module
Knowledge graph storage module
Knowledge Graph Application Module
Integration 
Submodule
Entity 
Extraction
Graph database
Knowledge
32 Informatica 48 (2024) 27–36 L. Gao et al. 
trained embeddings, and iteratively refining the model 
through mini-batch training, loss calculation, and 
backpropagation. Model evaluation is conducted using 
precision, recall, and F1-score metrics, with 
hyperparameter tuning to optimize performance. Post-
entity extraction, we proceed with knowledge integration, 
which includes entity linking to the knowledge graph, 
relationship extraction, and graph construction. This 
integrated approach not only yields high accuracy in entity 
recognition but also facilitates the construction of a 
comprehensive and informative knowledge graph tailored 
for software testing data management. 
3.3.1 Entity extraction submodule 
In this paper, a deep learning based entity extraction 
approach is used to automatically extract entities and their 
related attributes from software test data using feed 
forward neural network models. It is based on the principle 
of setting D as software test data, E as entity, A as 
attribute, R as relationship, M as entity extraction model, 
T as data type, S as data source, O as output, F as feature, 
C as context, L as semantics, P as attention mechanism, B 
as bidirectional recurrent neural network, X as 
convolutional neural network, and Z as pre-trained 
language model, and then it can be expressed by Equation 
1 [37]. 
 
()
( ) ( ) ( ) ( )
( ) if T( ) .
( ) { ( ) if T( ) .
( ) if T( )
( ) ( ) ( ( ))
( ) ( )
( ) ( ) ( ( ))
B
X
Z
B
X
Z
O M D
M D E D A D R D
M D D Structuring
M D M D D Semi structured
M D D Unstructured
M D F D P C D
M D F D X
M D F D Z L D
=
=  
=
= = −
=
=
=
=
 (1) 
 
3.3.2 Knowledge integration submodule 
In this paper, we adopt a knowledge graph integration 
method based on graphical databases, which utilizes the 
characteristics of graphical databases to store and manage 
the knowledge graph for software test data management. 
For storing entities, this paper uses a graph database node 
to represent entities, each node contains a unique identifier 
(ID), one or more labels, and one or more Property-Value 
Pairs. For storing relationships, this paper uses edges from 
a graph database to represent relationships, where each 
edge contains a unique identifier (ID), a Type, and one or 
more Property-Value Pairs. In this paper, we use Cypher 
as the query language, which is a pattern-matching based 
query language for graph databases that can easily 
represent query patterns for graph structures, as well as 
operations such as filtering and aggregation of query 
results [38]. 
3.3.3 Knowledge reasoning submodule 
CNN-based knowledge inference submodule is another 
important component in the application of knowledge 
graph-based software test data management model, and its 
main function is to reason out the unknown data based on 
the existing data in the knowledge graph, so as to 
complement and extend the knowledge graph. The basic 
principle of CNN-based knowledge inference submodule 
is as follows: 
For example, for the entity “Selenium” and the 
relationship “support”, it can be shown in Equation 2. 
 
 
 
  0.2, 0.5, 0.7, 0.1
  0.3, 0.4, 0.6, 0.2
Selenium
support
=−
= − −
 (2) 
 
For example, for the triad (Selenium, support, 
Automation Testing), the reasonableness can be calculated 
using a score function as shown in Equation 3: 
 
( )
( )
, ,  
     
score Selenium support AutomationTesting
f Selenium support AutomationTesting = + −
 (3) 
 
Where f is a nonlinear activation function such as 
sigmoid or tanh for mapping the score to a fixed interval 
such as [0, 1] or [-1, 1]. The model parameters can be 
optimized with a loss function as shown in Equation 4 for 
the known triad (Selenium, Support, Automation Testing) 
and the unknown triad (Selenium, Support, Unit Testing). 
 
( )
, , 
0,    
  
 , , 
Selenium support
gamma score
AutomationTesting loss max
score Selenium support UnitTesting
 
 




+
=
−
(4) 
4 Evaluation and validation of 
models 
In this paper, two publicly available datasets are used to 
evaluate the performance and effectiveness of the 
knowledge graph-based software testing data management 
model, namely (1) Software Testing Data Set. This dataset 
contains data from software testing projects from 2006 to 
2014, including test requirements, test cases, test results, 
test defects, etc., with nine tables and about 100,000 
records. This dataset can be used to construct a knowledge 
graph for software testing data management, as well as 
experiments for knowledge query and reasoning. (2) 
Software Engineering Data Set: This dataset contains the 
data of software engineering projects from 2010 to 2018, 
including software requirements, software design, 
software code, software defects, etc. This dataset can be 
used for experiments of knowledge fusion and extension 
with the knowledge graph of software test data 
management. 
The Software Testing Data Set, gathered from real-
world projects between 2006 and 2014, encompasses 
approximately 100,000 records across nine tables, 
detailing test requirements, cases, results, and defects, 
along with additional contextual data. It's utilized to build 
Software Test Data Management Based on Knowledge Graph Informatica 48 (2024) 27–36 33 
a knowledge graph for testing data management and 
supports research in knowledge query and reasoning. 
Similarly, the Software Engineering Data Set, spanning 
2010 to 2018, contains around 200,000 records in 12 
tables, covering software requirements, design, code, and 
defects, plus extra metadata. It facilitates knowledge 
fusion and extends the testing data management 
knowledge graph, aiding in integrating testing with 
broader software engineering processes. Both datasets are 
publicly accessible and have been pivotal in academic 
research, ensuring the reproducibility of experimental 
results. Researchers can access these datasets through the 
provided links, adhering to the respective data usage 
guidelines. 
In this paper, the following evaluation metrics are 
used to measure the performance and effectiveness of the 
knowledge graph-based software test data management 
model, which are accuracy of knowledge graph 
construction, recall of knowledge graph construction, F1 
value of knowledge graph construction, accuracy of 
knowledge query, response time of knowledge query 
Accuracy of knowledge reasoning, coverage of 
knowledge reasoning [39]. 
In this paper, a deep learning-based entity extraction 
method and a graph database-based knowledge graph 
integration method are used to construct a knowledge 
graph for software test data management from a software 
test dataset. The experimental results are shown in Table 
2. 
 
Table 2: Experimental results of knowledge graph 
construction. 
Methodologies Accuracy 
Recall 
rate 
F1 
value 
Deep learning based 
approach 
0.92 0.88 0.90 
Regular expression 
based approach 
0.78 0.71 0.74 
Dictionary-based 
approach 
0.68 0.63 0.65 
 
As can be seen from Table 2, the deep learning-based 
approach outperforms the other two rule-based 
approaches in terms of accuracy, recall, and F1 value of 
knowledge graph construction, indicating that the deep 
learning-based approach can more effectively extract 
entities, attributes, and relationships from software test 
data and construct a more complete and accurate 
knowledge graph [40]. 
In order to validate the capability of the system for 
knowledge querying, this paper uses a graph database 
based query language, such as Cypher, to retrieve relevant 
answers from the knowledge graph of software test data 
management. In this paper, 20 natural language questions 
of different types and difficulty, involving entity queries, 
relational queries, path queries, and aggregation queries, 
are designed as a test set. In this paper, the accuracy and 
response time of the knowledge queries are calculated 
using the correct answers given manually as a reference. 
The experimental results are shown in Table 2. 
As can be seen from Table 3, the graph database-
based query language outperforms the other two relational  
 
 
 
database-based query languages in terms of knowledge 
query accuracy and response time, which indicates that the 
graph database-based query language can retrieve the 
relevant answers from the knowledge graph of software 
test data management more efficiently, and improves the 
efficiency and quality of the retrieval [41, 42]. 
In this paper, we have used a part of the data from the 
software test dataset as a training set and another part of 
the data as a test set for training knowledge-based 
reasoning models and evaluating the effectiveness of 
knowledge-based reasoning, respectively. In this paper, 
the accuracy and coverage of knowledge reasoning is 
calculated using the correct knowledge given manually as 
a reference. This paper is also compared with ontology-
based knowledge reasoning and graph-based knowledge 
reasoning, and the results are shown in Table 3. 
 
Table 3: Experimental results of knowledge query. 
Methodologies Accuracy 
Response time 
(seconds) 
A query language based 
on graph databases 
0.95 0.12 
Relational database 
based SQL 
0.85 0.23 
SPARQL based on 
relational databases 
0.80 0.28 
 
Table 4: Experimental results of knowledge-based 
reasoning. 
Methodologies Accuracy 
Site 
coverage 
Deep learning based 
approach 
0.89 0.81 
Ontology-based approach 0.75 0.68 
A graph-based approach 0.69 0.62 
 
As can be seen from Table 4, the deep learning-based 
approach outperforms the other two rule-based 
approaches in terms of accuracy and coverage of 
knowledge inference, indicating that the deep learning-
based approach can be more effective in inference of 
unknown knowledge from the knowledge graph of 
software test data management, and in complementing and 
expanding the knowledge graph [43]. 
5 Discussion 
5.1 Entity extraction and knowledge graph 
construction 
The proposed deep learning-based entity extraction 
method outperforms the rule-based approaches, as shown 
in Table 5. The deep learning approach achieves higher 
accuracy, recall, and F1 value in constructing the 
34 Informatica 48 (2024) 27–36 L. Gao et al. 
knowledge graph, demonstrating its effectiveness in 
extracting entities, attributes, and relationships from 
software testing data. 
 
 
Table 5: Comparison of entity extraction and knowledge 
graph construction. 
Method Accuracy 
Recall 
rate 
F1 
value 
Deep Learning-
Based Approach 
0.92 0.88 0.90 
Regular 
Expression-Based 
0.78 0.71 0.74 
Dictionary-Based 0.68 0.63 0.65 
 
The superior performance of the deep learning-based 
approach can be attributed to its ability to learn complex 
patterns and relationships within the data, making it more 
adaptable to variations in the input data. 
5.2 Knowledge query and reasoning 
The graph database-based query language demonstrates 
superior performance over traditional relational database 
query languages, as illustrated in Table 5. The graph 
database approach achieves higher accuracy and faster 
response times, which are critical for efficient and high-
quality retrieval of information from the knowledge graph. 
 
Table 6: Comparison of knowledge query and reasoning. 
Method Accuracy 
Response time 
(Seconds) 
Graph database-
based 
0.95 0.12 
Relational database 
(SQL) 
0.85 0.23 
SPARQL 
(Relational DB) 
0.80 0.28 
 
The graph database-based query language leverages 
the inherent structure of the knowledge graph, enabling 
faster and more precise query execution. Additionally, the 
deep learning-based approach to knowledge reasoning 
outperforms both the ontology-based and graph-based 
approaches, as shown in Table 6. The deep learning 
method achieves higher accuracy and site coverage, 
indicating its effectiveness in inferring unknown 
knowledge and complementing the existing knowledge 
graph. 
 
Table 7: Comparison of knowledge reasoning. 
Method Accuracy Site Coverage 
Deep learning-based 0.89 0.81 
Ontology-based 0.75 0.68 
Method Accuracy Site Coverage 
Graph-based 0.69 0.62 
 
As shown in Table 7, the deep learning-based 
approach benefits from its ability to learn from patterns 
and relationships in the data, allowing it to make more 
accurate inferences and expand the knowledge base. 
5.3 Novelty and benefits of the proposed 
method 
The proposed knowledge graph-based software testing 
data management model introduces a novel and beneficial 
approach by offering comprehensive data management 
through a unified structured representation, semantic 
enrichment for enhanced data understanding, intelligent 
support for inferring new knowledge and aiding decision-
making, and efficient querying via a graph database query 
language, collectively addressing the prevalent challenges 
in software testing data management and showcasing its 
unique value. 
6 Conclusion 
This paper proposes a software testing data management 
model based on knowledge graph, which can realize 
intelligent management and reasoning of software testing 
data. The model consists of three submodules, which are 
feed-forward neural network-based entity extraction 
module, graph database-based knowledge graph 
integration module, and deep learning-based knowledge 
inference module. The model provides a new idea and 
method for software testing data management, which 
helps to improve the efficiency and quality of software 
testing. 
F unding 
The research is supported by Jiangsu Province 
Industry-University-Research Cooperation Project, 
Research and Development of a Smart Constitute Site 
Safety Management System (No: BY20221107). 
Conflict of interest: The authors state no conflict of 
interest. 
R efer ence s 
[1] Ahmad, T., Iqbal, J., Ashraf, A., Truscan, D., & 
Porres, I. Model-based testing using UML activity 
diagrams: A systematic mapping study. Computer 
Science Review, 33, 98-112, 2019. 
https://doi.org/10.1016/j.cosrev.2019.07.001. 
[2] Alyahya, S. Collaborative crowdsourced software 
testing. Electronics, 11(20), 3340, 2022. 
https://doi.org/10.3390/electronics11203340. 
[3] Anthony, B. Information flow analysis of a 
knowledge mapping-based system for university 
alumni collaboration: A practical approach. Journal of 
the Knowledge Economy, 12(2), 756-787, 2021. 
https://doi.org/10.1007/s13132-020-00643-3. 
Software Test Data Management Based on Knowledge Graph Informatica 48 (2024) 27–36 35 
[4] Ben Zayed, H. A., & Maashi, M. S. Optimizing the 
software testing problem using search-based software 
engineering techniques. Intelligent Automation and 
Soft Computing, 29(1), 307-318, 2021. 
https://doi.org/10.32604/iasc.2021.017239. 
[5] Benhar, H., Idri, A., & Fernández-Alemán, J. L. A 
systematic mapping study of data preparation in heart 
disease knowledge discovery. Journal of Medical 
Systems, 43, 1-17, 2019. 
https://doi.org/10.1007/s10916-018-1134-z. 
[6] Boopathi, M., Sujatha, R., Kumar, C. S., 
Narasimman, S., & Rajan, A. Markov approach for 
quantifying the software code coverage using genetic 
algorithm in software testing. International Journal of 
Bio-Inspired Computation, 14(1), 27-45, 2019. 
https://doi.org/10.1504/ijbic.2019.101152. 
[7] Calvanese, D., Gal, A., Lanti, D., Montali, M., Mosca, 
A., & Shraga, R. Conceptually grounded mapping 
patterns for virtual knowledge graphs. Data & 
Knowledge Engineering, 145, 102157, 2023. 
https://doi.org/10.1016/j.datak.2023.102157. 
[8] Chen, T., Zhang, S. J., Wang, Y., Chen, Z. B., & Jing, 
W. F. Construction methods of knowledge mapping 
for full service power data semantic search system. 
Journal of Signal Processing Systems for Signal 
Image and Video Technology, 93, 275-284, 2021. 
https://doi.org/10.1007/s11265-020-01591-6. 
[9] Cordeiro, M., Puig, F., & Ruiz-Fernández, L. 
Realizing dynamic capabilities and organizational 
knowledge in effective innovations: the capabilities 
typological map. Journal of Knowledge Management, 
27(10), 2581-2603, 2022. 
https://doi.org/10.1108/jkm-02-2022-0080. 
[10] Drave, I., Hillemacher, S., Greifenberg, T., Kriebel, 
S., Kusmenko, E., Markthaler, M., Orth, P., Salman, 
K. S., Richenhagen, J., & Rumpe, B. SMArDT 
modeling for automotive software testing. Software-
Practice & Experience, 49(2), 301-328, 2019. 
https://doi.org/10.1002/spe.2650. 
[11] Durst, S., & Zieba, M. Mapping knowledge risks: 
Towards a better understanding of knowledge 
management. Knowledge Management Research & 
Practice, 17(1), 1-13, 2019. 
https://doi.org/10.1080/14778238.2018.1538603. 
[12] Ebert, C., Bajaj, D., & Weyrich, M. Testing software 
systems. IEEE Software, 39, 8-17, 2022. DOI: 
10.1109/ms.2022.3166755. 
[13] Eisty, N. U., & Carver, J. C. Testing research 
software: A survey. Empirical Software Engineering, 
27, 28, 2022. https://doi.org/10.1007/s10664-022-
10184-9. 
[14] Ekanayake, E., Shen, G., & Kumaraswamy, M. M. 
Mapping the knowledge domains of value 
management: A bibliometric approach. Engineering 
construction and Architectural Management, 26(3), 
499-514, 2019. https://doi.org/10.1108/ecam-06-
2018-0252. 
[15] Falát, L., Michalová, T., Madzík, P., & Marsíková, K. 
Discovering trends and journeys in knowledge-based 
human resource management: Big data smart 
literature review based on machine learning approach. 
IEEE Access, 11, 95567-95583, 2023. 
https://doi.org/10.1109/access.2023.3296140. 
[16] Farooq, R. Knowledge management and 
performance: A bibliometric analysis based on 
Scopus and WOS data (1988-2021). Journal of 
Knowledge Management, 27(7), 1948-1991, 2023. 
https://doi.org/10.1108/jkm-06-2022-0443. 
[17] Foidl, H., & Felderer, M. Integrating software quality 
models into risk-based testing. Software Quality 
Journal, 26, 809-847, 2018. 
https://doi.org/10.1007/s11219016-9345-3. 
[18] Garousi, V., Bauer, S., & Felderer, M. NLP-assisted 
software testing: A systematic mapping of the 
literature. Information and Software Technology, 
126, 106321, 2020. 
https://doi.org/10.1016/j.infsof.2020.106321. 
[19] Garousi, V., Felderer, M., Karapiçak, Ç., & Yilmaz, 
U. Testing embedded software: A survey of the 
literature. Information and Software Technology, 
104, 1445, 2018. 
https://doi.org/10.1016/j.infsof.2018.06.016. 
[20] Garousi, V., Felderer, M., & Kiliçaslan, F. N. A 
survey on software testability. Information and 
Software Technology, 108, 35-64, 2019. 
https://doi.org/10.1016/j. infsof.2018.12.003. 
[21] Garousi, V., Felderer, M., Kuhrmann, M., Herkiloglu, 
K., & Eldh, S. Exploring the industry’s challenges in 
software testing: An empirical study. Journal of 
Software-Evolution and Process, 32(8), e2251, 2020. 
https://doi.org/10.1002/smr.2251. 
[22] Garousi, V., & Küçük, B. Smells in software test 
code: A survey of knowledge in industry and 
academia. Journal of Systems and Software, 138, 52-
81, 2018. https://doi.org/10.1016/j.jss.2017.12.013. 
[23] Garousi, V., Rainer, A., Lauvås, P., & Arcuri, A. 
Software-testing education: A systematic literature 
mapping. Journal of Systems and Software, 165, 
110570, 2020. 
https://doi.org/10.1016/j.jss.2020.110570. 
[24] Ho, V. W., Harris, P. G., Kumar, R. K., & Velan, G. 
M. Knowledge maps: A tool for online assessment 
with automated feedback. Medical Education Online, 
23(1), 1457394, 2018. 
https://doi.org/10.1080/10872981.2018.1457394. 
[25] Huang, T., & Fang, C. C. Optimization of software 
test scheduling under development of modular 
software systems. Symmetry-Basel, 15(1), 195, 2023. 
https://doi.org/10.3390/ sym15010195. 
[26] Huang, Y., Glänzel, W., & Zhang, L. Tracing the 
development of mapping knowledge domains. 
Scientometrics, 126, 6201-6224, 2021. 
https://doi.org/10.1007/s11192-02003821-x. 
[27] Idri, A., Benhar, H., Fernández-Alemán, J. L., & 
Kadi, I. A systematic map of medical data 
preprocessing in knowledge discovery. Computer 
Methods and Programs in Biomedicine, 162, 69-85, 
2018. https://doi.org/10.1016/j.cmpb.2018.05.007. 
[28] Jung, P., Kang, S., & Lee, J. Automated code-based 
test selection for software product line regression 
testing. Journal of Systems and Software, 158, 
36 Informatica 48 (2024) 27–36 L. Gao et al. 
110419, 2019. 
https://doi.org/10.1016/j.jss.2019.110419. 
[29] Jung, P., Kang, S., & Lee, J. Efficient regression 
testing of software product lines by reducing 
redundant test executions. Applied Sciences-Basel, 
10(23), 8686, 2020. 
https://doi.org/10.3390/app10238686. 
[30] Kaur, V. Knowledge-based dynamic capabilities: A 
scientometric analysis of marriage between 
knowledge management and dynamic capabilities. 
Journal of Knowledge Management, 27(4), 919-952, 
2023. https://doi.org/10.1108/jkm-02-2022-0112. 
[31] Khan, M. U., Sherin, S., Lqbal, M. Z., & Zahid, R. 
Landscaping systematic mapping studies in software 
engineering: A tertiary study. Journal of Systems and 
Software, 149, 396-436, 2019. 
https://doi.org/10.1016/j.jss.2018.12.018. 
[32] Laaber, C., Gall, H. C., & Leitner, P. Applying test 
case prioritization to software microbenchmarks. 
Empirical Software Engineering, 26(6), 133, 2021. 
https://doi.org/10.1007/s10664-021-10037-x. 
[33] Lee, J., Kang, S., & Keum, C. Architecture-Based 
software testing. International Journal of Software 
Engineering and Knowledge Engineering, 28(1), 57-
77, 2018. 
https://doi.org/10.1142/s0218194018500031. 
[34] Lee, J. H. Mapping local knowledge through spatial 
text mining. Landscape and Ecological Engineering, 
19(2), 243-255, 2023. 
https://doi.org/10.1007/s11355-023-005411. 
[35] De, R., & Nanda, I. Network/security threats and 
countermeasures for cloud computing. Acta 
Electronica Malaysia, 7(1), 1-3, 2022. 
https://doi.org/10.26480/aem.01.2022.01.03 
[36] Rachman, A., Kurniawan, M., Anam, C., Putra, R. E., 
Rozi, N.F., Sulistyowati, & Pakarbudi, A. Fast 
development kangean island tourism website using 
maf-inc model. Acta Informatica Malaysia, 7(2), 83-
91, 2023. https://doi.org/10.26480/aim.02.2023.83.91 
[37] Liargkovas, G., Papadopoulou, A., Kotti, Z., & 
Spinellis, D. Software engineering education 
knowledge versus industrial needs. IEEE 
Transactions on Education, 65(3), 419-427, 2022. 
https://doi.org/10.1109/te.2021.3123889. 
[38] Marculescu, B., Feldt, R., Torkar, R., & Poulding, S. 
Transferring interactive search-based software testing 
to industry. Journal of Systems and Software, 142, 
156-170, 2018. 
https://doi.org/10.1016/j.jss.2018.04.061. 
[39] Menaouer, B., & Nada, M. The relationship between 
knowledge mapping and the open innovation process: 
The case of education system. AI Edam-Artificial 
Intelligence for Engineering Design Analysis and 
Manufacturing, 34(1), 17-29, 2020. 
https://doi.org/10.1017/s0890060419000325. 
[40] Peischl, B., Tazl, O. A., & Wotawa, F. Testing 
anticipatory systems: A systematic mapping study on 
the state of the art. Journal of Systems and Software, 
192, 111387, 2022. 
https://doi.org/10.1016/j.jss.2022.111387. 
[41] Pellegrini, M. M., Ciampi, F., Marzi, G., & Orlando, 
B. The relationship between knowledge management 
and leadership: Mapping the field and providing 
future research avenues. Journal of Knowledge 
Management, 24(6), 1445-1492, 2020. 
https://doi.org/10.1108/jkm-01-2020-0034. 
[42] Odeh, A. Exploring AI innovations in automated 
software source code generation: Progress, hurdles, 
and future paths. Informatica, An International 
Journal of Computing and Informatics, 48(8), 125-
136, 2024. https://doi.org/10.31449/inf.v48i8.5291. 
[43] Sofian, H., Yunus, N. A. M., & Ahmad, R. Systematic 
mapping: Artificial intelligence techniques in 
software engineering. IEEE Access, 10, 51021-51040, 
2022. https://doi.org/10.1109/access.2022.3174115.